Class File_Namespace::CachingFileMgr

class CachingFileMgr : public File_Namespace::FileMgr

A FileMgr capable of limiting its size and storing data from multiple tables in a shared directory. For any table that supports DiskCaching, the CachingFileMgr must contain metadata either for all of the table's chunks or for none of them (the cache has either no knowledge of that table or complete knowledge of it). Any data chunk within a table may or may not be contained within the cache.

Public Functions

CachingFileMgr(const DiskCacheConfig &config)
~CachingFileMgr()
MgrType getMgrType()
std::string getStringMgrType()
size_t getDefaultPageSize()
size_t getMaxSize()
size_t getMaxDataFiles() const
size_t getMaxMetaFiles() const
size_t getMaxWrapperSize() const
size_t getDataFileSize() const
size_t getMetadataFileSize() const
size_t getNumDataFiles() const
size_t getNumMetaFiles() const
size_t getAvailableSpace()
size_t getAvailableWrapperSpace()
size_t getAllocated()
size_t getMaxDataFilesSize() const
void removeChunkKeepMetadata(const ChunkKey &key)

Frees the data pages for the chunk, keeping its metadata, and removes it from the chunk eviction algorithm.

void clearForTable(int32_t db_id, int32_t tb_id)

Removes all data related to the given table (pages and subdirectories).

bool hasFileMgrKey() const

Query to determine if the contained pages will have their database and table ids overridden by the filemgr key (FileMgr does this).

void closeRemovePhysical()

Closes files and removes the caching directory.

size_t getChunkSpaceReservedByTable(int32_t db_id, int32_t tb_id) const

Set of functions that report how much space is reserved for a table, by type.

size_t getMetadataSpaceReservedByTable(int32_t db_id, int32_t tb_id) const
size_t getTableFileMgrSpaceReserved(int32_t db_id, int32_t tb_id) const
size_t getSpaceReservedByTable(int32_t db_id, int32_t tb_id) const
std::string describeSelf() const

Describes this FileMgr for logging purposes.

void checkpoint(const int32_t db_id, const int32_t tb_id)

Writes buffers for the given table, synchronizes files to disk, updates the file epoch, and commits free pages.

int32_t epoch(int32_t db_id, int32_t tb_id) const

Obtains the epoch version for the given table.

FileBuffer *putBuffer(const ChunkKey &key, AbstractBuffer *srcBuffer, const size_t numBytes = 0)

Deletes any existing buffer for the given key, then copies in a new one.

putBuffer() needs to behave differently than it does in FileMgr. Specifically, it needs to delete the buffer beforehand and then append, rather than overwrite the existing buffer. This way we only store a single version of the buffer rather than accumulating versions that need to be rolled off.
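The difference between the two behaviors can be illustrated with a minimal sketch. The types below (`ToyChunkKey`, `ToyVersionedStore`, `ToyCachingStore`) are hypothetical stand-ins invented for illustration and are not part of the real FileMgr or CachingFileMgr API; the sketch only assumes that a chunk key can be modeled as a vector of integers.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real types.
using ToyChunkKey = std::vector<int>;
using ToyVersions = std::vector<std::vector<char>>;  // one entry per stored version

// Models a store where each put adds a version next to the older ones,
// so old versions accumulate until something rolls them off.
struct ToyVersionedStore {
  std::map<ToyChunkKey, ToyVersions> chunks;
  void put(const ToyChunkKey& key, const std::vector<char>& data) {
    chunks[key].push_back(data);
  }
};

// Models the delete-then-append behavior described above:
// any existing buffer is removed first, so only one version is ever stored.
struct ToyCachingStore {
  std::map<ToyChunkKey, ToyVersions> chunks;
  void put(const ToyChunkKey& key, const std::vector<char>& data) {
    chunks.erase(key);            // delete the existing buffer, if any
    chunks[key].push_back(data);  // then append the new contents
  }
};

// Number of versions currently stored for a key (0 if the key is absent).
std::size_t versionCount(const std::map<ToyChunkKey, ToyVersions>& chunks,
                         const ToyChunkKey& key) {
  auto it = chunks.find(key);
  return it == chunks.end() ? 0 : it->second.size();
}
```

Putting the same key twice leaves two versions in `ToyVersionedStore` but only the latest one in `ToyCachingStore`.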

CachingFileBuffer *allocateBuffer(const size_t page_size, const ChunkKey &key, const size_t num_bytes = 0)

Allocates a new CachingFileBuffer and tracks its use in the eviction algorithms.

CachingFileBuffer *allocateBuffer(const ChunkKey &key, const std::vector<HeaderInfo>::const_iterator &headerStartIt, const std::vector<HeaderInfo>::const_iterator &headerEndIt)
bool updatePageIfDeleted(FileInfo *file_info, ChunkKey &chunk_key, int32_t contingent, int32_t page_epoch, int32_t page_num)

Checks whether a page should be deleted.

bool failOnReadError() const

True if a read error should cause a fatal error.

void deleteBufferIfExists(const ChunkKey &key)

Deletes a buffer if it exists in the mgr. Otherwise does nothing.

size_t getNumChunksWithMetadata() const

Returns the number of buffers with metadata in the CFM. Any buffer with an encoder counts.

size_t getNumDataChunks() const

Returns the number of buffers with chunk data in the CFM.

std::vector<ChunkKey> getChunkKeysForPrefix(const ChunkKey &prefix) const

Returns the keys for chunks with chunk data that match the given prefix.
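Prefix matching on chunk keys can be sketched as follows. This is an illustrative assumption, not the actual implementation: it models a chunk key as a vector of integers (`ToyChunkKey`), and the helper names `hasPrefix` and `keysForPrefix` are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in: a chunk key modeled as a vector of ints,
// for example {db_id, tb_id, column_id, fragment_id}.
using ToyChunkKey = std::vector<int>;

// True if `key` begins with every element of `prefix`, in order.
bool hasPrefix(const ToyChunkKey& key, const ToyChunkKey& prefix) {
  return key.size() >= prefix.size() &&
         std::equal(prefix.begin(), prefix.end(), key.begin());
}

// Collects every key in `keys` that matches the given prefix.
std::vector<ToyChunkKey> keysForPrefix(const std::set<ToyChunkKey>& keys,
                                       const ToyChunkKey& prefix) {
  std::vector<ToyChunkKey> out;
  std::copy_if(keys.begin(), keys.end(), std::back_inserter(out),
               [&](const ToyChunkKey& k) { return hasPrefix(k, prefix); });
  return out;
}
```

With a prefix of `{db_id, tb_id}`, this selects every key that belongs to that table.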

std::unique_ptr<CachingFileMgr> reconstruct() const

Initializes a new CFM using the initialization values in the current CFM.

void deleteWrapperFile(int32_t db, int32_t tb)

Deletes the wrapper file from a table subdir.

void writeWrapperFile(const std::string &doc, int32_t db, int32_t tb)

Writes a wrapper file to a table subdir.

std::string getTableFileMgrPath(int32_t db, int32_t tb) const
size_t getFilesSize() const

Get the total size of page files (data and metadata files). This includes allocated, but unused space.

size_t getTableFileMgrsSize() const

Returns the total size of all subdirectory files. Each table represented in the CFM has a subdirectory for serialized data wrappers and epoch files.

std::optional<FileBuffer *> getBufferIfExists(const ChunkKey &key)

An optional version of getBuffer, for use when it is not certain that the chunk exists.
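A caller-side pattern for an optional-returning lookup might look like the sketch below. `ToyBuffer`, `ToyMgr`, and `sizeOrZero` are hypothetical names used only to show the `std::optional` idiom; they do not reproduce the real class or its internals.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration only.
using ToyChunkKey = std::vector<int>;
struct ToyBuffer {
  std::size_t size = 0;
};

class ToyMgr {
 public:
  void add(const ToyChunkKey& key, std::size_t size) { buffers_[key].size = size; }

  // Returns the buffer if the key is present, or std::nullopt otherwise,
  // so the caller does not have to know in advance whether the chunk exists.
  std::optional<ToyBuffer*> getBufferIfExists(const ToyChunkKey& key) {
    auto it = buffers_.find(key);
    if (it == buffers_.end()) {
      return std::nullopt;
    }
    return &it->second;
  }

 private:
  std::map<ToyChunkKey, ToyBuffer> buffers_;
};

// Example caller: reads the buffer size, falling back to 0 for a missing chunk.
std::size_t sizeOrZero(ToyMgr& mgr, const ToyChunkKey& key) {
  if (auto buf = mgr.getBufferIfExists(key)) {
    return (*buf)->size;
  }
  return 0;
}
```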

void free_page(std::pair<FileInfo *, int32_t> &&page)

Unlike the FileMgr, the CFM frees pages immediately instead of holding them until the next checkpoint.
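The contrast between deferred and immediate freeing can be sketched with two toy types. `ToyFile`, `ToyDeferredFreer`, and `ToyImmediateFreer` are hypothetical and only model the bookkeeping described above; they are not the real page management code.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for a file that tracks which page numbers are free.
struct ToyFile {
  std::set<int> free_pages;
};

// Models holding freed pages until the next checkpoint.
struct ToyDeferredFreer {
  ToyFile file;
  std::vector<int> pending;  // freed, but not yet reusable

  void freePage(int page_num) { pending.push_back(page_num); }

  void checkpoint() {
    for (int page_num : pending) {
      file.free_pages.insert(page_num);
    }
    pending.clear();
  }
};

// Models freeing pages immediately, so they are reusable right away.
struct ToyImmediateFreer {
  ToyFile file;
  void freePage(int page_num) { file.free_pages.insert(page_num); }
};
```

In the deferred model a freed page only becomes reusable after `checkpoint()`; in the immediate model it is reusable as soon as `freePage()` returns.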

void getChunkMetadataVecForKeyPrefix(ChunkMetadataVector &chunkMetadataVec, const ChunkKey &keyPrefix)
std::string dumpKeysWithMetadata() const
std::string dumpKeysWithChunkData() const
std::string dumpTableQueue() const
std::string dumpEvictionQueue() const
std::string dump() const
void setMaxNumDataFiles(size_t max)
void setMaxNumMetadataFiles(size_t max)
void setMaxWrapperSpace(size_t max)
std::set<ChunkKey> getKeysWithMetadata() const
void setDataSizeLimit(size_t max)

Public Static Functions

static size_t getMinimumSize()

Public Static Attributes

constexpr char WRAPPER_FILE_NAME[] = "wrapper_metadata.json"
constexpr float METADATA_SPACE_PERCENTAGE = {0.1}
constexpr float METADATA_FILE_SPACE_PERCENTAGE = {0.01}

Private Functions

void incrementEpoch(int32_t db_id, int32_t tb_id)

Increments epoch for the given table.

void init(const size_t num_reader_threads)

Initializes a CFM, parsing any existing files and initializing data structures appropriately (currently not thread-safe).

void writeAndSyncEpochToDisk(int32_t db_id, int32_t tb_id)

Flushes epoch value to disk for a table.

void readTableFileMgrs()

Checks for any sub-directories containing table-specific data and creates epochs from found files.

FileBuffer *createBufferFromHeaders(const ChunkKey &key, const std::vector<HeaderInfo>::const_iterator &startIt, const std::vector<HeaderInfo>::const_iterator &endIt)

Creates a buffer and initializes it with info read from files on disk.

FileBuffer *createBufferUnlocked(const ChunkKey &key, size_t pageSize = 0, const size_t numBytes = 0)

Creates a buffer.

void createTableFileMgrIfNoneExists(const int32_t db_id, const int32_t tb_id)

Create and initialize a subdirectory for a table if none exists.

void incrementAllEpochs()

Increment epochs for each table in the CFM.

void removeTableFileMgr(int32_t db_id, int32_t tb_id)

Removes the subdirectory content for a table.

void removeTableBuffers(int32_t db_id, int32_t tb_id)

Erases and cleans up all buffers for a table.

void writeDirtyBuffers(int32_t db_id, int32_t tb_id)

Helper function to flush all dirty buffers to disk.

Page requestFreePage(size_t pagesize, const bool isMetadata)

Requests a free page, similar to FileMgr, but this override will also evict existing pages to make space if none are available.

void touchKey(const ChunkKey &key) const

Used to track which tables/chunks were least recently used.

void removeKey(const ChunkKey &key) const
std::vector<ChunkKey> getKeysForTable(int32_t db_id, int32_t tb_id) const

Returns the keys contained in chunkIndex_ that match the given table prefix.

FileInfo *evictMetadataPages()

Evicts all metadata pages for the least recently used table. Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).

FileInfo *evictPages()

Evicts all data pages for the least recently used Chunk (metadata pages persist). Returns the first FileInfo that a page was evicted from (guaranteed to now have at least one free page in it).
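The least-recently-used bookkeeping behind touchKey(), removeKey(), and the evict functions can be sketched as a small queue. `ToyLruQueue` is a hypothetical class written for illustration; it is not the LRUEvictionAlgorithm used by the CFM, and it assumes chunk keys can be modeled as vectors of integers.

```cpp
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in: a chunk key modeled as a vector of ints.
using ToyChunkKey = std::vector<int>;

class ToyLruQueue {
 public:
  // Marks a key as most recently used, adding it if it is not tracked yet.
  void touch(const ToyChunkKey& key) {
    remove(key);
    queue_.push_back(key);
    index_[key] = std::prev(queue_.end());
  }

  // Stops tracking a key; does nothing if the key is unknown.
  void remove(const ToyChunkKey& key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      queue_.erase(it->second);
      index_.erase(it);
    }
  }

  // Removes and returns the least recently used key, if there is one.
  std::optional<ToyChunkKey> evictNext() {
    if (queue_.empty()) {
      return std::nullopt;
    }
    ToyChunkKey victim = queue_.front();  // copy before the node is erased
    remove(victim);
    return victim;
  }

 private:
  std::list<ToyChunkKey> queue_;  // front = least recently used
  std::map<ToyChunkKey, std::list<ToyChunkKey>::iterator> index_;
};
```

Touching a key again moves it to the back of the queue, so the key that has gone longest without a touch is the next eviction candidate.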

void deleteCacheIfTooLarge()

When the cache is read from disk, we don’t know which chunks were least recently used. Rather than trying to evict random pages to get down to size, we just reset the cache to make sure we have space.
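The reset-instead-of-evict policy can be sketched as below. `ToyCache` and `resetIfTooLarge` are hypothetical names, and measuring size as a sum of per-chunk byte counts is an assumption made for illustration rather than how the CFM measures its on-disk size.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration only.
using ToyChunkKey = std::vector<int>;

struct ToyCache {
  std::size_t max_size = 0;
  std::map<ToyChunkKey, std::size_t> chunk_sizes;  // bytes per cached chunk

  std::size_t totalSize() const {
    std::size_t total = 0;
    for (const auto& entry : chunk_sizes) {
      total += entry.second;
    }
    return total;
  }

  // With no usage history available after a reload, drop everything
  // if the cache exceeds its limit. Returns true if a reset happened.
  bool resetIfTooLarge() {
    if (totalSize() > max_size) {
      chunk_sizes.clear();
      return true;
    }
    return false;
  }
};
```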

void setMaxSizes()

Sets the maximum number of files/space for each type of storage based on the maximum size.

FileBuffer *getBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const size_t numBytes = 0)
ChunkKeyToChunkMap::iterator deleteBufferUnlocked(const ChunkKeyToChunkMap::iterator chunk_it, const bool purge = true)

Private Members

mapd_shared_mutex table_dirs_mutex_
std::map<TablePair, std::unique_ptr<TableFileMgr>> table_dirs_
size_t max_num_data_files_
size_t max_num_meta_files_
size_t max_wrapper_space_
size_t max_size_
std::optional<size_t> limit_data_size_ = {}
LRUEvictionAlgorithm chunk_evict_alg_
LRUEvictionAlgorithm table_evict_alg_