Class foreign_storage::AbstractTextFileDataWrapper

class AbstractTextFileDataWrapper : public foreign_storage::AbstractFileStorageDataWrapper

Subclassed by foreign_storage::CsvDataWrapper, foreign_storage::RegexParserDataWrapper

Public Functions

AbstractTextFileDataWrapper()
AbstractTextFileDataWrapper(const int db_id, const ForeignTable *foreign_table)
AbstractTextFileDataWrapper(const int db_id, const ForeignTable *foreign_table, const UserMapping *user_mapping, const bool disable_cache)
void populateChunkMetadata(ChunkMetadataVector &chunk_metadata_vector)

Populates the provided chunk metadata vector with metadata for the foreign table this wrapper is configured for. The metadata scan of the table's text file(s) runs in parallel whenever appropriate. During parallel processing, the main thread creates ParseBufferRequest objects, which contain buffers with text content read from the file, and adds them to a queue that is consumed by a fixed number of threads. After a request is processed, it is returned to a pool so that subsequent requests can reuse its buffer instead of allocating a new one.

Parameters
  • chunk_metadata_vector: - vector to be populated with chunk metadata
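The queue-and-pool pattern described above can be illustrated with a minimal sketch. It is not the wrapper's actual implementation: SketchRequest, RequestQueue, and countRowsInParallel are hypothetical names, and counting newlines stands in for the real metadata scan.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for ParseBufferRequest: it only owns a reusable text buffer.
struct SketchRequest {
  std::string buffer;
};

// Minimal blocking queue, used both for pending requests and for the idle pool.
class RequestQueue {
 public:
  void push(SketchRequest* request) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(request);
    }
    cv_.notify_one();
  }

  // Blocks until an entry is available.
  SketchRequest* pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    SketchRequest* request = queue_.front();
    queue_.pop();
    return request;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<SketchRequest*> queue_;
};

// Counts newline characters across `blocks` with a fixed number of worker
// threads and a fixed pool of request objects (assumption: one block per request).
size_t countRowsInParallel(const std::vector<std::string>& blocks,
                           size_t num_threads,
                           size_t pool_size) {
  assert(num_threads > 0 && pool_size > 0);
  std::vector<SketchRequest> storage(pool_size);
  RequestQueue pool;     // idle requests waiting to be reused
  RequestQueue pending;  // filled requests waiting for a worker
  for (auto& request : storage) {
    pool.push(&request);
  }

  std::mutex total_mutex;
  size_t total_rows = 0;
  std::vector<std::thread> workers;
  for (size_t i = 0; i < num_threads; ++i) {
    workers.emplace_back([&] {
      // A nullptr entry is this sketch's shutdown signal.
      while (SketchRequest* request = pending.pop()) {
        size_t rows = 0;
        for (char c : request->buffer) {
          rows += (c == '\n');
        }
        {
          std::lock_guard<std::mutex> lock(total_mutex);
          total_rows += rows;
        }
        pool.push(request);  // hand the request back for reuse
      }
    });
  }

  // Main thread: take an idle request, fill its buffer, and enqueue it.
  for (const auto& block : blocks) {
    SketchRequest* request = pool.pop();
    request->buffer.assign(block);
    pending.push(request);
  }
  for (size_t i = 0; i < num_threads; ++i) {
    pending.push(nullptr);  // one shutdown signal per worker
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return total_rows;
}
```

Because the main thread blocks on the pool when every request is in flight, the pool size also caps how much file content is buffered at once.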

void populateChunkBuffers(const ChunkToBufferMap &required_buffers, const ChunkToBufferMap &optional_buffers, AbstractBuffer *delete_buffer)

Populates given chunk buffers identified by chunk keys. All provided chunk buffers are expected to be for the same fragment.

Parameters
  • required_buffers: - chunk buffers that must always be populated

  • optional_buffers: - chunk buffers that can optionally be populated if the data wrapper has to scan through the chunk data anyway (typically for row-wise data formats)

  • delete_buffer: - chunk buffer for the fragment’s delete column; if non-null, the data wrapper is expected to mark deleted rows in the buffer and continue processing
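The distinction between required and optional buffers can be sketched for a row-wise format, where each row is parsed once and any requested column can be filled from the same pass. SketchBufferMap, splitRow, and populateSketchBuffers are hypothetical names for illustration; the real ChunkToBufferMap and buffer types are not modelled, and delete-buffer handling is omitted.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ChunkToBufferMap: column index -> destination values.
using SketchBufferMap = std::map<int, std::vector<std::string>*>;

// Splits one comma-separated row into fields (no quoting or escaping is handled).
std::vector<std::string> splitRow(const std::string& row) {
  std::vector<std::string> fields;
  std::stringstream stream(row);
  std::string field;
  while (std::getline(stream, field, ',')) {
    fields.push_back(field);
  }
  return fields;
}

// Parses every row once. Required columns are always filled; optional columns
// are filled from the same parse, since the row has been scanned anyway.
void populateSketchBuffers(const std::vector<std::string>& rows,
                           const SketchBufferMap& required,
                           const SketchBufferMap& optional) {
  for (const auto& row : rows) {
    const std::vector<std::string> fields = splitRow(row);
    for (const auto& entry : required) {
      entry.second->push_back(fields.at(entry.first));
    }
    for (const auto& entry : optional) {
      entry.second->push_back(fields.at(entry.first));
    }
  }
}
```

A columnar format could skip the optional map entirely, since reading one column there does not require scanning the others.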

std::string getSerializedDataWrapper() const

Serializes the internal state of the wrapper, if implemented. As declared, the function takes no path argument and returns the serialized state as a std::string.

void restoreDataWrapperInternals(const std::string &file_path, const ChunkMetadataVector &chunk_metadata)

Restores the internal state of the data wrapper.

Parameters
  • file_path: - location of file created by serializeMetadata

  • chunk_metadata: - vector of chunk metadata recovered from disk
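The serialize and restore pair can be pictured as a round trip in which the caller persists the returned string and later passes the file's path back. The sketch below is an assumption-laden illustration: SketchWrapper, writeStateFile, and the space-separated text format are hypothetical, and the chunk metadata argument of the real restore call is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical wrapper that tracks only two counters and a restored flag.
class SketchWrapper {
 public:
  SketchWrapper() = default;
  SketchWrapper(size_t num_rows, size_t append_start_offset)
      : num_rows_(num_rows), append_start_offset_(append_start_offset) {}

  // Returns the state as text; this format is an assumption of the sketch.
  std::string getSerialized() const {
    std::ostringstream out;
    out << num_rows_ << ' ' << append_start_offset_;
    return out.str();
  }

  // Reads the state back from a file that the caller wrote earlier.
  void restore(const std::string& file_path) {
    std::ifstream in(file_path);
    if (!(in >> num_rows_ >> append_start_offset_)) {
      throw std::runtime_error("could not restore state from " + file_path);
    }
    is_restored_ = true;
  }

  bool isRestored() const { return is_restored_; }
  size_t numRows() const { return num_rows_; }
  size_t appendStartOffset() const { return append_start_offset_; }

 private:
  size_t num_rows_ = 0;
  size_t append_start_offset_ = 0;
  bool is_restored_ = false;
};

// Caller-side helper: stores the serialized string at a path of its choosing.
void writeStateFile(const SketchWrapper& wrapper, const std::string& file_path) {
  std::ofstream out(file_path);
  out << wrapper.getSerialized();
}
```

Keeping file I/O on the caller's side for serialization means the wrapper does not need to know where its state is stored until it is asked to restore.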

bool isRestored() const
ParallelismLevel getCachedParallelismLevel() const

Gets the desired level of parallelism for the data wrapper when a cache is in use. This affects the optional buffers that the data wrapper is made aware of during data requests.

ParallelismLevel getNonCachedParallelismLevel() const

Gets the desired level of parallelism for the data wrapper when no cache is in use. This affects the optional buffers that the data wrapper is made aware of during data requests.

void createRenderGroupAnalyzers()

Create RenderGroupAnalyzers for poly columns.

Protected Functions

virtual const TextFileBufferParser &getFileBufferParser() const = 0

Private Functions

AbstractTextFileDataWrapper(const ForeignTable *foreign_table)
void populateChunks(std::map<int, Chunk_NS::Chunk> &column_id_to_chunk_map, int fragment_id, AbstractBuffer *delete_buffer)

Populates provided chunks with appropriate data by parsing all file regions containing chunk data.

Parameters
  • column_id_to_chunk_map: - map of column id to chunks to be populated

  • fragment_id: - fragment id of given chunks

  • delete_buffer: - optional buffer to store deleted row indices

void populateChunkMapForColumns(const std::set<const ColumnDescriptor *> &columns, const int fragment_id, const ChunkToBufferMap &buffers, std::map<int, Chunk_NS::Chunk> &column_id_to_chunk_map)
void updateMetadata(std::map<int, Chunk_NS::Chunk> &column_id_to_chunk_map, int fragment_id)

Private Members

std::map<ChunkKey, std::shared_ptr<ChunkMetadata>> chunk_metadata_map_
std::map<int, FileRegions> fragment_id_to_file_regions_map_
std::unique_ptr<FileReader> file_reader_
const int db_id_
const ForeignTable *foreign_table_
std::map<ChunkKey, std::unique_ptr<ForeignStorageBuffer>> chunk_encoder_buffers_
size_t num_rows_
size_t append_start_offset_
bool is_restored_
const UserMapping *user_mapping_
const bool disable_cache_
RenderGroupAnalyzerMap render_group_analyzer_map_