Class foreign_storage::ParquetImporter¶
-
class
ParquetImporter
: public foreign_storage::AbstractFileStorageDataWrapper¶ Public Functions
-
ParquetImporter
()¶
-
ParquetImporter
(const int db_id, const ForeignTable *foreign_table, const UserMapping *user_mapping)¶
-
void
populateChunkMetadata
(ChunkMetadataVector &chunk_metadata_vector)¶ Populates given chunk metadata vector with metadata for all chunks in related foreign table.
- Parameters
chunk_metadata_vector
: - vector that will be populated with chunk metadata
-
void
populateChunkBuffers
(const ChunkToBufferMap &required_buffers, const ChunkToBufferMap &optional_buffers, AbstractBuffer *delete_buffer)¶ Populates given chunk buffers identified by chunk keys. All provided chunk buffers are expected to be for the same fragment.
- Parameters
required_buffers
: - chunk buffers that must always be populatedoptional_buffers
: - chunk buffers that can be optionally populated, if the data wrapper has to scan through chunk data anyways (typically for row wise data formats)delete_buffer
: - chunk buffer for fragment’s delete column, if non-null data wrapper is expected to mark deleted rows in buffer and continue processing
-
std::string
getSerializedDataWrapper
() const¶ Serialize internal state of wrapper into file at given path if implemented
-
void
restoreDataWrapperInternals
(const std::string &file_path, const ChunkMetadataVector &chunk_metadata)¶ Restore internal state of datawrapper
- Parameters
file_path
: - location of file created by serializeMetadatachunk_metadata_vector
: - vector of chunk metadata recovered from disk
-
bool
isRestored
() const¶
-
ParallelismLevel
getCachedParallelismLevel
() const¶ Gets the desired level of parallelism for the data wrapper when a cache is in use. This affects the optional buffers that the data wrapper is made aware of during data requests.
-
ParallelismLevel
getNonCachedParallelismLevel
() const¶ Gets the desired level of parallelism for the data wrapper when no cache is in use. This affects the optional buffers that the data wrapper is made aware of during data requests.
-
std::unique_ptr<import_export::ImportBatchResult>
getNextImportBatch
()¶ Produce the next
ImportBatchResult
for import. This is the only functionality ofParquetImporter
that is required to be implemented.- Return
a
ImportBatchResult
for import.
-
std::vector<std::pair<const ColumnDescriptor *, StringDictionary *>>
getStringDictionaries
() const¶ Return string dictionaries that are used per column.
- Return
a vector of StringDictionary and ColumnDescriptor pairs
-
int
getMaxNumUsefulThreads
() const¶ Get the maximum number of threads that can do useful computation.
-
void
setNumThreads
(const int num_threads)¶ Set the number of threads to use internally when reading batches.
Private Functions
-
std::set<std::string>
getAllFilePaths
()¶
Private Members
-
const int
db_id_
¶
-
const ForeignTable *
foreign_table_
¶
-
int
num_threads_
¶
-
std::unique_ptr<AbstractRowGroupIntervalTracker>
row_group_interval_tracker_
¶
-
std::unique_ptr<ForeignTableSchema>
schema_
¶
-
std::shared_ptr<arrow::fs::FileSystem>
file_system_
¶
-
std::unique_ptr<FileReaderMap>
file_reader_cache_
¶
-
std::vector<std::pair<const ColumnDescriptor *, StringDictionary *>>
string_dictionaries_per_column_
¶
-
std::shared_mutex
row_group_interval_tracker_mutex_
¶
-
std::shared_mutex
string_dictionaries_per_column_mutex_
¶
-