Class foreign_storage::ParquetStringEncoder

template<typename V>
class ParquetStringEncoder : public foreign_storage::TypedParquetInPlaceEncoder<V, V>

Public Functions

ParquetStringEncoder(Data_Namespace::AbstractBuffer *buffer, StringDictionary *string_dictionary, ChunkMetadata *chunk_metadata)
void validateAndAppendData(const int16_t *def_levels, const int16_t *rep_levels, const int64_t values_read, const int64_t levels_read, int8_t *values, const SQLTypeInfo &column_type, InvalidRowGroupIndices &invalid_indices)
void appendDataTrackErrors(const int16_t *def_levels, const int16_t *rep_levels, const int64_t values_read, const int64_t levels_read, int8_t *values)
void appendData(const int16_t *def_levels, const int16_t *rep_levels, const int64_t values_read, const int64_t levels_read, int8_t *values)

Appends Parquet data to the buffer using an in-place algorithm. Any necessary transformation or validation of the data, as well as the decoding of nulls, happens as part of the append. Each class inheriting from this abstract class must implement the functionality to copy, nullify, and encode the data.

Note that the Parquet format encodes nulls using Dremel encoding.

Parameters
  • def_levels: an array containing the Dremel encoding definition levels

  • rep_levels: an array containing the Dremel encoding repetition levels

  • values_read: the number of non-null values read

  • levels_read: the total number of values (non-null and null) read

  • values: the values that were read
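
The relationship between def_levels, values_read, and levels_read can be illustrated with a minimal sketch. It assumes a flat (non-repeated) optional column in which a definition level equal to the maximum marks a non-null entry, and it assumes the values buffer has room for levels_read slots with the values_read non-null values packed at the front. The function name expandNullsInPlace, the sentinel kNullSentinel, and the int32_t element type are hypothetical and are not part of this class.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical null sentinel, chosen only for this illustration.
constexpr int32_t kNullSentinel = std::numeric_limits<int32_t>::min();

// Expands `values_read` packed non-null values into `levels_read` slots,
// writing a sentinel wherever the definition level marks a null.
// Assumes a flat column: def_level == max_def_level means "value present".
void expandNullsInPlace(const int16_t* def_levels,
                        const int64_t values_read,
                        const int64_t levels_read,
                        int32_t* values,
                        const int16_t max_def_level = 1) {
  int64_t src = values_read - 1;  // last packed non-null value
  // Walk backwards so a write never clobbers a value not yet moved.
  for (int64_t dst = levels_read - 1; dst >= 0; --dst) {
    if (def_levels[dst] == max_def_level) {
      values[dst] = values[src--];
    } else {
      values[dst] = kNullSentinel;
    }
  }
}
```

Walking from the back keeps the read position at or before the write position, so the expansion needs no second buffer.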

void encodeAndCopyContiguous(const int8_t *parquet_data_bytes, int8_t *omnisci_data_bytes, const size_t num_elements)
void encodeAndCopy(const int8_t *parquet_data_bytes, int8_t *omnisci_data_bytes)
std::shared_ptr<ChunkMetadata> getRowGroupMetadata(const parquet::RowGroupMetaData *group_metadata, const int parquet_column_index, const SQLTypeInfo &column_type)
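
The constructor takes a StringDictionary and the class is templated on an integer-like type V, which suggests that strings are replaced by dictionary ids of type V. The following is a rough sketch of that general idea only. ToyStringDictionary and encodeStrings are hypothetical stand-ins written for this illustration; they do not reproduce the real StringDictionary interface or this class's implementation.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a string dictionary (NOT the real StringDictionary).
class ToyStringDictionary {
 public:
  // Returns the existing id for `str`, or assigns the next unused id.
  int32_t getOrAdd(const std::string& str) {
    const auto next_id = static_cast<int32_t>(ids_.size());
    return ids_.try_emplace(str, next_id).first->second;
  }

 private:
  std::unordered_map<std::string, int32_t> ids_;
};

// Encodes each string as a dictionary id narrowed to the storage type V.
template <typename V>
std::vector<V> encodeStrings(ToyStringDictionary& dict,
                             const std::vector<std::string>& strings) {
  std::vector<V> encoded;
  encoded.reserve(strings.size());
  for (const auto& s : strings) {
    const int32_t id = dict.getOrAdd(s);
    // Guard against ids that do not fit in the chosen storage type.
    if (id > static_cast<int64_t>(std::numeric_limits<V>::max())) {
      throw std::overflow_error("dictionary id does not fit in V");
    }
    encoded.push_back(static_cast<V>(id));
  }
  return encoded;
}
```

In this sketch, repeated strings map to the same id, so the encoded column stores one fixed-width integer per row regardless of string length.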

Protected Functions

bool encodingIsIdentityForSameTypes() const

Private Functions

void updateMetadataStats(int64_t values_read, int8_t *values)

Private Members

StringDictionary *string_dictionary_
ChunkMetadata *chunk_metadata_
std::vector<int8_t> encode_buffer_
V min_
V max_
int64_t current_batch_offset_
InvalidRowGroupIndices *invalid_indices_