Class ColumnarResults

class ColumnarResults

Public Types

using ReadFunction = std::function<int64_t(const ResultSet&, const size_t, const size_t, const size_t)>
using WriteFunction = std::function<void(const ResultSet&, const size_t, const size_t, const size_t, const size_t, const ReadFunction&)>

Public Functions

ColumnarResults(const std::shared_ptr<RowSetMemoryOwner> row_set_mem_owner, const ResultSet &rows, const size_t num_columns, const std::vector<SQLTypeInfo> &target_types, const size_t executor_id, const size_t thread_idx, const bool is_parallel_execution_enforced = false)
ColumnarResults(const std::shared_ptr<RowSetMemoryOwner> row_set_mem_owner, const int8_t *one_col_buffer, const size_t num_rows, const SQLTypeInfo &target_type, const size_t executor_id, const size_t thread_idx)
const std::vector<int8_t *> &getColumnBuffers() const
const size_t size() const
const SQLTypeInfo &getColumnType(const int col_id) const
bool isParallelConversion() const
bool isDirectColumnarConversionPossible() const

Public Static Functions

std::unique_ptr<ColumnarResults> mergeResults(const std::shared_ptr<RowSetMemoryOwner> row_set_mem_owner, const std::vector<std::unique_ptr<ColumnarResults>> &sub_results)

Protected Attributes

std::vector<int8_t *> column_buffers_
size_t num_rows_

Private Functions

ColumnarResults(const size_t num_rows, const std::vector<SQLTypeInfo> &target_types)
void writeBackCell(const TargetValue &col_val, const size_t row_idx, const size_t column_idx)
void materializeAllColumnsDirectly(const ResultSet &rows, const size_t num_columns)

This function materializes all columns from the main storage and all appended storages and forms a single contiguous column for each output column. Lazily fetched columns and non-lazy columns are handled differently.

NOTE: this function should only be used when the result set is columnar and completely compacted (e.g., in columnar projections).
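For non-lazy, fixed-width columns, forming one contiguous column amounts to concatenating the same column from every storage. This is a minimal sketch that assumes each storage (main plus appended) exposes every column as a plain byte buffer. `ColumnarStorageSketch` and `concatColumn` are hypothetical names, not part of this class, and lazily fetched columns, which need decoding, are not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one storage chunk of a columnar result set:
// columns[col] holds num_rows fixed-width values packed back to back.
struct ColumnarStorageSketch {
  size_t num_rows;
  std::vector<std::vector<int8_t>> columns;
};

// Concatenates column `col` (element width `width` bytes) from the main
// storage and every appended storage into one contiguous output buffer.
std::vector<int8_t> concatColumn(const std::vector<ColumnarStorageSketch>& storages,
                                 size_t col, size_t width) {
  size_t total_rows = 0;
  for (const auto& s : storages) total_rows += s.num_rows;
  std::vector<int8_t> out(total_rows * width);
  size_t offset = 0;  // byte offset into the output buffer
  for (const auto& s : storages) {
    const size_t bytes = s.num_rows * width;
    assert(s.columns[col].size() >= bytes);
    if (bytes > 0) std::memcpy(out.data() + offset, s.columns[col].data(), bytes);
    offset += bytes;
  }
  return out;
}
```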

void materializeAllColumnsThroughIteration(const ResultSet &rows, const size_t num_columns)

This function iterates through the result set (using the getRowAtNoTranslation and getNextRow family of functions) and writes back the results into output column buffers.
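The iteration-based path is essentially a row-to-column transpose. The sketch below assumes a hypothetical `NextRowFn` callback that yields one row of int64 values per call and an empty row when exhausted, standing in for the getNextRow family; real rows carry typed `TargetValue`s rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical row source: returns the next row (one value per column),
// or an empty vector once the result set is exhausted.
using NextRowFn = std::function<std::vector<int64_t>()>;

// Drains the row source and writes each cell back into its column buffer.
std::vector<std::vector<int64_t>> materializeThroughIteration(const NextRowFn& next_row,
                                                              size_t num_columns) {
  std::vector<std::vector<int64_t>> column_buffers(num_columns);
  for (auto row = next_row(); !row.empty(); row = next_row()) {
    assert(row.size() == num_columns);
    for (size_t col = 0; col < num_columns; ++col) {
      column_buffers[col].push_back(row[col]);  // write back one cell
    }
  }
  return column_buffers;
}
```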

void materializeAllColumnsGroupBy(const ResultSet &rows, const size_t num_columns)

This function directly columnarizes a result set for group-by queries. Its main difference from the traditional alternative is that it reads the non-empty entries directly from the result set's storage and writes them into the output column buffers, rather than going through the result set's iterators.

void materializeAllColumnsProjection(const ResultSet &rows, const size_t num_columns)

This function handles materialization for two types of columns in columnar projections:

  1. for all non-lazy columns, it directly copies the results from the result set’s storage into the output column buffers

  2. for all lazily fetched columns, it uses the result set’s iterators to decode the proper values before storing them into the output column buffers
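The two cases above can be sketched as a per-column branch. This assumes a hypothetical `ColumnSketch` in which a lazy column stores ids that still need decoding, and a caller-supplied `DecodeFn` that stands in for the result set's iterator-based decoding; neither name comes from the class.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical column: non-lazy columns hold final values, lazy ones hold ids.
struct ColumnSketch {
  bool is_lazy;
  std::vector<int64_t> stored;
};

// Stand-in for iterator-based decoding of a lazily fetched value.
using DecodeFn = std::function<int64_t(size_t col, int64_t stored_id)>;

std::vector<std::vector<int64_t>> materializeProjection(const std::vector<ColumnSketch>& cols,
                                                        const DecodeFn& decode) {
  std::vector<std::vector<int64_t>> out(cols.size());
  for (size_t c = 0; c < cols.size(); ++c) {
    if (!cols[c].is_lazy) {
      out[c] = cols[c].stored;  // case 1: plain copy of the storage
    } else {
      out[c].reserve(cols[c].stored.size());
      for (int64_t id : cols[c].stored) {
        out[c].push_back(decode(c, id));  // case 2: decode every value
      }
    }
  }
  return out;
}
```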

void materializeAllColumnsTableFunction(const ResultSet &rows, const size_t num_columns)
void copyAllNonLazyColumns(const std::vector<ColumnLazyFetchInfo> &lazy_fetch_info, const ResultSet &rows, const size_t num_columns)
void materializeAllLazyColumns(const std::vector<ColumnLazyFetchInfo> &lazy_fetch_info, const ResultSet &rows, const size_t num_columns)

For all lazily fetched columns, this function iterates through the column’s content and materializes it.

This function is parallelized by dividing the total rows among all available threads. Since the result set contains no invalid entries (e.g., in columnar projections), the output buffer has exactly as many rows as the result set, which removes the need to atomically increment the output buffer position.
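The argument above can be illustrated with a sketch in which each thread receives a contiguous row range and writes input row `i` directly to output slot `i`. Because the ranges are disjoint, no atomic output cursor is involved. `parallelMaterialize` and its `decode_row` callback are hypothetical, and the callback is assumed to be safe to call concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Splits [0, num_rows) into one contiguous chunk per thread. Every input row
// yields exactly one output row, so row i is written to out[i] directly.
void parallelMaterialize(std::vector<int64_t>& out, size_t num_rows, size_t num_threads,
                         const std::function<int64_t(size_t)>& decode_row) {
  assert(num_threads > 0);
  out.resize(num_rows);
  const size_t chunk = (num_rows + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = std::min(num_rows, t * chunk);
    const size_t end = std::min(num_rows, begin + chunk);
    workers.emplace_back([&out, &decode_row, begin, end] {
      for (size_t i = begin; i < end; ++i) out[i] = decode_row(i);
    });
  }
  for (auto& w : workers) w.join();
}
```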

void locateAndCountEntries(const ResultSet &rows, ColumnBitmap &bitmap, std::vector<size_t> &non_empty_per_thread, const size_t entry_count, const size_t num_threads, const size_t size_per_thread) const

This function goes through all the keys in the result set and counts the total number of non-empty keys. It also stores the location of each non-empty key in a bitmap so it can be accessed faster later.
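A sequential sketch of this counting pass, assuming keys are plain int64 values and that a hypothetical `EMPTY_KEY` sentinel marks empty entries (the real result set decides emptiness from its own key layout). Each outer iteration covers one thread's slice and touches only that slice, so it could run on its own thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinel for an empty group-by entry.
constexpr int64_t EMPTY_KEY = std::numeric_limits<int64_t>::min();

// Flags non-empty entries in `bitmap` and counts them per thread-sized slice.
// One byte per entry (rather than std::vector<bool>) lets slices be filled
// concurrently without two threads sharing a storage word.
std::vector<size_t> locateAndCount(const std::vector<int64_t>& keys,
                                   std::vector<uint8_t>& bitmap, size_t num_threads,
                                   size_t size_per_thread) {
  assert(num_threads * size_per_thread >= keys.size());  // every entry covered
  bitmap.assign(keys.size(), 0);
  std::vector<size_t> non_empty_per_thread(num_threads, 0);
  for (size_t t = 0; t < num_threads; ++t) {  // slices are independent
    const size_t begin = t * size_per_thread;
    const size_t end = std::min(keys.size(), begin + size_per_thread);
    for (size_t i = begin; i < end; ++i) {
      if (keys[i] != EMPTY_KEY) {
        bitmap[i] = 1;
        ++non_empty_per_thread[t];
      }
    }
  }
  return non_empty_per_thread;
}
```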

void compactAndCopyEntries(const ResultSet &rows, const ColumnBitmap &bitmap, const std::vector<size_t> &non_empty_per_thread, const size_t num_columns, const size_t entry_count, const size_t num_threads, const size_t size_per_thread)

This function goes through all non-empty entries marked in the bitmap and stores them into the output column buffers. The output column buffers are compacted, with no holes.

TODO(Saman): if necessary, we can look into the distribution of non-empty entries and choose a different load-balancing strategy (assigning an equal number of non-empty entries to each thread) instead of partitioning the bitmap equally.
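Given the bitmap and the per-thread counts, an exclusive prefix sum yields each thread's starting offset in the output. This is the role the `global_offsets` parameter of the compaction variants appears to play. The sketch below is a hypothetical `compactAndCopy` over a single int64 column under the same simplifying assumptions; each outer iteration writes only to its own output range, so the threads never collide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

std::vector<int64_t> compactAndCopy(const std::vector<int64_t>& values,
                                    const std::vector<uint8_t>& bitmap,
                                    const std::vector<size_t>& non_empty_per_thread,
                                    size_t size_per_thread) {
  assert(bitmap.size() == values.size());
  const size_t num_threads = non_empty_per_thread.size();
  // Exclusive prefix sum: where each thread starts writing in the output.
  std::vector<size_t> global_offsets(num_threads, 0);
  for (size_t t = 1; t < num_threads; ++t) {
    global_offsets[t] = global_offsets[t - 1] + non_empty_per_thread[t - 1];
  }
  const size_t total =
      num_threads == 0 ? 0 : global_offsets.back() + non_empty_per_thread.back();
  std::vector<int64_t> out(total);
  for (size_t t = 0; t < num_threads; ++t) {  // independent per thread
    size_t write_pos = global_offsets[t];
    const size_t begin = t * size_per_thread;
    const size_t end = std::min(values.size(), begin + size_per_thread);
    for (size_t i = begin; i < end; ++i) {
      if (bitmap[i]) out[write_pos++] = values[i];  // skip holes
    }
  }
  return out;
}
```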

void compactAndCopyEntriesWithTargetSkipping(const ResultSet &rows, const ColumnBitmap &bitmap, const std::vector<size_t> &non_empty_per_thread, const std::vector<size_t> &global_offsets, const std::vector<bool> &targets_to_skip, const std::vector<size_t> &slot_idx_per_target_idx, const size_t num_columns, const size_t entry_count, const size_t num_threads, const size_t size_per_thread)

This function takes a bitmap of non-empty entries within the result set’s storage, then compacts and copies those entries into the output column_buffers_. In this variation, multi-slot targets (e.g., AVG) are handled through the result set’s existing iteration, while everything else is directly columnarized.

void compactAndCopyEntriesWithoutTargetSkipping(const ResultSet &rows, const ColumnBitmap &bitmap, const std::vector<size_t> &non_empty_per_thread, const std::vector<size_t> &global_offsets, const std::vector<size_t> &slot_idx_per_target_idx, const size_t num_columns, const size_t entry_count, const size_t num_threads, const size_t size_per_thread)

This function takes a bitmap of non-empty entries within the result set’s storage, then compacts and copies those entries into the output column_buffers_. In this variation, all targets are assumed to be single-slot and can therefore be directly columnarized.

template<typename DATA_TYPE>
void writeBackCellDirect(const ResultSet &rows, const size_t input_buffer_entry_idx, const size_t output_buffer_entry_idx, const size_t target_idx, const size_t slot_idx, const ReadFunction &read_function)

A set of write functions used to write directly into the final column_buffers_. The read function argument is used to read from the input result set’s storage. NOTE: currently only used for direct columnarization.
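This is a sketch of a generic write-back plus one floating-point specialization. It uses simplified hypothetical signatures: a raw output buffer and a one-argument read function. It assumes the read function returns a floating-point target's bit pattern inside the `int64_t` rather than a numeric value. That is one plausible reason for the explicit `template<>` specializations listed for this class, but it is an assumption, not the documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Simplified read function: input entry index -> raw 64-bit value.
using CellReadFn = std::function<int64_t(size_t)>;

// Generic case: narrow the 64-bit value to DATA_TYPE and store it at the
// output row of a typed column buffer.
template <typename DATA_TYPE>
void writeBackCellDirectSketch(int8_t* column_buffer, size_t input_idx, size_t output_idx,
                               const CellReadFn& read) {
  const DATA_TYPE value = static_cast<DATA_TYPE>(read(input_idx));
  std::memcpy(column_buffer + output_idx * sizeof(DATA_TYPE), &value, sizeof(DATA_TYPE));
}

// Assumed double case: reinterpret the 64 bits instead of converting them.
template <>
void writeBackCellDirectSketch<double>(int8_t* column_buffer, size_t input_idx,
                                       size_t output_idx, const CellReadFn& read) {
  const int64_t bits = read(input_idx);
  double value;
  std::memcpy(&value, &bits, sizeof(double));
  std::memcpy(column_buffer + output_idx * sizeof(double), &value, sizeof(double));
}
```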

std::vector<ColumnarResults::WriteFunction> initWriteFunctions(const ResultSet &rows, const std::vector<bool> &targets_to_skip = {})

Initializes a set of write functions, one per target (i.e., column). Each target type’s logical size is used to select the correct write function for that target. These functions are then used for every row in the result set.
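A sketch of choosing one writer per target by logical size. It is limited to integer targets and uses hypothetical names (`WriteFnSketch`, `makeWriter`, `initWriteFunctionsSketch`). Floating-point and other types are not handled here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <vector>

// Simplified write function: store `value` at row `row` of `buffer`.
using WriteFnSketch = std::function<void(int8_t* buffer, size_t row, int64_t value)>;

// Writer that narrows to T and stores it at T-sized stride.
template <typename T>
WriteFnSketch makeWriter() {
  return [](int8_t* buffer, size_t row, int64_t value) {
    const T narrowed = static_cast<T>(value);
    std::memcpy(buffer + row * sizeof(T), &narrowed, sizeof(T));
  };
}

// One writer per target, keyed on the target's logical byte width.
std::vector<WriteFnSketch> initWriteFunctionsSketch(const std::vector<size_t>& logical_sizes) {
  std::vector<WriteFnSketch> writers;
  for (size_t size : logical_sizes) {
    switch (size) {
      case 1: writers.push_back(makeWriter<int8_t>()); break;
      case 2: writers.push_back(makeWriter<int16_t>()); break;
      case 4: writers.push_back(makeWriter<int32_t>()); break;
      case 8: writers.push_back(makeWriter<int64_t>()); break;
      default: throw std::invalid_argument("unsupported logical size");
    }
  }
  return writers;
}
```

Selecting the function once per target keeps the per-row loop free of type dispatch.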

template<QueryDescriptionType QUERY_TYPE, bool COLUMNAR_OUTPUT>
std::vector<ColumnarResults::ReadFunction> initReadFunctions(const ResultSet &rows, const std::vector<size_t> &slot_idx_per_target_idx, const std::vector<bool> &targets_to_skip = {})

Initializes a set of read functions to properly access the contents of the result set’s storage buffer. Each read function is chosen based on the data type and data size used to store that target in the result set’s storage buffer. These functions are then used for each row in the result set.
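This sketch chooses readers by storage slot width. It assumes a row-wise storage buffer and handles only 4- and 8-byte integer slots. The real function is also templated on the query type and on columnar versus row-wise output, which this sketch ignores. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <vector>

// Simplified read function: value of one target at entry `entry`, widened to int64_t.
using SlotReadFn = std::function<int64_t(const int8_t* storage, size_t entry)>;

template <typename T>
SlotReadFn makeReader(size_t row_bytes, size_t slot_offset) {
  return [row_bytes, slot_offset](const int8_t* storage, size_t entry) {
    T value;
    std::memcpy(&value, storage + entry * row_bytes + slot_offset, sizeof(T));
    return static_cast<int64_t>(value);  // sign-extends narrower slots
  };
}

// One reader per target, chosen by the width of the slot storing that target.
std::vector<SlotReadFn> initReadFunctionsSketch(const std::vector<size_t>& slot_widths) {
  size_t row_bytes = 0;
  for (size_t w : slot_widths) row_bytes += w;
  std::vector<SlotReadFn> readers;
  size_t offset = 0;  // byte offset of the current slot within a row
  for (size_t w : slot_widths) {
    switch (w) {
      case 4: readers.push_back(makeReader<int32_t>(row_bytes, offset)); break;
      case 8: readers.push_back(makeReader<int64_t>(row_bytes, offset)); break;
      default: throw std::invalid_argument("unsupported slot width");
    }
    offset += w;
  }
  return readers;
}
```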

std::tuple<std::vector<ColumnarResults::WriteFunction>, std::vector<ColumnarResults::ReadFunction>> initAllConversionFunctions(const ResultSet &rows, const std::vector<size_t> &slot_idx_per_target_idx, const std::vector<bool> &targets_to_skip = {})

This function goes through all target types in the output, and chooses appropriate write and read functions per target. The goal is then to simply use these functions for each row and per target. Read functions are used to read each cell’s data content (particular target in a row), and write functions are used to properly write back the cell’s content into the output column buffers.
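Once the functions have been chosen, the per-row work reduces to a read-then-write loop over targets. This is a minimal sketch with hypothetical `RowReadFn` and `CellWriteFn` types. The buffers are bound inside the closures, so the loop itself is type-agnostic. The tuple mirrors the write-then-read ordering of the return type above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <tuple>
#include <vector>

using RowReadFn = std::function<int64_t(size_t row)>;
using CellWriteFn = std::function<void(size_t row, int64_t value)>;

// For every row and target: read the cell, then hand it to that target's writer.
void convertAll(const std::tuple<std::vector<CellWriteFn>, std::vector<RowReadFn>>& fns,
                size_t num_rows) {
  const auto& writers = std::get<0>(fns);
  const auto& readers = std::get<1>(fns);
  assert(writers.size() == readers.size());
  for (size_t row = 0; row < num_rows; ++row) {
    for (size_t target = 0; target < readers.size(); ++target) {
      writers[target](row, readers[target](row));
    }
  }
}
```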

template<>
void writeBackCellDirect(const ResultSet &rows, const size_t input_buffer_entry_idx, const size_t output_buffer_entry_idx, const size_t target_idx, const size_t slot_idx, const ReadFunction &read_from_function)
template<>
void writeBackCellDirect(const ResultSet &rows, const size_t input_buffer_entry_idx, const size_t output_buffer_entry_idx, const size_t target_idx, const size_t slot_idx, const ReadFunction &read_from_function)

Private Members

const std::vector<SQLTypeInfo> target_types_
bool parallel_conversion_
bool direct_columnar_conversion_
size_t thread_idx_
std::shared_ptr<Executor> executor_