Class RangeJoinHashTable

class RangeJoinHashTable : public OverlapsJoinHashTable

Public Functions

RangeJoinHashTable(const std::shared_ptr<Analyzer::BinOper> condition, const JoinType join_type, const Analyzer::RangeOper *range_expr, std::shared_ptr<Analyzer::ColumnVar> inner_col_expr, const std::vector<InputTableInfo> &query_infos, const Data_Namespace::MemoryLevel memory_level, ColumnCacheMap &column_cache, Executor *executor, const std::vector<InnerOuter> &inner_outer_pairs, const int device_count, HashtableAccessPathInfo hashtable_access_path_info, const TableIdToNodeMap &table_id_to_node_map)
~RangeJoinHashTable()
llvm::Value *codegenKey(const CompilationOptions &co, llvm::Value *offset)
HashJoinMatchingSet codegenMatchingSetWithOffset(const CompilationOptions &co, const size_t index, llvm::Value *range_offset)

Public Static Functions

std::shared_ptr<RangeJoinHashTable> getInstance(const std::shared_ptr<Analyzer::BinOper> condition, const Analyzer::RangeOper *range_expr, const std::vector<InputTableInfo> &query_infos, const Data_Namespace::MemoryLevel memory_level, const JoinType join_type, const int device_count, ColumnCacheMap &column_cache, Executor *executor, const HashTableBuildDagMap &hashtable_build_dag_map, const RegisteredQueryHint &query_hint, const TableIdToNodeMap &table_id_to_node_map)

NOTE(jclay): Handling Range Joins With Mixed Compression:

First, let’s take a concrete example of a query that is rewritten as a range join. Notice in the BinOper condition shown below that the condition operator is an Overlaps operator. The LHS is a column, and the RHS is the range operator. For the hash table build and probe to work properly, we need to ensure that the appropriate runtime functions are selected. The following breakdown documents how the appropriate runtime function is selected.

  • The LHS of the RangeOper is used to build the hash table

  • The LHS of the OverlapsOper + the RHS of the RangeOper are used as the probe

SELECT count(*) FROM t1, t2 where ST_Distance(t1.p1_comp32, t2.p1) <= 6.3;

BinOper condition

((OVERLAPS) (ColumnVar table: (t1) column: (p1_comp32) GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32)) (RangeOper) (ColumnVar table: (t2) column: (p1) GEOMETRY(POINT, 4326) ENCODING NONE), (Const 6.330000))

RangeOper condition

[(ColumnVar table: 5 (t2) column: 1 rte: 1 GEOMETRY(POINT, 4326) ENCODING NONE), (Const 6.330000)]

Same example as above, annotated:

SELECT count(*) FROM t1, t2 where ST_Distance(
      t1.p1_comp32,      << Overlaps Condition LHS
      t2.p1              << RangeOper LHS
   ) <= 6.3;             << RangeOper RHS

In this case, we select the uncompressed runtime functions when building the hash table over t2.p1, which is declared ENCODING NONE. When performing the probe with t1.p1_comp32, which is declared ENCODING COMPRESSED(32), we must select the compressed runtime functions.
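The selection described in this note can be illustrated with a small standalone sketch. It is not the HeavyDB implementation: `CoordEncoding`, `compress_lon`, `decompress_lon`, `bucket_key`, and `key_for` are hypothetical names, and the linear int32 scaling of longitude is an assumed compression scheme chosen only for illustration. The point it shows is that the build side and the probe side must each read their column with the reader that matches its encoding, after which both sides arrive at the same bucket.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical tag for how a point coordinate column is stored.
enum class CoordEncoding { kNone, kCompressed32 };

// Assumed compression scheme (illustration only): longitude in
// [-180, 180] is scaled linearly onto the int32 range.
inline std::int32_t compress_lon(double lon) {
  return static_cast<std::int32_t>(
      lon * (std::numeric_limits<std::int32_t>::max() / 180.0));
}

inline double decompress_lon(std::int32_t compressed) {
  return compressed * (180.0 / std::numeric_limits<std::int32_t>::max());
}

// Map a coordinate to a bucket index given the inverse bucket size.
inline std::int64_t bucket_key(double coord, double inverse_bucket_size) {
  return static_cast<std::int64_t>(std::floor(coord * inverse_bucket_size));
}

// Pick the reader that matches the column's encoding. Reading a
// compressed column with the uncompressed reader (or vice versa)
// would reinterpret the bytes and produce a meaningless key.
inline std::int64_t key_for(CoordEncoding enc,
                            const void* column,
                            std::size_t row,
                            double inverse_bucket_size) {
  if (enc == CoordEncoding::kCompressed32) {
    const auto* data = static_cast<const std::int32_t*>(column);
    return bucket_key(decompress_lon(data[row]), inverse_bucket_size);
  }
  const auto* data = static_cast<const double*>(column);
  return bucket_key(data[row], inverse_bucket_size);
}
```

In terms of the example query, the build over t2.p1 would call `key_for` with `CoordEncoding::kNone`, while the probe with t1.p1_comp32 would call it with `CoordEncoding::kCompressed32`.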

Protected Functions

void reifyWithLayout(const HashType layout)
void reifyForDevice(const ColumnsForDevice &columns_for_device, const HashType layout, const size_t entry_count, const size_t emitted_keys_count, const int device_id, const logger::ThreadId parent_thread_id)
std::shared_ptr<BaselineHashTable> initHashTableOnCpu(const std::vector<JoinColumn> &join_columns, const std::vector<JoinColumnTypeInfo> &join_column_types, const std::vector<JoinBucketInfo> &join_bucket_info, const HashType layout, const size_t entry_count, const size_t emitted_keys_count)
HashType getHashType() const
std::pair<size_t, size_t> approximateTupleCount(const std::vector<double> &inverse_bucket_sizes_for_dimension, std::vector<ColumnsForDevice> &columns_per_device, const size_t chosen_max_hashtable_size, const double chosen_bucket_threshold)
std::pair<size_t, size_t> computeRangeHashTableCounts(const size_t shard_count, std::vector<ColumnsForDevice> &columns_per_device)

Private Functions

bool isInnerColCompressed() const
bool isProbeCompressed() const

Private Members

const Analyzer::RangeOper *range_expr_
std::shared_ptr<Analyzer::ColumnVar> inner_col_expr_
const double bucket_threshold_ = {std::numeric_limits<double>::max()}
const size_t max_hashtable_size_ = {std::numeric_limits<size_t>::max()}