Class GpuSharedMemCodeBuilder¶
-
class
GpuSharedMemCodeBuilder
¶ This is a builder class for extra functions that are required to support GPU shared memory usage for GroupByPerfectHash query types.
This class does not own its own LLVM module; it uses a pointer to the global module that is provided to it as an argument during construction.
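A minimal usage sketch, assuming the caller already has the query's LLVM module, QueryMemoryDescriptor, target infos, initial aggregate values, and the main query function at hand (the wrapper function and variable names below are illustrative only):

// Sketch only: all inputs are assumed to exist already; this simply shows the
// intended call sequence of the builder.
void build_shared_mem_support(llvm::Module* module,
                              const QueryMemoryDescriptor& query_mem_desc,
                              const std::vector<TargetInfo>& targets,
                              const std::vector<int64_t>& init_agg_values,
                              llvm::Function* query_func,
                              const size_t executor_id) {
  GpuSharedMemCodeBuilder builder(module, module->getContext(), query_mem_desc,
                                  targets, init_agg_values, executor_id);
  builder.codegen();                       // generate the reduction and init functions
  builder.injectFunctionsInto(query_func); // replace the query template placeholders
}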
Public Functions
-
GpuSharedMemCodeBuilder
(llvm::Module *module, llvm::LLVMContext &context, const QueryMemoryDescriptor &qmd, const std::vector<TargetInfo> &targets, const std::vector<int64_t> &init_agg_values, const size_t executor_id)¶
-
void
codegen
()¶ Generates code for both the reduction and initialization steps required for shared memory usage.
-
void
injectFunctionsInto
(llvm::Function *query_func)¶ Once the reduction and init functions are generated, this function takes the main query function and replaces the previous placeholders, which were inserted in the query template, with these new functions.
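A rough illustration of how a placeholder can be swapped for a generated function through the LLVM C++ API is given below; the helper name and the use of replaceAllUsesWith/eraseFromParent are assumptions about one possible way to do this, not the exact implementation:

#include <llvm/IR/Function.h>
#include <llvm/IR/Module.h>
#include <string>

// Sketch: redirect every call site of a placeholder declaration (same signature)
// to a freshly generated function, then drop the unused declaration.
void replace_placeholder(llvm::Module& module,
                         llvm::Function* generated_func,
                         const std::string& placeholder_name) {
  if (auto* placeholder = module.getFunction(placeholder_name)) {
    placeholder->replaceAllUsesWith(generated_func);
    placeholder->eraseFromParent();
  }
}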
-
llvm::Function *
getReductionFunction
() const¶
-
llvm::Function *
getInitFunction
() const¶
-
std::string
toString
() const¶
Protected Functions
-
void
codegenReduction
()¶ Generates code for the reduction functionality (from shared memory into global memory)
The reduction function is used to reduce the group by buffer stored in shared memory back into the global memory buffer. The general procedure is very similar to what we have in ResultSetReductionJIT, with some major differences that are discussed below:
The general procedure is as follows:
The function takes three arguments: 1) dest_buffer_ptr, which points to the global memory group by buffer (what existed before), 2) src_buffer_ptr, which points to the shared memory group by buffer exclusively accessed by each GPU thread-block, and 3) the total buffer size.
We assign each thread to a specific entry (all targets within that entry), so any thread with an index larger than the maximum number of entries returns early from this function.
It is assumed here that there are at least as many threads on the GPU as there are entries in the group by buffer. In practice, given the buffer sizes we deal with, this is a reasonable assumption, but it could easily be relaxed in the future if needed: threads could loop and process entries until all are finished. It should be noted that we currently do not use shared memory if there are more entries than threads.
We loop over all slots corresponding to a specific entry and use ResultSetReductionJIT's reduce_one_entry_idx to reduce each slot from the shared memory buffer into the global memory buffer. The only difference is that we replace all agg_* functions within this code with their agg_*_shared counterparts, which use atomic operations and are intended for the GPU.
Once all threads are done, we return from the function.
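The control flow described above corresponds roughly to the following plain C++ sketch; the row-major slot layout, the plain addition standing in for the agg_*_shared atomics, and the explicit thread_idx parameter (a stand-in for the GPU thread index) are all simplifying assumptions:

#include <cstddef>
#include <cstdint>

// Sketch of the generated reduction logic (shared memory -> global memory).
void reduce_from_smem_to_gmem(int64_t* dest_buffer_ptr,        // global memory group by buffer
                              const int64_t* src_buffer_ptr,   // shared memory group by buffer
                              const size_t buffer_size,        // total buffer size in bytes
                              const size_t slots_per_entry,
                              const size_t thread_idx) {       // stand-in for the GPU thread index
  const size_t entry_count = buffer_size / (slots_per_entry * sizeof(int64_t));
  if (thread_idx >= entry_count) {
    return;  // more threads than entries: extra threads exit early
  }
  for (size_t slot_idx = 0; slot_idx < slots_per_entry; ++slot_idx) {
    const size_t offset = thread_idx * slots_per_entry + slot_idx;
    // The generated code calls each aggregate's agg_*_shared counterpart here,
    // which performs an atomic update on the GPU; a plain add stands in for it.
    dest_buffer_ptr[offset] += src_buffer_ptr[offset];
  }
}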
-
void
codegenInitialization
()¶ Generates code for the shared memory buffer initialization
This function generates code to initialize the shared memory buffer in the same way we initialize the group by output buffer on the host. Similar to the reduction function, it is assumed that there are at least as many threads as there are entries in the buffer. Each entry is assigned to a single thread, and all slots corresponding to that entry are initialized with the aggregate init values.
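A plain C++ sketch of the described initialization follows; the buffer layout and the thread_idx parameter (again a stand-in for the GPU thread index) are assumptions:

#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: each thread owns one entry and writes the aggregate init values into
// that entry's slots.
void init_smem_buffer(int64_t* smem_buffer_ptr,
                      const std::vector<int64_t>& init_agg_values,
                      const size_t entry_count,
                      const size_t thread_idx) {
  if (thread_idx >= entry_count) {
    return;  // assumes at least as many threads as entries
  }
  const size_t slots_per_entry = init_agg_values.size();
  for (size_t slot_idx = 0; slot_idx < slots_per_entry; ++slot_idx) {
    smem_buffer_ptr[thread_idx * slots_per_entry + slot_idx] = init_agg_values[slot_idx];
  }
}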
-
llvm::Function *
createReductionFunction
() const¶ Creates the reduction function in the LLVM module, with predefined arguments and return type
-
llvm::Function *
createInitFunction
() const¶ Creates the initialization function in the LLVM module, with predefined arguments and return type
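Creating such a function through the LLVM C++ API can look roughly like the sketch below; the concrete signature (two i64 pointers plus an i32 size, void return), the linkage, and the function name are assumptions chosen to mirror the reduction function described above:

#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>

// Sketch: declare an empty function with a fixed signature inside the module.
llvm::Function* create_reduction_func_sketch(llvm::Module* module,
                                             llvm::LLVMContext& context) {
  auto* i64_ptr_ty = llvm::Type::getInt64PtrTy(context);
  auto* i32_ty = llvm::Type::getInt32Ty(context);
  auto* func_ty = llvm::FunctionType::get(llvm::Type::getVoidTy(context),
                                          {i64_ptr_ty, i64_ptr_ty, i32_ty},
                                          /*isVarArg=*/false);
  return llvm::Function::Create(func_ty, llvm::Function::InternalLinkage,
                                "reduce_from_smem_to_gmem", module);
}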
-
llvm::Function *
getFunction
(const std::string &func_name) const¶ Searches for a particular function name in the module and returns it if found
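In terms of the LLVM C++ API this lookup is essentially llvm::Module::getFunction, as in the small sketch below (returning nullptr on a miss, rather than asserting, is an assumption):

#include <string>
#include <llvm/IR/Function.h>
#include <llvm/IR/Module.h>

// Sketch: look a function up by name in the module.
llvm::Function* find_function(const llvm::Module* module, const std::string& func_name) {
  return module->getFunction(func_name);
}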
Protected Attributes
-
size_t
executor_id_
¶
-
llvm::Module *
module_
¶
-
llvm::LLVMContext &
context_
¶
-
llvm::Function *
reduction_func_
¶
-
llvm::Function *
init_func_
¶
-
const QueryMemoryDescriptor
query_mem_desc_
¶
-
const std::vector<TargetInfo>
targets_
¶
-
const std::vector<int64_t>
init_agg_values_
¶
-