7.4. Code Generation¶

OmniSciDB generates native code using the LLVM library. Code generation primarily makes calls into the LLVM IR Builder to build up LLVM IR according to the provided AST (see DAG Execution / Translation).

Code generation for a query is managed by the Executor object assigned to the query. Code generation state is stored within the QueryCompilationDescriptor. The compilation descriptor also initiates code generation via its compile method.

The outer-most function for a kernel is pre-defined in RuntimeFunctions.cpp. The generated code typically consists of two functions; the query_func and the row_func. The query_func loops over all input rows, while the row_func contains most of the logic for processing inputs, running expressions, and writing outputs.

7.4.1. Query Templates¶

The query_func is built and then bound to a query function template depending on query execution dispatch mode (see Execution Kernels for more on kernel execution modes). Query templates, specified in RuntimeFunctions.cpp, provide the top-level kernel function declaration and basic logic (i.e. iterate over input fragments for a multi-fragment kernel). The query_func is bound to the query_stub function, replacing its default implementation in the first step of code generation.

7.4.2. `CodeGenerator`¶

The CodeGenerator converts analyzer expressions into LLVM IR. The CodeGenerator constructor takes the code generation state and query plan state from the Executor and makes calls into the LLVM IR Builder, using the context and module attached to the code generation state.

The ScalarCodeGenerator is derived from the CodeGenerator for unit testing. With the ScalarCodeGenerator, the dependency on the executor is removed, and native code for either the CPU or GPU can be generated for most analyzer expressions.

7.4.3. Runtime Functions and Extension Functions¶

All query kernels are generated from LLVM IR using the CodeGenerator. However, for certain helper functions it is more convenient to implement required functionality in C. For example, the aggregate function for computing SUM of a not null column requires null sentinel awareness to ensure null values of the argument type are not accidentally added to the running value. The 32-bit integer implementation of this function is shown below:

extern "C" ALWAYS_INLINE int32_t agg_sum_int32_skip_val(int32_t* agg,
                                                        const int32_t val,
                                                        const int32_t skip_val) {
    const auto old = *agg;
    if (val != skip_val) {
        if (old != skip_val) {
        return agg_sum_int32(agg, val);
        } else {
        *agg = val;
        }
    }
    return old;
}

System functions like the example above are referred to as Runtime Functions. SQL functions are stored in a separate file and are referred to as Extension Functions. The extension functions are automatically registered with Calcite at startup, where runtime functions are helpers functions meant for use by the kernel.

The ALWAYS_INLINE decorator tells the LLVM compiler to inline the runtime functions, so the net effect is the same as generating the function using the LLVM IR Builder.

7.4.4. LLVM Optimization Passes¶

Optimization of generated code is primarily managed by the LLVM Pass Manager. The optimize_ir free function runs multiple LLVM passes over the IR (e.g. instruction combining, instruction simplification, etc). By using the pass manager, OmniSciDB can chose LLVM passes which will maximize query performance based on both query parameters and the target device for which code is being compiled.

OmniSciDB also includes a custom function for eliminating dead recursive function calls. Because OmniSciDB supplies runtime functions and extension functions in C++, functions can be pulled into the module which are not being called by the primary kernel function. These functions can be quickly pruned to prevent compiling all the runtime and extension functions with each query.

7.4.5. Native Codegen¶

Once the complete set of LLVM IR has been assembled for a query, generation of machine code can begin. OmniSciDB refers to machine code as native code (i.e. native machine code for a particular device). Below we describe CPU Native Code Generation and GPU Native Code Generation.

7.4.5.1. CPU Native Code Generation¶

CPU code generation uses the LLVM MCJIT to generate native code for the CPU. The code generator performs the following steps when generating native CPU code:

Optimizes input IR using the techniques described above.
Initializes the LLVM MCJIT Backend.
Takes ownership of the LLVM Module.
Creates a LLVM::ExecutionBuilder object wrapped in an ExecutionEngine object responsible for JIT runtime code generation.
Generates native code by calling finalizeObject() on the ExecutionEngine.

Native code generated for CPU can be called by getting the function pointer from the execution engine and calling the function.

7.4.5.2. GPU Native Code Generation¶

GPU code generation uses LLVM to generate nVidia PTX and then converts the PTX to machine code using the nVidia CUDA driver API. The following intermediate steps are performed during this process:

Updates LLVM Module target details to target nVidia GPU
Optimizes input IR using the techniques described above.
Generates PTX using the LLVM Pass Manager.
Converts the PTX to a cubin binary machine code file using the nVidia CUDA driver API.
Copies the cubin binary to the relevant GPUs (typically all available GPUs)
Stores a function pointer to the copied binary in GPU memory, to be passed to the nVidia CUDA driver API for kernel launch.

7.4.5.3. Code Cache¶

Both CPU and GPU generated code is cached in a code cache per query. The cache uses a LRU eviction mechanism to ensure large numbers of queries do not fill up CPU or GPU memory. The key for the code cache is the serialized LLVM representation of the query_func.

7.4.6. Troubleshooting¶

7.4.6.1. Log Files¶

Sometimes it can be invaluable to see the text IR, PTX, and/or assembly code generated by the JIT complier in OmniSciDB.

The SQL commands EXPLAIN and EXPLAIN OPTIMIZED are available to show LLVM IR code for a query.

The logging system can produce log files ending in .IR, .PTX, and .ASM when enabled with the --log-channels option. See also: ../components/logger.rst

7.4.6.2. Automatic LLVM IR Metadata¶

Finding the C++ code that generated each line of LLVM IR can be hard. Take this simple example of LLVM IR code:

br label %singleton_true_

There are at least 27 files with 11,000+ lines of C++ involved the OmniSciDB codegen system. Consecutive lines of IR can be generated by completely different C++ files. Where did this one branch instruction come from?

LLVM provides metadata on it’s instructions that can help us connect IR instructions to the C++ code that generated them. For debug builds, only, OmniSciDB will automatically add helpful metadata to each IR instruction so developers can find the C++ that generated the IR more easily:

br label %singleton_true_, !IRCodegen.cpp !22

!22 = !{!"Omnisci Debugging Info: compileWorkUnit near NativeCodegen.cpp line #1986, codegenJoinLoops near IRCodegen.cpp line #563"}

A “footnote” at the bottom shows more location detail if needed, the !22. The detailed footnote contains an approximation of the C++ call stack at the time the line of IR code was generated.

A more complex example:

%16 = call i32 @row_func(...), !NativeCodegen.cpp !20
%17 = call i32 @record_error_code(i32 %16, i32* %error_code), !NativeCodegen.cpp !21

!20 = !{!"Omnisci Debugging Info: compileWorkUnit near NativeCodegen.cpp line #1986"}
!21 = !{!"Omnisci Debugging Info: compileWorkUnit near NativeCodegen.cpp line #1986, codegenJoinLoops near IRCodegen.cpp line #563, codegen near LoopControlFlow/JoinLoop.cpp line #53, createErrorCheckControlFlow near NativeCodegen.cpp line #1453"}

LLVM encodes metadata very efficiently, using a string dictionary to reduce strings like IRCodegen.cpp to a single integer stored on the llvm::Instruction.