1. OmniSciDB at 30,000 Feet

OmniSciDB is made up of several high-level components. The diagram below illustrates most of them, and a short description of each major section of the documentation follows.

1.1. High Level Diagram

../_images/platform_overview.png

The major components in the above diagram and their respective reference pages are listed in the table below.

Component Reference

Component                                         Reference Page
------------------------------------------------  --------------------
Thrift Interface                                  External API
Calcite Server                                    Calcite Parser
Catalog                                           Catalog
Executor                                          Overview
LLVM JIT                                          Code Generation
CPU / GPU Kernels                                 Execution Kernels
Database Files, Metadata Files, Dictionary Files  Physical Data Layout

1.2. Data Model

The Data Model section provides an overview of the data formats and data types supported by OmniSciDB. A brief overview of various storage layer components is also included.

1.3. Data Flow

The Data Flow section ties together the Data Model and Query Execution sections, providing information about the complete flow of data from the input columns for a query to its projected outputs.

1.4. Query Execution

The Query Execution section provides an overview of how a query is executed inside OmniSciDB.

At a high level, all SQL queries made to the server pass through the Thrift sql_execute endpoint. The query string is passed to Apache Calcite for parsing and cost-based optimization, yielding an optimized relational algebra tree. This tree is then run through OmniSci-specific optimization passes and translated into an OmniSci-specific abstract syntax tree (AST). The AST provides all the information necessary to generate native machine code for query execution on the target device. Execution then occurs in parallel on the target device, and the per-device results are aggregated and reduced into a final ResultSet for each query step.
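The ordering of these stages can be sketched as a chain of placeholder functions. This is only an illustration of which artifact each stage hands to the next. Every name in it (QueryState, stage, PIPELINE, run_pipeline) is hypothetical and is not part of the OmniSciDB API, and each stage merely records a trace line instead of doing real work.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryState:
    """Tracks which artifact currently exists (illustrative only)."""
    sql: str
    artifact: str = "client request"
    trace: List[str] = field(default_factory=list)


def stage(name: str, produces: str) -> Callable[[QueryState], QueryState]:
    """Build a placeholder stage that records its input and output kind."""
    def run(state: QueryState) -> QueryState:
        state.trace.append(f"{name}: {state.artifact} -> {produces}")
        state.artifact = produces
        return state
    return run


# Stage order follows the paragraph above; the labels are descriptive,
# not real function or class names.
PIPELINE = [
    stage("thrift sql_execute", "query string"),
    stage("calcite parse + optimize", "optimized relational algebra tree"),
    stage("omnisci optimization passes", "rewritten relational algebra tree"),
    stage("translate", "abstract syntax tree"),
    stage("code generation", "native machine code"),
    stage("parallel execution", "per-device results"),
    stage("reduction", "ResultSet"),
]


def run_pipeline(sql: str) -> QueryState:
    """Pass the query through every stage in order."""
    state = QueryState(sql=sql)
    for step in PIPELINE:
        state = step(state)
    return state


if __name__ == "__main__":
    final = run_pipeline("SELECT COUNT(*) FROM t")
    print("\n".join(final.trace))
```

Running the script prints one line per stage, ending with the reduction step that produces the ResultSet.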

The following sections provide in-depth details on each of the stages involved in executing a query.

1.5. Simplified Execution Model

 @startuml

 start

 :Parse and Validate SQL;

 :Generate Optimized
  Relational Algebra Sequence;

 :Prepare Execution Environment;

 repeat
     fork
         :Data Ownership,
          Identification,
          Load (as required);
         :Execute Query Kernel
          on Target Device;
     fork again
         :Data Ownership,
          Identification,
          Load (as required);
         :Execute Query Kernel
          on Target Device;
     fork again
         :Data Ownership,
          Identification,
          Load (as required);
         :Execute Query Kernel
          on Target Device;
     end fork
     :Reduce Result;

 repeat while (Query Completed?) is (no) not (yes)

 :Return Result;

 stop

 @enduml
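The fork, execute, and reduce portion of the diagram above can be mimicked in a few lines of Python. This is a minimal sketch under loud assumptions: threads stand in for target devices, a list of integer lists (CHUNKS) stands in for the data each kernel loads, and the "query" is a fixed SUM and COUNT. None of the names below come from OmniSciDB.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory data: each inner list stands in for the portion of
# a column that a single kernel would process.
CHUNKS: List[List[int]] = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def load_chunk(chunk_id: int) -> List[int]:
    """Identify and load the data a kernel needs (here: a list lookup)."""
    return CHUNKS[chunk_id]


def execute_kernel(chunk_id: int) -> Dict[str, int]:
    """Run one kernel: compute a partial SUM and COUNT over its chunk."""
    values = load_chunk(chunk_id)
    return {"sum": sum(values), "count": len(values)}


def reduce_results(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine the per-kernel partial aggregates into one result."""
    return {
        "sum": sum(p["sum"] for p in partials),
        "count": sum(p["count"] for p in partials),
    }


def run_step(num_workers: int = 3) -> Dict[str, int]:
    """One pass through the fork / execute / reduce part of the diagram."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each worker plays the role of one branch of the fork.
        partials = list(pool.map(execute_kernel, range(len(CHUNKS))))
    return reduce_results(partials)


if __name__ == "__main__":
    print(run_step())  # {'sum': 45, 'count': 9}
```

A query with several steps would call something like run_step once per step, feeding each reduced result into the next, which corresponds to the repeat loop in the diagram.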