.. OmniSciDB Query Execution ================================== DAG Builder / Optimizer ================================== The :cpp:class:`RelAlgDagBuilder` and ``RelAlgOptimizer`` are responsible for deserializing the relational algebra tree returned from Calcite (see :doc:`../calcite/calcite_parser`) and building a relational algebra DAG (Directed Acyclic Graph) using OmniSciDB specific data structures. The OmniSciDB RA DAG then passes through several optimization passes. The process of building the OmniSciDB DAG and a description of the optimization passes currently employed follows. DAG Builder =========== The class :cpp:class:`RelAlgDagBuilder` deserializes the JSON string containing the optimized relational algebra tree from Calcite. The interpreter file also includes a class hierarchy for defining relational algebra nodes (``RelAlgNode``) and relational algebra node expressions (``Rex``). The RelAlgDagBuilder constructor takes a JSON ra tree from calcite as an argument and calls a deserialization function for each node in the tree. The deserialization function returns a class derived from ``RelAlgNode`` -- the type of the class corresponds to the type of relational algebra node. A complete list of currently supported relational algebra nodes is available in the table below. ===================== ======================= ================================================================== Calcite Node Type ``RelAlgNode`` Subclass Calcite Description ===================== ======================= ================================================================== EnumerableTableScan RelScan Returns the contents of a table through sequential traversal. LogicalProject RelProject Computes a set of 'select expressions' from its input relational :term:`expression`. LogicalFilter RelFilter Iterates over its input and returns elements for which condition evaluates to true. LogicalAggregate RelAggregate Eliminates duplicates and computes totals. It corresponds to the `GROUP BY` operator in a SQL query statement, together with the aggregate functions in the `SELECT` clause. LogicalJoin RelJoin Combines two relational expressions according to some condition. Each output row has columns from the left and right inputs. LogicalSort RelSort Imposes a particular sort order on its input without otherwise changing its content. Also used to represent a `LIMIT` or `OFFSET`. LogicalValues RelLogicalValues Relational expression whose value is a sequence of zero or more literal row values. LogicalTableModify RelModify Modifies a table. Expression occurs as a result of `DELETE`, or `UPDATE` SQL statements. ===================== ======================= ================================================================== ``RelAlgNode`` -------------- The ``RelAlgNode`` base class stores the relationship between RA nodes in the RA tree. Additionally, the ``RelAlgNode`` stores a pointer to its execution results after the step described by the node has been executed, as well as the final type information for the nodes targets. Derived classes may store additional information (e.g. the ``RelScan`` object contains a pointer to the ``TableDescriptor`` for the table being scanned). ``Rex`` -------- The ``Rex`` class is a pure virtual base class ensuring all ``Rex`` derived objects can be serialized to a string. ``Rex`` derived objects must implement the `toString()` method, but are otherwise free to include their own expression-specific member variables and methods. The main ``Rex`` objects are described in the table below. =================== ============================================================== Object Name Description =================== ============================================================== RexAbstractInput Placeholder for a ``RexInput`` object, resolved after all node objects have been created. RexInput Input expression. Directly references a source ``RelAlgNode`` object. RexLiteral Literal expression. RexAgg Aggregate expression. RexOperator Unary or binary operator expression (e.g. `CAST` or `AND`). ``RexOperator`` will have one or more operands, which are owned by the operator object. RexCase Case expression. Stores operands for `when`, `then`, and `else` similarly to ``RexOperator``. RexSubQuery Scalar subquery expression. ``RexSubQuery`` nodes are dispatched prior to all other nodes and the subquery result is replaced prior to step execution. RexFunctionOperator Function expression. RexRef Synthetic expression node allowing a column in a group by to be referenced directly by the projection. =================== ============================================================== OmniSciDB Specific Query Optimization ===================================== Once all node objects have been built, the OmniSciDB relational algebra tree passes through several optimization phases. Each optimization is described below. Mark Noops ---------- Aggregate RA nodes represent group by operations. In the case where a group by operation is repeated, subsequent aggregate nodes can be marked as a `noop`. If a node is marked as a `noop`, the result from the node immediately preceding the aggregate is simply forwarded during query execution. Below is an example SQL query which contains a `noop` node: .. code-block:: sql SELECT first_name FROM (SELECT first_name FROM Users GROUP BY first_name) GROUP BY first_name; Calcite generates the following RA tree. .. code-block:: Explanation LogicalAggregate(group=[{0}]) LogicalAggregate(group=[{0}]) LogicalProject(first_name=[$0]) EnumerableTableScan(table=[[mapd, Users]]) The second aggregate node is clearly a duplicate of the first. Therefore, the second aggregate node is marked as a `noop` and the result from the first node (after the project) is forwarded. .. note:: The RA tree inspected by prefixing any `SELECT` query with ``explain calcite``. Eliminate Identical Copies -------------------------- Detects and removes project nodes which forward targets to other project nodes higher in the relational algebra tree. Eliminating identical copies reduces intermediate buffer use. But, more importantly, this optimization step prepares the RA tree for further manipulation in the steps below. Fold Filters ------------ Detects two identical filter nodes and folds the filters into a single filter node using the binary operator `AND`. Calcite typically folds most filters automatically during query operation. However, if an intermediate buffer is removed during the eliminate identical copies step, additional opportunities for filter folding may exist. Consider the following example: .. code-block:: sql SELECT x, y FROM (SELECT a.str AS str, b.x AS x, a.y AS y FROM test a, join_test b WHERE a.x = b.x) WHERE str = 'foo' ORDER BY x LIMIT 1; The generated RA from Calcite is: .. code-block:: :linenos: LogicalSort(sort0=[$0], dir0=[ASC], fetch=[1]) LogicalProject(x=[$1], y=[$2]) LogicalFilter(condition=[=($0, 'foo')]) LogicalProject(str=[$10], x=[$35], y=[$1]) LogicalFilter(condition=[=($0, $35)]) LogicalJoin(condition=[true], joinType=[inner]) EnumerableTableScan(table=[[omnisci, test]]) EnumerableTableScan(table=[[omnisci, join_test]]) The project node on line 4 can be removed, with its projected targets pushed to the project node on line 2 (see `eliminate identical copies`). Once the project node on line 4 is removed, the filters on lines 3 and 5 can be folded. Eliminate Dead Columns ---------------------- Intermediate projection nodes may sometimes load more inputs than are actually required by subsequent nodes. In OmniSciDB, each projection node input is loaded into memory and projected through as an output :term:`target`. The dead columns elimination step ensures that only columns that are be used in subsequent projections are loaded into memory. As an example, consider the following table and query: .. code-block:: sql CREATE TABLE dead_cols_test (x INT, y FLOAT); SELECT t1.x FROM dead_cols_test t1 JOIN (SELECT * FROM dead_cols_test) t2 ON t1.x = t2.x; The Calcite generated plan is: .. code-block:: :linenos: LogicalProject(x=[$0]) LogicalJoin(condition=[=($0, $3)], joinType=[inner]) EnumerableTableScan(table=[[mapd, dead_cols_test]]) LogicalProject(x=[$0], y=[$1], rowid=[$2]) EnumerableTableScan(table=[[mapd, dead_cols_test]]) Note that the projection on line 4 is projecting the `y` column (as part of the `select *` expansion in the right hand side join condition). But, `y` is not used in a subsequent projection; therefore, we can eliminate `y` and avoid loading the column for `y` into the memory hierarchy. Separate Window Function Expressions ------------------------------------ OmniSciDB currently requires window function expressions to be top-level expressions on their corresponding RA node. To ensure window function expressions can be embedded in other expressions, we detect the presence of embedded window function expressions and "split" the expression, pushing the top-level expression up to a new projection node, replacing the window function expression in the top-level expression with an input expression, and finally replacing the existing top-level expression in the current projection node with the window function expression. For example, consider the simple expression `LAG() - 1`. The top level expression is a ``BinaryOperator`` (with the op type of `-`, left hand side `LAG()`, and right hand side `1`). We detect the presence of the embedded window function expression, make a copy of the binary operator expression, forward it to a new projection node, and replace the window function expression with an input. Now we have ``BinaryOperator`` with the op type `-`, the left hand side an input target from the previous projection, and the right hand side `1` as before. Finally, we replace the existing binary operator with the top-level window function expression on the existing project node. Coalesce Nodes -------------- Calcite generates canonical relational algebra. Sometimes, executing operations in the canonical order would involve redundant intermediate buffers. The use of intermediate buffers must be minimized to ensure the highest utilization of available compute and memory resources. Therefore, we walk the RA DAG looking for patterns to be coalesced into a synthetic node to be executed without intermediate buffers while preserving the observable effects. For example, we coalesce the `Filter`, `Project`, `Aggregate` chain into a single synthetic node, which we call ``RelCompound``. The compound node evaluates the filter and the aggregate on the fly and avoids the intermediate buffers for `Filter` and `Project` outputs. Consider the following example: .. code-block:: sql SELECT A.x, COUNT(*) FROM test JOIN B ON A.x = B.x WHERE A.y > 41 GROUP BY A.x; The RA tree before and after the coalesce nodes optimization is depicted before (nodes before optimization are drawn with dashed lines). The Compound node now contains all information needed to evaluate the filter and (potentially grouped) aggregates using just the memory buffer required for the final result. .. image:: ../img/dag_optimization.png :align: center Create Left Deep Join --------------------- The ``RelLeftDeepJoin`` node is another synthetic node, similar to ``RelCompound``, which allows multiple join loops to be nested in join loops to be nested in a single operator. Consider the following example: .. code-block:: sql SELECT COUNT(*) FROM a JOIN b ON a.x = b.x JOIN c ON b.x = c.x; Calcite generates the following RA tree: .. code-block:: :linenos: LogicalAggregate(group=[{}], EXPR$0=[COUNT()]) LogicalProject($f0=[0]) LogicalJoin(condition=[=($2, $4)], joinType=[inner]) LogicalJoin(condition=[=($0, $2)], joinType=[inner]) EnumerableTableScan(table=[[omnisci, a]]) EnumerableTableScan(table=[[omnisci, b]]) EnumerableTableScan(table=[[omnisci, c]]) The join nodes on lines 3 and 4 can be coalesced into a single join node, as long as the join condition on line 4 is checked before line 3. The ``RelLeftDeepJoin`` node coalesces multiple join conditions into a single node, and dictates the ordering of the join conditions during code generation.