3. Catalog

3.1. High Level Overview

OmniSciDB stores database, table, and user information in a centralized Catalog. Each database has its own Catalog, with the top-level System Catalog (SysCatalog) managing catalogs across databases. Each catalog (including the system catalog) is backed by a SQLite database in the mapd_catalogs directory (a top-level directory in the OmniSciDB data directory). In addition to managing catalogs for individual databases, the SysCatalog manages users, roles, and higher-level object permissions.

Both SysCatalog and Catalog classes are defined in files of the same name in the Catalog source directory. All catalog-related code lives inside the Catalog_Namespace.

../_images/catalog_overview.png

3.2. System Schema (SysCatalog)

SysCatalog is a singleton class responsible for managing top-level OmniSciDB server metadata, including:

  • Users

  • Roles

  • Databases

  • Permissions

Data is persisted using a SQLite database at <path_to_db/mapd_catalogs/omnisci_system_catalog>. When the SysCatalog is created, the contents of the SQLite database is read and stored in memory. Subsequent reads only reference in-memory data structures. All writes are immediately flushed to SQLite.

Below is a UML representation of the schema of the SQLite database backing the system catalog:

../_images/syscat_schema.png

3.2.1. Users

User accounts in OmniSciDB are global (that is, one user account can have access to multiple databases). The users login information is stored in the mapd_users system catalog table. When a user is authenticated, a UserMetadata struct is created to provide an in-memory representation of the user.

3.2.2. Roles

Roles are used to grant permissions to groups of users. Roles are stored in the mapd_roles table. Each user also has a role corresponding to the name of the user, e.g. adding a user Max will also add a corresponding role called Max automatically.

3.2.3. Databases

The mapd_databases table functions as a lookup table for mapping a database name (a string) to the internal database identifier (an integer).

3.2.4. Permissions

All system-level object privilege grants are stored and accessed from the mapd_object_permissions table. An object permission can be assigned to a role, and specifies what level of access a role has to various parts of the system. See DB Object Privileges for more information about interfacing with OmniSciDB’s security model.

The granteeMap_ in the SysCatalog class stores Grantee objects, representing object level permissions grants. The granteeMap_ is built at startup.

The objectDescriptorMap_ in the SysCatalog class stores ObjectRoleDescriptor``s, relating a role to a specific privilege grant. The  ``objectDescriptorMap_ is also built when Together, the granteeMap_ and objectDescriptorMap_ are referenced when determining if a user has access to perform an action in the database.

3.3. Database Schema (Catalog)

The Catalog_Namespace::Catalog is the top-level container for schemas associated with an individual database. When the OmniSciDB data directory is initialized, an initial “default” database is created. OmniSciDB always has at least one catalog at startup. The Catalog maintains the following objects:

  • Tables

  • Columns

  • Views

  • Dashboards

  • mapd_dictionaries

Data is persisted using a SQLite database per catalog in the mapd_catalogs directory, with the name of the SQLite database file corresponding to the database name. Just like the SysCatalog, a Catalog loads all data from SQLite on creation and flushes writes to SQLite immediately.

Below is a UML representation of the schema of the SQLite database backing a catalog:

../_images/cat_schema.png

3.3.1. Tables

Tables metadata for all tables in a given database is stored in the mapd_tables SQLite table and read by the Catalog at startup. Table metadata is stored in the TableDescriptor object for access by other objects in the system. For a given table, the TableDescriptor both holds table metadata (id, fragment size, etc) and is responsible for instantiating the Fragmenter for the table. That is, the TableDescriptor is the primary interface to locating storage for a given table.

3.3.2. Columns

Column metadata is stored in the mapd_columns table. Each column is uniquely identified by an ID pair, mapping to a unique table ID and a unique column ID within that table. ColumnDescriptor objects pass column metadata (including the SQL Type of a column) throughout the system.

3.3.3. Dictionaries

Columns of type dictionary encoded string require a separate string dictionary. The string dictionary metadata is stored in the mapd_dictionaries SQLite table. DictDescriptor objects encapsulate string dictionary information. Like the TableDescriptor, DictDescriptor is the primary interface for accessing physical string dictionary storage.

3.3.4. Sharded Tables

OmniSciDB supports sharded tables – that is, a table partitioned by a predefined column (called the shard key) into a pre-determined number of physical tables, more commonly referred to as shards. Sharding is used for increasing performance (by grouping like data on like devices) and for support complex queries in distributed mode (Enterprise Edition only). Internally, a sharded table is represented as a top-level logical table and several underlying physical tables (with the number of physical tables corresponding to the total shard count). This mapping is stored in the mapd_logical_to_physical SQLite table.

3.3.5. Views

Views in OmniSciDB are currently not materialized during creation. Instead, the definition for the view is stored in the mapd_views SQLite table, and the view is materialized lazily when the view is queried. By using lazy materialization, OmniSciDB fuse the query on the view with the underlying table(s) backing the view, reducing the number of intermediate projections required and/or the amount of data that must be loaded and processed. Views are represented by a TableDescriptor object with the isView boolean member set to true.

3.3.6. Temporary Tables

OmniSciDB supports temporary tables. Temporary tables can be created using CREATE TEMPORARY TABLE. All types (except geospatial types) are supported in temporary tables. Temporary tables store their data in CPU memory and persist until the server is restarted.

Temporary tables are identified by checking the persistenceLevel property of the TableDescriptor. A free function, table_is_temporary, is available for convenience.

Note that calcite_parser relies on reading the sqlite catalog files directly for unit testing. To test temporary tables, a separate JSON file is created in the catalog directory for each database containing temporary table metadata. See calcite_omniscidb_comms for more details.

3.3.7. Dashboards

OmniSciDB holds the dashboard state for OmniSci Immerse (Enterprise Edition only), our web-based data exploration tool. Dashboards are serialized and stored in the mapd_dashboards table. Because dashboards are built over a database and typically contain many database objects, they are considered a top-level object in the OmniSciDB catalog and object permissions model.

3.4. Migration

OmniSciDB includes a schema migration service for migrating user catalog information between versions. The schema migration service can also drive storage level migrations. The mapd_version_history table enables tracking whether or not a migration has been performed. When changing SysCatalog or Catalog SQLite schema, migrations must be used to ensure previous data directories work properly.