3. Catalog¶
3.1. High Level Overview¶
OmniSciDB stores database, table, and user information in a centralized Catalog. Each database has its own Catalog, with the top-level System Catalog (SysCatalog) managing catalogs across databases. Each catalog (including the system catalog) is backed by a SQLite database in the mapd_catalogs directory (a top-level directory in the OmniSciDB data directory). In addition to managing catalogs for individual databases, the SysCatalog manages users, roles, and higher-level object permissions.
Both SysCatalog and Catalog classes are defined in files of the same name in the Catalog source directory. All catalog-related code lives inside the Catalog_Namespace
.
3.2. System Schema (SysCatalog)¶
SysCatalog
is a singleton class responsible for managing top-level OmniSciDB server metadata, including:
Users
Roles
Databases
Permissions
Data is persisted using a SQLite database at <path_to_db/mapd_catalogs/omnisci_system_catalog>. When the SysCatalog is created, the contents of the SQLite database is read and stored in memory. Subsequent reads only reference in-memory data structures. All writes are immediately flushed to SQLite.
Below is a UML representation of the schema of the SQLite database backing the system catalog:
3.2.1. Users¶
User accounts in OmniSciDB are global (that is, one user account can have access to multiple databases). The users login information is stored in the mapd_users
system catalog table. When a user is authenticated, a UserMetadata
struct is created to provide an in-memory representation of the user.
3.2.2. Roles¶
Roles are used to grant permissions to groups of users. Roles are stored in the mapd_roles
table. Each user also has a role corresponding to the name of the user, e.g. adding a user Max will also add a corresponding role called Max automatically.
3.2.3. Databases¶
The mapd_databases
table functions as a lookup table for mapping a database name (a string) to the internal database identifier (an integer).
3.2.4. Permissions¶
All system-level object privilege grants are stored and accessed from the mapd_object_permissions
table. An object permission can be assigned to a role, and specifies what level of access a role has to various parts of the system. See DB Object Privileges for more information about interfacing with OmniSciDB’s security model.
The granteeMap_
in the SysCatalog
class stores Grantee
objects, representing object level permissions grants. The granteeMap_
is built at startup.
The objectDescriptorMap_
in the SysCatalog
class stores ObjectRoleDescriptor``s, relating a role to a specific privilege grant. The ``objectDescriptorMap_
is also built when Together, the granteeMap_
and objectDescriptorMap_
are referenced when determining if a user has access to perform an action in the database.
3.3. Database Schema (Catalog)¶
The Catalog_Namespace::Catalog
is the top-level container for schemas associated with an individual database. When the OmniSciDB data directory is initialized, an initial “default” database is created. OmniSciDB always has at least one catalog at startup. The Catalog
maintains the following objects:
Tables
Columns
Views
Dashboards
mapd_dictionaries
Data is persisted using a SQLite database per catalog in the mapd_catalogs directory, with the name of the SQLite database file corresponding to the database name. Just like the SysCatalog, a Catalog loads all data from SQLite on creation and flushes writes to SQLite immediately.
Below is a UML representation of the schema of the SQLite database backing a catalog:
3.3.1. Tables¶
Tables metadata for all tables in a given database is stored in the mapd_tables
SQLite table and read by the Catalog
at startup. Table metadata is stored in the TableDescriptor
object for access by other objects in the system. For a given table, the TableDescriptor
both holds table metadata (id
, fragment size
, etc) and is responsible for instantiating the Fragmenter
for the table. That is, the TableDescriptor
is the primary interface to locating storage for a given table.
3.3.2. Columns¶
Column metadata is stored in the mapd_columns
table. Each column is uniquely identified by an ID pair, mapping to a unique table ID and a unique column ID within that table. ColumnDescriptor
objects pass column metadata (including the SQL Type of a column) throughout the system.
3.3.3. Dictionaries¶
Columns of type dictionary encoded string require a separate string dictionary. The string dictionary metadata is stored in the mapd_dictionaries
SQLite table. DictDescriptor
objects encapsulate string dictionary information. Like the TableDescriptor
, DictDescriptor
is the primary interface for accessing physical string dictionary storage.
3.3.4. Sharded Tables¶
OmniSciDB supports sharded tables – that is, a table partitioned by a predefined column (called the shard key) into a pre-determined number of physical tables, more commonly referred to as shards. Sharding is used for increasing performance (by grouping like data on like devices) and for support complex queries in distributed mode (Enterprise Edition only). Internally, a sharded table is represented as a top-level logical table and several underlying physical tables (with the number of physical tables corresponding to the total shard count). This mapping is stored in the mapd_logical_to_physical
SQLite table.
3.3.5. Views¶
Views in OmniSciDB are currently not materialized during creation. Instead, the definition for the view is stored in the mapd_views
SQLite table, and the view is materialized lazily when the view is queried. By using lazy materialization, OmniSciDB fuse the query on the view with the underlying table(s) backing the view, reducing the number of intermediate projections required and/or the amount of data that must be loaded and processed. Views are represented by a TableDescriptor
object with the isView boolean member set to true.
3.3.6. Temporary Tables¶
OmniSciDB supports temporary tables. Temporary tables can be created using CREATE TEMPORARY TABLE
. All types (except geospatial types) are supported in temporary tables. Temporary tables store their data in CPU memory and persist until the server is restarted.
Temporary tables are identified by checking the persistenceLevel
property of the TableDescriptor
. A free function, table_is_temporary
, is available for convenience.
Note that calcite_parser relies on reading the sqlite catalog files directly for unit testing. To test temporary tables, a separate JSON file is created in the catalog directory for each database containing temporary table metadata. See calcite_omniscidb_comms for more details.
3.3.7. Dashboards¶
OmniSciDB holds the dashboard state for OmniSci Immerse (Enterprise Edition only), our web-based data exploration tool. Dashboards are serialized and stored in the mapd_dashboards
table. Because dashboards are built over a database and typically contain many database objects, they are considered a top-level object in the OmniSciDB catalog and object permissions model.
3.4. Migration¶
OmniSciDB includes a schema migration service for migrating user catalog information between versions. The schema migration service can also drive storage level migrations. The mapd_version_history
table enables tracking whether or not a migration has been performed. When changing SysCatalog or Catalog SQLite schema, migrations must be used to ensure previous data directories work properly.