Introduction
============

Metadata for Quark is represented with JSON. This file describes the structure of the JSON.

Elements
========

Root
----

    {
      version: '1.0',
      dataSources: [ DataSource... ],
      relSchema: { RelSchema ... }
    }

`version` (optional), if present, should be equal to `1.0`.

`dataSources` is a list of DataSource elements. Each element describes the attributes of a data source.

`relSchema` captures the relationships between tables in `dataSources`. `Cubes` and `Views` are supported right now.

DataSource
----------

Occurs within `root.dataSources`.

    {
      name: 'MYSQLDB',
      factory: 'com.qubole.quark.plugins.jdbc.JdbcFactory',
      url: 'jdbc://..../',
      default: 'true'
    }

`name`
Name of the DataSource. `name` is used as the wrapper schema for all schemas and tables in this DataSource.

`factory`
Factory class to create DataSources. The class should implement `com.qubole.quark.DataSourceFactory`. Factories available out of the box are:

* `com.qubole.quark.plugins.jdbc.JdbcFactory` - Creates data sources that connect using a JDBC driver.
* `com.qubole.quark.plugins.qubole.QuboleFactory` - Creates data sources that are hosted by QDS.

`url`
URL of the data source.

`default`
The default data source is used to determine the default schema.

JdbcDataSource
--------------

Like `DataSource`, occurs within `root.dataSources`.

    {
      type: 'MYSQL',
      username: 'user',
      password: 'pwd'
    }

`type`
Type of database. Supported databases out of the box are:

* EMR (Apache Hive on EMR)
* H2
* MYSQL
* REDSHIFT

`username`
Username

`password`
Password

QuboleDataSource
----------------

TODO

RelSchema
=========

Contains relationships between tables. The tables may be hosted in different data sources. Two types of relationships are supported.

    {
      views: [View ... ],
      cubes: [Cube ... ]
    }

`views`
Describes materialized views on a table in one of the data sources. The materialized view may be stored in a different data source.

`cubes`
Describes cubes generated on a star schema join among tables in one of the data sources.
The cube may be stored in a different data source.

Views
-----

Occurs in `root.relSchema`. Similar to materialized views in databases.

    {
      name: 'warehouse_big',
      query: 'select * from hive.tpcds.warehouse as wr where wr.w_warehouse_sq_ft > 100',
      dataSource: 'VIEWS',
      schema: 'PUBLIC',
      table: 'WAREHOUSE_PARTITION'
    }

`name`
Name of the view.

`query`
Query that describes the materialized view.

`dataSource`
Data source where the materialized view is stored.

`schema`
Schema of the table where the materialized view is stored.

`table`
Name of the table where the materialized view is stored.

Cubes
-----

Occurs in `root.relSchema`.

    {
      name: 'web_returns_cube',
      query: 'select 1 from canonical.public.web_returns as w join canonical.public.item ...',
      destination: 'CUBES',
      schema: 'PUBLIC',
      table: 'WEB_RETURNS_CUBE',
      groupingColumn: 'GROUPING__ID',
      dimensions: [Dimension ...],
      measures: [Measure ...],
      groups: [Group ...]
    }

`name`
Name of the cube.

`query`
Query that describes the cube.

`destination`
Data source where the cube is stored.

`schema`
Schema of the table where the cube is stored.

`table`
Name of the table where the cube is stored.

`groupingColumn`
Column that stores the number corresponding to the GROUPING bit vector associated with the row.

Dimension
---------

Occurs in `root.relSchema.cubes`.

    {
      schema: '',
      table: 'i',
      column: 'i_item_id',
      cubeColumn: 'I_ITEM_ID',
      dimensionOrder: 0,
      name: 'Item Id',
      parent: null
    }

`schema`
Schema of the source table.

`table`
Table name or alias of the source table.

`column`
Column name in the source table.

`cubeColumn`
Column in the cube table.

`dimensionOrder`
Ordinal number in the dimension list.

`name`
A descriptive name for the dimension. It should be unique. The name is used to identify parents and children in a dimension hierarchy.

`parent`
Cube column name of the parent dimension if it is part of a hierarchy.

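
To show how the elements described so far nest inside each other, here is a minimal sketch of a whole metadata file. It is written as strict JSON (double-quoted keys and strings), and every value is copied from the per-element examples in this document, including the elided URL and the truncated cube query, which are left exactly as they appear. Two things are assumptions rather than documented behavior: the JDBC fields (`type`, `username`, `password`) are placed in the same object as the DataSource fields, and the cube's `measures` and `groups` lists are left out. The data sources that the view and cube refer to (`hive`, `VIEWS`, `canonical`, `CUBES`) are not declared in this sketch and would presumably need their own entries in `dataSources`.

```json
{
  "version": "1.0",
  "dataSources": [
    {
      "name": "MYSQLDB",
      "factory": "com.qubole.quark.plugins.jdbc.JdbcFactory",
      "url": "jdbc://..../",
      "default": "true",
      "type": "MYSQL",
      "username": "user",
      "password": "pwd"
    }
  ],
  "relSchema": {
    "views": [
      {
        "name": "warehouse_big",
        "query": "select * from hive.tpcds.warehouse as wr where wr.w_warehouse_sq_ft > 100",
        "dataSource": "VIEWS",
        "schema": "PUBLIC",
        "table": "WAREHOUSE_PARTITION"
      }
    ],
    "cubes": [
      {
        "name": "web_returns_cube",
        "query": "select 1 from canonical.public.web_returns as w join canonical.public.item ...",
        "destination": "CUBES",
        "schema": "PUBLIC",
        "table": "WEB_RETURNS_CUBE",
        "groupingColumn": "GROUPING__ID",
        "dimensions": [
          {
            "schema": "",
            "table": "i",
            "column": "i_item_id",
            "cubeColumn": "I_ITEM_ID",
            "dimensionOrder": 0,
            "name": "Item Id",
            "parent": null
          }
        ]
      }
    ]
  }
}
```

Each Dimension object sits inside the `dimensions` list of its cube, and each cube and view sits inside the corresponding list of `relSchema`, which matches the "Occurs in" note at the top of each element's section.
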

Measure
-------

Occurs in `root.relSchema.cubes`.

    {
      column: 'wr_net_loss',
      cubeColumn: 'TOTAL_NET_LOSS',
      function: 'sum'
    }

`column`
Name of the column in the fact table.

`cubeColumn`
Name of the column in the cube table.

`function`
Aggregate function on the column in the fact table. Supported

Group
-----

Occurs in `root.relSchema.cubes`.