Introduction

Metadata for Quark is represented with JSON. This file describes the structure of the JSON.

Elements

Root

{
    version: '1.0',
    dataSources: [ DataSource... ],
    relSchema: { RelSchema ... }
}

version - Optional. If present, it should be equal to 1.0.

dataSources - A list of DataSource elements. Each element describes the attributes of a data source.

relSchema - Captures the relationships between tables in the data sources. Cubes and views are currently supported.
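The snippets in this file use a relaxed notation (unquoted keys, single quotes). As a minimal sketch, assuming the file handed to Quark has to be strict JSON, a root document could be assembled and serialized in Python as shown here. The helper name check_version is hypothetical, and the empty lists stand in for the elements described in the following sections.

```python
import json

# Skeleton of the Root element. The empty lists are placeholders for
# DataSource, View and Cube elements.
root = {
    "version": "1.0",
    "dataSources": [],
    "relSchema": {"views": [], "cubes": []},
}


def check_version(doc):
    """Return True when version is absent or equal to '1.0' (hypothetical helper)."""
    return doc.get("version", "1.0") == "1.0"


# json.dumps produces double-quoted keys and strings, i.e. strict JSON.
text = json.dumps(root, indent=4)
```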

DataSource

Occurs within root.dataSources.

{
    name:    'MYSQLDB',
    factory: 'com.qubole.quark.plugins.jdbc.JdbcFactory',
    url:     'jdbc://..../',
    default: 'true'
}

name - Name of the DataSource. The name is used as the wrapper schema for all schemas and tables in this DataSource.

factory - Factory class used to create DataSources. The class should implement com.qubole.quark.DataSourceFactory. Factories available out of the box are:

  • com.qubole.quark.plugins.jdbc.JdbcFactory - Creates data sources that connect using a JDBC driver.
  • com.qubole.quark.plugins.qubole.QuboleFactory - Creates data sources that are hosted by QDS.

url - URL of the data source.

default - The default data source is used to determine the default schema.
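To illustrate the default attribute, the following sketch picks the default data source from a list of DataSource elements. It assumes that the flag is the string 'true', as in the example above, and that at most one element may carry it; neither rule is confirmed by this file. The function name default_data_source and the second data source, OTHERDB, are hypothetical.

```python
def default_data_source(data_sources):
    """Return the name of the data source marked as default, or None.

    ASSUMPTION: the flag is the string 'true' and only one element may set it.
    """
    marked = [ds["name"] for ds in data_sources if ds.get("default") == "true"]
    if len(marked) > 1:
        raise ValueError("more than one default data source: %s" % marked)
    return marked[0] if marked else None


data_sources = [
    {
        "name": "MYSQLDB",
        "factory": "com.qubole.quark.plugins.jdbc.JdbcFactory",
        "url": "jdbc://..../",
        "default": "true",
    },
    {
        # Hypothetical second data source without the default flag.
        "name": "OTHERDB",
        "factory": "com.qubole.quark.plugins.jdbc.JdbcFactory",
        "url": "jdbc://..../",
    },
]

chosen = default_data_source(data_sources)
```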

JdbcDataSource

Like DataSource, occurs within root.dataSources.

{
    type: 'MYSQL',
    username: 'user',
    password: 'pwd'
}

type Type of database. Supported databases out of the box are:

  • EMR (Apache Hive on EMR)
  • H2
  • MYSQL
  • REDSHIFT

username - Username.

password - Password.
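A possible way to build a JDBC data source entry is sketched below. It assumes that the JdbcDataSource attributes sit in the same object as the DataSource attributes, which is how "occurs within root.dataSources" reads but is not stated explicitly. The function name jdbc_data_source is hypothetical; the list of types is copied from above.

```python
SUPPORTED_TYPES = ("EMR", "H2", "MYSQL", "REDSHIFT")


def jdbc_data_source(name, url, db_type, username, password):
    """Build one entry for root.dataSources that uses JdbcFactory.

    ASSUMPTION: DataSource and JdbcDataSource attributes share one object.
    """
    if db_type not in SUPPORTED_TYPES:
        raise ValueError("unsupported database type: " + db_type)
    return {
        "name": name,
        "factory": "com.qubole.quark.plugins.jdbc.JdbcFactory",
        "url": url,
        "type": db_type,
        "username": username,
        "password": password,
    }


# Values taken from the examples in this file; the URL is left elided.
entry = jdbc_data_source("MYSQLDB", "jdbc://..../", "MYSQL", "user", "pwd")
```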

QuboleDataSource

TODO

RelSchema

Contains relationships between tables. The tables may be hosted in different data sources. Two types of relationships are supported: views and cubes.

{
    views: [View ... ],
    cubes: [Cube ... ]
}

views - Describes materialized views on a table in one of the data sources. The materialized view may be stored in a different data source.

cubes - Describes cubes generated on a star schema join among tables in one of the data sources. The cube may be stored in a different data source.

Views

Occurs in root.relSchema. Similar to materialized views in databases.

{
    name: 'warehouse_big',
    query: 'select * from hive.tpcds.warehouse as wr where wr.w_warehouse_sq_ft > 100',
    dataSource: 'VIEWS',
    schema: 'PUBLIC',
    table: 'WAREHOUSE_PARTITION'
}

name - Name of the view.

query - Query that describes the materialized view.

dataSource - Data source where the materialized view is stored.

schema - Schema of the table where the materialized view is stored.

table - Name of the table where the materialized view is stored.
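Since a view names the data source that stores it, one consistency check that may be useful is to confirm that every view points at a declared data source. The sketch below is an illustration only; the function name unknown_view_targets is hypothetical, and whether Quark performs such a check is not stated here.

```python
def unknown_view_targets(root):
    """Return the names of views whose dataSource is not in root.dataSources."""
    declared = {ds["name"] for ds in root.get("dataSources", [])}
    views = root.get("relSchema", {}).get("views", [])
    return [v["name"] for v in views if v["dataSource"] not in declared]


root = {
    # Only the name matters for this check; other attributes are omitted.
    "dataSources": [{"name": "VIEWS"}],
    "relSchema": {
        "views": [
            {
                "name": "warehouse_big",
                "query": "select * from hive.tpcds.warehouse as wr "
                         "where wr.w_warehouse_sq_ft > 100",
                "dataSource": "VIEWS",
                "schema": "PUBLIC",
                "table": "WAREHOUSE_PARTITION",
            }
        ]
    },
}

missing = unknown_view_targets(root)
```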

Cubes

Occurs in root.relSchema.

{
    name: 'web_returns_cube',
    query: 'select 1 from canonical.public.web_returns as w join canonical.public.item ...',
    destination: 'CUBES',
    schema: 'PUBLIC',
    table: 'WEB_RETURNS_CUBE',
    groupingColumn: 'GROUPING__ID',
    dimensions: [Dimension ...],
    measures: [Measure ...],
    groups:   [Group ...]
}

name - Name of the cube.

query - Query that describes the cube.

destination - Data source where the cube is stored.

schema - Schema of the table where the cube is stored.

table - Name of the table where the cube is stored.

groupingColumn - Column that stores the number corresponding to the GROUPING bit vector associated with the row.

Dimension

Occurs in root.relSchema.cubes.

{
    schema: '',
    table: 'i',
    column: 'i_item_id',
    cubeColumn: 'I_ITEM_ID',
    dimensionOrder: 0,
    name: 'Item Id',
    parent: null
}

schema - Schema of the source table.

table - Table name or alias of the source table.

column - Column name in the source table.

cubeColumn - Column in the cube table.

dimensionOrder - Ordinal number in the dimension list.

name - A descriptive name for the dimension. It should be unique. The name is used to identify parents and children in a dimension hierarchy.

parent - Cube column name of the parent dimension, if it is part of a hierarchy.
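The relation between a cube's groupingColumn and the dimension list can be illustrated with a small decoder. This is a sketch under two loud assumptions that this file does not confirm: bit i of the stored number (least significant bit first) belongs to the dimension with dimensionOrder i, and a set bit means the dimension takes part in the grouping. Bit conventions differ between engines and versions, so both should be checked against the engine that builds the cube. The function name grouped_dimensions and the second dimension, Item Category, are hypothetical.

```python
def grouped_dimensions(grouping_id, dimensions):
    """List the names of dimensions whose bit is set in grouping_id.

    ASSUMPTION: bit i (least significant first) maps to dimensionOrder i,
    and a set bit means the dimension is part of the grouping.
    """
    ordered = sorted(dimensions, key=lambda d: d["dimensionOrder"])
    return [d["name"] for i, d in enumerate(ordered) if (grouping_id >> i) & 1]


dimensions = [
    {"name": "Item Id", "cubeColumn": "I_ITEM_ID", "dimensionOrder": 0},
    # Hypothetical second dimension, added only to make the example non-trivial.
    {"name": "Item Category", "cubeColumn": "I_CATEGORY", "dimensionOrder": 1},
]

only_category = grouped_dimensions(0b10, dimensions)
both = grouped_dimensions(0b11, dimensions)
```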

Measure

Occurs in root.relSchema.cubes.

{
    column: 'wr_net_loss',
    cubeColumn: 'TOTAL_NET_LOSS',
    function: 'sum'
}

column - Name of the column in the fact table.

cubeColumn - Name of the column in the cube table.

function - Aggregate function on the column in the fact table. Supported
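One way to read a Measure is as an aggregate expression over the fact table whose result is stored in the cube column. The sketch below renders that reading as a SQL fragment. It is an illustration only and not necessarily what Quark generates; the function name measure_expression is hypothetical, and 'sum' is used because it is the only function shown in this file.

```python
def measure_expression(measure):
    """Render a Measure as '<function>(<column>) AS <cubeColumn>'."""
    return "{function}({column}) AS {cubeColumn}".format(**measure)


measure = {
    "column": "wr_net_loss",
    "cubeColumn": "TOTAL_NET_LOSS",
    "function": "sum",
}

expr = measure_expression(measure)  # sum(wr_net_loss) AS TOTAL_NET_LOSS
```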

Group

Occurs in root.relSchema.cubes.