Installation¶

If you haven’t already installed Java8 HOTSPOT and maven, please install them.

Install Quark JDBC from Source¶

mkdir quark
cd quark
git clone git@bitbucket.org:qubole/quark.git quark-src
cd quark-src
mvn package
ls -l quark-jdbc/target/quark-jdbc-*.jar
cd -

Install Quark JDBC from Maven¶

Find the latest version from Maven Central

mvn org.apache.maven.plugins:maven-dependency-plugin:2.1:get \
    -DrepoUrl=-DrepoUrl=http://repo1.maven.org/maven2 \
    -Dartifact=com.qubole:quark-jdbc:<version>

Get jar from mvn repository on the local drive ~/.m2/repositories/com/qubole/quark-jdbc.

Configure Quark¶

Quark is configured through a JSON Model. The model definition is available in JsonModel.md

A simple example to access Redshift and Mysql is shown below. You can replace the dataSources with any JDBC driver.

There are more examples in examples/dataSources.json

{
  "dataSources":[
    {
      "type":"REDSHIFT",
      “url":"redshift_db_url",
      “username":"xyz",
      “password":"xyz",
      "name":"redshift",
      "default":"true",
      "factory":"com.qubole.quark.plugins.jdbc.JdbcFactory"
    },
    {
      "type":"MYSQL",
      "url”:"mysql_db_url",
      "username”:"xyz",
      "password”:"xyz",
      "name":"mysql",
      "default":"false",
      "factory":"com.qubole.quark.plugins.jdbc.JdbcFactory"
    }
   ],
}

Connect through SQLLine¶

# Copy jline-xxx.jar, sqlline-xxx.jar, and quark jdbc driver jar into directory.
cd quark
mkdir sqlline
cd sqlline
wget http://central.maven.org/maven2/sqlline/sqlline/1.1.9/sqlline-1.1.9.jar
wget http://central.maven.org/maven2/jline/jline/2.13/jline-2.13.jar
cp ../quark-src/quark-jdbc/target/quark-jdbc-*.jar .
cd -

#Start SQLLine

java -classpath "/home/user/sqlLine/*" sqlline.SqlLine

# Connect to Quark

!connect jdbc:quark:model.json com.qubole.quark.jdbc.QuarkDriver

Note: no password is required

Run Queries¶

Quark organizes schemas and tables from each database in its own namespace. The name of the namespace is the same as the name attribute. Using the example above, all tables in REDSHIFT database are available as redshift.<schema>.<table>. Similarly, a table in MYSQL database is available as mysql.<schema>.<table>.

You can run SQL queries on these tables using SQLLine prompt.

Next Steps¶

To really appreciate the utility of Quark, define relationships between tables in different databases. A table could be a materialized view or a cube that stores aggregations. In the examples directory, there are instructions to define relationships between tables in Hive (EMR) and Redshift or between SQL engines provided by Qubole.