Access Keys:
Skip to content (Access Key - 0)

Release Notes

Getting Started

Concepts

Developer's Guide

Tools and Utilities

Reference

Toggle Sidebar
You are viewing information for a product in a pre-release state. The released product and the documentation may differ significantly from the version described here

SQLFabric Quick Start

Quick Start

This Quick Start guide will first walk through an Java example illustrating how to embed an SQLFabric data container in a appliction JVM, then show how to get a SQLFabric distributed system (partitioned tables in memory) running across multiple JVMs. It demonstrates the creation of tables and associated schema, population of data, and some example queries. The command-line tool ij is demonstrated for convenience, but notes are given on how to get started using the JDBC API in a Java program as well.

The sample database we use is called toursDB, which manages information about air transportation for scheduling flights.

This example is designed to get you up and running, but it does not go into detail on partitioning and colocation strategies for your data. Read this introduction and Key concepts if you are completely new to the idea of a data fabric or data grid.

It is assumed that you are already familiar with relational data modeling and the Java JDBC API

Download the product

You can download the product from Download.

Set Up

  • Create a new directory and change to that directory, then copy all the files from the sqlfabric_dir/quickstart/ directory to this directory.
  • Make sure the appropriate java executable is in your PATH environment variable. The Java VM must be version 1.6 or higher. (SQLFabric supports two drivers - a thin JDBC driver and a more efficient peer JDBC driver. The peer driver only supports JDK 1.6. SQLFabric servers require JDK 1.6).

Managing Tours database in partitioned tables

We start our walkthrough by first demonstrating creation and population of partitioned tables managed in heap memory of 2 JVMs.

Start two fabric servers

Each fabric server requires a separate directory to which it writes its log files and a status file (named .fabricserver.ser).

$ mkdir fabricserver1
$ mkdir fabricserver2

Before you start any GemFire SQLFabric server, you decide how the servers will discover each other dynamically. Even though GemFire supports both a TCP and a Multicast based mechanism for discovery, we will just use Multicast discovery in these examples. You can understand more details about member discovery here. All you have to do is decide on a port number (Multicast channel) that all servers will use to discover each other. All fabric servers are connected in a peer-2-peer network.
If multicast is not supported, you can switch to use the GemFire locator service (TCP based) for discovery. See details below.

Note

sqlf is a command line tool used to start/stop fabric servers among other things. This tool is available in sqlfabric_dir/bin. If you add sqlfabric_dir/bin to your PATH environment variable, you can just type:
$ sqlf <rest of command line> instead of using a path to the <sqlfabric_dir>/bin/sqlf executable.

if you are running on windows you would use sqlf.bat ij
The rest of this Quick Start assumes you have done this.

To start two fabric servers in the same distributed system (mcast-port), each one accepting thin client connections on a different port:
$ sqlf fabricserver start -dir=fabricserver1 -port=1527 -mcast-port=12333 &
$ sqlf fabricserver start -dir=fabricserver2 -port=1528 -mcast-port=12333&

Notes

The servers will start in the background and output the message:
Starting Fabric Server with pid: process id
Fabric Server pid: process id status: running


Note the following:
1) if there are other developers running SQLFabric in your network and if they are using the same mcast-port, your servers will automatically join their distributed system. So, it might be a good idea to choose a different port number or use the GemFire TCP based locator service


2) When starting fabricserver using the command line, you should always start it in the background (using "&"). This is required because, SQLFabric is a distributed system and may wait for other members that were present in a previous run to become available.


You can find out all the members participating in your distributed system by looking at the log file (called fabricserver.log) in either fabricserver1 or fabricserver2 directory. For instance, after starting both servers, you will see messages like this in fabricserver2/fabricserver.log:


info 2009/04/14 13:02:59.906 ACT <main> tid=0x1] Initial (membershipManager) view = 10.80.10.209(4652):4665/4662, 10.80.10.209(4208):4671/4668
info 2009/04/14 13:02:59.906 ACT <main> tid=0x1] Admitting member <10.80.10.209(4652):4665/4662>. Now there are 1 non-admin member(s).
info 2009/04/14 13:02:59.906 ACT <main> tid=0x1] Admitting member <10.80.10.209(4208):4671/4668>. Now there are 2 non-admin member(s).


If Multicast is not supported, you can use the TCP based discovery service (the GemFire locator).
You can start the fabricservers like this:
sqlf fabricserver start -dir=fabricserver1 -port=1527 mcast-port=0 start-locator=localhost[4444] &
sqlf fabricserver start -dir=fabricserver2 -port=1528 mcast-port=0 locators=localhost[4444] &


Finally, note that each server will by default also open a TCP server port to accept incoming thin client connections. Unless otherwise specified with the -port=<port_num> option, a fabric server listens on port 1527. Each fabric server process must listen on a different port (or else not open a server port at all by using the option -run-netserver=false).

sqlf ij

sqlf ij is a command-line tool that you can use to run scripts or interactive queries against an SQL Fabric distributed system. You will use sqlf ij in this example to run a SQLFabric client.

sqlf ij is the SQLFabric analog of Derby's ij tool

To run sqlf ij from the command line use:

$ sqlf ij

This starts the interactive shell and prints the prompt ij>.

To print a brief list of the sqlf ij commands, type:

ij> help;

More information on using the ij tool can be found in the Derby documentation at Derby Tools and Utilities Guide. Note that the primary difference between connecting to Derby vs. SQLFabric is that SQLFabric does not have the notion of a "database". When you connect to SQLFabric, the distributed system you connect to is defined by either the mcast-port or the locators. More on the use of Derby capabilities in GemFire SQLFabric is explained here .

Start a peer or thin client in ij.

Java clients can connect using either of the following two options:

  1. SQLFabric JDBC peer (or embedded) driver
    Use of this driver provides the fastest access to data. For instance, this driver upon bootstrap automatically caches all the meta data on table partitions providing single hop access to data that can be pruned to a single partition. Optionally, peer clients can also host data in local JVM heap. The disadvantage with this driver is that it has a larger footprint compared to the thin client driver and will create connections to each distributed peer member(data stores). This driver is bundled in sqlfabric.jar (<product>/lib directory). The URL for this driver and other connection properties is described here.
  2. SQLFabric JDBC thin client driver
    This driver uses system fewer resources (memory and socket connections) and therefore a greater number of these clients can be connected to the system as compared to the peer driver. This driver is packaged as sqlfabricclient.jar and is about 600KB in size. We will use this driver for this exercise.

$ sqlf ij
ij> connect 'jdbc:sqlfabric://localhost:1527';

Notes
  1. To get a SQLFabric JDBC Connection in a similar way in Java:
    1. To work around a known defect in JDK 1.6, make sure the driver classes are pre-loaded with this code:
      java.sql.DriverManager.getDrivers();
      This is only required when there are multiple threads getting Connections, since this is a race condition in the DriverManager implementation in the VM from Sun.
    2. Use the above URL in the call to DriverManager.getConnection .

Default hash partitioning of all tables

Create the tables using the ToursDB_schema.sql script and populate the tables using the loadTables.sql script

ij> run 'ToursDB_schema.sql';
ij> run 'loadTables.sql';

When creating the tables, you will see DDL output like ...

ij> ...
CREATE TABLE AIRLINES 
   (  
      AIRLINE CHAR(2) NOT NULL , 
      AIRLINE_FULL VARCHAR(24), 
      BASIC_RATE DOUBLE PRECISION, 
   ... 
0 rows inserted/updated/deleted .... This messages indicates the DDL was processed as expected ...
   ... Other output messages not shown ... 

At this point the tables are created and the data is loaded, all in memory, partitioned across two servers and accessed by one client running ij.
You can check if all the tables are created using ij.

ij> show tables;

To show just the tables in the default schema for applications:

ij> show tables in APP;

At this point the tables are created and the data is loaded, all in memory, partitioned across 2 fabric servers.

For reference the schema of the sample data can be viewed at ToursDB Schema.

Queries can be executed within ij, for example:

ij> SELECT city_name, country, language FROM cities WHERE language LIKE '%ese';

CITY_NAME               |COUNTRY                   |LANGUAGE        
--------------------------------------------------------------------
Rio de Janeiro          |Brazil                    |Portuguese      
Sao Paulo               |Brazil                    |Portuguese      
Hong Kong               |China                     |Chinese         
Shanghai                |China                     |Chinese         
Osaka                   |Japan                     |Japanese        
Tokyo                   |Japan                     |Japanese        
Lisbon                  |Portugal                  |Portuguese

For a reference to the SQL dialect implemented in SQLFabric, see SQLFabric Language Reference.

Example join:

ij> SELECT city_name, countries.country, region, language
FROM cities, countries
WHERE cities.country_iso_code = countries.country_iso_code AND language LIKE '%ese';

CITY_NAME               |COUNTRY                   |REGION                    |LANGUAGE        
-----------------------------------------------------------------------------------------------
Rio de Janeiro          |Brazil                    |South America             |Portuguese      
Sao Paulo               |Brazil                    |South America             |Portuguese      
Hong Kong               |China                     |Asia                      |Chinese         
Shanghai                |China                     |Asia                      |Chinese         
Osaka                   |Japan                     |Asia                      |Japanese        
Tokyo                   |Japan                     |Asia                      |Japanese        
Lisbon                  |Portugal                  |Europe                    |Portuguese

By default, the tables are hash partitioned on the primary key. Queries such as the ones above are parallely executed on all the data nodes and the results unioned.
Queries that operate on a single row (i.e. primary key in their 'where clause') will automatically be pruned to the data node owning the data.

Implementing a partitioning and replication strategy

Let us take another look at the schema. The schema looks like a 'STAR' schema with a few fact tables and several dimension tables. For this exercise, we will assume that tables like 'FLIGHTS' and 'FLIGHTAVAILABILITY' are fact tables and the rest like 'CITIES' and 'COUNTRIES' are dimension tables. Dimension tables tend to be small, infrequently change and are frequently used in join queries. So, we will now change these to be replicated and keep all fact tables as partitioned tables. Tables 'FLIGHTS', 'FLIGHTAVAILABILITY' and 'FLIGHTS_HISTORY' will be managed as partitioned tables.

We will also assume that the application will want to frequently join all these related tables based on 'FLIGHT_ID' and use this as the partitioning column. And, we will also colocate all these partitioned tables. Essentially, we will now ensure that all related rows associated with any FLIGHT_ID is always maintained in a single partition. This step ensure that the frequent join queries based on a certain flight are pruned to a single node and executed efficiently.

Drop and re-create some of the tables as partitioned (by 'FLIGHT_ID') and the rest as replicated tables

ij> run 'create_colocated_schema.sql';
ij> run 'loadTables.sql';

Here is the DDL for 'FLIGHTAVAILABILITY' partitioned on 'FLIGHT_ID' and colocated with the 'FLIGHTS' table.

CREATE TABLE FLIGHTAVAILABILITY
   (
      FLIGHT_ID CHAR(6) NOT NULL ,
      SEGMENT_NUMBER INTEGER NOT NULL ,
      FLIGHT_DATE DATE NOT NULL ,
      ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
      BUSINESS_SEATS_TAKEN INTEGER DEFAULT 0,
      FIRSTCLASS_SEATS_TAKEN INTEGER DEFAULT 0
   )
   PARTITION BY COLUMN (FLIGHT_ID)
   COLOCATE WITH (FLIGHTS);

And, a replicated table is created simply using the 'REPLICATE' keyword

CREATE TABLE COUNTRIES
   (
      COUNTRY VARCHAR(26) NOT NULL,
      COUNTRY_ISO_CODE CHAR(2) NOT NULL ,
      REGION VARCHAR(26)
   ) REPLICATE;

You can check if all the tables are created using ij.

ij> show tables in APP;

Now queries that join Table FLIGHTS with FLIGHTAVAILABILITY can be executed like this

ij> select * from flights f, flightavailability fa where f.flight_id = fa.flight_id;
This join will be executed on both partition nodes in parallel and the unioned results returned.

ij> select * from flights f, flightavailability fa where f.flight_id = fa.flight_id and f.flight_id = 'AA1116';
Here, as the table is partitioned by 'flight_id', the execution will be pruned to just the partition that owns 'AA1116'.

When you are ready to exit out of the sqlf ij tool, type:

ij> exit;

When you are ready to stop the servers, execute in the command shell:

$ sqlf fabricserver stop -dir=fabricserver1
$ sqlf fabricserver stop -dir=fabricserver2

  • You can further expand on this example and do things like add expiry, eviction, overflow to disk or persistence to the tables, or do things like add more servers to increase capacity and throughput, etc.

You can go through the SQLFabric Language Reference for examples on the DDL synatx. Make the desired changes to the 'create_colocated_schema.sql' and try again.


Child Pages

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.
Adaptavist Theme Builder (3.4.0-M2-conf210) Powered by Atlassian Confluence 2.10.3, the Enterprise Wiki.