VoltDB

VoltDB
fraimed
Developer(s) VoltDB Inc.
Repository {{#property:P1324}}
Written in Java,C++
Operating system Linux, Mac OS X
Platform Java
Type Database
License GNU Affero General Public License v3, VoltDB Proprietary License
Website official site

VoltDB is a relational new SQL in-mtmory DBMS, designed for real-time transaction processing (OLTP), that uses shared-nothing architecture. Because of storing all the data in memory, VoltDB provides very high performance. It also support ACID model. DBMS is developed by a group of database researchers, including M.Stonebraker, S.Madden and D.Abadi.

About

VoltDB is a high-performance business DBMS, that uses newSQL, in-memory (all the data is stored in memory, which provide extra-high speed of work) approaches and shared-nothing architecture (all the nodes work independently, which allows unlimited scaling). VoltDB supports SQL queries and guarantee transactional data integrity (ACID model). It uses stored Java procedures for the work. Copies of the procedures are being attached to each section. If a query affects more than one section, the procedure is awaken in each section and the resulting data is being aggregated afterwards. DBMS has two versions: enterprise and community, that is licensed under AGPLv3.

Working principle

Partitioning

VoltDB uses stored procedures. Each procedure is acting like an atomic operation, which allows to keep data integrity. By analizing the stored procedures logic VoltDB divides them to partitions and distribute between servers. Each part has a "slice" of data with the procedures for working with it. Such an organization allows a parallel data processing.

Single-Threaded processing

A call to a stored procedure is passed to a partition, which data it needs. If all the needed data present at a single partition, the server hadle a call by itself, letting other server work in parallel. Due to the single-treadness of the work with a single partition, data integrity is guarateed. This saves VoltDB from the need of overheads on blocking, operation fixation and transaction logging, while DB work remains multi-thread.

If an operation needs data from more than one partition, then one node acts like a coordinator. It sends requests to the others and process the data, recieved from them. All other nodes continue their work independently.

Replication

VoltDB allows to increase the performance by duplicating all the read-only data to all the servers. This accelerates the work significanlty, because all the read queries, requiring these data, can be processed within a single thread.

Field of use

VoltDB is oriented to a special segment of busyness computing. The main function of VoltDB is a fast processing of data, which is needed by applications requiring fast processing of big data flows. Such applications like financial applications, media, Internet of Things, or other developing field. VoltDB satisfies their main requirements: scalability, high accessibility and performance.

VoltDB doesn't fit all of the standard query types. It is not suitable for collecting and processing big amouts of data from multiple tables.

Nowadays VoltDB is being used in applications, that require high performance, ones like financial trading, streaming, recording, online gaming, fraud detection, sensory systems distribution etc.

Work with VoltDB

Installation

VoltDB is being distributed in enterprise and community versions. Enterprise version is provided as a pre-built distribution, and the community version comes as source code.

Installing

VoltDB is distributed as a compressed tar archive. To use it, you need to unpack it to any directory.

$ tar -zxvf voltdb-ent-6.0.tar.gz -C $HOME/

It's recommended to use user's home directory, because he would get the full access to the application then.

If it's needed to install VoltDB into some standard system location, it can be done as follows:

$ sudo tar -zxvf voltdb-ent-6.0.tar.gz -C /opt
$ cd /opt
$ sudo mv voltdb-ent-6.0 voltdb

Note, that in this case the application folders would be read-only for non-privileged accounts.

Building from the source code

VoltDB source code is stored at GitHub repository. After downloading the files it's needed to build a distribution out of them. It's recommended to use Ant of version 1.7 or newer.

$ ant dist

As a result, a tar a archive is being built. These archive is similar to the one from the previous section, so will need to unpack it into the working folder.

Upgrading

The recommended step for the upgrading:

  1. Switch VoltDB into the admin mode (voltadmin pause).
  2. Create a snapshot of the database (voltadmin save --blocking).
  3. Shutdown the database (voltadmin shutdown).
  4. Upgrade VoltDB.
  5. Create a new database and start it in the admin mode.
  6. Restore the snapshot, created at step 2 (voltadmin restore).
  7. Return VoltDB to a normal mode (voltadmin resume).

Database creation

To start working, you need to create a database and its tables. The create command is used for this:

$ voltdb create

If creation was successful, VoltDB shows the message:

Server completed initialization.

Tables processing

VoltDB support SQL queries. To start sending SQL queries, you need to start sqlcmd utility:

$ sqlcmd

This command starts a numbered prompt, letting you to input SQL queries. To finish work and return to shell prompt, it's needed to use exit command. Queries results are being output immidiately:

3> select town, state from towns ORDER BY town;
TOWN         STATE 
------------ ------
Bay View     OH    
Billerica    MA    
Buffalo      NY    

(3 row(s) affected)

Sqlcmd utility uses FILE directive, which lets not to input queries from the keyboard, but to read them from file.

1> FILE towns.sql;

Data loading

Manual data input is not the only way for database filling. At the time of a table creation, VoltDB automatically creates a stored procedure for data input. The command csvloader allows to read a file (CSV, comma-separated value, for instance) and to insert each entry as a new row into the database, using the INSERT query. For example:

$ csvloader --separator "|" --file towns.txt towns

Here, --separator parameter sets the sepatator for entries inside the file, --file parameter sets the file, from which the data should be loaded, and the next parameter defines a table, to which the data have to be inserted.

Stored procedures

Stored procedures are a simple and useful tool for working with the database.

Simple stored procedures

Simple procedures are being created in the sqlcmd utility prompt via the CREATE PROCEDURE AS command. These procedures are SQL queries which allow to specify some parameters at the time of invocation. Each procedure has a unique name, the post-defined data is being replaced with the question marks. For example:

$sqlcmd
1>CREATE PROCEDURE leastpopulated AS 
2>SELECT TOP 1 county, abbreviation, population
3>   FROM people, states WHERE people.state_num=?
4>   AND people.state_num=states.state_num
5>   ORDER BY population ASC;

Complex stored procedures

VoltDB also allows to create stored procedures on Java, which allows to create more complex queries and process the data. Read more at the official documentation[1].

Partitioning

Partitioning in VoltDB is organized via partitioning columns. At the time of the table creation, one column may be marked as a partitioning column. After that, VoltDB decides which partition the row goes into based on the value of the partitioning column. The row, that have the same value of the partitioning column, end up in the same partition. To mark a column as a partitioning one, the PARTITION TABLE ... ON COLUMN directive is used. For example:

$ sqlcmd
PARTITION TABLE towns ON COLUMN state_num;

Note, that non-partitioned tables become replicated by default. This means, that they are being copied to all the servers. That's useful and speeds up the work, if there are not many data in the table and it is particularly read-only. In other case it's better to make the table partitioned.

Advantages

  • High speed due to storing all the data in memory. If database is too big, a partitioning mechanism is being used. Transferring to disk is being done through the snapshot mechanism. Snapshot represents a data slice, up-to-date at the time of creation.
  • Easy scaling due to nodes independence. Simple adding of new servers into the cluster allows to raise the performance linearly.
  • Unlimited performance and scaling possibilities. Due to the nodes independance, there is no scaling limit: a new server can be added at any time.
  • Automatic replication of data inside the cluster, that provides high accessibility and rejects the need of journal.
  • Due to autonomous work of the sections, there is no need in such slow operations like blocking and operation fixation.

Notes

  1. VoltDB stored procedures

Authors

Roman Chistiakov, Bauman Moscow State Technical University, IU-8, E-mail: romchezz@gmail.com

References

Read in another language