# VoltDB

VoltDB is a relational new SQL in-mtmory DBMS, designed for real-time transaction processing (OLTP), that uses shared-nothing architecture. Because of storing all the data in memory, VoltDB provides very high performance. It also support ACID model. DBMS is developed by a group of database researchers, including M.Stonebraker, S.Madden and D.Abadi.

VoltDB is a high-performance business DBMS, that uses newSQL, in-memory (all the data is stored in memory, which provide extra-high speed of work) approaches and shared-nothing architecture (all the nodes work independently, which allows unlimited scaling). VoltDB supports SQL queries and guarantee transactional data integrity (ACID model). It uses stored Java procedures for the work. Copies of the procedures are being attached to each section. If a query affects more than one section, the procedure is awaken in each section and the resulting data is being aggregated afterwards. DBMS has two versions: enterprise and community, that is licensed under AGPLv3.

## Working principle

### Partitioning

VoltDB uses stored procedures. Each procedure is acting like an atomic operation, which allows to keep data integrity. By analizing the stored procedures logic VoltDB divides them to partitions and distribute between servers. Each part has a "slice" of data with the procedures for working with it. Such an organization allows a parallel data processing.

A call to a stored procedure is passed to a partition, which data it needs. If all the needed data present at a single partition, the server hadle a call by itself, letting other server work in parallel. Due to the single-treadness of the work with a single partition, data integrity is guarateed. This saves VoltDB from the need of overheads on blocking, operation fixation and transaction logging, while DB work remains multi-thread.

If an operation needs data from more than one partition, then one node acts like a coordinator. It sends requests to the others and process the data, recieved from them. All other nodes continue their work independently.

### Replication

VoltDB allows to increase the performance by duplicating all the read-only data to all the servers. This accelerates the work significanlty, because all the read queries, requiring these data, can be processed within a single thread.

## Field of use

VoltDB is oriented to a special segment of busyness computing. The main function of VoltDB is a fast processing of data, which is needed by applications requiring fast processing of big data flows. Such applications like financial applications, media, Internet of Things, or other developing field. VoltDB satisfies their main requirements: scalability, high accessibility and performance.

VoltDB doesn't fit all of the standard query types. It is not suitable for collecting and processing big amouts of data from multiple tables.

Nowadays VoltDB is being used in applications, that require high performance, ones like financial trading, streaming, recording, online gaming, fraud detection, sensory systems distribution etc.

## Work with VoltDB

### Installation

VoltDB is being distributed in enterprise and community versions. Enterprise version is provided as a pre-built distribution, and the community version comes as source code.

#### Installing

VoltDB is distributed as a compressed tar archive. To use it, you need to unpack it to any directory.

$tar -zxvf voltdb-ent-6.0.tar.gz -C$HOME/


It's recommended to use user's home directory, because he would get the full access to the application then.

If it's needed to install VoltDB into some standard system location, it can be done as follows:

$sudo tar -zxvf voltdb-ent-6.0.tar.gz -C /opt$ cd /opt
$sudo mv voltdb-ent-6.0 voltdb  Note, that in this case the application folders would be read-only for non-privileged accounts. #### Building from the source code VoltDB source code is stored at GitHub repository. After downloading the files it's needed to build a distribution out of them. It's recommended to use Ant of version 1.7 or newer. $ ant dist


As a result, a tar a archive is being built. These archive is similar to the one from the previous section, so will need to unpack it into the working folder.

The recommended step for the upgrading:

2. Create a snapshot of the database (voltadmin save --blocking).
3. Shutdown the database (voltadmin shutdown).
5. Create a new database and start it in the admin mode.
6. Restore the snapshot, created at step 2 (voltadmin restore).
7. Return VoltDB to a normal mode (voltadmin resume).

### Database creation

To start working, you need to create a database and its tables. The create command is used for this:

$voltdb create  If creation was successful, VoltDB shows the message: Server completed initialization.  ### Tables processing VoltDB support SQL queries. To start sending SQL queries, you need to start sqlcmd utility: $ sqlcmd


This command starts a numbered prompt, letting you to input SQL queries. To finish work and return to shell prompt, it's needed to use exit command. Queries results are being output immidiately:

3> select town, state from towns ORDER BY town;
TOWN         STATE
------------ ------
Bay View     OH
Billerica    MA
Buffalo      NY

(3 row(s) affected)


Sqlcmd utility uses FILE directive, which lets not to input queries from the keyboard, but to read them from file.

1> FILE towns.sql;


Manual data input is not the only way for database filling. At the time of a table creation, VoltDB automatically creates a stored procedure for data input. The command csvloader allows to read a file (CSV, comma-separated value, for instance) and to insert each entry as a new row into the database, using the INSERT query. For example:

$csvloader --separator "|" --file towns.txt towns  Here, --separator parameter sets the sepatator for entries inside the file, --file parameter sets the file, from which the data should be loaded, and the next parameter defines a table, to which the data have to be inserted. ### Stored procedures Stored procedures are a simple and useful tool for working with the database. #### Simple stored procedures Simple procedures are being created in the sqlcmd utility prompt via the CREATE PROCEDURE AS command. These procedures are SQL queries which allow to specify some parameters at the time of invocation. Each procedure has a unique name, the post-defined data is being replaced with the question marks. For example: $sqlcmd
1>CREATE PROCEDURE leastpopulated AS
2>SELECT TOP 1 county, abbreviation, population
3>   FROM people, states WHERE people.state_num=?
4>   AND people.state_num=states.state_num
5>   ORDER BY population ASC;


#### Complex stored procedures

VoltDB also allows to create stored procedures on Java, which allows to create more complex queries and process the data. Read more at the official documentation[1].

### Partitioning

Partitioning in VoltDB is organized via partitioning columns. At the time of the table creation, one column may be marked as a partitioning column. After that, VoltDB decides which partition the row goes into based on the value of the partitioning column. The row, that have the same value of the partitioning column, end up in the same partition. To mark a column as a partitioning one, the PARTITION TABLE ... ON COLUMN directive is used. For example:

\$ sqlcmd
PARTITION TABLE towns ON COLUMN state_num;


Note, that non-partitioned tables become replicated by default. This means, that they are being copied to all the servers. That's useful and speeds up the work, if there are not many data in the table and it is particularly read-only. In other case it's better to make the table partitioned.

• High speed due to storing all the data in memory. If database is too big, a partitioning mechanism is being used. Transferring to disk is being done through the snapshot mechanism. Snapshot represents a data slice, up-to-date at the time of creation.
• Easy scaling due to nodes independence. Simple adding of new servers into the cluster allows to raise the performance linearly.
• Unlimited performance and scaling possibilities. Due to the nodes independance, there is no scaling limit: a new server can be added at any time.
• Automatic replication of data inside the cluster, that provides high accessibility and rejects the need of journal.
• Due to autonomous work of the sections, there is no need in such slow operations like blocking and operation fixation.

## Authors

Roman Chistiakov, Bauman Moscow State Technical University, IU-8, E-mail: romchezz@gmail.com