Tarantool

From Bauman National Library
This page was last modified on 10 May 2016, at 15:20.
Tarantool
Initial release 2008
Repository {{#property:P1324}}
Written in C
Operating system Linux, FreeBSD, Mac OS X
Platform x64
Available in English
Type NoSQL database
License Simplified BSD
Website http://tarantool.org

Tarantool is a Lua application server integrated with a database management system[1]. It has a "fiber" model which means that many Tarantool applications can run simultaneously on a single thread, while the Tarantool server itself can run multiple threads for input-output and background maintenance. It incorporates the LuaJIT -- "Just In Time" -- Lua compiler, Lua libraries for most common applications, and the Tarantool Database Server which is an established NoSQL DBMS. Thus Tarantool serves all the purposes that have made node.js and Twisted popular, plus it supports data persistence.

Overview

An Application Server Together With A Database Manager

Tarantool's creator and biggest user is Mail.Ru, the largest internet company in Russia, with 30 million users, 25 million emails per day, and a web site whose Alexa global rank is in the top 40 worldwide. Tarantool services Mail.Ru's hottest data, such as the session data of online users, the properties of online applications, the caches of the underlying data, the distribution and sharding algorithms, and much more. Outside Mail.Ru the software is used by a growing number of projects in online gaming, digital marketing, and social media industries. Although Mail.Ru is the sponsor for product development, the roadmap and the bugs database and the development process are fully open. The software incorporates patches from dozens of community contributors. The Tarantool community writes and maintains most of the drivers for programming languages. The greater Lua community has hundreds of useful packages most of which can become Tarantool extensions.

Users can create, modify and drop Lua functions at runtime. Or they can define Lua programs that are loaded during startup for triggers, background tasks, and interacting with networked peers. Unlike popular application development frameworks based on a "reactor" pattern, networking in server-side Lua is sequential, yet very efficient, as it is built on top of the cooperative multitasking environment that Tarantool itself uses.

One of the built-in Lua packages is an API for the Database Management System. Thus some developers see Tarantool as a DBMS with a popular stored procedure language, while others see it as a Lua interpreter, while still others see it as a replacement for many components of multi-tier Web applications. Performance can be a few hundred thousand transactions per second on a laptop, scalable upwards or outwards to server farms.

Database Features

Tarantool can run without it, but "The Box" -- the DBMS server -- is a strong distinguishing feature.

The database API allows for permanently storing Lua objects, managing object collections, creating or dropping secondary keys, making changes atomically, configuring and monitoring replication, performing controlled fail-over, and executing Lua code triggered by database events. Remote database instances are accessible transparently via a remote-procedure-invocation API.

Tarantool's DBMS server uses the storage engine concept, where different sets of algorithms and data structures can be used for different situations. Two storage engines are built-in: an in-memory engine which has all the data and indexes in RAM, and a two-level B-tree engine for data sets whose size is 10 to 1000 times the amount of available RAM. All storage engines in Tarantool support transactions and replication by using a common write ahead log (WAL). This ensures consistency and crash safety of the persistent state. Changes are not considered complete until the WAL is written. The logging subsystem supports group commit.

Tarantool's in-memory storage engine (memtx) keeps all the data in random-access memory, and therefore has very low read latency. It also keeps persistent copies of the data in non-volatile storage, such as disk, when users request "snapshots". If a server stops and the random-access memory is lost, then restarts, it reads the latest snapshot and then replays the transactions that are in the log -- therefore no data is lost.

Tarantool's in-memory engine is lock-free in typical situations. Instead of the operating system's concurrency primitives, such as mutexes, Tarantool uses cooperative multitasking to handle thousands of connections simultaneously. There is a fixed number of independent execution threads. The threads do not share state. Instead they exchange data using low-overhead message queues. While this approach limits the number of cores that the server will use, it removes competition for the memory bus and ensures peak scalability of memory access and network throughput. CPU utilization of a typical highly-loaded Tarantool server is under 10%. Searches are possible via secondary index keys as well as primary keys.

Tarantool's disk-based storage engine is a fusion of ideas from modern filesystems, log-structured merge trees and classical B-trees. All data is organized into branches. Each branch is represented by a file on disk. Branch size is a configuration option and normally is around 64MB. Each branch is a collection of pages, serving different purposes. Pages in a fully merged branch contain non-overlapping ranges of keys. A branch can be partially merged if there were a lot of changes in its key range recently. In that case some pages represent new keys and values in the branch. The disk-based storage engine is append only: new data never overwrites old data. Tarantool supports multi-part index keys. The possible index types are HASH, TREE, BITSET, and RTREE.

Tarantool supports asynchronous replication, locally or to remote hosts. The replication architecture can be master-master, that is, many nodes may both handle the loads and receive what others have handled, for the same data sets.


Getting started

Downloading and installing a binary package

Binary packages for the stable 1.6.x release are provided at http://tarantool.org/download.html. An automatic build system creates, tests and publishes packages for every push into the 1.6 branch.

To download and install the package that's appropriate for your OS, start a shell (terminal) and enter the command-line instructions provided for your OS at http://tarantool.org/download.html.

Starting Tarantool and making your first database

Here is how to create a simple test database after installing.

Create a new directory. It's just for tests, you can delete it when the tests are over.

$ mkdir ~/tarantool_sandbox
$ cd ~/tarantool_sandbox

Here is how to create a simple test database after installing.

Start the server. The server name is tarantool.

$ # if you downloaded a binary with apt-get or yum, say this:
$ /usr/bin/tarantool
$ # if you downloaded and untarred a binary
$ # tarball to ~/tarantool, say this:
$ ~/tarantool/bin/tarantool
$ # if you built from a source download, say this:
$ ~/tarantool/src/tarantool

The server starts in interactive mode and outputs a command prompt. To turn on the database, configure it. This minimal example is sufficient:

tarantool> box.cfg{listen = 3301}

If all goes well, you will see the server displaying progress as it initializes, something like this:

tarantool> box.cfg{listen = 3301}
2015-08-07 09:41:41.077 ... version 1.6.7-439-g7e1011b
2015-08-07 09:41:41.077 ... log level 5
2015-08-07 09:41:41.078 ... mapping 1073741824 bytes for a shared arena...
2015-08-07 09:41:41.079 ... initialized
2015-08-07 09:41:41.081 ... initializing an empty data directory
2015-08-07 09:41:41.095 ... creating './00000000000000000000.snap.inprogress'
2015-08-07 09:41:41.095 ... saving snapshot './00000000000000000000.snap.inprogress'
2015-08-07 09:41:41.127 ... done
2015-08-07 09:41:41.128 ... primary: bound to 0.0.0.0:3301
2015-08-07 09:41:41.128 ... ready to accept requests

Now that the server is up, you could start up a different shell and connect to its primary port with:

$ telnet 0 3301

but for example purposes it is simpler to just leave the server running in "interactive mode". On production machines the interactive mode is just for administrators, but because it's convenient for learning it will be used for most examples in this manual. Tarantool is waiting for the user to type instructions.

To create the first space and the first index, try this:

tarantool> s = box.schema.space.create('tester')
tarantool> s:create_index('primary', {
         >   type = 'hash',
         >   parts = {1, 'NUM'}
         > })

To insert three “tuples” (our name for “records”) into the first “space” of the database try this:

tarantool> t = s:insert({1})
tarantool> t = s:insert({2, 'Music'})
tarantool> t = s:insert({3, 'Length', 93})

To select a tuple from the first space of the database, using the first defined key, try this:

tarantool> s:select{3}

Your terminal screen should now look like this:

tarantool> s = box.schema.space.create('tester')
2015-06-10 12:04:18.158 ... creating './00000000000000000000.xlog.inprogress'
---
...
tarantool>s:create_index('primary', {type = 'hash', parts = {1, 'NUM'}})
---
...
tarantool> t = s:insert{1}
---
...
tarantool> t = s:insert{2, 'Music'}
---
...
tarantool> t = s:insert{3, 'Length', 93}
---
...
tarantool> s:select{3}
---
- - [3, 'Length', 93]
...
tarantool>

Now, to prepare for the example in the next section, try this:

tarantool> box.schema.user.grant('guest', 'read,write,execute', 'universe')

Starting another Tarantool instance and connecting remotely

In the previous section the first request was with box.cfg{listen = 3301}. The listen value can be any form of URI (uniform resource identifier); in this case it's just a local port: port 3301. It's possible to send requests to the listen URI via (a) telnet, (b) a connector (which will be the subject of Chapter 8), or (c) another instance of Tarantool. Let's try (c).

Switch to another terminal. On Linux, for example, this means starting another instance of a Bash shell. There is no need to use cd to switch to the ~/tarantool_sandbox directory.

Start the second instance of Tarantool. The server name is tarantool.

$ # if you downloaded a binary with apt-get or yum, say this:
$ /usr/bin/tarantool
$ # if you downloaded and untarred a
$ # binary tarball to ~/tarantool, say this:
$ ~/tarantool/bin/tarantool
$ # if you built from a source download, say this:
$ ~/tarantool/src/tarantool

Try these requests:

tarantool> console = require('console')
tarantool> console.connect('localhost:3301')
tarantool> box.space.tester:select{2}

The requests are saying "use the console package to connect to the Tarantool server that's listening on localhost:3301, send a request to that server, and display the result." The result in this case is one of the tuples that was inserted earlier. Your terminal screen should now look like this:

tarantool> console = require('console')
---
...
tarantool> console.connect('localhost:3301')
tarantool: connected to localhost:3301
---
- true
...
localhost:3301> box.space.tester:select{2}
---
- - [2, 'Music']
...
localhost:3301>

You can repeat box.space...:insert{} and box.space...:select{} indefinitely, on either Tarantool instance. When the testing is over: To drop the space: s:drop(). To stop tarantool: Ctrl+C. To stop tarantool (an alternative): os.exit(). To stop tarantool (from another terminal): sudo pkill -f tarantool. To destroy the test: rm -r ~/tarantool_sandbox.

To review ... If you followed all the instructions in this chapter, then so far you have: installed Tarantool from either a binary or a source repository, started up the Tarantool server, inserted and selected tuples.

Database

Document data model

If you tried out the Starting Tarantool and making your first database exercise from the last chapter, then your database looks like this:

+--------------------------------------------+
|                                            |
| SPACE 'tester'                             |
| +----------------------------------------+ |
| |                                        | |
| | TUPLE SET 'tester'                     | |
| | +-----------------------------------+  | |
| | | Tuple: [ 1 ]                      |  | |
| | | Tuple: [ 2, 'Music' ]             |  | |
| | | Tuple: [ 3, 'length', 93 ]        |  | |
| | +-----------------------------------+  | |
| |                                        | |
| | INDEX 'primary'                        | |
| | +-----------------------------------+  | |
| | | Key: 1                            |  | |
| | | Key: 2                            |  | |
| | | Key: 3                            |  | |
| | +-----------------------------------+  | |
| |                                        | |
| +----------------------------------------+ |
+--------------------------------------------+

Space

A space -- 'tester' in the example -- is a container.

When Tarantool is being used to store data, there is always at least one space. There can be many spaces. Each space has a unique name specified by the user. Each space has a unique numeric identifier which can be specified by the user but usually is assigned automatically by Tarantool. Spaces always contain one tuple set and one or more indexes.

Tuple Set

A tuple set -- 'tester' in the example -- is a group of tuples.

There is always one tuple set in a space. The identifier of a tuple set is the same as the space name -- 'tester' in the example.

A tuple fills the same role as a “row” or a “record”, and the components of a tuple (which we call “fields”) fill the same role as a “row column” or “record field”, except that: the fields of a tuple can be composite structures, such as arrays or maps and don't need to have names. That's why there was no need to pre-define the tuple set when creating the space, and that's why each tuple can have a different number of elements. Tuples are stored as MsgPack arrays.

Any given tuple may have any number of fields and the fields may have a variety of types. The identifier of a field is the field's number, base 1. For example “1” can be used in some contexts to refer to the first field of a tuple. When Tarantool returns a tuple value, it surrounds strings with single quotes, separates fields with commas, and encloses the tuple inside square brackets. For example: [ 3, 'length', 93 ].

Index

An index -- 'primary' in the example -- is a group of key values and pointers.

In order for a tuple set to be useful, there must always be at least one index in a space. There can be many. As with spaces, the user can and should specify the index name, and let Tarantool come up with a unique numeric identifier (the "index id"). In our example there is one index and its name is “primary”.

An index may be multi-part, that is, the user can declare that an index key value is taken from two or more fields in the tuple, in any order. An index may be unique, that is, the user can declare that it would be illegal to have the same key value twice. An index may have one of four types: HASH which is fastest and uses the least memory but must be unique, TREE which allows partial-key searching and ordered results, BITSET which can be good for searches that contain '=' and multiple ANDed conditions, and RTREE for spatial coordinates. The first index is called the “primary key” index and it must be unique; all other indexes are called “secondary” indexes.

An index definition may include identifiers of tuple fields and their expected types. The allowed types for indexed fields are NUM (unsigned integer between 0 and 18,446,744,073,709,551,615), or STR (string, any sequence of octets), or ARRAY (a series of numbers for use with RTREE indexes. Take our example, which has the request:

tarantool> i = s:create_index('primary', {type = 'hash', parts = {1, 'NUM'}})

The effect is that, for all tuples in tester, field number 1 must exist and must contain an unsigned integer.

Space definitions and index definitions are stored permanently in system spaces. It is possible to add, drop, or alter the definitions at runtime, with some restrictions. The syntax details for defining spaces and indexes are in section The box library.

Data types

Tarantool can work with numbers, strings, booleans, tables, and userdata.

General type Specific type What Lua type() would return Example
scalar number "number" 12345
scalar string "string" 'A B C'
scalar boolean "boolean" true
scalar nil "nil" nil
compound Lua table "table" table: 0x410f8b10
compound tuple "Userdata" 12345: {'A B C'}

In Lua a number is double-precision floating-point, but Tarantool allows both integer and floating-point values. Tarantool will try to store a number as floating-point if the value contains a decimal point or is very large (greater than 100 quadrillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers will be treated as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix, and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

For database storage Tarantool uses MsgPack rules. Storage is variable-length, so the smallest number requires only one byte but the largest number requires nine bytes. When a field has a 'NUM' index, all values must be unsigned integers between 0 and 18,446,744,073,709,551,615.

A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes.

A boolean is either true or false.

A nil type has only one possible value, also called nil, but often displayed as null. Nils may be compared to values of any types with == (is-equal) or ~= (is-not-equal), but other operations will not work. Nils may not be used in Lua tables; the workaround is to use yaml.NULL or json.NULL or msgpack.NULL.

A tuple is returned in YAML format like - [120, 'a', 'b', 'c']. A few functions may return tables with multiple tuples. A scalar may be converted to a tuple with only one field. A Lua table may contain all of a tuple's fields, but not nil.

For more tuple examples see box.tuple.

Operations

The basic operations are: the five data-change operations (insert, update, upsert, delete, replace), and the data-retrieval operation (select). There are also minor operations like “ping” which can only be used with the binary protocol. Also, there are index iterator operations, which can only be used with Lua code. (Index iterators are for traversing indexes one key at a time, taking advantage of features that are specific to an index type, for example evaluating Boolean expressions when traversing BITSET indexes, or going in descending order when traversing TREE indexes.)

Six examples of basic operations:

-- Add a new tuple to tuple set tester.
-- The first field, field[1], will be 999 (type is NUM).
-- The second field, field[2], will be 'Taranto' (type is STR).
tarantool> box.space.tester:insert{999, 'Taranto'}

-- Update the tuple, changing field field[2].
-- The clause "{999}", which has the value to look up in
-- the index of the tuple's primary-key field, is mandatory
-- because update() requests must always have a clause that
-- specifies the primary key, which in this case is field[1].
-- The clause "{{'=', 2, 'Tarantino'}}" specifies that assignment
-- will happen to field[2] with the new value.
tarantool> box.space.tester:update({999}, {{'=', 2, 'Tarantino'}})

-- Upsert the tuple, changing field field[2] again.
-- The syntax of upsert is the same as the syntax of update,
-- but the return value will be different.
tarantool> box.space.tester:upsert({999}, {{'=', 2, 'Tarantism'}})

-- Replace the tuple, adding a new field.
-- This is also possible with the update() request but
-- the update() request is usually more complicated.
tarantool> box.space.tester:replace{999, 'Tarantella', 'Tarantula'}

-- Retrieve the tuple.
-- The clause "{999}" is still mandatory, although it does not have to
-- mention the primary key.
tarantool> box.space.tester:select{999}

-- Delete the tuple.
-- Once again the clause to identify the primary-key field is mandatory.
tarantool> box.space.tester:delete{999}

How does Tarantool do a basic operation? Let's take this example:

tarantool> box.space.tester:update({3}, {{'=', 2, 'size'}, {'=', 3, 0}})

which, for those who know SQL, is equivalent to a statement like

UPDATE tester SET "field[2]" = 'size', "field[3]" = 0 WHERE "field[[1]" = 3

STEP #1: if this is happening on a remote client, then the client parses the statement and changes it to a binary-protocol instruction which has already been checked, and which the server can understand without needing to parse everything again. The client ships a packet to the server.

STEP #2: the server's “transaction processor” thread uses the primary-key index on field[1] to find the location of the tuple in memory. It determines that the tuple can be updated (not much can go wrong when you're merely changing an unindexed field value to something shorter).

STEP #3: the transaction processor thread sends a message to the write-ahead logging (WAL) thread.

At this point a yield takes place. To know the significance of that -- and it's quite significant -- you have to know a few facts and a few new words.

FACT #1: there is only one transaction processor thread. Some people are used to the idea that there can be multiple threads operating on the database, with (say) thread #1 reading row #x while thread#2 writes row#y. With Tarantool no such thing ever happens. Only the transaction processor thread can access the database, and there is only one transaction processor thread for each instance of the server.

FACT #2: the transaction processor thread can handle many fibers. A fiber is a set of computer instructions that may contain "yield" signals. The transaction processor thread will execute all computer instructions until a yield, then switch to execute the instructions of a different fiber. Thus (say) the thread reads row#x for the sake of fiber#1, then writes row#y for the sake of fiber#2.

FACT #3: yields must happen, otherwise the transaction processor thread would stick permanently on the same fiber. There are implicit yields: every data-change operation or network-access causes an implicit yield, and every statement that goes through the tarantool client causes an implicit yield. And there are explicit yields: in a Lua function one can and should add “yield” statements to prevent hogging. This is called cooperative multitasking.

Since all data-change operations end with an implicit yield and an implicit commit, and since no data-change operation can change more than one tuple, there is no need for any locking. Consider, for example, a Lua function that does three Tarantool operations:

s:select{999}             -- this does not yield and does not commit
s:update({...},{{...}})   -- this yields and commits
s:select{999}             -- this does not yield and does not commit

Data manipulation

The basic data-manipulation requests are: insert, replace, update, upsert, delete, select. All of them are part of the box library. Most of them may return data. Usually both inputs and outputs are Lua tables.

The Lua syntax for data-manipulation functions can vary. Here are examples of the variations with select requests; the same rules exist for the other data-manipulation functions. Every one of the examples does the same thing: select a tuple set from a space named tester where the primary-key field value equals 1. For the examples there is an assumption that the numeric id of 'tester' is 512, which happens to be the case in our sandbox example only.

First, there are five object reference variations:

-- #1 package . sub-package . name
tarantool> box.space.tester:select{1}
-- #2 replace name with a literal in square brackets
tarantool> box.space['tester']:select{1}
-- #3 replace name with a numeric id in square brackets
tarantool> box.space[512]:select{1}
-- #4 use a variable instead of a literal for the name
tarantool> variable = 'tester'
tarantool> box.space[variable]:select{1}
-- #5 use a variable for the entire object reference
tarantool> s = box.space.tester
tarantool> s:select{1}

Later examples in this manual will usually have the "box.space.tester:" form (#1); however, this is a matter of user preference and all the variations exist in the wild.

Later descriptions in this manual will use the syntax "space_object:" for references to objects which are spaces as in the above examples, and "index_object:" for references to objects which are indexes (for example box.space.tester.index.primary:).

Then, there are six parameter variations:

-- #1
tarantool> box.space.tester:select{1}
-- #2
tarantool> box.space.tester:select({1})
-- #3
tarantool> box.space.tester:select(1)
-- #4
tarantool> box.space.tester:select({1},{iterator='EQ'})
-- #5
tarantool> variable = 1
tarantool> box.space.tester:select{variable}
-- #6
tarantool> variable = {1}
tarantool> box.space.tester:select(variable)

The primary-key value is enclosed in braces, and if it was a multi-part primary key then the value would be multi-part, for example ...select{1,2,3}. The braces can be enclosed inside parentheses — ...select({...}) — which is optional unless it is necessary to pass something besides the primary-key value, as in the fourth example. Literal values such as 1 (a scalar value) or {1} (a Lua table value) may be replaced by variable names, as in examples #5 and #6. Although there are special cases where braces can be omitted, they are preferable because they signal "Lua table". Examples and descriptions in this manual have the "{1}" form; however, this too is a matter of user preference and all the variations exist in the wild.

All the data-manipulation functions operate on tuple sets but, since primary keys are unique, the number of tuples in the tuple set is always 0 or 1. The only exception is box.space...select, which may accept either a primary-key value or a secondary-key value.

Index operations

Index operations are automatic: if a data-manipulation request changes a tuple, then it also changes the index keys defined for the tuple. Therefore the user only needs to know how and why to define.

The simple index-creation operation which has been illustrated before is: box.space.space-name:create_index('index-name')

By default, this creates a unique "tree" index on the first field of all tuples (often called "Field#1"), which is assumed to be numeric.

These variations exist:

  • A indexed field may be a string rather than a number: box.space.space-name:create_index('index-name',{parts = {1, 'STR'}}) For an ordinary index, the two possible data types are 'NUM' = numeric = any positive integer, or 'STR' ='string' = any series of bytes. Numbers are ordered according to their point on the number line -- so 2345 is greater than 500 -- while strings are ordered according to the encoding of the first byte then the encoding of the second byte then the encoding of the third byte and so on -- so '2345' is less than '500'.
  • There may be more than one field: box.space.space-name:create_index('index-name', {parts = {3, 'NUM', 2, 'STR'}}). For an ordinary index, the maximum number of parts is 255. The specification of each part consists of a field number and a type.
  • The index does not have to be unique: box.space.space-name:create_index('index-name', {unique = false}) . The first index of a tuple set must be unique, but other indexes ("secondary" indexes) may be non-unique.
  • The index does not have to be a tree: box.space.space-name:create_index('index-name', {type = 'hash'}) . The two ordinary index types are 'tree' which is the default, and 'hash' which must be unique and which may be faster or smaller. The third type is 'bitset' which is not unique and which works best for combinations of binary values. The fourth type is 'rtree' which is not unique and which, instead of 'STR' or 'NUM' values, works with arrays.

The existence of indexes does not affect the syntax of data-change requests, but does cause select requests to have more variety.

The simple select request which has been illustrated before is: box.space.space-name:select(value) . By default, this looks for a single tuple via the first index. Since the first index is always unique, the maximum number of returned tuples will be: one.

Packages of the box library

The contents of the box library can be inspected at runtime with box, with no arguments. The packages inside the box library are: box.schema, box.tuple, box.space, box.index, net.box, box.cfg, box.info, box.slab, box.stat. Every package contains one or more Lua functions. A few packages contain members as well as functions. The functions allow data definition (create alter drop), data manipulation (insert delete update upsert select replace), and introspection (inspecting contents of spaces, accessing server configuration).

Index size The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys then the lookup time will be greater, although of course the effect is not linear. For a HASH index, if there are more keys then there is more RAM use, but the number of low-level steps tends to remain constant.
Index type Typically a HASH index is faster than a TREE index if the number of tuples in the tuple set is greater than one.
Number of indexes accessed Ordinarily only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the tuple set has N different indexes.
Number of tuples accessed A few requests, for example select, can retrieve multiple tuples. This factor is usually less important than the others.
WAL settings The important setting for the write-ahead log is wal_mode. If the setting causes no writing or delayed writing, this factor is unimportant. If the setting causes every data-change request to wait for writing to finish on a slow device, this factor is more important than all the others.

References

  1. http://tarantool.org/doc/