MemSQL

From Bauman National Library
This page was last modified on 29 November 2016, at 17:02.
MemSQL
Developer(s) MemSQL, Inc.
Initial release 18 June 2012
Written in C++
Operating system Linux
Platform x64
Available in English
Type RDBMS
License Proprietary software
Website http://www.memsql.com

MemSQL is a distributed, in-memory relational database designed to run transactions and analytics simultaneously at scale.[1] Queries use standard SQL syntax and drivers, so the existing ecosystem of drivers and applications can be reused.[2] MemSQL targets high-volume, high-velocity Big Data processing and enables real-time operational analytics by running concurrent transactional and analytical workloads in a single database on commodity hardware, deployed in a data center or in the cloud. In April 2016, MemSQL raised a $36M Series C round for its in-memory database platform.[3]

Technology

MemSQL is built with technology specifically designed for a distributed in-memory architecture. In terms of performance, these features set MemSQL apart from other in-memory offerings.[4]

Code Generation and Compiled Query Plans

Figure 1

With disk I/O bottlenecks removed from an in-memory system, queries execute so quickly that dynamic SQL interpretation can limit peak performance. MemSQL addresses this by translating SQL statements into a compiled query execution plan.

With each new query, MemSQL automatically strips the parameter values from the statement and generates an execution plan, written in C++, which is compiled to machine code. MemSQL stores compiled query plans in a repository called the plan cache. When a later query matches an existing parameterized query plan template, MemSQL bypasses code generation and executes the query immediately using the cached plan. Executing a compiled query plan is much faster than interpreting SQL, thanks to low-level optimizations and the inherent performance advantage of compiled over interpreted code.
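The parameterize-then-cache idea can be illustrated with a minimal Python sketch. It is not MemSQL's implementation: the regular expression, the `PlanCache` class and the `_compile` stand-in are hypothetical names chosen for illustration, and real code generation is replaced by a placeholder tuple.

```python
import re

# Hypothetical sketch: matches single-quoted strings and standalone numbers.
# A real SQL parser is far more careful than this regular expression.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Replace literals with '?' and return (template, extracted_params)."""
    params = []

    def repl(match):
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(repl, sql), params


class PlanCache:
    """Maps a parameterized query template to its (pretend) compiled plan."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0  # counts how often code generation ran

    def _compile(self, template):
        # Stand-in for C++ code generation and compilation to machine code.
        return ("compiled", template)

    def get_plan(self, sql):
        template, params = parameterize(sql)
        plan = self.plans.get(template)
        if plan is None:
            # Cache miss: generate and store a plan for this template.
            self.compilations += 1
            plan = self._compile(template)
            self.plans[template] = plan
        return plan, params


cache = PlanCache()
plan_a, args_a = cache.get_plan("SELECT * FROM t1 WHERE id = 42 AND name = 'bob'")
plan_b, args_b = cache.get_plan("SELECT * FROM t1 WHERE id = 7 AND name = 'alice'")
```

Both statements reduce to the same template, so the second call reuses the plan produced by the first and only the extracted parameter values differ.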

Compiled query plans provide performance advantages during mixed read and write workloads. Some companies use a caching layer on top of their RDBMS, usually a key-value store with SQL statements mapped to query results. This strategy may improve performance for queries on immutable datasets, but it runs into problems with frequently updated data. When the dataset changes, the cache must be repopulated with updated query results, a process rate-limited by the underlying database. In addition to the performance degradation, synchronizing state across multiple data stores is invariably a difficult engineering problem. MemSQL avoids this by executing each query directly on the in-memory database rather than fetching cached results, which helps it maintain query performance even with frequently changing data.

Lock-free Data Structures and Multiversion Concurrency Control

MemSQL achieves high throughput using lock-free data structures and multiversion concurrency control (MVCC), which allows the database to avoid locking on both reads and writes. Traditional databases manage concurrency with locks, which results in some processes blocking others until they complete and release the lock. In MemSQL, writes never block reads and vice versa.
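The core idea of MVCC, that a writer adds a new version instead of overwriting the one a reader may be using, can be sketched in a few lines of Python. This is a single-threaded illustration under stated assumptions, not MemSQL's engine: `VersionedStore` and its integer timestamps are hypothetical, and no lock-free machinery is shown.

```python
import itertools


class VersionedStore:
    """Toy multiversion store: every write appends a timestamped version."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # monotonically increasing commit timestamps
        self._last_commit = 0

    def write(self, key, value):
        # Writers never modify an existing version in place.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        # A reader remembers the latest commit visible when it started.
        return self._last_commit

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("balance", 100)
reader_view = store.snapshot()   # reader starts here
store.write("balance", 250)      # concurrent writer adds a new version
old_value = store.read("balance", reader_view)
new_value = store.read("balance", store.snapshot())
```

The reader that took its snapshot before the second write still sees the first version, while a reader starting afterwards sees the new one; neither has to wait for the other.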

MemSQL uses a lock-free skiplist as the primary index-backing data structure. Lock-free skiplists are efficient for searching and manipulating data and bring both concurrency and performance benefits. This contrasts with disk-based databases, which typically store their indexes in B-trees.
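For readers unfamiliar with the structure, the following is a minimal skiplist in Python. It is a plain single-threaded version and deliberately not lock-free; MemSQL's variant relies on atomic operations that are not shown here. The class name, the maximum level of 16 and the promotion probability of 0.5 are assumptions made for the sketch.

```python
import random


class _Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # one "next" pointer per level


class SkipList:
    MAX_LEVEL = 16

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        # Each additional level is kept with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key, value):
        # update[i] is the last node at level i whose key is smaller than `key`.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # key already present: overwrite
            return
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(key, value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def search(self, key):
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        if node is not None and node.key == key:
            return node.value
        return None

    def items(self):
        # Level 0 links every node in sorted key order.
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

Searches start at the sparse upper levels and drop down as they approach the key, which gives expected logarithmic lookups while level 0 keeps all keys in sorted order, the property an index needs for range scans.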

Replication

MemSQL supports a native replication protocol that ships its transactional log to slaves. MemSQL currently supports master-slave replication.

MemSQL replication copies databases between MemSQL instances and is executed at the partition level. It is designed to be simple, robust, and fast.[5]

MemSQL replication is fully online. In the middle of a continuous write workload, you can start replication to a secondary (replica) cluster without pausing the primary (source) cluster. Replication then creates a read-only database replica that can be used for disaster recovery or to serve additional reads.

Replication across clusters, which includes cross-datacenter replication, supports only asynchronous mode. In asynchronous mode, writes on the primary cluster never wait to be replicated to the secondary cluster. Furthermore, failures of the secondary cluster never block the primary.

Replication within a cluster, used for maintaining MemSQL high availability, can be either asynchronous or synchronous. In synchronous mode, a write to any master partition is replicated to its corresponding slave partition before an acknowledgement is returned to the user.
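The difference between the two modes comes down to when the acknowledgement is sent relative to log shipping. The Python sketch below models only that ordering; `MasterPartition`, `Replica` and `ship_pending` are hypothetical names, and networking, failures and durability are left out entirely.

```python
from collections import deque


class Replica:
    """Holds the log records it has received so far."""

    def __init__(self):
        self.log = []

    def apply(self, record):
        self.log.append(record)


class MasterPartition:
    def __init__(self, replica, synchronous):
        self.log = []
        self.replica = replica
        self.synchronous = synchronous
        self._pending = deque()  # records not yet shipped (asynchronous mode)

    def write(self, record):
        self.log.append(record)
        if self.synchronous:
            # Synchronous: the replica has the record before we acknowledge.
            self.replica.apply(record)
        else:
            # Asynchronous: acknowledge now, ship the record later.
            self._pending.append(record)
        return "ack"

    def ship_pending(self):
        # Stand-in for the background log shipping of asynchronous mode.
        while self._pending:
            self.replica.apply(self._pending.popleft())


sync_replica = Replica()
sync_master = MasterPartition(sync_replica, synchronous=True)
sync_master.write("INSERT 1")    # replica already has the record at ack time

async_replica = Replica()
async_master = MasterPartition(async_replica, synchronous=False)
async_master.write("INSERT 1")   # acknowledged while the replica is still behind
async_master.ship_pending()      # replica catches up afterwards
```

In the asynchronous case a slow or failed replica cannot delay the acknowledgement, at the cost of the replica briefly lagging behind the master.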

Distributed architecture

MemSQL is a distributed database organized around two kinds of nodes: aggregators and leaf nodes.[6] An aggregator breaks a query up across the relevant leaf nodes and aggregates the results before returning them to the client. A leaf node is a MemSQL database that stores a share of the data. MemSQL uses hash partitioning to distribute data uniformly across the leaf nodes.[7] MemSQL made the distributed version of its system generally available on April 23, 2013,[8] with a trial edition available for download on its website.[9]
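The division of labour between aggregators and leaves can be sketched as follows. This is an illustrative model only: the `Aggregator` and `Leaf` classes are hypothetical, and CRC32 is used as a convenient stand-in because the source does not state which hash function MemSQL applies to the shard key.

```python
import zlib


class Leaf:
    """Stores one share of the rows and answers queries over that share."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def count_where(self, predicate):
        return sum(1 for row in self.rows if predicate(row))


class Aggregator:
    """Routes writes by hash of the shard key and merges per-leaf results."""

    def __init__(self, num_leaves):
        self.leaves = [Leaf() for _ in range(num_leaves)]

    def _leaf_for(self, shard_key):
        # Assumed hash function (CRC32); the same key always maps to the same leaf.
        digest = zlib.crc32(str(shard_key).encode("utf-8"))
        return self.leaves[digest % len(self.leaves)]

    def insert(self, shard_key, row):
        self._leaf_for(shard_key).insert(row)

    def count(self, predicate=lambda row: True):
        # Scatter the query to every leaf, then gather and sum the partial counts.
        return sum(leaf.count_where(predicate) for leaf in self.leaves)


agg = Aggregator(num_leaves=4)
for i in range(1000):
    agg.insert(i, {"id": i, "even": i % 2 == 0})
total_rows = agg.count()
even_rows = agg.count(lambda row: row["even"])
```

Each leaf only scans its own rows, so the work of a query such as a filtered count is spread across the leaves and the aggregator merely combines the partial results.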

Version history

  • MemSQL 1b - first generally available release, June 2012.
  • MemSQL 1c - minor feature update, released July 2012.
  • MemSQL 1.8 - replication and expanded SQL surface area, released December 2012[10]
  • MemSQL 2.0 - general availability of distributed system[11]. First release of MemSQL Watch operational dashboard[12].
  • MemSQL 2.5 - JSON data type
  • MemSQL 3.0 - Columnar data store
  • MemSQL 3.1 - Views, Cross-Datacenter replication[13]
  • MemSQL 3.2 - Improvements to column store engine[14]
  • MemSQL 4.0 - Geospatial support, distributed joins[15]
  • MemSQL 4.1 - Integration with Spark, CTEs[16]
  • MemSQL 5.0 - New MemSQL code generation architecture, window functions, released March 2016[17]

Getting started

System requirements

  • 64-bit Linux
  • Recommended hardware: a multi-core processor and 8 GB of RAM
  • Minimum OS versions: Amazon AMI (2012.03), CentOS (6.0), Debian (6.0), Fedora (15), OpenSUSE (11.1), Red Hat (6.1), Ubuntu (10.04)

Installing

After entering a name and email address on the download page, the user receives a license.[18]

To connect to MemSQL you also need a SQL client, for example the MySQL client. On Debian or Ubuntu it can be installed with:

$ sudo apt-get install mysql-client

Download, unpack and start the distribution with the following commands:

$ wget download.memsql.com/3678fc4a65244b83a200911ef0e936f4/memsqlbin_amd64.tar.gz
$ tar -xzf memsqlbin_amd64.tar.gz
$ cd memsqlbin
$ chmod +x check_system
$ ./check_system
$ ./memsqld --port 3307

The downloaded archive is about 120 MB.

By default MemSQL listens on port 3306, like MySQL. The example above uses port 3307 to avoid a conflict with a MySQL server that may already be running on the machine.

Working with the database

To work with MemSQL, you can use the mysql command-line client:

$ mysql -u root -h 127.0.0.1 -P 3307 --prompt="memsql> "

To stop the server when you are done:

$ killall memsqld

References

  1. http://www.memsql.com/content/architecture/
  2. http://docs.memsql.com/4.1/
  3. http://techcrunch.com/2016/04/21/memsql-raises-36m-series-c-round-for-its-in-memory-database-platform/
  4. http://www.memsql.com/content/architecture/
  5. http://docs.memsql.com/4.1/admin/replication/
  6. http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html
  7. http://www.dbms2.com/2012/06/18/introduction-to-memsql/
  8. http://www.dbms2.com/2013/04/23/memsql-scales-out/
  9. http://www.memsql.com/download
  10. http://docs.memsql.com/4.1/archive/1.8releasenotes/
  11. http://www.dbms2.com/2013/04/23/memsql-scales-out/
  12. http://docs.memsql.com/4.1/archive/2.0releasenotes/
  13. http://docs.memsql.com/4.1/release_notes/3.1releasenotes/
  14. http://docs.memsql.com/4.1/release_notes/3.2releasenotes/
  15. http://docs.memsql.com/4.1/release_notes/4.0releasenotes/
  16. http://docs.memsql.com/4.1/release_notes/4.1releasenotes/
  17. http://docs.memsql.com/v5.0/docs/release-notes
  18. https://habrahabr.ru/post/146023/