Riak

From Bauman National Library
This page was last modified on 17 December 2016, at 17:40.
Riak
Developer(s) Basho Technologies [1]
Stable release 2.1.0 / February 11, 2015
Platform x64
Website http://basho.com/products/

Riak is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability.[2] In addition to the open-source version, it comes in a supported enterprise version and a cloud storage version.[2] Riak implements the principles from Amazon's Dynamo paper[3] and is heavily influenced by the CAP theorem. Written in Erlang, Riak provides fault-tolerant data replication and automatic data distribution across the cluster for performance and resilience.[4]

Riak is licensed using a freemium model: open source versions of Riak and Riak CS are available, but end users can pay for additional features and support.[4]

Riak has a pluggable backend for its core storage, with the default storage backend being Bitcask.[5] LevelDB is also supported.

Main features

Fault-tolerant availability
Riak replicates key/value data across a cluster of nodes, with a default n_val of three. If nodes become unavailable because of a network partition or hardware failure, data can still be written to a neighboring node beyond the initial three and read back later, thanks to Riak's "masterless" peer-to-peer architecture.
Queries
Riak provides a RESTful HTTP API and a Protocol Buffers API for basic PUT, GET, POST, and DELETE operations. More complex queries are also possible, including secondary indexes, search (via Apache Solr), and MapReduce. MapReduce has native support for both JavaScript (using the SpiderMonkey runtime) and Erlang.
Predictable latency
Riak distributes data across nodes by hashing and can provide a predictable latency profile, even in the case of multiple node failures.
Storage options
Keys and values can be stored in memory, on disk, or both.
Multi-datacenter replication
In multi-datacenter replication, one cluster acts as a "primary cluster." The primary cluster handles replication requests from one or more "secondary clusters" (generally located in other regions or countries). If the datacenter with the primary cluster goes down, a second cluster can take over as the primary cluster.
There are two main modes of operation: fullsync and realtime. In fullsync mode, a complete synchronization occurs between the primary and secondary cluster(s), by default every six hours. In realtime mode, replication to the secondary data center(s) is triggered by updates to the primary data center. All multi-datacenter replication occurs over multiple concurrent TCP connections to maximize performance and network utilization.
Note that multi-datacenter replication is not a part of open source Riak.
Tunable consistency
Each bucket can be configured for either eventual or strong consistency.
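
As an illustration of the basic PUT, GET, and DELETE operations listed under Queries, the sketch below prepares HTTP requests for a single object using only the Python standard library. The node address, the bucket and key names, and the helper functions are assumptions made for this example, and the requests are built but not sent, so no running cluster is needed to try it.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a Riak node with its HTTP listener on this address.
RIAK_URL = "http://127.0.0.1:8098"


def object_url(bucket: str, key: str, base: str = RIAK_URL) -> str:
    """Build the URL that addresses one object inside a bucket."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def put_request(bucket: str, key: str, value: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT that stores a JSON document."""
    return urllib.request.Request(
        object_url(bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(bucket: str, key: str) -> urllib.request.Request:
    """Prepare a GET that fetches the object."""
    return urllib.request.Request(object_url(bucket, key), method="GET")


def delete_request(bucket: str, key: str) -> urllib.request.Request:
    """Prepare a DELETE that removes the object."""
    return urllib.request.Request(object_url(bucket, key), method="DELETE")
```

Passing any of these requests to urllib.request.urlopen would send it to the node; because every node can receive requests, RIAK_URL may point at any member of the cluster.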

Architecture overview

A Riak cluster is a group of nodes that are in constant communication to ensure data availability and partition tolerance.

A Riak node is not quite the same as a server, but in a production environment the two should be equivalent. A developer may run multiple nodes on a single laptop, but this would never be advisable in a real production cluster.

Each node in a Riak cluster is equivalent, containing a complete, independent copy of the whole Riak package. There is no “master” node; no node has more responsibilities than others; and no node has special tasks not performed by other nodes. This uniformity provides the basis for Riak's fault tolerance and scalability.

Each node is responsible for multiple data partitions. When you add (or remove) machines, data is rebalanced automatically with no downtime. New machines claim data until ownership is equally spread around the cluster, with the resulting cluster status updates shared to every node via a gossip protocol and used to route requests. This is what makes it possible for any node in the cluster to receive requests. The end result is that developers don't need to deal with the underlying complexity of where data lives.
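
The idea of nodes claiming partitions can be sketched as follows. This is a deliberately simplified model, not Riak's actual claim algorithm: the ring size of 64, the "bucket/key" string that is hashed, and the round-robin assignment are all assumptions chosen to keep the example short.

```python
import hashlib
from collections import Counter

RING_SIZE = 64         # number of partitions (assumed, illustrative value)
HASH_SPACE = 2 ** 160  # SHA-1 yields 160-bit values


def partition_for(bucket: str, key: str, ring_size: int = RING_SIZE) -> int:
    """Map a bucket/key pair to the index of the partition covering its hash."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * ring_size // HASH_SPACE


def claim(nodes: list, ring_size: int = RING_SIZE) -> dict:
    """Assign partitions to nodes round-robin so ownership is spread evenly."""
    return {p: nodes[p % len(nodes)] for p in range(ring_size)}


def owner(bucket: str, key: str, ownership: dict) -> str:
    """Look up which node currently owns the partition for a key."""
    return ownership[partition_for(bucket, key, len(ownership))]


def spread(ownership: dict) -> Counter:
    """Count how many partitions each node owns."""
    return Counter(ownership.values())
```

With four nodes each one owns 16 of the 64 partitions; recomputing the claim with a fifth node gives every node 12 or 13. A real implementation would also try to move as few partitions as possible when membership changes, which this round-robin sketch does not attempt.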

Concepts

Buckets
Buckets are used to define a virtual keyspace for storing Riak objects. They enable you to define non-default configurations over that keyspace concerning replication properties and other parameters.
In certain respects, buckets can be compared to tables in relational databases or to folders in filesystems. From the standpoint of performance, buckets with default configurations are essentially “free,” while non-default configurations, defined using bucket types, will be gossiped around the ring using Riak's cluster metadata subsystem.
Clusters
Riak's default mode of operation is to work as a cluster consisting of multiple nodes, i.e. multiple well-connected data hosts.
Each host in the cluster runs a single instance of Riak, referred to as a Riak node. Each Riak node manages a set of virtual nodes, or vnodes, that are responsible for storing a separate portion of the keys stored in the cluster.
In contrast to some high-availability systems, Riak nodes are not clones of one another, and they do not all participate in fulfilling every request. Instead, you can configure, at runtime or at request time, the number of nodes on which data is to be replicated, as well as when replication occurs and which merge strategy and failure model are to be followed.
Replication
Data replication is a core feature of Riak's basic architecture. Riak was designed to operate as a clustered system containing multiple Riak nodes, which allows data to live on multiple machines at once in case a node in the cluster goes down.
Replication is fundamental and automatic in Riak, providing assurance that your data will still be there if a node in your Riak cluster goes down. All data stored in Riak is replicated to a number of nodes in the cluster according to the N value (n_val) property set in a bucket's bucket type.
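
How n_val determines where replicas live, and how a write can still succeed when a node is down, can be sketched as below. The ring is modeled as a plain list of partition owners and the function name is hypothetical; the rule of skipping unreachable or already-chosen nodes is a simplification for illustration, not a description of Riak's internals.

```python
def preference_list(start: int, ring: list, n_val: int = 3, down=frozenset()) -> list:
    """Pick n_val distinct reachable nodes, walking the ring from `start`.

    `ring[i]` is the node that owns partition i. Unreachable nodes are
    skipped, so the next node around the ring takes over their replica.
    """
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)]
        if node in down or node in chosen:
            continue
        chosen.append(node)
        if len(chosen) == n_val:
            break
    return chosen


# Eight partitions owned round-robin by four nodes (illustrative layout).
ring = ["A", "B", "C", "D", "A", "B", "C", "D"]

healthy = preference_list(2, ring)               # ['C', 'D', 'A']
degraded = preference_list(2, ring, down={"D"})  # ['C', 'A', 'B']
```

In the degraded case node B holds the third replica only because D is unreachable, which corresponds to data being written to a neighboring node beyond the initial three.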

When to use Riak

If your data does not fit on a single server and demands a distributed database architecture, you should take a close look at Riak as a potential solution to your data availability issues. Getting distributed databases right is very difficult, and Riak was built to address the problem of data availability with as few trade-offs and downsides as possible.

Riak's focus on availability makes it a good fit whenever downtime is unacceptable. No one can promise 100% uptime, but Riak is designed to survive network partitions and hardware failures that would significantly disrupt most databases. An exception to Riak's high availability approach is the optional strong consistency feature, which can be applied on a selective basis.

A less-heralded feature of Riak is its predictable latency. Because its fundamental operations—read, write, and delete—do not involve complex data joins or locks, it services those requests promptly. Thanks to this capability, Riak is often selected as a data storage backend for data management software from a variety of paradigms, such as Datomic.

From the standpoint of the actual content of your data, Riak might also be a good choice if your data can be modeled as one of Riak's currently available Data Types: flags, registers, counters, sets, or maps. These Data Types enable you to take advantage of Riak's high availability approach while simplifying application development.
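
To show why such Data Types fit an availability-first store, here is a minimal sketch of a counter that replicas can update independently and merge later without conflicts. It assumes the counter behaves like a state-based PN-Counter; the class and its method names are written for this example and are not Riak's client API or internal implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """Counter that tracks per-actor increments and decrements separately."""

    increments: dict = field(default_factory=dict)
    decrements: dict = field(default_factory=dict)

    def increment(self, actor: str, amount: int = 1) -> None:
        self.increments[actor] = self.increments.get(actor, 0) + amount

    def decrement(self, actor: str, amount: int = 1) -> None:
        self.decrements[actor] = self.decrements.get(actor, 0) + amount

    @property
    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        """Combine two replicas by taking the per-actor maximum of each tally."""

        def pointwise_max(a: dict, b: dict) -> dict:
            return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

        return PNCounter(
            pointwise_max(self.increments, other.increments),
            pointwise_max(self.decrements, other.decrements),
        )


# Two replicas accept updates independently, for example during a partition.
replica_a = PNCounter()
replica_a.increment("node_a", 3)
replica_b = PNCounter()
replica_b.increment("node_b", 2)
replica_b.decrement("node_b", 1)

merged = replica_a.merge(replica_b)  # merged.value == 4
```

Because the merge is commutative and idempotent, replicas can exchange state in any order, and any number of times, and still agree on the same value.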

Licensing and support

Riak is available for free under the Apache 2 License. In addition, Basho Technologies offers two options for its commercial software, Riak Enterprise and Riak Enterprise Plus. Riak Enterprise Plus adds baseline and annual system health checks to ensure long-term platform stability and performance.

Language support

Riak has official drivers for Ruby, Java, Erlang and Python. There are also numerous community-supported drivers for other programming languages.

History

Riak was originally written by Andy Gross and others at Basho Technologies to power a web-based sales force automation application built by former engineers and executives from Akamai Technologies. There was more interest in the datastore technology than in the applications built on it, so the company decided to build a business around Riak itself. Riak went on to gain adoption throughout the Fortune 100 and became a foundation for many of the world’s fastest-growing Web-based, mobile, and social networking applications, as well as for cloud service providers. Notable releases since then include:

  • 1.1, released February 21, 2012, added Riaknostic, enhanced error logging and reporting, improved resiliency for large clusters, and a new graphical operations and monitoring interface called Riak Control.
  • 1.4, released July 10, 2013, added counters, secondary indexing improvements, reduced object storage overhead, handoff progress reporting, and enhancements to MDC replication.
  • 2.0, released September 2, 2014, added new data types, including sets, maps, registers, and flags, that simplify application development. It also added strong consistency by bucket, full-text search integration with Apache Solr, security features, and reduced replicas for secondary sites.
  • 2.1, released April 16, 2015, added "write once" buckets, an optimization for many write-heavy workloads: buckets whose entries are intended to be written exactly once and never updated or overwritten.

Users

Notable users include AT&T, Comcast, GitHub, Best Buy, National Health Service, and The Weather Channel.

Installation

curl -s https://packagecloud.io/install/repositories/basho/riak/script.deb.sh | sudo bash
sudo apt-get install riak

On Debian, curl and sudo (for user-mode installation) are required.

External links

  • http://basho.com