NoSQL

From Bauman National Library
This page was last modified on 28 June 2016, at 09:40.

NoSQL - a term covering a number of approaches to implementing databases that differ significantly from the models used in traditional relational databases, where data is accessed through SQL. It applies to databases that try to solve scalability and availability problems at the expense of atomicity and consistency of data. The term covers a large number of products with completely different designs, so people discussing "NoSQL" may in fact be talking about quite different systems.

History

The term has been in use since the late 1990s, but it acquired its present meaning only in mid-2009. It was originally the name of an open-source database created by Carlo Strozzi, which stored all data as ASCII files and used shell scripts instead of SQL to access it. The term "NoSQL" arose spontaneously; there is no generally accepted definition and no scientific institution behind it. The name probably reflects the direction in which IT has been moving away from relational databases. It is usually expanded as "Not Only SQL", although some prefer the literal reading "No SQL".

Causes
  1. The first trend - growth in the amount of stored data. By some estimates, more information was stored in databases in 2009 and 2010 alone than in all previous human history.
  2. The second trend - interconnected data. Information can no longer be kept in isolation. Every piece of knowledge is related to data held in other stores. Pages on the Internet link to other pages. Tags relate marked-up information from different sources. Ontologies establish relationships between different terms, and so on.
  3. The third trend - the use of semi-structured information. A simple example is the description of goods in a store. Where 5-6 fields were once enough to describe a man's shirt (size, color, material, product photo, ...), the number of parameters can now reach several dozen. Moreover, different shirts use different sets of parameters. In such circumstances it becomes very difficult to define in advance the structure of the table that stores the item description.
  4. The fourth trend - architecture. In the 1980s the typical architecture was one large computer (mainframe) and one database. In the 1990s client-server architecture became widespread. In the new century, web applications are widely used, each with its own backend, along with other distributed solutions.

Under these conditions, relational database performance can drop sharply. For most websites performance is still good enough, but applications such as large social networks and search services have found SQL databases unsuitable.

Basic features

A traditional relational database management system is based on the ACID principles:

  1. Atomicity
  2. Consistency
  3. Isolation
  4. Durability
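
As an illustration of the first two properties, the following minimal sketch uses Python's standard sqlite3 module. The accounts table, the transfer function, and the "no negative balance" rule are hypothetical and chosen only for this example. A transfer either applies both updates or neither.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Hypothetical consistency rule: no account may go negative.
            (lowest,) = conn.execute(
                "SELECT MIN(balance) FROM accounts"
            ).fetchone()
            if lowest < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back
```

After the failed transfer the balances are the same as before it, because the partial debit was undone together with the credit.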

NoSQL systems are based on the BASE principles (a term proposed by Eric Brewer):

  1. Basically Available - each request is guaranteed to receive a response (successful or not).
  2. Soft State - the state of the system can change over time, even without new input, as the copies of the data are reconciled.
  3. Eventual Consistency - the data may be inconsistent for a while, but the copies come into agreement after some time.
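
Soft state and eventual consistency can be sketched with two in-memory replicas. This is a toy model, not the protocol of any particular product: the Replica class, the caller-supplied version numbers, the last-write-wins rule, and the anti_entropy function are all assumptions made for illustration.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)

r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["shirt"], version=1)  # the write reaches r1 only
stale = r2.read("cart")                 # r2 has not seen it yet
anti_entropy(r1, r2)                    # background sync, no new input
fresh = r2.read("cart")                 # r2 now agrees with r1
```

Between the write and the synchronization a read from r2 returns old data. The state of r2 then changes without any new client input, which is the "soft state" described above.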

Means of accessing the data
  1. RESTful interfaces. This approach assumes that every object we can manipulate has a unique address. By referring to this address we can get, create, edit or delete the specified object. The server keeps no session state, i.e., each request is processed independently of other requests.
  2. Non-SQL query languages: GQL (an SQL-like language for Google's Datastore, which is built on Bigtable), SPARQL (the Semantic Web query language), Gremlin (a graph traversal language), Sones Graph Query Language (the query language of Sones GraphDB).
  3. API requests: the Google Datastore API (built on Bigtable), the Neo4j Traversal API.
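
The RESTful style from the first item can be sketched without any networking. In the sketch below the ResourceStore class, the /items address scheme, and the handle method are hypothetical. Real systems expose the same verbs over HTTP.

```python
import itertools

class ResourceStore:
    """In-memory stand-in for a REST server (hypothetical, no networking)."""

    def __init__(self):
        self.items = {}                  # stored resources, not session state
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        """Return (status_code, payload) for one self-contained request."""
        parts = path.strip("/").split("/")
        if parts[0] != "items" or len(parts) > 2:
            return 404, None
        if len(parts) == 1:              # the collection address /items
            if method == "POST":         # create: server assigns the address
                new_id = str(next(self._ids))
                self.items[new_id] = body
                return 201, {"location": f"/items/{new_id}"}
            if method == "GET":          # list the known ids
                return 200, sorted(self.items)
            return 405, None
        item_id = parts[1]               # an object address /items/<id>
        if item_id not in self.items:
            return 404, None
        if method == "GET":
            return 200, self.items[item_id]
        if method == "PUT":
            self.items[item_id] = body
            return 200, body
        if method == "DELETE":
            del self.items[item_id]
            return 204, None
        return 405, None

server = ResourceStore()
status, created = server.handle("POST", "/items", {"color": "blue"})
```

Every call carries everything needed to process it (method, address, body), so requests can be handled in any order and by any server that shares the stored resources.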

Other typical NoSQL features are:

  1. Use of multiprocessing
  2. Linear scalability (adding processors improves performance)
  3. Innovation - opens up many possibilities for data storage and processing
  4. Use of different types of storage
  5. The possibility of developing a database without specifying a schema
  6. Reduced development time
  7. Speed: even with a small amount of data, end users can notice response times falling from hundreds of milliseconds to a few milliseconds

Basic types of storage

Key-value Storage

This type of storage is built on an associative array: data is represented as a set of key-value pairs. It is the simplest model. Keys can be kept in lexicographic order, which can improve performance for ordered access such as range scans. Such stores are used for storing images, building specialized file systems, and caching objects, as well as in systems designed with scalability in mind. Examples - Oracle NoSQL Database, Redis, dbm.
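
A minimal sketch of a key-value store with lexicographically ordered keys, using Python's standard bisect module. The SortedKV class and its method names are hypothetical and do not reproduce the API of any product listed above.

```python
import bisect

class SortedKV:
    """Key-value store that keeps its keys in lexicographic order."""

    def __init__(self):
        self._keys = []     # sorted list of keys
        self._values = {}   # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

kv = SortedKV()
kv.put("user:2", "bob")
kv.put("img:1", b"\x89PNG")
kv.put("user:1", "alice")
users = kv.scan("user:", "user;")  # ";" sorts right after ":" in ASCII
```

Because the keys are ordered, all keys sharing a prefix sit next to each other, so a range scan needs two binary searches and one slice instead of a pass over every key.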

Document-oriented database

These databases store hierarchical data structures. The central idea is the concept of a "document". Although each database defines it somewhat differently, they all assume that a document encapsulates and stores data in some standard format, such as XML or JSON. Each document is assigned a unique key, and each database of this type provides a query language or an API for accessing the data. Examples - CouchDB, Couchbase, MarkLogic, MongoDB, eXist.
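
The idea can be sketched as a small in-memory store that keeps JSON documents under generated unique keys. The DocumentStore class and its insert, get and find methods are hypothetical, and the query supports only equality on top-level fields.

```python
import json
import uuid

class DocumentStore:
    """Schema-free store: each document is JSON text under a unique key."""

    def __init__(self):
        self._docs = {}  # unique key -> JSON text

    def insert(self, document):
        key = uuid.uuid4().hex              # generated unique key
        self._docs[key] = json.dumps(document)
        return key

    def get(self, key):
        return json.loads(self._docs[key])

    def find(self, **criteria):
        """Return documents whose top-level fields equal all criteria."""
        results = []
        for text in self._docs.values():
            doc = json.loads(text)
            if all(doc.get(f) == v for f, v in criteria.items()):
                results.append(doc)
        return results

store = DocumentStore()
first = store.insert({"type": "shirt", "size": "M", "color": "blue"})
store.insert({
    "type": "shirt",
    "size": "L",
    "sleeve": "short",
    "fabric": {"cotton": 80, "linen": 20},  # nested (hierarchical) data
})
large = store.find(type="shirt", size="L")
```

The two shirts carry different sets of fields and no table structure had to be declared in advance.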

Graph database

This type of database is used to model social graphs (social networking), in bioinformatics, and for the Semantic Web. In a graph database it is useful to distinguish the storage mechanism from the processing mechanism. Examples - AllegroGraph, ArangoDB, Sqrrl.
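
A minimal sketch of a social graph, with storage and processing kept separate. The Graph class, the "follows" edge label, and the within_hops function are hypothetical. Real graph databases offer much richer traversal languages.

```python
from collections import defaultdict, deque

class Graph:
    """Storage layer: nodes connected by labelled, directed edges."""

    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(label, neighbour)]

    def add_edge(self, src, label, dst):
        self._edges[src].append((label, dst))

    def neighbours(self, node, label=None):
        return [
            dst for (lbl, dst) in self._edges.get(node, [])
            if label is None or lbl == label
        ]

def within_hops(graph, start, label, max_hops):
    """Processing layer: breadth-first traversal, returns {node: distance}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue                     # do not expand beyond the limit
        for nxt in graph.neighbours(node, label):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]                      # the start node is not a result
    return seen

g = Graph()
g.add_edge("ann", "follows", "bob")
g.add_edge("bob", "follows", "cara")
g.add_edge("cara", "follows", "dan")
g.add_edge("ann", "blocks", "dan")
reachable = within_hops(g, "ann", "follows", 2)
```

The traversal only asks the storage layer for neighbours, so the way edges are stored can change without touching the query logic.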

Bigtable

In this type of store, data is kept as a sparse matrix whose rows and columns are used as keys. Its usage scenarios are similar to those of document-oriented databases: content management systems, blogs, event logging. Examples - Apache HBase, Apache Cassandra, Apache Accumulo.
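
The sparse-matrix model can be sketched as a dictionary keyed by (row, column) pairs. The SparseTable class and the column names below are hypothetical. The sketch leaves out features of real systems such as cell timestamps and column families.

```python
class SparseTable:
    """Sparse matrix addressed by row key and column key."""

    def __init__(self):
        self._cells = {}  # (row_key, column_key) -> value; empty cells are absent

    def put(self, row, column, value):
        self._cells[(row, column)] = value

    def get(self, row, column, default=None):
        return self._cells.get((row, column), default)

    def row(self, row):
        """Return only the columns that actually exist for this row."""
        return {c: v for (r, c), v in self._cells.items() if r == row}

table = SparseTable()
table.put("post:1", "title", "Hello")
table.put("post:1", "tag:nosql", "1")
table.put("post:2", "title", "Second post")
table.put("post:2", "comment:ann", "Nice")
first_post = table.row("post:1")
```

Each row holds only the columns it needs, so rows with very different column sets can share one table without storing empty cells.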
