MongoDB

MongoDB
Licence GNU AGPL v3.0 (drivers: Apache license)
Web site http://www.mongodb.org


MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.

MongoDB was first developed by the software company MongoDB Inc. (then known as 10gen) in October 2007 as a component of a planned platform-as-a-service product; the company shifted to an open source development model in 2009, offering commercial support and other services[1]. Since then, MongoDB has been adopted as backend software by a number of major websites and services, including Craigslist, eBay, and Foursquare, among others. As of July 2015, MongoDB is the fourth most popular type of database management system, and the most popular for document stores[2].

Main features

Document-oriented
Instead of taking a business subject and breaking it up into multiple relational structures, MongoDB can store the business subject in the minimal number of documents. For example, instead of storing title and author information in two distinct relational structures, title, author, and other title-related information can all be stored in a single document called Book.
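
As a rough illustration (the collection and field names here are chosen just for this example), such a Book document could be created from the mongo shell like this:

> db.books.insert({
    title: "MongoDB in Action",
    author: { first_name: "Kyle", last_name: "Banker" },
    published: 2011,
    tags: ["database", "NoSQL"]
  })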

Ad hoc queries
MongoDB supports search by field, range queries, and regular expression searches. Queries can return specific fields of documents and also include user-defined JavaScript functions.
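
For instance, against the hypothetical books collection above, such queries might look like the following in the mongo shell (a search by field, a range query, a regular expression search, and a query returning only specific fields):

> db.books.find({ "author.last_name": "Banker" })            // search by field
> db.books.find({ published: { $gte: 2010, $lt: 2015 } })    // range query
> db.books.find({ title: /mongo/i })                         // regular expression search
> db.books.find({ published: 2011 }, { title: 1, _id: 0 })   // return only the title field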

Indexing
Any field in a MongoDB document can be indexed (indices in MongoDB are conceptually similar to those in RDBMSes). Secondary indices are also available.
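
A minimal sketch, again using the hypothetical books collection: a secondary index on a single field is declared with createIndex (ensureIndex in older versions of the shell):

> db.books.createIndex({ "author.last_name": 1 })   // 1 = ascending, -1 = descending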

[Figure: Self-Healing MongoDB Replica Sets for High Availability]

Replication
MongoDB provides high availability with replica sets. A replica set consists of two or more copies of the data. Each replica set member may act in the role of primary or secondary replica at any time. The primary replica performs all writes and reads by default. Secondary replicas maintain a copy of the data of the primary using built-in replication. When a primary replica fails, the replica set automatically conducts an election process to determine which secondary should become the primary. Secondaries can also perform read operations, but the data is eventually consistent by default.
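
As a sketch, a three-member replica set could be initiated from the mongo shell of one member, assuming each mongod was started with --replSet rs0; the host names below are placeholders:

> rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "db1.example.net:27017" },
      { _id: 1, host: "db2.example.net:27017" },
      { _id: 2, host: "db3.example.net:27017" }
    ]
  })
> rs.status()   // shows which member is currently the primary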

Load balancing
MongoDB scales horizontally using sharding. The user chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is typically a replica set, i.e. a primary with one or more secondaries.) MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure. Automatic configuration is easy to deploy, and new machines can be added to a running database.

File storage
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.

This function, called GridFS (Grid File System), is included with the MongoDB drivers and is available for many development languages. MongoDB exposes functions for manipulating files and their content to developers. GridFS is used, for example, in plugins for NGINX and lighttpd. Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document.

In a multi-machine MongoDB system, files can be distributed and copied multiple times between machines transparently, thus effectively creating a load-balanced and fault-tolerant system.
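
As an example, the mongofiles utility shipped with MongoDB stores (put), lists (list) and retrieves (get) files through GridFS from the command line; the database and file names here are illustrative:

$ mongofiles -d myfiles put video.mp4
$ mongofiles -d myfiles list
$ mongofiles -d myfiles get video.mp4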

Aggregation
MapReduce can be used for batch processing of data and aggregation operations. The aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY clause is used.
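
For example, a GROUP BY-style aggregation over the hypothetical books collection might count books per author with the aggregation pipeline:

> db.books.aggregate([
    { $group: { _id: "$author.last_name", total: { $sum: 1 } } },
    { $sort: { total: -1 } }
  ])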

Server-side JavaScript execution
JavaScript can be used in queries and aggregation functions (such as MapReduce), and can be sent directly to the database to be executed.

Capped collections
MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion order and, once the specified size has been reached, behaves like a circular queue.
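
A sketch: a capped collection is created explicitly with a maximum size in bytes and, optionally, a maximum number of documents (the name and limits here are arbitrary):

> db.createCollection("log_messages", { capped: true, size: 1048576, max: 10000 })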

Architecture

Data Model

Data As Documents
MongoDB stores data as documents in a binary representation called BSON (Binary JSON). The BSON encoding extends the popular JSON (JavaScript Object Notation) representation to include additional types such as int, long, and floating point. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data and sub-documents.

Documents that tend to share a similar structure are organized as collections. It may be helpful to think of collections as being analogous to a table in a relational database: documents are similar to rows, and fields are similar to columns.

For example, consider the data model for a blogging application. In a relational database the data model would comprise multiple tables. To simplify the example, assume there are tables for Categories, Tags, Users, Comments and Articles.

In MongoDB the data could be modeled as two collections, one for users, and the other for articles. In each blog document there might be multiple comments, multiple tags, and multiple categories, each expressed as an embedded array.
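
A sketch of what a single article document might look like in this model (all field names are illustrative):

> db.articles.insert({
    title: "Getting started with MongoDB",
    author: "jdoe",
    date: new Date(),
    categories: ["databases"],
    tags: ["mongodb", "nosql", "tutorial"],
    comments: [
      { user: "asmith", text: "Nice overview!", date: new Date() },
      { user: "bjones", text: "Looking forward to part two.", date: new Date() }
    ]
  })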

[Figure: Example relational data model for a blogging application]

As this example illustrates, MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across many tables. With the MongoDB document model, data is more localized, which significantly reduces the need to JOIN separate tables. The result is dramatically higher performance and scalability across commodity hardware as a single read to the database can retrieve the entire document containing all related data.

In addition, MongoDB BSON documents are more closely aligned to the structure of objects in the programming language. This makes it simpler and faster for developers to model how data in the application will map to data stored in the database.

Dynamic Schema
MongoDB documents can vary in structure. For example, all documents that describe users might contain the user id and the last date they logged into the system, but only some of these documents might contain the user’s identity for one or more third party applications. Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the system offline.

Developers can start writing code and persist objects as they are created. And when developers add more features, MongoDB continues to store the updated objects without the need to perform costly ALTER TABLE operations or, worse, having to redesign the schema from scratch.
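
For instance, two user documents with different structures can coexist in the same collection, and a new field can simply be set on the next update (the field names are illustrative):

> db.users.insert({ user_id: 1, last_login: new Date() })
> db.users.insert({ user_id: 2, last_login: new Date(), twitter_id: "@example" })   // extra field in this document only
> db.users.update({ user_id: 1 }, { $set: { locale: "en" } })                       // new field, no schema change required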

Schema Design
Although MongoDB provides schema flexibility, schema design is still important. Developers and DBAs should consider a number of topics, including the types of queries the application will need to perform, how objects are managed in the application code, and how documents will change over time. Schema design is an extensive topic that is beyond the scope of this document. For more information, please see Data Modeling Considerations.

Query Model

Idiomatic Drivers
MongoDB provides native drivers for all popular programming languages and frameworks to make development natural. Supported drivers include Java, .NET, Ruby, PHP, JavaScript, Node.js, Python, Perl, Scala and others. MongoDB drivers are designed to be idiomatic for the given language.

One fundamental difference as compared to relational databases is that the MongoDB query model is implemented as methods or functions within the API of a specific programming language, as opposed to a completely separate language like SQL. This, coupled with the affinity between MongoDB’s JSON document model and the data structures used in object-oriented programming, makes integration with applications simple. For a complete list of drivers see the MongoDB Drivers page.

Mongo Shell
The mongo shell is a rich, interactive JavaScript shell that is included with all MongoDB distributions. Nearly all commands supported by MongoDB can be issued through the shell, including administrative operations. The mongo shell is a popular way to interact with MongoDB for ad hoc operations. All examples in the MongoDB Manual are based on the shell. For more on the mongo shell, see the corresponding page in the MongoDB Manual.

Query Types
Unlike many NoSQL databases, MongoDB is not limited to simple key-value operations. Developers can build rich applications using complex queries and secondary indexes that unlock the value in structured, semi-structured and unstructured data.

A key element of this flexibility is MongoDB's support for many types of queries. A query may return a document, a subset of specific fields within the document, or complex aggregations against many documents (a couple of these query types are sketched after the list):

  • Key-value queries return results based on any field in the document, often the primary key.
  • Range queries return results based on values defined as inequalities (e.g., greater than, less than or equal to, between).
  • Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.
  • Text Search queries return results in relevance order based on text arguments using Boolean operators (e.g., AND, OR, NOT).
  • Aggregation Framework queries return aggregations of values returned by the query (e.g., count, min, max, average), similar to a SQL GROUP BY statement.
  • MapReduce queries execute complex data processing that is expressed in JavaScript and executed across data in the database.
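
A couple of these query types sketched in the mongo shell, with illustrative collection and field names (the geospatial and text queries require the corresponding indexes, shown in the next section):

> db.places.find({ location: { $near: { $geometry: { type: "Point", coordinates: [ -73.97, 40.77 ] },
                                        $maxDistance: 1000 } } })     // places within 1 km of a point
> db.articles.find({ $text: { $search: "mongodb -relational" } })     // text search with a negated term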

Indexing
Indexes are a crucial mechanism for optimizing system performance and scalability while providing flexible access to the data. As in most database management systems, indexes improve the performance of some operations by orders of magnitude, but they incur associated overhead in write operations, disk usage, and memory consumption. By default, the WiredTiger storage engine compresses indexes in RAM, freeing up more of the working set for documents.

[Figure: Data as documents: simpler for developers, faster for users]

MongoDB includes support for many types of secondary indexes that can be declared on any field in the document, including fields within arrays (shell sketches for several of these index types follow the list):

  • Unique Indexes. By specifying an index as unique, MongoDB will reject inserts of new documents or the update of a document with an existing value for the field for which the unique index has been created. By default all indexes are not set as unique. If a compound index is specified as unique, the combination of values must be unique.
  • Compound Indexes. It can be useful to create compound indexes for queries that specify multiple predicates. For example, consider an application that stores data about customers. The application may need to find customers based on last name, first name, and city of residence. With a compound index on last name, first name, and city of residence, queries could efficiently locate people with all three of these values specified. An additional benefit of a compound index is that any leading field within the index can be used, so fewer indexes on single fields may be necessary: this compound index would also optimize queries looking for customers by last name.
  • Array Indexes. For fields that contain an array, each array value is stored as a separate index entry. For example, documents that describe products might include a field for components. If there is an index on the components field, each component is indexed and queries on the components field can be optimized by this index. There is no special syntax required for creating array indexes – if the field contains an array, it will be indexed as an array index.
  • TTL Indexes. In some cases data should expire out of the system automatically. Time to Live (TTL) indexes allow the user to specify a period of time after which the data will automatically be deleted from the database. A common use of TTL indexes is applications that maintain a rolling window of history (e.g., most recent 100 days) for user actions such as clickstreams.
  • Geospatial Indexes. MongoDB provides geospatial indexes to optimize queries related to location within a two dimensional space, such as projection systems for the earth. These indexes allow MongoDB to optimize queries for documents that contain points or a polygon that are closest to a given point or line; that are within a circle, rectangle, or polygon; or that intersect with a circle, rectangle, or polygon.
  • Sparse Indexes. Sparse indexes only contain entries for documents that contain the specified field. Because the document data model of MongoDB allows for flexibility in the data model from document to document, it is common for some fields to be present only in a subset of all documents. Sparse indexes allow for smaller, more efficient indexes when fields are not present in all documents.
  • Text Search Indexes. MongoDB provides a specialized index for text search that uses advanced, language-specific linguistic rules for stemming, tokenization and stop words. Queries that use the text search index will return documents in relevance order. One or more fields can be included in the text index.
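
Shell sketches for several of these index types, using illustrative collection and field names:

> db.users.createIndex({ user_id: 1 }, { unique: true })                        // unique index
> db.customers.createIndex({ last_name: 1, first_name: 1, city: 1 })            // compound index
> db.events.createIndex({ created_at: 1 }, { expireAfterSeconds: 8640000 })     // TTL index (100 days)
> db.users.createIndex({ twitter_id: 1 }, { sparse: true })                     // sparse index
> db.places.createIndex({ location: "2dsphere" })                               // geospatial index
> db.articles.createIndex({ title: "text", body: "text" })                      // text search index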

Query Optimization
MongoDB automatically optimizes queries to make evaluation as efficient as possible. Evaluation normally includes selecting data based on predicates, and sorting data based on the sort criteria provided. The query optimizer selects the best index to use by periodically running alternate query plans and selecting the index with the best response time for each query type. The results of this empirical test are stored as a cached query plan and are updated periodically. Developers can review and optimize plans using the powerful explain method and index filters.
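
For example, a developer might inspect the plan chosen for a query with explain (the collection and predicate are illustrative):

> db.customers.find({ last_name: "Smith", city: "London" }).explain()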

Index intersection provides additional flexibility by enabling MongoDB to use more than one index to optimize an ad-hoc query at run-time.

Covered Queries
Queries that return results containing only indexed fields are called covered queries. These results can be returned without reading from the source documents. With the appropriate indexes, workloads can be optimized to use predominantly covered queries.
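
A sketch of a covered query: with a compound index on last_name and first_name, a query that filters on last_name and returns only the indexed fields (excluding _id) can be answered from the index alone:

> db.customers.createIndex({ last_name: 1, first_name: 1 })
> db.customers.find({ last_name: "Smith" }, { last_name: 1, first_name: 1, _id: 0 })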

Data Management

Auto-Sharding
MongoDB provides horizontal scale-out for databases on low cost, commodity hardware or cloud infrastructure using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application. MongoDB automatically balances the data in the sharded cluster as the data grows or the size of the cluster increases or decreases.

Unlike relational databases, sharding is automatic and built into the database. Developers don't face the complexity of building sharding logic into their application code, which then needs to be updated as shards are migrated. Operations teams don't need to deploy additional clustering software to manage process and data distribution.

Unlike other distributed databases, multiple sharding policies are available that enable developers and administrators to distribute data across a cluster according to query patterns or data locality (a shell sketch follows the list). As a result, MongoDB delivers much higher scalability across a diverse set of workloads:

  • Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values “close” to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
  • Hash-based Sharding. Documents are uniformly distributed according to an MD5 hash of the shard key value. Documents with shard key values “close” to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries.
  • Location-aware Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with specific shards and hardware. Users can continuously refine the physical location of documents for application requirements such as locating data in specific data centers or on multi-temperature storage (i.e. SSDs for the most recent data, and HDDs for older data).
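
A shell sketch of enabling sharding and choosing shard keys, issued against a query router (mongos); the database, collection and key names are illustrative:

> sh.enableSharding("mydb")
> sh.shardCollection("mydb.customers", { customer_id: 1 })      // range-based sharding
> sh.shardCollection("mydb.events", { device_id: "hashed" })    // hash-based sharding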

Query Router
Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards.

For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will broadcast the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.

MongoDB base commands

Check the version of MongoDB installed

   
$ mongo --version
  MongoDB shell version: 2.6.4
$

Connect to the database from the shell

   
$ mongo
  MongoDB shell version: 2.6.4
  connecting to: test
  >

By default, mongo looks for a database server running on port 27017 on localhost. We can change these settings using the --host and --port options.
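
For example, to connect to a server on another host and a non-default port (the host name is illustrative):

$ mongo --host db1.example.net --port 27018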

Close this connection

  > ^C
  bye
$

or:

  > exit
  bye
$

or by killing the mongo shell process with the kill command:

$ kill 4599

Shut down the database
Terminate the process:

$ ps -ef | grep mongo
  mongodb 1339 1 0 16:23 ? 00:00:18 /usr/bin/mongod --config /etc/mongod.conf
$ kill 1339

We must be careful when using kill -9 <pid>: this command terminates the database process abruptly and can result in inconsistency and data loss.

Consult the list of databases already created

    > show dbs
    admin (empty)
    local 0.078GB
    test 3.952GB
    >

If we switch to a new database (for example, with > use dbtest) and check the current database with > db, we see that we are connected to the database we asked for; however, it has not actually been created and does not appear in show dbs. MongoDB creates both the database and the collection when we insert our first document.

    > use dbtest
    switched to db dbtest
    > db.teams.insert( { teamName : "Boca Juniors" } )
    WriteResult({ "nInserted" : 1 })
    > show dbs
    admin   (empty)
    dbtest  0.078GB
    local   0.078GB
    test    3.952GB
    >

We already know that MongoDB is schemaless; that is, we do not need to declare the structure and characteristics of our collections before using them. Therefore, a collection is created when we insert the first document into it, as we have seen above.

MongoDB creates a set of files per database. These files are stored in the directory specified when starting the mongod service or, by default, in the /var/lib/mongodb/ directory.

Delete a database

By using the dropDatabase() command we delete the whole current database, including metadata and the physical files on disk. The “db” object will still be pointing at the same database.

    > db
    dbtest
    > db.dropDatabase()
    { "dropped" : "dbtest", "ok" : 1 }
    > show dbs
    admin   (empty)
    local   0.078GB
    test    3.952GB
    > db
    dbtest
    >

MongoDB Drupal

The Ubuntu 14.04 operating system is used for this installation:

$ sudo apt-get update 
$ sudo -s
# apt-get install apache2 -y
# apt-get install mysql-server php5-mysql
# mysql_install_db
# apt-get install php5 libapache2-mod-php5 php5-mcrypt php-dev
# nano /etc/apache2/apache2.conf
# service apache2 restart
# mysql -u root -p
mysql> CREATE DATABASE drupal;
mysql> exit
# apt-get install php5-gd php5-curl libssh2-php php-pear
# nano /etc/php5/apache2/php.ini
# a2enmod rewrite
# nano /etc/apache2/sites-enabled/000-default.conf
# service apache2 restart
# cd ~
# wget http://ftp.drupal.org/files/projects/drupal-8.2.3.tar.gz
# tar xzvf drupal*
# cd drupal*
# rsync -avz . /var/www/html
# cd /var/www/html
# mkdir /var/www/html/sites/default/files
# cp /var/www/html/sites/default/default.settings.php /var/www/html/sites/default/settings.php
# chmod 777 /var/www/html/sites/default/settings.php
# chown -R :www-data /var/www/html/*
# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
# echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
# apt-get update
# apt-get install -y mongodb-org
# mkdir /data/db
# chown mongodb:mongodb /data/db
# chmod 777 /data/db
# pecl install mongo
# apt-get install build-essential
# nano /etc/php5/apache2/php.ini
# /etc/init.d/apache2 restart
# wget https://ftp.drupal.org/files/projects/mongodb-8.x-1.x.dev.tar.gz
# tar -xf ~/mongodb-8.x-1.x.dev.tar.gz
# /etc/init.d/apache2 restart

MongoDB C# ORM

There is a MongoDB C# driver, on top of which you can write your own ORM. Let us consider working with MongoDB through such an ORM in the .NET environment.

What you need:

1. Download MongoDB (http://www.mongodb.org/downloads), unzip it and run mongod (the server).

2. Download the driver (https://github.com/mongodb/mongo-csharp-driver/downloads); the DLL must be referenced from the project.

3. Get started.

Creating a database.

By default, the database files are stored in the folder c:/data/db. Run mongo.exe and switch to a new database:

> use mongoblog

A peculiarity of MongoDB is that, being a document-oriented database, it stores no information about the structure, so nothing more needs to be done here. Let us move straight on to the description of the data model.

We will have a single entity (collection): unicorns.

using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public partial class unicorns
{
    [BsonId]
    public ObjectId Id { get; set; }   // marks this property as the document's _id field
    public string name { get; set; }
    public string gender { get; set; }
    public DateTime dob { get; set; }
    public List<string> loves { get; set; }
    public int weight { get; set; }
    public int vampires { get; set; }

    public unicorns() { }   // parameterless constructor, used by the driver when deserializing documents

    public unicorns(string Name, string Gender, DateTime Dob, List<string> Loves, int Weight, int Vampires)
    {
        name = Name; gender = Gender; dob = Dob; loves = Loves; weight = Weight; vampires = Vampires;
    }
}

Next, let us consider working with MongoDB: 1) connecting to MongoDB; 2) adding records; 3) modifying records; 4) indexing; 5) deleting records; 6) outputting records (filtering / paging / sorting); 7) searching; 8) backing up the database.

Connecting to a database.

By default, the database server listens on port 27017 (usually on localhost). Connect to the database:

var ConnectionString = "mongodb://localhost:27017";
string User = "user", Pwd = "pwd", Database = "mongoblog";
var client = new MongoClient(ConnectionString);
MongoServer server = client.GetServer();
// Credentials are only needed if authentication is enabled on the database (cf. db.auth(...) in the mongo shell)
MongoCredentials credentials = new MongoCredentials(User, Pwd);
MongoDatabase database = server.GetDatabase(Database);

Adding / editing records

// To add an item to the collection, we first obtain the collection by name
string nameCollection = "unicorns";
var collection = database.GetCollection<unicorns>(nameCollection);
unicorns obj = new unicorns("Aurora", "f", new DateTime(2015, 7, 20), new List<string>(), 450, 43);
// To insert, run:
collection.Insert<unicorns>(obj);
// Or (Save also updates an existing object):
collection.Save<unicorns>(obj);

MongoDB adds the _id field on its own as a unique identifier. If, when Save is called, the object's Id is already present in the collection, the object is updated instead of inserted.

Indexing

For a quick search, we can add an index on any field. For indexing, use the command

collection.EnsureIndex(IndexKeys.Descending("weight"));

This will speed up sorting on this field.

Removal

To remove a document by its id, a query (Query) is created. In this case:

var query = Query.EQ("_id", id);

and run the command:

collection.Remove(query);

Output records

Filtering, sorting and paging of output are done through a cursor; the legacy driver exposes modifiers such as SetSortOrder, SetSkip and SetLimit for this. For example, to select non-deleted documents (IsDeleted = false), sorted by date descending (AddedDate desc), and return the 10th page, one would skip 90 items and output the next 10. A simpler example, which selects all unicorns with weight 600 and prints them:

var fields = collection.Find(Query.EQ("weight", 600));
foreach (var obje in fields)
{
    Console.WriteLine("{0}", obje);
}

Search.

To find elements containing a given substring, specify the following query:

Console.WriteLine("{0}", collection.Find(Query.Matches("name", "Hor")));

There are two problems: the search is case-sensitive (i.e. peter != Peter), and, in addition, we may have many fields to search across. The first problem can be solved by passing a case-insensitive regular expression (the "i" option) to Query.Matches.

Backup database.

This is a backup rather than replication. To get consistent access to the database files you need to take a lock first (the database keeps working, but all write commands are queued and executed only after the unlock). After that, the database files can be copied, archived and uploaded to FTP. Once this is done, the database must be unlocked (Unlock). The commands:

public void Lock()
{
    // The fsync + lock command must be run against the admin database
    var admin = database.Server.GetDatabase("admin");
    var command = new CommandDocument();
    command.Add("fsync", 1);
    command.Add("lock", 1);
    var result = admin.RunCommand(command);
}

public void Unlock()
{
    // Reading from the special $cmd.sys.unlock collection releases the lock
    var admin = database.Server.GetDatabase("admin");
    admin.GetCollection("$cmd.sys.unlock").FindOne();
}

If something goes wrong, the server will need to be restarted in repair mode with the command:

$ mongod --repair

References

  1. "10gen embraces what it created, becomes MongoDB Inc.". Gigaom. Retrieved 27 August 2013.
  2. "Popularity ranking of database management systems". db-engines.com. Solid IT. Retrieved 2015-07-04.