FlockDB

From Bauman National Library
This page was last modified on 18 June 2016, at 14:39.
</td></tr>
FlockDB
Developer(s) Twitter, Inc.
Stable release
1.8.5 / February 23, 2012 (2012-02-23)
Repository {{#property:P1324}}
Development status Abandoned
Written in Java,Scala,Ruby
Operating system Windows,Linux,OS X,Ubuntu
Available in English
Type Graph Database
Website github.com/twitter/flockdb

FlockDB is an open source, distributed, fault-tolerant graph database for managing data at scale. It initially started out for Twitter to store its database of users and their relationships.

Description

FlockDB is a distributed graph database for storing adjancency lists[1], with goals of supporting:

  • a high rate of add/update/remove operations
  • complex set arithmetic queries
  • paging through query result sets containing millions of entries
  • ability to "archive" and later restore archived edges
  • horizontal scaling including replication
  • online data migration
FlockDB.png

Non-goals include:

  • multi-hop queries (or graph-walking queries)
  • automatic shard migrations

FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer problems. It scales horizontally and is designed for on-line, low-latency, high throughput environments such as web-sites.

How it Works

If, for example, you're storing a social graph (user A follows user B), and it's not necessarily symmetrical (A can follow B without B following A), then FlockDB can store that relationship as an edge: node A points to node B. It stores this edge with a sort position, and in both directions, so that it can answer the question "Who follows A?" as well as "Whom is A following?"

This is called a directed graph. (Technically, FlockDB stores the adjacency lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit destination ID, a state (normal, removed, archived), and a 32-bit position used for sorting. The edges are stored in both a forward and backward direction, meaning that an edge can be queried based on either the source or destination ID.

For example, if node 134 points to node 90, and its sort position is 5, then there are two rows written into the backing store:

    forward: 134 -> 90 at position 5
    backward: 90 <- 134 at position 5

If you're storing a social graph, the graph might be called "following", and you might use the current time as the position, so that a listing of followers is in recency order. In that case, if user 134 is Nick, and user 90 is Robey, then FlockDB can store:

    forward: Nick follows Robey at 9:54 today
    backward: Robey is followed by Nick at 9:54 today

The (source, destination) must be unique: only one edge can point from node A to node B, but the position and state may be modified at any time. Position is used only for sorting the results of queries, and state is used to mark edges that have been removed or archived (placed into cold sleep).

Building

Building is as simple as:

$ sbt clean update package-dost

but there are some pre-requisites. You need:

  • java 1.6
  • sbt 0.7.4
  • thrift 0.5.0

Running

Now run setup-env.sh:

$ ./dist/flockdb/scripts/setup-env.sh

It kills and restarts flockdb, creates the flockdb_development database if necessary, and runs gizzmo to create shard configurations for graphs 1-15. You can tell flockdb is running because it will create a flock.log file in the current folder, and it will respond to server_info queries:

    $ curl localhost:9990/server_info.txt
    build: 20100713-165811
    build_revision: 4b2443968d131b7967885b0b0cb62dde04ab5455
    name: flockdb
    start_time: Tue Jun 13 17:01:33 PDT 2016
    uptime: 440837
    version: 1.0.4

References

Cite error: Invalid <references> tag; parameter "group" is allowed only.

Use <references />, or <references group="..." />

Notes

  • An adjacency list is a collection of unordered lists used to represent a finite graph. Each list describes the set of neighbors of a vertex in the graph. This is one of several commonly used representations of graphs for use in computer programs.