This page was last modified on 18 June 2016, at 14:39.
1.8.5 / February 23, 2012
|Operating system||Windows,Linux,OS X,Ubuntu|
FlockDB is an open source, distributed, fault-tolerant graph database for managing data at scale. It initially started out for Twitter to store its database of users and their relationships.
FlockDB is a distributed graph database for storing adjancency lists, with goals of supporting:
- a high rate of add/update/remove operations
- complex set arithmetic queries
- paging through query result sets containing millions of entries
- ability to "archive" and later restore archived edges
- horizontal scaling including replication
- online data migration
- multi-hop queries (or graph-walking queries)
- automatic shard migrations
FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer problems. It scales horizontally and is designed for on-line, low-latency, high throughput environments such as web-sites.
How it Works
If, for example, you're storing a social graph (user A follows user B), and it's not necessarily symmetrical (A can follow B without B following A), then FlockDB can store that relationship as an edge: node A points to node B. It stores this edge with a sort position, and in both directions, so that it can answer the question "Who follows A?" as well as "Whom is A following?"
This is called a directed graph. (Technically, FlockDB stores the adjacency lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit destination ID, a state (normal, removed, archived), and a 32-bit position used for sorting. The edges are stored in both a forward and backward direction, meaning that an edge can be queried based on either the source or destination ID.
For example, if node 134 points to node 90, and its sort position is 5, then there are two rows written into the backing store:
forward: 134 -> 90 at position 5 backward: 90 <- 134 at position 5
If you're storing a social graph, the graph might be called "following", and you might use the current time as the position, so that a listing of followers is in recency order. In that case, if user 134 is Nick, and user 90 is Robey, then FlockDB can store:
forward: Nick follows Robey at 9:54 today backward: Robey is followed by Nick at 9:54 today
The (source, destination) must be unique: only one edge can point from node A to node B, but the position and state may be modified at any time. Position is used only for sorting the results of queries, and state is used to mark edges that have been removed or archived (placed into cold sleep).
Building is as simple as:
$ sbt clean update package-dost
but there are some pre-requisites. You need:
- java 1.6
- sbt 0.7.4
- thrift 0.5.0
It kills and restarts flockdb, creates the flockdb_development database if necessary, and runs gizzmo to create shard configurations for graphs 1-15. You can tell flockdb is running because it will create a flock.log file in the current folder, and it will respond to server_info queries:
$ curl localhost:9990/server_info.txt build: 20100713-165811 build_revision: 4b2443968d131b7967885b0b0cb62dde04ab5455 name: flockdb start_time: Tue Jun 13 17:01:33 PDT 2016 uptime: 440837 version: 1.0.4
Cite error: Invalid
parameter "group" is allowed only.
<references />, or
<references group="..." />