Hypertable is an open source database system inspired by publications on the design of Google's Bigtable. The project draws on the experience of engineers who spent many years solving large-scale, data-intensive problems. Hypertable runs on top of a distributed file system such as Apache HDFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++ because its developers believe this gives it significant performance advantages over Java. The main differences from a relational database:
- The keys are UTF-8 strings
- No support for data types; values are treated as sequences of bytes
- No JOIN operations
- No transaction support
A relational database assumes that each column defined in the table schema has a value for every row present in the table. NULL values are usually represented by a special marker (e.g. \N). The primary key and the column identifier are implicitly associated with each cell based on its physical position in the layout.
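The positional layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-column schema, the tab-separated encoding, and the helper names `encode_row` and `decode_row` are hypothetical and do not come from any particular database engine.

```python
# Hypothetical schema: the column order is fixed by the table definition.
COLUMNS = ["name", "email", "phone"]
NULL_MARKER = r"\N"  # special marker standing in for a missing value


def encode_row(primary_key, values):
    """Serialize one row positionally: every column gets a slot, even if NULL."""
    fields = [primary_key]
    for column in COLUMNS:
        value = values.get(column)
        fields.append(NULL_MARKER if value is None else value)
    return "\t".join(fields)


def decode_row(line):
    """Recover the column names purely from each field's position in the line."""
    primary_key, *fields = line.split("\t")
    values = {
        column: (None if field == NULL_MARKER else field)
        for column, field in zip(COLUMNS, fields)
    }
    return primary_key, values


rows = [
    encode_row("user1", {"name": "Alice", "email": "alice@example.com"}),
    encode_row("user2", {"name": "Bob", "phone": "555-0100"}),
]
for line in rows:
    print(line)
```

Note that neither the column names nor a separate key per cell are stored; a missing value still occupies a slot so that the remaining fields keep their positions.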
The following figure shows how a relational database table might look on disk.
Hypertable (like Bigtable) is built on the principle of the log-structured merge tree (LSM tree). This approach flattens the table structure into a sorted list of key/value pairs, each of which represents one table cell. The key contains the full row identifier and the column, so it provides complete address information for the cell. NULL cells are simply omitted from the list, which makes it possible to store sparse data.
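The flattening step described above can be illustrated with a short Python sketch. It is not Hypertable's storage code and it does not implement an LSM tree; it only shows a table being turned into a sorted list of cells. The `flatten` helper, the sample table, and the use of a plain `(row, column)` tuple as the key are assumptions made for the example.

```python
def flatten(rows):
    """Turn {row_key: {column: value}} into a sorted list of (key, value) cells."""
    cells = []
    for row_key, columns in rows.items():
        for column, value in columns.items():
            if value is None:
                continue  # NULL cells are simply not stored
            # Keys are strings; values are kept as plain byte sequences.
            cells.append(((row_key, column), value.encode("utf-8")))
    # Sorting by key groups all cells of a row together, ordered by column.
    cells.sort(key=lambda cell: cell[0])
    return cells


table = {
    "user2": {"name": "Bob", "email": None, "phone": "555-0100"},
    "user1": {"name": "Alice", "email": "alice@example.com", "phone": None},
}

cells = flatten(table)
for (row_key, column), value in cells:
    print(row_key, column, value)
```

The two rows with three columns each produce only four cells, because the two NULL values take no space at all. Each cell carries its own address in the key, so its position in the list no longer has to match a schema.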
The following figure illustrates how Hypertable represents data on disk.
Hypertable can be installed in three different configurations, which differ in how the data is stored.
The single-machine configuration is targeted at applications that need Hypertable but do not require horizontal scalability or MapReduce support. Hypertable is configured to run on a single computer and uses the OS file system. To improve performance on such machines, I suggest using SSD drives.
The Hadoop configuration relies on Hadoop, which is essentially an open source implementation of the Google File System and MapReduce. It provides all the features Hypertable needs to work effectively.
The MapR configuration relies on MapR, a scalable file system written in C++ that is modeled after the Google File System and is 100% API compatible with Apache Hadoop. Hypertable includes a built-in MapR broker that lets it interact directly with the MapR servers. Hypertable in conjunction with MapR achieves maximum performance.
We use the git version control system to manage the source code for Hypertable. To obtain a local copy of the repository, install a recent version of git and configure the identity used in your commit messages (git's user.name and user.email settings). Then obtain the source:
    cd ~/src
    git clone git://github.com/hypertable/hypertable.git
Here is a list of all modules:
- AsyncComm - Network communication library
- Common - General purpose utility library
- FsBroker - File system broker
- Hyperspace - Hyperspace library
- Hypertable - Hypertable servers and client library
- ThriftBroker - Thrift broker
- Tools - Command-line tools

Bigtable is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB), and a few other Google technologies. On May 6, 2015, a public version of Bigtable was launched as Google Cloud Bigtable.