Apache Giraph
This page was last modified on 24 December 2016, at 11:23.
Developer(s) | Apache Software Foundation |
---|---|
Stable release |
1.1.0
/ 2014/11/19 |
Repository |
{{ |
Development status | Active |
Written in | Java |
Operating system | Cross-platform |
Type | graph processing |
License | Apache 2.0 Licence |
Website |
giraph |
Apache Giraph is an iterative graph processing system built on top of Apache Hadoop for high scalability. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more. With a steady development cycle and a growing community of users worldwide, Giraph is a natural choice for unleashing the potential of structured datasets at a massive scale.
Description
The input to a Giraph computation is a graph composed of vertices and directed edges, see Figure 1. For example vertices can represent people, and edges friend requests. Each vertex stores a value, so does each edge. The input, thus, not only determines the graph topology, but also the initial values of vertices and edges.
As an example, consider a computation that finds the distance from a predetermined source person s to any person in the social graph. In this computation, the value of an edge E is a floating point number denoting distance between adjacent people. The value of a vertex V is also a floating point number, representing an upper bound on the distance along a shortest path from the predetermined vertex s to v. The initial value of the predetermined source vertex s is 0, and the initial value for any other vertex is infinity.

Computation proceeds as a sequence of iterations, called supersteps in BSP. Initially, every vertex is active. In each superstep each active vertex invokes the Compute method provided by the user. The method implements the graph algorithm that will be executed on the input graph. Intuitively, you should think like a vertex when designing a Giraph algorithm, it is vertex oriented. A graph oriented approach is discussded in GIRAPH-818.
The Compute method:
- receives messages sent to the vertex in the previous superstep,
- computes using the messages, and the vertex and outgoing edge values, which may result in modifications to the values, and
- may send messages to other vertices.
The Compute method does not have direct access to the values of other vertices and their outgoing edges. Inter-vertex communication occurs by sending messages.
Присоединяйся к команде
ISSN:
Следуй за Полисом
Оставайся в курсе последних событий
License
Except as otherwise noted, the content of this page is licensed under the Creative Commons Creative Commons «Attribution-NonCommercial-NoDerivatives» 4.0 License, and code samples are licensed under the Apache 2.0 License. See Terms of Use for details.