Apache Storm

From Bauman National Library

Revision as of 19:36, 30 December 2016
Apache Storm
Distributed and fault-tolerant real-time computation
Developer(s): BackType, Twitter
Stable release: 1.0.2 (10 August 2016)
Development status: Active
Written in: Clojure, Java
Operating system: Cross-platform
Type: Distributed stream processing
License: Apache License 2.0
Website: storm.apache.org

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz[1] and his team at BackType,[2] the project was open sourced after BackType was acquired by Twitter.[3] It uses custom-created "spouts" and "bolts" to define information sources and manipulations, allowing distributed processing of streaming data. The initial release was on 17 September 2011.[4]

A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.[5]
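The topology idea can be illustrated with a small in-process model: components are the vertices, each edge is a named stream, and tuples are pushed through the graph in topological order. This is a sketch in plain Python (3.9+ for graphlib), not Storm's API; run_topology, the word-count components, and the stream names "raw" and "words" are hypothetical. A real Storm topology runs its components in parallel and keeps processing until it is killed, whereas this sketch drains each component once.

```python
from collections import Counter, defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def run_topology(spouts, bolts, edges):
    """Push every spout tuple through a DAG of bolts in topological order.

    spouts: component name -> zero-argument function returning a list of tuples
    bolts:  component name -> function(tuple) returning a list of output tuples
    edges:  (source, stream_name, target) triples; each edge is a named stream
    Returns the number of tuples delivered on each named stream.
    """
    outgoing = defaultdict(list)     # component -> [(stream, target), ...]
    predecessors = defaultdict(set)  # component -> {upstream components}
    for source, stream, target in edges:
        outgoing[source].append((stream, target))
        predecessors[target].add(source)

    inbox = defaultdict(list)        # component -> tuples waiting to be processed
    delivered = Counter()            # stream name -> tuples sent along it

    def emit(source, tup):
        # Send one output tuple along every stream leaving `source`.
        for stream, target in outgoing[source]:
            inbox[target].append(tup)
            delivered[stream] += 1

    # Spouts are the sources: drain each one once (a real spout never stops).
    for name, spout in spouts.items():
        for tup in spout():
            emit(name, tup)

    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # so each bolt runs only after all of its upstream components.
    graph = {name: predecessors[name] for name in bolts}
    for name in TopologicalSorter(graph).static_order():
        if name in bolts:            # spout names also appear in the order; skip them
            for tup in inbox[name]:
                for out in bolts[name](tup):
                    emit(name, out)
    return delivered


# Hypothetical word-count topology: sentences -> split -> count.
word_counts = Counter()


def split_bolt(tup):
    (sentence,) = tup
    return [(word,) for word in sentence.split()]


def count_bolt(tup):
    (word,) = tup
    word_counts[word] += 1
    return []                        # terminal bolt: emits nothing


delivered = run_topology(
    spouts={"sentences": lambda: [("the quick brown fox",), ("the lazy dog",)]},
    bolts={"split": split_bolt, "count": count_bolt},
    edges=[("sentences", "raw", "split"), ("split", "words", "count")],
)
print(dict(delivered))     # {'raw': 2, 'words': 7}
print(word_counts["the"])  # 2
```

Keeping the edges as explicit (source, stream, target) triples mirrors the description above, where streams are the named edges between spouts and bolts.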

Storm became an Apache Top-Level Project in September 2014,[6] having been in incubation at the Apache Incubator since September 2013.[7][8]

Development

Apache Storm is released under the Apache License 2.0, a permissive license that allows companies to use, modify, and redistribute it, including in commercial products.

Main benefits

Here is a list of the benefits that Apache Storm offers:

  • Storm is open source, robust, and user-friendly. It can be used by small companies and large corporations alike.
  • Storm is fault-tolerant, flexible, and reliable, and it can be used with any programming language; languages outside the JVM are supported through Storm's multi-language protocol.
  • Storm allows real-time stream processing.
  • Storm is very fast and can process large volumes of data.
  • Storm is highly scalable: it maintains its performance under increasing load as resources are added, with throughput growing roughly linearly.
  • Storm has very low latency: it refreshes data and delivers end-to-end responses within seconds or minutes, depending on the problem.
  • Storm supports operational intelligence, that is, real-time analysis of data as it is generated.
  • Storm guarantees that data will be processed, even if nodes in the cluster die or messages are lost.
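The processing guarantee depends on knowing when every tuple derived from a spout tuple has been handled. Storm's documentation describes an acker that keeps, for each spout tuple, one 64-bit "ack val": the XOR of the ids of all tuples created and acked in that tuple's tree. Each id is XORed in twice (once when created, once when acked), so the value returns to zero when the tree is complete; if that does not happen before a timeout, the spout's fail method is called so it can replay the message. The sketch below simulates this idea in plain Python. Acker, ReliableSpout, run, and the simulated lost tuple are hypothetical, and the timeout is modelled as an immediate failure.

```python
import random
from collections import Counter

rng = random.Random(42)  # seeded so the simulation is reproducible


def new_id():
    # Random non-zero 64-bit tuple id.
    return rng.getrandbits(64) or 1


class Acker:
    """Tracks each spout tuple's tree with one XOR "ack val" (sketch of the acker idea)."""

    def __init__(self):
        self.ack_vals = {}                # spout tuple id -> XOR of ids seen so far

    def track(self, root_id):
        self.ack_vals[root_id] = root_id  # the spout tuple was created

    def update(self, root_id, ids_xor):
        # Every id is XORed in twice (created, then acked), so 0 means "tree complete".
        self.ack_vals[root_id] ^= ids_xor
        if self.ack_vals[root_id] == 0:
            del self.ack_vals[root_id]
            return True
        return False

    def expire(self, root_id):
        self.ack_vals.pop(root_id, None)  # stands in for the message timeout


class ReliableSpout:
    """Hypothetical spout: keeps emitted messages pending until acked, replays on fail."""

    def __init__(self, messages):
        self.queue = list(messages)
        self.pending = {}                 # msg id -> message awaiting completion
        self.completed = []

    def next_tuple(self):
        msg = self.queue.pop(0)
        msg_id = new_id()
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        self.completed.append(self.pending.pop(msg_id))

    def fail(self, msg_id):
        # Re-queue the message so it is emitted again (at-least-once delivery).
        self.queue.append(self.pending.pop(msg_id))


def run(messages, lost_on_first_try):
    spout, acker, attempts = ReliableSpout(messages), Acker(), Counter()
    while spout.queue:
        msg_id, msg = spout.next_tuple()
        attempts[msg] += 1
        acker.track(msg_id)
        # A bolt anchors two child tuples to the spout tuple and acks its input.
        children = [new_id(), new_id()]
        acker.update(msg_id, msg_id ^ children[0] ^ children[1])
        complete = False
        for i, child in enumerate(children):
            if i == 0 and msg in lost_on_first_try and attempts[msg] == 1:
                continue                  # simulate a lost tuple: it is never acked
            complete = acker.update(msg_id, child)
        if complete:
            spout.ack(msg_id)
        else:
            acker.expire(msg_id)
            spout.fail(msg_id)
    return spout.completed, attempts


completed, attempts = run(["a", "b", "c"], lost_on_first_try={"b"})
print(completed)      # ['a', 'c', 'b']: "b" was replayed after one of its tuples was lost
print(attempts["b"])  # 2
```

The appeal of the XOR scheme is that the acker needs constant memory per spout tuple, no matter how large the tuple tree grows.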

Core concepts

Apache Storm reads a raw stream of real-time data at one end, passes it through a sequence of small processing units, and outputs the processed, useful information at the other end.

The following diagram depicts the core concept of Apache Storm.

[Figure: Core concept]
Components

Tuple − The tuple is the main data structure in Storm: an ordered list of elements. By default, a tuple supports all data types. A tuple is generally modelled as a set of comma-separated values and passed to a Storm cluster.

Stream − A stream is an unbounded sequence of tuples.

Spouts − A spout is the source of a stream. Storm generally accepts input data from raw data sources such as the Twitter Streaming API, an Apache Kafka queue, or a Kestrel queue; you can also write spouts that read data from other data sources. ISpout is the core interface for implementing spouts; more specific interfaces and classes include IRichSpout, BaseRichSpout, and KafkaSpout.

Bolts − Bolts are logical processing units. Spouts pass data to bolts, which process it and produce a new output stream. Bolts can filter, aggregate, and join data, and interact with data sources and databases. A bolt receives data and emits it to one or more bolts. IBolt is the core interface for implementing bolts; other common interfaces include IRichBolt and IBasicBolt.
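To make these components concrete, the sketch below models a tuple as ordered values plus declared field names (loosely mirroring how a Storm tuple can be read by position or by field name, e.g. getValueByField in the Java API), a spout that turns comma-separated lines into tuples, and a bolt that filters them. Python generators stand in for streams. The Tuple class, csv_spout, filter_bolt, the field names, and the sample lines are all hypothetical and only illustrate the roles described in the table.

```python
import csv


class Tuple:
    """Ordered list of values plus the field names declared by the emitting component."""

    def __init__(self, fields, values):
        if len(fields) != len(values):
            raise ValueError("one value per declared field is required")
        self.fields, self.values = list(fields), list(values)

    def get_value(self, i):
        return self.values[i]                            # access by position

    def get_value_by_field(self, name):
        return self.values[self.fields.index(name)]     # access by field name


FIELDS = ("sensor", "temperature")  # hypothetical declared output fields


def csv_spout(lines):
    """Source of the stream: turn each comma-separated line into a Tuple."""
    for row in csv.reader(lines):
        yield Tuple(FIELDS, [row[0], float(row[1])])


def filter_bolt(stream, threshold):
    """Processing unit: emit only tuples whose temperature exceeds the threshold."""
    for t in stream:
        if t.get_value_by_field("temperature") > threshold:
            yield t


lines = ["s1,21.5", "s2,30.0", "s3,18.25"]
hot = [t.get_value_by_field("sensor") for t in filter_bolt(csv_spout(lines), 25.0)]
print(hot)  # ['s2']
```

Because the generators are lazy, each tuple flows from the spout through the bolt one at a time, which is closer to stream processing than building intermediate lists would be.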

Usage

Apache Storm is used in production by the following companies:

Twitter − Twitter uses Apache Storm for its range of "Publisher Analytics products", which process every tweet and click on the Twitter platform. Apache Storm is deeply integrated with Twitter's infrastructure.

NaviSite − NaviSite uses Storm for its event log monitoring and auditing system. Every log generated in the system passes through Storm, which checks each message against a configured set of regular expressions; if there is a match, that message is saved to the database.

Wego − Wego is a travel metasearch engine based in Singapore. Its travel-related data comes from many sources around the world and arrives at different times. Storm helps Wego search real-time data, resolve concurrency issues, and find the best match for the end user.
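The filtering step in NaviSite's log pipeline can be sketched as a bolt that checks each log message against a configured list of regular expressions and saves any match. This is not NaviSite's code: the RegexAuditBolt class, the example patterns and log lines, and the in-memory list standing in for the database are assumptions for illustration.

```python
import re


class RegexAuditBolt:
    """Hypothetical bolt: saves every log line that matches a configured pattern.

    `store` stands in for the database; a real bolt would write through a
    database client instead of appending to a list.
    """

    def __init__(self, patterns, store):
        # Compile the configured patterns once and reuse them for every tuple.
        self.patterns = [re.compile(p) for p in patterns]
        self.store = store

    def execute(self, log_line):
        for pattern in self.patterns:
            if pattern.search(log_line):
                # Save the message once, tagged with the first pattern it matched.
                self.store.append((pattern.pattern, log_line))
                return True
        return False


saved = []
bolt = RegexAuditBolt([r"\bERROR\b", r"login failed for user \w+"], saved)
for line in ["INFO service started",
             "ERROR disk quota exceeded",
             "WARN login failed for user alice"]:
    bolt.execute(line)
print(len(saved))  # 2
```

Compiling the patterns in the constructor rather than per message matters in a stream setting, where the same bolt instance handles every log line that flows through it.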

Installation

Peer platforms

Storm is only one of dozens of stream processing engines; for a more complete list, see Stream processing. On June 2, 2015, Twitter announced Heron, an event processor that is API-compatible with Storm.[9] Other comparable streaming data engines include Spark Streaming and Flink.[10]

References

[1] Marz, Nathan. "About Nathan Marz". Nathan Marz. Retrieved 28 March 2013.
[2] "BackType Website (defunct)". BackType. Retrieved 28 March 2013.
[3] "A Storm is coming: more details and plans for release". Engineering Blog. Twitter Inc. Retrieved 29 July 2015.
[4] "Storm Codebase". GitHub. Retrieved 8 February 2013.
[5] "Tutorial - Components of a Storm cluster". Documentation. Apache Storm. Retrieved 29 July 2015.
[6] "Apache Storm Graduates to a Top-Level Project".
[7] "Storm Project Incubation Status". Apache Software Foundation. Retrieved 29 October 2013.
[8] "Storm Proposal". Apache Software Foundation. Retrieved 29 October 2013.
[9] "Flying faster with Twitter Heron". Engineering Blog. Twitter Inc. Retrieved 3 June 2015.
[10] "Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming" (PDF). IEEE. May 2016.

External links

  • Petrel, a tool for creating Storm applications in Python: https://github.com/AirSage/Petrel
  • FsStorm, a library for authoring Storm components and topologies in F#: https://fsstorm.github.io/FsStorm/