Apache Ambari

From Bauman National Library
This page was last modified on 14 January 2019, at 19:10.
Developer(s): Apache Software Foundation, Cloudsoft
Initial release: November 2013
Stable release: 2.4.2 (29 November 2016)
Preview release: 2.4.1 (September 2016)
Development status: Active
Written in: Java, JavaScript and Python
Operating system: Linux
Size: 49 MB
Website: ApacheAmbari

Apache Ambari is a software project of the Apache Software Foundation [Source 1] that aims to make Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides a web UI for Hadoop management, backed by its RESTful APIs. Ambari was originally a sub-project of Hadoop but is now a top-level project in its own right.

Ambari is used by companies including Cardinal Health, eBay, Expedia, Kayak, Lending Club, Neustar, Pandora, Priceline.com, Samsung, Shutterfly, and Spotify.

Overview

Ambari enables system administrators to provision, manage and monitor a Hadoop cluster, and also to integrate Hadoop with the existing enterprise infrastructure.

  • Provision a Hadoop Cluster
    • Ambari provides a step-by-step wizard for installing Hadoop services across any number of hosts.
    • Ambari handles configuration of Hadoop services for the cluster.
  • Manage a Hadoop cluster
    • Ambari provides central management for starting, stopping, and reconfiguring Hadoop services across the entire cluster.
  • Monitor a Hadoop cluster
    • Ambari provides a dashboard for monitoring health and status of the Hadoop cluster.
    • Ambari leverages the Ambari Metrics System for metrics collection.
    • Ambari leverages the Ambari Alert Framework for system alerting and notifies you when your attention is needed (e.g., a node goes down, remaining disk space is low, etc.).
  • Integrate Hadoop with the Enterprise
    • Application developers and system integrators can integrate Hadoop provisioning, management, and monitoring capabilities into their own applications with the Ambari REST APIs.
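
As an illustration of the REST integration point, the following is a minimal sketch of how an application might prepare a request that lists the clusters known to an Ambari Server. It assumes the /api/v1/clusters endpoint, HTTP basic authentication, and the X-Requested-By header that Ambari expects on modifying requests; the function names are hypothetical, and the host, port, and credentials are simply the defaults used elsewhere on this page.

```python
import base64
import json
import urllib.request


def build_clusters_request(host, user="admin", password="admin", port=8080):
    """Build (but do not send) a GET request for the list of clusters."""
    # Assumed endpoint: the v1 REST API root plus the "clusters" collection.
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    # Needed by Ambari for POST/PUT/DELETE; harmless on a GET request.
    request.add_header("X-Requested-By", "ambari")
    return request


def fetch_clusters(host):
    """Send the request and return the decoded JSON body (needs a live server)."""
    with urllib.request.urlopen(build_clusters_request(host), timeout=10) as resp:
        return json.load(resp)
```

Calling fetch_clusters("c6801.ambari.apache.org") only makes sense once a server is actually running at that address; build_clusters_request can be inspected without any network access.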

Hadoop cluster provisioning and ongoing management can be a complicated task, especially when there are hundreds or thousands of hosts involved. Ambari provides a single control point for viewing, updating and managing Hadoop service life cycles.

HDFS is the distributed file system used in the Hadoop project. An HDFS cluster consists primarily of a NameNode server and DataNode servers, which store the data itself. The NameNode manages the file system namespace and client access to the data. To offload the NameNode, data is transferred only between the client and the DataNodes.
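
The division of labour between NameNode and DataNodes can be shown with a toy in-memory model. This is only a sketch under simplifying assumptions: the class and function names are hypothetical, the block size is tiny, and replication is ignored entirely. It is not how HDFS is implemented; it only shows that the NameNode holds metadata while the bytes travel between client and DataNodes.

```python
class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


class NameNode:
    """Stores only the namespace: which blocks form a file and where they are."""

    def __init__(self):
        self.namespace = {}  # path -> list of (block id, DataNode)

    def locate(self, path):
        return self.namespace[path]


def write_file(namenode, datanodes, path, data, block_size=4):
    """Split data into blocks and spread them over the DataNodes round-robin."""
    placements = []
    for offset in range(0, len(data), block_size):
        number = offset // block_size
        node = datanodes[number % len(datanodes)]
        block_id = f"{path}#{number}"
        node.blocks[block_id] = data[offset:offset + block_size]
        placements.append((block_id, node))
    namenode.namespace[path] = placements  # metadata only, no file bytes


def read_file(namenode, path):
    """Ask the NameNode where the blocks are, then read them from DataNodes."""
    return b"".join(node.blocks[block_id] for block_id, node in namenode.locate(path))
```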

Architecture

The high-level architecture of Ambari is shown in Picture 1. It consists of Agents and the Ambari Server.

Picture 1 – High level Ambari architecture

Agents

Agents send a heartbeat to the master every few seconds and receive commands from the master in the heartbeat responses. Heartbeat responses are the only way for the master to send a command to an agent.

Each command is queued in the action queue, from which it is picked up by the action executioner. The action executioner picks the right tool (Puppet, Python, etc.) for execution depending on the command type and action type. Thus the actions sent in the heartbeat response are processed asynchronously at the agent. The action executioner puts the response or progress messages on the message queue. The agent sends everything on the message queue to the master in the next heartbeat.
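
The flow described above can be sketched with two queues. The example below is a simplified, single-threaded model under stated assumptions: the Agent and FakeServer classes, the command dictionaries, and the handle_heartbeat method are all hypothetical names, and the real agent processes the action queue asynchronously rather than in an explicit call.

```python
from collections import deque


class Agent:
    def __init__(self, executors):
        self.executors = executors    # command type -> callable (the "tool")
        self.action_queue = deque()   # commands waiting to be executed
        self.message_queue = deque()  # results waiting to be reported

    def heartbeat(self, server):
        # Send everything accumulated since the previous heartbeat ...
        reports = list(self.message_queue)
        self.message_queue.clear()
        # ... and receive new commands in the heartbeat response.
        commands = server.handle_heartbeat(reports)
        self.action_queue.extend(commands)

    def run_pending_actions(self):
        # Stand-in for the action executioner: pick the tool by command type.
        while self.action_queue:
            command = self.action_queue.popleft()
            tool = self.executors[command["type"]]
            self.message_queue.append({"id": command["id"], "result": tool(command)})


class FakeServer:
    """Hypothetical master that hands out its pending commands once."""

    def __init__(self, commands):
        self.outbox = list(commands)
        self.received = []

    def handle_heartbeat(self, reports):
        self.received.extend(reports)
        commands, self.outbox = self.outbox, []
        return commands
```

A result produced by run_pending_actions reaches the server only on the following heartbeat, which mirrors the statement that heartbeats are the sole communication channel.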

The architecture of the Ambari Agent is shown in Picture 2.

Picture 2 – Architecture of Ambari Agent

Ambari Server

Requests land on the server via the API. A request id is generated for each request. Then a handler for the API is invoked in the Coordinator.

The API Handler implements the steps needed to execute the requested API. For example, when adding a new service to an existing cluster, the steps would be: install all the service components along with the required prerequisites, start the prerequisites and the service components in a specific order, and reconfigure the Nagios server to add monitoring for the new service.

The Coordinator can look up the Dependency Tracker to determine the entire set of components and their required states. The Coordinator then passes the list of components and their desired states to the Stage Planner. The Stage Planner returns the staged sequence of operations that need to be performed on each node where these components are to be installed, started, or modified. The Stage Planner also generates the manifest (the tasks for each individual node in each stage) using the Manifest Generator.
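
One way to picture what a staged sequence means is to group components into stages so that every component appears after the components it depends on. The sketch below does this with the standard graphlib module (Python 3.9 or later). It is an assumption about the general idea, not Ambari's Stage Planner; the function name and the dependency table in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_stages(dependencies):
    """Group components into ordered stages.

    dependencies maps each component to the set of components that must
    be handled before it. Components within one stage do not depend on
    each other, so they could be processed in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

With a made-up table in which DATANODE depends on NAMENODE, and HBASE_MASTER depends on NAMENODE and ZOOKEEPER_SERVER, the result has two stages: the two independent components first, then the two dependent ones.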

The Coordinator passes this ordered list of stages to the Action Manager together with the corresponding request id.

The Action Manager updates the state of each node-component in the FSM (finite state machine) to reflect that an operation is in progress. Note that the FSM for each affected node-component is updated. In this step, the FSM may detect an invalid event and throw a failure, which aborts the operation; all actions are then marked as failed with an error.
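
The role of the FSM in rejecting invalid events can be sketched with a small transition table. The states and events below are illustrative assumptions chosen for the example and are not taken from Ambari's actual state machine; the class and exception names are hypothetical.

```python
class InvalidEvent(Exception):
    """Raised when an event is not allowed in the current state."""


# (current state, event) -> next state; an illustrative table only.
TRANSITIONS = {
    ("INIT", "install"): "INSTALLING",
    ("INSTALLING", "succeeded"): "INSTALLED",
    ("INSTALLING", "failed"): "INSTALL_FAILED",
    ("INSTALLED", "start"): "STARTING",
    ("STARTING", "succeeded"): "STARTED",
    ("STARTING", "failed"): "INSTALLED",
    ("STARTED", "stop"): "STOPPING",
    ("STOPPING", "succeeded"): "INSTALLED",
}


class ComponentFSM:
    """Tracks the state of one component on one node."""

    def __init__(self, state="INIT"):
        self.state = state

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            # The caller is expected to abort the operation on this error.
            raise InvalidEvent(f"{event!r} is not valid in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state
```

For instance, a component in state INSTALLED accepts "start" but raises InvalidEvent on "stop", leaving its state unchanged.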

The Action Manager creates an action id for each operation and adds it to the Plan. It then picks the first Stage from the Plan and adds each action in this Stage to the queue for each of the affected nodes. The next Stage is picked when the first Stage completes. The Action Manager also starts a timer for scheduled actions.

The Heartbeat Handler receives the responses for the actions and notifies the Action Manager. The Action Manager sends an event to the FSM to update the state. In case of a timeout, the action is scheduled again or marked as failed. Once all nodes for an action have reached completion (response received or final timeout), the action is considered completed. Once all actions for a Stage are completed, the Stage is considered completed.

Action completion is also recorded in the database.

The Action Manager proceeds to the next Stage and repeats.
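
The stage-by-stage loop with retries on timeout can be summarised in a few lines. This is a rough sketch under assumptions: run_plan and send_action are hypothetical names, a timeout is modelled as send_action returning False, and stopping the plan after a failed action is a choice made for this example rather than something the description above specifies.

```python
def run_plan(stages, send_action, max_attempts=2):
    """Run stages strictly in order and return (log, success).

    send_action(action) returns True when a response arrived and
    False when the action timed out.
    """
    log = []
    for number, stage in enumerate(stages, start=1):
        for action in stage:
            for _attempt in range(max_attempts):
                if send_action(action):
                    log.append((number, action, "COMPLETED"))
                    break
            else:
                # Final timeout: mark as failed. Assumption for this sketch:
                # the remaining actions and stages are not started.
                log.append((number, action, "FAILED"))
                return log, False
        # All actions of this stage completed; move on to the next stage.
    return log, True
```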

Picture 3 – Architecture of Ambari Server

Setup

The guide below shows how to set up a cluster with Ambari on a local machine using virtual machines.

VMs Installation

You will need VirtualBox and Vagrant preinstalled.

Clone the “ambari-vagrant” repo from GitHub:

git clone https://github.com/u39kun/ambari-vagrant.git

Edit /etc/hosts on your computer so that you can resolve the hostnames of the VMs:

sudo sh -c 'cat ambari-vagrant/append-to-etc-hosts.txt >> /etc/hosts'

Make sure that the Vagrant private key exists, so that it is easily accessible for uploading via Ambari Web later. Running vagrant without arguments is enough:

vagrant

The above command shows the command usage and also creates a private key as ~/.vagrant.d/insecure_private_key. [Source 2]

Starting VMs

Change directory to ambari-vagrant.

cd ambari-vagrant

You will see subdirectories for different operating systems. cd into the one that you want to test. centos6.8 is recommended because it launches more quickly than the others. Now you can start the VMs with the following commands:

cd centos6.8
cp ~/.vagrant.d/insecure_private_key .
./up.sh <# of VMs to launch>

For example, ./up.sh 3 starts 3 VMs. With 16 GB of RAM, 3 VMs seems to be a good number that does not tax the system too much. With the default Vagrantfile, you can specify up to 10 (and you can add more if your computer can handle it).

VMs will have the FQDN <os-code>[01-10].ambari.apache.org, where <os-code> is c59 (CentOS 5.9), c68 (CentOS 6.8), etc. E.g., c5901.ambari.apache.org, c6801.ambari.apache.org, etc.

VMs will have the IP address 192.168.<os-subnet>.1[01-10], where <os-subnet> is 59 for CentOS 5.9, 68 for CentOS 6.8, etc. E.g., 192.168.59.101, 192.168.68.101, etc.
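
The naming scheme amounts to a little string arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that the OS code is one letter followed by the digits of the subnet, which holds for the CentOS codes mentioned above but may not hold for every OS directory in the repository.

```python
def vm_address(os_code, index, domain="ambari.apache.org"):
    """Return (FQDN, IP address) of VM number `index` for the given OS code."""
    if not 1 <= index <= 10:
        raise ValueError("the default Vagrantfile defines VMs 01 through 10")
    subnet = int(os_code[1:])               # "c68" -> 68 (assumed mapping)
    fqdn = f"{os_code}{index:02d}.{domain}"  # two-digit host number
    ip = f"192.168.{subnet}.{100 + index}"   # .101 through .110
    return fqdn, ip
```

For example, vm_address("c68", 1) gives the pair c6801.ambari.apache.org and 192.168.68.101.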

Testing Ambari

If it is your first time running a vagrant command, run:

vagrant init

Log into the VM:

vagrant ssh c6801

Note that this logs you in as user vagrant. Once you are logged in, you can run:

sudo su -

to make yourself root.

To install Ambari, you can build it yourself from source[Source 3], or you can use published binaries.

In this guide we will use publicly available binaries provided by Hortonworks, a commercial vendor for Hadoop. [Source 2]

# CentOS 6 (for CentOS 7, replace centos6 with centos7 in the repo URL)
# to test public release 2.5.1
wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.1.0/ambari.repo
yum install ambari-server -y

# Ubuntu 14 (for Ubuntu 16, replace ubuntu14 with ubuntu16 in the repo URL)
# to test public release 2.5.1
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.1.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y

# SUSE 11 (for SUSE 12, replace suse11 with suse12 in the repo URL)
# to test public release 2.5.1
wget -O /etc/zypp/repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.5.1.0/ambari.repo
zypper install ambari-server -y


Ambari offers many installation options (see Ambari User Guides), but to get up and running quickly with default settings, you can run the following to set up and start ambari-server.

ambari-server setup -s
ambari-server start

Once Ambari Server is started, open http://c6801.ambari.apache.org:8080 (the URL depends on the OS being tested) in a browser on your local computer. Note that Ambari Server can take some time to come up fully and become ready to accept connections. Keep retrying the URL until you get the login page. [Source 2]
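
Instead of reloading the page by hand, the wait can be scripted. The following is a minimal sketch that polls the URL until it answers with HTTP 200; the function names, the number of attempts, and the delay are assumptions chosen for illustration, and the probe and sleep parameters exist only so that the loop can be exercised without a real server.

```python
import time
import urllib.request


def server_is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection errors, HTTP errors and timeouts
        return False


def wait_for_server(url, attempts=30, delay=10, probe=server_is_up, sleep=time.sleep):
    """Poll the URL; return the attempt number that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return None
```

A call such as wait_for_server("http://c6801.ambari.apache.org:8080") would then block until the login page is reachable or the attempts are used up.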

Once you are at the login page, log in with the default username admin and password admin. On the Install Options page, use the FQDNs of the VMs. For example:

c6801.ambari.apache.org
c6802.ambari.apache.org
c6803.ambari.apache.org


Alternatively, you can use a range expression:

c68[01-03].ambari.apache.org
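
To make the meaning of such a range expression concrete, the sketch below expands it into the explicit host list. It is only an illustration of the notation: the function name is hypothetical, it handles a single [first-last] range, and Ambari's own parser may behave differently in edge cases.

```python
import re

# Matches one bracketed numeric range such as [01-03].
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand 'c68[01-03].example' into the list of individual host names."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present: a single host
    first, last = match.group(1), match.group(2)
    width = len(first)  # keep the zero padding of the first bound
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        prefix + str(number).zfill(width) + suffix
        for number in range(int(first), int(last) + 1)
    ]
```

Applied to c68[01-03].ambari.apache.org, it yields the same three host names as the explicit list shown earlier.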

Specify the non-root SSH user vagrant, and upload the insecure_private_key file that you copied earlier as the private key.

Follow the onscreen instructions to install your cluster.

When done testing, run this to purge the VMs.

vagrant destroy -f

References

  1. Apache Ambari // Website. URL: https://ambari.apache.org (Retrieved 19.12.2018)
  2. Apache Ambari Wiki // Website. URL: https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide (Retrieved 19.12.2018)
  3. Ambari Development // Website. URL: https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Development (Retrieved 19.12.2018)

External links