Hadoop 2.x Quick Notes :: Part - 2


Bhaskar S 12/25/2014


Overview

In Part-1, we laid out the steps to install, set up, and start up Hadoop 2.x on a single node (localhost).

In this part, we will lay out the steps to set up a 3-node Hadoop 2.x cluster.


Installation and Setup

We will install Hadoop 2.x on a 3-node Ubuntu 14.04 LTS based cluster.

To simulate a 3-node cluster, we used VirtualBox to create three virtual machines with identical specs (1 CPU, 4 GB RAM, 20 GB disk) running on an Ubuntu 14.04 host computer (with one Ethernet card).

CAUTION :: The only VirtualBox network setting that will work for this 3-node cluster setup is the Host-only Adapter option.

To use Host-only networking for a virtual machine, we first need to define a new Host-only network adapter by choosing the menu option File->Preferences->Network and then clicking on the Host-only Networks tab.

The following screenshot shows the Adapter entry under Host-only Networks:

[Screenshot: Adapter (Host-only Networks)]

The following screenshot shows the DHCP Server entry under Host-only Networks:

[Screenshot: DHCP Server (Host-only Networks)]

Assign the Host-only Adapter defined above to each virtual machine by choosing the menu option Machine->Settings->Network.

The following screenshot shows the network settings for one of our virtual machines:

[Screenshot: Network Settings]
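The same Host-only setup can, as far as we know, also be scripted with the VBoxManage command-line tool instead of the GUI dialogs. The sketch below only builds and prints the commands (a dry run); it does not execute them. Its values are assumptions, not read from the screenshots: the interface name vboxnet0, the host-side address 192.168.50.1/24, the use of NIC 1 on each VM, and the premise that the VirtualBox VM names match the host names vb-host-1 to vb-host-3.

```python
"""Dry-run sketch: build VBoxManage commands for Host-only networking.

Assumptions (not from the article's screenshots): interface name vboxnet0,
host-side address 192.168.50.1/24, NIC 1 used on every VM, and VirtualBox
VM names equal to the guest host names.
"""
import shlex

HOSTONLY_IF = "vboxnet0"   # assumed name of the Host-only interface
HOST_IP = "192.168.50.1"   # assumed host-side address on the 192.168.50.0/24 net
NETMASK = "255.255.255.0"
VMS = ["vb-host-1", "vb-host-2", "vb-host-3"]  # assumed VirtualBox VM names


def hostonly_commands(vms, ifname=HOSTONLY_IF, host_ip=HOST_IP, netmask=NETMASK):
    """Return VBoxManage argument lists; nothing is executed here."""
    cmds = [
        # Create a new Host-only interface (VirtualBox picks the next free vboxnetN).
        ["VBoxManage", "hostonlyif", "create"],
        # Give the host side of that interface a static address.
        ["VBoxManage", "hostonlyif", "ipconfig", ifname,
         "--ip", host_ip, "--netmask", netmask],
    ]
    for vm in vms:
        # Attach NIC 1 of each (powered-off) VM to the Host-only interface.
        cmds.append(["VBoxManage", "modifyvm", vm,
                     "--nic1", "hostonly", "--hostonlyadapter1", ifname])
    return cmds


for cmd in hostonly_commands(VMS):
    print(shlex.join(cmd))
```

Since `hostonlyif create` names the interface itself, vboxnet0 is only correct if no other Host-only interface exists yet; check its output before running the remaining commands.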

NOTE :: The host computer will be able to connect to the 3 virtual machines. The virtual machines will be able to connect to each other. The virtual machines will *NOT* be able to connect to the host computer or the internet.
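One way to confirm the connectivity described in the note above is a small TCP reachability check. This is a minimal sketch, assuming that SSH (port 22) is running on each virtual machine and that the host names resolve (for example, via /etc/hosts); both the port and the helper names are illustrative, not part of the original setup.

```python
"""Sketch: check TCP reachability of the cluster nodes.

Assumptions: sshd listens on port 22 on every VM, and the host names
resolve on the machine running this check (e.g. via /etc/hosts).
"""
import socket

NODES = ["vb-host-1", "vb-host-2", "vb-host-3"]
SSH_PORT = 22  # assumed: sshd on its default port


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, connect timeouts, and name-resolution errors.
        return False


def check_cluster(nodes=NODES, port=SSH_PORT, timeout=2.0):
    """Map each node name to its reachability on the given port."""
    return {node: is_reachable(node, port, timeout) for node in nodes}
```

Calling `check_cluster()` from the host computer or from any of the VMs should, per the note, report all three nodes as reachable; run from inside a VM, a check against the host computer's address would be expected to fail.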

We named the virtual machine nodes vb-host-1, vb-host-2, and vb-host-3.

In addition, we assigned a static IP address to each of the three virtual machines: 192.168.50.101 (vb-host-1), 192.168.50.102 (vb-host-2), and 192.168.50.103 (vb-host-3).

The following screenshot shows how we assigned a static IP address using the Ubuntu NetworkManager:

[Screenshot: Static IP Address for Virtual Host vb-host-1]

Next, we modified the /etc/hosts file on each of the three virtual machines (vb-host-1, vb-host-2, and vb-host-3) to add the host names and IP addresses.

The following screenshot shows the modified /etc/hosts file:

[Screenshot: /etc/hosts]

Make sure these are the only entries in the /etc/hosts file. Delete all other entries.
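Because the same entries must appear on all three nodes, generating them from a single host-to-IP map helps keep the files identical. The sketch below uses the host names and addresses from this article; the 192.168.50.0/24 subnet check and the function name are assumptions added for illustration. It emits only the three cluster entries, since the screenshot is not reproduced here.

```python
"""Sketch: build the cluster entries for /etc/hosts from one map.

Host names and addresses come from the article; the subnet check
(192.168.50.0/24) is an assumed sanity check.
"""
import ipaddress

CLUSTER = {
    "vb-host-1": "192.168.50.101",  # master
    "vb-host-2": "192.168.50.102",  # slave
    "vb-host-3": "192.168.50.103",  # slave
}
SUBNET = ipaddress.ip_network("192.168.50.0/24")  # assumed Host-only subnet


def hosts_file(cluster, subnet=SUBNET):
    """Return /etc/hosts text after checking the IPs are unique and in the subnet."""
    addrs = [ipaddress.ip_address(ip) for ip in cluster.values()]
    if len(set(addrs)) != len(addrs):
        raise ValueError("duplicate IP address in cluster map")
    for name, addr in zip(cluster, addrs):
        if addr not in subnet:
            raise ValueError(f"{name} ({addr}) is outside {subnet}")
    # One "IP<TAB>hostname" line per node, in the map's order.
    return "".join(f"{ip}\t{name}\n" for name, ip in cluster.items())


print(hosts_file(CLUSTER), end="")
```

Copying the same output to every node avoids one VM resolving a peer to a different address than the others do.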

In our 3-node cluster, we will designate vb-host-1 as the master node and have vb-host-2 and vb-host-3 as the slave nodes.
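In Hadoop 2.x, the master/slave split is typically recorded in the slaves file under $HADOOP_HOME/etc/hadoop: one worker host name per line, which start-dfs.sh and start-yarn.sh read to start the DataNode and NodeManager daemons over SSH. Whether the setup steps for this cluster edit that file in exactly this way is an assumption. The sketch below writes it from the role assignment above, and the helper names are hypothetical.

```python
"""Sketch: write the Hadoop 2.x slaves file for this cluster.

Assumed layout: conf_dir is $HADOOP_HOME/etc/hadoop. The master (vb-host-1)
is deliberately left out, so the start scripts run no worker daemons on it.
"""
from pathlib import Path

SLAVES = ["vb-host-2", "vb-host-3"]


def slaves_file_text(slaves):
    """Return the slaves file content: one host name per line."""
    return "".join(f"{host}\n" for host in slaves)


def write_slaves_file(conf_dir, slaves=SLAVES):
    """Write <conf_dir>/slaves and return its path."""
    path = Path(conf_dir) / "slaves"
    path.write_text(slaves_file_text(slaves))
    return path
```

On vb-host-1, a call such as `write_slaves_file("/path/to/hadoop/etc/hadoop")` (a placeholder path) would produce a two-line file listing vb-host-2 and vb-host-3.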

Following are the steps to install and setup Hadoop 2.x on our 3-node cluster:

This completes the installation, the necessary setup, and the start-up of our 3-node Hadoop 2.x cluster.

We also successfully used HDFS and ran the MapReduce example on our 3-node Hadoop 2.x cluster.

References

Hadoop 2.x Quick Notes :: Part - 1