Exploring Apache ZooKeeper :: Part-1


Bhaskar S 05/17/2014


Overview

Apache ZooKeeper is a robust, reliable, and resilient open-source coordination service that can be leveraged by distributed applications.

ZooKeeper exposes a simple set of primitives that can be used to build highly reliable, cluster-aware distributed applications.

ZooKeeper, being a distributed coordination service, is itself a distributed application: a set of servers (nodes) that together provide the coordination service. This set of ZooKeeper servers is called an ensemble.

The following Figure-1 is an illustration of an Ensemble:

ZooKeeper Ensemble
Figure-1

ZooKeeper allows the processes in a cluster-aware distributed application to coordinate with each other through a shared hierarchical namespace organized much like a standard file system. The namespace consists of data registers called znodes, each of which can hold a small amount of data as a byte array (up to 1 MB). Think of a znode as a directory in a standard file system that can also hold data of its own.

The following Figure-2 is an illustration of the znodes hierarchy:

ZooKeeper Ensemble
Figure-2

Each of the znodes in the ZooKeeper namespace is identified by a name, which is a sequence of path elements separated by a slash (/). From the above example, the znode S2 is identified by the name /T1/S2.
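
To make the naming scheme concrete, the following is a minimal Python sketch of a slash-separated namespace. It is a toy in-memory model, not the ZooKeeper implementation or its client API; the class names ZNode and Namespace and the sample payloads are assumptions made purely for illustration.

```python
class ZNode:
    """Toy model of a znode: a data payload plus named children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


class Namespace:
    """Toy hierarchical namespace addressed by slash-separated paths."""

    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        # Split "/T1/S2" into ["T1", "S2"] and descend from the root.
        node = self.root
        for part in [p for p in path.split("/") if p]:
            node = node.children[part]  # KeyError if the path does not exist
        return node

    def create(self, path, data=b""):
        # The parent is everything before the last slash.
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise ValueError("znode already exists: " + path)
        parent.children[name] = ZNode(data)

    def get_data(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)


ns = Namespace()
ns.create("/T1", b"parent payload")      # hypothetical payloads
ns.create("/T1/S2", b"child payload")
print(ns.get_children("/"))    # ['T1']
print(ns.get_children("/T1"))  # ['S2']
print(ns.get_data("/T1/S2"))   # b'child payload'
```

Note that /T1 holds data and has a child at the same time, which is where a znode differs from a plain directory.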

The znodes hierarchy (along with data) is stored in-memory within each of the ZooKeeper servers in the ensemble. This allows ZooKeeper to achieve high-throughput and low-latency and respond quickly to reads from the clients. Each ZooKeeper server also maintains a snapshot of the in-memory structure as well as a transaction log on the disk for persistent store.
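
The interplay between the snapshot and the transaction log can be pictured with the following simplified Python sketch. It assumes a hypothetical log made of (zxid, operation, path, data) tuples and a flat dictionary as the snapshot; the real on-disk formats and recovery logic are more involved.

```python
def recover(snapshot, snapshot_zxid, txn_log):
    """Rebuild in-memory state from a snapshot plus later log entries."""
    state = dict(snapshot)  # copy so the snapshot itself is left untouched
    for zxid, op, path, data in txn_log:
        if zxid <= snapshot_zxid:
            continue  # already reflected in the snapshot
        if op == "set":
            state[path] = data
        elif op == "delete":
            state.pop(path, None)
    return state


# Hypothetical snapshot taken after transaction 3, followed by later writes.
snapshot = {"/mytest": b"hello"}
txn_log = [
    (3, "set", "/mytest", b"hello"),
    (4, "set", "/mytest", b"world"),
    (5, "set", "/other", b"x"),
    (6, "delete", "/other", None),
]
state = recover(snapshot, 3, txn_log)
print(state)  # {'/mytest': b'world'}
```

The idea is only that a restarted server loads the most recent snapshot and then replays the transactions logged after it.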

Any update to the znode hierarchy (including data) is replicated amongst the servers in the ensemble.

ZooKeeper exposes a very simple set of primitive operations which are as follows:

Operation                 Description
create <path> <data>      Creates a znode at the location specified by <path>, containing the specified <data>
exists <path>             Checks whether a znode exists at the location specified by <path>
stat <path>               Returns the metadata associated with the znode specified by <path>. Metadata includes information such as creation time, last modification time, version number, etc.
getData <path>            Returns the data contained in the znode at the location specified by <path>
setData <path> <data>     Sets the data contained in the znode at the location specified by <path> to the specified <data>
getChildren <path>        Returns a list of all the children under the znode specified by <path>
delete <path>             Deletes the znode at the location specified by <path>

A distributed application (client) can combine the above-mentioned primitives to build higher-level distributed operations such as Configuration Management, Naming Service, Group Membership, etc.
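
As an illustration of how a higher-level operation can be composed from the primitives, the following Python sketch builds a tiny configuration-publishing helper. TinyStore is a dictionary-backed stand-in invented for this example (it ignores the hierarchy and is not the ZooKeeper client API), and the path /config/db_url and its values are hypothetical.

```python
class TinyStore:
    """Dictionary-backed stand-in exposing a few of the primitives."""

    def __init__(self):
        self._data = {}

    def exists(self, path):
        return path in self._data

    def create(self, path, data):
        if path in self._data:
            raise ValueError("znode already exists: " + path)
        self._data[path] = data

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError(path)
        self._data[path] = data

    def get_data(self, path):
        return self._data[path]


def publish_config(store, path, value):
    """Create the config znode on first use, overwrite it afterwards."""
    if store.exists(path):
        store.set_data(path, value)
    else:
        store.create(path, value)


store = TinyStore()
publish_config(store, "/config/db_url", b"db1:5432")  # first call creates
publish_config(store, "/config/db_url", b"db2:5432")  # second call updates
print(store.get_data("/config/db_url"))  # b'db2:5432'
```

Every process that reads /config/db_url sees the same value, which is the essence of configuration management. In a real deployment the exists-then-create sequence can race with other clients, so a client would also need to handle the "already exists" error.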

Client(s) can connect to the ensemble and issue requests for the above-mentioned primitives. At any given moment in time, a client is connected to only one of the servers in the ensemble. The client periodically checks to see if the server it is connected to is alive. If the client detects the server as dead, then the client automatically connects to a different server in the ensemble.
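
The failover behaviour can be sketched in Python as follows. The function pick_server and the is_alive callback are hypothetical names introduced for this illustration; the order in which the real client library tries servers may well differ.

```python
def pick_server(servers, is_alive, current=None):
    """Return the first live server, trying the ones after `current` first."""
    start = servers.index(current) + 1 if current in servers else 0
    rotated = servers[start:] + servers[:start]
    for server in rotated:
        if is_alive(server):
            return server
    raise ConnectionError("no live server in the ensemble")


servers = ["192.168.1.1:2181", "192.168.1.2:2181", "192.168.1.3:2181"]
down = {"192.168.1.1:2181"}  # pretend the first server is unreachable


def alive(server):
    return server not in down


current = pick_server(servers, alive)
print(current)  # 192.168.1.2:2181

down.add(current)  # the connected server dies
print(pick_server(servers, alive, current))  # 192.168.1.3:2181
```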

When a client issues a read request on a znode, the read request is serviced from the local replica of the server that the client is connected to.

On the other hand, when the client issues a write request on a znode, the actual write request is processed by the leader of the ensemble. When the ZooKeeper service is initialized, one server from the ensemble is elected as the leader (see Figure-1). The remaining servers are the followers. When a client issues a write request, the server the client is connected to passes the request on to the leader. The leader then issues the same write request to a quorum of servers, that is, a majority of floor(N/2) + 1 of the N servers in the ensemble. If the write request succeeds on the quorum of servers, it is considered successful.
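
The quorum arithmetic is easy to check with a few lines of Python. The function names below are made up for this sketch.

```python
def quorum_size(n):
    """Smallest majority of an n-server ensemble: floor(n / 2) + 1."""
    return n // 2 + 1


def tolerated_failures(n):
    """Servers that can fail while a majority is still reachable."""
    return n - quorum_size(n)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A 4-server ensemble tolerates only one failure, the same as a 3-server ensemble, which is why ensembles are usually sized with an odd number of servers.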

A znode can be created as either persistent or ephemeral. A persistent znode can only be deleted using the delete primitive. An ephemeral znode, on the other hand, is deleted automatically when the client that created it either crashes or closes its connection gracefully.
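
The difference between the two kinds of znode can be modelled with the following Python sketch. SessionStore is a toy class invented for this example; it only tracks which session created each ephemeral znode and is not how the server is actually implemented.

```python
class SessionStore:
    """Toy store that tracks which session owns each ephemeral znode."""

    def __init__(self):
        self.nodes = {}   # path -> data
        self.owners = {}  # path -> session id, ephemeral znodes only

    def create(self, path, data, session_id=None):
        # Passing a session id marks the znode as ephemeral.
        self.nodes[path] = data
        if session_id is not None:
            self.owners[path] = session_id

    def close_session(self, session_id):
        # Remove every ephemeral znode created by the closing session.
        for path in [p for p, s in self.owners.items() if s == session_id]:
            del self.nodes[path]
            del self.owners[path]


store = SessionStore()
store.create("/mytest", b"")                # persistent
store.create("/mytemp", b"", session_id=1)  # ephemeral, owned by session 1
store.close_session(1)
print(sorted(store.nodes))  # ['/mytest']
```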

A znode maintains metadata just like the standard filesystem. The metadata includes information such as when the znode was created, when the znode was last modified, length of data in the znode, version number of the data in the znode, etc.

The version number is incremented each time the data in the znode is modified.
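
The version counter is what makes safe read-modify-write cycles possible: a writer can state which version it last saw, and the write is rejected if someone else got there first. The following Python sketch is a simplified model of that check; the class VersionedValue and its method names are assumptions for illustration only.

```python
class VersionedValue:
    """Toy znode payload whose version grows by one on every write."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version=None):
        # Reject the write if the caller saw an older version.
        if expected_version is not None and expected_version != self.version:
            raise ValueError("version mismatch")
        self.data = data
        self.version += 1
        return self.version


node = VersionedValue(b"")
print(node.set_data(b"hello"))                      # 1
print(node.set_data(b"world", expected_version=1))  # 2
try:
    node.set_data(b"stale", expected_version=1)     # writer saw version 1
except ValueError as err:
    print("rejected:", err)  # rejected: version mismatch
```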

A client can register for notification of changes to a znode. This is referred to as a watch. A watch is a one-time trigger event that is sent to the client that set it. The trigger event occurs when the data corresponding to the znode changes.
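
The one-time nature of a watch is worth seeing in isolation. The following Python sketch is a toy model (the class WatchedNode is invented for this example) in which a watch is registered on a read and discarded as soon as it fires.

```python
class WatchedNode:
    """Toy znode whose watches fire once and are then discarded."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        # Reading with a watcher registers a one-time notification.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # Take the pending watchers and clear the list before notifying.
        pending, self._watchers = self._watchers, []
        for watcher in pending:
            watcher("NodeDataChanged")


events = []
node = WatchedNode(b"hello")
node.get_data(watcher=events.append)
node.set_data(b"world")  # fires the watch
node.set_data(b"again")  # no watch is registered any more
print(events)  # ['NodeDataChanged']
```

A client that wants to keep observing a znode has to set a new watch each time one fires.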

Installation and Setup On a 3-node Cluster

Ensure there are at least 3 Ubuntu 64-bit based Linux nodes in the cluster. Let us assume the IP addresses of the 3 nodes are 192.168.1.1, 192.168.1.2, and 192.168.1.3 respectively.

Download the latest stable version (3.4.6 as of 05/10/2014) of Apache ZooKeeper from the project site located at the URL zookeeper.apache.org/releases.html

Following are the steps to install and set up Apache ZooKeeper on an Ubuntu 64-bit based Linux workstation:

Do not worry about the exceptions in Output.1 above; they come from the Leader Election process as we start the ZooKeeper server on each of the 3 nodes one by one.

Hands-on with Apache ZooKeeper

Apache ZooKeeper has a command-line client $ZOOKEEPER_HOME/bin/zkCli.sh, which behaves like an interactive shell. In the following paragraphs we will explore some of the primitives of ZooKeeper.

To launch the interactive ZooKeeper client, execute the following command:

$ZOOKEEPER_HOME/bin/zkCli.sh -server 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181

The following will be the output:

Output.2

Connecting to 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
2014-05-09 22:57:17,975 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2014-05-09 22:57:17,978 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=my-host
2014-05-09 22:57:17,979 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_05
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-8-oracle/jre
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/home/zkuser/zookeeper/bin/../build/classes:/home/zkuser/zookeeper/bin/../build/lib/*.jar:/home/zkuser/zookeeper/bin/../lib/zookeeper-3.4.6.jar:/home/zkuser/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/zkuser/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/home/zkuser/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/home/zkuser/zookeeper/bin/../lib/log4j-1.2.16.jar:/home/zkuser/zookeeper/bin/../lib/jline-0.9.94.jar:/home/zkuser/zookeeper/bin/../zookeeper-*.jar:/home/zkuser/zookeeper/bin/../src/java/lib/*.jar:/home/zkuser/zookeeper/bin/../conf:
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2014-05-09 22:57:17,981 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.13.0-24-generic
2014-05-09 22:57:17,982 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=zkuser
2014-05-09 22:57:17,982 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/zkuser
2014-05-09 22:57:17,982 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/zkuser/zookeeper
2014-05-09 22:57:17,983 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@506c589e
Welcome to ZooKeeper!
2014-05-09 22:57:18,009 [myid:] - INFO  [main-SendThread(192.168.1.1:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 192.168.1.1/192.168.1.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2014-05-09 22:57:18,013 [myid:] - INFO  [main-SendThread(192.168.1.1:2181):ClientCnxn$SendThread@852] - Socket connection established to 192.168.1.1/192.168.1.1:2181, initiating session
[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTING) 0] 2014-05-09 22:57:18,043 [myid:] - INFO  [main-SendThread(192.168.1.1:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 192.168.1.1/192.168.1.1:2181, sessionid = 0x245e7dfa13a0001, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] 

To list all the znodes in ZooKeeper hierarchical namespace, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] ls /

The following will be the output:

Output.3

[zookeeper]

The above output indicates that there is a default znode called zookeeper.

To create a persistent znode named mytest with no data, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] create /mytest ""

The following will be the output:

Output.4

Created /mytest

To again list all the znodes in ZooKeeper, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] ls /

The following will be the output:

Output.5

[mytest, zookeeper]

To create an ephemeral znode named mytemp with no data, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] create -e /mytemp ""

The following will be the output:

Output.6

Created /mytemp

To again list all the znodes in ZooKeeper, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] ls /

The following will be the output:

Output.7

[mytemp, mytest, zookeeper]

To close the client connection to ZooKeeper and quit the interactive shell, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] quit

The following will be the output:

Output.8

Quitting...
2014-05-09 22:55:42,255 [myid:] - INFO  [main:ZooKeeper@684] - Session: 0x245e7dfa13a0000 closed
2014-05-09 22:55:42,255 [myid:] - INFO  [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down

Now, re-launch the interactive ZooKeeper client by executing the following command:

$ZOOKEEPER_HOME/bin/zkCli.sh -server 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181

And issue the command to list all the znodes in ZooKeeper by executing the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] ls /

The following will be the output:

Output.9

[mytest, zookeeper]

As expected, the ephemeral znode with the path /mytemp has been automatically deleted.

To fetch the data stored in the znode named /mytest, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] get /mytest

The following will be the output:

Output.10

""
cZxid = 0x100000004
ctime = Sat May 10 22:52:50 EDT 2014
mZxid = 0x100000004
mtime = Sat May 10 22:52:50 EDT 2014
pZxid = 0x100000004
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2
numChildren = 0

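The first line, with the empty double quotes, is the data. Remember we created the znode named /mytest with "" as the data. Note that dataLength is 2 and not 0: the two double-quote characters themselves were stored as the data, so the znode is not truly empty.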

The get command also displays the metadata information.

To display the metadata associated with the znode named /mytest, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] stat /mytest

The following will be the output:

Output.11

cZxid = 0x100000004
ctime = Sat May 10 22:52:50 EDT 2014
mZxid = 0x100000004
mtime = Sat May 10 22:52:50 EDT 2014
pZxid = 0x100000004
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2
numChildren = 0
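
A couple of the metadata fields deserve a closer look. A zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch, and an ephemeralOwner of 0x0 means the znode is persistent. The following Python sketch decodes a few lines of the output above; the helper names parse_stat and split_zxid are made up for this illustration and are not part of any ZooKeeper tool.

```python
def parse_stat(text):
    """Turn 'name = value' lines into a dictionary of strings."""
    fields = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(" = ")
        fields[name.strip()] = value.strip()
    return fields


def split_zxid(zxid_hex):
    """Split a 64-bit zxid into (epoch, counter): high and low 32 bits."""
    zxid = int(zxid_hex, 16)
    return zxid >> 32, zxid & 0xFFFFFFFF


sample = """
cZxid = 0x100000004
dataVersion = 0
ephemeralOwner = 0x0
dataLength = 2
"""
stat = parse_stat(sample)
print(split_zxid(stat["cZxid"]))             # (1, 4)
print(int(stat["ephemeralOwner"], 16) == 0)  # True, so the znode is persistent
```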

To set the data stored in the znode named /mytest, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] set /mytest "hello"

The following will be the output:

Output.12

cZxid = 0x100000004
ctime = Sat May 10 22:52:50 EDT 2014
mZxid = 0x100000008
mtime = Sat May 10 23:05:48 EDT 2014
pZxid = 0x100000004
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0

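As can be observed from Output.12, the version number of the data at the znode named /mytest has been incremented to 1. The dataLength is 7 rather than 5 because the surrounding double quotes are stored as part of the data.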

To add a watch on the znode named /mytest, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] stat /mytest true

The following will be the output:

Output.13

cZxid = 0x100000004
ctime = Sat May 10 22:52:50 EDT 2014
mZxid = 0x100000008
mtime = Sat May 10 23:05:48 EDT 2014
pZxid = 0x100000004
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0

To set a watch on a path, we can either use the stat command or the get command.

Now, let us again change (or modify) the data stored in the znode named /mytest by executing the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] set /mytest "world"

The following will be the output:

Output.14

WATCHER::
WatchedEvent state:SyncConnected type:NodeDataChanged path:/mytest

cZxid = 0x100000004
ctime = Sat May 10 22:52:50 EDT 2014
mZxid = 0x100000008
mtime = Sat May 10 23:06:37 EDT 2014
pZxid = 0x100000004
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0

Notice the WatchedEvent of type NodeDataChanged that fired in Output.14 above.

Finally, to delete the znode named /mytest, execute the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] delete /mytest

There will be no output.

Now issue the command to list all the znodes in ZooKeeper by executing the following command:

[zk: 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181(CONNECTED) 0] ls /

The following will be the output:

Output.15

[zookeeper]