Monday, January 7, 2013

Deploying a ZooKeeper Ensemble


A brief introduction to ZooKeeper: a distributed coordination service for distributed applications

ZooKeeper exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming.
Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.

Use Cases:

ZooKeeper is becoming popular with distributed applications such as Hadoop, Flume, and HBase.
For any distributed application, ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Each time an application implements these services itself, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because such services are difficult to implement, applications usually skimp on them at first, which makes them brittle in the presence of change and difficult to manage.
Once ZooKeeper has been deployed, it can be shared across any number of distributed applications for all of these purposes.
If large data storage is needed, the usual pattern is to store the data on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.

ZOOKEEPER ENSEMBLE:

Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called a ZooKeeper ensemble.
In an ensemble, every ZooKeeper server must know about all the other servers. Each server maintains an in-memory image of the state, along with transaction logs and snapshots in a persistent store.
Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.
ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use the order to implement higher-level abstractions, such as synchronization primitives (locks, for example).

DEPLOYING A ZOOKEEPER ENSEMBLE ON A SINGLE MACHINE:

After this brief look at the ensemble architecture, we are ready to deploy one. The steps below set up a three-server cluster; adjust them as the number of servers increases or decreases.
Step 1: Download ZooKeeper
              Download a stable ZooKeeper release.
$ sudo wget  http://mirror.cloudera.com/apache/hadoop/zookeeper/zookeeper-3.3.1/zookeeper-3.3.1.tar.gz

Step 2: Unpack the download

              Unpack it in three places and rename the resulting directories to:

/usr/local/zookeeper1
/usr/local/zookeeper2
/usr/local/zookeeper3

              Then cd into the root folder of whichever server you are configuring.
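The unpacking step can be scripted. The sketch below is only one way to do it: unpack_three is a helper name I made up, and instead of unpacking and then renaming, it uses tar's --strip-components option (available in GNU tar and bsdtar) to drop the top-level zookeeper-3.3.1 folder while extracting. The result should be the same three directories.

```shell
# Unpack a ZooKeeper tarball three times, once per server directory.
# unpack_three is a hypothetical helper; "prefix" is where the
# zookeeper1..zookeeper3 directories are created (/usr/local in this article).
unpack_three() {
    tarball=$1
    prefix=$2
    for i in 1 2 3; do
        mkdir -p "$prefix/zookeeper$i" || return 1
        # --strip-components=1 removes the leading zookeeper-3.3.1/ path
        # element, so bin/ and conf/ land directly under zookeeperN/.
        tar -xzf "$tarball" -C "$prefix/zookeeper$i" --strip-components=1 || return 1
    done
}
```

To write under /usr/local you would most likely need root, for example by saving the function in a script and running that script with sudo, passing zookeeper-3.3.1.tar.gz and /usr/local as arguments.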

Step 3: Create the configuration files

              To start ZooKeeper you need a configuration file.
              Create it in conf/zoo.cfg. Here is a sample for each of the three servers.
            server.1: /usr/local/zookeeper1/conf/zoo.cfg
         tickTime=2000
         initLimit=10
         syncLimit=5
         dataDir=/var/zookeeper1
         clientPort=2184
         server.1=localhost:2888:3888
         server.2=localhost:2889:3889
         server.3=localhost:2890:3890
            server.2: /usr/local/zookeeper2/conf/zoo.cfg
         tickTime=2000
         initLimit=10
         syncLimit=5
         dataDir=/var/zookeeper2
         clientPort=2185
         server.1=localhost:2888:3888
         server.2=localhost:2889:3889
         server.3=localhost:2890:3890
            server.3: /usr/local/zookeeper3/conf/zoo.cfg
         tickTime=2000
         initLimit=10
         syncLimit=5
         dataDir=/var/zookeeper3
         clientPort=2186
         server.1=localhost:2888:3888
         server.2=localhost:2889:3889
         server.3=localhost:2890:3890
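The three files differ only in dataDir and clientPort, so they can be generated in a loop. This is a minimal sketch, not an official tool: write_configs is a made-up helper, and its two arguments stand in for /usr/local and /var from the samples above.

```shell
# Write conf/zoo.cfg for servers 1..3 of the single-machine ensemble.
# write_configs is a hypothetical helper; install_prefix and data_prefix
# stand in for /usr/local and /var from the samples above.
write_configs() {
    install_prefix=$1
    data_prefix=$2
    for i in 1 2 3; do
        mkdir -p "$install_prefix/zookeeper$i/conf" || return 1
        # Only dataDir and clientPort change per server; the server.N
        # lines must be identical in all three files.
        cat > "$install_prefix/zookeeper$i/conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_prefix/zookeeper$i
clientPort=$((2183 + i))
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
EOF
    done
}
```

Calling write_configs /usr/local /var (with enough privileges to write there) should reproduce the three samples, with client ports 2184, 2185 and 2186.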
Here is a brief description of each configuration parameter:
dataDir: the location where ZooKeeper stores the snapshots of its in-memory database and, unless specified otherwise, the transaction log of updates to the database.
clientPort: the port on which the server listens for client connections.
tickTime: the basic time unit, in milliseconds. Both of the timeouts below are expressed as a number of ticks.
initLimit: the timeout, in ticks, that limits how long the ZooKeeper servers in the quorum have to connect to a leader.
syncLimit: the timeout, in ticks, that limits how far out of date a server can be from a leader.
server.X=hostname:nnnn:mmmm: these lines list the servers that make up the ZooKeeper service, where X is the id of each server.
nnnn: the quorum port, which followers use to connect to the leader
mmmm: the leader election port
Note: If you want to test multiple servers on a single machine, specify the server name as localhost with unique quorum and leader election ports for each server.X line in the config file. Separate dataDirs and distinct clientPorts are of course also necessary.
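Since initLimit and syncLimit are counted in ticks, it helps to work out what the sample values mean in real time. The small sketch below does the multiplication; ticks_to_ms is just an illustrative name.

```shell
# Convert a limit expressed in ticks into milliseconds.
# ticks_to_ms is an illustrative helper, not part of ZooKeeper.
ticks_to_ms() {
    echo $(( $1 * $2 ))
}

# Values from the sample zoo.cfg files in this article.
tickTime=2000
initLimit=10
syncLimit=5

echo "initLimit: $(ticks_to_ms "$initLimit" "$tickTime") ms"   # 10 * 2000 = 20000 ms
echo "syncLimit: $(ticks_to_ms "$syncLimit" "$tickTime") ms"   # 5 * 2000 = 10000 ms
```

So with these settings a server has 20 seconds to connect to a leader, and may lag the leader by at most 10 seconds.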

Step 4: Define the server id
You may be asking, "How does a server know its id?"
When a server starts up, it finds out which server it is by looking for the file myid in its data directory (dataDir). That file contains the server number, in ASCII.
To set this up, create a file named myid in the dataDir directory given in the configuration and put that server's id in it.
For this cluster, create a myid file in three places as shown:
  /var/zookeeper1/myid          contains    1
  /var/zookeeper2/myid          contains    2
  /var/zookeeper3/myid          contains    3
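Creating the three myid files is a natural fit for a loop. A minimal sketch, assuming the data directories are named zookeeper1 to zookeeper3 under one parent directory as in this article; write_myids is a made-up helper name.

```shell
# Create dataDir/myid for servers 1..3.
# write_myids is a hypothetical helper; data_prefix stands in for /var.
write_myids() {
    data_prefix=$1
    for i in 1 2 3; do
        mkdir -p "$data_prefix/zookeeper$i" || return 1
        # The file holds nothing but the server number as plain text.
        echo "$i" > "$data_prefix/zookeeper$i/myid" || return 1
    done
}
```

For the layout used here you would call write_myids /var, again with enough privileges to write under /var.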

Step 5: Run the ZooKeeper ensemble
Now that you have created the configuration files, you can start the ZooKeeper servers.
Open three terminals:
            At Terminal 1:
$ cd /usr/local/zookeeper1/
$ bin/zkServer.sh start

           At Terminal 2:
$ cd /usr/local/zookeeper2/
$ bin/zkServer.sh start

          At Terminal 3:
$ cd /usr/local/zookeeper3/
$ bin/zkServer.sh start

              Now the ZooKeeper ensemble is running.
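If you do not want to juggle three terminals, the same zkServer.sh command can be run for each install from one shell. The helper below is a sketch of my own (zk_all is not part of ZooKeeper); it assumes the zookeeper1 to zookeeper3 directory layout from Step 2 and simply passes an action such as start, stop or status to each server's script.

```shell
# Run the same zkServer.sh action in each of the three installs.
# zk_all is a hypothetical helper; install_prefix stands in for /usr/local.
zk_all() {
    install_prefix=$1
    action=$2
    for i in 1 2 3; do
        echo "== zookeeper$i: $action =="
        # Run from the install root, as in the terminal steps above.
        # The subshell keeps the cd from affecting the caller.
        (cd "$install_prefix/zookeeper$i" && bin/zkServer.sh "$action") || return 1
    done
}
```

For example, zk_all /usr/local start brings the ensemble up, and zk_all /usr/local status asks each server to report its state. The loop stops at the first server whose script exits with an error, so a status check run before the servers have found each other may stop early.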

Step 6: Connect to ZooKeeper

Once ZooKeeper is running, you have several options for connecting to it:
·         Java: Use
     $ cd <path_to_zookeeper_root>
     $ bin/zkCli.sh -server <hostname>:<clientPort>

For example:
     $ cd /usr/local/zookeeper1
     $ bin/zkCli.sh -server localhost:2184

This lets you perform simple, file-like operations.
·         C: compile cli_mt (multi-threaded) or cli_st (single-threaded) by running make cli_mt or make cli_st in the src/c subdirectory of the ZooKeeper sources.
You can run the program from src/c using:
   LD_LIBRARY_PATH=. cli_mt localhost:2184
or
   LD_LIBRARY_PATH=. cli_st localhost:2184
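As mentioned in the ensemble overview, a client whose connection breaks will connect to a different server. For that it needs to know about more than one server, which as far as I know is done by giving it a comma-separated list of host:port pairs instead of a single one. The sketch below builds such a list; zk_connect_string is a made-up helper name.

```shell
# Build a comma-separated ZooKeeper connect string from one host and a
# list of client ports. zk_connect_string is an illustrative helper.
zk_connect_string() {
    host=$1
    shift
    result=""
    for port in "$@"; do
        # Append "host:port", adding a comma only between entries.
        result="${result:+$result,}$host:$port"
    done
    printf '%s\n' "$result"
}

# Example with the three client ports configured in this article:
#   bin/zkCli.sh -server "$(zk_connect_string localhost 2184 2185 2186)"
```

With the ports from Step 3 this prints localhost:2184,localhost:2185,localhost:2186.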


DEPLOYING A ZOOKEEPER ENSEMBLE ACROSS A NETWORK:

Step 1: Download ZooKeeper

            Download and unpack a stable ZooKeeper release on each machine.
       $ cd /usr/local
       $ sudo wget http://mirror.cloudera.com/apache/hadoop/zookeeper/zookeeper-3.3.1/zookeeper-3.3.1.tar.gz
       $ sudo tar -zxvf zookeeper-3.3.1.tar.gz
       $ cd zookeeper-3.3.1

Step 2: Create the configuration

   To start ZooKeeper you need a configuration file. Create it in conf/zoo.cfg on each server.
    Here is a sample for these three servers; the file is the same on every machine.
   /usr/local/zookeeper-3.3.1/conf/zoo.cfg
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/zookeeper
    clientPort=2181
    server.1=192.168.145.110:2888:3888
    server.2=192.168.145.111:2888:3888
    server.3=192.168.145.112:2888:3888

Step 3: Define the server id

Create a file named myid in the dataDir directory given in the configuration and put that server's id in it.
For this cluster, create a myid file on each of the three machines as shown:
At   192.168.145.110  /var/zookeeper/myid  contains    1
At   192.168.145.111  /var/zookeeper/myid  contains    2
At   192.168.145.112  /var/zookeeper/myid  contains    3
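The id that goes into myid has to match the server.X line for that machine in zoo.cfg, and it is easy to mix them up by hand. As a sketch under that assumption, the helper below (myid_for_host, a name of my own) looks up a host in the server.X lines and prints its id. It expects the lines to be written without spaces around the equals sign.

```shell
# Print the id X of the server.X line in a zoo.cfg that names the given host.
# myid_for_host is a hypothetical helper; prints nothing if no line matches.
myid_for_host() {
    cfg=$1
    host=$2
    awk -v h="$host" '
        /^server\.[0-9]+=/ {
            split($0, kv, "=")        # kv[1] = "server.X", kv[2] = "host:nnnn:mmmm"
            split(kv[2], addr, ":")   # addr[1] = host
            if (addr[1] == h) {
                sub(/^server\./, "", kv[1])   # keep only the id
                print kv[1]
                exit
            }
        }' "$cfg"
}
```

On 192.168.145.111, for instance, myid_for_host /usr/local/zookeeper-3.3.1/conf/zoo.cfg 192.168.145.111 prints 2, which is the value to put in /var/zookeeper/myid on that machine.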

Step 4: Run the ZooKeeper ensemble

Now that you have created the configuration file, you can start the ZooKeeper servers from a terminal.
     On each machine:
  $ cd /usr/local/zookeeper-3.3.1/
  $ bin/zkServer.sh start

Now the ZooKeeper ensemble is running.

Step 5: Connect to ZooKeeper

Once ZooKeeper is running, you have several options for connecting to it:
·         Java: Use

      $ cd <path_to_zookeeper_root>
      $ bin/zkCli.sh -server <hostname>:<clientPort>
For example, to reach the first server of this cluster:
      $ cd /usr/local/zookeeper-3.3.1
      $ bin/zkCli.sh -server 192.168.145.110:2181
This lets you perform simple, file-like operations.
·         C: compile cli_mt (multi-threaded) or cli_st (single-threaded) by running make cli_mt or make cli_st in the src/c subdirectory of the ZooKeeper sources.
You can run the program from src/c using:
      LD_LIBRARY_PATH=. cli_mt 192.168.145.110:2181
or
      LD_LIBRARY_PATH=. cli_st 192.168.145.110:2181

Hopefully this article helps you out. I have tried to keep it simple.
