Tuesday 3 January 2017

Fencing Method for ZK based HA in Hadoop

While enabling ZK quorum-based HA via the ZKFC service, you may observe that the default fencing method (i.e. sshfence) alone is not sufficient. The relevant scenarios and fencing methods are discussed below:

dfs.ha.fencing.methods - a list of scripts or Java classes which will be used to fence the Active NameNode during a failover

The fencing methods used during a failover are configured as a carriage-return-separated list, which will be attempted in order until one indicates that fencing has succeeded. There are two methods which ship with Hadoop: shell and sshfence. For information on implementing your own custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.

sshfence - SSH to the Active NameNode and kill the process

The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service’s TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Thus, one must also configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files. For example:
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>

    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/exampleuser/.ssh/id_rsa</value>
    </property>
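
For reference, sshfence does roughly the following on the target node (the user, host and key below are illustrative, and 8020 is the default NameNode RPC port; adjust them to your setup):

    # Rough equivalent of what sshfence performs during a failover:
    # log in to the active NameNode host and kill whatever process is
    # listening on the NameNode RPC port.
    ssh -i /home/exampleuser/.ssh/id_rsa exampleuser@active-nn-host \
        'fuser -v -k -n tcp 8020'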


shell - run an arbitrary shell command to fence the Active NameNode

The shell fencing method runs an arbitrary shell command. It may be configured like so:
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
    </property>
The string between ‘(’ and ‘)’ is passed directly to a bash shell and may not include any closing parentheses.

If the shell command returns an exit code of 0, the fencing is determined to be successful. If it returns any other exit code, the fencing was not successful and the next fencing method in the list will be attempted.
Note: This fencing method does not implement any timeout. If timeouts are necessary, they should be implemented in the shell script itself (e.g. by forking a subshell to kill its parent after some number of seconds). A rough sketch of such a script is shown below.
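
As a sketch only: the script name my-fence.sh, the IPMI address scheme and the credentials below are hypothetical, and the actual fencing action will depend on your environment. The point is the exit codes and the self-imposed timeout.

    #!/bin/bash
    # my-fence.sh <target-host> -- hypothetical fencing script with its own timeout.
    TARGET_HOST="$1"
    TIMEOUT_SECS=30

    # Watchdog: fork a subshell that kills this script if it runs too long,
    # because the shell fencing method itself never times out.
    ( sleep "$TIMEOUT_SECS" && kill -9 $$ ) &
    WATCHDOG=$!

    # Illustrative fencing action: power the node off via its IPMI interface.
    if ipmitool -H "${TARGET_HOST}-ipmi" -U admin -P secret chassis power off; then
        kill "$WATCHDOG" 2>/dev/null
        exit 0      # exit code 0 => ZKFC treats fencing as successful
    fi

    kill "$WATCHDOG" 2>/dev/null
    exit 1          # non-zero exit code => ZKFC tries the next fencing method

It would then be referenced as shell(/path/to/my-fence.sh <target-host>) in dfs.ha.fencing.methods.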

Scenario #1
If the NameNode process is killed or the Active NameNode is rebooted:
  The ZKFC on the standby node can transition the standby NameNode to the 'active' state (using sshfence).
  SSH works fine, which allows the active NameNode to be fenced.

Scenario #2
If the NameNode host is shut down, its network cable is unplugged, or it suffers a major hardware/power failure:
    SSH to the host is not possible, so sshfence fails and fencing of the active NameNode fails.
    As a result, the failover never completes and HA does not work.
  
Best Approach
To avoid Scenario #2, configure both fencing methods; ZKFC will try them in order, so if the first one fails, the second one will be tried.
If sshfence fails, shell(/bin/true) always succeeds, so the active NameNode is considered fenced and the failover can proceed.

Example:

    <value>sshfence
           shell(/bin/true)</value>

Property TAG:

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence
             shell(/bin/true)</value>
    </property>

I tested this approach and it worked fine, especially when the primary NameNode host went down due to a major hardware/power failure as described in Scenario #2.
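
A simple way to verify the behaviour (nn1 and nn2 are assumed service IDs; use the ones from your dfs.ha.namenodes setting):

    # Check NameNode states before the test.
    hdfs haadmin -getServiceState nn1     # expect: active
    hdfs haadmin -getServiceState nn2     # expect: standby

    # Simulate Scenario #2 by powering off the active NameNode host, then:
    hdfs haadmin -getServiceState nn2     # expect: active after failover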
