Saturday, 20 December 2014

LZO Compression in Hadoop and HBase


LZO is licensed under the GPL, which is incompatible with Hadoop's Apache licence, so LZO support is not bundled with Hadoop and must be installed separately on the cluster to enable LZO compression in Hadoop and HBase. LZO-compressed files are splittable once they have been indexed, and the codec offers fast compression and decompression.
Perform the following steps to enable LZO compression in Hadoop and HBase:
1.       Install the LZO development packages:

 sudo yum install lzo lzo-devel

2.       Download the hadoop-lzo release (0.4.17 in this example) using the command below:

 wget https://github.com/twitter/hadoop-lzo/archive/release-0.4.17.zip

3.       Unzip the downloaded bundle:

 unzip release-0.4.17.zip

4.       Change the current directory to the extracted folder:

cd hadoop-lzo-release-0.4.17

5.       Run the following command to build the native libraries:

 ant compile-native

6.       Copy the generated jar and native libraries to Hadoop and HBase lib directories.

 cp build/hadoop-lzo-0.4.17.jar $HADOOP_HOME/lib/
 cp build/hadoop-lzo-0.4.17.jar $HBASE_HOME/lib/
 cp build/hadoop-lzo-0.4.17/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/
 cp build/hadoop-lzo-0.4.17/lib/native/Linux-amd64-64/* $HBASE_HOME/lib/native/
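
Before moving on, it can help to confirm that the copy in step 6 actually placed both the jar and the native library where Hadoop and HBase will look for them. The following is a minimal sketch, not part of hadoop-lzo: check_lzo_install is a hypothetical helper name, and it assumes the native library file name starts with libgplcompression.

```shell
# Hypothetical helper: checks one home directory (Hadoop or HBase) for the
# hadoop-lzo jar and the native library copied in step 6.
check_lzo_install() {
    home="$1"
    # The jar copied into lib/
    ls "$home"/lib/hadoop-lzo-*.jar >/dev/null 2>&1 \
        || { echo "missing jar in $home/lib"; return 1; }
    # The native library copied into lib/native/ (assumed name prefix)
    ls "$home"/lib/native/libgplcompression* >/dev/null 2>&1 \
        || { echo "missing native library in $home/lib/native"; return 1; }
    echo "ok: $home"
}
```

It could be called once per installation, for example with "$HADOOP_HOME" and then with "$HBASE_HOME" as the argument.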

7.       Add the following properties to the core-site.xml file of Hadoop:


 <property>
   <name>io.compression.codecs</name>
   <value>
     org.apache.hadoop.io.compress.DefaultCodec,
     org.apache.hadoop.io.compress.GzipCodec,
     org.apache.hadoop.io.compress.BZip2Codec,
     org.apache.hadoop.io.compress.DeflateCodec,
     org.apache.hadoop.io.compress.SnappyCodec,
     org.apache.hadoop.io.compress.Lz4Codec,
     com.hadoop.compression.lzo.LzoCodec,
     com.hadoop.compression.lzo.LzopCodec
   </value>
 </property>
 <property>
   <name>io.compression.codec.lzo.class</name>
   <value>com.hadoop.compression.lzo.LzoCodec</value>
 </property>
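
A typo in core-site.xml is easy to miss, so a quick text check of the edited file can be useful. The sketch below only greps for the property names and codec class shown above; check_core_site is a hypothetical helper, and it does not validate the XML itself.

```shell
# Hypothetical helper: reports the first expected LZO entry that is missing
# from the given core-site.xml, or confirms that all of them are present.
check_core_site() {
    conf="$1"
    for needle in io.compression.codecs \
                  io.compression.codec.lzo.class \
                  com.hadoop.compression.lzo.LzoCodec; do
        # -F treats the dots literally instead of as regex wildcards
        grep -qF "$needle" "$conf" || { echo "not found: $needle"; return 1; }
    done
    echo "LZO codec entries present"
}
```

The argument is the path to core-site.xml; on a Hadoop 2.x layout that is probably $HADOOP_HOME/etc/hadoop/core-site.xml, but that location is an assumption and may differ on your installation.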


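8.       Sync the Hadoop and HBase home directories to all nodes of the Hadoop and HBase clusters. rsync accepts only one destination per invocation, so run it once per node:

 rsync -a $HADOOP_HOME/ node1:$HADOOP_HOME/
 rsync -a $HADOOP_HOME/ node2:$HADOOP_HOME/
 rsync -a $HBASE_HOME/ node1:$HBASE_HOME/
 rsync -a $HBASE_HOME/ node2:$HBASE_HOME/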
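
On clusters with more than a couple of nodes, a small loop avoids repeating the sync command by hand. This is a sketch under assumptions: sync_cmds is a hypothetical helper, the node names are placeholders, and it only prints the rsync commands so they can be reviewed before being run.

```shell
# Hypothetical helper: prints one rsync command per node for a directory.
# Usage: sync_cmds DIR NODE...
sync_cmds() {
    dir="$1"; shift
    for node in "$@"; do
        # -a keeps permissions and timestamps, -z compresses data in transit
        echo "rsync -az $dir/ $node:$dir/"
    done
}
```

For example, sync_cmds "$HADOOP_HOME" node1 node2 prints two commands; piping the output to sh would execute them.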

9.       Add the HADOOP_OPTS variable to the .bashrc file on all Hadoop nodes:

 export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/"

10.   Add the HBASE_OPTS variable to the .bashrc file on all HBase nodes:

 export HBASE_OPTS="-Djava.library.path=$HBASE_HOME/lib/native/:$HBASE_HOME/lib/"

11.   Verify the LZO compression in Hadoop:

a.       Create an LZO-compressed file using the lzop utility. The command below compresses the LICENSE.txt file, which is available inside the HADOOP_HOME directory.

lzop LICENSE.txt

b.      Copy the generated LICENSE.txt.lzo file to the HDFS root path (/) using the command below.

bin/hadoop fs -copyFromLocal LICENSE.txt.lzo /

c.       Index the LICENSE.txt.lzo file in HDFS using the command below.

bin/hadoop jar lib/hadoop-lzo-0.4.17.jar com.hadoop.compression.lzo.LzoIndexer /LICENSE.txt.lzo

Once you execute the above command, you will see the output below on the console. You can also verify that the index file was created by using the HDFS browser in the Hadoop UI.

14/12/20 14:04:05 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/12/20 14:04:05 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo revc461d77a0feec38a2bba31c4380ad60084c09205]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/repo/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/12/20 14:04:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/20 14:04:08 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /LICENSE.txt.lzo, size 0.00 GB...
14/12/20 14:04:08 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/12/20 14:04:09 INFO lzo.LzoIndexer: Completed LZO Indexing in 0.61 seconds (0.01 MB/s).  Index size is 0.01 KB.
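
The index can also be checked from the command line instead of the UI. The sketch below assumes the indexer writes the index next to the input file with an .index suffix; lzo_index_exists and the HADOOP_CMD variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: succeeds if the LZO index for the given HDFS file exists.
# HADOOP_CMD defaults to bin/hadoop, matching the commands above
# (which are run from HADOOP_HOME).
lzo_index_exists() {
    # "fs -test -e" exits with status 0 when the path exists
    ${HADOOP_CMD:-bin/hadoop} fs -test -e "$1.index"
}
```

Running lzo_index_exists /LICENSE.txt.lzo && echo "index found" from HADOOP_HOME would then report whether the indexing step succeeded.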

12.   Verify the LZO Compression in HBase:

You can verify LZO compression in HBase by creating a table that uses LZO compression from the HBase shell.
a.       Create a table with LZO compression using the command below:

create 't1', { NAME => 'f1', COMPRESSION => 'lzo' }

b.      Verify the compression type of the table using the describe command below:

describe 't1'

Once you execute the above command, you will see the console output below. LZO compression for the table can also be verified in the HBase UI.

 DESCRIPTION                                                        ENABLED
 't1', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}   true
1 row(s) in 0.8250 seconds
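
Besides creating a table, HBase ships a CompressionTest utility class that writes and reads back a small file with a given codec, which can be run on each node before any table depends on LZO. The wrapper below is only a sketch: hbase_lzo_check and HBASE_CMD are hypothetical names, and the default scratch path is a placeholder.

```shell
# Hypothetical wrapper around HBase's CompressionTest utility.
# HBASE_CMD defaults to bin/hbase (run from HBASE_HOME); the optional first
# argument is a scratch file path, defaulting to a placeholder under /tmp.
hbase_lzo_check() {
    ${HBASE_CMD:-bin/hbase} org.apache.hadoop.hbase.util.CompressionTest \
        "${1:-file:///tmp/lzo-check}" lzo
}
```

If the native library cannot be loaded on a node, this check should fail there, which is usually easier to diagnose than a region server failing later.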

Sunday, 23 November 2014

ElasticSearch Installation

This post explains how to install an ElasticSearch (1.3.4) cluster on Linux machines. ElasticSearch requires Java, so install Java first on any machine where it is not already available.

After installing Java on the machines, follow the steps below to install ElasticSearch.

1. Download ElasticSearch: Download the ElasticSearch bundle from the website and extract it using the commands below.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.4.zip
unzip elasticsearch-1.3.4.zip


2. Configure ElasticSearch: Add the following properties to the elasticsearch.yml file, which is located in the config directory of the extracted bundle.

a. cluster.name: The name of the ElasticSearch cluster. It must be the same on all nodes, because nodes join the cluster whose name matches their own.
b. node.name: The name of the node. It should be unique across all nodes.
c. path.data: The path where ElasticSearch stores the index data.
d. action.auto_create_index: Controls whether indexes are created automatically. It accepts true or false.
e. index.number_of_shards: Sets the number of shards for an index.
f. index.number_of_replicas: Sets the number of replicas for an index.
g. bootstrap.mlockall: Locks the process memory so that it is not swapped out, which improves ElasticSearch performance.

3. Start ElasticSearch: Run the command below from the bin directory of the extracted ElasticSearch directory. The -d flag runs ElasticSearch in daemon mode.

./elasticsearch -d

ElasticSearch Cluster Setup on two nodes:

To install ElasticSearch as a cluster, perform the above 3 steps on every node that should be part of the cluster. The following example configures a 2-node ElasticSearch cluster.


ElasticSearch configuration on node1 in the elasticsearch.yml file:
cluster.name : elasticsearch_cluster
node.name : node1
path.data : /home/$username/elasticsearch-1.3.4/data_dir
action.auto_create_index : false
index.number_of_shards : 2
index.number_of_replicas : 1
bootstrap.mlockall : false


ElasticSearch configuration on node2 in the elasticsearch.yml file:
cluster.name : elasticsearch_cluster
node.name : node2
path.data : /home/$username/elasticsearch-1.3.4/data_dir
action.auto_create_index : false
index.number_of_shards : 2
index.number_of_replicas : 1
bootstrap.mlockall : false
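
Since the two configurations differ only in node.name, the file contents can be generated rather than typed per node. The sketch below uses the same example values as above; es_node_config is a hypothetical helper, and the data path is passed in as an argument rather than taken from the example.

```shell
# Hypothetical helper: prints the example configuration for one node.
# Usage: es_node_config NODE_NAME DATA_PATH
es_node_config() {
    cat <<EOF
cluster.name : elasticsearch_cluster
node.name : $1
path.data : $2
action.auto_create_index : false
index.number_of_shards : 2
index.number_of_replicas : 1
bootstrap.mlockall : false
EOF
}
```

The output would be redirected into the configuration file on each node, with only the first argument changing from node to node.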

Now start ElasticSearch on both nodes using the ./elasticsearch -d command shown above.
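
Once both nodes are running, the cluster health endpoint shows whether they actually found each other. The sketch below assumes ElasticSearch is listening on its default HTTP port 9200 on localhost; node_count is a hypothetical helper that extracts the number_of_nodes field from the JSON response.

```shell
# Hypothetical helper: reads cluster health JSON on stdin and prints the
# value of the "number_of_nodes" field.
node_count() {
    sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example (assumes the default port 9200 on localhost):
#   curl -s 'http://localhost:9200/_cluster/health' | node_count
```

For the two-node example above, a result of 2 would indicate that both nodes have joined the same cluster.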

Thursday, 3 April 2014

Leveraging Embedded Tomcat in Maven Application


When we develop a Java web application, we usually install Tomcat on the target system and then deploy the application to it so that it can be run and accessed from a browser.

How about developing an application that runs directly on the target system without installing Tomcat? Is that really achievable?

Yes, this can be easily achieved with a Maven plug-in. Let us see how!

You might be puzzled as to how a web application can run without a web container being installed. Let us quickly look at the procedure, which is pretty simple.

To accomplish this, you just need to add a Maven plug-in to the application's pom.xml file. The plug-in embeds Tomcat inside the application and, when the application is built, generates an executable jar file that can be run with the java -jar command, after which the application is accessible from a browser.

Follow the step-by-step procedure below:
1.   Add the following plugin to your pom.xml file:

<plugin>
    <groupId>org.apache.tomcat.maven</groupId>
    <artifactId>tomcat7-maven-plugin</artifactId>
    <version>2.1</version>
    <executions>
        <execution>
            <id>tomcat-run</id>
            <goals>
                <goal>exec-war-only</goal>
            </goals>
            <phase>package</phase>
            <configuration>
                <path>/test</path>
                <attachArtifactClassifier>exec-war</attachArtifactClassifier>
                <attachArtifactClassifierType>jar</attachArtifactClassifierType>
            </configuration>
        </execution>
    </executions>
</plugin>

2.     Build your application: Build your application using the install or package goal. This generates 3 extra files in the application's target folder.

    a. yourappname-version-exec-war.jar: An executable jar file with embedded Tomcat.
    b. war-exec.manifest: A manifest file that contains the Main class name.
    c. war-exec.properties: A properties file that contains the Tomcat information and some configuration options.
3.     Run your application: Once the above three files have been generated, execute the command below to run the application.

 java -jar yourappname-version-exec-war.jar
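
Because the jar name depends on the application name and version, a small lookup can save typing it out. This is a sketch only: find_exec_war is a hypothetical helper that prints the first file ending in -exec-war.jar in a given directory.

```shell
# Hypothetical helper: prints the path of the executable war jar found in
# the given build directory, or fails if there is none.
find_exec_war() {
    for jar in "$1"/*-exec-war.jar; do
        # When nothing matches, the glob stays literal, so test for a real file
        [ -f "$jar" ] && { echo "$jar"; return 0; }
    done
    echo "no exec-war jar in $1" >&2
    return 1
}
```

From the project directory, java -jar "$(find_exec_war target)" would then start the application. With the /test path configured above, it should be reachable under /test on the embedded Tomcat's HTTP port, which is assumed here to be the default 8080.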