Saturday 20 December 2014

LZO Compression in Hadoop and HBase


LZO is released under the GPL, which is incompatible with Hadoop's Apache License, so LZO support is not bundled with Hadoop and must be installed separately on the cluster to enable LZO compression in Hadoop and HBase. LZO is a splittable compression format (once the compressed file has been indexed) and offers fast compression and decompression.
Perform the steps below to enable LZO compression in Hadoop and HBase:
1.       Install the LZO development packages:

 sudo yum install lzo lzo-devel

2.       Download the hadoop-lzo release using the command below:

 wget https://github.com/twitter/hadoop-lzo/archive/release-0.4.17.zip

3.       Unzip the downloaded bundle:

 unzip release-0.4.17.zip

4.       Change the current directory to the extracted folder:

cd hadoop-lzo-release-0.4.17

5.       Run the following command to build the native libraries:

 ant compile-native

6.       Copy the generated jar and native libraries to Hadoop and HBase lib directories.

 cp build/hadoop-lzo-0.4.17.jar $HADOOP_HOME/lib/
 cp build/hadoop-lzo-0.4.17.jar $HBASE_HOME/lib/
 cp build/hadoop-lzo-0.4.17/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/
 cp build/hadoop-lzo-0.4.17/lib/native/Linux-amd64-64/* $HBASE_HOME/lib/native/
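Before moving on, it can help to confirm that the jar and the native library actually landed where Hadoop and HBase will look for them. The following is a minimal sketch, not part of hadoop-lzo: the function name check_lzo_files is hypothetical, and it assumes the native library files start with "libgplcompression" (the GPL native library that the log output later in this post reports as loaded).

```shell
# check_lzo_files is a hypothetical helper, not part of hadoop-lzo.
# It reports whether an install directory (for example $HADOOP_HOME or
# $HBASE_HOME) contains the hadoop-lzo jar and the native library.
check_lzo_files() {
    local home="$1" status=0
    # The jar copied in step 6 (glob is version-agnostic).
    if ls "$home"/lib/hadoop-lzo-*.jar >/dev/null 2>&1; then
        echo "jar: ok"
    else
        echo "jar: missing"
        status=1
    fi
    # Assumption: the native library files start with "libgplcompression".
    if ls "$home"/lib/native/libgplcompression* >/dev/null 2>&1; then
        echo "native: ok"
    else
        echo "native: missing"
        status=1
    fi
    return "$status"
}
```

A typical call would be check_lzo_files "$HADOOP_HOME" && check_lzo_files "$HBASE_HOME", which returns non-zero if either directory is incomplete.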

7.       Add the following properties to the core-site.xml file of Hadoop:

 <property>
   <name>io.compression.codecs</name>
   <value>
     org.apache.hadoop.io.compress.DefaultCodec,
     org.apache.hadoop.io.compress.GzipCodec,
     org.apache.hadoop.io.compress.BZip2Codec,
     org.apache.hadoop.io.compress.DeflateCodec,
     org.apache.hadoop.io.compress.SnappyCodec,
     org.apache.hadoop.io.compress.Lz4Codec,
     com.hadoop.compression.lzo.LzoCodec,
     com.hadoop.compression.lzo.LzopCodec
   </value>
 </property>
 <property>
   <name>io.compression.codec.lzo.class</name>
   <value>com.hadoop.compression.lzo.LzoCodec</value>
 </property>
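A quick way to catch a typo in the codec list is to grep the file for both LZO classes. This is only a sketch: has_lzo_codecs is a hypothetical helper name, and the path in the usage note assumes the Hadoop 2.x layout where core-site.xml lives under etc/hadoop.

```shell
# has_lzo_codecs is a hypothetical helper: it succeeds only if the given
# core-site.xml mentions both LZO codec classes from step 7.
has_lzo_codecs() {
    local conf="$1"
    # Dots are escaped so they match literally, not "any character".
    grep -q 'com\.hadoop\.compression\.lzo\.LzoCodec' "$conf" &&
    grep -q 'com\.hadoop\.compression\.lzo\.LzopCodec' "$conf"
}
```

For example: has_lzo_codecs "$HADOOP_HOME/etc/hadoop/core-site.xml" || echo "LZO codecs not configured". It checks only that the class names are present, not that the XML is otherwise valid.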


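8.       Sync the Hadoop and HBase home directories to all nodes of the Hadoop and HBase cluster. rsync accepts only one destination per invocation, so run it once per node:

 rsync -a $HADOOP_HOME/ node1:$HADOOP_HOME/
 rsync -a $HADOOP_HOME/ node2:$HADOOP_HOME/
 rsync -a $HBASE_HOME/ node1:$HBASE_HOME/
 rsync -a $HBASE_HOME/ node2:$HBASE_HOME/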
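On a cluster with more than a couple of nodes, the per-node rsync commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; sync_cmds is a hypothetical helper and the node names are placeholders.

```shell
# sync_cmds is a hypothetical helper that prints (does not run) one rsync
# command per node for the given directory.
sync_cmds() {
    local dir="$1"
    shift
    local node
    for node in "$@"; do
        # -a recurses and preserves permissions; -z compresses in transit.
        echo "rsync -az $dir/ $node:$dir/"
    done
}
```

Running sync_cmds "$HADOOP_HOME" node1 node2 prints two commands; pipe the output to sh once it looks right.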

9.       Add the HADOOP_OPTS variable to the .bashrc file on all Hadoop nodes:

 export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/"

10.   Add the HBASE_OPTS variable to the .bashrc file on all HBase nodes:

 export HBASE_OPTS="-Djava.library.path=$HBASE_HOME/lib/native/:$HBASE_HOME/lib/"
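If the setup is scripted or repeated, appending these export lines blindly leaves duplicates in .bashrc. One possible approach, sketched here with a hypothetical helper named append_once, is to add a line only when it is not already there.

```shell
# append_once is a hypothetical helper: it appends a line to a file only if
# that exact line is not already present, so re-running setup is harmless.
append_once() {
    local line="$1" file="$2"
    # Make sure the file exists so grep does not fail on a missing file.
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output.
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example: append_once 'export HBASE_OPTS="-Djava.library.path=$HBASE_HOME/lib/native/:$HBASE_HOME/lib/"' ~/.bashrc. The single quotes keep $HBASE_HOME unexpanded, so it is resolved each time .bashrc is sourced.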

11.   Verify the LZO compression in Hadoop:

a.       Create an LZO-compressed file using the lzop utility. The command below compresses the LICENSE.txt file found in the HADOOP_HOME directory.

lzop LICENSE.txt

b.      Copy the generated LICENSE.txt.lzo file to the HDFS root path (/) using the command below.

bin/hadoop fs -copyFromLocal LICENSE.txt.lzo /

c.       Index the LICENSE.txt.lzo file in HDFS using the command below.

bin/hadoop jar lib/hadoop-lzo-0.4.17.jar com.hadoop.compression.lzo.LzoIndexer /LICENSE.txt.lzo

After running the above command, you will see output like the following on the console. You can also confirm that the index file was created through the HDFS browser in the Hadoop UI.

14/12/20 14:04:05 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/12/20 14:04:05 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo revc461d77a0feec38a2bba31c4380ad60084c09205]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/repo/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/12/20 14:04:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/20 14:04:08 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /LICENSE.txt.lzo, size 0.00 GB...
14/12/20 14:04:08 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/12/20 14:04:09 INFO lzo.LzoIndexer: Completed LZO Indexing in 0.61 seconds (0.01 MB/s).  Index size is 0.01 KB.
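The two lines that matter in this output are "Loaded native gpl library" and "Completed LZO Indexing". A minimal sketch for checking them automatically is shown below; indexer_ok is a hypothetical helper, and it matches only the message text seen in the output above.

```shell
# indexer_ok is a hypothetical helper: it reads LzoIndexer console output on
# stdin and succeeds only if the native library loaded and indexing finished.
indexer_ok() {
    local log
    # Capture stdin once so it can be searched twice.
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'Loaded native gpl library' &&
    printf '%s\n' "$log" | grep -q 'Completed LZO Indexing'
}
```

It could be used by piping the indexer command from step c into it with 2>&1, then listing the index with bin/hadoop fs -ls /LICENSE.txt.lzo.index (the indexer writes the index next to the original file with an .index suffix).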

12.   Verify the LZO Compression in HBase:

You can verify LZO compression in HBase by creating a table that uses LZO compression from the HBase shell.
a.       Create a table with LZO compression using the command below:

create 't1', { NAME => 'f1', COMPRESSION => 'lzo' }

b.      Verify the compression type of the table using the describe command below:

describe 't1'

After running the above command, you will see the console output below. LZO compression for the table can also be verified in the HBase UI.

 DESCRIPTION                                                          ENABLED
 't1', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>   true
 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =>
 'LZO', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =>
 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
 'true'}
1 row(s) in 0.8250 seconds
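HBase also ships a CompressionTest utility that writes and reads back a small file with a given codec, which can be run on each node without creating a table. The wrapper below is a sketch under assumptions: test_hbase_lzo is a hypothetical name, the scratch path is a placeholder, and it expects the hbase launcher to be on PATH (otherwise call bin/hbase from HBASE_HOME directly).

```shell
# test_hbase_lzo is a hypothetical wrapper around HBase's CompressionTest
# utility. It prints a message and returns 2 when no hbase launcher is found.
test_hbase_lzo() {
    # Placeholder scratch location; any writable file URI should do.
    local scratch="${1:-file:///tmp/lzo-compression-test}"
    if ! command -v hbase >/dev/null 2>&1; then
        echo "hbase not found on PATH; skipping"
        return 2
    fi
    # Usage of the real tool: CompressionTest <path> <codec>
    hbase org.apache.hadoop.hbase.util.CompressionTest "$scratch" lzo
}
```

If the native library is not visible to HBase on that node, the utility should fail instead of reporting success, which points back to steps 6 and 10.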