Saturday, 20 December 2014

LZO Compression in Hadoop and HBase


LZO is licensed under the GPL, which is incompatible with the Apache License used by Hadoop, so LZO support must be installed separately on the cluster to enable LZO compression in Hadoop and HBase. LZO is a splittable compression format once the compressed files are indexed (see step 11), and it offers fast compression and decompression.
Perform the following steps to enable LZO compression in Hadoop and HBase:
1.       Install the LZO development packages:

 sudo yum install lzo lzo-devel

2.       Download the hadoop-lzo release using the command below:

 wget https://github.com/twitter/hadoop-lzo/archive/release-0.4.17.zip

3.       Unzip the downloaded bundle:

 unzip release-0.4.17.zip

4.       Change the current directory to the extracted folder:

cd hadoop-lzo-release-0.4.17

5.       Run the following command to build the native libraries:

 ant compile-native

6.       Copy the generated jar and native libraries to the Hadoop and HBase lib directories:

 cp build/hadoop-lzo-0.4.17.jar $HADOOP_HOME/lib/
 cp build/hadoop-lzo-0.4.17.jar $HBASE_HOME/lib/
 cp build/hadoop-lzo-0.4.17/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/
 cp build/hadoop-lzo-0.4.17/lib/native/Linux-amd64-64/* $HBASE_HOME/lib/native/

7.       Add the following properties to Hadoop's core-site.xml file:


 <property>
   <name>io.compression.codecs</name>
   <value>
     org.apache.hadoop.io.compress.DefaultCodec,
     org.apache.hadoop.io.compress.GzipCodec,
     org.apache.hadoop.io.compress.BZip2Codec,
     org.apache.hadoop.io.compress.DeflateCodec,
     org.apache.hadoop.io.compress.SnappyCodec,
     org.apache.hadoop.io.compress.Lz4Codec,
     com.hadoop.compression.lzo.LzoCodec,
     com.hadoop.compression.lzo.LzopCodec
   </value>
 </property>
 <property>
   <name>io.compression.codec.lzo.class</name>
   <value>com.hadoop.compression.lzo.LzoCodec</value>
 </property>
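
A quick way to catch a typo in the codec list is to check that both LZO class names actually appear in the file. The snippet below is only a sketch: check_lzo_codecs is a hypothetical helper name, and it does a plain text search rather than parsing the XML.

```shell
#!/usr/bin/env bash
# Sketch: report whether a core-site.xml mentions both LZO codec classes.
# check_lzo_codecs is a hypothetical helper, not part of Hadoop.
# This is a plain text search; it does not validate the XML structure.
check_lzo_codecs() {
  local conf_file="$1"
  local codec
  for codec in com.hadoop.compression.lzo.LzoCodec \
               com.hadoop.compression.lzo.LzopCodec; do
    # -F matches the class name literally; -q suppresses output
    if ! grep -qF "$codec" "$conf_file"; then
      echo "missing: $codec"
      return 1
    fi
  done
  echo "ok"
}
```

For example, on a Hadoop 2.x layout (an assumption about where the file lives) you could run check_lzo_codecs "$HADOOP_HOME/etc/hadoop/core-site.xml" after sourcing the function.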


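8.       Sync the Hadoop and HBase home directories to all nodes of the Hadoop and HBase cluster. rsync accepts only one destination per invocation, so run it once per node:

 rsync -a $HADOOP_HOME/ node1:$HADOOP_HOME/
 rsync -a $HADOOP_HOME/ node2:$HADOOP_HOME/
 rsync -a $HBASE_HOME/ node1:$HBASE_HOME/
 rsync -a $HBASE_HOME/ node2:$HBASE_HOME/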
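
With more than a couple of nodes, a small loop saves typing. The following is a minimal sketch, assuming each node uses the same directory path as the local machine; sync_dir_to_nodes and the DRY_RUN variable are hypothetical names made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: push one directory to several nodes, one rsync call per node.
# sync_dir_to_nodes and DRY_RUN are hypothetical names for this example.
# Assumption: the directory lives at the same path on every node.
sync_dir_to_nodes() {
  local dir="$1"; shift
  local node
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it
      echo "rsync -a ${dir}/ ${node}:${dir}/"
    else
      # -a preserves permissions, timestamps and symlinks
      rsync -a "${dir}/" "${node}:${dir}/"
    fi
  done
}
```

Running DRY_RUN=1 sync_dir_to_nodes "$HADOOP_HOME" node1 node2 first prints the rsync commands so you can review them before copying anything.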

9.       Add the HADOOP_OPTS variable to the .bashrc file on all Hadoop nodes, then source the file so the variable takes effect in the current session:

 export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/"

10.   Add the HBASE_OPTS variable to the .bashrc file on all HBase nodes, then source the file:

 export HBASE_OPTS="-Djava.library.path=$HBASE_HOME/lib/native/:$HBASE_HOME/lib/"
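
If the native library is not on java.library.path, the JVM fails with "no gplcompression in java.library.path". Before starting any daemons, it can help to confirm that a libgplcompression file really sits in one of the listed directories. This is a rough sketch; find_gplcompression is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: search a colon-separated library path for the hadoop-lzo native library.
# find_gplcompression is a hypothetical helper name for this example.
find_gplcompression() {
  local lib_path="$1"
  local dir f
  local IFS=':'   # split the path list on colons, like java.library.path
  for dir in $lib_path; do
    for f in "$dir"/libgplcompression.so*; do
      # The glob stays literal when nothing matches, so test for existence
      if [ -e "$f" ]; then
        echo "$f"
        return 0
      fi
    done
  done
  echo "libgplcompression not found in: $lib_path" >&2
  return 1
}
```

For the Hadoop setting above, the call would be find_gplcompression "$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/"; it prints the first matching file or returns a non-zero status.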

11.   Verify the LZO compression in Hadoop:

a.       Create an LZO-compressed file using the lzop utility. The command below compresses the LICENSE.txt file found in the HADOOP_HOME directory and produces LICENSE.txt.lzo.

lzop LICENSE.txt

b.      Copy the generated LICENSE.txt.lzo file to the HDFS root path (/) using the command below:

bin/hadoop fs -copyFromLocal LICENSE.txt.lzo /

c.       Index the LICENSE.txt.lzo file in HDFS using the command below:

bin/hadoop jar lib/hadoop-lzo-0.4.17.jar com.hadoop.compression.lzo.LzoIndexer /LICENSE.txt.lzo

After running the command above, you should see output like the following on the console. You can also confirm that the index file was created by looking in the HDFS browser of the Hadoop web UI.

14/12/20 14:04:05 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/12/20 14:04:05 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev c461d77a0feec38a2bba31c4380ad60084c09205]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/repo/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/12/20 14:04:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/20 14:04:08 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /LICENSE.txt.lzo, size 0.00 GB...
14/12/20 14:04:08 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/12/20 14:04:09 INFO lzo.LzoIndexer: Completed LZO Indexing in 0.61 seconds (0.01 MB/s).  Index size is 0.01 KB.
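
The index file can also be checked from the command line instead of the web UI. The sketch below assumes the indexer writes the index next to the data file with an added .index suffix; lzo_index_exists and the HADOOP_CMD override are hypothetical names for this example.

```shell
#!/usr/bin/env bash
# Sketch: check that LzoIndexer wrote an index file next to the .lzo file.
# Assumption: the index is stored at "<file>.index".
# lzo_index_exists and HADOOP_CMD are hypothetical names for this example;
# HADOOP_CMD lets you point at a different hadoop launcher.
lzo_index_exists() {
  local lzo_file="$1"
  local hadoop_cmd="${HADOOP_CMD:-bin/hadoop}"
  # "fs -test -e" exits 0 when the path exists in HDFS
  if "$hadoop_cmd" fs -test -e "${lzo_file}.index"; then
    echo "index present: ${lzo_file}.index"
  else
    echo "index missing: ${lzo_file}.index"
    return 1
  fi
}
```

Run from the HADOOP_HOME directory, lzo_index_exists /LICENSE.txt.lzo reports whether the index for the file from step 11 is there.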

12.   Verify the LZO Compression in HBase:

You can verify LZO compression in HBase by creating a table that uses LZO compression from the HBase shell.
a.       Create a table with LZO compression using the command below:

create 't1', { NAME=>'f1', COMPRESSION=>'lzo' }

b.      Verify the compression type of the table using the describe command:

describe 't1'

After running the command above, you should see console output like the following. The LZO compression setting for the table can also be checked in the HBase web UI.

 DESCRIPTION                                                                      ENABLED
 't1', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}    true
1 row(s) in 0.8250 seconds
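
HBase also ships a CompressionTest utility class that exercises a codec against a scratch file, which can be a useful check on each region server before creating tables. The wrapper below is just a sketch: hbase_lzo_selftest, the HBASE_CMD override, and the default scratch path are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch: run HBase's CompressionTest utility for the lzo codec.
# hbase_lzo_selftest, HBASE_CMD and the default scratch path are
# hypothetical choices for this example.
hbase_lzo_selftest() {
  local test_path="${1:-file:///tmp/lzo-compression-test}"
  local hbase_cmd="${HBASE_CMD:-bin/hbase}"
  # CompressionTest takes a file path and a codec name
  "$hbase_cmd" org.apache.hadoop.hbase.util.CompressionTest "$test_path" lzo
}
```

Run it from the HBASE_HOME directory as hbase_lzo_selftest; a non-zero exit status suggests the codec or its native library could not be loaded on that node.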

2 comments:

  1. bin/hadoop jar lib/hadoop-lzo-0.4.17.jar com.hadoop.compression.lzo.LzoIndexer /LICENSE.txt.lzo
    15/03/24 21:08:01 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
    java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.(GPLNativeCodeLoader.java:32)
    at com.hadoop.compression.lzo.LzoCodec.(LzoCodec.java:71)
    at com.hadoop.compression.lzo.LzoIndexer.(LzoIndexer.java:36)
    at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:134)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    15/03/24 21:08:01 ERROR lzo.LzoCodec: Cannot load native-lzo without native-hadoop
    15/03/24 21:08:02 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /LICENSE.txt.lzo, size 0.00 GB...
    Exception in thread "main" java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzopCodec.createDecompressor(LzopCodec.java:128)
    at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:229)
    at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
    at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
    at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
    at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:137)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

    1. Hi Abhi,

      From the shared logs, it looks like you did not add the HADOOP_OPTS environment variable to the .bashrc file as described in step 9.

      If you did add it, then I guess you did not source the .bashrc file. The HADOOP_OPTS environment variable becomes available in the current session only after you source the .bashrc file.

      Please try the above and let me know if you face the same issue again.

      Regards,
      Hokam
