Friday, July 27, 2012

Loading protobuf-format files into a Pig script with a LoadFunc UDF


Twitter's open-source library elephant-bird includes several such loaders: https://github.com/kevinweil/elephant-bird
For protobuf data you can use LzoProtobufB64LinePigLoader or LzoProtobufBlockPigLoader: https://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load
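The "B64Line" in LzoProtobufB64LinePigLoader refers to the record layout: as I understand it, each line of the (LZO-compressed) file holds one Base64-encoded, serialized protobuf message. The Java sketch below only illustrates that line framing with java.util.Base64. The class B64LineFraming and its methods are hypothetical, LZO compression is left out, and in a real job the decoded bytes would be passed to your generated message class's parseFrom(byte[]).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/**
 * Hypothetical illustration (not elephant-bird code): one serialized
 * record per line, Base64-encoded so that raw bytes such as '\n'
 * cannot break the line structure. LZO compression is omitted.
 */
public class B64LineFraming {

    // Writer side: encode each serialized record as a single Base64 line.
    static String encodeLines(List<byte[]> records) {
        StringBuilder sb = new StringBuilder();
        for (byte[] record : records) {
            sb.append(Base64.getEncoder().encodeToString(record)).append('\n');
        }
        return sb.toString();
    }

    // Reader side: decode every non-empty line back into raw record bytes.
    // A real loader would hand these bytes to SomeMessage.parseFrom(bytes).
    static List<byte[]> decodeLines(String text) {
        List<byte[]> out = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            out.add(Base64.getDecoder().decode(line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in payloads; real ones would come from message.toByteArray().
        List<byte[]> records = List.of(
                "first".getBytes(StandardCharsets.UTF_8),
                new byte[] {0x08, '\n', 0x01}); // contains a raw newline byte
        List<byte[]> back = decodeLines(encodeLines(records));
        System.out.println(back.size()); // prints 2
    }
}
```

The basic Base64 encoder never emits line breaks, which is why a newline-delimited file can safely carry arbitrary binary protobuf payloads.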
To use elephant-bird from Pig:
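1) Build elephant-bird with Maven.
2) Register the elephant-bird core and Pig jars in your Pig script.
3) Load the LZO files with an elephant-bird loader class. The example below uses com.twitter.elephantbird.pig.load.LzoTokenizedLoader, which splits each line of LZO-compressed text on the given delimiter; for protobuf records, use one of the protobuf loaders listed above instead.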
Example:
REGISTER /root/pig/elephant-bird/elephant-bird-core-3.0.2.jar;
REGISTER /root/pig/elephant-bird/elephant-bird-pig-3.0.2.jar;
A = LOAD '/root/files' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
-- Pig refuses to overwrite an existing output directory, so store into a new path
STORE A INTO '/root/output';
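LzoTokenizedLoader('|') in the example turns each decompressed text line into a tuple by splitting it on the delimiter. The Java sketch below imitates only that splitting step, so the effect of the '|' argument is easy to see. TokenizeSketch is a hypothetical class, not elephant-bird's implementation, and keeping trailing empty fields is an assumption that may not match the real loader.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration (not elephant-bird code) of splitting one
 * already-decompressed line into fields on a literal delimiter.
 */
public class TokenizeSketch {

    // Pattern.quote treats the delimiter literally; limit -1 keeps
    // trailing empty fields (an assumption about the desired behavior).
    static List<String> tokenize(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("42|alice|", "|")); // prints [42, alice, ]
    }
}
```

Pattern.quote matters here because String.split takes a regular expression, and an unquoted "|" is the regex alternation operator, which matches the empty string and would split between every character.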
