Today I attended a Hadoop meetup at General Assembly. The topic was Pig, a framework for processing large volumes of data, built on top of Hadoop. It uses its own scripting language, Pig Latin, to manipulate data, and compiles the scripts into MapReduce jobs that run on Hadoop.
Pig is a good way to learn Hadoop.
The framework is easy to pick up and is perfect for beginners. Sometimes MapReduce programming is too low-level, and this is where Pig comes in. Just download the Cloudera VM or install the rpm package. You can run Pig via the Grunt shell, a script file, or an embedded program.
Pig is another way to process data, complementary to Hive, which is a SQL-style data processing framework. It gives more control than Hive and lets you process complex data flows.
An example of a Pig script:
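As a rough sketch of the script-file option: a few statements like the ones below could be saved to a file and run locally with `pig -x local first.pig`, or typed one at a time at the `grunt>` prompt. The file name `first.pig` and the alias names are my own for illustration, and I am assuming the tab-separated excite-small.log file from the Pig tutorial is in the working directory.

```pig
-- first.pig (hypothetical file name, used only for this sketch)

-- Load the tab-separated log; assumes excite-small.log is in the working directory.
raw = LOAD 'excite-small.log' USING PigStorage('\t') AS (user, time, query);

-- Keep only five records so the console output stays small.
first_five = LIMIT raw 5;

-- Print the records; nothing is executed until a DUMP or STORE is reached.
DUMP first_five;
```

Local mode is handy for trying things out because it reads from the local file system and does not need a running cluster.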
-- Register the tutorial jar so the UDFs below can be found (assumes tutorial.jar is in the working directory).
REGISTER ./tutorial.jar;
-- Use the PigStorage function to load the excite log file (excite.log or excite-small.log) into the "raw" bag as an array of records with the fields user, time, and query.
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
-- Call the NonURLDetector UDF to remove records if the query field is empty or a URL.
clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
-- Call the ToLower UDF to change the query field to lowercase.
clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(query) as query;
...
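To give a feel for the kind of data flow Pig handles, here is a small follow-on sketch of my own. It is not the elided remainder of the tutorial script. It assumes the clean2 relation from the script above, and the alias names and the output path `query_counts` are made up for illustration.

```pig
-- Hypothetical follow-on step, not part of the tutorial script.
-- Assumes the clean2 relation (user, time, query) defined above.

-- Group the cleaned records by user.
by_user = GROUP clean2 BY user;

-- Count how many queries each user issued.
counts = FOREACH by_user GENERATE group AS user, COUNT(clean2) AS n_queries;

-- Sort users by query count, busiest first.
ranked = ORDER counts BY n_queries DESC;

-- Write the result as tab-separated text; the output path is an assumption.
STORE ranked INTO 'query_counts' USING PigStorage('\t');
```

Each statement names an intermediate relation, so a flow like this can be built up and inspected step by step, which is harder to do in a single SQL-style query.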
Some useful Pig resources:
http://wiki.apache.org/pig/PigTutorial
http://www.cloudera.com/videos/introduction_to_pig