The SPLIT operator is used to split a relation into two or more relations.
Given below is the syntax of the SPLIT operator.
grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation2_name (condition2),
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad 002,siddarth,Battacharya,22,9848022338,Kolkata 003,Rajesh,Khanna,22,9848022339,Delhi 004,Preethi,Agarwal,21,9848022330,Pune 005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 006,Archana,Mishra,23,9848022335,Chennai 007,Komal,Nayak,24,9848022334,trivendram 008,Bharathi,Nambiayar,24,9848022333,Chennai
And we have loaded this file into Pig with the relation name student_details as shown below.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Let us now split the relation into two, one listing the employees of age less than 23, and the other listing the employees having the age between 22 and 25.
SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age>25);
Verify the relations student_details1 and student_details2 using the DUMP operator as shown below.
grunt> Dump student_details1; grunt> Dump student_details2;
It will produce the following output, displaying the contents of the relations student_details1 and student_details2 respectively.
grunt> Dump student_details1; (1,Rajiv,Reddy,21,9848022337,Hyderabad) (2,siddarth,Battacharya,22,9848022338,Kolkata) (3,Rajesh,Khanna,22,9848022339,Delhi) (4,Preethi,Agarwal,21,9848022330,Pune) grunt> Dump student_details2; (5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar) (6,Archana,Mishra,23,9848022335,Chennai) (7,Komal,Nayak,24,9848022334,trivendram) (8,Bharathi,Nambiayar,24,9848022333,Chennai)