hadoop - Downloading a list of files in parallel in Apache Pig
I have a simple text file containing a list of folders on FTP servers, one folder per line. Each folder contains a couple of thousand images. I want to connect to each folder, store the files inside the folder in a SequenceFile, and then remove the folder from the FTP server. I have written a simple Pig UDF for this. Here it is:

    dirs = LOAD '/var/location.txt' USING PigStorage();
    results = FOREACH dirs GENERATE download_whole_folder_into_single_sequence_file($0);
    /* I don't need the results bag. It is a dummy bag. */

The problem is that I'm not sure whether each line of input is processed in a separate mapper. The input file is not huge, just a couple of hundred lines. In pure map/reduce I would use NLineInputFormat and process each line in a separate mapper. How can I achieve the same thing in Pig?

Answer: Pig lets you write your own load functions, which in turn let you specify the InputFormat you'll be using, so you could write your own. That said, the job you described sounds like it would only involve a single map-reduce step. Since using Pig wouldn't reduce complexity in this case, and you'd have to write custom code just to use Pig, I'd sug...
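To illustrate the custom-load-function route the answer mentions, here is a minimal sketch of a Pig LoadFunc that returns NLineInputFormat from getInputFormat(), so that each line of the folder list becomes its own split (and therefore its own mapper). This is untested illustrative code: the class name is made up, and it assumes Pig and Hadoop (mapreduce API) jars are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical LoadFunc: one input line per map task via NLineInputFormat.
public class OneLinePerMapperLoader extends LoadFunc {

    private RecordReader<?, ?> reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public InputFormat<?, ?> getInputFormat() {
        // NLineInputFormat creates one split per N lines of input.
        return new NLineInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
        // One line per split, so each folder path gets its own mapper.
        NLineInputFormat.setNumLinesPerSplit(job, 1);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null; // no more lines in this split
            }
            Text line = (Text) reader.getCurrentValue();
            // Emit the folder path as a single-field tuple.
            return tupleFactory.newTuple(line.toString());
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
```

You would then register the jar and load with it instead of PigStorage, e.g. `dirs = LOAD '/var/location.txt' USING OneLinePerMapperLoader();`, and apply the download UDF as before.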