apache spark - How to load 1000 files into RDDs?
I am new to Apache Spark and need help.

I have a Python script that reads 6 TDMS files (with a tdms() function) and builds a graph from the numerical data of each of them (with a graph() function), in a loop. I now want to load 1000 such files and run the script in parallel over each one. Can I create RDDs from the files and apply my function to each file?

How can I do this? Can I define the number of nodes Spark uses?
I have tried making a Python list of the files I need to read, then running a loop that reads the data from each file, creates an RDD, runs the graph function, and, I guess, saves the result?

Or should I make the file list itself an RDD and run map with a lambda (calling graph) over each element, roughly like the sketch below?
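For the second idea, I picture something like this (tdms() and graph() are placeholders for my own functions, and the file paths are made up):

    from pyspark import SparkContext

    def tdms(path):
        ...  # placeholder: read one TDMS file and return its numerical data

    def graph(data):
        ...  # placeholder: build the graph for one file's data

    sc = SparkContext(appName="tdms-graphs")

    # Hypothetical list of the 1000 file paths.
    file_paths = ["file_%04d.tdms" % i for i in range(1, 1001)]

    # Make the file list itself an RDD, then map over it so each worker
    # reads and processes its share of the files.
    results = sc.parallelize(file_paths).map(lambda p: graph(tdms(p))).collect()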
If you only care about the parallel run, you can keep loading the data, make one big RDD out of it, and call sc.parallelize. You can either let Spark decide how to partition it, or specify the number of partitions you want by passing a second argument: sc.parallelize(data, numSlices).
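A rough sketch of that approach (tdms() and graph() again stand in for your own functions, and this assumes the loaded data fits in the driver's memory):

    from pyspark import SparkContext

    def tdms(path):
        ...  # placeholder: read one TDMS file

    def graph(data):
        ...  # placeholder: process one file's data

    sc = SparkContext(appName="tdms-graphs")
    file_paths = ["file_%04d.tdms" % i for i in range(1, 1001)]  # made-up paths

    # Load all the data on the driver first, then hand it to Spark as one RDD.
    data = [tdms(p) for p in file_paths]

    rdd = sc.parallelize(data)         # let Spark pick the number of partitions
    # rdd = sc.parallelize(data, 100)  # or ask for 100 partitions explicitly

    results = rdd.map(graph).collect()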