apache spark - How to load 1000 files into RDDs?


I am new to Apache Spark and need help.

I have a Python script that reads 6 TDMS files (with a tdms() function) and builds a graph from the numerical data of each of them (with a graph() function), in a loop. I want to load 1000 such files and run the script in parallel, once per file. Should I create RDDs from the files and apply the function to each file?

How can I do that? Can I define the number of nodes in Spark?

I have tried making a Python list of the files I need to read, then running a loop that reads the data from each file, creates an RDD, runs the graph function, and, I guess, saves the result?

Or should I make the file list itself an RDD, and run map with a lambda (for graph) over each element?
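That second approach might be sketched as follows. The tdms() and graph() names come from the question, but their bodies here are placeholder assumptions; the snippet falls back to a plain sequential map when PySpark is not installed:

```python
def tdms(path):
    # Placeholder reader (assumption): pretend each file yields numeric data.
    return [len(path), len(path) * 2]

def graph(data):
    # Placeholder processing (assumption): return a summary instead of plotting.
    return sum(data)

def process_file(path):
    # One unit of work per file: read it, then process it.
    return graph(tdms(path))

file_list = ["file_%03d.tdms" % i for i in range(1000)]

try:
    from pyspark import SparkContext
    sc = SparkContext(appName="tdms-graphs")
    # Distribute the file names; each worker reads and processes its share.
    results = sc.parallelize(file_list).map(process_file).collect()
    sc.stop()
except ImportError:
    # PySpark not available here: same logic, run sequentially.
    results = list(map(process_file, file_list))
```

Distributing file names rather than file contents keeps the driver from reading all 1000 files itself; each worker opens only the files assigned to its partition.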

If you only care about running in parallel, you can keep loading the data into one big collection and call sc.parallelize on it. You can either let Spark decide the number of partitions, or specify the number you want by calling sc.parallelize(data, numSlices).
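A minimal sketch of the numSlices argument described above (the 8-partition count is an arbitrary example; the snippet falls back to a plain value when PySpark is not installed):

```python
data = list(range(100))

try:
    from pyspark import SparkContext
    sc = SparkContext(appName="partition-demo")
    # Let Spark pick the number of partitions:
    rdd_default = sc.parallelize(data)
    # Or request 8 partitions explicitly via numSlices:
    rdd_eight = sc.parallelize(data, 8)
    num_partitions = rdd_eight.getNumPartitions()
    sc.stop()
except ImportError:
    # PySpark not installed here: record the value we asked for.
    num_partitions = 8
```

More partitions means more tasks that can run concurrently across your executors; a common starting point is a small multiple of the total number of cores in the cluster.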

