python - Reading a tar compressed file in pandas?
This code seems to work: it takes a list of files, compresses them in a format pandas can read, and combines them in one location.
Edit: I modified the code to add only new files (based on the file not already existing in the tar).
    import os
    import tarfile
    from glob import glob

    os.chdir(r'c:\\users\documents\ftp\\')
    saveloc = r'\\fnp\mydownloads\\'
    compression = "w:bz2"  # "w" creates a new archive on every run
    extension = '.tar.bz2'

    filename = 'global_performance'
    filetype = 'performance_*.csv'
    tarname = saveloc+filename+extension
    files = glob(filetype)
    tar = tarfile.open(tarname, compression)
    for file in files:
        # substring test against the archive path string,
        # not a lookup of the archive's members
        if file not in tarname:
            tar.add(file)
    tar.close()

    filename = 'global_status'
    filetype = 'status_*.csv'
    tarname = saveloc+filename+extension
    files = glob(filetype)
    tar = tarfile.open(tarname, compression)
    for file in files:
        if file not in tarname:
            tar.add(file)
    tar.close()
- Is there a way for pandas to read the tar file? Can I specify a file that I know exists within the archive, or perhaps read a concat of all the files in one read?
- Being able to add only new files is nice, but I assume the computer has to read all the file names to determine whether each one exists or not. Is there a way to modify the code to add only the latest files based on creation date or something similar? Can this be sped up to compress and read only the newest files, or perhaps only files within a time range (30 days maybe, instead of reading all the files in a directory that goes back to 2010)?
- As you can see above, I am reading each file type within the directory (based on filename) and adding it to a separate tar. Is there a way to optimize this a bit instead of pasting the same code over and over (there are 10+ file types to handle)?
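For the first question, one possible approach is to open the archive with `tarfile` and hand the member's file object to `pandas.read_csv`. This is only a minimal sketch: the helper names `read_member` and `read_all_members` are hypothetical, and it assumes every regular file in the archive is a CSV with the same columns.

```python
import tarfile

import pandas as pd


def read_member(tar_path, member_name):
    """Read one named CSV member of a tar archive into a DataFrame."""
    # "r:*" lets tarfile detect the compression (bz2, gz, or none).
    with tarfile.open(tar_path, "r:*") as tar:
        # extractfile returns a binary file-like object that read_csv accepts.
        # It raises KeyError if the member name is not in the archive.
        with tar.extractfile(member_name) as fh:
            return pd.read_csv(fh)


def read_all_members(tar_path):
    """Concatenate every regular-file member (assumed to be CSV) into one DataFrame."""
    frames = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                frames.append(pd.read_csv(tar.extractfile(member)))
    # pd.concat raises ValueError if the archive held no files.
    return pd.concat(frames, ignore_index=True)
```

Something like `read_member(tarname, 'performance_2015.csv')` would then load a single file (the member name here is made up), while `read_all_members(tarname)` loads everything in one call. Note that reading from a bz2-compressed archive still decompresses the data up to the requested member.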
Edit: This code seems to operate slowly. My intention is to find the newest files that are not yet within the tar, compress them, and add them to the existing tar. Based on the time it is taking, I think it is still compressing all the files and replacing them. Can someone help me make this a more efficient process?
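The slowness is consistent with how `tarfile` behaves: mode `"w:bz2"` always creates a fresh archive, and append mode (`"a"`) is only available for uncompressed tar files. The sketch below trades compression for the ability to append. It is an illustration under stated assumptions, not a definitive solution: the names `add_new_files`, `update_archives`, `FILE_TYPES`, and `max_age_days` are hypothetical, the archives use a plain `.tar` extension, and file age is judged by modification time rather than creation date.

```python
import os
import tarfile
import time
from glob import glob

# Hypothetical mapping of archive base name -> glob pattern, one entry per file type.
FILE_TYPES = {
    "global_performance": "performance_*.csv",
    "global_status": "status_*.csv",
}


def add_new_files(tarname, pattern, max_age_days=30):
    """Append recent files matching `pattern` that are not yet in the archive.

    Returns the list of member names that were added.
    """
    cutoff = time.time() - max_age_days * 86400
    # Names already stored in the archive (empty if it does not exist yet).
    existing = set()
    if os.path.exists(tarname):
        with tarfile.open(tarname, "r") as tar:
            existing = set(tar.getnames())
    added = []
    # Mode "a" appends to an uncompressed tar and creates it if missing.
    with tarfile.open(tarname, "a") as tar:
        for path in sorted(glob(pattern)):
            name = os.path.basename(path)
            if name in existing:
                continue  # already archived
            if os.path.getmtime(path) < cutoff:
                continue  # last modified before the cutoff
            tar.add(path, arcname=name)
            added.append(name)
    return added


def update_archives(saveloc, file_types=FILE_TYPES, max_age_days=30):
    """Run add_new_files once per file type instead of repeating the code."""
    results = {}
    for name, pattern in file_types.items():
        tarname = os.path.join(saveloc, name + ".tar")
        results[name] = add_new_files(tarname, pattern, max_age_days)
    return results
```

Calling `update_archives(saveloc)` from the source directory handles every file type in one loop, and only files that are both recent and missing from the archive are written. If compression is essential, an alternative would be to write each batch of new files to its own compressed archive, since an existing `.tar.bz2` cannot be appended to in place.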