python - pandas DataFrame format from elements with multiple identifiers
I have a CSV file with the following format. Imagine the white space is comma-separated.
                  slot0                 slot1
serial timestamp  height width  score   height width  score   ....
fa125  2015_05    215.00 125.01 156.02  235.23 862.23 135.52  ....
This goes on for thousands of rows, and the pattern repeats for many slot#'s. Each slot# is associated with the "height, width, score" group it is centered on. That is, slot0 corresponds to the first height, width, and score, and slot1 corresponds to the second height, width, and score. Each slot has 3 measurements.
I'm having trouble finding the best way to stick this data into a pandas.DataFrame so that I can associate each slot number with its particular heights, widths, and scores, as well as with the serial or timestamp.
One thing I have thought of is this, but it's not clear whether I can do better.
serial timestamp s0_height s0_width s0_score s1_height s1_width s1_score ....
fa125  2015_05   215.00    125.01   156.02   235.23    862.23   135.52   ....
It seems a little awkward in this form, but if it's the only way, I guess I can manage.
# Maybe this?
pd.DataFrame({'fsj1503007n-ct0': ['20150311_021738', 140, 123, 213]}, index=['timestamp', 's0_height', 's0_score', 's0_width'])
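The flat `s0_height, s0_width, ...` layout described above can be sketched as follows. This is only an illustration under assumptions: the two-slot sample text, the `n_slots` value, and the name `flat_names` are made up for the example and are not part of the original file.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-slot sample shaped like the file described above.
csv_text = (
    "slot0,slot1\n"
    "serial,timestamp,height,width,score,height,width,score\n"
    "fa125,2015_05,215.00,125.01,156.02,235.23,862.23,135.52\n"
)

measurements = ["height", "width", "score"]
n_slots = 2  # assumption: known ahead of time for this sketch

# Build the flat names: s0_height, s0_width, s0_score, s1_height, ...
flat_names = ["serial", "timestamp"] + [
    f"s{slot}_{name}" for slot in range(n_slots) for name in measurements
]

# Skip both header rows and supply the flat names instead.
flat_df = pd.read_csv(
    StringIO(csv_text),
    skiprows=2,
    header=None,
    names=flat_names,
    dtype={"serial": str, "timestamp": str},
)
print(flat_df)
```

This keeps one row per serial/timestamp, but the slot number only lives inside the column names, so selecting "all heights" means matching on name suffixes.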
Keep in mind that I can adapt the CSV in any way needed to instantiate the DataFrame; the problem is that I'm not sure of the best way to create a DataFrame for this data.
Thanks!
import pandas as pd
import numpy as np

# create a string buffer; you don't need this if you have a csv file
from io import StringIO

# replicate your csv file structure
line1 = ','.join(['slot' + str(x) for x in range(3)]) + '\n'
line2 = 'serial,timestamp,' + ','.join(['height', 'width', 'score']*3) + '\n'
np.random.seed(0)
data = np.random.randint(100, 1000, size=9)
line3 = 'fa125,2015_5,' + ','.join([str(x) for x in data]) + '\n'
csv_buffer = line1 + line2 + line3

Out[40]: 'slot0,slot1,slot2\nserial,timestamp,height,width,score,height,width,score,height,width,score\nfa125,2015_5,784,659,729,292,935,863,807,459,109\n'

# read your file, set the first 2 columns as the index, and give the rest a multi-level column index
level1 = ['slot' + str(x) for x in range(3)]
level2 = ['height', 'width', 'score']
multi_level_columns = pd.MultiIndex.from_product([level1, level2])

# skiprows=[0] drops the slot row; the next row is read as a header and then replaced
df = pd.read_csv(StringIO(csv_buffer), index_col=[0, 1], skiprows=[0], header=0)
df.columns = multi_level_columns

Out[62]:
                  slot0              slot1              slot2
                 height width score height width score height width score
serial timestamp
fa125  2015_5       784   659   729    292   935   863    807   459   109

# you can also reshape the original df
df.stack(level=0)

Out[63]:
                        height  score  width
serial timestamp
fa125  2015_5    slot0     784    729    659
                 slot1     292    863    935
                 slot2     807    109    459
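Once the columns form a two-level index, looking up a slot or a measurement is a short selection. A minimal sketch, rebuilding the same one-row frame from the values shown above; the variable names `slot1`, `heights`, and `value` are only illustrative:

```python
from io import StringIO

import pandas as pd

# Same structure and values as the csv_buffer shown above.
csv_buffer = (
    "slot0,slot1,slot2\n"
    "serial,timestamp,height,width,score,height,width,score,height,width,score\n"
    "fa125,2015_5,784,659,729,292,935,863,807,459,109\n"
)
df = pd.read_csv(StringIO(csv_buffer), index_col=[0, 1], skiprows=[0], header=0)
df.columns = pd.MultiIndex.from_product(
    [["slot0", "slot1", "slot2"], ["height", "width", "score"]]
)

# All three measurements for one slot.
slot1 = df["slot1"]

# One measurement across every slot (cross-section on the second column level).
heights = df.xs("height", axis=1, level=1)

# A single value: slot2's score for one (serial, timestamp) row.
value = df.loc[("fa125", "2015_5"), ("slot2", "score")]

print(slot1)
print(heights)
print(value)
```

Compared with flat names such as `s0_height`, this keeps the slot number as data in the column index, so no string matching on column names is needed.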