python - pandas dataframe format from elements with multiple identifiers

I have a CSV file in the following format. Imagine that the white space is comma separated.

                                 slot0                            slot1
serial     timestamp  height     width      score      height     width      score      ....
fa125      2015_05    215.00     125.01     156.02     235.23     862.23     135.52     ....

This goes on for thousands of rows, and the pattern repeats for many slot numbers. Each slot# is associated with the "height, width, score" group it is centered on. That is, slot0 corresponds to the first height, width, and score, and slot1 corresponds to the second height, width, and score. Each slot has 3 measurements.

I'm having trouble finding the best way to stick this data into a pandas.DataFrame so that I can associate the slot number with the particular heights, widths, and scores, as well as with the serial or timestamp.

One thing I have thought of is this, but it's not clear to me whether I can do better.

serial     timestamp  s0_height  s0_width  s0_score  s1_height  s1_width  s1_score  ....
fa125      2015_05    215.00     125.01    156.02    235.23     862.23    135.52    ....

It seems a little awkward in that form, but if it's the only way, I guess I can manage.
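As a rough sketch of the flat layout described above, the prefixed column names could be generated and passed to read_csv in place of the original header. The two-slot sample text, the variable names, and the assumption that the slot# row has already been dropped are all illustrative, not part of the original file.

```python
import pandas as pd
from io import StringIO

# Hypothetical two-slot sample mirroring the layout in the question,
# with the slot# row already removed.
csv_text = (
    "serial,timestamp,height,width,score,height,width,score\n"
    "fa125,2015_05,215.00,125.01,156.02,235.23,862.23,135.52\n"
)

measurements = ["height", "width", "score"]
n_slots = 2  # assumed known; could be derived from the header length

# Build serial, timestamp, s0_height, s0_width, s0_score, s1_height, ...
flat_names = ["serial", "timestamp"] + [
    f"s{slot}_{m}" for slot in range(n_slots) for m in measurements
]

# header=0 consumes the original header row; names= replaces it.
flat_df = pd.read_csv(
    StringIO(csv_text),
    header=0,
    names=flat_names,
    dtype={"serial": str, "timestamp": str},
)
print(flat_df)
```

This keeps everything in one level of columns, at the cost of encoding the slot number inside each column name.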

# Maybe this?
pd.DataFrame({'fsj1503007n-ct0': ['20150311_021738', 140, 123, 213]},
             index=['timestamp', 's0_height', 's0_score', 's0_width'])

Keep in mind that I can adapt the CSV in any way before I instantiate the DataFrame; the problem is that I'm not sure of the best way to create a DataFrame from this data.

Thanks!

import pandas as pd
import numpy as np
# Create a string buffer; you don't need this if you already have a CSV file
from io import StringIO

# Replicate the CSV file structure
line1 = ','.join(['slot' + str(x) for x in range(3)]) + '\n'
line2 = 'serial,timestamp,' + ','.join(['height', 'width', 'score']*3) + '\n'
np.random.seed(0)
data = np.random.randint(100, 1000, size=9)
line3 = 'fa125,2015_5,' + ','.join([str(x) for x in data]) + '\n'
csv_buffer = line1 + line2 + line3

Out[40]: 'slot0,slot1,slot2\nserial,timestamp,height,width,score,height,width,score,height,width,score\nfa125,2015_5,784,659,729,292,935,863,807,459,109\n'

# Read the file, set the first 2 columns as the index,
# and give the rest a multi-level column index
level1 = ['slot' + str(x) for x in range(3)]
level2 = ['height', 'width', 'score']
multi_level_columns = pd.MultiIndex.from_product([level1, level2])

# skiprows=[0] drops the slot# row; header=0 then reads the measurement names
df = pd.read_csv(StringIO(csv_buffer), index_col=[0, 1], skiprows=[0], header=0)
df.columns = multi_level_columns

Out[62]:
                  slot0              slot1              slot2
                 height width score height width score height width score
serial timestamp
fa125  2015_5       784   659   729    292   935   863    807   459   109

# You can also reshape the original df
df.stack(level=0)

Out[63]:
                        height  score  width
serial timestamp
fa125  2015_5    slot0     784    729    659
                 slot1     292    863    935
                 slot2     807    109    459
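Once the columns are a two-level index like the one built above, each slot and each measurement can be addressed directly. The following is a minimal sketch that rebuilds the same one-row frame by hand (so it runs on its own) and shows a few selections. The variable names are illustrative, and the last step shows one possible way back to the flat s0_height style of names if that form is preferred.

```python
import pandas as pd

slots = ["slot0", "slot1", "slot2"]
measurements = ["height", "width", "score"]

# Rebuild the one-row frame from the answer without going through a CSV.
columns = pd.MultiIndex.from_product([slots, measurements])
index = pd.MultiIndex.from_tuples(
    [("fa125", "2015_5")], names=["serial", "timestamp"]
)
df = pd.DataFrame(
    [[784, 659, 729, 292, 935, 863, 807, 459, 109]],
    index=index,
    columns=columns,
)

# All three measurements for a single slot.
slot1 = df["slot1"]

# A single measurement across every slot.
heights = df.xs("height", axis=1, level=1)

# One value, addressed by (serial, timestamp) and (slot, measurement).
one_value = df.loc[("fa125", "2015_5"), ("slot2", "score")]

# Optional: collapse the two column levels into flat names such as s0_height.
flat = df.copy()
flat.columns = [f"{slot.replace('slot', 's')}_{m}" for slot, m in flat.columns]

print(slot1)
print(heights)
print(one_value)
print(flat.columns.tolist())
```

The two-level form keeps the slot number as data rather than as part of a string, which tends to make per-slot or per-measurement selections shorter than parsing prefixed names.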
