python - Modifying Multiple Rows based on Specific Criteria


I have a CSV file that looks like this:

    id    class    status    species
    1     sands    d         carex
    1     sands    c         eupesu
    1     sands    c         poapra
    2     limy     d         carcra
    2     limy     c         eupesu
    2     limy     c         poapra
    3     limy     d         poapra
    3     limy     c         eupesu
    3     limy     c         poapra

When status is d and species is carex or carcra, I want to change class to wet for every row with that id. The desired output is:

    id    class    status    species
    1     wet      d         carex
    1     wet      c         eupesu
    1     wet      c         poapra
    2     wet      d         carcra
    2     wet      c         eupesu
    2     wet      c         poapra
    3     limy     d         poapra
    3     limy     c         eupesu
    3     limy     c         poapra

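    import pandas as pd

    # Read the whitespace-delimited file into a DataFrame
    df = pd.read_table('data', sep=r'\s+')

    # Rows where status is 'd' and species is 'carex' or 'carcra'
    mask = ((df['status'] == 'd')
            & df['species'].isin(['carex', 'carcra']))
    # Extend each match to every row that shares its id
    mask = mask.groupby(df['id']).transform('any')
    df.loc[mask, 'class'] = 'wet'
    print(df)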

yields

       id class status species
    0   1   wet      d   carex
    1   1   wet      c  eupesu
    2   1   wet      c  poapra
    3   2   wet      d  carcra
    4   2   wet      c  eupesu
    5   2   wet      c  poapra
    6   3  limy      d  poapra
    7   3  limy      c  eupesu
    8   3  limy      c  poapra
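To try this without a file on disk, the sample can be read from a string instead. The following is only a sketch: the variable name `raw` is made up for illustration, and `pd.read_csv` on an `io.StringIO` buffer stands in for `pd.read_table('data', ...)`.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the data from the question
raw = """\
id class status species
1 sands d carex
1 sands c eupesu
1 sands c poapra
2 limy d carcra
2 limy c eupesu
2 limy c poapra
3 limy d poapra
3 limy c eupesu
3 limy c poapra
"""

# Parse the whitespace-delimited text into a DataFrame
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

# Same steps as above: row-level condition, then spread it per id
mask = ((df['status'] == 'd')
        & df['species'].isin(['carex', 'carcra']))
mask = mask.groupby(df['id']).transform('any')
df.loc[mask, 'class'] = 'wet'
print(df)
```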

The assignment

    df['mask'] = ((df['status'] == 'd')
                  & df['species'].isin(['carex', 'carcra']))

makes df look like this:

    In [166]: df
    Out[166]:
       id  class status species   mask
    0   1  sands      d   carex   True
    1   1  sands      c  eupesu  False
    2   1  sands      c  poapra  False
    3   2   limy      d  carcra   True
    4   2   limy      c  eupesu  False
    5   2   limy      c  poapra  False
    6   3   limy      d  poapra  False
    7   3   limy      c  eupesu  False
    8   3   limy      c  poapra  False

Now (thanks to DSM):

    mask = ((df['status'] == 'd')
            & df['species'].isin(['carex', 'carcra']))
    mask = mask.groupby(df['id']).transform('any')

groups the mask by df['id'], and assigns True to all rows of a group if any value of the original mask in that group is True, and False otherwise.

    In [168]: mask
    Out[168]:
    0     True
    1     True
    2     True
    3     True
    4     True
    5     True
    6    False
    7    False
    8    False
    dtype: bool
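The group-wise step can be checked in isolation. The sketch below rebuilds the relevant columns by hand and compares `transform('any')` with a different route to the same mask: collect the ids that have a matching row, then test membership with `isin`. That second route is not from the answer above, and the names `row_mask`, `group_mask`, `hit_ids` and `isin_mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id':      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'status':  ['d', 'c', 'c', 'd', 'c', 'c', 'd', 'c', 'c'],
    'species': ['carex', 'eupesu', 'poapra',
                'carcra', 'eupesu', 'poapra',
                'poapra', 'eupesu', 'poapra'],
})

# Row-level condition: True only on the rows that match directly
row_mask = ((df['status'] == 'd')
            & df['species'].isin(['carex', 'carcra']))

# Group-wise broadcast, as in the answer: every row of an id
# becomes True if any row of that id matched
group_mask = row_mask.groupby(df['id']).transform('any')

# Alternative sketch: find the ids with a match, then flag all their rows
hit_ids = df.loc[row_mask, 'id'].unique()
isin_mask = df['id'].isin(hit_ids)

# Compare the two masks element by element
print((group_mask == isin_mask).all())
```

Both masks flag every row with id 1 or 2 and none with id 3. The `isin` version skips the groupby, while `transform('any')` keeps the logic in a single expression on the mask.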

df.loc can be used to select rows and columns from df. df.loc[mask] selects the rows where mask is True:

    In [169]: df.loc[mask]
    Out[169]:
       id  class status species   mask
    0   1  sands      d   carex   True
    1   1  sands      c  eupesu  False
    2   1  sands      c  poapra  False
    3   2   limy      d  carcra   True
    4   2   limy      c  eupesu  False
    5   2   limy      c  poapra  False

df.loc[mask, 'class'] further selects the column class:

    In [170]: df.loc[mask, 'class']
    Out[170]:
    0    sands
    1    sands
    2    sands
    3     limy
    4     limy
    5     limy
    Name: class, dtype: object

df.loc[mask]['class'] = value may fail to modify df, since df.loc[mask] returns a copy. (The same holds true of df[mask]['class'] = value.) Using [...] twice is called "chained indexing", and the problem goes away if you avoid chained indexing.

So instead of using [...] twice, use df.loc[mask, 'class'] = 'wet':

    In [172]: df
    Out[172]:
       id class status species
    0   1   wet      d   carex
    1   1   wet      c  eupesu
    2   1   wet      c  poapra
    3   2   wet      d  carcra
    4   2   wet      c  eupesu
    5   2   wet      c  poapra
    6   3  limy      d  poapra
    7   3  limy      c  eupesu
    8   3  limy      c  poapra
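The difference between chained indexing and a single df.loc call can be seen on a small frame. This is a minimal sketch with made-up data and hypothetical names (`after_chained`, `after_loc`). pandas emits a warning for the chained form, which is silenced here only to keep the output clean.

```python
import warnings
import pandas as pd

# Toy frame, made up for illustration
df = pd.DataFrame({
    'id':    [1, 1, 2],
    'class': ['sands', 'sands', 'limy'],
})
mask = df['id'] == 1

# Chained indexing: df.loc[mask] builds a new object first, so the
# assignment lands on that temporary copy and df is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about this pattern
    df.loc[mask]['class'] = 'wet'
after_chained = df['class'].tolist()

# Single .loc call: rows and column are selected in one step,
# so the assignment writes into df itself.
df.loc[mask, 'class'] = 'wet'
after_loc = df['class'].tolist()

print(after_chained)
print(after_loc)
```

Here the chained form leaves class unchanged, while the single df.loc call sets the two rows with id 1 to wet.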
