python - Modifying Multiple Rows based on Specific Criteria
I have a CSV file that looks like this:
    id  class  status  species
    1   sands  d       carex
    1   sands  c       eupesu
    1   sands  c       poapra
    2   limy   d       carcra
    2   limy   c       eupesu
    2   limy   c       poapra
    3   limy   d       poapra
    3   limy   c       eupesu
    3   limy   c       poapra
When status is d and species is carex or carcra, I want to change class to wet for all rows with that id. The desired output is:
    id  class  status  species
    1   wet    d       carex
    1   wet    c       eupesu
    1   wet    c       poapra
    2   wet    d       carcra
    2   wet    c       eupesu
    2   wet    c       poapra
    3   limy   d       poapra
    3   limy   c       eupesu
    3   limy   c       poapra
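    import pandas as pd

    # Read the whitespace-separated file
    df = pd.read_table('data', sep=r'\s+')
    # Rows where status is 'd' and species is one of the target values
    mask = ((df['status'] == 'd') & df['species'].isin(['carex', 'carcra']))
    # Spread each True to every row that shares the same id
    mask = mask.groupby(df['id']).transform('any')
    df.loc[mask, 'class'] = 'wet'
    print(df)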
yields
       id class status species
    0   1   wet      d   carex
    1   1   wet      c  eupesu
    2   1   wet      c  poapra
    3   2   wet      d  carcra
    4   2   wet      c  eupesu
    5   2   wet      c  poapra
    6   3  limy      d  poapra
    7   3  limy      c  eupesu
    8   3  limy      c  poapra
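Since the answer reads from a file named 'data' that readers will not have, here is a minimal self-contained sketch of the same steps. It assumes the sample rows from the question can be inlined as a string and parsed with pd.read_csv; the variable name raw is only for illustration.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so that no file is needed.
raw = """\
id class status species
1 sands d carex
1 sands c eupesu
1 sands c poapra
2 limy d carcra
2 limy c eupesu
2 limy c poapra
3 limy d poapra
3 limy c eupesu
3 limy c poapra
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

# Row-level condition: status is 'd' and species is carex or carcra.
mask = (df['status'] == 'd') & df['species'].isin(['carex', 'carcra'])
# Group-level condition: True for every row whose id has at least one match.
mask = mask.groupby(df['id']).transform('any')

df.loc[mask, 'class'] = 'wet'
print(df)
```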
The assignment

    df['mask'] = ((df['status'] == 'd') & df['species'].isin(['carex', 'carcra']))

makes df look like this:
    In [166]: df
    Out[166]:
       id  class status species   mask
    0   1  sands      d   carex   True
    1   1  sands      c  eupesu  False
    2   1  sands      c  poapra  False
    3   2   limy      d  carcra   True
    4   2   limy      c  eupesu  False
    5   2   limy      c  poapra  False
    6   3   limy      d  poapra  False
    7   3   limy      c  eupesu  False
    8   3   limy      c  poapra  False
Now (thanks to DSM):
    mask = ((df['status'] == 'd') & df['species'].isin(['carex', 'carcra']))
    mask = mask.groupby(df['id']).transform('any')
groups mask by df['id'], and assigns True to all the rows of a group if any value in the original mask is True for that group, and False otherwise.
    In [168]: mask
    Out[168]:
    0     True
    1     True
    2     True
    3     True
    4     True
    5     True
    6    False
    7    False
    8    False
    dtype: bool
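To see why transform('any') is used rather than a plain aggregation, the following sketch compares the two on toy data. The names ids, row_mask, per_group, and per_row are hypothetical and only mirror the ids and row-level mask shown above.

```python
import pandas as pd

# Hypothetical toy data mirroring the ids and the row-level mask above.
ids = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3], name='id')
row_mask = pd.Series([True, False, False,
                      True, False, False,
                      False, False, False])

# Plain aggregation: one boolean per id, indexed by the id values.
per_group = row_mask.groupby(ids).any()

# transform('any'): the group result is repeated for every row, so the
# output keeps the original index and can be used directly with df.loc.
per_row = row_mask.groupby(ids).transform('any')

print(per_group)
print(per_row)
```

The aggregated result has three entries (one per id), while the transformed result has nine (one per row), which is what makes it usable as a boolean row selector.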
df.loc can be used to select rows and columns of df. df.loc[mask] selects the rows where mask is True:
    In [169]: df.loc[mask]
    Out[169]:
       id  class status species   mask
    0   1  sands      d   carex   True
    1   1  sands      c  eupesu  False
    2   1  sands      c  poapra  False
    3   2   limy      d  carcra   True
    4   2   limy      c  eupesu  False
    5   2   limy      c  poapra  False
df.loc[mask, 'class'] further selects the column class:
    In [170]: df.loc[mask, 'class']
    Out[170]:
    0    sands
    1    sands
    2    sands
    3     limy
    4     limy
    5     limy
    Name: class, dtype: object
df.loc[mask]['class'] = value may fail to modify df, since df.loc[mask] returns a copy. (The same holds true of df[mask]['class'] = value.) Using [...] twice is called "chained indexing", and the problem can be avoided if you avoid chained indexing.
So instead of using [...] twice, use df.loc[mask, 'class'] = 'wet':
    In [172]: df
    Out[172]:
       id class status species
    0   1   wet      d   carex
    1   1   wet      c  eupesu
    2   1   wet      c  poapra
    3   2   wet      d  carcra
    4   2   wet      c  eupesu
    5   2   wet      c  poapra
    6   3  limy      d  poapra
    7   3  limy      c  eupesu
    8   3  limy      c  poapra
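The difference between chained indexing and a single .loc call can be checked with a small sketch. It assumes a tiny illustrative frame (not the question's data), and it silences the warning pandas normally emits for the chained form so that the comparison stays readable.

```python
import warnings

import pandas as pd

# Tiny illustrative frame; the values are made up for this comparison.
df = pd.DataFrame({'id': [1, 1, 2],
                   'class': ['sands', 'sands', 'limy']})
mask = df['id'] == 1

# Chained indexing: df.loc[mask] is a copy for a boolean mask, so the
# assignment lands on a temporary object and df itself is left unchanged.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas would warn about this pattern
    df.loc[mask]['class'] = 'wet'
after_chained = df['class'].tolist()

# Single .loc call with both the row selector and the column label:
# the assignment is applied to df itself.
df.loc[mask, 'class'] = 'wet'
after_single = df['class'].tolist()

print(after_chained)
print(after_single)
```

Only the second assignment changes df, which is why the row mask and the column label are passed together in one .loc call.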