python - Modifying Multiple Rows Based on Specific Criteria
I have a CSV file that looks like this:

    id  class  status  species
    1   sands  d       carex
    1   sands  c       eupesu
    1   sands  c       poapra
    2   limy   d       carcra
    2   limy   c       eupesu
    2   limy   c       poapra
    3   limy   d       poapra
    3   limy   c       eupesu
    3   limy   c       poapra

When status is d and species is carex or carcra, I want to change class to wet for all rows with that id. The desired output is:

    id  class  status  species
    1   wet    d       carex
    1   wet    c       eupesu
    1   wet    c       poapra
    2   wet    d       carcra
    2   wet    c       eupesu
    2   wet    c       poapra
    3   limy   d       poapra
    3   limy   c       eupesu
    3   limy   c       poapra
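
    import pandas as pd

    # Read the whitespace-separated file.
    df = pd.read_table('data', sep=r'\s+')
    # Rows that meet the condition directly.
    mask = (df['status'] == 'd') & df['species'].isin(['carex', 'carcra'])
    # Extend the mask to every row that shares an id with a matching row.
    mask = mask.groupby(df['id']).transform('any')
    df.loc[mask, 'class'] = 'wet'
    print(df)

yields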

       id  class status species
    0   1    wet      d   carex
    1   1    wet      c  eupesu
    2   1    wet      c  poapra
    3   2    wet      d  carcra
    4   2    wet      c  eupesu
    5   2    wet      c  poapra
    6   3   limy      d  poapra
    7   3   limy      c  eupesu
    8   3   limy      c  poapra

The assignment

    df['mask'] = ((df['status'] == 'd') & df['species'].isin(['carex','carcra']))

makes df look like this:

    In [166]: df
    Out[166]:
       id  class status species   mask
    0   1  sands      d   carex   True
    1   1  sands      c  eupesu  False
    2   1  sands      c  poapra  False
    3   2   limy      d  carcra   True
    4   2   limy      c  eupesu  False
    5   2   limy      c  poapra  False
    6   3   limy      d  poapra  False
    7   3   limy      c  eupesu  False
    8   3   limy      c  poapra  False

Now, (thanks to DSM):

    mask = ((df['status'] == 'd') & df['species'].isin(['carex','carcra']))
    mask = mask.groupby(df['id']).transform('any')

groups mask by df['id'], and assigns True to all rows of a group if any value in the original mask is True for that group, and False otherwise.

    In [168]: mask
    Out[168]:
    0     True
    1     True
    2     True
    3     True
    4     True
    5     True
    6    False
    7    False
    8    False
    dtype: bool

df.loc can be used to select rows and columns of df. df.loc[mask] selects the rows where mask is True:

    In [169]: df.loc[mask]
    Out[169]:
       id  class status species   mask
    0   1  sands      d   carex   True
    1   1  sands      c  eupesu  False
    2   1  sands      c  poapra  False
    3   2   limy      d  carcra   True
    4   2   limy      c  eupesu  False
    5   2   limy      c  poapra  False

df.loc[mask, 'class'] further selects the column class:

    In [170]: df.loc[mask, 'class']
    Out[170]:
    0    sands
    1    sands
    2    sands
    3     limy
    4     limy
    5     limy
    Name: class, dtype: object

df.loc[mask]['class'] = value may fail to modify df, since df.loc[mask] returns a copy. (The same holds true of df[mask]['class'] = value.) Using [...] twice is called "chained indexing", and the problem goes away if you avoid chained indexing.
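
To see the difference concretely, here is a minimal sketch on a made-up three-row frame (the data and the names after_chained and after_loc are hypothetical, chosen only for illustration). It assumes a reasonably recent pandas. The chained form may also emit a warning, which is silenced here.

```python
import warnings

import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame({"id": [1, 1, 2], "class": ["sands", "sands", "limy"]})
mask = df["id"] == 1

with warnings.catch_warnings():
    # pandas may warn about chained assignment; ignore it for this demo.
    warnings.simplefilter("ignore")
    # Chained indexing: the value is written to the temporary copy
    # returned by df.loc[mask], not to df.
    df.loc[mask]["class"] = "wet"

after_chained = df["class"].tolist()  # df is unchanged at this point

# One .loc call selects rows and column together, so df is modified.
df.loc[mask, "class"] = "wet"
after_loc = df["class"].tolist()

print(after_chained)
print(after_loc)
```

The first list should still hold the original classes, while the second shows wet for the rows with id 1.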
So instead of using [...] twice, use df.loc[mask, 'class'] = 'wet':

    In [172]: df
    Out[172]:
       id  class status species
    0   1    wet      d   carex
    1   1    wet      c  eupesu
    2   1    wet      c  poapra
    3   2    wet      d  carcra
    4   2    wet      c  eupesu
    5   2    wet      c  poapra
    6   3   limy      d  poapra
    7   3   limy      c  eupesu
    8   3   limy      c  poapra
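
For readers who want to run the steps above without a data file, here is a minimal sketch that inlines the sample rows with io.StringIO (an assumption; the answer reads from a file named 'data'). It also builds the group-level mask a second way, using isin on the ids that contain a match. That second formulation is not part of the answer and is included only to make clear what transform('any') computes.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so no file is needed.
raw = """\
id class status species
1 sands d carex
1 sands c eupesu
1 sands c poapra
2 limy d carcra
2 limy c eupesu
2 limy c poapra
3 limy d poapra
3 limy c eupesu
3 limy c poapra
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Row-level condition: status is d and species is carex or carcra.
row_mask = (df["status"] == "d") & df["species"].isin(["carex", "carcra"])

# Group-level mask as in the answer: True for every row whose id
# has at least one matching row.
group_mask = row_mask.groupby(df["id"]).transform("any")

# Alternative formulation (not from the answer): collect the ids that
# have a match, then select every row carrying one of those ids.
matching_ids = df.loc[row_mask, "id"].unique()
isin_mask = df["id"].isin(matching_ids)

df.loc[group_mask, "class"] = "wet"
print(df)
```

Both masks should agree row for row. The isin form can be easier to read, while the groupby form carries over to other per-group reductions such as 'all'.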