Python - Pandas: Optimized Way to Create a Dummy Variable?
I am creating a new dummy variable based on a given column and criteria. Below is the code I am working with. It works, but it is slow. Is there a faster, possibly vectorized, way to create dummies in pandas, specifically for this example?
I have looked at the get_dummies function in pandas, but it seems a little different from what I am doing here. I could be wrong, though; if there is a way to make get_dummies work for this example, that would be an acceptable answer too.
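def flagger(row, criteria, col):
    # 1 if the row's value is at or below the threshold, 0 if above
    if row[col] <= criteria:
        return 1
    if row[col] > criteria:
        return 0

# Row-wise apply: calls flagger once per row
dstk['dropflag'] = dstk.apply(lambda row: flagger(row, criteria, col), axis=1)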
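To make the question reproducible, here is a minimal self-contained sketch of the same row-wise approach. It assumes a small DataFrame named dstk with a hypothetical numeric column "value" and a threshold of 3; the real data, column name, and criteria are not given in the question.

```python
import pandas as pd

def flagger(row, criteria, col):
    # The question's function: 1 at or below the threshold, 0 above it
    if row[col] <= criteria:
        return 1
    if row[col] > criteria:
        return 0

# Hypothetical sample data; the real dstk is not shown in the question
dstk = pd.DataFrame({"value": [1, 5, 3, 7, 2]})
criteria = 3
col = "value"

# Row-wise apply: flagger is called once for every row
dstk["dropflag"] = dstk.apply(lambda row: flagger(row, criteria, col), axis=1)
print(dstk["dropflag"].tolist())
# → [1, 0, 1, 0, 1]
```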
Edit: There are 2 answers here. At a glance, both are equally fast (at least the same order of magnitude), so I accepted one. If anyone wants to do more serious profiling, I am happy to revise my answer choice.
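For anyone who wants to do the more serious profiling mentioned in the edit, a rough sketch using the standard timeit module follows. It compares the question's row-wise apply with a column-wise np.where. The 20,000-row random frame, the column name "value", and the threshold 0.5 are assumptions for illustration; absolute timings will vary with machine, data size, and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 20,000 uniform random values in [0, 1)
rng = np.random.default_rng(0)
dstk = pd.DataFrame({"value": rng.random(20_000)})
criteria = 0.5
col = "value"

def flagger(row, criteria, col):
    # The question's row-wise function
    if row[col] <= criteria:
        return 1
    if row[col] > criteria:
        return 0

def with_apply():
    # Row-wise: one Python-level function call per row
    return dstk.apply(lambda row: flagger(row, criteria, col), axis=1)

def with_where():
    # Column-wise: a single vectorized comparison over the whole column
    return pd.Series(np.where(dstk[col] <= criteria, 1, 0), index=dstk.index)

# Sanity check that both approaches produce the same flags before timing them
assert (with_apply() == with_where()).all()

for fn in (with_apply, with_where):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per run")
```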
Why not try np.where? It is a column-wise vectorized operation, so it is faster than a row-wise apply.
import numpy as np

# Vectorized: compare the whole column at once, then map True -> 1, False -> 0
dstk['dropflag'] = np.where(dstk[col] <= criteria, 1, 0)
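Two other vectorized forms give the same flags, sketched below on hypothetical data (column "value", threshold 3): casting the boolean comparison straight to int, and, since the question asks about get_dummies, one-hot encoding the boolean comparison and keeping only its True column. The get_dummies route works but is roundabout for a single binary flag. One caveat: all three map a NaN value to 0, whereas the question's flagger returns None for NaN because neither of its comparisons is true.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and threshold; not from the question
dstk = pd.DataFrame({"value": [1, 5, 3, 7, 2]})
criteria = 3
col = "value"

mask = dstk[col] <= criteria  # boolean Series, True where value <= criteria

# 1) np.where on the mask
dstk["flag_where"] = np.where(mask, 1, 0)

# 2) Cast the boolean mask directly to integers (True -> 1, False -> 0)
dstk["flag_astype"] = mask.astype(int)

# 3) get_dummies on the mask, keeping only the True column; reindex guarantees
#    both columns exist even if every row falls on the same side of criteria
dummies = pd.get_dummies(mask, dtype=int).reindex(columns=[False, True], fill_value=0)
dstk["flag_dummies"] = dummies[True]

print(dstk)
```

Of the three, mask.astype(int) is arguably the most direct, since the comparison already produces the 0/1 information as booleans.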