R: Trying to determine which decile each data point is in, for all variables in a data frame
I have data containing information on the prices consumers are willing to pay for services. I'm trying to find which decile each response falls into, for several services, using the cut function:
    for (i in 2:13){
        x <- quantile(data1[,i], c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1), na.rm=TRUE)
        data1[paste(names(data1[i]), "deciles", sep="_")] <- cut(data1[,i], breaks=x, include.lowest=TRUE)
    }
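However, I have two problems. First, for some variables two deciles have the same value (e.g. the 0% and 10% quantiles are both 0), which cut() does not accept: it stops with "'breaks' are not unique". Second, for the initial columns, where the code does work, I get the actual decile interval rather than the decile number (for example "(1.99,2.56]" instead of .2).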
If anyone has any ideas, I'd appreciate it.
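A minimal sketch reproducing both problems on a small hypothetical vector (the values are made up, not the question's data): the tied zeros make quantile() return duplicated breaks, so cut() stops; once the breaks are de-duplicated, cut() returns interval labels rather than decile numbers.

```r
# Hypothetical toy vector with several tied zeros (not the question's data)
v   <- c(0, 0, 0, 0, 1, 2, 3, 4, 5, 6)
qs  <- seq(0, 1, by = 0.1)
brk <- quantile(v, qs)   # the 0%, 10%, 20% and 30% quantiles are all 0
print(brk)

# Problem 1: duplicated breaks make cut() stop with an error
err <- tryCatch(cut(v, breaks = brk, include.lowest = TRUE),
                error = function(e) e)
print(conditionMessage(err))

# Problem 2: with unique breaks cut() works, but the result is a factor of intervals
f <- cut(v, breaks = unique(brk), include.lowest = TRUE)
print(levels(f))
```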
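For the first problem: you can use unique() on the breaks before you pass them to cut(). For the second: convert the factor to integer, and use that integer as an index into the vector of probabilities whose breaks survived, to pull out the decile each value falls in.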
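A minimal worked example of those two steps on a small hypothetical vector with tied zeros (made-up values, not the question's data). Here used holds, for each interval that survives unique(), the probability at which its upper break first appears, so indexing it with as.integer() of the factor gives the decile label.

```r
# Hypothetical toy vector with several tied zeros (not the question's data)
v   <- c(0, 0, 0, 0, 1, 2, 3, 4, 5, 6)
qs  <- seq(0, 1, by = 0.1)   # probabilities passed to quantile()
brk <- quantile(v, qs)       # 0, 0, 0, 0, 0.6, 1.5, ... (four tied zeros)

# Step 1: drop duplicated breaks so cut() accepts them
f <- cut(v, breaks = unique(brk), include.lowest = TRUE)

# Step 2: keep the probability at which each new break value first appears;
# interval k is then labelled with the probability of its upper break
used <- qs[-1][diff(brk) > 0]
dec  <- used[as.integer(f)]

print(data.frame(v, interval = f, decile = dec))
```

All four zeros land in [0,0.6] and are labelled 0.4, because 0.4 is the first probability whose quantile reaches that interval's upper break; this is the same labelling convention the answer's code uses.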
    ## sample data; the third column would fail with `cut`
    set.seed(0)
    data1 <- data.frame(x=rnorm(100), y=rnorm(100), z=sample(0:5, 100, replace=TRUE))
    qs <- seq(0, 1, by=0.1)  # probs for quantile

    for (i in 1:3){
        x <- quantile(data1[,i], qs, na.rm=TRUE)
        used <- qs[-1][diff(x) > 0]  # upper-break prob of each interval left after unique()
        cuts <- cut(data1[,i], breaks=unique(x), include.lowest=TRUE)  # factors, if you want them
        data1[paste(names(data1[i]), "deciles", sep="_")] <- cuts
        data1[paste(names(data1[i]), "num", sep="_")] <- used[as.integer(cuts)]  # numeric values
    }
    head(data1)
    #            x          y z       x_deciles x_num       y_deciles y_num z_deciles
    # 1  1.2629543  0.7818592 0     (1.24,2.44]   1.0      (0.78,1.5]   0.9   [0,1.7]
    # 2 -0.3262334 -0.7767766 3 (-0.421,-0.252]   0.4 (-0.956,-0.714]   0.3     (2,3]
    # 3  1.3297993 -0.6159899 1     (1.24,2.44]   1.0 (-0.714,-0.459]   0.4   [0,1.7]
    # 4  1.2724293  0.0465803 5     (1.24,2.44]   1.0  (0.0262,0.376]   0.7     (4,5]
    # 5  0.4146414 -1.1303858 5   (0.234,0.421]   0.7   [-1.68,-1.12]   0.1     (4,5]
    # 6 -1.5399500  0.5767188 5   [-2.22,-1.07]   0.1    (0.376,0.78]   0.8     (4,5]
    #   z_num
    # 1   0.3
    # 2   0.6
    # 3   0.3
    # 4   0.8
    # 5   0.8
    # 6   0.8
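Since the title asks about doing this for every variable, the same logic can be wrapped in a function and run over the chosen columns with lapply() instead of an index loop. This is a sketch, not the answer's code: decile_num, prices and the column names are hypothetical, and it labels each value with the probability of its interval's upper break, as the loop above does.

```r
# Hypothetical helper: decile label for one numeric vector
decile_num <- function(v, qs = seq(0, 1, by = 0.1)) {
  brk  <- quantile(v, qs, na.rm = TRUE)
  f    <- cut(v, breaks = unique(brk), include.lowest = TRUE)  # tied breaks dropped
  used <- qs[-1][diff(brk) > 0]  # upper-break probability of each interval
  used[as.integer(f)]            # NA inputs stay NA
}

# Hypothetical price data; in the question the service columns are 2:13
prices <- data.frame(id = 1:10,
                     a  = c(0, 0, 0, 0, 1, 2, 3, 4, 5, 6),
                     b  = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100))
cols <- c("a", "b")

# One new "<name>_num" column per service column
prices[paste(cols, "num", sep = "_")] <- lapply(prices[cols], decile_num)
print(prices)
```

Columns with a single distinct value are not handled: unique() then leaves only one break, which cut() interprets as a number of intervals rather than a break point.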