R Aggregation Error: count distinct
I'm relatively new to R, so forgive me if this seems like a dumb question. I've started to run out of ideas and other examples on how to make this work, so I'm hoping someone can guide me in the right direction.
So I'm attempting to count the distinct site_id values for each clncl_trial_id.
My data is in a data frame (data2) and looks roughly like this:
clncl_trial_id  site_id
89794           12456
89794           12456
8613            100341
8613            30807
The end result I want is a count of 89794 = 1 and 8613 = 2.
Here's what I have so far:
z <- aggregate(data2$site_id ~ data2$clncl_trial_id, data2, function(site_id) length(unique(data2$site_id)))
And I've attempted these alternate forms:
aggregate(site_id ~ clncl_trial_id, data2, sum(!duplicated(data$site_id)))
aggregate(site_id ~ clncl_trial_id, data2, nlevels(factor(data2$site_id)))
aggregate(site_id ~ clncl_trial_id, data2, function(site_id) length(unique(data2$site_id)))
I keep running into the problem that, instead of grouping by clncl_trial_id, it counts across the whole table: 89794 = 3 and 8613 = 3.
Does anyone have an idea how to correct this? I feel like I'm overlooking something silly. Also, a side note: I'm trying to stay within base R if at all possible. If that isn't possible, no biggie.
A couple of methods:
Data:
df <- data.frame(clncl_trial_id = c(89794, 89794,8613, 8613), site_id = c(12456, 12456, 100341, 30807))
Base R, using table():
table(df)
              site_id
clncl_trial_id 12456 30807 100341
         8613      0     1      1
         89794     2     0      0
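The table above counts rows, so the duplicated 89794/12456 pair shows up as 2 rather than 1. A minimal base-R sketch (not part of the original answer) that turns the same contingency table into a distinct count is to test each cell for > 0 before summing across each row; the names tab and distinct_sites are illustrative only.

```r
# Example data, same as the answer's df
df <- data.frame(clncl_trial_id = c(89794, 89794, 8613, 8613),
                 site_id        = c(12456, 12456, 100341, 30807))

# table() counts how many rows fall into each (trial, site) cell
tab <- table(df)

# A cell > 0 means the site appears at least once for that trial,
# so summing the TRUE values in each row counts the distinct sites
# per trial; duplicate rows no longer inflate the result
distinct_sites <- rowSums(tab > 0)
print(distinct_sites)
```

For the example data this gives 2 for trial 8613 and 1 for trial 89794, matching the result the question asks for.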
dplyr:
library(dplyr)
df %>%
  group_by(clncl_trial_id, site_id) %>%
  summarise(count = n())

  clncl_trial_id site_id count
1           8613   30807     1
2           8613  100341     1
3          89794   12456     2
Update
To count distinct values, use unique() in base R, or distinct() in dplyr:
## base R
table(unique(df))
## to group/summarise the results, use rowSums()
rowSums(table(unique(df)))

## dplyr
df %>% distinct() %>% group_by(clncl_trial_id) %>% summarise(count = n())
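If you would rather skip the contingency table altogether, base R's tapply() can apply a distinct count to each group of site_id values directly. This is a sketch of an alternative the answer does not use; df is redefined so the snippet runs on its own.

```r
df <- data.frame(clncl_trial_id = c(89794, 89794, 8613, 8613),
                 site_id        = c(12456, 12456, 100341, 30807))

# Split site_id by clncl_trial_id and, for each group,
# count how many unique site IDs it contains
distinct_sites <- tapply(df$site_id, df$clncl_trial_id,
                         function(x) length(unique(x)))
print(distinct_sites)
```

The result is a one-dimensional array named by trial ID (8613, then 89794), holding the counts 2 and 1.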
Or, more succinctly, using Marek's suggestion:
df %>% distinct %>% count(clncl_trial_id)
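The asker's own aggregate() attempt is also close to working in base R. Inside FUN, the argument holds only the site_id values for one trial, but the original function ignored it and called unique(data2$site_id), which looks at the whole column every time. That column has 3 unique sites, which is why every group came back as 3. A sketch of the corrected call, using the answer's df in place of data2:

```r
df <- data.frame(clncl_trial_id = c(89794, 89794, 8613, 8613),
                 site_id        = c(12456, 12456, 100341, 30807))

# Use bare column names in the formula and let FUN work on its
# argument x, which aggregate() fills with one group's site_id values
z <- aggregate(site_id ~ clncl_trial_id, data = df,
               FUN = function(x) length(unique(x)))
print(z)
```

This returns a two-row data frame in which site_id holds the distinct count: 2 for trial 8613 and 1 for trial 89794.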