bioinformatics - Populating a matrix from a list, where each vector in the list may have 1 to 7 elements [R]


Say I have ';'-separated information in a vector that I want to split apart using strsplit. The data looks like this:

[1] "k__fungi; p__ascomycota; c__eurotiomycetes; o__unidentified; f__unidentified; g__unidentified; s__eurotiomycetes sp"
[2] "k__fungi; p__basidiomycota; c__agaricomycetes; o__agaricales; f__mycenaceae; g__unidentified; s__mycenaceae sp"
[3] "k__fungi; p__ascomycota"
[4] "none"
[5] "k__fungi; p__glomeromycota; c__glomeromycetes; o__glomerales; f__glomeraceae; g__glomus; s__glomus macrocarpum"
[6] "k__fungi; p__basidiomycota; c__agaricomycetes; o__agaricales; f__inocybaceae; g__inocybe"

I use strsplit to separate out the information like this:

list <- strsplit(data, split=";")

The output of this is:

[[1]]
[1] "k__fungi"              " p__ascomycota"        " c__eurotiomycetes"    " o__unidentified"      " f__unidentified"      " g__unidentified"      " s__eurotiomycetes sp"

[[2]]
[1] "k__fungi"           " p__basidiomycota"  " c__agaricomycetes" " o__agaricales"     " f__mycenaceae"     " g__unidentified"   " s__mycenaceae sp"

[[3]]
[1] "k__fungi"       " p__ascomycota"

[[4]]
[1] "none"

[[5]]
[1] "k__fungi"               " p__glomeromycota"      " c__glomeromycetes"     " o__glomerales"         " f__glomeraceae"        " g__glomus"             " s__glomus macrocarpum"

[[6]]
[1] "k__fungi"           " p__basidiomycota"  " c__agaricomycetes" " o__agaricales"     " f__inocybaceae"    " g__inocybe"

I want to push this information into a matrix with as many rows as the length of the original data object and 7 named columns. I generate an empty matrix like this:

out <- matrix(nrow=length(data), ncol=7)
colnames(out) <- c("kingdom","phylum","class","order","family","genus","species")

The empty matrix ends up looking like this:

     kingdom phylum class order family genus species
[1,]      NA     NA    NA    NA     NA    NA      NA
[2,]      NA     NA    NA    NA     NA    NA      NA
[3,]      NA     NA    NA    NA     NA    NA      NA
[4,]      NA     NA    NA    NA     NA    NA      NA
[5,]      NA     NA    NA    NA     NA    NA      NA
[6,]      NA     NA    NA    NA     NA    NA      NA

I want to insert the information from the list into the matrix, such that if the first vector in the list has 7 elements, all 7 columns in row 1 have entries. However, if a vector in the list has only 2 elements, only the first 2 columns in that matrix row have entries, and the rest remain NA values.

Note: I am intentionally avoiding loops. I had a loop solution, but it fails when I scale up to a data set of 100,000 lines.

You may try:

library(stringi)
# Each list element becomes one row; short vectors are padded with NA
m1 <- stri_list2matrix(list, byrow=TRUE)
colnames(m1) <- c("kingdom","phylum","class","order","family","genus","species")
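To make the padding step concrete, here is a rough base R sketch of the same idea, not the stringi implementation itself. It assumes a shortened copy of the sample data; the names `ranks` and `parts` are made up for illustration, and the split pattern "; *" is a small change from the question's ";" so that the pieces carry no leading space.

```r
# Shortened sample input, copied from the question
data <- c(
  "k__fungi; p__ascomycota; c__eurotiomycetes; o__unidentified; f__unidentified; g__unidentified; s__eurotiomycetes sp",
  "k__fungi; p__ascomycota",
  "none"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Split on ';' plus any following spaces (assumption: differs from split=";")
parts <- strsplit(data, split = "; *")

# Indexing past the end of a vector returns NA, so x[1:7] pads short vectors.
# vapply returns a 7 x n matrix, hence the transpose.
out <- t(vapply(parts, function(x) x[seq_along(ranks)], character(length(ranks))))
colnames(out) <- ranks
```

Rows with fewer than 7 pieces keep NA in the trailing columns, which is the behaviour asked for. Note that a vector with more than 7 pieces would be silently truncated by this sketch.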

Or, instead of using strsplit, you can read the data directly with read.table:

read.table(text=data, sep=";", fill=TRUE, stringsAsFactors=FALSE, na.strings='')
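A slightly more defensive variant of the read.table call might look like the sketch below. The extra arguments are my additions, not part of the call above: `col.names` fixes the column count at 7 (with fill=TRUE, read.table otherwise infers the number of columns from the first five lines only), `strip.white` drops the space after each ';', and `quote`/`comment.char` are switched off on the assumption that taxon names may contain apostrophes or '#'. The name `ranks` is made up for illustration.

```r
# Shortened sample input, copied from the question
data <- c(
  "k__fungi; p__ascomycota; c__eurotiomycetes; o__unidentified; f__unidentified; g__unidentified; s__eurotiomycetes sp",
  "k__fungi; p__ascomycota",
  "none"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

df <- read.table(text = data, sep = ";", fill = TRUE,
                 col.names = ranks,        # force 7 named columns
                 strip.white = TRUE,       # trim blanks around each field
                 stringsAsFactors = FALSE,
                 na.strings = "",          # empty (filled) fields become NA
                 quote = "", comment.char = "")

# Convert to a character matrix if a matrix is needed rather than a data frame
out <- as.matrix(df)
```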

Or using the devel version of data.table:

library(data.table) # v1.9.5+
setDT(list(data))[, tstrsplit(V1, '; ')]
