Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
749 views
in Technique[技术] by (71.8m points)

r - Combine some csv files into one - different number of columns

I already loaded 20 csv files with function:

tbl = list.files(pattern="*.csv")
for (i in 1:length(tbl)) assign(tbl[i], read.csv(tbl[i]))

or

list_of_data = lapply(tbl, read.csv)

That how it looks like:

> head(tbl)
[1] "F1.csv"          "F10_noS3.csv"    "F11.csv"         "F12.csv"         "F12_noS7_S8.csv"
[6] "F13.csv"

I have to combine all of those files into one. Let's call it a master file but let's try with making a one table with all of the names. In all of those csv files is a column called "Accession". I would like to make a table of all "names" from all of those csv files. Of course many of the accessions can be repeated in different csv files. I would like to keep all of the data corresponding to the accession.

Some problems:

  • Some of those "names" are the same and I don't want to duplicate them
  • Some of those "names" are ALMOST the same. The difference is that there is name and after become the dot and the numer.
  • The number of columns can be different is those csv files.

That's the screenshot showing how those data looks like: http://imageshack.com/a/img811/7103/29hg.jpg

Let me show you how it looks:

AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--

<-- = Same sample, different names. Should be treated as one. So just ignore dot and a number after.

Is it possible to do ?

I couldn't do a dput(head) because it's even too big data set.

I tried to use such code:

all_data = do.call(rbind, list_of_data)
Error in rbind(deparse.level, ...) : 
The number of columns is not correct.


all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
all_data = subset(all_data, !duplicated(CleanedAccession))

I tried to do it for almost 2 weeks and I am not able to. So please help me.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your questions seems to contain multiple subquestions. I encourage you to separate them.

The first thing you apparently need is to combine data frames with different columns. You can use rbind.fill from the plyr package:

library(plyr)
all_data = do.call(rbind.fill, list_of_data)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...