Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - R: how to remove rows with empty values?

I have an odd data set that contains a lot of empty values.

test=read.table("test.csv", sep=",", header=T)
class(test)
[1] "data.frame"
test[1:5]
     GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478
1       CDC27      FEM1B       CUL2       CTSS      AP2A2
2       FEM1B      PSMA1      PSMA1      ITGAV       CTSS
3        NAE1      PSMA3      PSMA3      PSMA1     DYNLL1
4       PSMA1      PSMB5      PSMB5      PSMA3      ITGAV
5       PSMA3      PSMC1      PSMC1      PSMB5      KIF5A
6       PSMB5      PSMC5      PSMC5      PSMC1     KIFAP3
7       PSMC1      PSMC6      PSMC6      PSMC5      PSMA1
8       PSMC5      PSMD1      PSMD1      PSMC6      PSMA3
9       PSMC6     PSMD12     PSMD12      PSMD1      PSMB5
10      PSMD1     PSMD13     PSMD13     PSMD12      PSMC1
11     PSMD12     PSMD14     PSMD14     PSMD13      PSMC5
12     PSMD13      PSMD4      PSMD4     PSMD14      PSMC6
13     PSMD14      PSME3      PSME3      PSMD4      PSMD1
14      PSMD4     PTPN11                 PSME3     PSMD12
15      PSME3                                      PSMD13
16     PTPN11                                      PSMD14
17                                                  PSMD4
18                                                  PSME3
19                                                       
20                                                       
21                                                       
22                                                       
23                                                       
24                                                       
25                                                       
26                                                       
27                                                       
28                                                       
29                                                       
30                                                       
31                                                       
32                                                       
33                                                       
34         
nrow(test[1])
[1] 34

## I want the number of rows that actually hold a value: for the first column, that is 16
## So I tried to remove the empty rows like this

test2<-test[-which(is.na(test)),]
test2
[1] GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478 GO.0002479 GO.0006006 GO.0006007   ...
## another way..
test[test==""] <- NA
test
GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478 GO.0002479 GO.0006006 GO.0006007
1       CDC27      FEM1B       CUL2       CTSS      AP2A2      ITGAV      ALDOA      ALDOA
2       FEM1B      PSMA1      PSMA1      ITGAV       CTSS      PSMA1     ARPP19       ENO2
3        NAE1      PSMA3      PSMA3      PSMA1     DYNLL1      PSMA3       ENO2        GPI
4       PSMA1      PSMB5      PSMB5      PSMA3      ITGAV      PSMB5       GOT1        HK2
5       PSMA3      PSMC1      PSMC1      PSMB5      KIF5A      PSMC1       GOT2       IGF1
6       PSMB5      PSMC5      PSMC5      PSMC1     KIFAP3      PSMC5        GPI       LDHA
7       PSMC1      PSMC6      PSMC6      PSMC5      PSMA1      PSMC6        HK2       PFKP
8       PSMC5      PSMD1      PSMD1      PSMC6      PSMA3      PSMD1       IGF1      PGAM1
9       PSMC6     PSMD12     PSMD12      PSMD1      PSMB5     PSMD12       LDHA       TPI1
10      PSMD1     PSMD13     PSMD13     PSMD12      PSMC1     PSMD13       MDH1       <NA>
11     PSMD12     PSMD14     PSMD14     PSMD13      PSMC5     PSMD14       PFKP       <NA>
12     PSMD13      PSMD4      PSMD4     PSMD14      PSMC6      PSMD4      PGAM1       <NA>
13     PSMD14      PSME3      PSME3      PSMD4      PSMD1      PSME3     RANBP2       <NA>
14      PSMD4     PTPN11       <NA>      PSME3     PSMD12       <NA>       TPI1       <NA>
15      PSME3       <NA>       <NA>       <NA>     PSMD13       <NA>       <NA>       <NA>
16     PTPN11       <NA>       <NA>       <NA>     PSMD14       <NA>       <NA>       <NA>
17       <NA>       <NA>       <NA>       <NA>      PSMD4       <NA>       <NA>       <NA>
18       <NA>       <NA>       <NA>       <NA>      PSME3       <NA>       <NA>       <NA>
19       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
20       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
21       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
22       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
23       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
24       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
25       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
26       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
27       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
28       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
29       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
30       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
31       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
32       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
33       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
34       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>

test<-na.omit(test)
test
GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478 GO.0002479 GO.0006006 GO.0006007
1      CDC27      FEM1B       CUL2       CTSS      AP2A2      ITGAV      ALDOA      ALDOA
2      FEM1B      PSMA1      PSMA1      ITGAV       CTSS      PSMA1     ARPP19       ENO2
3       NAE1      PSMA3      PSMA3      PSMA1     DYNLL1      PSMA3       ENO2        GPI
  GO.0006091 GO.0006094 GO.0006096 GO.0006099 GO.0006106 GO.0006119 GO.0006120 GO.0006418
1      ACACB      ALDOA      ALDOA         FH         FH       BDNF       BDNF       KARS
2      ALDOA     ARPP19       ENO2      IDH3A       GOT1     NDUFA9     NDUFA9       NARS
3     ATP5A1       ENO2        GPI       LDLR       GOT2    NDUFAF1    NDUFAF1       PPA1

I also tried the complete.cases function to exclude the blanks and get the number of rows with a value (e.g. nrow(test[1]) should be 16), but it just returned the same result.

What am I supposed to do?


1 Reply

Try something like this:

test[rowSums(is.na(test)) != ncol(test), ]  # assumes blanks were first set to NA (test[test == ""] <- NA)

or

test[rowSums(test == "") != ncol(test), ]  # assumes blanks are still "" (no NA values yet)
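To see why na.omit left only three rows, and how to get a per-column count such as 16, here is a small sketch. The data frame toy and its values are made up for illustration and only stand in for test.csv; kept and counts are hypothetical names as well.

```r
# Toy data frame standing in for test.csv (illustrative values only)
toy <- data.frame(
  A = c("CDC27", "FEM1B", "NAE1", "", ""),
  B = c("FEM1B", "PSMA1", "",     "", ""),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing values
toy[toy == ""] <- NA

# na.omit() drops every row that contains at least one NA,
# so only the fully populated rows survive
nrow(na.omit(toy))

# Keep the rows that have at least one non-missing value
kept <- toy[rowSums(!is.na(toy)) > 0, , drop = FALSE]
nrow(kept)

# Count the non-missing entries in each column separately
counts <- colSums(!is.na(toy))
counts
```

On this toy data, na.omit keeps 2 rows, the rowSums filter keeps 3, and counts is 3 for A and 2 for B. Since the columns in the question have different lengths, the colSums line is probably closest to "the number of rows with a value" for each column. The blanks can also be turned into NA while reading the file, via the na.strings argument of read.table.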
