Background: I am annotating SNPs from a GWAS in an organism that has little annotation of its own. I am using the chained tBLASTn table from UCSC along with biomaRt to map each SNP to its probable gene(s).
I have a dataframe that looks like this:
SNP hu_mRNA gene
chr1.111642529 NM_002107 H3F3A
chr1.111642529 NM_005324 H3F3B
chr1.111801684 BC098118 <NA>
chr1.111925084 NM_020435 GJC2
chr1.11801605 AK027740 <NA>
chr1.11801605 NM_032849 C13orf33
chr1.151220354 NM_018913 PCDHGA10
chr1.151220354 NM_018918 PCDHGA5
What I would like to end up with is a single row for each SNP, with the genes and hu_mRNAs collapsed into comma-delimited lists. Here is what I am after:
SNP hu_mRNA gene
chr1.111642529 NM_002107,NM_005324 H3F3A,H3F3B
chr1.111801684 BC098118 <NA>
chr1.111925084 NM_020435 GJC2
chr1.11801605 AK027740,NM_032849 C13orf33
chr1.151220354 NM_018913,NM_018918 PCDHGA10,PCDHGA5
Now I know I can do this with a flick of the wrist in Perl, but I really want to do it all in R. Any suggestions?
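One possible approach in base R, offered as a sketch rather than a definitive answer: group the rows by SNP with `aggregate()` and paste the unique non-missing values of each column together. The names `snps`, `collapse_values`, and `collapsed` are made up for illustration, and the data frame is retyped from the example input with `<NA>` assumed to be a true missing value. Grouping strictly by the SNP column gives every distinct SNP its own row, and a SNP with no gene at all keeps `NA` in the gene column.

```r
# Example input retyped from the question; NA stands for the "<NA>" entries.
snps <- data.frame(
  SNP = c("chr1.111642529", "chr1.111642529", "chr1.111801684",
          "chr1.111925084", "chr1.11801605", "chr1.11801605",
          "chr1.151220354", "chr1.151220354"),
  hu_mRNA = c("NM_002107", "NM_005324", "BC098118", "NM_020435",
              "AK027740", "NM_032849", "NM_018913", "NM_018918"),
  gene = c("H3F3A", "H3F3B", NA, "GJC2", NA, "C13orf33",
           "PCDHGA10", "PCDHGA5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the unique non-missing values with commas,
# returning NA when nothing is left after dropping missing values.
collapse_values <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = ",")
}

# Apply the helper to hu_mRNA and gene within each SNP group.
collapsed <- aggregate(
  snps[c("hu_mRNA", "gene")],
  by = snps["SNP"],
  FUN = collapse_values
)

print(collapsed)
```

If missing genes should be kept as literal text instead of dropped, the `!is.na(x)` filter in the helper is the only line that would need to change.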