Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Dealing with "R type" text files with rownames as the first, unnamed column in readr

When reading a text file with base read.table,

"If row.names is not specified and the header line has one less entry than the number of columns, the first column is taken to be the row names. This allows data frames to be read in from the format in which they are printed. If row.names is specified and does not refer to the first column, that column is discarded from such files."

But how can I read such a file using the tidyverse's readr?

Consider this file (let's call it test.txt):

col1    col2
sample1 2   3
sample2 2   5

It is tab-separated: the first line has two items separated by one tab, while the 2nd and 3rd lines each have three items separated by two tabs.
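To make the layout concrete, here is a minimal sketch that recreates the file and counts the tab-separated fields on each line. The temporary path is an assumption chosen for illustration; only base R is used.

```r
# Recreate the example file in a temporary location (illustrative path).
path <- file.path(tempdir(), "test.txt")
writeLines(c("col1\tcol2",
             "sample1\t2\t3",
             "sample2\t2\t5"),
           path)

# Count tab-separated fields per line: the header line has one fewer
# field than the data lines, which is what read.table() keys on.
n_fields <- count.fields(path, sep = "\t")
print(n_fields)
```

The header line reports 2 fields and each data line reports 3.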

Base R:

> read.table("test.txt")
        col1 col2
sample1    2    3
sample2    2    5

R with readr:

> read_delim("test.txt", delim = "\t")

-- Column specification --------------------------------------------------------------------------------------------------
cols(
  col1 = col_character(),
  col2 = col_double()
)

Warning: 2 parsing failures.
row col  expected    actual                                     file
  1  -- 2 columns 3 columns 'C:\Users\moje4671\Desktop\test.txt'
  2  -- 2 columns 3 columns 'C:\Users\moje4671\Desktop\test.txt'

# A tibble: 2 x 2
  col1     col2
  <chr>   <dbl>
1 sample1     2
2 sample2     2

Unfortunately I do have quite a few files floating around that obey this convention (I won't discuss its merits).

I find it hard to imagine that there is no simple readr way to read this sort of file, which is, after all, a legitimate R file format (so to speak).

Of course, a workaround is along the lines of

> as_tibble(read.table("test.txt"))
# A tibble: 2 x 2
   col1  col2
  <int> <int>
1     2     3
2     2     5

(plus some magic to preserve the rownames, alright)

... but this rather defeats the purpose of using readr (faster, no automatic type conversion, etc.). Is there a better way?
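For reference, the row-name "magic" in the base R workaround could look like the sketch below. It assumes the tibble package is installed; the column name "sample" and the temporary path are made up for illustration.

```r
library(tibble)

# Recreate the example file in a temporary location (illustrative path).
path <- file.path(tempdir(), "test.txt")
writeLines(c("col1\tcol2", "sample1\t2\t3", "sample2\t2\t5"), path)

# read.table() turns the unnamed first column into row names ...
df <- read.table(path)

# ... and rownames_to_column() moves them into a regular column
# ("sample" is a made-up name) before converting to a tibble.
tbl <- as_tibble(rownames_to_column(df, var = "sample"))
print(tbl)
```

This keeps the sample identifiers, but the parsing is still done by read.table rather than readr.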

Question from: https://stackoverflow.com/questions/65860981/dealing-with-r-type-text-files-with-rownames-as-the-first-unamed-col-in-readr


1 Reply


To get this to work, you need to use more of read_delim's arguments: skip the header line, supply the column names yourself, and drop the first column via col_types:

read_delim("test.txt", delim = "\t", skip = 1,
           col_names = c("col1","col2"),
           col_types = "_ii")

which gives:

# A tibble: 2 x 2
   col1  col2
  <int> <int>
1     2     3
2     2     5

If you are willing to look outside the tidyverse, another option would be to use the fread function from the data.table package:

fread("test.txt")

which gives:

        V1 col1 col2
1: sample1    2    3
2: sample2    2    5

As you can see, the row names are now in the first column (V1). You can drop that column with the drop argument:

fread("test.txt", drop = 1)

which gives:

   col1 col2
1:    2    3
2:    2    5
