Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
175 views
in Technique[技术] by (71.8m points)

read.table - How to read a subset of large dataset in R?

I have a dataset with about 2 million rows, so without reading the whole dataset I want to read a subset of dataset . My dataset contains a date column in it so I just want to read dataset between a date range without reading whole dataset as it will be time consuming and memory waste. so how to accomplish it can anyone guide me on this ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use skip= parameter in read.table

read.table("file.txt",skip= ,nrows= )

Both the skip= and nrows= take in row indicator numbers so just add them after the=.

The nrows= defines how deep you range when you are importing the file.

I suggest reading https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html if you haven't done so already.

Also, please see one of my questions:

R - Reading lines from a .txt-file after a specific line

It, somewhat, touches the same subject.

The other possible way might be to use grep() in skip=

read.table(...,skip=grep("2005-12-31", readLines("File.txt")),nrows=365)

What this line does is it skips until it finds the line depicted in grep() and reads the lines after that. The nrow= will stop the reading after it has read 365 lines (this way you have read one year of dates provided one line equals one date).

This seems kinda complicated, but it's the only way I know how to solve this.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...