Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
305 views
in Technique[技术] by (71.8m points)

python - Reading data from text file with missing values

I want to read data from a file that has many missing values, as in this example:

1,2,3,4,5
6,,,7,8
,,9,10,11

I am using the numpy.loadtxt function:

data = numpy.loadtxt('test.data', delimiter=',')

The problem is that the missing values break loadtxt (I get a "ValueError: could not convert string to float:", no doubt because of the two or more consecutive delimiters).

Is there a way to do this automatically, with loadtxt or another function, or do I have to bite the bullet and parse each line manually?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'd probably use genfromtxt:

>>> from numpy import genfromtxt
>>> genfromtxt("missing1.dat", delimiter=",")
array([[  1.,   2.,   3.,   4.,   5.],
       [  6.,  nan,  nan,   7.,   8.],
       [ nan,  nan,   9.,  10.,  11.]])

and then do whatever with the nans (change them to something, use a mask instead, etc.) Some of this could be done inline:

>>> genfromtxt("missing1.dat", delimiter=",", filling_values=99)
array([[  1.,   2.,   3.,   4.,   5.],
       [  6.,  99.,  99.,   7.,   8.],
       [ 99.,  99.,   9.,  10.,  11.]])

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...