Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Load text file as strings using numpy.loadtxt()

I would like to load a big text file (around 1 GB, with 3*10^6 rows and 10 to 100 columns) as a 2D NumPy array of strings. However, numpy.loadtxt() seems to parse values as floats by default. Is it possible to specify another data type for the entire array? I've tried the following without luck:

loadedData = np.loadtxt(address, dtype=np.str)

I get the following error message:

/Library/Python/2.7/site-packages/numpy-1.8.0.dev_20224ea_20121123-py2.7-macosx-10.8-x86_64.egg/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    833             fh.close()
    834
--> 835     X = np.array(X, dtype)
    836     # Multicolumn data are returned with shape (1, N, M), i.e.
    837     # (1, 1, M) for a single row - remove the singleton dimension there

ValueError: cannot set an array element with a sequence

Any ideas? (I don't know the exact number of columns in my file beforehand.)



1 Reply


Use genfromtxt instead. It's a much more general method than loadtxt:

import numpy as np
# print() with a single argument works on both Python 2 and Python 3
print(np.genfromtxt('col.txt', dtype='str'))

Using the file col.txt:

foo bar
cat dog
man wine

This gives:

[['foo' 'bar']
 ['cat' 'dog']
 ['man' 'wine']]
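
For anyone who wants to reproduce this without creating col.txt by hand, here is a minimal self-contained sketch. The temporary directory and the explicit encoding="utf-8" argument are my own assumptions (the encoding keyword needs NumPy 1.14 or later); the file contents are the same three rows shown above.

```python
import os
import tempfile

import numpy as np

# Write the sample file from the answer into a temporary directory
# (the location is an assumption made so the example is self-contained).
path = os.path.join(tempfile.mkdtemp(), "col.txt")
with open(path, "w") as fh:
    fh.write("foo bar\ncat dog\nman wine\n")

# dtype=str makes genfromtxt keep every field as text instead of
# trying to parse it as a float; whitespace is the default delimiter.
arr = np.genfromtxt(path, dtype=str, encoding="utf-8")

print(arr)
print(arr.shape)
```

The result is a 2D array with one row per line and one column per whitespace-separated field, so the usual slicing such as arr[:, 0] works on it.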

If you expect each row to have the same number of columns, read the first row to find that number, and pass the filling_values argument to genfromtxt to fill in any missing values.
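
A rough sketch of the "read the first row" idea, under some assumptions of mine: the file name cols.csv, the comma delimiter, and the "NA" placeholder are all hypothetical and not from the question. Blank fields are replaced by hand here rather than through filling_values, simply to keep the example independent of how that argument treats string columns.

```python
import os
import tempfile

import numpy as np

# Hypothetical comma-separated file; the second row has one blank field.
path = os.path.join(tempfile.mkdtemp(), "cols.csv")
with open(path, "w") as fh:
    fh.write("foo,bar,baz\ncat,,dog\n")

# Read only the first row to learn how many columns to expect.
with open(path) as fh:
    n_cols = len(fh.readline().rstrip("\n").split(","))

# Load everything as text; an explicit delimiter keeps blank fields
# in place, so every row still has n_cols entries.
data = np.genfromtxt(path, dtype=str, delimiter=",", encoding="utf-8")
assert data.shape[1] == n_cols

# Replace blank fields with a placeholder of our choosing.
data[data == ""] = "NA"
print(data)
```

Knowing n_cols up front also gives a cheap sanity check on the loaded array before doing anything expensive with a 1 GB file.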

