Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - How to preserve column names starting with a minus when using numpy.genfromtxt?

Similar to this question, numpy.genfromtxt modifies my columns' names:

import numpy as np
from io import BytesIO  # https://stackoverflow.com/a/11970414/321973

str = """x,-1,1
0,1,1
1,2,3"""
data = np.genfromtxt(BytesIO(str.encode()), delimiter=',', names=True)
print(data.dtype.names)

yields ('x', '1', '1_1') instead of the desired ('x', '-1', '1') (or even better, ('x', -1, 1)). I tried deletechars="""~!@#$%^&*()=+~|]}[{';: /?>,<""" as suggested there to no avail.


1 Reply


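The behavior you're seeing happens because np.genfromtxt passes the field names through the NameValidator class (defined in numpy.lib._iotools). NameValidator strips certain non-alphanumeric characters, '-' among them, and then appends suffixes such as '_1' to make any duplicate names unique.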
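To see the validator in isolation, here is a minimal sketch that calls NameValidator directly. Note the assumption: numpy.lib._iotools is a private NumPy module, so this import is an internal detail that may move or change between releases. It is shown only to illustrate where the renaming happens, not as an API to build on.

```python
# Assumption: NameValidator still lives in the private module numpy.lib._iotools.
# This is an internal NumPy helper, not a public API.
from numpy.lib._iotools import NameValidator

header = ['x', '-1', '1']

# Default settings: '-' is in the default deletechars, so '-1' becomes '1',
# which then collides with the existing '1' and is renamed '1_1'.
default_names = NameValidator()(header)
print(default_names)  # ('x', '1', '1_1')

# With an empty deletechars, no characters are stripped from the names.
kept_names = NameValidator(deletechars='')(header)
print(kept_names)  # ('x', '-1', '1')
```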

It's perfectly legal for a field name to contain a '-' character, e.g.:

x = np.array((1,), dtype=[('-1', 'i')])
print(x['-1'])
# 1

In fact, two out of three of the modified field names you get back from np.genfromtxt are also not "valid Python identifiers" ('1' and '1_1', since they start with digits).
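A quick way to check which of these names are valid Python identifiers is str.isidentifier, combined with keyword.iskeyword to rule out reserved words. The helper name is_plain_identifier below is hypothetical, introduced only for this sketch:

```python
import keyword


def is_plain_identifier(name: str) -> bool:
    """Return True if `name` could be written as a bare Python variable/attribute name."""
    return name.isidentifier() and not keyword.iskeyword(name)


results = {name: is_plain_identifier(name) for name in ('x', '1', '1_1', '-1')}
print(results)  # {'x': True, '1': False, '1_1': False, '-1': False}
```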

It's therefore possible to construct the array you describe, as long as you don't let np.genfromtxt set the field names. One way to do it is to initialize an empty array with the field names and dtypes specified explicitly, then fill it with the rest of the string's contents:

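import numpy as np
from io import BytesIO

# Field names copied verbatim from the header line: ['x', '-1', '1']
names = str.splitlines()[0].split(',')
types = ('i',) * 3
dtype = list(zip(names, types))  # list(): zip() is a one-shot iterator on Python 3

# Parse just the numeric rows into a plain 2-D int array (no field names to
# validate); skip_header is the current name of the old skiprows keyword
raw = np.genfromtxt(BytesIO(str.encode()), delimiter=',', dtype='i',
                    skip_header=1)

# Fill each field explicitly instead of relying on how structured-array
# assignment matches fields, which has changed between NumPy versions
data = np.empty(len(raw), dtype=dtype)
for name, column in zip(names, raw.T):
    data[name] = column
print(repr(data))
# array([(0, 1, 1), (1, 2, 3)],
#       dtype=[('x', '<i4'), ('-1', '<i4'), ('1', '<i4')])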
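If you would rather keep np.genfromtxt's own header parsing, a different technique (not the one used above) is to let it mangle the names and rename the fields afterwards with numpy.lib.recfunctions.rename_fields. A minimal sketch, assuming a NumPy version whose genfromtxt accepts a text-mode io.StringIO:

```python
from io import StringIO

import numpy as np
from numpy.lib import recfunctions as rfn

text = """x,-1,1
0,1,1
1,2,3"""

# Original header names, exactly as written in the data.
header = text.splitlines()[0].split(',')  # ['x', '-1', '1']

# Let genfromtxt parse everything; its field names come back mangled.
data = np.genfromtxt(StringIO(text), delimiter=',', names=True, dtype='i')

# Map each mangled name back to the header entry at the same position.
mapping = dict(zip(data.dtype.names, header))
data = rfn.rename_fields(data, mapping)

print(data.dtype.names)  # ('x', '-1', '1')
print(data['-1'])        # [1 2]
```

Because the mapping is built position by position from data.dtype.names, it also works if a given NumPy version happens to leave the names untouched: the mapping is then simply the identity.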

However, just because you can doesn't mean you should: there may well be other unpredictable consequences of having a '-' in a field name. For example, such a field can never be accessed with np.recarray attribute syntax, because rec.-1 is not valid Python. The safest option is to stick to valid Python identifiers as field names.
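One concrete example of such a consequence: if you reuse these names in an explicit dtype and hand it back to np.genfromtxt, the names are validated again and silently renamed. A minimal sketch, assuming genfromtxt's default deletechars (which include '-'); the dtype and the inline data are hypothetical:

```python
from io import StringIO

import numpy as np

# Hypothetical dtype that reuses the original header names, '-1' included.
dtype = [('x', 'i4'), ('-1', 'i4'), ('1', 'i4')]

rows = StringIO("0,1,1\n1,2,3")
out = np.genfromtxt(rows, delimiter=',', dtype=dtype)

# genfromtxt re-validates the names of an explicit dtype with its default
# deletechars, so '-1' is stripped to '1' and the duplicate is renumbered.
print(out.dtype.names)  # ('x', '1', '1_1')
```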

