Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
771 views
in Technique[技术] by (71.8m points)

numpy - Python Pandas: Increase Maximum Number of Rows

I am processing a large text file (500k lines), formatted as below:

S1_A16
0.141,0.009340221649748676
0.141,4.192618196894668E-5
0.11,0.014122135626540204
S1_A17
0.188,2.3292323316081486E-6
0.469,0.007928706856794138
0.172,3.726771730573038E-5

I'm using the code below to return the correlation coefficients of each series, e.g. S!_A16:

import numpy as np
import pandas as pd
import csv
pd.options.display.max_rows = None
fileName = 'wordUnigramPauseTEST.data'

df = pd.read_csv(fileName, names=['pause', 'probability'])
mask = df['pause'].str.match('^Sd+_Ad+')
df['S/A'] = (df['pause']
              .where(mask, np.nan)
              .fillna(method='ffill'))
df = df.loc[~mask]

result = df.groupby(['S/A']).apply(lambda grp: grp['pause'].corr(grp['probability']))
print(result)

However, on some large files, this returns the error:

Traceback (most recent call last):
  File "/Users/adamg/PycharmProjects/Subj_AnswerCorrCoef/GetCorrCoef.py", line 15, in <module>
    print(result)
  File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 35, in __str__
    return self.__bytes__()
  File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 47, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 857, in __unicode__
    result = self._tidy_repr(min(30, max_rows - 4))
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

I understand that this is related to the print statement, but how do I fix it?

EDIT: This is related to the maximum number of rows. Does anyone know how to accommodate a greater number of rows?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The error message:

TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

is saying None minus an int is a TypeError. If you look at the next-to-last line in the traceback you see that the only subtraction going on there is

max_rows - 4

So max_rows must be None. If you dive into /Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py, near line 857 and ask yourself how max_rows could end up being equal to None, you'll see that somehow

get_option("display.max_rows")

must be returning None.

This part of the code is calling _tidy_repr which is used to summarize the Series. None is the correct value to set when you want pandas to display all lines of the Series. So this part of the code should not have been reached when max_rows is None.

I've made a pull request to correct this.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...