Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to use Pandas groupby apply() without adding an extra index

I very often want to create a new DataFrame by combining multiple columns of a grouped DataFrame. The apply() function allows me to do that, but it requires that I create an unneeded index:

 In [359]: df = pandas.DataFrame({'x': 3 * ['a'] + 2 * ['b'], 'y': np.random.normal(size=5), 'z': np.random.normal(size=5)})

 In [360]: df
 Out[360]: 
    x         y         z
 0  a  0.201980 -0.470388
 1  a  0.190846 -2.089032
 2  a -1.131010  0.227859
 3  b -0.263865 -1.906575
 4  b -1.335956 -0.722087

 In [361]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}))
 ---------------------------------------------------------------------------
 ValueError                                Traceback (most recent call last)
 /home/emarkley/work/src/partner_analysis2/main.py in <module>()
 ----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}))

 /usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
     267         applied : type depending on grouped object and function
     268         """
 --> 269         return self._python_apply_general(func, *args, **kwargs)
     270 
     271     def aggregate(self, func, *args, **kwargs):

 /usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in _python_apply_general(self, func, *args, **kwargs)
     417             group_axes = _get_axes(group)
     418 
 --> 419             res = func(group, *args, **kwargs)
     420 
     421             if not _is_indexed_like(res, group_axes):

 /home/emarkley/work/src/partner_analysis2/main.py in <lambda>(x)
 ----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}))

 /usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
     371             mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
     372         elif isinstance(data, dict):
 --> 373             mgr = self._init_dict(data, index, columns, dtype=dtype)
     374         elif isinstance(data, ma.MaskedArray):
     375             mask = ma.getmaskarray(data)

 /usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
     454         # figure out the index, if necessary
     455         if index is None:
 --> 456             index = extract_index(data)
     457         else:
     458             index = _ensure_index(index)

 /usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in extract_index(data)
    4719 
    4720         if not indexes and not raw_lengths:
 -> 4721             raise ValueError('If use all scalar values, must pass index')
    4722 
    4723         if have_series or have_dicts:

 ValueError: If use all scalar values, must pass index

 In [362]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum() / x.z.sum(), 's': (x.y + x.z ** 2).sum() / x.z.sum()}, index=[0]))
 Out[362]: 
             r         s
 x                      
 a 0  1.316605 -1.672293
 b 0  1.608606 -0.972593

Is there any way to use apply() or some other function to get the same results without the extra index of zeros?


1 Reply


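Your function produces a single aggregate r and s value per group, so it should return a Series rather than a one-row DataFrame. The Series labels become the columns of the result, and the group keys become its only index level: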

In [26]: df.groupby('x').apply(lambda x: 
             pandas.Series({'r': (x.y + x.z).sum() / x.z.sum(), 
                            's': (x.y + x.z ** 2).sum() / x.z.sum()}))
Out[26]: 
           r           s
x                       
a  -0.338590   -0.916635
b  66.655533  102.566146
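
For reference, here is a self-contained sketch of the same idea as a script rather than an IPython session. The seeded random generator, the helper name `ratios`, and the explicit `[["y", "z"]]` column selection are my own assumptions and are not part of the original post; the column selection is only there so the function never sees the grouping column, which keeps the behaviour the same across pandas versions. The last lines show one possible alternative, assuming you prefer to keep the one-row DataFrame and drop the extra index level afterwards with `droplevel`.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's frame (seed chosen arbitrarily).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": 3 * ["a"] + 2 * ["b"],
    "y": rng.normal(size=5),
    "z": rng.normal(size=5),
})


def ratios(g):
    """Return one r and one s value for a single group, as a Series."""
    z_sum = g["z"].sum()
    return pd.Series({
        "r": (g["y"] + g["z"]).sum() / z_sum,
        "s": (g["y"] + g["z"] ** 2).sum() / z_sum,
    })


# Returning a Series per group yields one row per group key,
# with no extra inner index level.
result = df.groupby("x")[["y", "z"]].apply(ratios)
print(result)

# Alternative sketch: keep the one-row DataFrame from the question
# and drop the inner level of zeros afterwards.
alt = (
    df.groupby("x")[["y", "z"]]
    .apply(lambda g: pd.DataFrame(ratios(g).to_dict(), index=[0]))
    .droplevel(1)
)
print(alt)
```

Both `result` and `alt` should be indexed by `x` alone. Returning a Series is the shorter of the two and avoids building a throwaway DataFrame for every group.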
