I am using an aggregation function that I have used in my work for a long time now. The idea is that if the Series passed to the function is of length 1 (i.e. the group only has one observation) then that observations is returned. If the length of the Series passed is greater than one, then the observations are returned in a list.
This may seem odd to some, but this is not an X,Y problem, I have good reason for wanting to do this that is not relevant to this question.
This is the function that I have been using:
def MakeList(x):
""" This function is used to aggregate data that needs to be kept distinc within multi day
observations for later use and transformation. It makes a list of the data and if the list is of length 1
then there is only one line/day observation in that group so the single element of the list is returned.
If the list is longer than one then there are multiple line/day observations and the list itself is
returned."""
L = x.tolist()
if len(L) > 1:
return L
else:
return L[0]
Now for some reason, with the current data set I am working on I get a ValueError stating that the function does not reduce. Here is some test data and the remaining steps I am using:
import pandas as pd
DF = pd.DataFrame({'date': ['2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02',
'2013-04-02'],
'line_code': ['401101',
'401101',
'401102',
'401103',
'401104',
'401105',
'401105',
'401106',
'401106',
'401107'],
's.m.v.': [ 7.760,
25.564,
25.564,
9.550,
4.870,
7.760,
25.564,
5.282,
25.564,
5.282]})
DFGrouped = DF.groupby(['date', 'line_code'], as_index = False)
DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})
In trying to debug this, I put a print statement to the effect of print L
and print x.index
and
the output was as follows:
[7.7599999999999998, 25.564]
Int64Index([0, 1], dtype='int64')
[7.7599999999999998, 25.564]
Int64Index([0, 1], dtype='int64')
For some reason it appears that agg
is passing the Series twice to the function. This as far as I know is not normal at all, and is presumably the reason why my function is not reducing.
For example if I write a function like this:
def test_func(x):
print x.index
return x.iloc[0]
This runs without problem and the print statements are:
DF_Agg = DFGrouped.agg({'s.m.v.' : test_func})
Int64Index([0, 1], dtype='int64')
Int64Index([2], dtype='int64')
Int64Index([3], dtype='int64')
Int64Index([4], dtype='int64')
Int64Index([5, 6], dtype='int64')
Int64Index([7, 8], dtype='int64')
Int64Index([9], dtype='int64')
Which indicates that each group is only being passed once as a Series to the function.
Can anyone help me understand why this is failing? I have used this function with success in many many data sets I work with....
Thanks
See Question&Answers more detail:
os