python - Faster way to transform group with mean value in Pandas

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have a Pandas DataFrame in which I am trying to replace the values in each group with that group's mean. On my machine, the line df["signal"].groupby(g).transform(np.mean) takes about 10 seconds to run with N and N_TRANSITIONS set to the values below.

Is there any faster way to achieve the same result?

import pandas as pd
import numpy as np
from time import time

np.random.seed(0)

N = 120000
N_TRANSITIONS = 1400

# generate groups
transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]
transition_points.sort()
transitions = np.zeros((N,), dtype=bool)
transitions[transition_points] = True
g = transitions.cumsum()

df = pd.DataFrame({ "signal" : np.random.rand(N)})

# here is my bottleneck for large N
tic = time()
result = df["signal"].groupby(g).transform(np.mean)
toc = time()
print(toc - tic)
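
For readers unfamiliar with the grouping trick used above, here is a minimal sketch of how the cumulative sum of a boolean "transition" mask turns into group labels. The array size and transition positions are made up for illustration and are not the values from the question.

```python
import numpy as np

# Hypothetical small example: 8 rows, with transitions at positions 3 and 6.
transitions = np.zeros(8, dtype=bool)
transitions[[3, 6]] = True

# Each True increments the running count, so every row after a transition
# (and the transition row itself) gets the next group label.
g = transitions.cumsum()
print(g)
```

Rows 0 to 2 end up in group 0, rows 3 to 5 in group 1, and rows 6 to 7 in group 2, so consecutive runs of rows share a label.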


1 Reply

深蓝 · Answer 1 · 2021-10-17T01:13:37+0000

Current method, using transform

In [44]: grp = df["signal"].groupby(g)

In [45]: result2 = df["signal"].groupby(g).transform(np.mean)

In [47]: %timeit df["signal"].groupby(g).transform(np.mean)
1 loops, best of 3: 535 ms per loop

Using 'broadcasting' of the results

In [43]: result = pd.concat([pd.Series([r] * len(grp.groups[i])) for i, r in enumerate(grp.mean().values)], ignore_index=True)

In [42]: %timeit pd.concat([pd.Series([r] * len(grp.groups[i])) for i, r in enumerate(grp.mean().values)], ignore_index=True)
10 loops, best of 3: 119 ms per loop

In [46]: result.equals(result2)
Out[46]: True
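
The same broadcasting idea can probably be expressed without a Python-level loop. The following is only a sketch, not something timed in this answer: it assumes the group labels are non-negative integers (as with the cumsum-based g in the question), and the helper name group_mean_broadcast is hypothetical.

```python
import numpy as np
import pandas as pd

def group_mean_broadcast(values, labels):
    """Replace each value with the mean of its group.

    Assumes `labels` holds non-negative integers, one per value.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    sums = np.bincount(labels, weights=values)  # per-group sum
    counts = np.bincount(labels)                # per-group size
    means = sums / np.maximum(counts, 1)        # guard against unused labels
    return means[labels]                        # broadcast back to row order

signal = pd.Series([1.0, 3.0, 10.0, 20.0, 30.0])
labels = np.array([0, 0, 1, 1, 1])

out = group_mean_broadcast(signal, labels)
expected = signal.groupby(labels).transform("mean")
assert np.allclose(out, expected.to_numpy())
```

Indexing the per-group means with the label array does the broadcast in one step. The result is a plain NumPy array, so it carries no index of its own.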

I think you might need to set the index on the broadcast result to match the original (it happens to work here only because df has a default index):

result = pd.concat([pd.Series([r] * len(grp.groups[i])) for i, r in enumerate(grp.mean().values)], ignore_index=True)
result.index = df.index
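
To illustrate why the index matters, here is a small sketch with a non-default index. The data and index values are made up, and looking the means up with .loc is just one possible way to broadcast them, not the method timed above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index does not start at 0.
df = pd.DataFrame({"signal": [1.0, 3.0, 10.0, 20.0]}, index=[10, 11, 12, 13])
g = np.array([0, 0, 1, 1])

grp = df["signal"].groupby(g)
means = grp.mean()  # one value per group, indexed by group label

# Look up each row's group mean, then attach the original index explicitly.
result = pd.Series(means.loc[g].to_numpy(), index=df.index, name="signal")

# transform keeps the original index on its own, so the two should agree.
result2 = grp.transform("mean")
assert result.equals(result2)
```

Without the explicit index=df.index, the broadcast result would be labelled 0 to 3 and would no longer compare equal to, or align with, the original column.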
