Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
465 views
in Technique[技术] by (71.8m points)

python - Cumulative pandas column "reseting" once threshold is reached

I am facing an issue with the following dataset:

item                  price       
1                     1706
2                     210
3                     1664
4                     103
5                     103
6                     314
7                     1664
8                     57
9                     140
10                    1628
11                    688
12                    180
13                    604
14                    86
15                    180
16                    86
17                    1616
18                    832
19                    1038
20                    57
21                    2343
22                    151
23                    328
24                    328
25                    57
26                    86
27                    1706
28                    604
29                    609
30                    86
31                    0
32                    57
33                    302
34                    328

I want to have a cumulative sum column which "resets" each time it reaches the threshold (read not exceed it, it is fine to have a big gap between the last cumsum number and the threshold as long as it does not exceed it).

I have tried the following code:

threshold = (7.17*1728)*0.75  #this is equal to 9292.32
df['cumsum'] = df.groupby((df['price'].cumsum()) // threshold)['price'].cumsum()

This output the following:

item                  price             cumsum    
1                     1706              1706
2                     210               1916
3                     1664              3580
4                     103               3683
5                     103               3786
6                     314               4100
7                     1664              5764
8                     57                5821
9                     140               5961
10                    1628              7589
11                    688               8277
12                    180               8757
13                    604               9061
14                    86                9147
15                    180               9327 #exceeds threshold
16                    86                9413 #
17                    1616              1616
18                    832               2448
19                    1038              3486
20                    57                3543
21                    2343              5886
22                    151               6037
23                    328               6365
24                    328               6693
25                    57                6750
26                    86                6836
27                    1706              8542
28                    604               9146
29                    609               9755 #exceeds threshold same below
30                    86                9841 #
31                    0                 9841 #
32                    57                9898 #
33                    302               10200 #
34                    328               328

My expected result would be the following instead (for the first part for example):

item                  price             cumsum    
1                     1706              1706
2                     210               1916
3                     1664              3580
4                     103               3683
5                     103               3786
6                     314               4100
7                     1664              5764
8                     57                5821
9                     140               5961
10                    1628              7589
11                    688               8277
12                    180               8757
13                    604               9061
14                    86                9147
15                    180               180 #
16                    86                266 #

What do I need to change in order to get this result? also i would appreciate any explanation as to why the above code does not work.

Thank you in advance.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Maybe it costs a lot, but it can work...

threshold = (7.17*1728)*0.75  #this is equal to 9292.32
df['cumsum'] = df['price'].cumsum()

# handle the cumsum which is gt threshold by loops
n = 1
while True:
    print(n)
    cond = df['cumsum'].ge(threshold)
    if cond.sum():
        df.loc[cond, 'cumsum'] = df.loc[cond, 'price'].cumsum()
    else:
        break
    n += 1

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...