Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Add values to a data frame column at a specific row number

I have a data frame called df which looks like:

                  A          B             C
Date            
02/02/2007  14.8966   0.289371   0.009984836
05/02/2007  14.8719   0.288368  -0.001659473
06/02/2007  14.9295   0.279869   0.003865595
07/02/2007  15.0035   0.283038   0.004944386
08/02/2007  15.0528   0.277092   0.003280513
09/02/2007  14.7733   0.28663   -0.018742523
12/02/2007  14.6911   0.286458  -0.005579629
13/02/2007  14.996    0.275362   0.020541631
14/02/2007  15.5731   0.253263   0.037761568

I have a for loop that I am using to manipulate a time series. The result is a variable called testValue, which I would like to add to df in a new column called 'D' at the row position endpoint (endpoint is a sequentially increasing integer).

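for i in range(buildRange, len(pair_data) - calcRange + 1):
    startpoint = i
    endpoint = startpoint + calcRange
    # Work on a copy of the current window of rows
    Calc_df = df[startpoint:endpoint].copy()
    # Last value of 'A' in the window minus each value of 'B'
    Calc_df['C'] = Calc_df.iloc[-1, Calc_df.columns.get_loc('A')] - Calc_df['B']
    # Sum only the positive entries of 'C'
    testValue = sum(x for x in Calc_df['C'] if x > 0)

    # Problem line: endpoint is an integer position, but the index holds dates
    df.loc[endpoint, 'D'] = testValue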

Over the course of the loop, column 'D' should fill up with testValue values. Is it possible to reference a row in a data frame by its position rather than by its index label? At the moment, df.loc[endpoint, 'D'] = testValue creates the new column 'D' but adds the value in a new row at the bottom of the data frame instead of in the correct row. (I think this is because endpoint is an integer while the index holds dates, so the label is not found and a new row is created at the bottom of the data frame.)
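The behaviour described above can be reproduced on a small frame. This is only a minimal sketch: the name toy, the string date labels used as the index, and the value 1.0 are assumptions made for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical toy frame: string date labels as the index (an assumption;
# the real index may be a DatetimeIndex, which can behave differently).
dates = ["02/02/2007", "05/02/2007", "06/02/2007", "07/02/2007", "08/02/2007"]
toy = pd.DataFrame(
    {"A": [14.8966, 14.8719, 14.9295, 15.0035, 15.0528]},
    index=pd.Index(dates, dtype=object, name="Date"),
)
rows_before = len(toy)

# .loc is label-based: 3 is not an existing label, so pandas enlarges the
# frame with a new row labelled 3 instead of writing to the fourth row.
toy.loc[3, "D"] = 1.0

print(toy)
```

The printed frame has one more row than before, labelled 3, and every original row holds NaN in 'D'.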

So, for example, if endpoint started at 4, the desired output would look like:

              A          B             C             D
Date            
02/02/2007  14.8966   0.289371   0.009984836
05/02/2007  14.8719   0.288368  -0.001659473
06/02/2007  14.9295   0.279869   0.003865595
07/02/2007  15.0035   0.283038   0.004944386    1.36535
08/02/2007  15.0528   0.277092   0.003280513    0.27821
09/02/2007  14.7733   0.28663   -0.018742523    0.25356
12/02/2007  14.6911   0.286458  -0.005579629    2780435
13/02/2007  14.996    0.275362   0.020541631    0.36635
14/02/2007  15.5731   0.253263   0.037761568    0.25368
        :                            :
31/12/2007  15.9364   0.763263   0.047435768    0.24663

(The values in column 'D' are for illustrative purposes only and would not be correct if the code were run.)



1 Reply


It's hard to tell exactly what you are trying to do. Judging from your code, this is my best guess:

df['D'] = (df['A'].shift()       # correspond to `Calc_df.iloc[-1, Calc_df.columns.get_loc('A')]`
            .sub( df['B'])       # subtract df['B']
            .mul(df['C'].gt(0))  # only look at `df['C']>0`
            .rolling(4).sum()    # sum the last 4 occurrences
          )

Output:

                  A         B         C          D
Date                                              
02/02/2007  14.8966  0.289371  0.009985        NaN
05/02/2007  14.8719  0.288368 -0.001659        NaN
06/02/2007  14.9295  0.279869  0.003866        NaN
07/02/2007  15.0035  0.283038  0.004944        NaN
08/02/2007  15.0528  0.277092  0.003281  43.964901
09/02/2007  14.7733  0.286630 -0.018743  43.964901
12/02/2007  14.6911  0.286458 -0.005580  29.372870
13/02/2007  14.9960  0.275362  0.020542  29.142146
14/02/2007  15.5731  0.253263  0.037762  29.158475
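If the loop has to stay, the literal question (writing to a row by position rather than by label) can be handled with iloc, or by translating the position into a label through df.index. The following is only a sketch under assumptions: the toy frame, endpoint = 3 and test_value = 1.25 are hypothetical stand-ins, not values from the question.

```python
import pandas as pd

# Hypothetical toy frame mirroring the question's layout (first five rows).
df = pd.DataFrame(
    {
        "A": [14.8966, 14.8719, 14.9295, 15.0035, 15.0528],
        "B": [0.289371, 0.288368, 0.279869, 0.283038, 0.277092],
    },
    index=pd.Index(
        ["02/02/2007", "05/02/2007", "06/02/2007", "07/02/2007", "08/02/2007"],
        name="Date",
    ),
)

df["D"] = float("nan")            # create the target column up front
d_col = df.columns.get_loc("D")   # integer position of column 'D'

endpoint = 3       # hypothetical integer row position
test_value = 1.25  # hypothetical stand-in for testValue

# Option 1: purely positional assignment with iloc
df.iloc[endpoint, d_col] = test_value

# Option 2: look up the label at a position, then use label-based loc
df.loc[df.index[endpoint + 1], "D"] = test_value * 2

print(df)
```

Two things to check before using this in the original loop: the slice df[startpoint:endpoint] stops at position endpoint - 1, so the value may belong in that row rather than at endpoint; and if endpoint can reach len(df), both options raise an IndexError instead of appending a row.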
