Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas: ignore missing dates when finding percentiles

I have a dataframe, and I am trying to find the percentile rank of one datetime within a column of datetimes, some of which are missing. I am using scipy's stats.percentileofscore; the data and the call are shown below.

Dataframe:

student, attempts, time
student 1,14, 9/3/2019  12:32:32 AM
student 2,2, 9/3/2019  9:37:14 PM
student 3, 5
student 4, 16, 9/5/2019  8:58:14 PM

student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
perc1_first = stats.percentileofscore(data['time'].notnull(), student2Info[2], 'rank')

Here student2Info[2] holds the datetime for a particular student.

When I run this, I get the following error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

How can I get the percentile to calculate correctly even when some times are missing from the column?

  asked by newwebdev22 (translated from Stack Overflow)


1 Reply


You need to convert the Timestamps into numeric values that percentileofscore can compare.

Also, notnull() returns a boolean mask that you can use to filter your DataFrame; it does not return the filtered values. Passing the mask straight to percentileofscore therefore compares True/False flags against a Timestamp, so I've updated the call to filter first.
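To illustrate the difference, here is a minimal sketch using a made-up three-row Series (the values and the variable names times, mask, and valid are only for illustration). notnull() gives True/False flags, while indexing with that mask, or calling dropna(), gives the actual timestamps.

```python
import pandas as pd

# Hypothetical column with one missing time (NaT).
times = pd.Series([
    pd.Timestamp("2019-09-03 00:32:32"),
    pd.NaT,
    pd.Timestamp("2019-09-05 20:58:14"),
])

mask = times.notnull()   # boolean flags, one per row
valid = times[mask]      # the non-missing timestamps themselves

print(mask.tolist())     # [True, False, True]
print(len(valid))        # 2
print(valid.equals(times.dropna()))  # True
```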

Here is a working example:

import pandas as pd
import scipy.stats as stats

data = pd.DataFrame.from_dict({
    "student": [1, 2, 3, 4],
    "attempts": [14, 2, 5, 16],
    "time_0001": [
        "9/3/2019  12:32:32 AM",
        "9/3/2019  9:37:14 PM",
        "",
        "9/5/2019  8:58:14 PM"
    ]
})

student2Info = [14, 4, pd.Timestamp('2019-09-04 00:26:36')]
data['time'] = pd.to_datetime(data['time_0001'], errors='coerce')
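# Keep only the rows with a valid time, then convert each Timestamp to an integer
# day number. Note that toordinal() drops the time of day.
valid_days = data[data['time'].notnull()].time.transform(pd.Timestamp.toordinal)
perc1_first = stats.percentileofscore(valid_days, student2Info[2].toordinal(), 'rank')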
print(perc1_first)  #-> 66.66666666666667
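One caveat: toordinal() keeps only the calendar day, so two times on the same day count as equal. If the time of day matters, a possible variant (a sketch, not part of the original answer) is to compare seconds since a fixed reference point instead. The names epoch, valid_seconds, and score_seconds are my own, and the four rows are the same times as in the example, written in ISO format with None for the missing one.

```python
import pandas as pd
from scipy import stats

# Same four rows as in the example; None becomes NaT.
times = pd.to_datetime(pd.Series([
    "2019-09-03 00:32:32",
    "2019-09-03 21:37:14",
    None,
    "2019-09-05 20:58:14",
]), errors="coerce")
score = pd.Timestamp("2019-09-04 00:26:36")

# Drop the missing times, then express everything as seconds since an
# arbitrary reference point (assumption: any fixed Timestamp would do).
epoch = pd.Timestamp("1970-01-01")
valid = times.dropna()
valid_seconds = (valid - epoch).dt.total_seconds()
score_seconds = (score - epoch).total_seconds()

pct = stats.percentileofscore(valid_seconds, score_seconds, kind="rank")
print(pct)  # about 66.67: two of the three valid times are earlier than the score
```

For this particular data the result matches the day-based version, because no valid time falls on the same day as the score.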
