First, filter the False values by inverting the mask with ~ and sort the values (if necessary). Then keep only the hosts that reach the threshold, and finally create the Group column with factorize:
df = df[~df['isCorrect']].sort_values(['Host','Time'])
mask = df['Host'].map(df['Host'].value_counts()) >= 3
df = df[mask].copy()
df['Group'] = pd.factorize(df['Host'])[0] + 1
print (df)
    Time   Host  isCorrect  Group
2  10:03  HostA      False      1
3  10:15  HostA      False      1
4  10:16  HostA      False      1
5  10:18  HostB      False      2
8  10:22  HostB      False      2
9  10:23  HostB      False      2
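For anyone who wants to run the first approach end to end, here is a minimal self-contained sketch. The input frame is an assumption: only the False rows appear in the printed output above, so the True rows (indices 0, 1, 6, 7) and their times and hosts are made up for illustration. The names `failed`, `counts` and `result` are likewise just illustrative.

```python
import pandas as pd

# Hypothetical sample data: the True rows are invented, the False rows
# mirror the printed output above.
df = pd.DataFrame({
    'Time': ['10:01', '10:02', '10:03', '10:15', '10:16',
             '10:18', '10:20', '10:21', '10:22', '10:23'],
    'Host': ['HostA', 'HostB', 'HostA', 'HostA', 'HostA',
             'HostB', 'HostA', 'HostB', 'HostB', 'HostB'],
    'isCorrect': [True, True, False, False, False,
                  False, True, True, False, False],
})

# Keep only the False rows and order them by host, then time.
failed = df[~df['isCorrect']].sort_values(['Host', 'Time'])

# Map each row to the number of False rows its host has in total.
counts = failed['Host'].map(failed['Host'].value_counts())

# Keep hosts with at least 3 False rows and number them from 1.
result = failed[counts >= 3].copy()
result['Group'] = pd.factorize(result['Host'])[0] + 1
print(result)
```

Here the count is per host over the whole frame, so the False rows of one host do not need to be adjacent to be counted together.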
If you need to group by consecutive False values instead:
m = ~df['isCorrect']
df['Group'] = df['isCorrect'].cumsum()[m]
df = df[m].sort_values(['Host','Time'])
mask = df.groupby(['Group', 'Host'])['Group'].transform('size') >= 3
df = df[mask].copy()
df['Group'] = pd.factorize(df['Host'])[0] + 1
print (df)
    Time   Host  isCorrect  Group
2  10:03  HostA      False      1
3  10:15  HostA      False      1
4  10:16  HostA      False      1
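The consecutive variant can be sketched the same way. Again the input frame is an assumption (the True rows are invented), and the helper column `Run` and the names `is_false`, `run_id`, `run_size` are hypothetical, used here only to keep the run identifier separate from the final Group column.

```python
import pandas as pd

# Hypothetical sample data: the True rows are invented for illustration.
df = pd.DataFrame({
    'Time': ['10:01', '10:02', '10:03', '10:15', '10:16',
             '10:18', '10:20', '10:21', '10:22', '10:23'],
    'Host': ['HostA', 'HostB', 'HostA', 'HostA', 'HostA',
             'HostB', 'HostA', 'HostB', 'HostB', 'HostB'],
    'isCorrect': [True, True, False, False, False,
                  False, True, True, False, False],
})

is_false = ~df['isCorrect']

# Every True row increments the counter, so all False rows lying between
# two True rows share the same run id.
run_id = df['isCorrect'].cumsum()

failed = (df[is_false]
          .assign(Run=run_id[is_false])
          .sort_values(['Host', 'Time']))

# Size of each (run, host) block; keep blocks with at least 3 rows.
run_size = failed.groupby(['Run', 'Host'])['Run'].transform('size')
result = failed[run_size >= 3].drop(columns='Run')

# Number the surviving hosts from 1.
result['Group'] = pd.factorize(result['Host'])[0] + 1
print(result)
```

With this assumed data, HostB drops out because its False rows are split across two runs of one and two rows. Note that the last step numbers by Host, so two surviving runs of the same host would share one Group value.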