Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
153 views
in Technique[技术] by (71.8m points)

python - Pandas groupby by the same value in different columns

I have a dataframe as this:

                 HOME_TEAM     AWAY_TEAM      BOOL
1153     Manchester United  Swansea City      True             
1163        Leicester City       Everton     False            
1172               Everton     Hull City      True        
1183            Stoke City       Everton      True         
1193  West Bromwich Albion    Sunderland     False 

    

I want groups for each team that appears in HOME_TEAM or AWAY_TEAM. For example, for Everton, I would like something similar as a result:

                 HOME_TEAM     AWAY_TEAM             BOOL            
1163        Leicester City       Everton            False            
1172               Everton     Hull City             True        
1183            Stoke City       Everton             True      

then I have to count the consecutive True of False, but this is not a problem. The problem is groups the matches in this way.

I know I can simply use

(df.HOME_TEAM == 'Everton') | (df.AWAY_TEAM == 'Everton)

but this way I should use a for loop for each team in my dataframe and it is too slow for my big dataframe.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use the following:

>>> (df.filter(like='TEAM').stack()
       .reset_index(level=1, drop=True).to_frame('teams')
       .join(df).set_index('teams', append=True
     ).swaplevel().sort_index()
                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

Or for groupby:

>>> group = (df.filter(like='TEAM').stack()
           .reset_index(level=1, drop=True).to_frame('teams')
           .join(df).groupby('teams'))
>>> group.get_group('Everton')
        teams       HOME_TEAM  AWAY_TEAM   BOOL
1163  Everton  Leicester-City    Everton  False
1172  Everton         Everton  Hull-City   True
1183  Everton      Stoke-City    Everton   True

HOW IT WORKS

>>> df.filter(like='TEAM')
 
                 HOME_TEAM     AWAY_TEAM
1153     Manchester-United  Swansea-City
1163        Leicester-City       Everton
1172               Everton     Hull-City
1183            Stoke-City       Everton
1193  West-Bromwich-Albion    Sunderland
>>> _.stack()
1153  HOME_TEAM       Manchester-United
      AWAY_TEAM            Swansea-City
1163  HOME_TEAM          Leicester-City
      AWAY_TEAM                 Everton
1172  HOME_TEAM                 Everton
      AWAY_TEAM               Hull-City
1183  HOME_TEAM              Stoke-City
      AWAY_TEAM                 Everton
1193  HOME_TEAM    West-Bromwich-Albion
      AWAY_TEAM              Sunderland
>>> _.reset_index(level=1, drop=True)
 
1153       Manchester-United
1153            Swansea-City
1163          Leicester-City
1163                 Everton
1172                 Everton
1172               Hull-City
1183              Stoke-City
1183                 Everton
1193    West-Bromwich-Albion
1193              Sunderland

>>> _.to_frame('teams') 
                     teams
1153     Manchester-United
1153          Swansea-City
1163        Leicester-City
1163               Everton
1172               Everton
1172             Hull-City
1183            Stoke-City
1183               Everton
1193  West-Bromwich-Albion
1193            Sunderland

>>> _.join(df)
                     teams             HOME_TEAM     AWAY_TEAM   BOOL
1153     Manchester-United     Manchester-United  Swansea-City   True
1153          Swansea-City     Manchester-United  Swansea-City   True
1163        Leicester-City        Leicester-City       Everton  False
1163               Everton        Leicester-City       Everton  False
1172               Everton               Everton     Hull-City   True
1172             Hull-City               Everton     Hull-City   True
1183            Stoke-City            Stoke-City       Everton   True
1183               Everton            Stoke-City       Everton   True
1193  West-Bromwich-Albion  West-Bromwich-Albion    Sunderland  False
1193            Sunderland  West-Bromwich-Albion    Sunderland  False

>>> _.set_index('teams', append=True)
                                      HOME_TEAM     AWAY_TEAM   BOOL
     teams                                                          
1153 Manchester-United        Manchester-United  Swansea-City   True
     Swansea-City             Manchester-United  Swansea-City   True
1163 Leicester-City              Leicester-City       Everton  False
     Everton                     Leicester-City       Everton  False
1172 Everton                            Everton     Hull-City   True
     Hull-City                          Everton     Hull-City   True
1183 Stoke-City                      Stoke-City       Everton   True
     Everton                         Stoke-City       Everton   True
1193 West-Bromwich-Albion  West-Bromwich-Albion    Sunderland  False
     Sunderland            West-Bromwich-Albion    Sunderland  False
>>> _.swaplevel()

                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Manchester-United    1153     Manchester-United  Swansea-City   True
Swansea-City         1153     Manchester-United  Swansea-City   True
Leicester-City       1163        Leicester-City       Everton  False
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
Hull-City            1172               Everton     Hull-City   True
Stoke-City           1183            Stoke-City       Everton   True
Everton              1183            Stoke-City       Everton   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
>>> _.sort_index()
                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

ALTERNATIVELY

>>> df.append([df]).set_index(
        df.filter(like='TEAM').melt().value, 
        drop=False, append=True).swaplevel().sort_index()

                                      HOME_TEAM     AWAY_TEAM   BOOL
value                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

>>> df.append([df]).set_index(df.filter(like='TEAM').melt().value, 
       drop=False, append=True).swaplevel().groupby(level=0).get_group('Everton') 
                   HOME_TEAM  AWAY_TEAM   BOOL
value                                         
Everton 1163  Leicester-City    Everton  False
        1172         Everton  Hull-City   True
        1183      Stoke-City    Everton   True

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...