Python - merge dataframes to replace missing values with the actual ones

Question

posted Feb 6, 2021 in Technique by 深蓝 (71.8m points)

I have a list of 20 dataframes with sports results with this structure:

Key1	Key2	Variable1	Variable2
TeamA	TeamB	20	Nan
TeamC	TeamA	Nan	25
TeamA	TeamD	17	Nan

1 Reply

深蓝 · Answer 1 · 2021-02-06T00:21:14+0000

You can try it like this. The first step only recreates your data files: I saved them as .csv and then read them back in. Note that I pass na_values='Nan' when reading, because the spelling 'Nan' is not one of the missing-value markers pandas recognizes by default:

import os

import pandas as pd

# Folder containing the CSV files
path = r'C:...'
df1_fl = r'2020-12-31_df1.csv'
df2_fl = r'2020-12-31_df2.csv'
df3_fl = r'2020-12-31_df3.csv'

# sep=';' matches how the files were saved; na_values='Nan' marks 'Nan' entries as missing
df1 = pd.read_csv(os.path.join(path, df1_fl), sep=';', na_values='Nan')
df2 = pd.read_csv(os.path.join(path, df2_fl), sep=';', na_values='Nan')
df3 = pd.read_csv(os.path.join(path, df3_fl), sep=';', na_values='Nan')
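
With 20 dataframes, reading each file by hand gets repetitive. The following is a minimal sketch of reading them in a loop, under stated assumptions: the file contents, the temporary folder, and the glob pattern `*_df*.csv` are hypothetical stand-ins for illustration, not the real data or folder layout.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files standing in for the real exports (invented values).
sample_files = {
    "2020-12-31_df1.csv": "Key1;Key2;Variable1;Variable2\nTeamA;TeamB;20;Nan\nTeamC;TeamA;Nan;25\n",
    "2020-12-31_df2.csv": "Key1;Key2;Variable1;Variable2\nTeamA;TeamB;Nan;18\nTeamC;TeamA;30;Nan\n",
}

with tempfile.TemporaryDirectory() as path:
    # Write the sample files so the example is self-contained.
    for name, text in sample_files.items():
        with open(os.path.join(path, name), "w") as fh:
            fh.write(text)

    # Collect all matching files and read them with the same options as above.
    files = sorted(glob.glob(os.path.join(path, "*_df*.csv")))
    frames = [pd.read_csv(f, sep=";", na_values="Nan") for f in files]

print(len(frames))
print(frames[0])
```

The resulting list `frames` can be passed directly to pd.concat, so the later steps do not depend on how many files there are.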

Then I concatenate all the data into one dataframe and replace the missing values with zero:

df = pd.concat([df1, df2, df3]).fillna(0)

Then the interesting part starts: group the data by the columns 'Key1' and 'Key2' and take the maximum within each group. Because the missing values are now zeros, the maximum picks the actual value wherever one exists (this assumes the results are never negative). Grouping moves 'Key1' and 'Key2' into a multi-index, so reset_index turns them back into two ordinary columns, as in the original dataframes.

df_agg = df.groupby(by=['Key1', 'Key2']).max().reset_index()
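
To see the effect on concrete numbers, here is a small end-to-end sketch with in-memory data. The first frame uses the values from the question; the second frame `df2` is hypothetical, invented only to supply the missing values. The sketch also shows a possible variant, not part of the approach above: groupby(...).first() takes the first non-missing value per group, so it needs no fillna(0) and leaves a pair as NaN when no frame has a value for it.

```python
import numpy as np
import pandas as pd

# Values from the question.
df1 = pd.DataFrame({
    "Key1": ["TeamA", "TeamC", "TeamA"],
    "Key2": ["TeamB", "TeamA", "TeamD"],
    "Variable1": [20, np.nan, 17],
    "Variable2": [np.nan, 25, np.nan],
})

# Hypothetical second frame (invented values) that fills some of the gaps.
df2 = pd.DataFrame({
    "Key1": ["TeamA", "TeamC", "TeamA"],
    "Key2": ["TeamB", "TeamA", "TeamD"],
    "Variable1": [np.nan, 30, np.nan],
    "Variable2": [18, np.nan, np.nan],
})

df = pd.concat([df1, df2], ignore_index=True)

# Approach from the answer: zeros for missing values, then the group maximum.
df_max = df.fillna(0).groupby(by=["Key1", "Key2"]).max().reset_index()

# Variant: first non-missing value per group, missing stays missing.
df_first = df.groupby(by=["Key1", "Key2"]).first().reset_index()

print(df_max)
print(df_first)
```

The two results differ only for TeamA/TeamD, where no frame supplies Variable2: the max approach reports 0, while the first() variant keeps NaN. Which one is preferable depends on whether a real score of 0 must stay distinguishable from a missing result.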
