Python Pandas - Dataset with many columns - want to iterate over each column, add row values to new list only from fields that are not null

Question

posted Oct 6, 2021 in Technique by 深蓝 (71.8m points)

I have a dataset of website logs that I am inheriting, and it adds a new pair of columns for each page visited. For example, if someone went to 2 pages on our website we'd have something like: visit_id, url_1, visit_datetime_1, url_2, visit_datetime_2. The problem is that some people visit just one page, and some visit 14. I want to simplify this. See below for my current format and desired output. I just don't understand how to go through each column when the number of fields is not always consistent (the column names WILL be consistent, though: visit_id is a unique identifier, followed by url_x and visit_datetime_x). I'm stumped.

To be clear about the example below: visit_id 1000 visited 3 pages, 2000 visited 1 page, and 3000 visited 2 pages.
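A hypothetical frame with this wide layout can be built in memory to experiment with. The URLs and timestamps below are made up for illustration; only the page counts (3, 1 and 2) follow the question. The sketch also shows that the "inconsistent" number of fields is really just empty cells: counting the non-null url_N cells in each row recovers how many pages each visit reached.

```python
import pandas as pd

# Hypothetical sample in the question's wide layout. URLs and timestamps are
# made up; only the page counts (1000 -> 3, 2000 -> 1, 3000 -> 2) follow the question.
wide = pd.DataFrame({
    "visit_id": [1000, 2000, 3000],
    "url_1": ["/home", "/pricing", "/blog"],
    "visit_datetime_1": pd.to_datetime(
        ["2021-02-01 10:00", "2021-02-01 11:00", "2021-02-01 12:00"]),
    "url_2": ["/about", None, "/blog/post"],
    "visit_datetime_2": pd.to_datetime(
        ["2021-02-01 10:05", None, "2021-02-01 12:03"]),
    "url_3": ["/contact", None, None],
    "visit_datetime_3": pd.to_datetime(["2021-02-01 10:09", None, None]),
})

# The sheet is as wide as the longest visit; shorter visits leave empty
# (NaN/NaT) cells. Counting non-null url_N cells per row gives pages per visit.
url_cols = [c for c in wide.columns if c.startswith("url_")]
pages_per_visit = wide.set_index("visit_id")[url_cols].notna().sum(axis=1).to_dict()
print(pages_per_visit)
```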

I've never done anything like this in pandas before and I'm at a roadblock. Here is what I have so far; it isn't much, but at least it shows I'm trying. All help is appreciated.


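import pandas as pd

visit_ids = []
urls = []
visit_datetimes = []

# read_excel already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
df = pd.read_excel('data.xlsx', engine='openpyxl')

# iteritems() was removed in pandas 2.0; items() yields (column name, Series) pairs
for colname, values in df.items():
    # do something to add to list
    pass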
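To stay close to the loop idea above, one option is to loop over page numbers rather than over raw columns, and append only when the url_N cell is not null. This is a minimal sketch, assuming the columns are named exactly visit_id, url_N and visit_datetime_N; the helper name collect_visits is hypothetical.

```python
import re

import pandas as pd


def collect_visits(df: pd.DataFrame):
    """Hypothetical helper: flatten url_N / visit_datetime_N pairs into three
    parallel lists, skipping pages whose url_N cell is null."""
    visit_ids, urls, visit_datetimes = [], [], []
    # Page numbers present in this sheet (e.g. 1..14), in numeric order
    pages = sorted(int(str(c).split("_")[1]) for c in df.columns
                   if re.fullmatch(r"url_\d+", str(c)))
    for _, row in df.iterrows():
        for n in pages:
            url = row[f"url_{n}"]
            if pd.notna(url):  # keep only pages this visit actually reached
                visit_ids.append(row["visit_id"])
                urls.append(url)
                visit_datetimes.append(row[f"visit_datetime_{n}"])
    return visit_ids, urls, visit_datetimes
```

Usage would be visit_ids, urls, visit_datetimes = collect_visits(df). Because iterrows visits one row at a time, this gets slow on large sheets; the reshape in the answer below does the same job in vectorized form.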

Question from: https://stackoverflow.com/questions/66045484/python-pandas-dataset-with-many-columns-want-to-iterate-over-each-column-ad

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:16:33+0000

You can split each column name at its last _ (so url_2 becomes url and 2, and visit_datetime_2 becomes visit_datetime and 2) into a MultiIndex, then reshape with DataFrame.stack:

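import pandas as pd

df = pd.read_excel('data.xlsx', engine='openpyxl')

# keep visit_id out of the split by moving it to the index
df1 = df.set_index('visit_id')
# 'url_2' -> ('url', '2'), 'visit_datetime_2' -> ('visit_datetime', '2')
df1.columns = df1.columns.str.rsplit('_', n=1, expand=True)

# move the page-number level into rows; drop pages a visit never reached
df1 = df1.stack().dropna(how='all').reset_index()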
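Below is a self-contained run of this split-and-stack reshape on a small hypothetical frame standing in for data.xlsx (made-up URLs and timestamps). It is a sketch, and it includes three touches beyond the core idea: an explicit dropna(how="all") as a safeguard, because depending on the pandas version stack may keep the all-empty rows for pages a visit never reached; renaming the unnamed index level, which reset_index calls level_1, to page (the name page is an arbitrary choice); and converting page to int so that page 10 sorts after page 9 instead of after page 1.

```python
import pandas as pd

# Hypothetical stand-in for data.xlsx: made-up URLs/timestamps,
# 3, 1 and 2 pages for visits 1000, 2000 and 3000.
df = pd.DataFrame({
    "visit_id": [1000, 2000, 3000],
    "url_1": ["/home", "/pricing", "/blog"],
    "visit_datetime_1": pd.to_datetime(
        ["2021-02-01 10:00", "2021-02-01 11:00", "2021-02-01 12:00"]),
    "url_2": ["/about", None, "/blog/post"],
    "visit_datetime_2": pd.to_datetime(
        ["2021-02-01 10:05", None, "2021-02-01 12:03"]),
    "url_3": ["/contact", None, None],
    "visit_datetime_3": pd.to_datetime(["2021-02-01 10:09", None, None]),
})

df1 = df.set_index("visit_id")
# Split only at the last '_' so 'visit_datetime_2' -> ('visit_datetime', '2')
df1.columns = df1.columns.str.rsplit("_", n=1, expand=True)

tidy = (
    df1.stack()                              # page level moves from columns to rows
       .dropna(how="all")                    # drop pages a visit never reached
       .reset_index()
       .rename(columns={"level_1": "page"})  # unnamed level comes out as 'level_1'
)
# Page numbers are strings after the split; make them ints so 10 sorts after 9
tidy["page"] = tidy["page"].astype(int)
tidy = tidy.sort_values(["visit_id", "page"], ignore_index=True)
print(tidy)
```

The result has one row per page actually visited, with columns visit_id, page, url and visit_datetime. On pandas 2.1 and 2.2, calling stack() without future_stack=True may emit a FutureWarning about the new stack implementation; the explicit dropna keeps the output the same under either implementation.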
