I have inherited a dataset of website logs that adds a new pair of columns for every page visited. For example, if someone went to 2 pages on our website we'd have something like: visit_id, url_1, visit_datetime_1, url_2, visit_datetime_2. The problem is that some people visit just one page, and some visit 14. I want to simplify this. See below for my current format and desired output. I guess I just don't understand how to go through each column when the number of fields is not always consistent (but the column names WILL be consistent: visit_id is a unique identifier, then url_x and visit_datetime_x). I'm stumped.
Just to be clear below, visit_id 1000 visited 3 pages, 2000 visited 1 page, and 3000 visited 2 pages.
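To make the layout concrete, here is a minimal sketch of a frame in the shape described above. The URLs and timestamps are made up for illustration; only the column naming (visit_id, url_x, visit_datetime_x) and the page counts per visit_id come from the description.

```python
import pandas as pd

# Hypothetical sample data: URLs and timestamps are invented for illustration.
wide = pd.DataFrame({
    "visit_id": [1000, 2000, 3000],
    "url_1": ["/home", "/home", "/pricing"],
    "visit_datetime_1": ["2021-02-01 09:00", "2021-02-01 10:15", "2021-02-02 08:30"],
    "url_2": ["/pricing", None, "/contact"],
    "visit_datetime_2": ["2021-02-01 09:02", None, "2021-02-02 08:31"],
    "url_3": ["/contact", None, None],
    "visit_datetime_3": ["2021-02-01 09:05", None, None],
})

# Find the url_N columns by name, so the code does not depend on how many there are.
url_cols = [c for c in wide.columns if c.startswith("url_")]

# Count the non-empty url_N cells in each row: that is the number of pages visited.
pages_per_visit = wide.set_index("visit_id")[url_cols].notna().sum(axis=1)
print(pages_per_visit.to_dict())
```

Selecting columns by their name prefix means the same code should work whether the widest visit has 3 pages or 14.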
I've just never done anything like this before in Pandas and I'm just at a roadblock. I've gotten this far, which isn't far, but at least shows I'm trying. All help is appreciated.
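import pandas as pd

visit_ids = []
urls = []
visit_datetimes = []
# read_excel already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_excel('data.xlsx', engine='openpyxl')
# items() replaces iteritems(), which was removed in pandas 2.0
for colname, column in df.items():
    pass  # do something to add to list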
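One possible direction, offered as a sketch and not a definitive answer: the numbered column pairs may not need a manual loop at all, because `pd.wide_to_long` can stack them into one row per page. The sample data below is hypothetical (made-up URLs and timestamps), and the `page` column name is an assumption introduced here for the numeric suffix.

```python
import pandas as pd

# Hypothetical sample in the wide layout; values are invented for illustration.
wide = pd.DataFrame({
    "visit_id": [1000, 2000, 3000],
    "url_1": ["/home", "/home", "/pricing"],
    "visit_datetime_1": ["2021-02-01 09:00", "2021-02-01 10:15", "2021-02-02 08:30"],
    "url_2": ["/pricing", None, "/contact"],
    "visit_datetime_2": ["2021-02-01 09:02", None, "2021-02-02 08:31"],
    "url_3": ["/contact", None, None],
    "visit_datetime_3": ["2021-02-01 09:05", None, None],
})

long = (
    # Stack url_N / visit_datetime_N pairs; the numeric suffix N becomes the "page" column.
    pd.wide_to_long(
        wide,
        stubnames=["url", "visit_datetime"],
        i="visit_id",   # must uniquely identify each row
        j="page",       # assumed name for the suffix column
        sep="_",
    )
    .dropna(subset=["url"])  # drop the empty slots of shorter visits
    .sort_index()            # order by visit_id, then page
    .reset_index()
)
print(long)
```

With the real file, `wide` would be the frame returned by `pd.read_excel`. This assumes every url_N column has a matching visit_datetime_N column and that visit_id is unique per row.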
question from:
https://stackoverflow.com/questions/66045484/python-pandas-dataset-with-many-columns-want-to-iterate-over-each-column-ad