Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - put only elements into a list with a certain number

s_id is the sales ID and i_id is the ID of the sold item. For each unique i_id, I want to find all purchases that include that item. I have implemented this in the loop below. However, I only want to add a purchase to the list when its s_id has more than one item.

How can I add a purchase to the list only if it contains more than one item?

import pandas as pd

d = {'s_id': [1, 2, 2, 2, 3, 4, 4, 4, 5, 5],
     'i_id': [1, 1, 2, 3, 1, 4, 1, 2, 3, 5]}
df = pd.DataFrame(data=d)

print(df)


numbers_i = df.i_id.unique().tolist()

for i in numbers_i:
    # all purchases (s_id) that contain item i
    buyers = df[df.i_id.eq(i)].s_id.unique()
    df_new = df[df.s_id.isin(buyers)]
    # one list of items per purchase
    list_new = df_new.groupby("s_id")['i_id'].apply(list).tolist()
    print(list_new)

Output

[[1], [1, 2, 3], [1], [4, 1, 2]]
[[1, 2, 3], [4, 1, 2]]
[[1, 2, 3], [3, 5]]
[[4, 1, 2]]
[[3, 5]]

But what I want is:

[[REMOVED], [1, 2, 3], [REMOVED], [4, 1, 2]] 
[[1, 2, 3], [4, 1, 2]]
[[1, 2, 3], [3, 5]]
[[4, 1, 2]]
[[3, 5]]

[REMOVED] means that the element should not exist in the output; I only wrote it here to make the difference easier to see.

question from:https://stackoverflow.com/questions/65642545/put-only-elements-into-a-list-with-a-certian-number


1 Reply


A slightly different approach uses two groupbys: one to collect the items in each s_id, and another to group the purchases that occurred together with each i_id.

First, get the mapping from each s_id to its list of i_ids:

map_dict = df.groupby('s_id')['i_id'].apply(list).to_dict()

map_dict
{1: [1], 2: [1, 2, 3], 3: [1], 4: [4, 1, 2], 5: [3, 5]}

Then group by i_id and, for each group, build a list of the item lists, keeping only those with more than one item:

def func(group):
    # keep only the purchases (item lists) that contain more than one item
    return [items for items in group['s_id'].map(map_dict) if len(items) > 1]

df.groupby('i_id').apply(func)

i_id
1    [[1, 2, 3], [4, 1, 2]]
2    [[1, 2, 3], [4, 1, 2]]
3    [[1, 2, 3], [3, 5]]   
4    [[4, 1, 2]]           
5    [[3, 5]]              
dtype: object
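
For comparison, the loop from the question can also be kept almost unchanged by dropping single-item purchases before iterating. This is only a minimal sketch on the same sample data; the names multi and result are my own and do not appear in the question.

```python
import pandas as pd

d = {'s_id': [1, 2, 2, 2, 3, 4, 4, 4, 5, 5],
     'i_id': [1, 1, 2, 3, 1, 4, 1, 2, 3, 5]}
df = pd.DataFrame(data=d)

# Keep only the rows whose purchase (s_id) contains more than one item.
multi = df[df.groupby('s_id')['i_id'].transform('size') > 1]

result = {}  # hypothetical container: i_id -> list of item lists
for i in multi['i_id'].unique().tolist():
    # purchases that contain item i
    buyers = multi.loc[multi['i_id'].eq(i), 's_id'].unique()
    df_new = multi[multi['s_id'].isin(buyers)]
    # one list of items per remaining purchase
    result[i] = df_new.groupby('s_id')['i_id'].apply(list).tolist()
    print(result[i])
```

One difference to be aware of: an item that was only ever sold in single-item purchases would not appear in result at all, whereas the groupby approach above would return an empty list for it.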

Comparing the timings, this approach (6.71 ms ± 91 μs per loop; mean ± std. dev. of 7 runs, 100 loops each) seems to be faster than the approach in the question (12.3 ms ± 201 μs per loop; mean ± std. dev. of 7 runs, 100 loops each).
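
If you want to check the timing on your own machine, something like the following sketch could be used. The function names loop_approach and groupby_approach are hypothetical. The loop version here has the "more than one item" filter added so that both functions return the same result, which is an assumption on my part and not exactly what was timed above. Absolute numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

d = {'s_id': [1, 2, 2, 2, 3, 4, 4, 4, 5, 5],
     'i_id': [1, 1, 2, 3, 1, 4, 1, 2, 3, 5]}
df = pd.DataFrame(data=d)


def loop_approach(df):
    """Loop from the question, with a len > 1 filter added (assumption)."""
    out = []
    for i in df['i_id'].unique().tolist():
        buyers = df.loc[df['i_id'].eq(i), 's_id'].unique()
        df_new = df[df['s_id'].isin(buyers)]
        lists = df_new.groupby('s_id')['i_id'].apply(list).tolist()
        out.append([items for items in lists if len(items) > 1])
    return out


def groupby_approach(df):
    """Two-groupby approach: map s_id -> item list, then group by i_id."""
    map_dict = df.groupby('s_id')['i_id'].apply(list).to_dict()
    grouped = df.groupby('i_id')['s_id'].apply(
        lambda s: [items for items in s.map(map_dict) if len(items) > 1]
    )
    return grouped.tolist()


# Both functions should agree on this sample data before timing them.
assert loop_approach(df) == groupby_approach(df)

n = 100  # number of repetitions per measurement
t_loop = timeit.timeit(lambda: loop_approach(df), number=n) / n
t_group = timeit.timeit(lambda: groupby_approach(df), number=n) / n
print(f"loop:    {t_loop * 1e3:.2f} ms per call")
print(f"groupby: {t_group * 1e3:.2f} ms per call")
```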

