Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
420 views
in Technique by (71.8m points)

cluster analysis - Is there a Python package that can find the most impactful group (categorical features) in my data?

My problem is that I have a dataset of our campaign like this:

| Customer | Province | District | City | Age | No. of Order |
| -------- | -------  | -------- | -----| ----| -------      |
| A        | P1       | D1       | C1   | 21  | 5            |
| B        | P2       | D2       | C2   | 22  | 9            |
....

I need to find the most impactful group of customers (usually there will be >20 categorical groups). For example: "Customers from Province P1, District D1, aged 25 are the most promising group because they contributed 50% of total orders while making up only 10% of our customer base."

I'm currently using Pandas to loop through every combination of 2, 3, or 4 of my categorical features and calculate the sales proportion for each group, but this is very time-consuming.
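For reference, the brute-force approach described above might look roughly like the sketch below. It is only an illustration under assumptions: the helper name `rank_groups`, the "lift" column (order share divided by customer share), and the toy data are all made up here, and only the column names come from the table above.

```python
from itertools import combinations

import pandas as pd


def rank_groups(df, features, target, sizes=(2, 3, 4)):
    """Score every value combination of `sizes` features (hypothetical helper)."""
    total_orders = df[target].sum()
    total_customers = len(df)
    frames = []
    for k in sizes:
        for cols in combinations(features, k):
            # One row per distinct combination of values in `cols`.
            # "count" assumes the target column has no missing values.
            stats = (
                df.groupby(list(cols))[target]
                .agg(orders="sum", customers="count")
                .reset_index()
            )
            # Readable label such as "Province=P1, District=D1".
            stats["group"] = stats[list(cols)].astype(str).apply(
                lambda row: ", ".join(f"{c}={v}" for c, v in row.items()), axis=1
            )
            frames.append(stats[["group", "orders", "customers"]])
    result = pd.concat(frames, ignore_index=True)
    result["order_share"] = result["orders"] / total_orders
    result["customer_share"] = result["customers"] / total_customers
    # Lift > 1: the group orders more than its size alone would suggest.
    result["lift"] = result["order_share"] / result["customer_share"]
    return result.sort_values("lift", ascending=False, ignore_index=True)


# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Customer": list("ABCDEF"),
    "Province": ["P1", "P1", "P2", "P2", "P2", "P2"],
    "District": ["D1", "D1", "D2", "D2", "D3", "D3"],
    "City": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "No. of Order": [20, 20, 2, 3, 2, 3],
})
ranked = rank_groups(df, ["Province", "District", "City"], "No. of Order", sizes=(2, 3))
print(ranked.head())
```

The number of feature combinations grows quickly with more than 20 features, which is why this loop becomes slow.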

Is there an existing Python package that can help find this kind of group?

question from: https://stackoverflow.com/questions/65839351/is-there-a-python-package-that-can-find-the-most-impactful-group-categorical-fe


1 Reply

0 votes
by (71.8m points)

You can automate that by using Decision Trees.

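Not all features may be useful. You can reduce the feature set first using PCA (principal component analysis). Note that PCA works on numeric input and produces combinations of features rather than dropping individual ones, so categorical columns need to be encoded beforehand.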

You can use the scikit-learn package for both of the above.
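A minimal sketch of the decision-tree idea, assuming scikit-learn and pandas are installed. The toy data is invented, the choice of `DecisionTreeRegressor` with `max_depth=3` and `min_samples_leaf=2` is only one possible setup, and treating each leaf as a candidate customer group is an interpretation rather than something the answer spells out.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data, made up for illustration; column names follow the question's table.
df = pd.DataFrame({
    "Province": ["P1", "P1", "P1", "P2", "P2", "P2", "P2", "P2"],
    "District": ["D1", "D1", "D2", "D2", "D3", "D3", "D3", "D3"],
    "No. of Order": [20, 22, 3, 2, 3, 2, 3, 2],
})
target = "No. of Order"

# scikit-learn trees need numeric input, so one-hot encode the categoricals.
X = pd.get_dummies(df.drop(columns=[target]), dtype=int)
y = df[target]

# Shallow tree with a minimum leaf size so groups stay readable and non-trivial.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2, random_state=0)
tree.fit(X, y)

# The path to each leaf is the definition of one candidate group.
print(export_text(tree, feature_names=list(X.columns)))

# Summarise leaves: group size, mean and total orders, highest mean first.
leaf_ids = tree.apply(X)
leaves = (
    df.assign(leaf=leaf_ids)
    .groupby("leaf")[target]
    .agg(customers="count", mean_orders="mean", total_orders="sum")
    .sort_values("mean_orders", ascending=False)
)
print(leaves)
```

The tree only searches splits greedily, so it is much faster than enumerating every combination, but it may miss groups that an exhaustive search would find.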



...