Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
750 views
in Technique[技术] by (71.8m points)

python - How to find a columns set for a primary key candidate in CSV file?

I have a CSV file (not normalized, example, real file up to 100 columns):

   ID, CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
    1,     CUST1,     CLIENT1,          10, 2018-04-01, 2018-04-02
    2,     CUST1,     CLIENT1,          10, 2018-04-01, 2018-05-30
    3,     CUST1,     CLIENT1,         101, 2018-04-02, 2018-04-03
    4,     CUST2,     CLIENT1,         102, 2018-04-02, 2018-04-03

How can I find ALL possible sets of columns which could be used as Primary key.

Desired output:

  1) ID
  2) PAYMENT_NUM,START_DATE,END_DATE
  3) CUST_NAME, CLIENT_NAME, PAYMENT_NUM,START_DATE,END_DATE

I could do it in Java but may be Python/Pandas already provides a quick solution

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

pandas and itertools will give you what you're looking for.

import pandas
from itertools import chain, combinations

def key_options(items):
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items)+1) )

df = pandas.read_csv('test.csv');

# iterate over all combos of headings, excluding ID for brevity
for candidate in key_options(list(df)[1:]):
    deduped = df.drop_duplicates(candidate)

    if len(deduped.index) == len(df.index):
        print ','.join(candidate)

This gives you the output:

PAYMENT_NUM, END_DATE
CUST_NAME, CLIENT_NAME, END_DATE
CUST_NAME, PAYMENT_NUM, END_DATE
CLIENT_NAME, PAYMENT_NUM, END_DATE
PAYMENT_NUM, START_DATE, END_DATE
CUST_NAME, CLIENT_NAME, PAYMENT_NUM, END_DATE
CUST_NAME, CLIENT_NAME, START_DATE, END_DATE
CUST_NAME, PAYMENT_NUM, START_DATE, END_DATE
CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...