Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
888 views
in Technique[技术] by (71.8m points)

excel - get a number of unique values without separating values that belong to the same block of values

I'm OK with either a PL/SQL solution or an Access VBA/Excel VBA (though Access VBA is preferred over Excel VBA) one. so, PL/SQL is the first choice, Access VBA is second and Excel VBA is third.

This is a very tough problem to explain. Please ask any questions and i will do my best to answer them clearly.

I have the following dataset in a table called NR_PVO_120. How do i pick out a number (which can change but let's say, 6) of UNIQUE OtherIDs without excluding any OtherIDs under any fax numbers?

So, if you pick OtherID from Row7 you then also must pick OtherIDs from rows 8 and 9 because they have the same fax number. Basically, once you pick an OtherID you're then obligated to pick all OtherIDs that have the same fax number as the one you picked.

If the number requested (6 for this example) isn't possible then "the closest number possible but not exceeding" would be the rule.

For example, if you take OtherIDs from rows 1-10 you will get 6 unique OtherIDs but row 10 shares a fax with rows 11 and 12. You either need to take all 3 (but that will raise the unique count to 8, which isn't acceptable) or skip this OtherID and find one with a fax that will add 1 unique OtherID (for example, it can have 4 OtherIDs but 3 of them exist on the result set and therefore don't add to unique counts). My result of 6 UNIQUE OtherIDs will need to contain ALL OtherIDs under any fax the existing OtherIDs are connected to.

So one solution is to take rows 1-6, 26. Another is to take rows 1-4,10-14. There are more but you get the idea.

There will be many possibilities (the real dataset has tens of thousands of rows and the number of people requested will be around 10K), as long all OtherIDs connected to all faxes on the result set are part of the requested number (6 in this case) any combination would do.

A few notes.

  1. Getting as close as possible to the requested number is a requirement.

  2. Some OtherIDs will have a blank fax, they should only be included as a last resort (not enough OtherIDs for the requested number).

How is this done?

Row      OtherID        Fax
1       11098554    2063504752
2       56200936    2080906666
3       11098554    7182160901
4       25138850    7182160901
5       56148974    7182232046
6       56530104    7182234134
7       25138850    7182234166
8       56148974    7182234166
9       11098554    7182234166
10      56597717    7182248132
11      56166294    7182248132
12      25138850    7182248132
13      56148974    7182390090
14      56226456    7182390090
15      56148974    7182395285
16      25138850    7182395285
17      56166614    7180930966
18      11098554    7180930966
19      56159509    7180930966
20      25138850    7185462234
21      56148974    7185462234
22      25138850    7185465013
23      56024315    7185465013
24      56115247    7185465281
25      25138850    7185465281
26      56148975    7185466029

A few sample outputs

one solution is taking rows 1-6 and 26.

Row      OtherID        Fax
1       11098554    2063504752
2       56200936    2080906666
3       11098554    7182160901
4       25138850    7182160901
5       56148974    7182232046
6       56530104    7182234134
26      56148975    7185466029

Another solution is taking rows 1-4 and 10-14.

Row      OtherID        Fax
1       11098554    2063504752
2       56200936    2080906666
3       11098554    7182160901
4       25138850    7182160901
10      56597717    7182248132
11      56166294    7182248132
12      25138850    7182248132
13      56148974    7182390090
14      56226456    7182390090

There are many more.

I only need FAX as my output.

This is for a fax campaign, we need to make sure no fax number is faxed twice, that all people connected to that fax number are contacted under one fax sent.

So the idea is to take all OtherIDs under ANY fax you end up using.

EDIT here's how it's currently done, maybe this helps paint a picture

list is sorted by fax, they go down the list to a random point MAKING SURE THE LAST RECORD ENDS WITH THE SAME FAX. so in my example they'd stop at either row 1,2,4,5,6,9,12,14,16,19,21,23,25,26. they then see how many unique OtherIDs they have up until that point. if it's too many they go up some, see how many they have. if it's too little, they go down some, see how many they have. and they keep doing this until they get their unique number. the only requirement is to always include all OtherIDs under a fax.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is not a full answer, but I don't want to write a lot of queries in comments.
Your main goal is to send information to people, and to avoid the situation when one person receives fax twice. So you first you need a list of unique recipients, like this:

select distinct otherid
  from NR_PVO_120

If one person has two fax numbers, you need to decide, which one to choose:

select otherid, fax
  from (select otherid, fax, row_number() over (partition by otherid order by <choosing rule>) rn
          from NR_PVO_120)
 where rn = 1

(All of this you have in answers of previous question)
If you take this list of fax numbers, all of your recipients receive the fax, and only one fax for every person. But some fax numbers will not be used. You can easily find them:

select otherid, fax
  from (select otherid, fax, row_number() over (partition by otherid order by <choosing rule>) rn
          from NR_PVO_120)
 where rn > 1

If you send fax to any of this numbers, some of people get one fax twice.
English is not my native language, so I don't understand what you mean when say "without breaking up fax numbers". As I can see in your question, possibly you need to use order of fax numbers in your question as number priority (the higher number is situated in the table - the higher probability to use it). It seems like you can use following:

select otherid, fax
  from (select otherid, fax, row_number() over (partition by otherid order by row) rn
          from NR_PVO_120)
 where rn = 1

here row in order by clause is a Row from your example table.

UPD
P. S. About my last query: we have a table with certain order, and the order is important. We take rows of the table line by line. Take first row and put its otherid and fax to result table. Then take next row. If it contains another fax number and otherid, we take it, if otherid already in our result table, we skip it. Did you ask this algorithm?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...