Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
704 views
in Technique[技术] by (71.8m points)

postgresql - Postgres: Distinct but only for one column

I have a table on pgsql with names (having more than 1 mio. rows), but I have also many duplicates. I select 3 fields: id, name, metadata.

I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this is many steps to save some memory in my PHP script.

But how can I do that so it only gives me a list having no duplicates in names.

For example [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important and must be unique in the list everytime I do the select and it must be random.

I tried with GROUP BY name, bu then it expects me to have id and metadata in the GROUP BY as well or in a aggragate function, but I dont want to have them somehow filtered.

Anyone knows how to fetch many columns but do only a distinct on one column?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

To do a distinct on only one (or n) column(s):

select distinct on (name)
    name, col1, col2
from names

This will return any of the rows containing the name. If you want to control which of the rows will be returned you need to order:

select distinct on (name)
    name, col1, col2
from names
order by name, col1

Will return the first row when ordered by col1.

distinct on:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...