Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
447 views
in Technique[技术] by (71.8m points)

Find distinct values, not distinct counts in elasticsearch

Elasticsearch documentation suggests* that their piece of code

*documentation fixed

GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "distinct_colors": {
      "cardinality": {
        "field": "color"
      }
    }
  }
}

corresponds to sql query

SELECT DISTINCT(color) FROM cars

but it actually corresponds to

SELECT COUNT(DISTINCT(color)) FROM cars

I don't want to know how many distinct values I have but what are the distinct values. Anyone knows how to achieve that?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use a terms aggregation on the color field. And you need to pay attention to how that field you want to get distinct values on is analyzed, meaning you need to make sure you're not tokenizing it while indexing, otherwise every entry in the aggregation will be a different term that is part of the field content.

If you still want tokenization AND to use the terms aggregation you might want to look at not_analyzed type of indexing for that field, and maybe use multi fields.

Terms aggregation for cars:

GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "color",
        "size": 1000
      }
    }
  }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...