Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

elasticsearch - Extract record from multiple arrays based on a filter

I have documents in Elasticsearch with the following structure:

"_source": {
          "last_updated": "2017-10-25T18:33:51.434706",
          "country": "Italia",
          "price": [
            "€ 139",
            "€ 125",
            "€ 120",
            "€ 108"
          ],
          "max_occupancy": [
            2,
            2,
            1,
            1
          ],
          "type": [
            "Type 1",
            "Type 1 - (Tag)",
            "Type 2",
            "Type 2 (Tag)"
          ],
          "availability": [
            10,
            10,
            10,
            10
          ],
          "size": [
            "26 m2",
            "35 m2",
            "47 m2",
            "31 m2"
          ]
        }
      }

Basically, the detail records are split across 5 arrays (price, max_occupancy, type, availability, size), and the fields of one record share the same index position in all 5 arrays. I want to extract the record whose max_occupancy is greater than or equal to 2 (if there is no record with 2, take one with 3; if there is none with 3, take one with 4; and so on) and, among those, the one with the lowest price. The result should be placed into a new JSON object like the following:

{
          "last_updated": "2017-10-25T18:33:51.434706",
          "country": "Italia",
          "price": "€ 125",
          "max_occupancy": "2",
          "type": "Type 1 - (Tag)",
          "availability": 10,
          "size": "35 m2"
}  

The result should contain the extracted record (in this case the second element of every array) plus the general fields "last_updated" and "country".

Is it possible to extract such a result from Elasticsearch? What kind of query do I need to perform?

Could someone suggest the best approach?

1 Reply

My suggested approach: go nested, using the nested datatype.

Besides making queries easier, it is easier to read and understand the connections between values that are currently scattered across different arrays.

Yes, if you choose this approach you will have to edit your mapping and re-index all of your data.

What would the mapping look like? Something like this:

{
  "mappings": {
    "properties": {
      "last_updated": {
        "type": "date"
      },
      "country": {
        "type": "keyword"
      },
      "records": {
        "type": "nested",
        "properties": {
          "price": {
            "type": "keyword"
          },
          "max_occupancy": {
            "type": "long"
          },
          "type": {
            "type": "keyword"
          },
          "availability": {
            "type": "long"
          },
          "size": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

EDIT: The new document structure (containing nested documents):

{
  "last_updated": "2017-10-25T18:33:51.434706",
  "country": "Italia",
  "records": [
    {
      "price": "€ 139",
      "max_occupancy": 2,
      "type": "Type 1",
      "availability": 10,
      "size": "26 m2"
    },
    {
      "price": "€ 125",
      "max_occupancy": 2,
      "type": "Type 1 - (Tag)",
      "availability": 10,
      "size": "35 m2"
    },
    {
      "price": "€ 120",
      "max_occupancy": 1,
      "type": "Type 2",
      "availability": 10,
      "size": "47 m2"
    },
    {
      "price": "€ 108",
      "max_occupancy": 1,
      "type": "Type 2 (Tag)",
      "availability": 10,
      "size": "31 m2"
    }
  ]
}
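Converting an existing document from the parallel-array layout to this nested layout could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper name `to_nested` and the `ARRAY_FIELDS` list are hypothetical, and it assumes all five arrays in a document have the same length, as in the question.

```python
from typing import Any, Dict, List

# Field names taken from the question's document; adjust if yours differ.
ARRAY_FIELDS: List[str] = ["price", "max_occupancy", "type", "availability", "size"]


def to_nested(source: Dict[str, Any]) -> Dict[str, Any]:
    """Turn the parallel arrays into a list of per-record objects under "records"."""
    columns = [source[name] for name in ARRAY_FIELDS]
    # The arrays must line up index by index, otherwise records would be mixed up.
    if len({len(col) for col in columns}) != 1:
        raise ValueError("parallel arrays have different lengths")
    # zip(*columns) yields one tuple per record: (price, max_occupancy, type, ...).
    records = [dict(zip(ARRAY_FIELDS, row)) for row in zip(*columns)]
    # Keep every non-array field (last_updated, country, ...) at the top level.
    doc = {key: value for key, value in source.items() if key not in ARRAY_FIELDS}
    doc["records"] = records
    return doc
```

Each converted document would then be indexed again into an index created with the nested mapping.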

Now it is easier to query for any specific condition with a nested query and inner hits. For example:

{
  "_source": [
    "last_updated",
    "country"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "country": "Italia"
          }
        },
        {
          "nested": {
            "path": "records",
            "query": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "records.max_occupancy": {
                        "gte": 2
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {
              "sort": {
                "records.price": "asc"
              },
              "size": 1
            }
          }
        }
      ]
    }
  }
}

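Conditions are: country is Italia AND max_occupancy >= 2.

Inner hits: sort by price in ascending order and return only the first result. Note that price is stored as a string here, so the sort is lexicographic ("€ 99" would sort after "€ 108"); storing the price as a number would give a true numeric sort.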
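The query returns the general fields in `_source` and the selected record separately under the inner hits, so the flat object asked for in the question still has to be assembled on the client. A minimal Python sketch, assuming the usual response layout where inner hits appear under `inner_hits.<path>.hits.hits` (the function name `flatten_hit` is hypothetical):

```python
from typing import Any, Dict, Optional


def flatten_hit(hit: Dict[str, Any], path: str = "records") -> Optional[Dict[str, Any]]:
    """Merge a hit's top-level _source with its first inner hit, if there is one."""
    inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
    if not inner:
        return None  # no nested record matched for this document
    merged = dict(hit.get("_source", {}))  # last_updated, country
    merged.update(inner[0]["_source"])     # price, max_occupancy, type, ...
    return merged
```

Calling it on each entry of the response's `hits.hits` list would give one flat object per matching document.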

Hope you'll find it useful

