Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

mapreduce - "merge" view collation into useful output in CouchDB

When doing a "join" in CouchDB, you can use view collation to group related records together. For example, with two document types, customers and orders, a view can return a customer, then all the orders for that customer, then the next customer and its orders, and so on.
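
As a rough sketch of the kind of view this describes (an assumption, not code from the question), the map function below emits `[customer id, 1]` for a customer and `[customer id, 2]` for each of its orders, so that each customer sorts directly ahead of its own orders. The field names `type`, `for`, `name` and `desc`, the emitted values, and the `emit` stand-in are illustrative; inside CouchDB, `emit` is provided for you and sorting is done by the view engine.

```javascript
var rows = [];
// Stand-in for CouchDB's built-in emit(), so the sketch runs in plain Node.js.
function emit(key, value) { rows.push({ key: key, value: value }); }

// Assumed document shape: customers have type "customer"; orders have
// type "order" and a "for" field holding the customer's _id.
function map(doc) {
  if (doc.type === 'customer') {
    emit([doc._id, 1], doc.name);
  } else if (doc.type === 'order') {
    emit([doc['for'], 2], doc.desc);
  }
}

[
  { _id: 'order/1/1', type: 'order', 'for': 'customer/1', desc: 'Desc 11' },
  { _id: 'customer/1', type: 'customer', name: 'Customer 1' },
  { _id: 'order/1/2', type: 'order', 'for': 'customer/1', desc: 'Desc 12' }
].forEach(map);

// Simplified emulation of key ordering: by customer id, then by the 1/2 marker.
rows.sort(function (a, b) {
  if (a.key[0] !== b.key[0]) { return a.key[0] < b.key[0] ? -1 : 1; }
  return a.key[1] - b.key[1];
});

console.log(JSON.stringify(rows));
```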

The question is how to merge those rows, so that with 10 customers and 40 orders the output is still 10 rows instead of 50. Essentially, each customer row would carry the extra order information.

I believe a _list function or a reduce will solve this, but how exactly do you do it?


1 Reply

I second jhs's answer, but I think his "Option 2" is too dangerous. I learned that the hard way. You can use a reduce function for many nice things, like getting the last post of each user of a blog, but you cannot use it for anything that does not reduce the amount of data returned.

To support it with facts, I made this little script to generate 200 customers with 20 orders each.

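#!/bin/bash
# Generate 200 customers with 20 orders each as one {"docs": [...]} JSON payload.
echo '{"docs":['
sep=''  # empty before the first document, ',' afterwards (no trailing comma)
for i in $(seq 1 200); do
  id=customer/$i
  echo "$sep"'{"_id":"'$id'","type":"customer","name":"Customer '$i'"}'
  sep=','
  for o in $(seq 1 20); do
    echo ',{"type":"order","_id":"order/'$i'/'$o'","for":"'$id'","desc":"Desc '$i$o'"}'
  done
done
echo ']}'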

This is a very likely scenario, and it is enough to throw an Error: reduce_overflow_error.
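
To make the difference concrete, here is a hedged sketch, runnable in plain Node.js, that contrasts a hypothetical reduce that merely collects orders (not necessarily jhs's exact code) with one that actually reduces. The function names and sample values are made up for illustration.

```javascript
// Hypothetical anti-pattern: a "reduce" that only gathers its input into
// an array, so its output grows with the number of orders.
function collectOrders(keys, values, rereduce) {
  if (rereduce) {
    // values is an array of arrays from earlier reduce calls; flatten it.
    return Array.prototype.concat.apply([], values);
  }
  return values;
}

// A genuine reduction: the output is a single number.
function countOrders(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
}

// Size in characters of the JSON a reduce function returns for n sample orders.
function outputSize(reduceFn, n) {
  var values = [];
  for (var o = 1; o <= n; o++) {
    values.push('Desc 1' + o);
  }
  return JSON.stringify(reduceFn(null, values, false)).length;
}

console.log('collect:', outputSize(collectOrders, 20), outputSize(collectOrders, 40));
console.log('count:  ', outputSize(countOrders, 20), outputSize(countOrders, 40));
```

The collected output keeps growing as orders are added, while the count stays a short number however many orders there are. Output of the first kind is what the overflow error complains about.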

IMHO, the two options you have are:

Option 1: Optimized list function

With a little bit of work, you can build the JSON response by hand, so that you do not need to accumulate orders in an array.

I have edited jhs's list function to avoid any use of arrays, so you can have customers with any number of orders.

function(head, req) {
  start({'headers':{'Content-Type':'application/json'}});

  var first_customer = true
    , first_order = true
    , row
    ;

  send('{"rows":[');

  while(row = getRow()) {
    if(row.key[1] === 2) {
      // Order for customer
      if (first_order) {
        first_order = false;
      } else {
        send(',');
      }
      send(JSON.stringify(row.value));
    }
    else if (row.key[1] === 1) {
      // New customer
      if (first_customer) {
        first_customer = false;
      } else {
        send(']},');
      }
      send('{"customer":');
      send(JSON.stringify(row.key[0]));
      send(',"orders":[');
      first_order = true;
    }
  }
  if (!first_customer)
    send(']}');

  send('\n]}');
}
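
One way to try a function like this outside CouchDB is to fake the three helpers it relies on. The sketch below is a small test harness for plain Node.js, written under assumptions: `start`, `send` and `getRow` are stand-ins for what CouchDB provides, `listCustomers` is a condensed copy of the logic above, and the sample rows (including their `null` and string values) are invented.

```javascript
// Stand-ins for the helpers CouchDB provides to list functions.
var queue = [];
var output = [];
function start() {}                       // headers are ignored here
function send(chunk) { output.push(chunk); }
function getRow() { return queue.length ? queue.shift() : null; }

// Condensed copy of the list function's logic, for testing only.
function listCustomers(head, req) {
  var firstCustomer = true, firstOrder = true, row;
  start({ headers: { 'Content-Type': 'application/json' } });
  send('{"rows":[');
  while ((row = getRow())) {
    if (row.key[1] === 2) {            // order row: append to current customer
      if (!firstOrder) { send(','); }
      firstOrder = false;
      send(JSON.stringify(row.value));
    } else if (row.key[1] === 1) {     // customer row: close previous, open next
      if (!firstCustomer) { send(']},'); }
      firstCustomer = false;
      send('{"customer":' + JSON.stringify(row.key[0]) + ',"orders":[');
      firstOrder = true;
    }
  }
  if (!firstCustomer) { send(']}'); }
  send('\n]}');
}

// Feed rows in view order and return the parsed response.
function render(rows) {
  queue = rows.slice();
  output = [];
  listCustomers({}, {});
  return JSON.parse(output.join(''));
}

var result = render([
  { key: ['customer/1', 1], value: null },
  { key: ['customer/1', 2], value: 'Desc 11' },
  { key: ['customer/1', 2], value: 'Desc 12' },
  { key: ['customer/2', 1], value: null }
]);
console.log(JSON.stringify(result));
```

Because `render` parses the streamed chunks with `JSON.parse`, a misplaced comma or bracket shows up immediately as an exception.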

Option 2: Optimize the documents for your use case

If you really need to have the orders in the same document as the customer, then ask yourself whether you can store them that way in the first place and avoid any processing while querying.

In other words: try to fully exploit the possibilities offered by a document database. Design the documents to best fit your use case, and reduce the post-processing needed to use them.
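
As one possible illustration (the document shape and field names below are assumptions, not something this answer prescribes), a customer document could embed its orders, so that a plain map function already yields one row per customer. The `emit` stand-in is only there so the sketch runs in plain Node.js.

```javascript
// Hypothetical customer document with its orders embedded.
var customerDoc = {
  _id: 'customer/1',
  type: 'customer',
  name: 'Customer 1',
  orders: [
    { id: 'order/1/1', desc: 'Desc 11' },
    { id: 'order/1/2', desc: 'Desc 12' }
  ]
};

var rows = [];
// Stand-in for CouchDB's built-in emit().
function emit(key, value) { rows.push({ key: key, value: value }); }

// One row per customer; no list function or reduce is involved.
function map(doc) {
  if (doc.type === 'customer') {
    emit(doc._id, { name: doc.name, orders: doc.orders || [] });
  }
}

map(customerDoc);
console.log(JSON.stringify(rows));
```

The trade-off is that every new order rewrites the customer document, which can lead to update conflicts when several writers touch the same customer.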

