Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
890 views
in Technique[技术] by (71.8m points)

mongodb aggregate query isn't returning proper sum on using $sum

I have a collection students with documents in the following format:-

{
 _id:"53fe74a866455060e003c2db",
 name:"sam",
 subject:"maths",
 marks:"77"
}
{
 _id:"53fe79cbef038fee879263d2",
 name:"ryan", 
 subject:"bio",
 marks:"82"
}
{
 _id:"53fe74a866456060e003c2de",
 name:"tony",
 subject:"maths",
 marks:"86"
}

I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.

db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])

Now I should get the following result-

{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}

But I get-

{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}

Can someone point out what I might be doing wrong here?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your current schema has the marks field data type as string and you need an integer data type for your aggregation framework to work out the sum. On the other hand, you can use MapReduce to calculate the sum since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.


Option 1: Update Schema (Change Data Type)

The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb's cursor find(), forEach() and update() methods to change your marks schema:

db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
    db.student.update(
        { "_id": doc._id, "marks": { "$type": 2 } }, 
        { "$set": { "marks": parseInt(doc.marks) } }
    );
});

For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:

MongoDB versions >= 2.6 and < 3.2:

var bulk = db.student.initializeUnorderedBulkOp(),
    counter = 0;

db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {    
    bulk.find({ "_id": doc._id }).updateOne({ 
        "$set": { "marks": parseInt(doc.marks) } 
    });

    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations 
        bulk.execute(); 

        // re-initialize every 1000 update statements
        bulk = db.student.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute(); 

MongoDB version 3.2 and newer:

var ops = [],
    cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});

cursor.forEach(function (doc) {     
    ops.push({ 
        "updateOne": { 
            "filter": { "_id": doc._id } ,              
            "update": { "$set": { "marks": parseInt(doc.marks) } } 
        }         
    });

    if (ops.length === 1000) {
        db.student.bulkWrite(ops);
        ops = [];
    }     
});

if (ops.length > 0) db.student.bulkWrite(ops);

Option 2: Run MapReduce

The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().

In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:

var mapper = function () {
    var x = parseInt(this.marks);
    emit(this.subject, x);
};

Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject. The function reduces the valuesMarks array to the sum of its elements.

var reducer = function(keySubject, valuesMarks) {
    return Array.sum(valuesMarks);
};

db.student.mapReduce(
    mapper,
    reducer,
    {
        out : "example_results",
        query: { subject : "maths" }       
    }
 );

With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:

/* 0 */
{
    "_id" : "maths",
    "value" : 163
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...