Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

mongodb - Removing white spaces (leading and trailing) from string value

I imported a CSV file into MongoDB using mongoimport, and I want to remove leading and trailing whitespace from my string values.

Is it possible to apply a trim function to the whole collection directly in MongoDB, or do I need to write a script for that?

My collection contains elements such as:

{
  "_id" : ObjectId("53857680f7b2eb611e843a32"),
  "category" : "Financial & Legal Services "
}

I want to apply a trim function across the whole collection so that "category" contains no leading or trailing spaces.


1 Reply

In MongoDB versions before 4.2, an update cannot refer to the existing value of a field when applying the update, so you have to loop:

db.collection.find({},{ "category": 1 }).forEach(function(doc) {
   doc.category = doc.category.trim();
   db.collection.update(
       { "_id": doc._id },
       { "$set": { "category": doc.category } }
   );
})

Note the use of the $set operator there, and that only the "category" field is projected in order to reduce network traffic.
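On MongoDB 4.2 or later, an update can take an aggregation pipeline, which does let it read the current field value, and the $trim operator (available from 4.0) strips the whitespace server-side. A minimal sketch under that version assumption; trimCategory is a hypothetical helper name, and coll stands for a collection handle such as db.collection:

```javascript
// Sketch: trim "category" server-side with a pipeline update.
// Assumes MongoDB 4.2+ and that "category" is a plain string wherever the
// filter matches. trimCategory is a hypothetical helper name.
function trimCategory(coll) {
  // Only touch documents with leading or trailing whitespace.
  var filter = { "category": /^\s+|\s+$/ };
  // Pipeline form: "$category" refers to the document's current value.
  var pipeline = [
    { "$set": { "category": { "$trim": { "input": "$category" } } } }
  ];
  return coll.updateMany(filter, pipeline);
}

// In the shell: trimCategory(db.collection)
```

This avoids pulling each document to the client, so no loop is needed on versions that support it.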

You can limit which documents that processes by matching with a $regex:

db.collection.find({ 
    "$and": [
        { "category": /^\s+/ },
        { "category": /\s+$/ }
    ]
})

A single $regex with alternation is simpler, and it also catches values with whitespace on only one side, which the $and form misses. An explicit $and is only needed in MongoDB where multiple conditions are applied to the same field; otherwise $and is implicit across all arguments:

db.collection.find({ "category": /^\s+|\s+$/ })

This restricts processing to only those documents with leading or trailing whitespace.
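The pattern can be checked in plain JavaScript before running it against the collection. A small sketch; the sample strings are illustrative, with the first taken from the question:

```javascript
// Plain JavaScript check of the leading-or-trailing whitespace pattern.
var needsTrim = /^\s+|\s+$/;

var samples = [
  "Financial & Legal Services ",   // trailing space (from the question)
  "  Financial & Legal Services",  // leading spaces (illustrative)
  "Financial & Legal Services"     // already clean; inner spaces do not match
];

samples.forEach(function (value) {
  // Show the raw value, whether the pattern matches, and the trimmed result.
  console.log(JSON.stringify(value), needsTrim.test(value), JSON.stringify(value.trim()));
});
```

Only the first two samples match, so only those would be selected for an update.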

If you are worried about the number of documents to process, bulk updating should help if you have MongoDB 2.6 or greater available:

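var batch = [];
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1 }).forEach(
    function(doc) {
        batch.push({
            "q": { "_id": doc._id },
            "u": { "$set": { "category": doc.category.trim() } }
        });

        if ( batch.length % 1000 == 0 ) {
            // The update command takes the collection name and an array of update statements
            db.runCommand({ "update": "collection", "updates": batch });
            batch = [];
        }
    }
);

// Send whatever is left after the last full batch
if ( batch.length > 0 )
    db.runCommand({ "update": "collection", "updates": batch });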

Or even with the bulk operations API for MongoDB 2.6 and above:

var counter = 0;
var bulk = db.collection.initializeOrderedBulkOp();
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
    function(doc) {
        bulk.find({ "_id": doc._id }).update({
            "$set": { "category": doc.category.trim() }
        });
        counter = counter + 1;

        if ( counter % 1000 == 0 ) {
            bulk.execute();
            bulk = db.collection.initializeOrderedBulkOp();
        }
    }
);

// Execute only if operations were queued since the last execute()
if ( counter % 1000 != 0 )
    bulk.execute();

This is best done with bulkWrite() in modern APIs. It uses the Bulk Operations API under the hood (technically everything does now), but in a way that degrades safely against older versions of MongoDB. In all honesty, that would mean versions prior to MongoDB 2.6, and you would be well out of coverage for official support on such a version. The code is somewhat cleaner:

var batch = [];
db.collection.find({ "category": /^\s+|\s+$/ },{ "category": 1}).forEach(
  function(doc) {
    batch.push({
      "updateOne": {
        "filter": { "_id": doc._id },
        "update": { "$set": { "category": doc.category.trim() } }
      }
    });

    if ( batch.length % 1000 == 0 ) {
      db.collection.bulkWrite(batch);
      batch = [];
    }
  }
);

if ( batch.length > 0 ) {
  db.collection.bulkWrite(batch);
  batch = [];
}

All of these send operations to the server only once per 1000 documents, or as many modifications as you can fit under the 16MB BSON limit.
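The batching arithmetic can be checked without a server by passing in a stand-in collection. A rough sketch; flushInBatches and batchSize are hypothetical names, and coll is any object exposing bulkWrite(operations), such as db.collection in the shell:

```javascript
// Sketch: queue updateOne operations and flush them in fixed-size batches.
// flushInBatches and batchSize are hypothetical names, not part of any API.
// docs can be an array or a cursor, since both provide forEach.
function flushInBatches(coll, docs, batchSize) {
  var batch = [];
  var sent = 0; // number of bulkWrite calls made
  docs.forEach(function (doc) {
    batch.push({
      "updateOne": {
        "filter": { "_id": doc._id },
        "update": { "$set": { "category": doc.category.trim() } }
      }
    });
    // Flush once the batch is full.
    if (batch.length === batchSize) {
      coll.bulkWrite(batch);
      sent += 1;
      batch = [];
    }
  });
  // Send whatever is left after the last full batch.
  if (batch.length > 0) {
    coll.bulkWrite(batch);
    sent += 1;
  }
  return sent;
}
```

With 2,500 matching documents and a batch size of 1000, this makes three calls: two full batches and a final one of 500.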

These are just a few ways to approach the problem. Alternatively, clean up your CSV file before importing it.

