Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to continue insertion after duplicate key error using PyMongo

If I need to insert a document into MongoDB only when it does not exist yet,

db_stock.update_one(document, {'$set': document}, upsert=True)

will do the job (feel free to correct me if I am wrong).

But if I have a list of documents and want to insert them all, what would be the best way of doing it?

There is a single-record version of this question, but I need an en masse version, so this one is different.

Let me reword my question. I have millions of documents, a few of which may already be stored. How do I store the remaining ones in MongoDB in a matter of seconds, not minutes or hours?


1 Reply


You need to use the insert_many method and set the ordered option to False.

db_stock.insert_many(<list of documents>, ordered=False)

As mentioned in the ordered option documentation:

ordered (optional): If True (the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. If False, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted.

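This means that insertion continues even when a duplicate key error occurs. PyMongo still raises BulkWriteError once all inserts have been attempted, so the call has to be wrapped in a try/except.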
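Since an unordered insert reports every failure at once, it can help to separate duplicate key errors (server error code 11000) from anything else. The following is only a sketch: the helper names split_write_errors and insert_skipping_duplicates are made up for illustration, and it assumes that only duplicate key errors are safe to ignore.

```python
DUPLICATE_KEY = 11000  # MongoDB server error code for a duplicate key


def split_write_errors(details):
    """Split the 'writeErrors' list of BulkWriteError.details
    into (duplicate key errors, all other errors)."""
    errors = details.get("writeErrors", [])
    duplicates = [e for e in errors if e.get("code") == DUPLICATE_KEY]
    others = [e for e in errors if e.get("code") != DUPLICATE_KEY]
    return duplicates, others


def insert_skipping_duplicates(collection, documents):
    """Hypothetical helper: unordered insert that tolerates duplicates.

    Returns the number of newly inserted documents and re-raises
    if any error other than a duplicate key was reported.
    """
    # Imported here so split_write_errors stays usable without the driver.
    from pymongo.errors import BulkWriteError

    try:
        result = collection.insert_many(documents, ordered=False)
        return len(result.inserted_ids)
    except BulkWriteError as exc:
        _duplicates, others = split_write_errors(exc.details)
        if others:
            raise  # something other than a duplicate went wrong
        return exc.details.get("nInserted", 0)
```

With the question's collection, this would be called as insert_skipping_duplicates(db_stock, documents).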

Demo:

>>> c.insert_many([{'_id': 2}, {'_id': 3}])
<pymongo.results.InsertManyResult object at 0x7f5ca669ef30>
>>> list(c.find())
[{'_id': 2}, {'_id': 3}]
>>> try:
...     c.insert_many([{'_id': 2}, {'_id': 3}, {'_id': 4}, {'_id': 5}], ordered=False)
... except pymongo.errors.BulkWriteError:
...     list(c.find())
... 
[{'_id': 2}, {'_id': 3}, {'_id': 4}, {'_id': 5}]

As you can see, the documents with _id 4 and 5 were inserted into the collection.
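For the millions of documents mentioned in the question, it may be more practical to feed insert_many fixed-size batches than to build one huge list in memory. This is a minimal sketch: the helper name batched and the batch size of 10,000 are assumptions for illustration, not tuned values.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Possible usage with PyMongo (assumes `db_stock` and `documents` exist):
#
# for batch in batched(documents, 10_000):
#     try:
#         db_stock.insert_many(batch, ordered=False)
#     except pymongo.errors.BulkWriteError:
#         pass  # duplicates in this batch were skipped; the rest went in
```

Because the helper accepts any iterable, the documents can come from a generator and never need to be held in memory all at once.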


It is worth noting that this is also possible in the shell using the insertMany method. All you need to do is set the ordered option to false.

db.collection.insertMany(
    [ 
        { '_id': 2 }, 
        { '_id': 3 },
        { '_id': 4 }, 
        { '_id': 5 }
    ],
    { 'ordered': false }
)
