Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
461 views
in Technique[技术] by (71.8m points)

mongodb - Single big collection for all products vs Separate collections for each Product category

I'm new to NoSQL and I'm trying to figure out the best way to model my database. I'll be using ArangoDB in the project but I think this question also stands if using MongoDB.

The database will store 12 categories of products. Each category is expected to hold hundreds or thousands of products. Products will also be added / removed constantly.

There will be a number of common fields across all products, but each category will also have unique fields / different restrictions to data.

Keep in mind that there are instances where I'd need to query all the categories at the same time, for example to search a product across all categories, and other instances where I'll only need to query one category.

Should I create one single collection "Product" and use a field to indicate the category, or create a seperate collection for each category?

I've read many questions related to this idea (1 collection vs many) but I haven't been able to reach a conclusion, other than "it dependes".

So my question is: In this specific use case which option would be most optimal, multiple collections vs single collection + sharding, in terms of performance and speed ?

Any help would be appreciated.

question from:https://stackoverflow.com/questions/65886635/single-big-collection-for-all-products-vs-separate-collections-for-each-product

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As you mentioned, you need to play with your data and use-case. You will have better picture.

Some decisions required as below.

  1. Decide the number of documents you will have in near future. If you will have 1m documents in an year, then try with at least 3m data

  2. Decide the number of indices required.

  3. Decide the number of writes, reads per second.

  4. Decide the size of documents per category.

  5. Decide the query pattern.

Some inputs based on the requirements

  1. If you have more writes with more indices, then single monolithic collection will be slower as multiple indices needs to be updated.

  2. As you have different set of fields per category, you could try with multiple collections. There is $unionWith to combine data from multiple collections. But do check the performance it purely depends on the above decisions. Note this open issue also.

  3. If you decide to go with monolithic collection, defer the sharding. Implement this once you found that queries are slower.

  4. If you have more writes on the same document, writes will be executed sequentially. It will slow down your read also.

  5. Think of reclaiming the disk space when more data is cleared from the collections. Multiple collections do good here.


  1. The point which forces me to suggest monolithic collections is that I'd need to query all the categories at the same time. You may need to add more categories, but combining all of them in single response would not be better in terms of performance.

  2. As you don't really have a join use case like in RDBMS, you can go with single monolithic collection from model point of view. I doubt you could have a join key.

If any of my points are incorrect, please let me know.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...