Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

hashtable - What is the use of a Hash range in a dynamodb table?

I am new to DynamoDB (ddb). I was going through its documentation, which says to add a Hash Key and a Hash Range key. It also says that ddb will create an unsorted index on the hash key and a sorted index on the hash range.

What is the purpose of having these two keys rather than just one? Is it because the keys are used like this:

1st HashTable: hash key -> the set of hash range keys stored under it

2nd HashTable: hash range key -> actual data value

This would help segregate data and make lookups fast. But then why only two levels of HashMaps? I could do this for n layers and get even faster lookups.

Thank you in advance.


1 Reply


Q:"What is the purpose of having these 2 keys rather than just one key?"

In terms of the Data Model, the Hash Key on its own uniquely identifies a record only when the table has no Range Key. When a Range Key is defined, the Hash Key identifies a group of records that are usually retrieved together, the Range Key sorts the records within that group, and the two keys combined uniquely identify a single record. Example: if you are defining an Aggregate to store Order Items, the OrderId could be your Hash Key and the OrderItemId the Range Key. You can find below a formal definition for the use of these two keys:

"Composite Hash Key with Range Key allows the developer to create a primary key that is the composite of two attributes, a 'hash attribute' and a 'range attribute.' When querying against a composite key, the hash attribute needs to be uniquely matched but a range operation can be specified for the range attribute: e.g. all orders from Werner in the past 24 hours, or all games played by an individual player in the past 24 hours." [VOGELS]
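To make the query semantics in the quote concrete, here is a minimal in-memory sketch. It is only a toy model, not how DynamoDB is implemented: the class name `CompositeKeyTable` and its methods are hypothetical, and the order data is made up for illustration. It shows the two rules described above: the hash key must match exactly, while the range key can be filtered with a range condition and comes back sorted.

```python
import bisect
from collections import defaultdict


class CompositeKeyTable:
    """Toy model of a hash + range key table (hypothetical, not DynamoDB's storage)."""

    def __init__(self):
        # hash key -> list of (range_key, item) tuples, kept sorted by range_key
        self._groups = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._groups[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)        # same full key: overwrite
        else:
            rows.insert(i, (range_key, item))  # keep the group sorted

    def query(self, hash_key, low=None, high=None):
        """Exact match on the hash key, optional inclusive bounds on the range key."""
        rows = self._groups.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = 0 if low is None else bisect.bisect_left(keys, low)
        end = len(rows) if high is None else bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = CompositeKeyTable()
table.put("order-1", 3, "keyboard")   # hash key = OrderId, range key = OrderItemId
table.put("order-1", 1, "laptop")
table.put("order-1", 2, "mouse")
table.put("order-2", 1, "monitor")

print(table.query("order-1"))          # ['laptop', 'mouse', 'keyboard']
print(table.query("order-1", low=2))   # ['mouse', 'keyboard']
```

Note that there is no way to ask for "all items with OrderItemId 1" without naming an OrderId first: the range key is only meaningful inside one hash key's group.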

So the Range Key adds a grouping capability to the Data Model. However, the use of these two keys also has implications for the Storage Model:

"Dynamo uses consistent hashing to partition its key space across its replicas and to ensure uniform load distribution. A uniform key distribution can help us achieve uniform load distribution assuming the access distribution of keys is not highly skewed." [DDB-SOSP2007]

The Hash Key does more than identify records: it is also the mechanism that distributes load across partitions. The Range Key (when used) indicates which records will mostly be retrieved together, so the storage can also be optimized for that access pattern.
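The effect of the Hash Key on load distribution can be illustrated with a small sketch. This assumes a simple "hash modulo number of partitions" placement and uses SHA-256 as a stand-in hash; both are assumptions for illustration, since the real placement function is internal to the database. The helper names `partition_for` and `load_per_partition` are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition index (illustrative placement rule)."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(hash_keys, num_partitions):
    """Count how many requests each partition receives."""
    counts = Counter(partition_for(k, num_partitions) for k in hash_keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Many distinct hash keys (one per order): requests spread over all partitions.
spread = load_per_partition([f"order-{i}" for i in range(10_000)], 4)

# One hot hash key: every request lands on the same partition.
skewed = load_per_partition(["order-1"] * 10_000, 4)

print(spread)
print(skewed)
```

With many distinct hash keys each partition receives a similar share of the requests, while the single hot key concentrates all 10,000 requests on one partition. This is the "highly skewed" access distribution the quote warns about.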

Q:"But then why only 2 levels of HashMaps? I could do this for n number of layers and get the faster lookups."

Having many layers of lookups would make it much more complex to run the database effectively in a clustered environment, which is one of the most essential use cases for the majority of NoSQL databases. The database has to be highly available, fault-tolerant, and scalable, and still perform well in a distributed environment.

"One of the key design requirements for Dynamo is that it must scale incrementally. This requires a mechanism to dynamically partition the data over the set of nodes (i.e., storage hosts) in the system. Dynamo’s partitioning scheme relies on consistent hashing to distribute the load across multiple storage hosts." [DDB-SOSP2007]
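The consistent hashing idea from the quote can be sketched as a hash ring. This is a simplified illustration under stated assumptions, not Dynamo's actual implementation: the `HashRing` class, the node names, the SHA-256 hash, and the choice of 64 virtual points per node are all hypothetical. Each key is owned by the first node point found clockwise from the key's position on the ring.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Position of a string on the ring (first 64 bits of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several positions to even out its share.
        for i in range(self._vnodes):
            p = _point(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def node_for(self, key):
        # First ring position clockwise from the key, wrapping around at the end.
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]


keys = [f"order-{i}" for i in range(2000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")   # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Only the keys that now belong to the new node move, and every key that moves goes to `node-d`; the rest stay where they were. With a plain `hash % number_of_nodes` scheme, going from three to four nodes would instead reassign most keys. Adding further lookup layers to the key would not help here, because only the hash key takes part in this placement.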

It is always a trade-off: most limitations that you see in NoSQL databases are likely introduced by storage model requirements. Although relational databases are very flexible in terms of data modeling, they have several limitations when it comes to running in a distributed environment.

Choosing the correct keys to represent your data is one of the most critical aspects of your design process, and it directly impacts how well your application performs and scales, and how much it costs.


Footnotes:

  • The Data Model is the model through which we perceive and manipulate our data. It describes how we interact with the data in the database [FOWLER]. In other words, it is how you abstract your data: the way you group your entities, the attributes that you choose as primary keys, etc.

  • The Storage Model describes how the database stores and manipulates the data internally [FOWLER]. Although you cannot control this directly, you can certainly optimize how the data is retrieved or written by knowing how the database works internally.

