Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
575 views
in Technique[技术] by (71.8m points)

amazon web services - Querying DynamoDB by date

I'm coming from a relational database background and trying to work with amazon's DynamoDB

I have a table with a hash key "DataID" and a range "CreatedAt" and a bunch of items in it.

I'm trying to get all the items that were created after a specific date and sorted by date. Which is pretty straightforward in a relational database.

In DynamoDB the closest thing i could find is a query and using the range key greater than filter. The only issue is that to perform a query i need a hash key which defeats the purpose.

So what am I doing wrong? Is my table schema wrong, shouldn't the hash key be unique? or is there another way to query?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Given your current table structure this is not currently possible in DynamoDB. The huge challenge is to understand that the Hash key of the table (partition) should be treated as creating separate tables. In some ways this is really powerful (think of partition keys as creating a new table for each user or customer, etc...).

Queries can only be done in a single partition. That's really the end of the story. This means if you want to query by date (you'll want to use msec since epoch), then all the items you want to retrieve in a single query must have the same Hash (partition key).

I should qualify this. You absolutely can scan by the criterion you are looking for, that's no problem, but that means you will be looking at every single row in your table, and then checking if that row has a date that matches your parameters. This is really expensive, especially if you are in the business of storing events by date in the first place (i.e. you have a lot of rows.)

You may be tempted to put all the data in a single partition to solve the problem, and you absolutely can, however your throughput will be painfully low, given that each partition only receives a fraction of the total set amount.

The best thing to do is determine more useful partitions to create to save the data:

  • Do you really need to look at all the rows, or is it only the rows by a specific user?

  • Would it be okay to first narrow down the list by Month, and do multiple queries (one for each month)? Or by Year?

  • If you are doing time series analysis there are a couple of options, change the partition key to something computated on PUT to make the query easier, or use another aws product like kinesis which lends itself to append-only logging.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...