Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
544 views
in Technique[技术] by (71.8m points)

nosql - Cassandra column key auto increment

I am trying to understand Cassandra and how to structure my column families (CF) but it's quite hard since I am used to relational databases.

For example if I create simple users CF and I try to insert new row, how can I make an incremental key like in MySQL?

I saw a lot of examples where you would just put the username instead of unique ID and that would make a little sense, but what if I want users to have duplicated usernames?

Also how can I make searches when from what I understand cassandra does not suport > operators, so something like select * from users where something > something2 would not work.

And probably the most important question what about grouping? Would I need to retrieve all data and then filter it with whatever language I am using? I think that would slow down my system a lot.

So basically I need some brief explanation how to get started with Cassanda.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your questions are quite general, but let me take a stab at it. First, you need to model your data in terms of your queries. With an RDBMS, you model your data in some normalized form, then optimize later for your specific queries. You cannot do this with Cassandra; you must write your data the way you intend to read it. Often this means writing it more than one way. In general, it helps to completely shed your RDBMS thinking if you want to work effectively with Cassandra.

Regarding keys:

  • They are used in Cassandra as the unit of distribution across the ring. So your key will get hashed and assigned an "owner" in the ring. Use the RandomPartitioner to guarantee even distribution

  • Presuming you use RandomPartitioner (you should), keys are not sorted. This means you cannot ask for a range of keys. You can, however, ask for a list of keys in a single query.

  • Keys are relevant in some models and not in others. If your model requires query-by-key, you can use any unique value that your application is aware of (such as a UUID). Sometimes keys are sentinel values, such as a Unix epoch representing the start of the day. This allows you to hand Cassandra a bunch of known keys, then get a range of data sorted by column (see below).

Regarding query predicates:

  • You can get ranges of data presuming you model it correctly to answer your queries.

  • Since columns are written in sorted order, you can query a range from column A to column n with a slice query (which is very fast). You can also use composite columns to abstract this mechanism a bit.

  • You can use secondary indexes on columns where you have low cardinality--this gives you query-by-value functionality.

  • You can create your own indexes where the data is sorted the way you need it.

Regarding grouping:

I presume you're referring to creating aggregates. If you need your data in real-time, you'll want to use some external mechanism (like Storm) to track data and constantly update your relevant aggregates into a CF. If you are creating aggregates as part of a batch process, Cassandra has excellent integration with Hadoop, allowing you to write map/reduce jobs in Pig, Hive, or directly in your language of choice.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...