Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
351 views
in Technique[技术] by (71.8m points)

cassandra - Query results not ordered despite WITH CLUSTERING ORDER BY

I am storing posts from all users in table. I want to retrieve post from all users the user is following.

CREATE TABLE posts (
  userid int,
  time timestamp,
  id uuid,
  content text,
  PRIMARY KEY (userid, time)
)WITH CLUSTERING ORDER BY (time DESC)

I have the data about who all user follows in another table

CREATE TABLE follow (
  userid int,
  who_follow_me set<int>,
  who_i_follow set<int>,
  PRIMARY KEY ((userid))
) 

I am making query like

select * from posts where userid in(1,2,3,4....n);

2 questions:

  1. why I still get data in random order, though CLUSTERING ORDER BY is specified in posts. ?
  2. Is model correct to satisfy the query optimally (user can have n number of followers)?

I am using Cassandra 2.0.10.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

"why I still get data in random order, though CLUSTERING ORDER BY is specified in posts?"

This is because ORDER BY only works for rows within a particular partitioning key. So in your case, if you wanted to see all of the posts for a specific user like this:

SELECT * FROM posts WHERE userid=1;

That return your results ordered by time, as all of the rows within the userid=1 partitioning key would be clustered by it.

"Is model correct to satisfy the query optimally (user can have n number of followers)?"

It will work, as long as you don't care about getting the results ordered by timestamp. To be able to query posts for all users ordered by time, you would need to come up with a different partitioning key. Without knowing too much about your application, you could use a column like GROUP (for instance) and partition on that.

So let's say that you evenly assign all of your users to eight groups: A, B, C, D, E, F, G and H. Let's say your table design changed like this:

CREATE TABLE posts (
  group text,
  userid int,
  time timestamp,
  id uuid,
  content text,
  PRIMARY KEY (group, time, userid)
)WITH CLUSTERING ORDER BY (time DESC)

You could then query all posts for all users for group B like this:

SELECT * FROM posts WHERE group='B';

That would give you all of the posts for all of the users in group B, ordered by time. So basically, for your query to order the posts appropriately by time, you need to partition your post data on something other than userid.

EDIT:

PRIMARY KEY (userid, follows)) WITH CLUSTERING ORDER BY (created DESC);

That's not going to work. In fact, that should produce the following error:

code=2200 [Invalid query] message="Missing CLUSTERING ORDER for column follows"

And even if you did add follows to your CLUSTERING ORDER clause, you would see this:

code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"

The CLUSTERING ORDER clause can only be used on the clustering column(s), which in this case, is only the follows column. Alter your PRIMARY KEY definition to cluster on follows (ASC) and created (DESC). I have tested this, and inserted some sample data, and can see that this query works:

aploetz@cqlsh:stackoverflow> SELECT * FROM posts WHERE userid=2 AND follows=1;

 userid | follows | created                  | id
--------+---------+--------------------------+--------------------------------------
      2 |       1 | 2015-01-25 13:27:00-0600 | 559cda12-8fe7-45d3-9a61-7ddd2119fcda
      2 |       1 | 2015-01-25 13:26:00-0600 | 64b390ba-a323-4c71-baa8-e247a8bc9cdf
      2 |       1 | 2015-01-25 13:24:00-0600 | 1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4

(3 rows)

Although, if you want to query by just userid you can see posts from all of your followers. But in that case, the posts will only be ordered within each followerid, like this:

aploetz@cqlsh:stackoverflow> SELECT * FROM posts WHERE userid=2;

 userid | follows | created                  | id
--------+---------+--------------------------+--------------------------------------
      2 |       0 | 2015-01-25 13:28:00-0600 | 94da27d0-e91f-4c1f-88f2-5a4bbc4a0096
      2 |       0 | 2015-01-25 13:23:00-0600 | 798053d3-f1c4-4c1d-a79d-d0faff10a5fb
      2 |       1 | 2015-01-25 13:27:00-0600 | 559cda12-8fe7-45d3-9a61-7ddd2119fcda
      2 |       1 | 2015-01-25 13:26:00-0600 | 64b390ba-a323-4c71-baa8-e247a8bc9cdf
      2 |       1 | 2015-01-25 13:24:00-0600 | 1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4

(5 rows)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...