Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


cql - cassandra get all records in time range

I have to work with a column family that has (user_id, timestamp) as key. In my query I would like to fetch all records in a given time range independent of the user_id. This is the exact table schema:

CREATE TABLE userlog (
  user_id text,
  ts timestamp,
  action text,
  app_type text,
  channel_name text,
  channel_session_id text,
  pid text,
  region_id text,
  PRIMARY KEY (user_id, ts)
);

I tried to run

SELECT * FROM userlog WHERE ts >= '2013-01-01 00:00:00+0200' AND ts <= '2013-08-13 23:59:00+0200' ALLOW FILTERING;

which works fine on my local Cassandra installation, which contains only a small data set, but fails with

Request did not complete within rpc_timeout.

on the production system, which contains all the data.

Is there a query, preferably in CQL, that runs smoothly against the given column family, or do we have to change the design?


1 Reply


The timeout occurs because Cassandra takes longer than the RPC timeout (10 seconds by default) to return the data. For your query, Cassandra tries to fetch the entire result set before returning anything, and for more than a few records this can easily exceed the timeout.

For queries that produce a lot of data you need to page, e.g.

SELECT * FROM userlog WHERE ts >= '2013-01-01 00:00:00+0200' AND ts <= '2013-08-13 23:59:00+0200' AND token(user_id) > previous_token LIMIT 100 ALLOW FILTERING;

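where previous_token is the token of the last user_id returned by the previous page. Because LIMIT can cut a page off in the middle of a user's rows, you also need to page on ts to guarantee that you get all the records for the last user_id returned.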
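
As a rough sketch of how the two follow-up queries could look, assume the last row of the previous page had user_id 'user42' and ts '2013-03-05 12:00:00+0200' (both values are made up for illustration):

-- 1. Fetch the remaining rows of the last partition seen.
--    'user42' and the first timestamp are hypothetical values taken
--    from the last row of the previous page. Repeat with the newest ts
--    until fewer than 100 rows come back.
SELECT * FROM userlog
WHERE user_id = 'user42'
  AND ts > '2013-03-05 12:00:00+0200'
  AND ts <= '2013-08-13 23:59:00+0200'
LIMIT 100;

-- 2. Then continue with the partitions that follow 'user42' in token order.
SELECT * FROM userlog
WHERE ts >= '2013-01-01 00:00:00+0200'
  AND ts <= '2013-08-13 23:59:00+0200'
  AND token(user_id) > token('user42')
LIMIT 100 ALLOW FILTERING;

The first query restricts the partition key, so it needs no ALLOW FILTERING. The scan is finished when the second query returns no rows.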

Alternatively, Cassandra 2.0.0 (just released) pages results transparently, so your original query should work without a timeout or manual paging.

ALLOW FILTERING means Cassandra reads through all your data but returns only the rows within the specified range. This is efficient only if the range covers most of the data. If you wanted to find records within, say, a 5-minute time window, it would be very inefficient.
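
If narrow time windows are the common case, one possible design change (a sketch only, not something the original schema provides) is a second table partitioned by a time bucket, so that a range query reads a few partitions instead of scanning everything. The table name userlog_by_day and the day column are hypothetical, and the application would have to write each event to both tables:

-- Hypothetical table: one partition per calendar day, rows ordered by ts.
CREATE TABLE userlog_by_day (
  day text,            -- assumed bucket format 'YYYY-MM-DD', computed by the application
  ts timestamp,
  user_id text,
  action text,
  app_type text,
  channel_name text,
  channel_session_id text,
  pid text,
  region_id text,
  PRIMARY KEY (day, ts, user_id)
);

-- A 5-minute window now touches a single partition and needs no ALLOW FILTERING.
SELECT * FROM userlog_by_day
WHERE day = '2013-08-13'
  AND ts >= '2013-08-13 10:00:00+0200'
  AND ts <= '2013-08-13 10:05:00+0200';

A range that spans several days needs one query per day bucket. The bucket size is a trade-off: all writes for a given day land in one partition, so a busy system may need smaller buckets, such as one per hour.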

