Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
422 views
in Technique[技术] by (71.8m points)

cql - How to copy data from a Cassandra table to another structure for better performance

In several places it's advised to design our Cassandra tables according to the queries we are going to perform on them. In this article by DataScale they state this:

The truth is that having many similar tables with similar data is a good thing in Cassandra. Limit the primary key to exactly what you’ll be searching with. If you plan on searching the data with a similar, but different criteria, then make it a separate table. There is no drawback for having the same data stored differently. Duplication of data is your friend in Cassandra.

[...]

If you need to store the same piece of data in 14 different tables, then write it out 14 times. There isn’t a handicap against multiple writes.

I have understood this, and now my question is: provided that I have an existing table, say

CREATE TABLE invoices (
    id_invoice int PRIMARY KEY,
    year int,
    id_client int,
    type_invoice text
)

But I want to query by year and type instead, so I'd like to have something like

CREATE TABLE invoices_yr (
    id_invoice int,
    year int,
    id_client int,
    type_invoice text,
    PRIMARY KEY (type_invoice, year)
)

With id_invoice as the partition key and year as the clustering key, what's the preferred way to copy the data from one table to another to perform optimized queries later on?

My Cassandra version:

user@cqlsh> show version;
[cqlsh 5.0.1 | Cassandra 3.5.0 | CQL spec 3.4.0 | Native protocol v4]
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use cqlsh COPY command :
To copy your invoices data into csv file use :

COPY invoices(id_invoice, year, id_client, type_invoice) TO 'invoices.csv';

And Copy back from csv file to table in your case invoices_yr use :

COPY invoices_yr(id_invoice, year, id_client, type_invoice) FROM 'invoices.csv';

If you have huge data you can use sstable writer to write and sstableloader to load data faster. http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...