Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Does sqlite3 compress data?

I've got a 7.4 GB CSV file. After converting it to a SQLite database with a Python script, the output DB is 4.7 GB, around 60% of the original size.

The CSV has around 150,000,000 rows. It has this header:

tkey,ipaddr,healthtime,numconnections,policystatus,activityflag

And each row looks something like:

261846,172.10.28.15,2012-02-03 16:15:00,22,1,1

The script uses healthtime to split the data into 192 tables.
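The original script is not shown, so the following is only a minimal sketch of how such a split could work, assuming one table per distinct healthtime value. The `table_name` scheme, the column types, and the second sample row are assumptions made up for illustration.

```python
import csv
import io
import sqlite3


def table_name(healthtime):
    # Hypothetical naming scheme: "2012-02-03 16:15:00" -> "t_20120203_161500"
    return "t_" + healthtime.replace("-", "").replace(":", "").replace(" ", "_")


def load(csv_file, conn):
    """Insert each CSV row into a table chosen by its healthtime value."""
    created = set()
    for row in csv.DictReader(csv_file):
        name = table_name(row["healthtime"])
        if name not in created:
            # healthtime itself is not stored; it lives only in the table name
            conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{name}" '
                "(tkey INTEGER, ipaddr TEXT, numconnections INTEGER, "
                "policystatus INTEGER, activityflag INTEGER)"
            )
            created.add(name)
        conn.execute(
            f'INSERT INTO "{name}" VALUES (?, ?, ?, ?, ?)',
            (
                int(row["tkey"]),
                row["ipaddr"],
                int(row["numconnections"]),
                int(row["policystatus"]),
                int(row["activityflag"]),
            ),
        )
    conn.commit()
    return created


# First data row is from the question; the second is made up for the demo.
sample = io.StringIO(
    "tkey,ipaddr,healthtime,numconnections,policystatus,activityflag\n"
    "261846,172.10.28.15,2012-02-03 16:15:00,22,1,1\n"
    "261847,172.10.28.16,2012-02-03 16:30:00,7,1,0\n"
)
conn = sqlite3.connect(":memory:")
tables = load(sample, conn)
print(sorted(tables))
```

For a file with 150,000,000 rows, batching the inserts (for example with `executemany`) would likely matter a lot for speed, but that is left out to keep the sketch short.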

When I first saw these numbers, I assumed I had made an error someplace. How big of a reduction in file size should I expect from the added efficiency of only writing the healthtime 192 times instead of 150,000,000 times?

EDIT: As soon as I posted this I realized the answer. I'm removing about 40% of the string, hence the 40% reduction in size.

Edit 2: Let's calculate the difference in size between the plain text:

"261846,172.10.28.15,2012-02-03 16:15:00,22,1,1" 

And the database entry:

db(261846,'172.10.28.15',22,1,1)

First of all, we drop from 46 to 26 characters in the plain-text representation.

The remaining characters are:

"261846,172.10.28.15,22,1,1"

or 26 bytes. If each integer is stored in 32 bits (4 bytes), then we have:

12 bytes (ipaddr) + 4 bytes * 4 (integer fields) = 28 bytes.

So it looks like converting to integers makes the storage slightly less efficient, and all of my gains come from reducing the number of characters stored in each row.
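The character counts above can be checked with a few lines of Python. This is just the question's own arithmetic written out; the 4-bytes-per-integer figure is the assumption stated in the question, not a measured SQLite value.

```python
csv_row = "261846,172.10.28.15,2012-02-03 16:15:00,22,1,1"
fields = csv_row.split(",")

# Drop the healthtime field (index 2) and rejoin the rest.
without_time = ",".join(fields[:2] + fields[3:])

full_len = len(csv_row)        # characters in the original CSV row
short_len = len(without_time)  # characters once healthtime is removed
saved_fraction = (full_len - short_len) / full_len

# Question's estimate: ipaddr kept as text, four integers at 4 bytes each.
fixed_width = len(fields[1]) + 4 * 4

print(full_len, short_len, round(saved_fraction, 2), fixed_width)
# prints: 46 26 0.43 28
```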

1 Reply

SQLite does not run a compression algorithm, but it stores data in a binary file instead of a text file. That means some values can be stored more compactly: for example, 10,000,000 fits in a 32-bit (4-byte) integer instead of taking 8 bytes as text (or more if the text file uses a wider Unicode encoding such as UTF-16).
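To make the text-versus-binary comparison concrete, here is a small sketch. The `struct` call only illustrates a generic 4-byte integer. The `sqlite_int_payload_bytes` helper is hypothetical and reflects my reading of the SQLite file format documentation, which describes integers stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on magnitude, with the values 0 and 1 taking no payload bytes. Treat it as an approximation that ignores per-row and per-page overhead.

```python
import struct

value = 10_000_000

text_bytes = len(str(value).encode("ascii"))       # one byte per digit
utf16_bytes = len(str(value).encode("utf-16-le"))  # two bytes per digit
binary_bytes = len(struct.pack("<i", value))       # fixed 32-bit integer


def sqlite_int_payload_bytes(n):
    """Hypothetical helper: payload bytes SQLite would use for integer n.

    Assumption based on the documented record format; the values 0 and 1
    are encoded in the record header and need no payload bytes.
    """
    if n in (0, 1):
        return 0
    for size in (1, 2, 3, 4, 6, 8):
        bits = size * 8
        if -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
            return size
    raise OverflowError("does not fit in a 64-bit signed integer")


print(text_bytes, utf16_bytes, binary_bytes)
# prints: 8 16 4

# Integer fields from the question's sample row: tkey, numconnections,
# policystatus, activityflag.
print([sqlite_int_payload_bytes(n) for n in (261846, 22, 1, 1)])
# prints: [3, 1, 0, 0]
```

If that reading is right, the integer columns in the sample row would take fewer bytes than the 4-bytes-each estimate in the question.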

There are more details in the SQLite Database File Format documentation if you are interested.

Does that make sense?

