Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
208 views
in Technique[技术] by (71.8m points)

python - Database on the fly with scripting languages

I have a set of .csv files that I want to process. It would be far easier to process it with SQL queries. I wonder if there is some way to load a .csv file and use SQL language to look into it with a scripting language like python or ruby. Loading it with something similar to ActiveRecord would be awesome.

The problem is that I don't want to have to run a database somewhere prior to running my script. I souldn't have additionnal installations needed outside of the scripting language and some modules.

My question is which language and what modules should I use for this task. I looked around and can't find anything that suits my need. Is it even possible?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There's sqlite3, included into python. With it you can create a database (on memory) and add rows to it, and perform SQL queries.

If you want neat ActiveRecord-like functionality you should add an external ORM, like sqlalchemy. That's a separate download though

Quick example using sqlalchemy:

from sqlalchemy import create_engine, Column, String, Integer, MetaData, Table
from sqlalchemy.orm import mapper, create_session
import csv
CSV_FILE = 'foo.csv'
engine = create_engine('sqlite://') # memory-only database

table = None
metadata = MetaData(bind=engine)
with open(CSV_FILE) as f:
    # assume first line is header
    cf = csv.DictReader(f, delimiter=',')
    for row in cf:
        if table is None:
            # create the table
            table = Table('foo', metadata, 
                Column('id', Integer, primary_key=True),
                *(Column(rowname, String()) for rowname in row.keys()))
            table.create()
        # insert data into the table
        table.insert().values(**row).execute()

class CsvTable(object): pass
mapper(CsvTable, table)
session = create_session(bind=engine, autocommit=False, autoflush=True)

Now you can query the database, filtering by any field, etc.

Suppose you run the code above on this csv:

name,age,nickname
nosklo,32,nosklo
Afila Tun,32,afilatun
Foo Bar,33,baz

That will create and populate a table in memory with fields name, age, nickname. You can then query the table:

for r in session.query(CsvTable).filter(CsvTable.age == '32'):
    print r.name, r.age, r.nickname

That will automatically create and run a SELECT query and return the correct rows.

Another advantage of using sqlalchemy is that, if you decide to use another, more powerful database in the future, you can do so pratically without changing the code.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...