Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Transactions and sqlalchemy

I am trying to figure out how to insert many (on the order of 100k) records into a database using SQLAlchemy in Python 3. Everything points to using transactions. However, I am slightly confused as to how that is done.

Some pages state that you get a transaction from connection.begin(), other places say it is session.begin(), and one page even says it is session.create_transaction(), which doesn't exist.

Here is what I am trying to do:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker


def addToTable(listOfRows):
    engine = create_engine('postgresql+pypostgresql:///%s' % db, echo=False)
    Session = sessionmaker(bind=engine)
    session = Session()
    table = myTable(engine, session)

    for row in listOfRows:
        table.add(row)
    table.flush()  # ideally there would be a counter and you flush after a couple of thousand records


class myTable:

    def __init__(self, engine, session):
        self.engine = engine
        self.session = session
        self.transaction = createTransaction()  # Create transaction code here

    def add(self, row):
        newRow = tableRow(row)  # This just creates a representation of a row in the DB
        self.transaction.add(newRow)
        self.transaction.flush()

    def flush(self):
        self.transaction.commit()

1 Reply

I highly suggest that you work through both official tutorials (ORM and Core) before going further with SQLAlchemy; they are really helpful and explain many of the concepts involved. Afterwards, read the "Using the Session" chapter of the documentation, which explains how the session fits into all of this.

For your problem there are two solutions: one using the ORM and the other using the Core. The former is easier, the latter is faster. Let's take the easy road first. A transaction wraps all your statements into a single unit: if something fails, you can roll back all of it instead of being left with a partially inserted data set. So you most likely want one transaction around the whole batch (calling session.add_all() and then session.commit() yourself would also work). Here is the quickest way:

with session.begin():
    session.add_all([tableRow(row) for row in listOfRows])

Depending on your data and your SQLAlchemy version, the flush may even batch the INSERT statements (for example into a single executemany() call) instead of sending one statement per row. Here is what's going on:

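  • A transaction is started with session.begin(). (On a regular session this works in SQLAlchemy 1.4 and later; older versions only allowed it on sessions created with autocommit=True, otherwise you call session.commit() at the end instead.)
  • The objects are added to the session (using add_all here, but a loop calling add would also be fine).
  • When the with block exits normally, the pending objects are flushed and the transaction is committed. If an exception is raised, the transaction is rolled back and the exception propagates, so nothing is left half-inserted and you can fix the error.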
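A runnable sketch of the steps above, assuming SQLAlchemy 1.4 or 2.0 and an in-memory SQLite database in place of PostgreSQL. The model `TableRow`, the helper `add_rows` and its `chunk_size` parameter are hypothetical. It also folds in the question's "flush after a couple of thousand records" idea: `session.flush()` sends the pending INSERTs to the database but keeps the transaction open, so a failure in any chunk still rolls back the whole batch.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TableRow(Base):
    # Hypothetical model standing in for the question's tableRow
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False, unique=True)


def add_rows(engine, names, chunk_size=2000):
    """Insert all names in ONE transaction, flushing every chunk_size objects."""
    with Session(engine) as session:
        # Commits when the block exits normally, rolls back if anything raises.
        with session.begin():
            for i, name in enumerate(names, start=1):
                session.add(TableRow(name=name))
                if i % chunk_size == 0:
                    # Sends the pending INSERTs now; the transaction stays open.
                    session.flush()


engine = create_engine("sqlite://")  # in-memory SQLite, for illustration only
Base.metadata.create_all(engine)

add_rows(engine, ["a", "b", "c"], chunk_size=2)

try:
    # "a" already exists, so the UNIQUE constraint fails during the final flush.
    add_rows(engine, ["d", "e", "a"])
except IntegrityError:
    pass  # the whole second batch, "d" and "e" included, was rolled back

with Session(engine) as session:
    count = session.scalar(select(func.count()).select_from(TableRow))
print(count)  # 3
```

Because the second call fails on the duplicate "a", the rows "d" and "e" are rolled back with it and the table still holds only the three rows from the first call.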

So this is a good way, but not the fastest, because SQLAlchemy has to run every object through the ORM's bookkeeping (unit of work, identity map, per-object state), which adds overhead. If this is a one-time database initialization, you can bypass the ORM. In that case, instead of creating instances of the ORM class (tableRow), you create one dictionary per row that maps column names to values (how you build it depends on your data). Again you can use a context manager:

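with engine.begin() as connection:
    # A list of dicts as the second argument runs an executemany(), which avoids one giant
    # multi-row VALUES statement (100k rows can exceed bind-parameter limits).
    # row_to_dict is your own function turning one row into {column_name: value}.
    connection.execute(tableRow.__table__.insert(),
                       [row_to_dict(row) for row in listOfRows])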

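This would most likely be faster, but also less convenient. It works like the session version above: engine.begin() commits when the block exits normally and rolls back if an exception is raised. The only difference is that the statement is built with the Core instead of the ORM.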
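A self-contained sketch of the Core approach, assuming SQLAlchemy 1.4 or 2.0 and an in-memory SQLite database standing in for PostgreSQL. The table `my_table` and the helpers `row_to_dict` and `insert_rows` are hypothetical stand-ins for `tableRow.__table__` and your own converter. It opens the transaction with `connection.begin()`, one of the calls mentioned in the question; `engine.begin()` is a shorthand for connecting and beginning in one step.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

metadata = MetaData()

# Hypothetical table; in the answer's code this would be tableRow.__table__
my_table = Table(
    "my_table",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
)


def row_to_dict(row):
    """Hypothetical converter from one input row to {column_name: value}."""
    return {"name": row}


def insert_rows(engine, rows):
    with engine.connect() as connection:
        # connection.begin() opens the transaction explicitly;
        # engine.begin() does connect() + begin() in one step.
        with connection.begin():
            # A list of dicts as the parameters is executed as executemany().
            connection.execute(my_table.insert(), [row_to_dict(r) for r in rows])


engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata.create_all(engine)
insert_rows(engine, [f"row{i}" for i in range(2500)])

with engine.connect() as connection:
    count = connection.execute(select(func.count()).select_from(my_table)).scalar()
print(count)  # 2500
```

Passing the list of dictionaries as the second argument to `execute()` lets the driver run the INSERT as one executemany() batch, rather than SQLAlchemy building a single huge multi-row statement.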

