Beware that for strict compliance with Python's DB-API 2.0 the database driver needs to obey weird rules about transactions:
* All statements start a transaction automatically if one isn't already open.
* You need to call .commit() or .rollback() on the connection object to do the commit or rollback.
* The driver can optionally support an "autocommit mode" where, instead of the above rules, every individual statement is wrapped in its own transaction and committed immediately.
That's the theory. Many database drivers, including the built-in SQLite driver, differ in practice (for example, it doesn't automatically start a transaction for a SELECT statement).
The biggest common practical difference is an implementation detail of so-called "autocommit mode". Often that is NOT implemented by actually issuing a "BEGIN TRANSACTION" before and "COMMIT" after every statement. Instead, it often just issues the statement without any automatic transaction statements whatsoever, relying on the fact that the underlying database engine will immediately commit any statements outside an explicit transaction (i.e. has its own autocommit mode).
That means if you do want complete manual control of transactions in a Python DB-API driver (including sqlite), then - confusingly - you need to set the autocommit option to true!
It's not clear to me whether/how this applies to pg8000 driver in the OP. There is a section on autocommit but it doesn't explicitly say what it does.
I maintain pg8000, and it has two APIs: the standard DB-API 2 accessed with pg8000.dbapi and its own native API at pg8000.native.
The native API is a lot simpler, and so I'd use that if you can. The native API is really my suggestion for the next version of DB-API, so DB-API 3. I'd be interested to know people's thoughts on this, and maybe we could put together a draft specification?
The existence of the autocommit feature in DB drivers is something I never really understood: why an application developer would want to relinquish the direct control of one of the most powerful features of a relational DB?
Unless one's interaction with the database are really basic (i.e. a single SELECT statement), one would want to be as explicit as possible regarding where transaction boundaries are.
Having a default that hides this aspect from my code always seemed a false convenience to me.
edit: this is just a tangential reflection inspired by your comment, not an answer / nor I wanted to detract from your fact statement.
I would instead look at it as turning autocommit "off" is more like turning "automatic transaction begin" "on", which is just a strange semantic. I could appreciate a mechanism that insisted "transactions must always be explicit", but that isn't what autocommit controls, on or off. (You also wouldn't want such unless you were assured that you could put the transaction boundaries inside of multi-statement statements, as you really really want to minimize round trips there as otherwise you can end up--even on the same machine--drastically harming your lock contention by having transactions that last thousands of times longer than the single statement they wrapped... at which point the API is going to get awkward trying to enforce the transaction boundaries... autocommit is essentially just what you should want as the solution to all of this.)
I think this illustrates my point ... but I'm not 100% sure what you mean by "autocommit mode" in your comment! You said "autocommit feature in DB drivers" but I think you mean in the DB itself by the sounds of it.
AFAIK autocommit mode just applies if you don't explicitly open a transaction, so what would you propose instead? That a transaction is silently opened but not committed (potentially leaving a dangling transaction open for a long time after SELECT statement) (like Python's odd default behaviour)? Or an immediate error?
I can certainly see that if you are indeed wrapping every individual statement in their own BEGIN/COMMIT pairs that you'd wonder why the database can't help you for such simple use cases. You can still selectively open an explicit transaction on those statements that do need to be grouped together, so I don't see that you've given up any control.
There are a lot of people who use databases without really understanding anything about them. They mostly won't understand why it is important to control time and execute multiple statements at the same logical moment in database world. That is a difficult concept to intuit naturally, people need to be told and they often don't have time or a teller.
Databases that only do the technically-correct thing are weeded out because they lose the big supportive cloud of novices that experts grow out of. So the good databases support the intuitive way and the the correct way of manipulating data.
There are more computing concepts to learn than there are hours to learn it in. People use systems they don't understand and systems need to support that.
Up until 10ish years ago, the most popular SQL DB didn't support transactions in the default setup. So lots of apps got written without relying on having transactions, either for compatibility reasons or because the devs grew up on said nontransactional DB. So autocommit was an easy way to emulate transactionless SQL on the rest of the DBs.
In practice autocommit works very well. In many workloads most transactions consist of just one query. Additional if you have autocommit it is harder to accidentally create a long running transaction which is a good thing since long running transactions can really tank your performance.
Delighted to see this. Installation of psycopg2 can be a nuisance thanks to the native compilation steps which don't always work as planned and rely on specific versions of libssl.
To all those wondering _why_ that is the recommendation, I did some digging and the answer is "the binary package is compiled against one version of libssl, and if your system happens to use a different one, in some weird edge cases it will lead to a segfault"
Yes, sorry about that and of course the edit window has expired :-( Now I'm even more glad I looked up the URL or folks might have been even more confused
Question, what advantages does a pure python driver give you over the competition? Is it multi threading support? (psycopg2 does not support cursors over threads)
Thanks for the pointer, I wasn't aware of this! I was wondering what the problem is, because I use psycopg2-binary in production without problems for ages. It seems the danger of psycopg2-binary is that one can end up with segfaults, due to the libpq library (and hence psycopg2) might use a different libssl than the system one (use by Python, with e.g. urllib and requests): https://www.psycopg.org/articles/2018/02/08/psycopg-274-rele...
Agreed.
It also unlocks another level of optimisation
- the human one -
I'm much better at reasoning my way through code improvements when I don't have to manage dealing with switching language/runtime models in the middle of the code :-)
That’s surprising if your app spent most of the time in database calls, since 1) native code is still faster than pypy and 2) most of the time spent in database libraries is spent waiting for the database to return.
The programs were ETLs that are generating big REPLACE INTO statements.
Different databases and drivers do prepared statements in different ways. Postgres, for example, can receive the statement and the parameters and do the interpolation in the server. The params still need to be serial used by the client and the template isn’t reused, so it’s no big perf win.
MySQL, on the other hand, needs the driver to do the interpolation. The server only knows how to execute sql strings.
It turns out it is far better to prepare the sql in pure python where pypy knows all about types etc than to have the c MySQL driver do it.
Exactly the same thing plays out in java. The MySQL java drivers are staggeringly inefficient at large prepared statements, and in my profiling massive amounts of time is spent in java functions in the MySQL driver that are trying to work out where the next question-mark is. Sadly the drivers are the poo they are. Iirc the mariadb java drivers are rather more optimized.
> That’s surprising if your app spent most of the time in database calls, since 1) native code is still faster than pypy and 2) most of the time spent in database libraries is spent waiting for the database to return
I would argue a speedup (though maybe not a “massive” one) from pure python over a native extension is not that surprising, specifically because of (2), which makes it possible for the runtime-extension interface overhead to be more significant than (1). Native extensions tend to give a speed boost for computationally intensive tasks, but can’t really wait any faster, and aren’t free of overhead.
Interesting. What sort of optimizations does PyPy do? Could it be that pure python unlocked more options in that department due to not needing to preserve the native bindings?
* All statements start a transaction automatically if one isn't already open.
* You need to call .commit() or .rollback() on the connection object to do the commit or rollback.
* The driver can optionally support an "autocommit mode" where, instead of the above rules, every individual statement is wrapped in its own transaction and committed immediately.
That's the theory. Many database drivers, including the built-in SQLite driver, differ in practice (for example, it doesn't automatically start a transaction for a SELECT statement).
The biggest common practical difference is an implementation detail of so-called "autocommit mode". Often that is NOT implemented by actually issuing a "BEGIN TRANSACTION" before and "COMMIT" after every statement. Instead, it often just issues the statement without any automatic transaction statements whatsoever, relying on the fact that the underlying database engine will immediately commit any statements outside an explicit transaction (i.e. has its own autocommit mode).
That means if you do want complete manual control of transactions in a Python DB-API driver (including sqlite), then - confusingly - you need to set the autocommit option to true!
It's not clear to me whether/how this applies to pg8000 driver in the OP. There is a section on autocommit but it doesn't explicitly say what it does.