Hacker News new | past | comments | ask | show | jobs | submit login

> if your application is architected such that it leaves transactions, implicit or otherwise, open for extended periods.

Can you give any good reason why you need to leave transactions open for extended periods? In my experience, it only happens when the developer in question does not understand the semantics of the DBMS and therefore hasn't suitably designed their solution, in which case it's hardly the DBMS's fault when things go wrong.

I've rewritten plenty of code that "needed" a transaction to be open for a long time into code requiring a transaction for a few milliseconds (e.g. in a bulk import, parsing, caching lookups and reference data outside the transaction, building an XML document to supply into a SP as a parameter which only used a transaction for a single INSERT from the XML).




Can you give any good reason why you need to leave transactions open for extended periods?

Offhand, other than xid-level consistency for a backup as mentioned by the sibling post, no. [1] No, this case was pretty clearly the engineers not understanding what they were doing, as evidenced by it being easier to turn off autocommit, and add explicit BEGIN statements where they were needed, than add explicit ROLLBACKs everywhere they were needed.

That makes an implicitly transactional adapter seem a bit of a foot-gun to me, though. At a minimum, I think it places an unnecessary burden on engineers to have to "roll back" every time they even ask the db something. That's how memory leaks happen, too, and that's why we generally think garbage-collected languages are a Good Idea.

[1] That said, my preferred backup strategy is to take a filesystem-level snapshot of the db volumes, mount that, and start a second Postgres instance against it. It'll "recover" from an "unclean shutdown", and then you can take your backup without incurring the penalty of holding open a transaction that long in the presence of concurrent writes.

You do pay some write-performance penalty for the snapshot while it's open, but upon releasing it, the disk pages are immediately gc-ed, and you're already taking your backup at a low-traffic point anyway, right? Disk pages in a Postgres cluster allocated because vacuuming was hindered by holding an xid for extended periods are merely autovacuumed, however; they're reclaimed for later use, not released.


I'm not familiar with PG but I am planning on using it in my current project, which is why I'm interested in understanding whether these are really DBMS issues.

In the .NET world, for all the database adapters I've used (MSSQL, SQLCE, Oracle, MDB), individual statements are implicitly wrapped in transactions when no explicit transaction has been specified - not sessions/connections.

I agree the "implicit transaction per session/connection" design is confusing and likely to result in bugs. While the standard .NET approach limits bugs, the downside is poorer performance when an unaware developer uses numerous implicit transactions for multiple statements where a single explicit transaction would suffice.

Experts in the area appear to recommend always using explicitly managing transactions (e.g. http://ayende.com/blog/3775/nh-prof-alerts-use-of-implicit-t... ), and this is the approach I follow in my own work now.


Can you point to any open source projects or other examples implementing this pattern? I think I understand where you're coming from on this but I have not seen many examples of this approach in the wild (except religiously using TransactionScope, eg. http://amnesia.codeplex.com/ ).

I'm wondering where it's best to manage the transactions... would they ever be in raw SQL or stored procedures instead of .NET code?


The PostgreSQL official C client library, libpq, (which most language specific wrappers are based on) works the same way as you say the .NET adapters do. Every statement is by default implicitly wrapped in a transaction until you run "BEGIN;".


>In the .NET world, for all the database adapters I've used (MSSQL, SQLCE, Oracle, MDB), individual statements are implicitly wrapped in transactions when no explicit transaction has been specified - not sessions/connections.

That is how the postgresql API works as well. Most high level language modules built on top of it also work that way. Unfortunately python's DB API is not one of them, but you can just set a config option to "act like everyone expects".


The only reason I can see is for taking consistent database backups. But then your transaction should also use a readonly snapshot (pg_dump will do this for you).




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: