Hacker News new | past | comments | ask | show | jobs | submit login
Migrating an SQLite based django app to PostgreSQL (stevelord.tumblr.com)
27 points by iuguy on Mar 26, 2009 | hide | past | web | favorite | 13 comments

The biggest problem I have with developing on SQLite is the limited capacity to perform DB schema migrations. I use South.

SQLite doesn't provide much by way of schema changes, at least not in a manner that the Django ORM can make use of. I just run Postgres locally now, because I can be confident that any migrations I test locally will work on the production system.

It would be very nice to have a "postgreSQLite" with all functionalities of postgreSQL but could be used as sqlite when only one application use the database.

I'm not sure if this is one of the things preventing such, but it's an issue I had with our DB:

When an embedded database is interacting with something like, say, Apache, which uses fork to create multiple processes, your locks have to be multi-process-safe as opposed to multi-thread-safe. For something that's designed with single-process, but multi-threaded access in mind, that can be a bit of a chore.

Is there any difference in the logic? Locking access to a variable will work the same way, even if it's multiple processes trying to access it vs multiple threads. Is the difference just in the mechanism of the lock, or is there some type of difference in behaviour in accessing shared items between threads and processes?

In a database you have to lock access at the file level, naturally. To do multi-process locks you need to store your locks either in shared memory on in an mmap block so that they're shared between processes and then you have to deal with the case of processes crashing or being killed without releasing their locks, which can then trigger a deadlock because the lock is held by a process that doesn't exist anymore.

To resolve that you can pop up to the process tree, assuming you saved the PID along with the lock and see if that process still exists (and hope its PID hasn't been reused) and if not iterate over all locks held by that process and free them to break the deadlock, but that sort of overhead at something as fundamental as locking code can be annoying.

Also note that your semaphores then have to usually be implemented in assembly or you have to have a library that implements atomic increments of reference counts. Qt's QAtomicInt will do the trick if abused a little (just because it has an assembly implementations for multiple architectures). pthread_* doesn't have support for this as far as I know.

So, yeah, it's doable and some DBs do handle multi-process concurrent access, but it can be a pain in the ass. I believe SQLite does it by just locking complete tables on write access rather than using more fine-grained locks. MySQL embedded may also support multi-process access -- not sure.

Of the embedded databases I'm familiar with, Berkeley DB with its transaction layer does support multi-process access. It requires some careful coding but does work and detects deadlocks of the type you describe (http://www.oracle.com/technology/documentation/berkeley-db/d...). No idea if this works with language bindings other than the officially-supported C, C++, and Java.

IIRC your last point is the reason that django apps get locked when using sqlite. Depending on what you're doing you may inadvertently update things like object histories etc. whilst your app is trying to update something else.

I see the problem with the process crashing. Why do you need to implement the semaphores in assembly? Threads can pre-empt the same way as processes, no? If it works there, why would it not work across processes?

Because there are no POSIX APIs for bare atomic reference counting. Presumably pthreads implementations do implement atomic reference counting in assembly, it's just not exposed through the API. For more on Linux see:

  man futex

sorry for the stupid question. Why people use PostgreSQL instead of MySQL? Pros and Cons?

ACID: Atomicity, Consistency, Isolation, Durability

See ma.gnol.ia for an example of what can happen with a DBMS that doesn't provide this. That's a big reason why I use Postgres instead of MySQL. With Postgres, the database won't get corrupted by a partially complete operation.

On the other hand, MySQL gets better performance by skipping the whole ACID part... so if you are willing to work with this (i.e. you have a good disaster recovery strategy) then you can theoretically handle a higher load by using MySQL.

ACID isn't an excuse to skip redunancy and backups.

Good point, and I didn't mean to suggest that the one precludes the other. However, the ma.gnol.ia story is a tale of redundancy - they were backing up corrupted data for weeks, and once they realized this, it was too late. The database took weeks to fail, and their backups were worthless.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact