Hacker Newsnew | comments | show | ask | jobs | submit login

I can't second that recommendation enough. Any db migration project that uses sequence numbers to label/sort migrations is badly broken for any team larger than 1.

I've used enough of these migration systems now that I'm convinced any system using sequence numbers likely has other, less noticeable problems since the problem space is complex and this is a known best practice of which the developers somehow remained ignorant.




I can't disagree enough. Time is too brittle of a system to rely on across various machines, especially since you can't be certain all machines can correctly report back the same UTC time. I've seen too many devs using VMs with way out of whack timestamps (VM's clock pauses when it's suspended, and they either forget, or didn't set up a hook to ntp). Also, when is the timestamp set? When the migration code is first written? That could lead to problems when the migration lives in a feature branch I started a month ago. Now that it's time to merge into master, I need to pull in upstream migrations and reorder my own migration to follow those. Using a timestamp here doesn't give any benefit over sequential ordering.

Second, if you have multiple migrations created on the same table by two devs that weren't aware of the others actions, this is exactly when a merge conflict should arise. This is probably less of a deal when both are schema migrations, but if both devs included a data migration, the result could be catastrophic if both migrations touch the same data. Human intervention is necessary at this point to make sure the order the system decided for the migrations is fine.

Third, if you've got a big team and either you botched your planning so much that you need a ton of migrations, or your policy freely allows everyone to create migrations without communicating with each other, you've got some major problems with your team.

-----


The timestamp itself being accurate or not really does not matter at all, at least in the majority of use cases.

The idea is just to avoid a filename conflict which, even though the fix may be easy enough, can cause confusion for both the humans (developers) and for the schema management system itself (they usually just keep track of which versions have / have not been run).

Even just incrementing new migrations by random(0,1000) would, at least in theory, resolve the issue for the most part.

It just seems a lot easier to go ahead & use timestamps though....

-----


Agree on all points. Timestamp is really no better than sequence number, and can cause bigger WTF moments with its automated "conflict resolution".

So, forget about timestamp and sequence number. GUID is the way to go:

http://alembic.readthedocs.org/en/latest/tutorial.html#worki...

-----


Maybe you should have a look to liquibase (http://www.liquibase.org/) then. Although it uses sequence to label the migration changeset, it does not rely on them to ensure that the DB is in a correct state. It relies on a hash of the various changesets so that if a changeset is updated while already applied, the system will complain about it.

-----


That's a very interesting solution, I like the sound of it. I will check it out.

-----


Just make real sure you don't change the formatting of a hard-coded SQL command or you'll have to add a 'validCheckSum' attribute to cover up for your past mistakes.

Otherwise I can also recommend liquibase, especially since they've since added support for non-SQL (JVM code) updates and rollbacks as well.

-----


My DB migration system Phinx (http://phinx.org) uses timestamps. I had too many problems with using sequence numbers in the past.

-----




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: