

Database Versioning - ph0rque
http://adam.blog.heroku.com/past/2009/3/2/database_versioning/

======
psadauskas
I'm the original author and current maintainer of DataMapper's migrations. I
had a reply written up, but it ended up being really long, so I wrote a reply
blog post instead. I'm not trying to drive traffic to my blog, because I don't
care, but here's a link: <http://www.theamazingrando.com/blog/?p=86>

Essentially, DataMapper already provides the solution that Adam outlines in
his post: replace schema.rb with DataMapper model definitions, and have the
discipline not to write data migrations. Write specs for your migrations, like
everything else, and use DM migrations' sane versioning rather than AR's
irritating one, and you should be fine. There are definitely improvements to
be made to DM migrations, to be sure, but I feel like I got the underlying
design mostly right.
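
For anyone who hasn't seen it, the DSL looks roughly like this (a from-memory
sketch; the adapter, table, and columns are just examples):

    require 'dm-core'
    require 'dm-migrations'
    require 'dm-migrations/migration_runner'

    DataMapper.setup(:default, 'sqlite::memory:')

    # Migrations carry explicit numbers, so ordering is never ambiguous.
    migration 1, :create_people_table do
      up do
        create_table :people do
          column :id,   Integer, :serial => true
          column :name, String,  :size => 50
        end
      end

      down do
        drop_table :people
      end
    end

    migrate_up!   # runs only the migrations the database hasn't seen yet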

~~~
jpcx01
That's what I was thinking while I was reading the original post. DataMapper
solves the problem nicely, provided you think the schema should be driven by
the generated model properties instead of magically defined in the schema and
automatically mapped.

DataMapper migrations are perfect since they focus on one-time runnable data
migrations, or a bulk schema change per release, instead of a huge chain of
tiny schema tweaks that can go back and forth.
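
To illustrate, a minimal sketch of schema-from-model-properties in DataMapper
(the model, properties, and connection string are made-up examples):

    require 'dm-core'
    require 'dm-migrations'

    DataMapper.setup(:default, 'sqlite::memory:')

    # The schema lives in the model definition, not in a separate schema file.
    class Post
      include DataMapper::Resource

      property :id,    Serial
      property :title, String, :length => 100
      property :body,  Text
    end

    DataMapper.finalize   # required on DataMapper 1.x

    # auto_migrate! drops and recreates tables to match the models;
    # auto_upgrade! issues only additive changes (e.g. new columns).
    DataMapper.auto_upgrade!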

------
jasonkester
Database migration has been solved for a long time. This author is having
trouble getting his head around it because he's starting from flawed
assumptions, and is limited by his choice of tools.

Here's how you do it:

    
    
    - every database has a build number stamped into it
    - every schema change goes into a change script called build_105-106.sql
    - changes are applied to the dev database as they are made
    - the automated build pins the change script and runs it against an integration database
    - a schema diff tool is used to compare dev and integration schemas; if they don't match, the build breaks
    - the build creates a file called build_106-107.sql and sticks it into source control

There's simply no way you can get your schemas out of sync this way. If you
screw up and commit a change straight to dev without scripting out the change,
it will break the next build and it will hurt to fix it. You'll learn quickly
not to try to shortcut the process. (It's really not that hard a process
anyway.)
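
Here's a rough sketch of the script-runner end of that process, assuming a
one-row schema_info table that holds the build number; the mysql2 gem, the
file layout, and all the names here are illustrative, not prescriptive:

    require 'mysql2'   # any client library works; mysql2 is just the example

    db = Mysql2::Client.new(:host => 'localhost', :database => 'app_integration')

    # Read the build number stamped into the database.
    current = db.query('SELECT build FROM schema_info').first['build']

    # Apply change scripts in order until we run out of them.
    loop do
      script = "build_#{current}-#{current + 1}.sql"
      break unless File.exist?(script)

      # Naive statement splitting; fine for simple DDL scripts.
      File.read(script).split(/;\s*$/).each do |stmt|
        db.query(stmt) unless stmt.strip.empty?
      end

      current += 1
      db.query("UPDATE schema_info SET build = #{current}")
    end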

But the big thing the author is getting wrong is that he's trying to define
schema from his application. His ORM seems to be generating both code and
schema from config files, which seems a bit silly from an outsider's
viewpoint. It's causing him a lot of pain and he doesn't understand why.

Define the schema in the schema. Generate your code from that. Problem solved.

~~~
newt0311
That's for people using SQL. ORMs screw the entire process up; another reason
why ORMs are a bad idea.

ORMs are basically trying to recreate SQL in their own special non-portable
immature bug-ridden manner. Thus, even the schema has to be defined in the ORM
and that is where the problem comes in.

~~~
jasonkester
Yeah, I never understood why most ORMs do things that way. Why the extra
config file to define everything? You can infer everything you need to build
an object model directly from the database schema. It's just as easy to read
programmatically as XML, if not easier. And it's every bit as descriptive.
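
For example, everything a generator needs is already sitting in
information_schema. A rough sketch (credentials and database name are
placeholders):

    require 'mysql2'

    db = Mysql2::Client.new(:host => 'localhost', :database => 'app_dev')

    # Pull column metadata straight from the database; no XML config needed.
    columns = db.query(<<-SQL)
      SELECT table_name AS tbl, column_name AS col, data_type AS typ
      FROM information_schema.columns
      WHERE table_schema = 'app_dev'
      ORDER BY table_name, ordinal_position
    SQL

    # Emit a crude accessor-per-column class for each table.
    columns.group_by { |c| c['tbl'] }.each do |table, cols|
      puts "class #{table.capitalize}"
      cols.each { |c| puts "  attr_accessor :#{c['col']}  # #{c['typ']}" }
      puts "end"
    end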

~~~
akeefer
If you're using a config file to define things, it's generally the case that
it's defining the schema as well, and the database is built programmatically on
top of it. So it's really just a matter of taste: some frameworks have you
manipulate the DB schema directly and then build the ORM representation off of
the DB, whereas others have you directly define the ORM representation and
then build the database off of it. Either way there's no real duplication, and
there's still just one place to modify if you need to change the schema/ORM
mapping.

In our case we have a config file because A) it makes it easier to abstract
the schema across different databases and B) we have extra metadata in there
above and beyond just the schema information.
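
As a toy illustration of that direction, with the extra metadata riding along
(the structure below is invented for the example, not our actual format):

    # Schema metadata plus extra, non-schema metadata (:label).
    TABLES = {
      'users' => [
        { :name => 'id',    :type => :integer, :pk => true,  :label => 'User ID' },
        { :name => 'email', :type => :string,  :pk => false, :label => 'Email'   },
      ]
    }

    # Per-database type mappings keep the metadata portable.
    TYPE_MAP = {
      :mysql    => { :integer => 'INT',     :string => 'VARCHAR(255)' },
      :postgres => { :integer => 'INTEGER', :string => 'TEXT'         },
    }

    def ddl_for(adapter)
      TABLES.map do |table, cols|
        defs = cols.map do |c|
          "#{c[:name]} #{TYPE_MAP[adapter][c[:type]]}#{' PRIMARY KEY' if c[:pk]}"
        end
        "CREATE TABLE #{table} (#{defs.join(', ')});"
      end
    end

    puts ddl_for(:postgres)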

~~~
jasonkester
I guess that's the luxury of being a shop that builds software vs. a shop
that builds open source ORMs. We can target a single DB and technology, and
not need to cater to thousands of angry devs with different naming conventions
and religious views as to the virtues of Postgres vs MySQL.

Wrapping a relational database with an object layer is really a simple,
straightforward problem. It's surprising that so many smart people feel
compelled to spend so much time and effort engineering overly complex
solutions to it.

~~~
akeefer
Well, I work at a shop that builds software, but it's installed enterprise
software, which means we don't fully control the deployment platform and have
to support multiple databases. Control of the deployment model is definitely a
huge, huge plus of a SaaS model, but the economics don't really work out for
that in our industry. We also happen to write our own (currently closed-
source) ORM framework to build on top of. One of the nice things about
building our own framework is that we only have to deal with the complexities
that we care about (i.e. our sort of performance requirements, our sort of
object graphs, the databases our customers need).

The problem with pretty much any kind of framework, ORM or not, is that
eventually it becomes bloated by being flexible enough to meet the
requirements of thousands of different developers, meaning that any given one
of them only ever uses a small percentage of the overall functional footprint.
Keeping things simple yet powerful is definitely the hardest part about it
all, especially given that different users have vastly different opinions
about what's simple or what power they need.

~~~
jasonkester
Indeed. Building frameworks for other people just doesn't sound like any fun
at all. We also rolled our own thing in house, and every once in a while we
start thinking thoughts about polishing it up and releasing it to the world.

But the amount of polish that you need to turn an in-house thing into an open
source project is just immense. And worse, by the time you have it flexible
enough to handle every possible use case, you've bloated it out to the point
where it's no fun to use anymore. Pick pretty much any off the shelf ORM to
see the end result of that path.

------
akeefer
What he's imagining is somewhat close to what we do with our currently-
internal (hopefully open source some day) ORM framework (which is in Java).
Our schema is defined as metadata in XML, and we build a checksum off the XML
to compare against what's in the database in order to know when to upgrade. To
ensure the stability of the checksum, it's built off of a filtered, sorted
version of the XML DOM tree, rather than the actual files. Not all changes to
the files will affect the checksum, since certain bits of the metadata don't
affect the data or the schema.
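
The stable-checksum idea might look something like this sketch (REXML and
SHA-1 here for brevity; the ignored-attribute list is invented, and our real
implementation differs in the details):

    require 'rexml/document'
    require 'digest/sha1'

    # Attributes that are metadata-only and shouldn't affect the checksum.
    IGNORED_ATTRS = %w(description display-name)

    # Serialize an element canonically: filtered attributes and child
    # elements, both sorted, so file ordering and cosmetic edits don't
    # change the result.
    def canonical(el)
      attrs = []
      el.attributes.each do |name, value|
        attrs << "#{name}=#{value}" unless IGNORED_ATTRS.include?(name)
      end
      kids = el.elements.to_a.map { |child| canonical(child) }
      "#{el.name}(#{attrs.sort.join(' ')})[#{kids.sort.join(',')}]"
    end

    def schema_checksum(xml)
      Digest::SHA1.hexdigest(canonical(REXML::Document.new(xml).root))
    end

    # Reordering tables or editing an ignored attribute leaves it alone.
    a = '<schema><table name="users"/><table name="posts" description="old"/></schema>'
    b = '<schema><table name="posts" description="new"/><table name="users"/></schema>'
    schema_checksum(a) == schema_checksum(b)   # => true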

We do still use explicit version numbering for version triggers (our name for
migrations), since it's just too hard if the numbers aren't meaningful. If you
just used a checksum, how would you know if A248B5FC comes before or after
3F56EB2? It's a little easier if you can say "the DB is at version 23, and the
latest is version 26, so we need to run these three triggers." The database
stores both the metadata checksum and a version number; if the checksum
changes without the version number changing, we at least know that's an error
and can detect it. Certain types of changes (like adding a nullable column) we
handle automatically by diffing the current schema against the actual DB
schema, while anything more complicated (and anything that touches the data)
requires an explicit trigger.
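
In sketch form, the version/checksum handshake described above (the db object
and its helpers here are stand-ins, not our real API):

    # Stand-in for the real database handle.
    class FakeDB
      attr_accessor :stored_version, :stored_checksum
      def run(sql); puts sql; end
      def record(v, c); @stored_version, @stored_checksum = v, c; end
    end

    # Version triggers keyed by the version they upgrade *to*.
    TRIGGERS = {
      24 => lambda { |db| db.run("ALTER TABLE orders ADD COLUMN status VARCHAR(20)") },
      25 => lambda { |db| db.run("UPDATE orders SET status = 'open'") },
      26 => lambda { |db| db.run("CREATE INDEX idx_orders_status ON orders (status)") },
    }

    def upgrade!(db, target_version, target_checksum)
      # A changed checksum with an unchanged version number means someone
      # edited the metadata without bumping the version: refuse to guess.
      if db.stored_version == target_version && db.stored_checksum != target_checksum
        raise "metadata changed but version number didn't; aborting"
      end
      ((db.stored_version + 1)..target_version).each { |v| TRIGGERS[v].call(db) }
      db.record(target_version, target_checksum)
    end

    db = FakeDB.new
    db.stored_version, db.stored_checksum = 23, 'A248B5FC'
    upgrade!(db, 26, '3F56EB2')   # runs triggers 24, 25, and 26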

Since we build deployed enterprise software that gets upgraded on the scale of
years, not days or weeks, we also follow the course of using triggers only on
production databases. It's still not really a perfect solution for all sorts
of reasons I could go into, but it's interesting that we ended up pretty close
to what he's proposing.

------
bodhi
When I was working with migrations and ActiveRecord a couple of years ago, we
just treated the production database as authoritative. We basically used
migrations as disposable chunks of code that would transform the database from
state A to state B. If a migration from 3 months ago didn't work anymore, it
really didn't matter, as the production database was well beyond that point
anyway. When we needed to test the migrations, we'd load the most recent
backup of the production database into our dev dbs and try it out.
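
For context, that's the classic up/down ActiveRecord migration shape of the
era; the table and columns below are examples, not from our app:

    class AddStatusToOrders < ActiveRecord::Migration
      def self.up
        add_column :orders, :status, :string, :default => 'open'

        # One-off data transform: fine to run once, disposable afterwards.
        Order.reset_column_information
        Order.update_all("status = 'shipped'", "shipped_at IS NOT NULL")
      end

      def self.down
        remove_column :orders, :status
      end
    end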

But we did have the luxury of running a (fairly small-scale) website rather
than distributing an application.

