Hacker News new | past | comments | ask | show | jobs | submit login

A couple of real world data points.

Two Fortune 500 companies I worked at had projects in Django that used native SQL to do migrations. This led to errors in prod during deployment.

The order of the scripts had to run in a proscribed order. They had to be listed on the deployment instructions. So the migrations would run, and then fail during a prod deployment for various reasons. Most notably a dependent script hadn't run in prod yet. They had run in the Dev/QA environments and everything had run just fine. Or Dev/QA had been hand tweaked and that change hadn't made it to the script during prod deployment because they were testing it for a month to see performance before moving to prod, etc... etc...

It was all ugly.

In one of the two companies they finally decided to stop handwriting SQL and started trusting the Django migration system, and the problem with deployments went away.

AND they still were able to hand write sql to optimize the performance of the DB if needed.




I truly believe the Django migrations system eliminates the need for several full-time DBA positions at larger companies that use it. Things that would take a dedicated team of DBAs days to do are automated so reliably that you don't even realize the amount of engineering effort it would take to do manually.


What things does it do besides update the schema, or run a custom python script?


A few other things which are easy to miss manually: identify when you have model changes which are not reflected in the migrations, ensure that the migrations always run, and fail loudly if someone tries to run migrations which started at the same point but haven’t been explicitly merged (which is really handy switching branches or on fast-moving projects with multiple features in-flight simultaneously).

None of this is magic which you can’t do by hand but there’s a substantial benefit in being able to rely on it always happening and not having to spend any time thinking about it in most cases.


"Data QA" isn't a term I hear much of, but it's a super important piece of migrations. A migration that runs successfully is the equivalent of a program compiling successfully.


It does exactly that, in a reliable and repeatable manner. Especially if you are deploying to multiple targets (dev, staging, production) you don't care about which state each server is in.


>Two Fortune 500 companies I worked at had projects in Django that used native SQL to do migrations. This led to errors in prod during deployment.

> The order of the scripts had to run in a proscribed order. They had to be listed on the deployment instructions. So the migrations would run, and then fail during a prod deployment for various reasons. Most notably a dependent script hadn't run in prod yet. They had run in the Dev/QA environments and everything had run just fine. Or Dev/QA had been hand tweaked and that change hadn't made it to the script during prod deployment because they were testing it for a month to see performance before moving to prod, etc... etc...

How do companies get this right? My organization has all these problems as well and I lose 10-15% of my time chasing misplaced SQL scripts or SQL scripts with improper tweaks.


Check sql migrations into version control, all changes get checked in.

Run them in a predefined order with a migration tool (something as simple as file name order is fine).

Save metadata in the db to ensure migrations are not run twice and keep a record (again migration tools can do this).

Test migrations multiple times prior to deploy using a sanitized copy of production.

Any of these steps would help, all of them would eliminate your problems.


Django does this.


Django does some of these things (most migration tools do), but my understanding is it guides you in other directions (automated code-based migrations based on models as a source of truth), so if you want simple sql migrations you don't need (or even perhaps want) this part of django.

That's what the article explores.


Django migrations solve this by asking for a dependency graph, and verifying that there's only one leaf node, so to speak.

So if you merge in separate scripts you will need to linearize by hand.

On top of this, there's a table within the database tracking which migrations are run, so you don't accidentally rerun these for example.


What will work in a lot of circumstances is to put your custom scripts into empty Django migrations. Then they'll get run as needed. If you're feeling fancy you can write the backwards script as well.


There are tools that help this: Visual Studio Data Tools, RoundhousE, Flyway/Redgate - I highly recommend investing time in building them into your workflows.


Alembic is pretty great for this as well


They aren't going to give us time to implement those, but thanks. Shall implement them in my own personal projects.


I use a simple framework-agnostic tool for tracking and applying migrations, it also supports dependencies between migrations. Migrations are automatically applied on deploy to stage/production, so you don’t forget to do it.

I prefer this approach very much and even wrote about it here https://vsevolod.net/migrations/


> Two Fortune 500 companies I worked at had projects in Django that used native SQL to do migrations. This led to errors in prod during deployment.

> The order of the scripts had to run in a proscribed order. They had to be listed on the deployment instructions.

You still need to have a proper database migration system, even if migrations are defined using hand-written SQL. I can see why someone would prefer migrations written in SQL and there are plenty of migration system that support that workflow.


I've experienced the exact opposite - no end of problems with Rails/Django/Alembic migrations. Abandoning version numbers and chains of migration files for a diff-based approach that works directly on the database level made things so much simpler and more reliable.

Comparing your goal state explicitly with the current production state means you can typically autogenerate the changes you need, and once applied you can compare again to make sure everything matches perfectly.

Everything becomes reduced to:

production state + pending changes = goal state

Where "goal state" might be the state of your ORM models.

I did a talk at PostgresOpen a while back about the problems with Django-style migrations and how to do better: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...


I've done automated diff-based migrations before using SQLDelta. It caused me a lot of trouble and I think it's a bad approach.

These diff tools just don't have the information they need to solve the data problems that are the actual hard part of schema migrations.

Simple example: I want to concatenate these two text columns into one.

I do think SQLDelta and similar tools can still be useful after doing the real migration using incremental scripts. You can use them to check for problems, and to fix up easy things like views and stored procedures.


Of course they don't automatically solve data migrations for you, I'm not sure how any tool could possibly guess your intent like that.

If you have an example of how a chain-based migration tool makes this easier, I'd love to hear about it.


I've never worked on anything that separated data migrations from schema migrations, so I suspect I'm not quite understanding you, but...

Your data migration is just part of your chain of incremental patches?

For my example you just write:

    ALTER TABLE my_table ADD COLUMN my_column TEXT NOT NULL DEFAULT '';
    UPDATE TABLE my_table SET my_column = my_old_column_a || my_old_column_b;
    ALTER TABLE my_table ALTER COLUMN MY_COLUMN DROP DEFAULT;


Beyond a fairly small scale you have to separate data migrations from schema migrations because they just take too damn long.


I believe it and it makes sense, but I've never encountered it myself.

Sounds like your definition of "small scale" encompasses every project I've ever worked on. :D


Right, that's exactly what you might add to a diff-generated script to do the same thing.

So it doesn't inherently seem better or worse to me.


The difference is that - if I understood right - you're generating your diff just before you actually do the migration.

By comparison, the incremental migration scripts are written at the same time as the feature (I like to do them as step 1 at lot of the time) and by the same developer, who currently has the problem in their head.


I don't know of anybody doing that - unless you had automated checks that it was a fast and non-destructive operation, that would be highly risky.

I'm simply talking about using the current production database as a basis for generating the required changes, rather than "here's what I think my production database currently looks like based on a version number and these 75 chained migration scripts I've run on it in the past"


I'm still confused. When do you generate your diff then?

I think migrations ought to be done by the developer at the same time that they write their other code changes, because they have all the context at that time.

However, it doesn't seem like that could work for diffs, because the version of the production database you're diffing against may not be the same as the version you end up deploying to.


Generate it whenever you prefer, probably with the rest of the code changes as you say.

Immediately before applying, check that the database is still in the same state as it was when you generated the diff script. If yes, apply. If no, abort.


I like the idea but haven't been able to make it work in practice. Diffing produces migrations that just can't be run on big production DBs and it just doesn't work properly if you're using schema based partitioning or have partitioning which depends on the inserted data.


These problems are definitely solvable with proper tooling. The specific challenges vary between RDBMS though. For example when running schema changes on large-scale MySQL/MariaDB, the key things are supporting external OSC tools, and supporting sharding. I wrote a diff-based schema management system, https://skeema.io, designed with precisely these concerns in mind.


Which non-diff based migration tools automatically generates migrations immediately suitable for big production databases? I don't think any tool is capable of that, certainly none I've heard of.


> production state + pending changes = goal state

That's exactly what Django is doing, btw.

> I did a talk at PostgresOpen a while back about the problems with Django-style migrations

Good talk generally, and a lot of stuff I agree with.

If the migrations sit out for a year in the code base, it's not a big deal though. It looks like a big deal, but in practice, it's not. Migrations can be reset back to zero if you want in Django.

But when you're trying to develop in parallel developer environments, the Django migration DAG alerts you when you have two leafs that haven't been merged.

And if you ever had to roll back a failed prod deploy, Django supports a backwards migration too.

I get what your saying, but in practice the benefit of using Django where the "code is the configuration" is better than trying to keep code and configuration separate and trying to match them up later.


I don't have experience with Django model migrations so can't comment on that - but running script based migrations without a CI like that in any nontrivial environment sounds like asking for trouble, but how did you not catch the issue when a dev pulls from master and migrations aren't working - everyone just does local patching and ignores that the master is broken ?

I've seen this on a .NET project, we didn't have a CI and this one guy had a tendency to commit dirty migration scripts breaking people's databases - I've seen it slip a few times to the QA deploy process before it gets caught because everyone assumed he would fix his shit and was too busy cleaning up his crap so we would just comment it out locally to get it to run.


Django migration scripts are run by devs after pulling from Master as well as deploying to prod. That gives everyone the same confidence that the migrations are working. If the migrations are borked, they get fixed or the prod deployment will break. Ideally this should be part of the CI/CD process for the PR.

Also it's worth noting that you can run all migrations from the beginning of time to the latest version on a development DB. And I believe the default Django unit testing framework does this by default.


The problem sounds like merging things that are not green on CI, not related to migrations with python Vs sql).

Think on it, you can just use Django on python, but you can use sql everywhere there is a dB.


> Think on it

We did. And with smart people who knew SQL inside and out. It just made sense to not hand roll SQL anymore, because Django caught most of the errors we had.


Learn sql, it's like javascript, it's everywhere.. and will open you expertise from a we framework to a lot of other places..


You make it sound I don't know SQL. That's kind of insulting, you know?


I think the posters are talking about cooking, make sweeter, no make it more like bread.. etc.. The design objective and architectual considerations could make either of these two "right" .. however, some weight given to a repeated mention of "average web developer" .. if that is the ONLY topic, then some clarification could be skipped...

(yes, learn SQL)


Mmm, sorry, if this is how you see it, what I'm trying to express it's that the dB by itself, it's the source of truth (and the state) of your application, Django it's just an implementation detail that you can swap whenever you need.


Think about a cognizant response rather than assuming ignorance, which seems to be a common theme with HN posters.


Sometimes yes.

And sometimes you have many other things to do and you want your database to just work. In this case it is perfectly fine to rely on ORM.

It is like C vs Python: yes, I know C pretty well. Yes, it is significantly faster. I am still going to write the project in Python.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: