Hacker News
Surviving Django, if you care about databases (varrazzo.com)
172 points by pauloxnet on Aug 6, 2020 | 217 comments

I can't recommend following almost any of this as "general" advice. It is, at best, advice from a complete expert for complete experts who are already pushing the edge of what Django should be used for, and who never expect to have to hand off their project to an average web developer. I've had to come to terms with the fact that the average web developer is going to screw up e.g. writing their schema by hand in SQL. And I know I'll likely be the person inheriting this two generations of developers after the original wizard left.

For an average web developer, someone just getting into django, or someone who isn't yet running into the limitations that the author apparently is, this is a recipe to just make a huge mess.

It's also very specific advice. Recommending ignoring cross-database concerns.. I'm sure that's fine if you're an end-user developer building and running your own product, but if you're considering writing a library or reusable app you'd like to be useful to other django users, it's probably worth a second thought.

It's ironic though that the author makes such a prominent mention of "You Ain't Gonna Need It", because that's exactly what I'd say about most of the pimped-up tweaks mentioned here. 99% of people aren't going to run into these limitations and will produce a much more maintainable app by following the standard django doctrine.

> I've had to come to terms with the fact that the average web developer is going to screw up e.g. writing their schema by hand in sql

I've been writing Django apps since the 0.96 days (so, well before they officially implemented "migrate-able" schemas) and I'm still nervous af when having to manually edit/modify a schema file inside of ./migrations. As such, things like

> python manage.py makemigrations

that work 99% of the time have been a godsend to me.

For my company I wrote internal django apps that are completely database agnostic. They can be run in testing with sqlite, postgres, oracle, etc.

That severely limits you though.

And I'm not talking about PostGIS, hstore or other such exotic features. I'm talking about indexes, transactions, locking and other basic features that are often missing or implemented very differently across databases.

You may not need, say, GIST indices that help when sorting "tweets by recent activity" for your proof of concept or even your test env, but you certainly do for production, or else you are truly missing out. And such basic tools are available in most database engines, but differ very much in how exactly they are implemented, used and tuned.

Django supports indexes, transactions, and locking, and if I need to specify raw SQL I can do that during a migration say.

90% of what I need is already there. And if I need something special, I can add it in later.

It's not really about what Django supports, but about what all the databases you're using support. If you run it in testing with SQLite, you are not going to have the same set of features for indexing, transactions, and so on as you do in production with, say, PostgreSQL.

The parent comment mentioned GIST indexes; I'd add partial indexes and triggers as two features that can make a huge difference when used even in a very few places in an app (which is what we do in Zulip, on PostgreSQL).
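As it happens, partial indexes are one feature SQLite does share with PostgreSQL (fancier index types like GiST/GIN are not); a minimal sqlite3 sketch, with invented table and index names, of what a partial index buys — the planner only touches the subset of rows the predicate names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, topic TEXT, unread INTEGER)")
# index only the rows the hot path queries: unread messages
conn.execute("CREATE INDEX unread_by_topic ON messages (topic) WHERE unread = 1")

# the planner uses the partial index when the query implies its predicate
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM messages WHERE topic = ? AND unread = 1",
    ("db-tuning",),
).fetchall()
```

If I recall the API correctly, Django's `models.Index(fields=..., condition=Q(...))` (Django 2.2+) emits a comparable `CREATE INDEX ... WHERE` on both backends.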

Sure, but does it matter when unit testing code or developing the 95% where it doesn't?

It matters for production sure if you start getting slowdowns and you can have migrations that do that. But for cranking out APIs on DRF, say, it matters a lot less than you think it does.

For that I guess it comes down to whether the features you're relying on are purely optimizations (like partial indexes, fancy GIST index types, etc.), or go to the semantics of how the code behaves (like triggers).
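That distinction is easy to demonstrate with a trigger, using sqlite3 purely as a convenient engine (the schema is made up): remove the trigger and the application behaves differently, not just slower.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE threads (id INTEGER PRIMARY KEY, reply_count INTEGER DEFAULT 0);
CREATE TABLE replies (id INTEGER PRIMARY KEY, thread_id INTEGER);
-- behavior, not optimization: the counter is maintained by the database itself
CREATE TRIGGER bump_reply_count AFTER INSERT ON replies
BEGIN
    UPDATE threads SET reply_count = reply_count + 1 WHERE id = NEW.thread_id;
END;
""")
conn.execute("INSERT INTO threads (id) VALUES (1)")
conn.execute("INSERT INTO replies (thread_id) VALUES (1)")
conn.execute("INSERT INTO replies (thread_id) VALUES (1)")

# no application code ever touched reply_count, yet it is 2
count = conn.execute("SELECT reply_count FROM threads WHERE id = 1").fetchone()[0]
```

A test suite running on an engine without the trigger would exercise different semantics, which is exactly the "goes to how the code behaves" case.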

I think for us it's the case that all the fancy DBMS features we rely on in the core functionality of the app are pure optimizations. So if we wanted to run tests on SQLite, that'd work fine except for when testing the parts of the app that happen to rely on those fancy DBMS features that aren't only optimizations.

But I'd consider that a pretty unstable situation -- it'd mean that we had one test setup for most of the app, and then still needed another test setup (with Postgres) in order to test some parts of the app. When I've heard from people working on apps that use different DBMSes for test and production, generally they don't take this strategy and instead they just limit themselves to the lowest-common-denominator features that exist in both. You can totally do that (many people have!); but as berkes's original comment above said, if you do you're missing out on some really valuable features.

And if you ever want to use even a little bit of those fancy DBMS features, beyond pure-optimization index features, in some core part of the app -- boom, you can't test on the more limited DBMS at all.

I'm not familiar enough with Django, but I assume it's very similar to Rails (Ruby).

The migrations and ORM allow for a lot of indexing, but lack the last 20%, which, logically, is DB-specific indexing. But that's also the indexing that gets you the majority of the improvements.

Locking is another thing: the lowest common denominator is bad. But it works, for sqlite, mysql and postgres (and probably oracle, never tried). Yet it is so naive that it does more harm than good. In Rails they've worked on this and moved far beyond the lowest common denominator with Adapters and other patterns.

That adds such an amount of indirection that it does more harm than good when you're debugging that weird "why did it choose to use a GIST and not a partial BTREE here?".

(edit: kept story in line with one usecase. sorry)

TBH, I can't tell if you're complaining about Ruby or Django here. They're two very different things with different methodologies. You can't complain about how Ruby does its database handling in a discussion about Django.

Like I said elsewhere, Django offers a lot, and what it doesn't offer can always be added with raw SQL.

I was pointing out issues that stem from "supporting multiple databases", which are issues that are the same in Django, Rails (not Ruby), PDO and basically any other ORM out there.

The problem is "lowest common denominator".

Think of it in an extreme way, as a thought experiment: what if you built an ORM that supports flawless migration from PostgreSQL to Redis and back, one that abstracts away all the differences?

That ORM would be extremely limited: it could basically only do key-value storage. It would have no WHERE filters on values, for example, or if it did, the performance would be horrible, because Redis does not have this.
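A toy sketch of that thought experiment (a dict stands in for either backend; all names are invented): restricted to the primitives Redis and an RDBMS share, every filter degrades to a client-side scan.

```python
# An "ORM" limited to the common subset of Redis and an RDBMS:
# key-value put/get plus a full scan. No WHERE can be pushed to the engine.
class KVStore:
    def __init__(self):
        self.data = {}            # stand-in for either backend

    def put(self, key, value):
        self.data[key] = value

    def scan(self):               # the only bulk-read primitive both sides share
        return self.data.items()

def where(store, predicate):
    # every filter becomes an O(n) client-side scan; no index can help
    return [value for _, value in store.scan() if predicate(value)]

store = KVStore()
for i in range(5):
    store.put(f"user:{i}", {"id": i})

matches = where(store, lambda v: v["id"] >= 3)
```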

Most RDBMSes follow most of ANSI SQL, so on the surface they appear very similar, and a lot of the time a developer never gets to the pieces that differ so much that either the ORM does not support them, or the ORM gets terrible performance when using them.

More practical: if you use `ltree` or GIS features, you are now bound to Postgres; which is why most ORMs don't offer support for this, or if they do, they only support it when using the underlying engine (in other words: you cannot use e.g. SQLite anymore). Maybe that is fine and you can avoid using such "database specific features": good for you. But they are there, and are begging to be used; so please don't kludge around with two `BigDecimal` columns `lat:` and `lon:`: you are going to run into problems that have been solved for decades. Please just use PostGIS instead, and accept that this means you are now locked to Postgres: you are going to thank yourself later.
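To make the "solved for decades" point concrete: even the simplest question on raw `lat`/`lon` columns, the distance between two points, needs great-circle math that PostGIS gives you for free (`ST_Distance` on geography types). A hand-rolled sketch of what you'd otherwise reinvent:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# one degree of longitude at the equator is roughly 111 km
d = haversine_km(0, 0, 0, 1)
```

And this hand-rolled version still can't use a spatial index for "nearest N" queries, which is the part PostGIS actually solves.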

I can't ever agree with this. You end up using only the (minimal) features that they all support. I can't imagine going from PostgreSQL features to MySQL.

And you should always dev/test/stage on the same database that you'll use in production.

Everything you're saying is true, but if the devs can run unit tests on sqlite, that's a big win. And you shouldn't discount this feature off hand, or limit yourself from being able to do this.

It's one thing if you want to run postgres locally, that's fine, but who really wants to run Oracle on their laptop just to run unit tests, or to test an api locally?

This seems to have minimal upside and heaps of downside - you can only use the features that every database has.

You should be running the same database in test as in production anyway - there are heaps of problems you won't catch if you don't.

The majority of apps need a very small subset of database features; they need somewhere to park strings and numbers in a reliable way, and retrieve them (sometimes in order). The Django ORM supports this across all of the backends it supports.

Strong disagree - even the basic features and data types of databases have subtle but critical differences.

You should be free to use the features available, not waste your time optimising for databases that don't have those features but that you will likely never switch to anyway.

95-99% of your app is data and query optimization. And django seems to do all of this just fine really.

I would say the same. It is about team size:

If you are alone and don't care whether your client succeeds after you've got fired: Go for it.

If you are in a really complex structure, you most likely have dedicated database people, who want to do their thing anyway.

In any other circumstance, it is a really bad idea to ignore the standards of your framework and build your own thing. If you have to manually change the database, at least stick it into a standard migration. Don't use your own thing. Documentation will be lacking, edge cases will be ignored, the old standard will still stick around and new people will try to use it.

I'm skeptical that giving someone that doesn't understand databases a tool to automate their use will result in a better situation long term. A determined programmer can always make a huge mess.

I will admit: I'm an ORM disliker, preferring to write all my SQL by hand. SQL is a wonderful language and it's often far easier to do things in SQL than in whatever imperative language you happen to be using. I also enjoy far better performance than most (any?) ORMs can offer, simply by using prepared statements. I have far greater control over batching, should I wish to employ it.
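The prepared-statement-plus-batching control being described, in miniature with sqlite3 (the schema is invented): `executemany` compiles the statement once and reuses it across the whole batch, and the batch boundaries are entirely under your control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"event-{i}",) for i in range(10_000)]
# one prepared statement, reused for every row; batch size is your choice
conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```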

When things are going wrong, it's simpler in my experience to audit human-written SQL than it is to audit whatever stuff an ORM decides to emit for a given situation.

I agree with ignoring cross-database concerns, as I feel that you will be leaving the finer features of your database on the table if you stick to the basics. Here's the real counterpoint though. If you write SQL, then you can swap out your slow, single-threaded, interpreted programming language for one with a bit more punch. And you can even do it piecemeal. To me, that's a much more compelling hedge than swapping out a DBMS.

In my 8 years of working on large Python codebases, I think the Django ORM is one of the best pieces of software I've had the joy of using. I often recommend it independent of the rest of Django (e.g. to people building Flask apps that need an ORM+migrations).

Especially when working on a team with many junior developers who don't know SQL (myself included in the early days), the Django ORM has been a shining beacon of reliability and consistency allowing us to do complex database schema changes without ever having to worry about messing them up or not being able to revert our changes.

I don't think I can say the same for any other ORM I've worked with, in particular I have really bad memories of trying to match Django's migration reliability and flexibility using SQLAlchemy + Alembic.

Also in regards to switching databases, the ability to swap out PostgreSQL for SQLite on smaller projects without having anything break is a feature I've relied on several times, and I'm extremely grateful maintaining that clean break in the layers of abstraction was prioritized by the Django team.

Obviously the author of this article has had different experiences (and has considerable authority in this field), but I caution that their experiences are not universal among Python devs.

I came here to post essentially this exact comment. Give me Django with only routing, middleware, the admin panel, and the ORM and I'd be happy. The time I've saved because of these things so I'm shipping code and not writing SQL by hand is astounding. I've made the decision to use Python and Django over other languages and frameworks solely because of the ORM.

I honestly can't think of a single other piece of software that has actively saved me as much time.

> Give me Django with only routing, middleware, the admin panel, and the ORM and I'd be happy.

Very much this. I tend to lean towards Flask for building simple APIs. I think it's easier and faster to start a project and get something done.

But I always end up missing the admin features and the far superior ORM, and always regret midway that I did not use Django instead.

> the ability to swap out PostgreSQL for SQLite on smaller projects without having anything break is a feature I've relied on several times

Absolutely. Take the extra minute to add a natural key to your Django models and now you can do amazing things like dumpdata from your Prod or Staging database and then loaddata into your local sqlite database for dev. I've never had to run Postgres on my laptop for development. As long as your CI tests use your production database server type, you don't have to worry about subtle differences between developing locally with sqlite and deploying on another RDBMS.

That's really bad practice, a big code smell.. how do you test all the dB logic with a different dB? Constraints, indexes, tsvs..

With CI that tests both, and a dev team that runs a mixture of both.

Also Django abstracts the differences away well enough that we've never noticed meaningful differences that aren't already well documented.

> As long as your CI tests use your production database server type

And give the dev the ability to switch out to a different database if needed. Thankfully, in Django it's just a config switch.

But for 99% of the business logic tests I write, sqlite is fine. Sure you might have an issue with Oracle, but for day to day stuff, sqlite is good enough.

That's because you are not using the power of postgresql.. (if you can test and run locally with sqlite).. are you using transactions? No? Because Django doesn't enforce them O_O

What do you mean Django doesn't enforce them? Are you saying that transaction.atomic doesn't actually run in an atomic transaction?

No, I'm saying that if you mutate related models and didn't wrap them in a transaction, it can leave you (if something fails) in an inconsistent dB state.. this is something that can't happen with sqla.
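The failure mode being described can be sketched with plain sqlite3 standing in for the ORM (autocommit mode mimics unwrapped model saves; the schema is invented):

```python
import sqlite3

def transfer(conn, amount, fail=False):
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
    if fail:
        raise RuntimeError("crash between the two related writes")
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit: each write sticks
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 100)])

try:
    transfer(conn, 30, fail=True)             # no transaction wrapper
except RuntimeError:
    pass
after_crash = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]  # 170: money vanished

conn.execute("UPDATE accounts SET balance = 100")  # reset both rows
conn.execute("BEGIN")                              # what transaction.atomic arranges
try:
    transfer(conn, 30, fail=True)
    conn.execute("COMMIT")
except RuntimeError:
    conn.execute("ROLLBACK")
after_rollback = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]  # 200 again
```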

Then wrap the things that you want to run in a transaction.atomic. Is there something I'm missing?

That's the thing: you need to explicitly run in a transaction.atomic; with SQLAlchemy, that's something that just works..

Which is reasonable with SQLAlchemy because it operates at a lower level, where you manually handle sessions anyway. You don't do that in Django, by design. You can change commit behaviour if you want, though, and then you can just use transaction.atomic and still not handle the session manually.

If you'd want atomic transactions for the whole Django project, you could just set the ATOMIC_REQUESTS flag in the settings.

Does Django still lack connection pooling? If I'm not wrong, some time ago I saw people using the SQLAlchemy pool on Django projects.

Not since Django 1.6 from 2014.

That's a nice feature; when did it land?

At least in Django 1.6 from 2014, couldn't find older docs.

> That's because you are not using the power of postgresql...

Isn't that a potential positive?

I mean, yeah, and those darn programmers with their 'platform independent languages'. Why are you not using the power of assembly?

The very best ORMs are nowhere near as good at 'compiling' their abstractions into platform-specific SQL as your standard C/C++ compiler is at producing optimized platform-specific machine code. Part of the reason for this is that SQL is already at a high level of abstraction comparable to the ORMs themselves; the relationship between Python and SQL is not comparable to the relationship between Python and assembly (or C and assembly, etc.)

No, it's really not.

A better analogy than the language one: Your web app has to run on Firefox, Chrome, and... IE6 yay! You're not allowed to use any modern Javascript, or most of the API's you'd like to use.

Because being able to switch to IE6 at any point in development is more useful than any of those fancy-dancy API's, right?

Meh. All analogies are partial.

If there was an ORM that dealt with everything but a flaky end of life database, sure start working on it.

But this is advocating picking one platform. Like writing Chrome-only JavaScript. I guess the nearest thing to that is Electron.

Yeah, the analogies break down at some point.

I don't think anything is black-and-white. I just think the vast majority of Django apps are fine with Django's ORM. This article smacked of premature optimisation.

I always get the feeling that people love ORMs because they don't want to learn SQL.

Because if you do learn some SQL, then the problems with ORMs become annoyingly apparent.

I agree that most Django coders will be fine with the ORM. I guess I don't see that as a good thing, though.

> I always get the feeling that people love ORMs because they don't want to learn SQL.

I suspect that's the problem. It's neither my feeling nor my experience.

Every platform/language/OS/chip I know has annoyingly apparent problems. In my experience, high-self-confidence, low-self-awareness folks are too keen to drop a level and 'make it right'. And they love feeling how smart they are ('everyone who disagrees is obviously not as smart!'). Too often 'smart guys' cost more, long term. And I'm not ragging on you. That was me in my 20s. Embarrassingly.

hehe, I'm in my 50's and have been doing this for >25 years. I still don't know where that line is

Agree; at least for me, I didn't discover the SQL superpowers till I got inside it... I would love to see some of these pretty clever ORM migrations applied on a busy PostgreSQL server.. with a bit of concurrency... :)

imagine hating working with databases so much that you'd want to spin not being able to use modern databases' amazing features as a "potential positive", just so you don't have to deal with them...

"Imagine hating databases" ... "just so you don't have to deal with them"

Do you normally respond to issues by trying to characterise people?

Imagine this fictional world where anyone that disagrees with me is incompetent.

In my experience, working with database-specific functionality is like dropping to a lower level of code generally. Something I have done in both ORM->raw SQL and C->assembly. There is a point where the price in development cost, testing time, platform lock-in, reduced code reuse, the smaller pool of experienced programmers (and their salaries/training), loss of access to future updates and bug fixes, and so on is worth it. But almost never, in my experience. Almost, but not entirely.

If people had the same opinion when it came to web browsers we'd be in a pretty sorry state.

Are you talking about databases or decibels?

Haha, dB = database. My android keyboard is trolling me :/

This is absolutely true, especially if the dev environment believes that sqlite should be a first class citizen during development of unit tests.

If you'll forgive a rather insubstantial ramble about a pet-peeve of mine:

> many junior developers who don't know SQL

Are you really a web developer if you don't even know SQL? Sounds more like an apprentice. You wouldn't call yourself a junior Japanese translator if you didn't speak Japanese. SQL is a basic competency, not an advanced specialist skill.

This sounds like gatekeeping to me. Web development is a huge and rapidly expanding industry. You don't need (in fact, cannot) be an expert in all areas.

I've been a web developer for quite a few years now. Mostly front end but I do use Django as well.

I also do know some SQL so I guess I pass your arbitrary test. However, I can count the number of times I've used SQL in the last five years on one hand, and in each case it was something simple like create a role/database, and I had to Google the commands anyway due to a lack of practice.

> This sounds like gatekeeping to me

Not really. It isn't right that the software industry uses the same title for both an untrained dabbler and a CS graduate with a year of industry experience. The former is not a developer; they're a student, or an apprentice. The latter has several years of relevant formal education but only a little industry experience.

'Traditional' engineering fields have legally protected terms like Chartered Engineer. I'm not suggesting software development need be that rigid, but I don't think it makes sense to be so inclusive with a title like 'developer' that it tells the reader almost nothing about the person's level of competence.

Legally protected terms aren't necessary for this. If you call yourself a junior Japanese translator but you don't speak Japanese, you'll be laughed out of the room by real translators. That's as it should be.

> I had to Google the commands anyway due to a lack of practice

But your understanding of the underlying concepts (i.e. the relational model) was still there, and that's no small thing. Familiarity with syntax isn't really the point.

It's not really gatekeeping.

If you're doing any kind of backend work, you need to know how your persistence layer works. You don't need to be an expert, but you need to realize the implications of what you're doing with the ORM. You need to be able to diagnose issues with your database.

There is a grey area between 'don't know' and 'expert', and if someone says it's bad that jr devs don't know something, it doesn't mean they need to be experts.

Frontend devs don't really need to know it, but most backends talk to a SQL DB, so I'd agree.

That said, what does "web developer" mean relative to those?

One of the things I've had trouble with is that SQLAlchemy also supports cross-DB, but not completely (I never figured out how to do a date diff that works between Postgres and SQLite). Does the Django ORM help with that? And have you come across any Flask + Django ORM repos you can point to?

Absolute second this

This is more "how to replace your car's stock transmission and clutch for dubious benefits and higher costs".

Django works fine out of the box. The examples of "needless migrations" are corner cases more than anything, and it's fine to have them. It works. They might be useless, but if they're correct in the end, that's fine. DB migration is one of the last places where I will look for performance over risking correctness.

And if you really need performance in one specific complex query, Django lets you do raw SQL queries just fine. Number of times I saw this being actually needed (in a big project): One.

The Django ORM is excellent as long as you stick to the well-trodden path. But if you want to squeeze every ounce of performance or you know that the execution plan a query generates is not the best or it's not using the right indexes, it's like wearing shackles.

SQLAlchemy on the other hand makes it very easy to do exactly what you need, without even having to drop to writing raw SQL. The way the ORM and the core are structured is the perfect balance of abstraction and control.

I'll agree that the Django ORM isn't very good once your queries get moderately complex. But I have been underwhelmed by SQLalchemy having tried it recently. It's not intuitive compared to Django's and in the end for complex query I just ended up in raw SQL anyway - as I would have in Django's ORM.

A lot of people rant about SQLAlchemy. I have used it recently and it isn't anywhere close to Django's ORM for ease of use.

A couple of real world data points.

Two Fortune 500 companies I worked at had projects in Django that used native SQL to do migrations. This led to errors in prod during deployment.

The scripts had to run in a prescribed order. They had to be listed in the deployment instructions. So the migrations would run, and then fail during a prod deployment for various reasons. Most notably, a dependent script hadn't run in prod yet. They had run in the Dev/QA environments and everything had run just fine. Or Dev/QA had been hand-tweaked and that change hadn't made it into the script during prod deployment because they were testing it for a month to see performance before moving to prod, etc... etc...

It was all ugly.

In one of the two companies they finally decided to stop handwriting SQL and started trusting the Django migration system, and the problem with deployments went away.

AND they still were able to hand write sql to optimize the performance of the DB if needed.

I truly believe the Django migrations system eliminates the need for several full-time DBA positions at larger companies that use it. Things that would take a dedicated team of DBAs days to do are automated so reliably that you don't even realize the amount of engineering effort it would take to do manually.

What things does it do besides update the schema, or run a custom python script?

A few other things which are easy to miss manually: identify when you have model changes which are not reflected in the migrations, ensure that the migrations always run, and fail loudly if someone tries to run migrations which started at the same point but haven’t been explicitly merged (which is really handy switching branches or on fast-moving projects with multiple features in-flight simultaneously).

None of this is magic which you can’t do by hand but there’s a substantial benefit in being able to rely on it always happening and not having to spend any time thinking about it in most cases.

"Data QA" isn't a term I hear much of, but it's a super important piece of migrations. A migration that runs successfully is the equivalent of a program compiling successfully.

It does exactly that, in a reliable and repeatable manner. Especially if you are deploying to multiple targets (dev, staging, production) you don't care about which state each server is in.

>Two Fortune 500 companies I worked at had projects in Django that used native SQL to do migrations. This led to errors in prod during deployment.

> The order of the scripts had to run in a proscribed order. They had to be listed on the deployment instructions. So the migrations would run, and then fail during a prod deployment for various reasons. Most notably a dependent script hadn't run in prod yet. They had run in the Dev/QA environments and everything had run just fine. Or Dev/QA had been hand tweaked and that change hadn't made it to the script during prod deployment because they were testing it for a month to see performance before moving to prod, etc... etc...

How do companies get this right? My organization has all these problems as well and I lose 10-15% of my time chasing misplaced SQL scripts or SQL scripts with improper tweaks.

Check sql migrations into version control, all changes get checked in.

Run them in a predefined order with a migration tool (something as simple as file name order is fine).

Save metadata in the db to ensure migrations are not run twice and keep a record (again migration tools can do this).

Test migrations multiple times prior to deploy using a sanitized copy of production.

Any of these steps would help, all of them would eliminate your problems.
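Those steps amount to surprisingly little code; a toy sketch of a migration runner with sqlite3 (migration names and DDL are invented), where the metadata table guarantees each migration runs exactly once and in order:

```python
import sqlite3

MIGRATIONS = [  # checked into version control, ordered by filename
    ("0001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("0002_add_email", "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS applied_migrations (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}
    for name, sql in MIGRATIONS:
        if name in done:
            continue  # the metadata record makes reruns a no-op
        conn.execute(sql)
        conn.execute("INSERT INTO applied_migrations VALUES (?)", (name,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # running twice is safe: nothing is applied a second time
cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```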

Django does this.

Django does some of these things (most migration tools do), but my understanding is it guides you in other directions (automated code-based migrations based on models as a source of truth), so if you want simple sql migrations you don't need (or even perhaps want) this part of django.

That's what the article explores.

Django migrations solve this by asking for a dependency graph, and verifying that there's only one leaf node, so to speak.

So if you merge in separate scripts you will need to linearize by hand.

On top of this, there's a table within the database tracking which migrations are run, so you don't accidentally rerun these for example.

What will work in a lot of circumstances is to put your custom scripts into empty Django migrations. Then they'll get run as needed. If you're feeling fancy you can write the backwards script as well.

There are tools that help this: Visual Studio Data Tools, RoundhousE, Flyway/Redgate - I highly recommend investing time in building them into your workflows.

Alembic is pretty great for this as well

They aren't going to give us time to implement those, but thanks. Shall implement them in my own personal projects.

I use a simple framework-agnostic tool for tracking and applying migrations, it also supports dependencies between migrations. Migrations are automatically applied on deploy to stage/production, so you don’t forget to do it.

I prefer this approach very much and even wrote about it here https://vsevolod.net/migrations/

> Two Fortune 500 companies I worked at had projects in Django that used native SQL to do migrations. This led to errors in prod during deployment.

> The scripts had to run in a prescribed order. They had to be listed in the deployment instructions.

You still need a proper database migration system, even if migrations are defined using hand-written SQL. I can see why someone would prefer migrations written in SQL, and there are plenty of migration systems that support that workflow.

I've experienced the exact opposite - no end of problems with Rails/Django/Alembic migrations. Abandoning version numbers and chains of migration files for a diff-based approach that works directly on the database level made things so much simpler and more reliable.

Comparing your goal state explicitly with the current production state means you can typically autogenerate the changes you need, and once applied you can compare again to make sure everything matches perfectly.

Everything becomes reduced to:

production state + pending changes = goal state

Where "goal state" might be the state of your ORM models.
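A toy sketch of that diff step with sqlite3 (column-level only; real diff tools also handle constraints, indexes and views): compare the live production schema with a goal schema and emit the pending changes.

```python
import sqlite3

def schema(conn):
    """Table name -> ordered column names, read from the live database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}

def diff(prod, goal):
    """Emit the statements needed to take prod to the goal state (columns only)."""
    changes = []
    for table, cols in goal.items():
        if table not in prod:
            changes.append(f"CREATE TABLE {table} (...)")
            continue
        for col in cols:
            if col not in prod[table]:
                changes.append(f"ALTER TABLE {table} ADD COLUMN {col}")
    return changes

prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
goal = sqlite3.connect(":memory:")  # e.g. a scratch DB built fresh from the ORM models
goal.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

pending = diff(schema(prod), schema(goal))
```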

I did a talk at PostgresOpen a while back about the problems with Django-style migrations and how to do better: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...

I've done automated diff-based migrations before using SQLDelta. It caused me a lot of trouble and I think it's a bad approach.

These diff tools just don't have the information they need to solve the data problems that are the actual hard part of schema migrations.

Simple example: I want to concatenate these two text columns into one.

I do think SQLDelta and similar tools can still be useful after doing the real migration using incremental scripts. You can use them to check for problems, and to fix up easy things like views and stored procedures.

Of course they don't automatically solve data migrations for you, I'm not sure how any tool could possibly guess your intent like that.

If you have an example of how a chain-based migration tool makes this easier, I'd love to hear about it.

I've never worked on anything that separated data migrations from schema migrations, so I suspect I'm not quite understanding you, but...

Your data migration is just part of your chain of incremental patches?

For my example you just write:

    UPDATE my_table SET my_column = my_old_column_a || my_old_column_b;
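For what it's worth, that backfill runs as-is on SQLite too (standard SQL spells it `UPDATE my_table SET ...`, with no `TABLE` keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_old_column_a TEXT, my_old_column_b TEXT, my_column TEXT)")
conn.execute("INSERT INTO my_table (my_old_column_a, my_old_column_b) VALUES ('foo', 'bar')")

# the data-migration step: backfill the new column from the two old ones
conn.execute("UPDATE my_table SET my_column = my_old_column_a || my_old_column_b")
merged = conn.execute("SELECT my_column FROM my_table").fetchone()[0]
```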

Beyond a fairly small scale you have to separate data migrations from schema migrations because they just take too damn long.

I believe it and it makes sense, but I've never encountered it myself.

Sounds like your definition of "small scale" encompasses every project I've ever worked on. :D

Right, that's exactly what you might add to a diff-generated script to do the same thing.

So it doesn't inherently seem better or worse to me.

The difference is that - if I understood right - you're generating your diff just before you actually do the migration.

By comparison, the incremental migration scripts are written at the same time as the feature (I like to do them as step 1 at lot of the time) and by the same developer, who currently has the problem in their head.

I don't know of anybody doing that - unless you had automated checks that it was a fast and non-destructive operation, that would be highly risky.

I'm simply talking about using the current production database as a basis for generating the required changes, rather than "here's what I think my production database currently looks like based on a version number and these 75 chained migration scripts I've run on it in the past"

I'm still confused. When do you generate your diff then?

I think migrations ought to be done by the developer at the same time that they write their other code changes, because they have all the context at that time.

However, it doesn't seem like that could work for diffs, because the version of the production database you're diffing against may not be the same as the version you end up deploying to.

Generate it whenever you prefer, probably with the rest of the code changes as you say.

Immediately before applying, check that the database is still in the same state as it was when you generated the diff script. If yes, apply. If no, abort.
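That pre-flight check can be as simple as fingerprinting a canonical schema dump when the diff is generated and again just before applying it. A sketch with stdlib SQLite standing in for the backend (on Postgres you might hash `pg_dump --schema-only` output instead):

```python
import hashlib
import sqlite3

def schema_fingerprint(conn):
    """Hash a canonical dump of the schema DDL, ignoring row data."""
    ddl = sorted(
        row[0] for row in
        conn.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")
    )
    return hashlib.sha256("\n".join(ddl).encode()).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
expected = schema_fingerprint(conn)  # recorded when the diff script was generated

# ... later, immediately before applying the diff script:
if schema_fingerprint(conn) != expected:
    raise RuntimeError("database drifted since the diff was generated; abort")
```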

I like the idea but haven't been able to make it work in practice. Diffing produces migrations that just can't be run on big production DBs and it just doesn't work properly if you're using schema based partitioning or have partitioning which depends on the inserted data.

These problems are definitely solvable with proper tooling. The specific challenges vary between RDBMS though. For example when running schema changes on large-scale MySQL/MariaDB, the key things are supporting external OSC tools, and supporting sharding. I wrote a diff-based schema management system, https://skeema.io, designed with precisely these concerns in mind.

Which non-diff-based migration tools automatically generate migrations immediately suitable for big production databases? I don't think any tool is capable of that, certainly none I've heard of.

> production state + pending changes = goal state

That's exactly what Django is doing, btw.

> I did a talk at PostgresOpen a while back about the problems with Django-style migrations

Good talk generally, and a lot of stuff I agree with.

If the migrations sit out for a year in the code base, it's not a big deal though. It looks like a big deal, but in practice, it's not. Migrations can be reset back to zero if you want in Django.

But when you're trying to develop in parallel developer environments, the Django migration DAG alerts you when you have two leafs that haven't been merged.

And if you ever had to roll back a failed prod deploy, Django supports a backwards migration too.

I get what you're saying, but in practice the benefit of using Django, where the "code is the configuration", is better than trying to keep code and configuration separate and trying to match them up later.

I don't have experience with Django model migrations so I can't comment on that - but running script-based migrations without a CI like that in any nontrivial environment sounds like asking for trouble. How did you not catch the issue when a dev pulls from master and migrations aren't working - does everyone just do local patching and ignore that master is broken?

I've seen this on a .NET project, we didn't have a CI and this one guy had a tendency to commit dirty migration scripts breaking people's databases - I've seen it slip a few times to the QA deploy process before it gets caught because everyone assumed he would fix his shit and was too busy cleaning up his crap so we would just comment it out locally to get it to run.

Django migration scripts are run by devs after pulling from Master as well as deploying to prod. That gives everyone the same confidence that the migrations are working. If the migrations are borked, they get fixed or the prod deployment will break. Ideally this should be part of the CI/CD process for the PR.

Also it's worth noting that you can run all migrations from the beginning of time to the latest version on a development DB. And I believe the default Django unit testing framework does this by default.

The problem sounds like merging things that are not green on CI; it's not related to migrations in Python vs SQL.

Think about it: you can only use Django in Python, but you can use SQL everywhere there is a DB.

> Think on it

We did. And with smart people who knew SQL inside and out. It just made sense to not hand roll SQL anymore, because Django caught most of the errors we had.

Learn SQL; it's like JavaScript, it's everywhere... and it will open your expertise beyond one web framework to a lot of other places.

You make it sound like I don't know SQL. That's kind of insulting, you know?

I think the posters are talking past each other, like cooks arguing: "make it sweeter", "no, make it more like bread", etc. The design objective and architectural considerations could make either of these two "right". However, some weight should be given to the repeated mention of "average web developer"; if that is the ONLY topic, then some clarification could be skipped...

(yes, learn SQL)

Mmm, sorry if this is how you see it. What I'm trying to express is that the DB by itself is the source of truth (and the state) of your application; Django is just an implementation detail that you can swap whenever you need.

Think about a cognizant response rather than assuming ignorance, which seems to be a common theme with HN posters.

Sometimes yes.

And sometimes you have many other things to do and you want your database to just work. In this case it is perfectly fine to rely on ORM.

It is like C vs Python: yes, I know C pretty well. Yes, it is significantly faster. I am still going to write the project in Python.

> The motivation for writing this article comes from knowledge sharing with the team I'm currently collaborating with, who are using Django the canonical way. I am convinced they can better use the tools they have. Let's see if they can be persuaded to drop Django migrations...

I've built and worked on enough large Django projects to know that if you don't do Django the canonical way, then I don't want any part of the project. If I joined a team and saw this weird hodge-podge of custom migration framework and shadowing ORM fields I would nope-out immediately. It sounds like this post author just doesn't want to use Django's ORM. And that's fine, you don't have to.

There's nothing worse than random teams reimplementing a migrations system because they think they know better.

The Django migrations system is far from perfect, but it's light years ahead of anything a small team can build.

> random teams reimplementing a migrations system

I think that can apply to most software. Django has plenty of things that could be done better, but if it's part of Django it will be well tested and documented, there will be people who can help on Stack Overflow, which is far better than most of the in house stuff I have had to work with.

I don't think these practices scale well. One person becomes the default "migration person" (with processes that are likely not as well documented as Django) and the team becomes reliant on tribal knowledge.

How is that different from one person becoming the default "migration person" on a team that doesn't use Django?

The difference is that Stack Overflow has a ton of answers to even fairly complicated questions and that reading the Django docs is an order of magnitude easier than delving into some random person's badly written code.

Daniele is a great guy and I deal w/ him regularly as he's the author of psycopg2. However the mistake he makes here is in the realm of database abstraction tools having the sole rationale of "database agnostic code". That's really not the main rationale, nor is it really "hide the database so people don't have to learn SQL". The main one is "automation of work". A lot of the comments here refer to situations where hand-rolling everything was just too error prone. There's a thing we do in computers when a job is very repetitive, tedious, error prone, and follows a complete pattern that you can define algorithmically, which is: write a program to do it! That said, I'm sort of glad Django took the heat on this one and not SQLAlchemy so I can get to bed without obsessing :)

Thank you for creating SQLAlchemy and being so active on its mailing list.

> Django needs it because a web framework not tied to a single database vendor is more valuable than one tied to a specific one - and that's fair enough. But you don't: your web program, more likely than not, will not have to switch from one database to another

This is missing the forest for the trees.

Sure, my one app won't need to migrate from MySQL to Postgres a year down the road, but as a developer, I'd much rather be allowed to abstract away each vendor's idiosyncrasies and be able to focus on my Python code regardless of whether I'm using sqlite, PostgreSQL, MariaDB, MySQL, Oracle or even MS-SQL

The idea that some library can abstract away database concerns to this extent is wishful thinking.

You inevitably need to understand the SQL that your ORM is generating and what the database is actually doing. Otherwise you're flying blind.

Realistically what ends up happening is that the DB gets abstracted away to a painfully feature-free lowest common denominator.

Django handles this quite nicely. You get your basic lowest common denominator subset by default. If you want something more powerful, you can explicitly import a db-specific feature.

And for some features, you'll get a NotImplemented exception. In the ten years I've been using Django, I've only seen these a handful of times, and only when using SQLite.

I never claimed it replaces SQL entirely and that therefore users can "fly blind", as you put it. But the whole point of the ORM is to abstract away part of it.

It also misses a more common issue which is upgrading a DB. Teams might not switch from mysql to postgresql often, but I sure hope they're upgrading the db version from time to time.

This article is trying too hard to dress the author's limited experience up as absolute truth. For example:

> Have you got PostgreSQL in production, but you want to test with SQLite because it's easier to set up? If so, your tests are just a tick-box exercise: you are not testing anything remotely plausible and resembling your live system.

This is not completely wrong but it's just telling you that the author has never worked in an environment where it makes sense. Anyone who has knows that this approach can be used to catch a high fraction of errors locally, while not precluding a full slower CI build.

> Even if you added some form of manual auditing to each save() method, it will not capture changes made outside Django. It wouldn't be very secure either: Django uses a single user to access the database so if someone manages to hijack that user, they would be able to change the data in the database and alter the audit tables to hide their traces.

Similarly, while it is true that Django defaults to using a single user for everything, you can easily configure multiple connections (containing, say, an issue with your normal web code but not your management commands), you can grant access to only SELECT and INSERT on the audit log table but not UPDATE or DELETE, etc. There's nothing wrong with the separate approach, of course, but it's not like it's some great discovery that frameworks are not intended to cover every possible use-case and sometimes you want to turn them off.
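As a sketch of that split (all names and credentials here are hypothetical): Django's `DATABASES` setting can define a low-privilege connection for web traffic and a separate privileged one used only for migrations and management commands. The corresponding GRANTs would then allow the web role only SELECT and INSERT on the audit table.

```python
# Hypothetical Django settings: two roles against the same database.
# The web role has no DDL rights and cannot UPDATE/DELETE audit rows;
# the migration role owns the schema and is used only at deploy time.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "web_user",        # low-privilege: serves normal requests
        "HOST": "db.internal",
    },
    "migrations": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "migration_user",  # privileged: selected via a separate settings file
        "HOST": "db.internal",
    },
}
```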

The extended “I hacked around because understanding the migrations module seemed like more work and then I was too far in to reconsider” section has the same problem. Being surprised that the migration system triggered an update when you renamed a field is, well, exactly what it's designed to do. That's exactly why there is an entire section in the documentation which tells you exactly what to do in that situation and how to avoid data loss — in that case, if the problem wasn't making foo.bar magic (which it almost always is) you'd create multiple migrations to add the new column, move (and usually convert) the data, and drop the old column. You can even run raw SQL if there's something like an efficiency hack which makes it worthwhile. This takes a couple of minutes and is reliable.
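The multi-step rename described there can be sketched as raw DDL/DML; stdlib SQLite stands in for the real backend here (illustrative only — in Django, AddField/RunPython/RemoveField operations would emit the equivalent statements for your database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
conn.execute("INSERT INTO person (first, last) VALUES ('Ada', 'Lovelace')")

# Migration 1: add the new column (non-destructive).
conn.execute("ALTER TABLE person ADD COLUMN full_name TEXT")

# Migration 2: move and convert the data.
conn.execute("UPDATE person SET full_name = first || ' ' || last")

# Migration 3 would drop the old columns; omitted here since SQLite only
# gained DROP COLUMN in 3.35, and shipping it separately lets you roll back.
full = conn.execute("SELECT full_name FROM person").fetchone()[0]
```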

>This is not completely wrong but it's just telling you that the author has never worked in an environment where it makes sense. Anyone who has knows that this approach can be used to catch a high fraction of errors locally, while not precluding a full slower CI build.

Sqlite isn't faster though. The only reason I can think of why people do run tests with sqlite instead of postgres or whatever is because you don't have to install it.

This is an undeniable benefit but it does mean that instead of having the up front cost of installing postgres wherever you have tests you have the ongoing cost of "oh, it looks like my test failed because sqlite and postgres aren't actually alike".

It is faster (same process, memory tables) but that’s my point: you can say you always need to test on your target database, which is true, but not needing to install things is a benefit, and that can matter when you’re doing integration tests with lots of data. I have seen plenty of projects where fast local SQLite + CI Postgres with lots of test data is reasonable.

It can suffice, but it will never be ideal.

I had a project just a few weeks ago where somebody tried to do that and hit a brick wall nearly instantly because one table had an array column. SQLite instantly became useless.

IME unrealistic tests bite much harder than slightly slower tests or needing to install the odd bit of software. Realism is pretty much the only thing that gives you confidence you'll actually catch some damn bugs...

You’ll note that I said the same thing. The point was just that the set of bugs people write includes both those which will affect any database and those which depend on certain configuration and behaviors. There are enough people who find it useful to do both that I’m comfortable saying any position which pretends they don’t exist is incomplete.

As with the original author, the problem here is forgetting to add “On my projects” or “In my experience” rather than speaking in absolutes.

> This article is trying too hard to dress the author's limited experience up as absolute truth.

Sure you're not doing the same thing? As Vivekseth writes below, the author wrote psycopg2.

Allowing the DB user that the web app uses to have schema modification privileges is a massive security hole. If you're not hacked, someone will eventually drop the production database.

So, migrations in anything but SQL are a Bad Idea.

But... you don't have to use the same user for migrations as you do for your webapp. Like, at all.

`manage.py migrate --settings=my_super_secure_settings`

Just only run this from a super-secure location during a deployment with a completely different set of credentials.

Yes, you’ll notice that I’m not saying his ideas have no place but that they’re not universally applicable. For example, the security “hole” is trivially avoidable by using a different settings file to do migrations on a separate server/container which doesn’t get normal web traffic.

If you want SQL, you can also generate it from your migrations and send it over to someone else to run. This is not uncommon in enterprise IT.

To reiterate the point of the GP, you said:

> This article is trying too hard to dress the author's limited experience

Saying the author has “limited experience” is condescending and untruthful given his career and accomplishments so far.

Limited does not mean he hasn’t accomplished anything, only that he’s not speaking for the entire community and wording his post that way doesn’t add anything to it.

He’s obviously very proficient, which is going to shape your perspective of what’s easy and how much control you want just like the scale of the projects you work on and the number and skill levels of your team.

There’s nothing wrong with his opinion - my objection is the overly broad framing. It would have been just as good as “here are some things which worked for us” and letting the reader decide whether they are in the same situation.

> Allowing the DB user that the web app uses to have schema modification privileges is a massive security hole

> So, migrations in anything but SQL are a Bad Idea.

Those two points have nothing to do with each other.

Or just the whole ORM thing... why can't I use a `using(field)` when I need to query a Django DB? How many applications are poorly modeled because of the ORM?

If production is Postgres, why wouldn't you just run tests locally with a temporary Postgres database?

Why would you make dev/prod parity a whole lot worse just to make your tests perhaps marginally faster?

Maybe you have a lot of busy tests and you can get feedback twice as fast using less RAM on a SQLite memory database on your laptop, which is not a maxed-out current model, and you know that every commit you push will go through a CI build which uses the same database as production.

Maybe you are consciously trying to avoid locking your project into one database accidentally.

Maybe you work in an enterprise environment where you can’t run Docker or network services on your local system.

Maybe you know that your CI process will run against more data than fits on your laptop and so you’re not worried about missing something.

My point wasn’t that you shouldn’t test in the same environment but that it’s too quick to say that because you have never had this need nobody else does either.

> Maybe you are consciously trying to avoid locking your project into one database accidentally.

People keep bringing this up, but for applications (not libraries), how many times has this mattered (for FOSS DBs), and how many times has upholding agnosticness required weird hacks or limited something?

The reason people bring it up is that it's real for some people, even if it's not a universal requirement. Anyone who sells software has customers who have standardized on one database. Some places have senior management wanting to avoid lock-in (Oracle migrations will almost always be mentioned in this case) and may have pressure or outright requirements — and if you're a contractor working for them this will be written into their requirements. Some products have a demo/lite mode to make it as easy as possible to try – e.g. SonarQube can run in a single container (with prominent warnings that your database is not persistent) just to make it easy for someone to try it.

It's certainly not the most common and I would definitely seriously question it if you know you're going to benefit from a Postgres-specific feature (the last big Django project I started used JSONField and several other features heavily), but the world is a big place and there are many features which only some people use but it's critical for them. This discussion reminds me a lot of when people would question why you need a localization framework because they've never used one.

Not mentioned in the article, but the author also built psycopg2 (https://www.varrazzo.com/software/)

Thank you, that explains the willingness to entirely ditch the migrations framework in favour of raw sql!

Thank you. Relevant.

For those who do not know, Django uses said library for PostgreSQL support. Mind you, if you don't like Django's ORM, you don't have to use it.

Isn't it the standard Python library for PostgreSQL? A co-worker just installed it for some work having nothing to do with Django.

It's the most popular, if you mean standard by that metric then yes, if you mean standard as in built-in to Python, then no (I only mention this because SQLite does come with Python OOTB).

Yes, it's the most popular driver for PostgreSQL in Python.

This sounds like a "golden hammer" argument[1], basically saying that SQL can do everything Django can do, and if you know SQL really, really well you can do powerful stuff which would be clunky in Django. Of course that's true, because it's true of any two tools which have overlapping functionality. But for the vast majority of things you need to do on a daily basis, and for developers with >10x more experience with Python than SQL it just doesn't make sense to use raw SQL any more than absolutely necessary. As soon as you use raw SQL in Django most of the maintainability advantages of an ORM disappear.

All that said, I totally agree about throwing out DB portability for anything but the simplest apps (or libraries, of course). Portability is extremely costly in terms of which features we have available to us and how maintainable the resulting code is, so it should only be a goal when absolutely necessary.

[1] https://en.wikipedia.org/wiki/Law_of_the_instrument

> Much of the complexity of this system is designed to give you a specific feature: migrations abstracted from your database. In my opinion, if there is something less likely to happen than writing a complex app to run on interchangeable databases, it's the need to repeat the same history of migrations.

Unless you have a dev, staging, and prod setup. Then there's a good chance you'll encounter a wide range of variation in the current database schema compared to your models. I use the ability to roll back and forward migrations all the time. Django ORM and the migrations (and the admin site) are the features that make it impossible for me to justify the use of other frameworks or languages in the vast majority of cases. The productivity boon is just too much.

With that said, I have a personal policy of not letting the ORM get in the way. If I find myself reaching out to the documentation for how to write a particular query (something like window functions, for example), that's a good indication that I may be better off writing the raw SQL.

I support a very large database, containing product data for a major US retailer. If I understand the history of this database correctly (I am not directly on the team that manages it, but my team supports them in various ways) the database began life as an MS SQL DB. Many years ago it was migrated to MySQL. Now, they are in the process of migrating it to Postgres. So yes, migrating data to different DB's does happen, especially when you consider an applications life over decades.

Why was the underlying database changed?

The most obvious reason I can think of being licensing concerns with both MS SQL and MySQL.

The first migration from ms sql to mysql was licensing costs, and the desire to host on Linux. The more recent plan to move to postgres is motivated in part by features and in part by standardizing on postgres.

Features that you wouldn't be allowed to use if you try to remain database agnostic.

DB migrations happen, but not often enough to use a lowest-common-denominator db agnostic library and try to remain compatible with all systems.

Well, that is another area the article got partially wrong. There are many Postgres-specific features included in Django, and others available via third party packages. And for most things that the ORM doesn't specifically support, it gets out of the way and lets you do them anyway, by executing your own custom SQL, right from the ORM.
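As a sketch of that escape hatch (stdlib sqlite3 stands in for the backend so the example is self-contained; in Django you'd reach the same place via `Manager.raw()` or `django.db.connection.cursor()`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO book (title) VALUES (?)", ("Django",))

# Any SQL the ORM can't express can be run directly, with parameters bound
# by the driver rather than interpolated into the string:
rows = conn.execute("SELECT title FROM book WHERE title LIKE ?", ("Dj%",)).fetchall()
```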

> And with comments! Explaining why a certain index or constraint exist!

You can put comments in Python files...

> With constraints named meaningfully, available for manipulation, not auth_group_permissions_group_id_b120cbf9_fk_auth_group_id.

You can name your migrations with:

    manage.py makemigrations --name "prettier_name"

> In my opinion, if there is something less likely to happen than writing a complex app to run on interchangeable databases, it's the need to repeat the same history of migrations.

Maybe he never worked in a team? Or wrote any integration tests at all? Or implemented rollback? Because those are pretty much basic things for every serious project.

> I have a patch_db.py script that I have used, with small variations, in several projects.

So you basically wrote your own schema migrations tool, but inconsistent across projects, and you maintain it yourself. Not a good idea IMO.

> You can put comments in Python files...

only if the comment pertains to a django feature, not to a schema feature that cannot be expressed in django (e.g. a partial index designed for a specific query).

> You can name your migrations with:

In the example there is a foreign key name, not a migration name. It is persisted in the database, it's not ephemeral like a migration name, for which choosing a meaningful name has only a temporary value.

Just two factual corrections; for the rest our experiences diverge, that's fine.

> for the rest our experiences diverge, that's fine.


This article misses the point of the Django ORM and migration system completely.

They are meant to allow for the dev of an ecosystem.

This is why you have so many django plugins, and why they are so easy to integrate and play together.

This is why it's easy to have a dev envs, tests, signals and api generation with django.

It trades performance and flexibility for this. It's nothing more than a classic technical compromise.

You may need raw sql later on in your django project. But if you feel like starting with it, just go with flask. Django makes no sense.

The other funny thing is how easy it is to write migrations in Python: via the RunPython capabilities of the Django ORM you can migrate data over. I've yet to need to write any SQL aside from stored procedures and database views.

You can even export all the contents of your DB to JSON or YAML and import it in Django, so you probably very well could go from SQLite to MySQL / Postgres if you really wanted to.


The other actually funny thing is, Django doesn't force you to use its own ORM. You can write code that doesn't use its ORM. You can override anything and everything.

> how easy it is to write migrations in Python via the RunPython capabilities of the Django ORM you can migrate data over.

Yes! This functionality is extremely useful in a number of cases and isn't all that discoverable. At least in my experience, devs often don't learn about it until they've spent at least a bit of time working on a Django project. Once you get the (basic) syntax down it's quite handy.
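As a sketch (app and model names hypothetical), a common pattern is to keep the actual transformation as a plain function that the RunPython forwards callable delegates to, which also makes it unit-testable outside the database:

```python
# The migration file itself would look roughly like this:
#
#   from django.db import migrations
#
#   def forwards(apps, schema_editor):
#       Book = apps.get_model("library", "Book")   # historical model
#       for book in Book.objects.all():
#           book.full_title = merge_titles(book.title, book.subtitle)
#           book.save(update_fields=["full_title"])
#
#   class Migration(migrations.Migration):
#       dependencies = [("library", "0007_add_full_title")]
#       operations = [migrations.RunPython(forwards, migrations.RunPython.noop)]

def merge_titles(title, subtitle):
    """Pure transformation used by the hypothetical data migration above."""
    if not subtitle:
        return title or ""
    return f"{title}: {subtitle}"
```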

> You can write code that doesn't use its ORM.

You can, but what's the point using Django then? Everything is built around the ORM API, that's how admin is automatically generating pages, etc, etc. You lose the main advantage of Django if you don't use it.

ASP.NET has the same issue, but you could always override the admin bits. If you really want another library to manage Django's database usage, you can go to town; nobody has bothered coding alternatives because the ORM is sufficient. Web frameworks are not what you use if you're concerned with micro-managing how you access a database; web microframeworks, on the other hand, are designed for that level of detail. Something like CherryPy or Flask is more what you'd want in that case.

For me it's much easier to write them in SQL or PL/pgSQL... and later I can use them from Node, Rust, Go, or whatever. For me the DB (the model) is the source of truth that will last, independent of the language that interacts with it...

Also, the problem gets worse if you add something like Elasticsearch to the mix, which has its own schema that also has to be migrated.

But now you've got the issue that the code needs to match the DB. Django takes care of this for you. It's really quite nice.

That's the trade-off: if you decouple the DB from the web framework, you have to take care of both. :) Anyway, I prefer to trust the DB (with real data on it) rather than a bunch of complicated things on the Django side. (I suspect something similar happened to me when I started reviewing crazy (shoot-your-foot) class-based elaborations.)

You can always tell Django's ORM not to manage any of your data and write all your queries. You can also tell Django to give you the query it generates, tweak it (if / as necessary) and run from there.

But why? Granted if you have super complex queries, sure. But 95-99% of the queries are simple typically.

Agree. This article misses the use case of people using third party apps with migrations. For example, the Wagtail content management team can create migrations that can be applied to both MySQL and Postgres databases. Not sure if that is possible writing SQL files.

Django is incredibly flexible: even if you decide not to use the ORM, or to just use part of it and write a lot of raw SQL, it can still be a great choice.

Or even if you ditch the SQL database entirely and use Elasticsearch or some weird custom back-end.

And then (with Elastic) why would you use a synchronous web framework? ;)

Synchronous web frameworks have been powering some of the largest websites on the internet for decades.

That said, Django is actively working toward async views and ORM. It's happening.

In fact, it was just released a few days ago.

Then you still need to upgrade Django... (I've seen so many sprint points spent on it... :/)

I agree with you, but I think SQLAlchemy/Alembic handle the compromises better than the Django ORM.

With Django I always ended up hitting brick walls and ugly hacks, whereas SQLAlchemy more easily fit any use case that I had over the years, mainly because of it having multiple layers of abstraction so I can choose and mix as applicable. The learning curve is a bit steeper, I admit.

I haven't used Sqlalchemy drop in replacement for Django ORM yet, but would consider it for a future project.

I think one issue with SQLAlchemy is that, somehow, they are the worst of all worlds. They have multiple ways to run queries (iirc, the "model" way, the non-model way, and the raw way) but it is all the "model" way underneath, so you end up with all kinds of weird hacks if you don't want to use Python as the source of truth (i.e. if you run a query, have no data model in Python, but want to turn the data into a dictionary). It seemed like a very flabby codebase (i.e. they started out doing X and tried to add stuff on top... lots of classes that appear to do the same thing).

The only ORM that I have used that handled more complex stuff well was Jooq (Java). Having your database as the source of truth (and running a script to update Java Objects) works far better than the SQLAlchemy approach. I use Django now for a simple app but I used both Jooq and SQLAlchemy on something more complex (the issue was references to parts of the data models across components), and the former was brilliant (although learning curve) and the latter was just awful. I don't think SQLAlchemy docs are good either (again, started out as X and they just kept adding and adding).

That is just my experience. I used Django right now, there are no crazy cross-cutting concerns between components, and it works fine.

It is the other way around. The ORM sits on top of Core, the SQL abstraction layer.

This separation is what I like about it, if I know how to do something in SQL I can use the tools sqla provides and it maps very close to 1:1. Or I can use a more ORM driven approach similar to what Django does. I can adapt it over time as the project evolves.

But yes, to write the queries in Python and not raw strings, you need to have either a Table or an ORM model defined, which you could generate from the database [1]. But I admit, except for some scripts, I have never used it where Python wasn't the source of truth, so I don't know what that all involves.

I don't know about very old versions of SQLAlchemy, but at least in the 4.5 years that I've been using it, I have been much happier with it for medium-sized projects than with any other similar tool in Python I have tried, because of this flexibility.

[1] Edit: Or use reflection to load schema information from the database

  >>> messages = Table('messages', meta, autoload=True, autoload_with=engine)
  >>> [c.name for c in messages.columns]
  ['message_id', 'message_name', 'date']

That's the nice thing about SQLAlchemy: it's a lib, a toolkit. You can start using it on whatever PostgreSQL schema you find. Django only works if things are done the way it wants. Would you use Django to take over an old legacy PHP project with PostgreSQL? You can use SQLAlchemy for that.

You can absolutely use Django for an old PHP project with PostgreSQL: just fake the initial migrations and it'll pick up right where PHP left off with the same schema.

Many of the points made apply equally to Flask/SQLAlchemy or any other framework (Rails, ...).

They optimise the wrong things and decrease security.

All of them optimize for different things (e.g. SQLAlchemy and Django have very different philosophies), so are they all wrong? What should they focus on instead?

> They optimise the wrong things and decrease security.

They're not meant to optimize for every edge case, they're meant to be as generic as sanely possible.

Migrations can, and do, run arbitrary SQL. At the end of the day the migrations framework is just a SQL dependency graph, it can be used without models if that's what you wish. By itself that is way better than ad-hoc scripts.

> At the end of the day the migrations framework is just a SQL dependency graph.

It's not quite. During operation it actually mutates an in-memory shadow of your model structure. And that's really clever, but if you've got a large set of models it can be really slow.

If it gets too slow you can always squash migrations. But since Django 2 I haven't seen the need to do it, runs in a few seconds even with hundreds of migrations.

It also breaks in interesting ways: I discovered just today that constants defined on the Model subclass are not available when you use `get_model()`. I suspect methods wouldn't be accessible either?

As far as I know, this is by design; your migrations are supposed to operate on a frozen state of what your models and code _were at a point in time_.

If you relied on code outside of said migration, you would be breaching that frozen state and could end up with unintended side effects (e.g. running a migration created 2 years ago that imports your code that changed today). This is why you sometimes have to copy-paste logic into your Python migrations, but you also guarantee that the migration always runs the same way.
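
A sketch of the intended pattern: a data migration that uses the historical model via `apps.get_model()`, which carries the fields as they were at that point but none of your current class's methods or constants. The app, model, and field names here are hypothetical, and running this requires a real Django project.

```python
from django.db import migrations


def backfill_slugs(apps, schema_editor):
    # Historical model: fields only. Custom methods, properties, and
    # class constants from the *current* Article class are not present,
    # so any logic you need has to live inside the migration itself.
    Article = apps.get_model("blog", "Article")
    for article in Article.objects.filter(slug=""):
        article.slug = article.title.lower().replace(" ", "-")
        article.save(update_fields=["slug"])


class Migration(migrations.Migration):
    dependencies = [("blog", "0007_article_slug")]
    operations = [
        migrations.RunPython(backfill_slugs, migrations.RunPython.noop),
    ]
```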

That's built on top of the dependency graph though. You can use it without any models - just a bunch of inter-dependent `RunSQL` statements across different apps in your project.
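
For example, here is a sketch of a models-free migration that is just an ordered SQL step in the dependency graph (the app, table, and constraint names are hypothetical; this only runs inside a Django project):

```python
from django.db import migrations


class Migration(migrations.Migration):
    # Depending on another app's migration makes the graph order them correctly.
    dependencies = [("accounts", "0003_auto")]
    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE accounts_user "
                "ADD CONSTRAINT age_positive CHECK (age >= 0);",
            reverse_sql="ALTER TABLE accounts_user "
                        "DROP CONSTRAINT age_positive;",
        ),
    ]
```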

I wish I could work on these magical projects that people with blogs work on, in which everything works like it's supposed to do. In my experience, though, it's not unusual to be forced by management to migrate to a new RDBMS a year into a project. The last time it happened to me, we were using Rails and had very little difficulty migrating from PostgreSQL to MySQL. We didn't need to alter application code or rewrite tests and only had to manage config changes and migrate the data. I imagine the transition wouldn't be much different if we had been using Django.

Seems like the author should've just used a much less opinionated framework. The last thing I want to do is choose a batteries-included framework and then take on the responsibility of maintaining my own system to manage SQL migrations. Also unclear where the author has worked, but I've had to replay a bunch of migrations in order on many occasions.

The implication of the article is that if you're using a relational database management system, you need to know SQL.

I agree. I wholeheartedly recommend Stephane Faroult's SQL Success as a beginner book. Read that with attention, and you'll be ahead of 95% of people messing around with RDBs.

Also agree. SQL is like JavaScript: it's been in almost all the projects I've worked on in the last 20 years. (Django is just another tool, an implementation detail.)

A lot of the problems stated are characteristic of any ORM, not just Django's. People pick ORMs specifically to trade control for convenience and tooling. Any escape hatch the ORM provides is by definition hacky.

These days when I know it’s a long term project, I prefer laying out the domain entities as plain classes and wrapping database operations as class methods with raw database statements.

This lets me use the underlying databases to its full potential and gives me enough control without leaking database code into application logic.
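
A minimal sketch of that pattern, using stdlib sqlite3 so it is self-contained (the table and method names are made up for illustration):

```python
import sqlite3


class UserRepo:
    """Plain domain class wrapping database operations as methods
    holding raw SQL, so no database code leaks into application logic."""

    def __init__(self, conn):
        self.conn = conn

    def create_table(self):
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
        )

    def add(self, name, active=True):
        cur = self.conn.execute(
            "INSERT INTO users (name, active) VALUES (?, ?)",
            (name, int(active)),
        )
        return cur.lastrowid

    def active_names(self):
        rows = self.conn.execute(
            "SELECT name FROM users WHERE active = 1 ORDER BY name"
        )
        return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
repo = UserRepo(conn)
repo.create_table()
repo.add("ada")
repo.add("bob", active=False)
print(repo.active_names())  # ['ada']
```

Application code only ever calls `repo.active_names()`; the SQL stays in one place and can use any engine-specific feature you like.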

> How many times have you worked on a project and, after 1-2 years of development, you have changed to a different database?
>
> I can tell you how many times it happened to me, I counted them: exactly never.

I guess this is the crux.

You hardly know in advance this is going to happen.

I experienced this transition (mysql>pgsql), though not with Django.

The ORM was perfect. If I recall correctly, only the customized SQL queries (only a few were present) had to be adapted; everything else just worked.

11/10, would choose such a framework again.

I've migrated a production app from mysql to pgsql although not with django but a different orm and language. The app also had some custom sql. The reason was for proper json support as we have a 2tb table with unstructured data (1tb at the time) that was slowing mysql down severely (most used table by far). A later version of mysql with json support was not available on aws so wasn't an option. This was about two to three years in so the app was actually almost finished and in maintenance mode.

It worked. It took many months and copying the data and doing the migration was a pita. The orm helped in that I had to modify fewer queries by hand, but modifying the queries we did have was trivial also. It's one of only two times off the top of my head that the orm was a net benefit rather than a liability in the apps that use it. Would I use an orm in the future in case this comes up? Hell no. I'd pick a proper db first, aka pgsql (that decision was before my time). But even if I didn't, I'd still prefer to convert the sql one by one from mysql to pgsql rather than incur the horrific penalties and extremely slow run time and development time the orm imposed on us. Developing with the orm was many times slower and the runtime was ten times slower than the actual query time due to hydration. Rewriting a few thousand queries is nothing compared to losing months to the slow development speed and the slow runtime speed which required a completely separate rewrite of most of our functionality alongside the orm just so we could serve api requests in a manner somewhat reminiscent of not completely slow.

So yeah, even when switching from mysql to pgsql, the orm wasn't worth it. It never is or has been.

We've got a core product running on Django, and one thing the author doesn't mention is testing migrations. The migration system is wonderful, except for the difficulty in testing migrations. There's no sane/official way to do this. And it's such an important thing that I don't get why the Django crew haven't tackled it.

I've used django-test-migrations before and was really happy with it. https://pypi.org/project/django-test-migrations/

We've played around with that and it's on the roadmap. But it just adds another set of fixtures to herd.

What sort of proposal would you put forth? I generally install one of my production backups onto a staging server (secured appropriately for production data), and run the migrations against it to test.

That's not the intention of that comment; I think he's talking about writing a migration plus a test that ensures the data migration is applied correctly.

My strategy here is to have a staging server where migrations are applied before they can hit production.

As someone else pointed out, migration without CI can be dangerous and difficult to coordinate with deployment.

This is what we do but it's too manual for my taste.

Pretty good read! But I don't like how the author tries to characterize generic ORMs like Django's as "portability at all costs". I don't think the point is that you can switch DBs a few years down the road if you choose. He's right, nobody does that. Ever :) The advantage is that you can start a new project with whatever DB you already have, or are comfortable with. So, if you want to begin a new Django project, you aren't locked into using MySQL, or Postgres, or what-have-you. It is true you sacrifice being able to leverage the full gamut of features a particular engine might offer, but that is a compromise you make with any ORM, right? Or, if the limitations are too much to bear, just don't use the Django ORM; it's not required, it's just the path of least resistance.

I agree Django's migration system has a few unfortunate quirks (most notably the help-text-change-migrations), but it's also worked great for Zulip across over 300 database migrations, and I wouldn't recommend anyone do their own migration framework on raw SQL files by hand.

The issues mentioned here in this article are real (E.g. we have a lint rule blocking importing other code inside migration files), but the author is also ignoring the benefits of Django's system, namely that it's a well-documented system that does most of the work of managing migrations for you, and does a good enough job that the vast majority of developers working on a large software project (like Zulip, with 96 people who've contributed changes to database migrations) will do it correctly without special training just from the Django documentation.

And the downsides are certainly minor compared to the obvious downsides with rolling one's own system with raw SQL (E.g. the fact that nothing validates that the model definitions match the database schema in the author's proposal!). I wouldn't recommend anyone take the author's advice for what to do instead, but this article is helpful reading for anyone curious about what issues one will encounter after a few years working with Django.

> How many times have you worked on a project and, after 1-2 years of development, you have changed to a different database?

FWIW, Zulip was originally developed with MySQL, and then about a year into developed we switched to Postgres. It also had a phase where we ran SQLite in development environments. Building on Django made our life a lot easier doing this.

> Once the migrations have been applied to all your testing and production systems, and all your developers' databases, just remove them.

This is only true if you don't care about being able to use tools like `git bisect` where you might want to switch a development environment to older versions and back again (I do this often as part of).

> If you do it in a loop, it results in a "ripple load", meaning you'll run 50 queries, each one fetching one record, while you could have run a single query fetching 50 records, or just adding an extra JOIN to another query.

This is my biggest complaint about Django's ORM -- lazy fetching of foreign keys is great in a management shell but rarely desirable in production, and it results in inexperienced developers doing database queries in loops all the time. We address this with a bit of middleware that formats log lines like this e.g. `41ms (db: 10ms/2q)` (showing how much time was spend doing database queries and how many queries were done) -- which lets one easily identify problematic cases of this both in development and in production.

But if Django had a setting that just disables the lazy fetching of foreign keys feature (making accessing an unfetched foreign key an error), I'd turn it on for Zulip.

See also https://zulip.readthedocs.io/en/latest/subsystems/schema-mig....
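
A minimal sketch of such a middleware (the class name, log format helper, and printing to stdout are my guesses, not Zulip's actual code; note `connection.queries` is only populated when `DEBUG=True` or the cursor is forced into debug mode):

```python
import time


def format_stats(total_ms, db_ms, queries):
    # Produces lines like: 41ms (db: 10ms/2q)
    return f"{total_ms:.0f}ms (db: {db_ms:.0f}ms/{queries}q)"


class QueryStatsMiddleware:
    """Logs total request time, DB time, and query count per request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Imported here so the formatting helper stays usable standalone.
        from django.db import connection

        start = time.monotonic()
        response = self.get_response(request)
        total_ms = (time.monotonic() - start) * 1000
        db_ms = sum(float(q["time"]) for q in connection.queries) * 1000
        print(format_stats(total_ms, db_ms, len(connection.queries)))
        return response
```

Scanning those log lines makes ripple loads stand out immediately: a view reporting `/2q` one day and `/52q` the next is almost certainly doing queries in a loop.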

I really appreciate your comment, I think I will add your log line idea to my project at work!

I also wanted to disable lazy fetching of foreign keys, and I wrote something for it, then decided it was too risky to use in production (monkey patching) so I ended up just running nplusone [0] locally and fixed the issues it detected. Really would like Django to add this feature so it can be enforced properly though!

[0] - https://github.com/jmcarp/nplusone

> This is my biggest complaint about Django's ORM -- lazy fetching of foreign keys is great in a management shell but rarely desirable in production, and it results in inexperienced developers doing database queries in loops all the time.

I may be misunderstanding you, but isn't this what select_related and prefetch_related exist for?

Those are how you optimize it, but first you have to recognize the need. The easiest way to make this visible is to increase logging or install django-debug-toolbar, so people can see when the query count jumps, but the best way to prevent regressions is to use something like the testing framework's assertNumQueries() method.


I've used that with a fuzzy number class which reports itself __eq__ to any number below a specified max, so you can say e.g. "my homepage should take fewer than 10 queries" and not have to touch it for minor changes, while still catching when someone implements a new feature in a way which breaks your select_related usage.
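
A sketch of such a fuzzy class (the name `MaxQueries` is mine). It works because `assertNumQueries()` compares the executed count with `==`, and Python falls back to the right-hand operand's `__eq__` when the left-hand `int` doesn't know how to compare:

```python
class MaxQueries:
    """Compares equal to any query count at or below the given limit,
    for use as the expected value in Django's assertNumQueries()."""

    def __init__(self, limit):
        self.limit = limit

    def __eq__(self, other):
        return other <= self.limit

    def __repr__(self):
        # Shown in the assertion message when the budget is exceeded.
        return f"<= {self.limit} queries"


# In a Django TestCase this would look something like:
#     with self.assertNumQueries(MaxQueries(10)):
#         self.client.get("/")
```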

Interesting to compare this to Ruby on Rails's ORM, which takes a relatively hands-off approach. ActiveRecord will produce a migration for you when you generate a model from the command line, but otherwise migrations are entirely the developer's responsibility. And the way he suggests handling patches sounds a lot like how ActiveRecord does it: keep track of which migrations have already been applied and only apply newer ones, and for a new database just create the up-to-date schema rather than applying the entire history of migrations.

There's nothing stopping you from hand-writing Django migrations if you choose. Django compares the state of the models to the state of the database after all migrations have been run to determine if new migrations are necessary, so as long as you've covered all of your model changes in your hand-rolled migration you'll be fine.
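
One way to keep the model state in sync while hand-writing the SQL is `RunSQL` with `state_operations` (a sketch with hypothetical app/table/field names; this requires a Django project to run):

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("blog", "0008_previous")]
    operations = [
        migrations.RunSQL(
            # The SQL that actually runs against the database.
            sql="ALTER TABLE blog_article "
                "ADD COLUMN views integer NOT NULL DEFAULT 0;",
            reverse_sql="ALTER TABLE blog_article DROP COLUMN views;",
            # Tell the autodetector the model changed too, so makemigrations
            # doesn't think this AddField is still missing.
            state_operations=[
                migrations.AddField(
                    "article", "views", models.IntegerField(default=0)
                ),
            ],
        ),
    ]
```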

Ah, good to know. I have no experience with Django and the article gave me the impression that Django migrations can’t be handled manually.

All these systems are patch based systems (for an alternative approach see the comment in this thread from the Migra developer).

However, he is suggesting the patches should be written in SQL. By default in Django the patches are written in Python, which is then used to automatically generate the SQL.

> keep track of what migrations have already been applied and only apply newer ones

The Django migration system does this for you, and it does it really well.

I made a custom Django system for my personal website in 2013. It became too old to work with my web host. So now I just run Django locally long enough to run a wget mirror. Voilà, static content.

This is tricky, on the one hand keeping everything related to the database close to or in the database is probably cleaner (in the clean architecture sort of way), as it eases a separation between the business rules and the data saving and loading.

On the other hand, doing this loses a lot of the sugar and development benefits of using the ORM (either Django or SQLAlchemy, which I prefer), that are often enough—and perhaps preferred—for 95% of projects.

If I were to choose, I'd pick the ORM, though trying to place it as high in the stack as possible.

Terrible article, especially the part about migrations. The automatic migration generation of Django of course cannot infer exactly how you want your schema to change if you start renaming or altering columns.

The author's solution, instead of building on top of migrations, is to completely abandon a tried and tested migration framework in favor of arbitrarily executing SQL files? This is madness.

Can Django work with multiple tenants on different PostgreSQL schemas? Can Django migrations handle that in an easy way (better than SQL)? When projects grow old (10+ years), things tend to get complicated (lots of hands, lots of changed requirements), and that's when the advice in the article makes a lot of sense. In the end the data is there, and the model is the database.

History Lesson (if you care about migrations):

TLDR; I disagree: Django migrations are the best, and make reusable apps possible!

I've been using Django since before they introduced migrations, and it was hell! Django only had "syncdb" then, which would create your tables initially, but then you were on your own to write and run SQL manually for any model changes. This became especially annoying when using 3rd party apps, because if those apps changed the schema, you'd have to manually update your database for the 3rd party changes.

Then came along django-south which was a reusable app which provided migrations and solved all of the above problems. It became almost the standard approach, and third party apps often provided south migrations.

South became so popular, and was such a big improvement to the development workflow, that its codebase was merged into Django (after some significant API improvements) to become Django migrations. I can still remember having to convert all the files in the migrations directory from South to Django migrations.

Now we have a de facto standard for how migrations work, and they allow easy use of custom SQL. Seriously, if you hate Python-based migrations as much as the author does, write custom SQL but put it in a Django migration, so you get all the benefits of being able to roll back migrations etc.

Summary: when writing all of your own code for a single project, there are only minor benefits from migrations. But they are still worth it! In fact, I write lots of raw SQL in my migrations, e.g. for constraints, triggers, NOTIFY, and just modifying my data. If I run SQL, it goes in a migration.

But migrations shine the brightest for reusable first- and third-party apps, which, let's admit, we all use in most projects. They allow a single reusable app to be used in all projects regardless of database. Also, as a consumer of a reusable app, I can let migrations do their thing and not worry whether an update will break my database.

I would also add to the article: when you try to alter a busy table, give it a timeout (otherwise you will lock the world): SET statement_timeout = 50; ALTER ... ;

My advice is to avoid Django ORM altogether

Probably they should try django-north instead of their patch_db.py

Another important thing the author doesn't mention: if you keep your SQL model on the Postgres side, you will be able to introduce other languages (for example, to optimize hot paths).

I agree 100% with the author. If you check other open source projects (in other languages), they are mostly based on versioned SQL patch files. Sorry, but the migrations feature, from my point of view, is absolutely over-engineered.

I also agree with him that it doesn't make much sense to keep Django+DRF when, with FastAPI+Pydantic+asyncpg, you get all the real innovation on the Python side.

Also, it's not a good argument that following the Django religion will lead to proper future maintenance. Think of Channels and the big mess around them :)

Sorry, I'm perhaps a bit biased: I started with Django 0.9 and used it until 1.9, and right now I'm not using it for anything new. Along the way I swapped to Go, and later came back to Python thanks to asyncio.

What mess around channels are you talking about? I've never heard anything about such a mess?

Not everyone uses Django to build a REST API that connects with a JavaScript SPA. And even if I were, I would still be skeptical that FastAPI could do everything Django does for me. (I do like Pydantic a lot, and actually use it sometimes in my Django apps.)

Look at what will happen to Channels in the next releases, when Django goes async...

Having a websocket push info from a Redis pub/sub doesn't need all of these crazy abstractions; it's just an async view reading from aioredis.

If you need an ORM (a real one), use SQLAlchemy. The Django ORM has its own design flaws that a lot of DBAs will complain about.

Anyway, my latest thinking is that it's better to keep your model logic as close to SQL as you can; you'll swap out Python or Django before PostgreSQL :)
