Schema migrations merged into Django master (github.com)
394 points by Spiritus on Aug 23, 2013 | 108 comments

I'd like to point out that this is just a first working version, and there's still some work left to do, but I'm very relieved this is finally merged!

Andrew, is there info somewhere comparing the new stuff to current South in terms of what features differ etc? I'm particularly curious if there have been any changes to the way the orm freezing infra works.

ORM freezing is basically gone, in favour of a smarter approach where the previous state is derived by reading all of the previous migrations into an in-memory data structure.
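
To make the "replay the migrations" idea concrete, here is a minimal sketch. The operation classes and `build_state` are hypothetical stand-ins invented for illustration, not Django's actual API; the only point is that the current state falls out of applying every recorded operation in order, so nothing needs to be frozen into each migration file.

```python
from dataclasses import dataclass


# Hypothetical, simplified operations; not Django's real classes.
@dataclass
class CreateModel:
    name: str
    fields: dict

    def apply(self, state):
        state[self.name] = dict(self.fields)


@dataclass
class AddField:
    model: str
    field: str
    field_type: str

    def apply(self, state):
        state[self.model][self.field] = self.field_type


@dataclass
class RemoveField:
    model: str
    field: str

    def apply(self, state):
        del state[self.model][self.field]


def build_state(migrations):
    """Replay every operation, in order, into a dict of model -> fields."""
    state = {}
    for operations in migrations:
        for op in operations:
            op.apply(state)
    return state


# Three migrations, each a list of operations.
history = [
    [CreateModel("Author", {"id": "AutoField", "name": "CharField"})],
    [AddField("Author", "email", "EmailField")],
    [RemoveField("Author", "name")],
]
state = build_state(history)
```

Here `state` ends up describing `Author` with only `id` and `email`, without any migration having stored a snapshot of the models.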

Andrew's blog has some great write-ups of how the new system has evolved: http://www.aeracode.org/category/django-diaries/

wow. I peeked, and that's gonna take me some time to wrap my head around. Really interesting approach. Does this mean though that if you ever fall out of favor w/ the db.migrations API and for instance make a manual change to a piece of schema, that the faked orm will be forever broken because it won't know about that change?

Is this pluggable? Can users stick with South (or whatever migration backend) and still use the same API?

No, this is a complete rewrite and a different way of doing migrations than South currently does. There should also be a South 2 eventually that is mostly compatible with this new migration design, for those stuck on older Django.

Thank you kindly for this, Andrew. South has been an indispensable tool for me and it's wonderful to see it in Django core.

This is exciting for Django, and it's exciting for OSS in general, as it demonstrates the viability of crowdfunding (certain) OSS development.

Congratulations to Andrew and to all involved.

Migrations are per-app, but the command is shown as

  python manage.py makemigrations

Using South, it was

  python manage.py schemamigration app_name

How does Django know which app to build migrations for?

It does them all at once. The new system knows about interdependencies, so if you have ForeignKeys between apps it'll make all of them in the right order.
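
A rough sketch of what "in the right order" means, assuming three hypothetical apps and using the standard library's `graphlib` (Python 3.9+) rather than whatever Django does internally. Each app maps to the set of apps its ForeignKeys point at, and a topological sort yields an order in which every dependency comes first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical apps: each maps to the apps its ForeignKeys point at.
depends_on = {
    "accounts": set(),
    "catalog": {"accounts"},
    "orders": {"accounts", "catalog"},
}

# static_order() yields each app only after everything it depends on.
order = list(TopologicalSorter(depends_on).static_order())
```

With these dependencies the only valid order is `accounts`, `catalog`, `orders`.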

Has it solved that issue where say I have a TinyMCE field, then I ditch TinyMCE and use CKEditor, that I'd still need TinyMCE installed in order to do the old migration, despite it just being a TextField database-wise?

So if I understand correctly, "makemigrations" will convert all apps to use migrations except apps which were created using previous versions of Django?

It will prompt you if an app doesn't yet have migrations and ask if you want it to. That prompt is currently a bit aggressive - asking you every time - but that will be fixed soon.

If any devs read this, for MySQL, you may want to consider using the update method adopted by many larger companies - something like pt-online-schema-change (or make the alter method pluggable so someone else can).

Basically, create a table as a copy of the old table, set up triggers to update the new table as data pours into the old table, alter the new table, and then do a rename.

This buys you some rollback capabilities, but more importantly it limits the impact on production traffic as the alters run. Of course, this is only really an issue on tables with hundreds of thousands of rows, but it's better than the naïve approach.
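
The steps above can be sketched end to end. This is only an illustration under loud assumptions: it uses SQLite through Python's `sqlite3` module as a stand-in for MySQL, the table and trigger names are made up, and it mirrors inserts only (a real tool also has to handle updates and deletes and copy rows in chunks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO foo (name) VALUES ('a'), ('b');

    -- 1. Create an empty copy of the table and alter the copy.
    CREATE TABLE foo_new (id INTEGER PRIMARY KEY, name TEXT);
    ALTER TABLE foo_new ADD COLUMN email TEXT DEFAULT '';

    -- 2. Mirror writes that hit the old table while the copy runs.
    CREATE TRIGGER foo_mirror_insert AFTER INSERT ON foo
    BEGIN
        INSERT OR REPLACE INTO foo_new (id, name) VALUES (NEW.id, NEW.name);
    END;
""")

# 3. Copy the rows that already existed.
conn.execute("INSERT OR IGNORE INTO foo_new (id, name) SELECT id, name FROM foo")

# A write arriving mid-migration lands in both tables via the trigger.
conn.execute("INSERT INTO foo (name) VALUES ('c')")

# 4. Swap the tables; the old one stays around as a rollback path.
conn.executescript("""
    DROP TRIGGER foo_mirror_insert;
    ALTER TABLE foo RENAME TO foo_old;
    ALTER TABLE foo_new RENAME TO foo;
""")

rows = conn.execute("SELECT id, name, email FROM foo ORDER BY id").fetchall()
```

After the swap, `foo` has the new column and all three rows, including the one written while the copy was in progress.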

I've come to think this is something that I'll do earlier rather than later on projects moving forward. The last few projects I've worked on have put this off "until we get bigger", which inevitably ends up affecting agility and the types of migrations you're willing to do and when you're willing to release them. As complex as the approach is, I think I'd rather bite it off early in future projects.

Or, use postgres!

Personally, I don't start worrying about it until it actually becomes a problem (i.e. when an alter actually takes more than a few seconds to run). At which point it's easy to migrate to using pt-online-schema-change.

Then again, I do most of my DB maintenance manually (I have DBA experience), so while I do use Django in production, I have less need to rely on generic automation tools.

Or use PostgreSQL, Oracle, MSSQL, or... lots of options there.

[EDIT] As a final thought on the topic, never be afraid of making schema changes, even in MySQL. It may require a bit more work to implement, but there are a bevy of solutions (most of which can be automated) which can limit or eliminate the "pain" caused by such alters.

You still suffer a brief outage though, no? We thought about this a lot in a previous life of mine, and the problem with what I think you describe is that once everything was said and done, you'd be left with:

  ALTER TABLE foo RENAME TO foo_old;
  -- You're down right here!
  ALTER TABLE foo_new RENAME TO foo;
  -- …and we're back!

(For those new to MySQL, ALTER TABLE is immune to transactions.) Admittedly, the downtime might be shorter than when applying the migration directly, so it might be worthwhile. (We just ate the downtime, and tested things to avoid surprises.)

Or, assuming you already run, at least, one secondary write or read-only server, use Percona XtraDB Cluster / Galera. There are caveats for pt-online-schema-change that make it not usable in all cases, and they are not necessarily edge cases.

There are caveats for PXC as well, completely unrelated to schema maintenance, which may prevent you from using it regularly.

Plus, PXC uses InnoDB under the covers as well, which means you still have to wait while alters on tables occur, and they cannot be done in transactions (well, not entirely true, but close enough for these purposes).

I started coding Django a couple of months ago and was blown away when I attempted to add a field to an existing model only to get yelled at. StackOverflow comments told me that South was the only way to go. I guess I just couldn't believe it wasn't an already existing feature. It seems that adding fields is common enough to be needed early on...

Happy about the news!

In the past (not sure about now) the Django docs (and thus, I have to assume, the devs) were somewhat against automated migrations. The rationale was that the developer should understand his or her data models and do migrations manually to avoid problems caused by mistakes made by the migration machinery. Obviously this attitude has changed, perhaps because of the success of South and the fact that most people would rather have the process automated, even if it causes occasional weirdness.

It was documented as something that was desired, but not something that would go into Django core until the dust settled as to what the best approach was; it's not something to hurry into.

What's been implemented now is significantly better than what South had. If they'd just decided to adopt South, that wouldn't have happened. Thus, it's good that it happened. :-)

> I guess I just couldn't believe it wasn't an already existing feature.

It existed. Django has been able to generate SQL DDL on a per-app basis since the first versions. The problem is that people wanted an automatic way of doing that, but there are a couple of different approaches. South (and the proposed built-in feature) is not the best approach for everybody.

I too was amazed when I first tried out Django to find out that there was no way to change my model, migrate and test. I guess that's one of the things they don't teach you upon experiencing ORM for the first time.

I'd be curious to read that question... I have also been working Django on the side and have encountered migration issues.

Fantastic work! I've been following the blog posts and it was interesting to understand the decisions you made along the way.

Also, thanks for supporting Oracle straight away. When you first mentioned the kickstarter I noted the possibility that oracle support would either lag or be missed entirely, effectively making oracle support in django a second-class citizen.

I'm starting a new project in a few weeks (on Oracle RAC), so I'll try to test as much as possible.

I'm also using Oracle, and have long lamented the lack of South (or anything like it). I will be a happy test user for this as well.

A bit OT, but how do you guys handle database changes between multiple branches? Say I added some fields in branch A across multiple apps and then I need to go back to master to do a fix. How do you revert the changes before checking out master?

If you write backwards migrations as well as forwards, South will do that for you, so presumably the Django migration system will do as well.
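
As a rough illustration of why paired forwards/backwards steps make this work, here is a minimal sketch. The migration names, the set-of-columns "schema" and `migrate_to` are all hypothetical; the idea is just that with a reverse for every step you can walk to any earlier point and back again.

```python
def make_migration(name, column):
    """Build a hypothetical migration that adds (or, reversed, drops) a column."""
    def forwards(schema):
        schema.add(column)

    def backwards(schema):
        schema.discard(column)

    return name, forwards, backwards


MIGRATIONS = [
    make_migration("0001_initial", "id"),
    make_migration("0002_add_title", "title"),
    make_migration("0003_add_slug", "slug"),
]


def migrate_to(schema, applied, target):
    """Run forwards or backwards steps until `target` migrations are applied."""
    while applied < target:
        MIGRATIONS[applied][1](schema)   # forwards
        applied += 1
    while applied > target:
        applied -= 1
        MIGRATIONS[applied][2](schema)   # backwards, newest first
    return applied


schema = set()
applied = migrate_to(schema, 0, 3)        # on branch A: everything applied
on_branch = set(schema)
applied = migrate_to(schema, applied, 1)  # before switching back to master
```

Rolling back to a "checkpoint" is then just a matter of naming the target migration.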

I know this. But I'm wondering if there is an easy way to do that, like a checkpoint that I can rollback all migrations to a specific point.

Some people just choose to use separate database names for each branch.

"Migrations specify which other migrations they depend on - including earlier migrations in the same app - in the file, so it’s possible to detect when there’s two new migrations for the same app that aren’t ordered."

This is a fantastic feature. I've worked on projects that use a South fork that only uses a migration number, eg 0003.py, specifically to cause version control to trigger a merge conflict.
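
The check described in that quote could be sketched like this, assuming a hypothetical mapping from migration name to the migrations it depends on (not Django's real graph code). Two migrations that both build on the same parent and that nothing else depends on are exactly the "not ordered" case.

```python
# Hypothetical migration names mapped to the migrations they depend on.
dependencies = {
    "0001_initial": [],
    "0002_add_title": ["0001_initial"],
    "0003_add_slug": ["0002_add_title"],     # written on branch A
    "0003_add_summary": ["0002_add_title"],  # written on branch B
}


def leaf_migrations(deps):
    """Return the migrations that no other migration depends on."""
    depended_on = {parent for parents in deps.values() for parent in parents}
    return sorted(name for name in deps if name not in depended_on)


leaves = leaf_migrations(dependencies)
conflict = len(leaves) > 1  # more than one leaf means the history has forked
```

Unlike the filename trick, this does not rely on version control noticing the clash.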

Finally, but too little too late in my opinion.

I love python, but I would rather use Rails for web stuff at this point. So many 3rd party libraries are needed to do what Rails can do right out of the box.

That's fine. Some people consider Django bloated. Some prefer flask. Others will prefer something with more stuff included than Django. It sounds like you never were really the target audience for Django?

What exactly is the target audience for Django? I actually love using Python and the Python design philosophy. If I'm not part of the target audience for Django, who is?

Well, Django is already very big, so the target audience is people using Django ;-) It's a very stable, Python-based web framework with a huge community and wide support. Some Python hackers find it too big, with too many bloated features. As far as I'm concerned, any time I start with a smaller web framework I end up adding most of Django's functionality over the next few days, so meh.

IMHO, for new projects, Django is clearly one of the top contenders.

Yep. If half the world complains it does too much and the other half that it does too little then maybe it's got it about right ;-)

I've asked why people prefer Rails to Django repeatedly on HN. The best answer I've gotten [1] is that they're more-or-less feature-equivalent now, but historically speaking Rails beat Python to the niche and got entrenched due to network effects.

Can you give me some specific examples of stuff Rails does but Django doesn't?

Also, Ruby sucks. I know over a dozen languages, but every time I've tried to learn Ruby it's been an awful experience that's ended with me quitting in frustration at the totally unnecessary obtuseness and complexity of the core language's syntax [2] [3], before even getting into the enormous number of additional moving parts in a web framework as complex as Rails.

[1] https://news.ycombinator.com/item?id=5784117

[2] https://news.ycombinator.com/item?id=5157886

[3] https://news.ycombinator.com/item?id=5872899

Most Rails projects that I've seen rely on way more third party "gems" than the average Django project.

Regardless, using 3rd party libraries anywhere is pretty straightforward these days, so I don't see how the inconvenience of adding a library to your requirements list could outweigh the fact that Rails means you are using Ruby when you could be using Python.

Dumb question here. Instead of keeping all these migration files, why can't the migration just compare the models to the database directly and make the appropriate changes?

For a couple of reasons that I can think of. You might want to do a rollback. You may have multiple copies of the database (various dev versions, staging, production), and multiple changesets might build up in one environment before progressing to another which must be specifically ordered. There may be changes that need to be manually coded or tweaked, and you need some intermediate structure for that to happen.

So what about data migrations? In the docs I couldn't find something about it. And it's pretty important for a complete migration tool, isn't it?

Not sure what you mean. South has something called a "data migration". However, the only difference between "data migrations" and "schema migrations" is that data migrations do not include a dry run. Dry runs require transaction support. Not all databases have it (MySQL).

Other than that, what is your definition of a "data migration"?

A data migration involves mutating data in the table. For example, consider a timestamp column that was originally recorded in some local timezone but that you need to convert to store everything in UTC + offset format. No changes to the table itself, but a fairly significant change to the data.

You could manage that operation using a Python script, but it would potentially be slow and might make it hard to take advantage of database-specific features. A data migration tool allows you to describe this operation in succinct declarative code and then the migration tool will figure out how to get the database to do that with maximum safety and efficiency.

Or you could do it in raw SQL as well, but that's a bit uncomfortable if your project has used the ORM interface for all database operations. You'd also have to write different SQL for each backend. A data migration tool can generate the correct SQL for any backend you plug in.
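
For reference, the "Python script" version of the timestamp example might look roughly like this. It is a sketch under stated assumptions: the rows are plain dicts rather than ORM objects, and the legacy data is assumed to sit at a fixed UTC-5 offset (a real zone with daylight saving would need proper timezone data).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the legacy column holds naive timestamps at a fixed UTC-5 offset.
LEGACY_ZONE = timezone(timedelta(hours=-5))


def to_utc(naive_local):
    """Attach the legacy offset to a naive timestamp and convert it to UTC."""
    return naive_local.replace(tzinfo=LEGACY_ZONE).astimezone(timezone.utc)


# Hypothetical rows standing in for the table contents.
rows = [
    {"id": 1, "created": datetime(2013, 8, 23, 9, 30)},
    {"id": 2, "created": datetime(2013, 8, 23, 22, 0)},
]

# The table's shape is untouched; only the stored values change.
migrated = [{**row, "created": to_utc(row["created"])} for row in rows]
```

Note how the second row rolls over to the next day in UTC, which is the kind of detail that makes this a real change to the data even though the schema is identical.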

I've always done data migrations like that, by hand, with straight up SQL. It would seem to me that there'd be way too many combinations of changes to properly implement in a generic way.

Although, after reading your comment, I'd never taken 3rd party libraries into account. Adding/changing timezone support to a datetime field is a really good example of when that might be necessary.

The groundwork has been laid with an API and supporting infrastructure for schema migrations though. It looks like it should be possible to implement data migrations by adding Operations to django.db.migrations.operations, and suitable sql and methods to each of the backends. You could then hand-craft the migration files.

I'm assuming that the logic behind migrations wouldn't verify the database/model structure before running such a data migration which is probably flat out wrong though.

Understood. But I cannot imagine a data migration framework that would be anything more than a scaffold for custom functions.

Imagine the difficulty involving forward and backward data migrations. I cannot imagine a way to automate this.

In any case, you can't have it both ways. Either you do it in SQL, or you do it in Python using the ORM. There isn't a third option.

People still argue over Flask vs Django.

Django is a working car that comes with a lot of features and is more difficult to customize (and you eventually have to start hacking at it for custom features).

Flask is the assembled chassis, engine, steering (no frame, body). It's up to you to build it into a car of your choice.

Eventually, though, you will have to look at tuning the base system for your own special system.

So is this slated for 1.7 now? Or is it still on track for 1.6?

It's for 1.7, always has been.

OK. But it hasn't always been: according to the Kickstarter (http://www.kickstarter.com/projects/andrewgodwin/schema-migr...), 1.6 is mentioned, and "1.7 at latest". Considering 1.6 isn't out, I was hoping for it!

Noted, apologies. 1.6 is in beta and feature frozen, and it's certainly been the plan for a while to target 1.7.

I wouldn't sweat it tho - the 1.5-1.6 cycle has been pretty quick, and I'm sure as soon as the migrations work is ready and tested the Django core team will be pretty eager to push a 1.7 release.

I'll work on tempering my excitement :)

I briefly read the doc in the changeset, but couldn't find whether the migration files from South are compatible with these new built-in migrations. Does anyone know?

No, South migrations are not compatible; you'll have to collapse your South migrations down and start with these.

Yes! My £50.00 was well spent :-)

Awesome, Django might be usable out of the box now. :)

Yes, for people who couldn't type

    pip install south

That's why I specified "out of the box." It doesn't make sense for there to be completely mandatory extensions like South -- they should be bundled in, fitting with the "batteries included" culture of Python.

I don't really feel out of the box when I type

    pip install django south

Regrettably, how you feel doesn't change the well-understood and long-standing definition of "out of the box."

You have to A) know about this ridiculously named package called "south" and B) remember to install it every time, though. That's why it's not "out of the box".

Here, Google even backs us up on this - https://www.google.com/search?q=define%3A+out+of+the+box

"Out of the box is the term used to denote items, functionalities, or features that do not require any additional installation. ..." - via http://en.wikipedia.org/wiki/Out_of_the_box

South didn't (really) support Oracle, which is an officially supported Django backend. I know there aren't many Django users on Oracle, comparatively, but South wasn't a perfect solution.

django-extensions is still missing, especially the shell_plus and runscript commands.

Welcome to Rails circa 5 years ago! :)

Except no, Rails doesn't have this. Not even remotely close.

Rails still makes you write your migrations by hand. It also forces you to distribute your critical constraints, validations and dependencies over a usually incomplete model class, and incremental migration files.

In Rails/AR there is no place in the filesystem to find out about the currently declared schema.

It's a fundamentally broken design.


schema.rb contains an _incomplete_ reflection on the current SQL schema. No references, no relationships.

If you don't know the difference between AR and a declarative ORM you may want to refrain from dropping smarty oneliners.

At absolutely no point in our dev team's existence did we find a huge need to have more than the model files and/or schema.rb as a schema reference. If you can't infer what you need to from there, then maybe web dev's not your bag, baby

> then maybe web dev's not your bag, baby

Over the past 15yrs I've written web-apps in 7 different languages and a wide range of frameworks.

From your comment I can infer that you know Ruby on Rails, and that's pretty much it. Maybe you should reconsider your tone?

He is a Ruby hipster. They are all about smart one liners.

Rails has had decent migrations for at least 7 years, probably longer. (I only started 7 years ago, and it was already pretty standard)

Man, time flies...

Heck, even PHP frameworks have had migrations for ages!

Somewhat related: Are there any plans to integrate Alembic into SQLAlchemy?

It's about as integrated as it could get while still being a separate package, since it's written by the author of SQLAlchemy.

Most users of SQLAlchemy tend to use several libraries anyway, as opposed to one framework (the Django style).

I hope not. Not everything has to be a kitchen sink.

This is great news for Django - congratulations to all involved!

Neat. Once this hits a stable release, does it mean that South should be considered deprecated for new projects?

For anything on Django 1.7 (the version it will come out in), yes, South will no longer be needed. Older versions of Django should get a South 2 in the next few months with a lot of this code backported.

But does it work with custom ("initial") sql?

This was one of the failing points of south in my opinion.

Why wouldn't Django just use South instead of creating ever more dependencies on itself?

The migrations support for Django is being written by the author of South, based on everything he learnt on that project. It's essentially South 2 but baked in to Django.

Now let's get rid of WSGI and make async signals possible!

Are you trying to combine WebSockets or realtime communication with Django? I'd look at Cody Soyland's excellent work on this. http://codysoyland.com/2011/feb/6/evented-django-part-one-so... or look at the Django example in gevent-socketio https://github.com/abourget/gevent-socketio

I absolutely love python and I'm using it daily but Django seems so....outdated?! Biggest web framework needs someone to raise funding via kickstarter to build a schema migration?

2013 and you still can't build a REST application without installing 3rd party code? I mean c'mon, there's angularjs, emberjs, knockout, extjs etc etc etc out for years. Don't you see a light at the end of the tunnel, or are you the type of team that builds more tunnel?

Sorry to say but news like this just make me sad. Still happy I chose flask + sqlalchemy for my website. Django is somewhere in 2004-2005 still. I lose the admin of course but I hate general things anyway (one size fits all type of thing).

I'm struggling to understand why you're saying you can't build a REST api with Django out of the box. It's easy to serialise python data to pretty much any format. Why do you need Django to do any of that for you? Things like RESTapi add extra features, not core REST functionality.
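
To illustrate how little machinery the serialisation part needs, here is a minimal sketch using only the standard library. It is plain WSGI rather than a Django view, and the `ARTICLES` data and URL layout are made up; the same `json.dumps` call would sit just as happily inside a framework view.

```python
import json

# Hypothetical in-memory data standing in for model instances.
ARTICLES = {1: {"id": 1, "title": "Schema migrations merged"}}


def app(environ, start_response):
    """A tiny WSGI callable answering /articles/<id> with JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    article = None
    if len(parts) == 2 and parts[0] == "articles" and parts[1].isdigit():
        article = ARTICLES.get(int(parts[1]))
    status = "200 OK" if article else "404 Not Found"
    body = json.dumps(article or {"error": "not found"}).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


# Call the app directly, the way a WSGI server would.
captured = {}


def start_response(status, headers):
    captured["status"] = status


response = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/articles/1"}, start_response)
payload = json.loads(b"".join(response))
```

What third-party REST packages add on top of this is things like authentication, pagination and content negotiation, not the ability to return JSON.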

I recently built (more or less) a REST API and a Backbone front-end using Python. Why would I want to use Django to do so when there are so many other (more appropriate) options? Does Django really need to be a framework that does everything?

I started before Flask supported Python 3, so I used bottle, psycopg2 and a bunch of other python packages. Knowing what I was interested in accomplishing, I was pretty easily able to stitch together these libraries to create a solid, to-the-point codebase.

Having spent a bunch of time working on various backbone-heavy apps with Rails I don't have much interest in using a big, full-stack, opinionated framework to act as a REST API. There is so much overhead and so many opinions to work against.

Couldn't agree more, especially when you're basically speaking JSON with some authentication/authorization. I even question myself when using Flask-Restful vs just making my views return straight up JSON....but...that's a different discussion.

I especially avoided the word "rails". You know why

And the rest of the world who see Django as an amazing RAD tool with a huge app ecosystem keep plugging along ... don't get me wrong, I'm a huge advocate of frameworks like AngularJS, but that approach is just not suitable for most meat-and-potatoes public-facing web applications.

> there's angularjs, emberjs, knockout, extjs etc etc etc

That answers your complaint. There are many choices, no clear best choice. Frameworks need to balance being not too opinionated (making choices for you) and being convenient/batteries included.

South, over time, became the obvious, best choice.

When a REST plugin gets to the same level it probably will be added.

In the meantime, doing pip install on the REST plugin of your choice and adding a single line to installed apps is easy enough that even an incompetent developer can do it.

How does Flask + SQLAlchemy make it easier to build a REST interface than out-of-the-box Django?

Because the Django ORM is so limiting in so many aspects? It's great if your queries are simple...

Do you have any examples of queries you've found difficult to express using the Django ORM? I'm genuinely interested in hearing - I've been using it for 5+ years now and I rarely find the need to step outside it. I frequently write queries that do joins across 6 or more tables with count aggregations in single lines of Python code with it.

It doesn't work that way for me. A common workflow for me ended up being: write the complex query in SQL (because it wasn't obvious how to get going in the ORM), and then spend a long time reverse-engineering it to get it to work in the Django ORM. Lots of joins, sub-selects, ordering, grabbing the latest single record from a time-sorted set in the middle, etc. In the end, I had to think about how to use the ORM, and then I still needed to know the SQL for optimizations...

I think the question was more along the lines of what bundled module or functionality in Flask + SQLAlchemy specifically and uniquely makes REST interfaces easy to build.

Because I know of no such thing -- it's still very much roll-your-own or use a third-party library.

> 2013 and you still can't build a REST application without installing 3rd party code?

Of course you can. There are some third party apps that might save some time.

rick-rolling? I know it's not hard to plug in a module. It's the fact that you have to turn for support to yet another company/person if Django upgrades its codebase, breaking things. I'm playing right now with meteorjs and I know what it means to wait months for one thing to be fixed and, when it is fixed, wait a few more months for the plugins to catch up.

It's mind-boggling that somebody would criticize the use of third-party libraries for a REST interface shortly after advocating an approach based on gluing together third-party libraries.

Not advocating it at all. Read again please.

  Still happy I chose flask + sqlalchemy for my website.

OK ...

Over Django? Any day! I'm not an advocate of that. It's just the alternative I had at that time (which doesn't mean I advocate it). I'm still watching Django very closely and, as I said, fundraising for a schema migration makes me sad.

You just stitched together at least two different modules there, making your initial comment seem quite silly. I think the proper response would be, "Why doesn't Flask have an ORM? It's 2013. I can't even build REST API without importing a third party module!"

Disclaimer: I like Flask, I like Django. I've used both for cases where they were the most appropriate choices. Different philosophies, apples to oranges. The poo-poo'ing Django for not baking schema migrations in is silly. The poo-poo'ing a great kickstarter project is even more silly. We chipped in without a thought to make this happen quickly, and it was worth every penny for us.

There have been a number of attempts at schema migrations with Django, and the devs correctly weren't in a rush to main-line them before things shook out. The community eventually settled around South and Andrew's excellent leadership.

Now that we had a methodology picked out, the only barrier was finding the time to get this baked into mainline. We essentially paid for Andrew to get this done fast and well, and he has delivered. If anything, I'd like to see MORE Kickstarting of these larger projects in the open source world. I'll gladly chip in if I'm building a business using your software. It's a no-brainer investment with a high probability of good returns.

In case my meaning was unclear, what I meant was that it's perfectly possible to make a REST application without third party code. The existence of third party code does not mean that it's impossible, or even particularly difficult, without them.

I built a RESTful API without installing 3rd party code in Django, don't know why you think this can't be done.

what's wrong with using plugins to extend functionality? these aren't dinky/crappy/unreliable side projects we're talking about. these are solidly engineered, production ready plugins that are used by hundreds or even thousands of projects.
