Andrew, is there info somewhere comparing the new stuff to current South in terms of what features differ etc? I'm particularly curious if there have been any changes to the way the orm freezing infra works.
wow. I peeked, and that's gonna take me some time to wrap my head around. Really interesting approach. Does this mean though that if you ever fall out of favor w/ the db.migrations API and for instance make a manual change to a piece of schema, that the faked orm will be forever broken because it won't know about that change?
No, this is a complete rewrite and a different way of doing migrations than South currently does. There should also be a South 2 eventually that is mostly compatible with this new migration design, for those stuck on older Django.
Has it solved that issue where say I have a TinyMCE field, then I ditch TinyMCE and use CKEditor, that I'd still need TinyMCE installed in order to do the old migration, despite it just being a TextField database-wise?
If any devs read this, for MySQL, you may want to consider using the update method adopted by many larger companies - something like pt-online-schema-change (or make the alter method pluggable so someone else can).
Basically, create a table as a copy of the old table, set up triggers to update the new table as data pours into the old table, alter the new table, and then do a rename.
This buys you some rollback capabilities, but more importantly it limits the impact on production traffic as the alters run. Of course, this is only really an issue on tables with hundreds of thousands of rows, but it's better than the naïve approach.
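For anyone who hasn't seen the shadow-table approach, here is a rough sketch of the steps. It uses Python's built-in sqlite3 module purely so it runs anywhere; it is not how pt-online-schema-change is implemented. The table names (foo, foo_new, foo_old), the trigger name and both function names are made up for illustration, and only inserts are mirrored (a real tool also handles updates and deletes, and copies rows in small chunks).

```python
import sqlite3


def prepare_shadow(conn, column_ddl):
    """Build an altered copy of `foo` and keep it in sync with new inserts."""
    cur = conn.cursor()
    # 1. Create an empty shadow table with the same columns as the original.
    cur.execute("CREATE TABLE foo_new (id INTEGER PRIMARY KEY, name TEXT)")
    # 2. Apply the schema change to the (still empty) shadow table.
    cur.execute(f"ALTER TABLE foo_new ADD COLUMN {column_ddl}")
    # 3. Mirror inserts that reach the original table from now on.
    cur.execute(
        "CREATE TRIGGER foo_sync_insert AFTER INSERT ON foo BEGIN "
        "INSERT OR REPLACE INTO foo_new (id, name) VALUES (NEW.id, NEW.name); "
        "END"
    )
    # 4. Backfill the rows that already existed before the trigger was added.
    cur.execute("INSERT OR IGNORE INTO foo_new (id, name) SELECT id, name FROM foo")
    conn.commit()


def swap_tables(conn):
    """Replace `foo` with the altered copy, keeping the old table for rollback."""
    cur = conn.cursor()
    cur.execute("DROP TRIGGER foo_sync_insert")
    cur.execute("ALTER TABLE foo RENAME TO foo_old")
    cur.execute("ALTER TABLE foo_new RENAME TO foo")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO foo (name) VALUES ('existing row')")
    prepare_shadow(conn, "email TEXT")
    # Traffic keeps arriving while the copy exists; the trigger mirrors it.
    conn.execute("INSERT INTO foo (name) VALUES ('written during migration')")
    swap_tables(conn)
    print(conn.execute("SELECT id, name, email FROM foo").fetchall())
```

Keeping foo_old around after the swap is what gives you the rollback option: if the new table misbehaves, you can rename it back.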
I've come to think this is something that I'll do earlier rather than later on projects moving forward. The last few projects I've worked on have put this off "until we get bigger", which inevitably ends up affecting agility and the types of migrations you're willing to do and when you're willing to release them. As complex as the approach is, I think I'd rather bite it off early in future projects.
Personally, I don't start worrying about it until it actually becomes a problem (i.e. when an alter actually takes more than a few seconds to run). At which point it's easy to migrate to using pt-online-schema-change.
Then again, I do most of my DB maintenance manually (I have DBA experience), so while I do use Django in production, I have less need to rely on generic automation tools.
Or use PostgreSQL, Oracle, MSSQL, or... lots of options there.
[EDIT] As a final thought on the topic, never be afraid of making schema changes, even in MySQL. It may require a bit more work to implement, but there are a bevy of solutions (most of which can be automated) which can limit or eliminate the "pain" caused by such alters.
You still suffer a brief outage though, no? We thought about this a lot in a previous life of mine, and the problem with what I think you describe is that once everything was said and done, you'd be left with:
ALTER TABLE foo RENAME TO foo_old;
-- You're down right here!
ALTER TABLE foo_new RENAME TO foo;
-- …and we're back!
(For those new to MySQL, ALTER TABLE is immune to transactions.) Admittedly, the outage might be shorter than when applying the migration directly, so it might be worthwhile. (We just ate the downtime, and tested things to avoid surprises.)
Or, assuming you already run, at least, one secondary write or read-only server, use Percona XtraDB Cluster / Galera. There are caveats for pt-online-schema-change that make it not usable in all cases, and they are not necessarily edge cases.
There are caveats for PXC as well, completely unrelated to schema maintenance, which may prevent you from using it regularly.
Plus, PXC uses InnoDB under the covers as well, which means you still have to wait while alters on tables occur, and they cannot be done in transactions (well, not entirely true, but close enough for these purposes).
I started coding Django a couple of months ago and was blown away when I attempted to add a field to an existing model only to get yelled at. StackOverflow comments told me that South was the only way to go. I guess I just couldn't believe it wasn't an already existing feature. It seems that adding fields is common enough to be needed early on...
In the past (not sure about now) the Django docs (and thus, I have to assume, the devs) were somewhat against automated migrations. The rationale was that the developer should understand his or her data models and do migrations manually to avoid problems caused by mistakes made by the migration machinery. Obviously this attitude has changed, perhaps because of the success of South and the fact that most people would rather have the process automated, even if it causes occasional weirdness.
> I guess I just couldn't believe it wasn't an already existing feature.
It existed. Django has been able to generate SQL DDL on a per-app basis since the first versions. The problem is that people wanted an automatic way of doing that, but there are a couple of different approaches. South (and the proposed built-in feature) is not the best approach for everybody.
I too was amazed when I first tried out Django to find out that there was no way to change my model, migrate and test. I guess that's one of the things they don't teach you upon experiencing ORM for the first time.
Fantastic work! I've been following the blog posts and it was interesting to understand the decisions you made along the way.
Also, thanks for supporting Oracle straight away. When you first mentioned the Kickstarter I noted the possibility that Oracle support would either lag or be missed entirely, effectively making Oracle support in Django a second-class citizen.
I'm starting a new project in a few weeks (on Oracle RAC), so I'll try to test as much as possible.
A bit OT, but how do you guys handle database changes between multiple branches? Say I added some fields in branch A across multiple apps and then I need to go back to master to do a fix. How do you revert the changes before checking out master?
"Migrations specify which other migrations they depend on - including earlier migrations in the same app - in the file, so it’s possible to detect when there’s two new migrations for the same app that aren’t ordered."
This is a fantastic feature. I've worked on projects that use a South fork that only uses a migration number, e.g. 0003.py, specifically to cause version control to trigger a merge conflict.
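To make the quoted idea concrete, here is a rough sketch of how explicit dependencies let you spot two unordered migrations in one app. This is not Django's actual code; the dict layout, the function name find_conflicts and the migration names are all invented for the example. The idea is simply that an app with more than one "leaf" (a migration nothing else in that app depends on) has two heads that nobody has ordered yet.

```python
from collections import defaultdict


def find_conflicts(migrations):
    """Return {app: [leaf names]} for every app with more than one leaf.

    `migrations` maps (app, name) -> list of (app, name) dependencies.
    """
    # Collect every migration that a later migration in the same app builds on.
    depended_on = set()
    for (app, _name), deps in migrations.items():
        for dep in deps:
            if dep[0] == app:
                depended_on.add(dep)

    # Whatever is left over per app is a leaf of that app's history.
    leaves = defaultdict(list)
    for app, name in migrations:
        if (app, name) not in depended_on:
            leaves[app].append(name)

    return {app: sorted(names) for app, names in leaves.items() if len(names) > 1}


if __name__ == "__main__":
    graph = {
        ("blog", "0001_initial"): [],
        # Two branches each added a migration on top of 0001.
        ("blog", "0002_add_title"): [("blog", "0001_initial")],
        ("blog", "0002_add_slug"): [("blog", "0001_initial")],
        ("auth", "0001_initial"): [],
    }
    print(find_conflicts(graph))
```

Adding a migration that depends on both branch heads leaves the app with a single leaf again, which is why this works without relying on file names colliding in version control.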
That's fine. Some people consider Django bloated. Some prefer flask. Others will prefer something with more stuff included than Django. It sounds like you never were really the target audience for Django?
Well, Django is already very big, so the target audience is people using Django ;-) It's a very stable, Python-based web framework with a huge community and wide support. Some Python hackers find it too big, with too many bloated features. As far as I'm concerned, any time I start with a smaller web framework I end up adding most of Django's functionality over the next few days, so meh.
IMHO, for new projects, Django is clearly one of the top contenders.
I've asked why people prefer Rails to Django repeatedly on HN. The best answer I've gotten is that they're more-or-less feature-equivalent now, but historically speaking Rails beat Python to the niche and got entrenched due to network effects.
Can you give me some specific examples of stuff Rails does but Django doesn't?
Also, Ruby sucks. I know over a dozen languages, but every time I've tried to learn Ruby it's been an awful experience that's ended with me quitting in frustration at the totally unnecessary obtuseness and complexity of the core language's syntax, before even getting into the enormous number of additional moving parts in a web framework as complex as Rails.
Most Rails projects that I've seen rely on way more third party "gems" than the average Django project.
Regardless, using 3rd party libraries anywhere is pretty straightforward these days, so I don't see how the inconvenience of adding a library to your requirements list could outweigh the fact that Rails means you are using Ruby when you could be using Python.
For a couple of reasons that I can think of. You might want to do a rollback. You may have multiple copies of the database (various dev versions, staging, production), and multiple changesets might build up in one environment before progressing to another which must be specifically ordered. There may be changes that need to be manually coded or tweaked, and you need some intermediate structure for that to happen.
Not sure what you mean. South has something called a "data migration". However, the only difference between "data migrations" and "schema migrations" is that data migrations do not include a dry run. Dry runs require transaction support. Not all databases have it (MySQL).
Other than that, what is your definition of a "data migration"?
data migration involves mutating data in the table. for example, consider a timestamp column that was originally recorded in some local timezone but that you need to convert to store everything in UTC + offset format. No changes to the table itself, but a fairly significant change to the data.
you could manage that operation using a Python script, but it would potentially be slow and might make it hard to take advantage of database specific features. A data migration tool allows you to describe this operation in succinct declarative code and then the migration tool will figure out how to get the database to do that with maximum safety and efficiency.
or you could do it in raw SQL as well, but that's a bit uncomfortable if your project has used the ORM interface for all database operations. you'd also have to write different SQL for each backend. a data migration tool can generate the correct SQL for any backend you plug in.
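As a rough illustration of the timestamp example, this is what the per-row transformation could look like if you went the plain Python script route. It is only a sketch: the function names and row layout are made up, and it assumes every row was recorded at one known, fixed offset from UTC. Real local time zones with daylight saving would need a proper time zone database rather than a single offset.

```python
from datetime import datetime, timedelta, timezone


def to_utc_with_offset(local_naive, offset_minutes):
    """Turn a naive local timestamp into (naive UTC timestamp, offset in minutes)."""
    # Attach the assumed fixed offset, then shift to UTC.
    local_tz = timezone(timedelta(minutes=offset_minutes))
    aware = local_naive.replace(tzinfo=local_tz)
    utc = aware.astimezone(timezone.utc).replace(tzinfo=None)
    return utc, offset_minutes


def migrate_rows(rows, offset_minutes):
    """Apply the conversion to (id, local_timestamp) rows.

    Returns (id, utc_timestamp, offset_minutes) tuples; the table structure
    you would write these back to is outside the scope of this sketch.
    """
    return [(row_id, *to_utc_with_offset(ts, offset_minutes)) for row_id, ts in rows]


if __name__ == "__main__":
    rows = [(1, datetime(2013, 8, 1, 9, 30)), (2, datetime(2013, 8, 1, 23, 45))]
    # Assume the data was recorded at UTC-05:00.
    for migrated in migrate_rows(rows, offset_minutes=-300):
        print(migrated)
```

Doing this row by row in Python is exactly the slow path described above; pushing the same arithmetic into a single UPDATE on the database side is where backend-specific SQL comes in.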
I've always done data migrations like that, by hand, with straight up SQL. It would seem to me that there'd be way too many combinations of changes to properly implement in a generic way.
Although, after reading your comment, I'd never taken 3rd party libraries into account. Adding/changing timezone support to a datetime field is a really good example of when that might be necessary.
The groundwork has been laid with an API and supporting infrastructure for schema migrations though. It looks like it should be possible to implement data migrations by adding Operations to django.db.migrations.operations, and suitable sql and methods to each of the backends. You could then hand-craft the migration files.
I'm assuming that the logic behind migrations wouldn't verify the database/model structure before running such a data migration which is probably flat out wrong though.
That's why I specified "out of the box." It doesn't make sense for there to be completely mandatory extensions like South -- they should be bundled in, fitting with the "batteries included" culture of Python.
Except no, Rails doesn't have this. Not even remotely close.
Rails still makes you write your migrations by hand. It also forces you to distribute your critical constraints, validations and dependencies over a usually incomplete model class, and incremental migration files.
In Rails/AR there is no place in the filesystem to find out about the currently declared schema.
At absolutely no point in our dev team's existence did we find a huge need to have more than the model files and/or schema.rb as a schema reference. If you can't infer what you need to from there, then maybe web dev's not your bag, baby
For anything on Django 1.7 (the version it will come out in), yes, South will no longer be needed. Older versions of Django should get a South 2 in the next few months with a lot of this code backported.
I absolutely love python and I'm using it daily but Django seems so....outdated?! Biggest web framework needs someone to raise funding via kickstarter to build a schema migration?
2013 and you still can't build a REST application without installing 3rd party code? I mean c'mon, AngularJS, Ember.js, Knockout, Ext JS etc. have been out for years. Don't you see a light at the end of the tunnel, or are you the type of team that builds more tunnel?
Sorry to say but news like this just make me sad. Still happy I chose flask + sqlalchemy for my website. Django is somewhere in 2004-2005 still. I lose the admin of course but I hate general things anyway (one size fits all type of thing).
I'm struggling to understand why you're saying you can't build a REST API with Django out of the box. It's easy to serialise Python data to pretty much any format. Why do you need Django to do any of that for you? Things like RESTapi add extra features, not core REST functionality.
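To show how little is needed, here is a minimal sketch of a read-only JSON endpoint using nothing beyond the Python standard library. It is deliberately framework-agnostic (a bare WSGI callable, not a Django view), and the ITEMS data, the /items URL layout and the name app are all invented for the example.

```python
import json

# Hypothetical in-memory data standing in for whatever your models hold.
ITEMS = {1: {"id": 1, "name": "first"}, 2: {"id": 2, "name": "second"}}


def app(environ, start_response):
    """WSGI callable serving GET /items and GET /items/<id> as JSON."""
    method = environ.get("REQUEST_METHOD", "GET")
    parts = environ.get("PATH_INFO", "").strip("/").split("/")

    if method == "GET" and parts == ["items"]:
        status, payload = "200 OK", list(ITEMS.values())
    elif (
        method == "GET"
        and len(parts) == 2
        and parts[0] == "items"
        and parts[1].isdigit()
        and int(parts[1]) in ITEMS
    ):
        status, payload = "200 OK", ITEMS[int(parts[1])]
    else:
        status, payload = "404 Not Found", {"error": "not found"}

    # Serialising plain Python data is a single standard-library call.
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Development server only; serves on http://localhost:8000/items
    make_server("localhost", 8000, app).serve_forever()
```

What the third-party packages add on top of this is the extras: content negotiation, pagination, authentication hooks, browsable docs and so on.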
I recently built (more or less) a REST API and a Backbone front-end using Python. Why would I want to use Django to do so when there are so many other (more appropriate) options? Does Django really need to be a framework that does everything?
I started before Flask supported Python 3, so I used bottle, psycopg2 and a bunch of other python packages. Knowing what I was interested in accomplishing, I was pretty easily able to stitch together these libraries to create a solid, to-the-point codebase.
Having spent a bunch of time working on various backbone-heavy apps with Rails I don't have much interest in using a big, full-stack, opinionated framework to act as a REST API. There is so much overhead and so many opinions to work against.
Couldn't agree more, especially when you're basically speaking JSON with some authentication/authorization. I even question myself when using Flask-Restful vs just making my views return straight up JSON....but...that's a different discussion.
And the rest of the world who see Django as an amazing RAD tool with a huge app ecosystem keep plugging along ... don't get me wrong, I'm a huge advocate of frameworks like AngularJS, but that approach is just not suitable for most meat-and-potatoes public-facing web applications.
Do you have any examples of queries you've found difficult to express using the Django ORM? I'm genuinely interested in hearing - I've been using it for 5+ years now and I rarely find the need to step outside it. I frequently write queries that do joins across 6 or more tables with count aggregations in single lines of Python code with it.
I don't have any with me. A common workflow for me ended up being to write the complex query in SQL (because it wasn't obvious how to get it going in the ORM), and then spend a long time reverse engineering it to get it to work in the Django ORM. Lots of joins, sub-selects, ordering, grabbing the latest single record from a time-sorted set in the middle, etc. In the end, I had to think about how to use the ORM, and then I still needed to know the SQL for optimizations...
rick-rolling? I know it's not hard to plug in a module. It's the fact that you have to turn for support to yet another company/person if Django upgrades its codebase and breaks things. I'm playing right now with meteorjs and I know what it means to wait months for one thing to be fixed and, when it is fixed, wait a few more months for the plugins to catch up.
Over Django? Any day! I'm not an advocate of that. It's just the alternative I had at that time (which doesn't mean I advocate it). I'm still watching Django very closely and, as I said, fundraising for a schema migration makes me sad.
You just stitched together at least two different modules there, making your initial comment seem quite silly. I think the proper response would be, "Why doesn't Flask have an ORM? It's 2013. I can't even build a REST API without importing a third party module!"
Disclaimer: I like Flask, I like Django. I've used both for cases where they were the most appropriate choices. Different philosophies, apples to oranges. The poo-poo'ing Django for not baking schema migrations in is silly. The poo-poo'ing a great kickstarter project is even more silly. We chipped in without a thought to make this happen quickly, and it was worth every penny for us.
There have been a number of attempts at schema migrations with Django, and the devs correctly weren't in a rush to main-line them before things shook out. The community eventually settled around South and Andrew's excellent leadership.
Now that we had a methodology picked out, the only barrier was finding the time to get this baked into mainline. We essentially paid for Andrew to get this done fast and well, and he has delivered. If anything, I'd like to see MORE Kickstarting of these larger projects in the open source world. I'll gladly chip in if I'm building a business using your software. It's a no-brainer investment with a high probability of good returns.
In case my meaning was unclear, what I meant was that it's perfectly possible to make a REST application without third party code. The existence of third party code does not mean that it's impossible, or even particularly difficult, without them.
what's wrong with using plugins to extend functionality? these aren't dinky/crappy/unreliable side projects we're talking about. these are solidly engineered, production ready plugins that are used by hundreds or even thousands of projects.