Django Diaries / 23rd Oct 2013

Flat as a Pancake

After a wonderful relaxing week at /dev/fort on the Isle of Eigg - somewhere I can heartily recommend if you're after amazing landscapes in remote places - work continues on migrations, in amongst the preparations for my move to the US.

The last two weeks have been entirely focused on migration squashing - the ability to take an existing set of migrations, replace them with just one migration, and then have all new installations use that one migration instead of taking minutes running all the small migrations you committed that evening when you just felt like adding some random new columns.

But how?

The way this is done is reasonably simple - we take the existing migrations, extract all of their operations, and concatenate them into one big list.

This is, clearly, going to do the same thing as all the smaller migrations, and it's quite nice not having a hundred names flying past you when you first run migrate. It's still almost as slow, though; behind the scenes you're still issuing loads of ALTER TABLE commands on brand-new tables.

This is where the optimiser comes in. It takes that big list of operations, and starts optimising away some of them. If it sees a matching CreateModel and DeleteModel, it'll remove them both. If it sees a CreateModel followed by an AddField for the same model, it'll combine them into an updated CreateModel with that new field.
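To make the two rules above concrete, here's a toy sketch in plain Python. It is not Django's optimiser: the CreateModel, AddField and DeleteModel classes below are simplified stand-ins I've made up for illustration, fields are just (name, type) string pairs, and the sketch ignores things a real optimiser has to worry about, such as operations on other models that depend on the one being collapsed.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for migration operations (not Django's classes).
@dataclass
class CreateModel:
    name: str
    fields: list = field(default_factory=list)  # (field_name, field_type) pairs


@dataclass
class AddField:
    model_name: str
    name: str
    field_type: str


@dataclass
class DeleteModel:
    name: str


def squash(migrations):
    """Concatenate the operations of every migration into one flat list."""
    return [op for operations in migrations for op in operations]


def _find_create(result, model_name):
    """Return the most recent surviving CreateModel for model_name, if any."""
    for op in reversed(result):
        if isinstance(op, CreateModel) and op.name == model_name:
            return op
    return None


def optimise(operations):
    """Apply the two toy rules in a single pass over the operation list."""
    result = []
    for op in operations:
        if isinstance(op, CreateModel):
            # Copy so that folding fields in later never mutates the input.
            result.append(CreateModel(op.name, list(op.fields)))
        elif isinstance(op, AddField):
            target = _find_create(result, op.model_name)
            if target is None:
                result.append(op)  # model was created outside this list
            else:
                # Rule: CreateModel + AddField -> updated CreateModel.
                target.fields.append((op.name, op.field_type))
        elif isinstance(op, DeleteModel):
            target = _find_create(result, op.name)
            if target is None:
                result.append(op)  # model was created outside this list
            else:
                # Rule: matching CreateModel + DeleteModel -> remove both.
                result = [kept for kept in result if kept is not target]
        else:
            result.append(op)
    return result


# Three small migrations, as lists of operations.
migrations = [
    [CreateModel("Author", [("id", "AutoField")]),
     CreateModel("Draft", [("id", "AutoField")])],
    [AddField("Author", "name", "CharField")],
    [DeleteModel("Draft"), AddField("Author", "email", "EmailField")],
]

flat = squash(migrations)
optimised = optimise(flat)
```

Here the five original operations collapse into a single CreateModel for Author carrying all three fields; the Draft model never gets created at all, which is exactly the sort of saving that avoids ALTER TABLE on a brand-new table.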

This means that the resulting migration is both simpler and faster than the ones it replaces, greatly speeding up initial migration time, especially on databases that aren't so great at running ALTER TABLE (I won't name names. You can guess what I'm referring to.)

The new squashed migration file looks like a normal migration file except for a new replaces attribute that lists all of the migrations this file squashed; of course, something has to read that new attribute...

Graph modelling

As I've mentioned previously, Django represents migrations as a digraph, with the migrations as nodes and their dependencies as edges. Squashed migrations don't quite fit into this model; you can only use one if the migrations it replaces are either all applied or all unapplied, as there's no way to start running operations from the middle of a migration.

Instead of representing them as part of the graph, then, they're dealt with as the graph is constructed. Django first loads all the migration nodes, and then goes through the squashed migrations. If it finds a squashed migration where the migrations it replaces are either all applied or all unapplied, it cuts them out of the graph and replaces them with the new squashed migration, re-pointing all the incoming dependency edges to it as well.
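As a rough illustration of that replacement step, here's a small sketch using plain dicts and sets. It's my own simplification rather than Django's loader code: the build_graph function, the bare migration names, and the idea of passing in an "applied" set are all assumptions made for the example.

```python
def build_graph(nodes, squashed, applied):
    """
    Build a dependency graph, swapping in squashed migrations where possible.

    nodes:    dict of migration name -> set of names it depends on
    squashed: dict of squashed migration name -> list of names it replaces
    applied:  set of migration names already applied to the database
    """
    graph = {name: set(deps) for name, deps in nodes.items()}

    for new_name, replaces in squashed.items():
        statuses = [name in applied for name in replaces]
        # Only usable if the replaced set is all applied or all unapplied.
        if any(statuses) and not all(statuses):
            continue  # halfway through: keep using the old files

        replaced = set(replaces)
        # The squashed node inherits any dependencies pointing outside the set.
        external = set()
        for name in replaced:
            external |= graph.get(name, set())
        external -= replaced

        # Cut the old nodes out and add the squashed one in their place.
        for name in replaced:
            graph.pop(name, None)
        graph[new_name] = external

        # Re-point incoming edges from the old nodes to the squashed node.
        for name in list(graph):
            if graph[name] & replaced:
                graph[name] = (graph[name] - replaced) | {new_name}

    return graph


# Hypothetical migration names for one app.
nodes = {
    "0001": set(),
    "0002": {"0001"},
    "0003": {"0002"},
    "0004": {"0003"},
}
squashed = {"0001_squashed_0003": ["0001", "0002", "0003"]}

fresh_install = build_graph(nodes, squashed, applied=set())
halfway = build_graph(nodes, squashed, applied={"0001", "0002"})
up_to_date = build_graph(nodes, squashed, applied={"0001", "0002", "0003", "0004"})
```

A fresh install and a fully up-to-date one both end up with the two-node graph (the squashed migration plus 0004), while the halfway installation keeps the original four nodes.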

This means that doing a squash has no impact on any existing installations; if they're only halfway through the set of migrations you squashed, they'll keep using the old files until they're all the way through, and then they'll switch over to the new, more optimised graph.

Workflow

Essentially, then, that means the workflow goes like this:

  • Squash migrations into a new, single migration
  • Commit/release with the new and old migrations both in place
  • Get all deployments upgraded past the squash point
  • Delete the old migrations from the codebase

If you're a single-product company, the gap between committing the squash and removing the old migrations is probably quite short - about a week or so. If you're a popular third-party app, on the other hand, you'll only want to remove them after a whole major release, to allow people time to upgrade.

The goal here is to handle the hard job of keeping multiple histories tracked for you; at no point should you need to drop down and start fiddling with the files and history table manually, as South needed you to do for the same kind of operation.

If you want to, though, you still can; everything is designed to be human-readable and easy to understand and debug - the last thing you want is some weird problem occurring due to a migration file and having no way to work around it. I'm not in the practice of locking people in.

The code is all in Django's master branch and there are a few optimiser rules in place, though more optimisations and the ability to squash squashed migrations are still coming. The command is pretty simple; just pass it an app name and the migration to squash up to (it always starts from 0001):

./manage.py squashmigrations authors 0003

Up Next

The next piece of work is somewhat related, in that it's another workflow fix; in particular, the ability to have initial migrations auto-applied when you're converting to use migrations with an existing database. It'll be a crucial part of getting migrations into django.contrib too, something that probably excites me more than it should; look for the next article for more detail on that!