Django Diaries / 22nd May 2013

South 0.8, Migrations and DjangoCon

I've wanted to get a new release of South out for ages, so I'm delighted that I've finally done so. South 0.8 is now available on PyPI - there aren't a great many changes; the most notable (and the reason for the major version bump) is Python 3 support.

Aymeric Augustin was instrumental in getting that support implemented, so I'd like to thank him for his work on it. On a related note, support for Python 2.5 is being dropped - if you still need that, you'll need to stick with the 0.7.x series.

The other notable change is support for index_together, one of the new improvements in Django 1.5 and something that should have been released a while ago. There's still no first-party support for AUTH_USER_MODEL - it'll work fine as long as you're not distributing third-party apps with migrations. The overall solution to that is something that will have to be implemented in the rewrites that are underway.

db.migrations

Those rewrites are coming along well, however. Last week I was at DjangoCon EU, in Warsaw, Poland, and I had a fantastic time, as you can read in my blog post about it. In particular, I had some good discussions with fellow core developers and Django and South users, to clear up some more thoughts I was having.

At the sprints, I got quite a bit more code implemented for db.migrations - as always, you can see the progress on my GitHub branch.

Most progress was on the "state" code and field freezing, so I'd like to discuss that.

State

The "state" part of db.migrations is the part which is responsible for the in-memory running of migrations to build correct versions of models.

In essence, it runs each of the actions in your migrations on fake versions of models (represented by a class called ModelState) in memory, and at the end it can then render those states into full models, to use for a data migration or pass to the schema migration functions.
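As a rough illustration of that idea, here is a toy sketch in plain Python. Only the name ModelState comes from the branch; the CreateModel and AddField actions, the run_migrations helper and the use of strings for fields are all assumptions made for the example, and the real code is considerably more involved.

```python
# Toy illustration only: apart from the name ModelState, everything here is
# hypothetical and far simpler than the real db.migrations code.

class ModelState:
    """An in-memory description of a model: its name, fields and Meta options."""

    def __init__(self, name, fields=None, options=None):
        self.name = name
        self.fields = dict(fields or {})
        self.options = dict(options or {})


class CreateModel:
    """Hypothetical action: add a new model to the state."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = fields

    def apply(self, state):
        state[self.name] = ModelState(self.name, self.fields)


class AddField:
    """Hypothetical action: add one field to an existing model."""

    def __init__(self, model_name, field_name, field):
        self.model_name = model_name
        self.field_name = field_name
        self.field = field

    def apply(self, state):
        state[self.model_name].fields[self.field_name] = self.field


def run_migrations(actions):
    """Replay every action in order against an empty in-memory state."""
    state = {}
    for action in actions:
        action.apply(state)
    return state


# Fields are plain strings here to keep the example short.
final_state = run_migrations([
    CreateModel("Author", {"name": "CharField(max_length=100)"}),
    AddField("Author", "email", "EmailField()"),
])
```

Nothing touches a database at any point; the result is just a description of what the models should look like once every migration has been applied.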

The basic format is reasonably simple - there's just a class that represents a model, with attributes for all the things models can have, like their options (the things you put in Meta) and their name.

Fields, however, are more tricky. The problem South has faced since its inception is how you take a set of fields and serialise them - something that has finally been fixed.

The Good, The Bad and the Source Code Parser

You see, there's no way, given an instance of a Field, to tell how you reconstruct it. Sure, you can tell what class it is, and some values are obvious (like field.max_length), but getting the value that you passed in to a ForeignKey for its relationship is trickier.

The first versions of South solved this in a very simple way - they opened up your models.py file, read the source code, and chopped out the field definition using string manipulation. Needless to say, this was very fragile, and didn't work with any kind of conditional around fields.

The next (and currently shipping) approach is to inspect the fields' attributes using something called modelsinspector. This is a built-in set of rules that South uses to work out a field's arguments just by inspecting its attributes.

While this works well for core Django fields, there's no way of knowing how third-party fields work unless rules for them are either shipped with South (as they are for a few apps) or declared by you alongside the field.

The way these custom rules are declared is difficult to understand and not immediately obvious, and so there have been a lot of complaints that custom fields and South don't really play well together.

In particular, South won't just accept a custom field even if it's a simple subclass - you have to tell South it's safe to use by supplying a list of regular expressions that are matched against field paths. While it's worked till now, it's clearly not the best solution.

Introducing deconstruct()

The new solution is now in my branch - passing this responsibility onto the fields themselves. The API a field is required to provide has grown an additional function: deconstruct().

This function takes no arguments, and returns the four values needed to recreate the field: its attribute name (the field name it was assigned to on the model), a path to import it from, its positional arguments and its keyword arguments.

The base implementation of this on Field is the most complex one and handles all the default arguments. New field classes will just need a simpler override, like the one for DecimalField, which adds on the new arguments.
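To make the shape of that contract concrete, here is a minimal sketch using stand-in classes rather than Django's real ones. The Field and DecimalField below are simplified imitations written for this example, and the particular arguments they handle (null, default, max_digits, decimal_places) are just a convenient subset; the four-part return value is the part that matters.

```python
# Simplified stand-ins, not Django's classes: they only illustrate the
# (name, path, args, kwargs) shape described above.

class Field:
    def __init__(self, null=False, default=None):
        self.name = None  # set when the field is attached to a model
        self.null = null
        self.default = default

    def deconstruct(self):
        """Return (name, import path, positional args, keyword args)."""
        path = "%s.%s" % (self.__class__.__module__, self.__class__.__name__)
        kwargs = {}
        # Only record arguments that differ from their defaults.
        if self.null:
            kwargs["null"] = True
        if self.default is not None:
            kwargs["default"] = self.default
        return self.name, path, [], kwargs


class DecimalField(Field):
    def __init__(self, max_digits=None, decimal_places=None, **kwargs):
        super().__init__(**kwargs)
        self.max_digits = max_digits
        self.decimal_places = decimal_places

    def deconstruct(self):
        # Let the base class do the heavy lifting, then add our own arguments.
        name, path, args, kwargs = super().deconstruct()
        kwargs["max_digits"] = self.max_digits
        kwargs["decimal_places"] = self.decimal_places
        return name, path, args, kwargs


price = DecimalField(max_digits=8, decimal_places=2, null=True)
price.name = "price"

name, path, args, kwargs = price.deconstruct()

# The returned values are enough to build an equivalent field from scratch.
rebuilt = DecimalField(*args, **kwargs)
```

The subclass never has to know how the base class stores its common arguments; it just appends the ones it introduced itself.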

I'll be writing up full documentation on this into the Django docs as part of my branch, but just keep in mind that all custom fields will need to provide this method soon, or they will not be usable with migrations. I plan to submit pull requests to a decent number of third-party apps that use custom fields with this method implemented for them, to help kickstart adoption.

Back to State

This all means that the state tracking can now work - it has methods to take either a model or a whole AppCache and turn it into a ModelState or ProjectState object, which can then rebuild models or AppCaches respectively.

This is what will power the autodetection - South will render the most recent version of the models it has, and compare them to the ones you currently have in your project. If there are any material database differences - a new field, a removed model, a changed db_table - it will generate the appropriate migration.
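The comparison step can be pictured with a small sketch. This is only an assumption about the general approach, not the detector itself: states are reduced to nested dictionaries, detect_changes is a hypothetical helper, and the operation names are made up for the example.

```python
def detect_changes(old_state, new_state):
    """Compare two {model_name: {field_name: description}} mappings.

    Returns a list of tuples describing the differences. The operation
    names are invented for this sketch.
    """
    changes = []
    for model in sorted(set(new_state) - set(old_state)):
        changes.append(("create_model", model))
    for model in sorted(set(old_state) - set(new_state)):
        changes.append(("delete_model", model))
    for model in sorted(set(old_state) & set(new_state)):
        old_fields, new_fields = old_state[model], new_state[model]
        for field in sorted(set(new_fields) - set(old_fields)):
            changes.append(("add_field", model, field))
        for field in sorted(set(old_fields) - set(new_fields)):
            changes.append(("remove_field", model, field))
    return changes


# "old" is what the migrations on disk produce; "new" is the current models.
old = {"Author": {"name": "CharField"}, "Tag": {"label": "CharField"}}
new = {"Author": {"name": "CharField", "email": "EmailField"}}

changes = detect_changes(old, new)
```

A real detector also has to spot altered fields and changed options such as db_table, which this sketch leaves out.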

Some changes don't affect the database, of course. verbose_name never touches the database, and much to people's surprise, neither does default - Django implements all defaults in Python rather than in the database, as otherwise there's no way to allow arbitrary callables as a default value (something which is causing some pain when serialising, let me tell you).
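A short sketch shows both halves of that point. The resolve_default helper is hypothetical and is not Django's code; it only mimics the idea of calling a default when it is callable, and the lambda at the end hints at why such defaults are awkward to write back out to a migration file.

```python
import datetime


def resolve_default(default):
    """Hypothetical helper: call the default if it is callable, else use it."""
    return default() if callable(default) else default


# A plain value could in principle live in the database schema...
static_value = resolve_default(0)

# ...but a callable has to be run in Python each time a row is created.
dynamic_value = resolve_default(datetime.date.today)

# A named function can at least be referenced by module and name. A lambda
# has no useful name at all, so there is nothing importable to write down.
lambda_default = lambda: 42
lambda_name = lambda_default.__name__
```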

Context Managers

The other change that might affect users is that I've changed SchemaEditor to be a context manager, as suggested by a few people last week. That means that you now use it like so:

from django.db import connection

# Foo and Bar stand in for your own model classes.
with connection.schema_editor() as editor:
    editor.create_model(Foo)
    editor.delete_model(Bar)
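One general benefit of the context manager shape is that the editor gets a guaranteed place to set things up and to clean up afterwards. The toy class below illustrates that pattern only; it is not the real SchemaEditor, and the BEGIN/COMMIT/ROLLBACK bookkeeping and deferred index statement are assumptions made for the example.

```python
class ToySchemaEditor:
    """Toy stand-in that records statements instead of running them."""

    def __init__(self):
        self.executed = []
        self.deferred = []

    def __enter__(self):
        self.executed.append("BEGIN")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            # Clean exit: flush anything that was postponed, then commit.
            self.executed.extend(self.deferred)
            self.executed.append("COMMIT")
        else:
            self.executed.append("ROLLBACK")
        self.deferred = []
        return False  # never swallow the caller's exception

    def create_model(self, table, index_column=None):
        self.executed.append("CREATE TABLE %s" % table)
        if index_column:
            # Postpone index creation until the end of the block.
            self.deferred.append(
                "CREATE INDEX ON %s (%s)" % (table, index_column)
            )

    def delete_model(self, table):
        self.executed.append("DROP TABLE %s" % table)


editor = ToySchemaEditor()
with editor:
    editor.create_model("foo", index_column="slug")
    editor.delete_model("bar")
```

Because the cleanup lives in __exit__, it runs whether the block finishes normally or raises, which is hard to guarantee when callers have to remember a separate "finish" call.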

What's next?

Now that's all in place, the work of getting migrations to load from disk, create in-memory models and then run them through the schema editor is next - essentially, bringing together the past few weeks' work into a functioning whole.

Some of that code is already in place - a disk loader already reads classes from disk, and a recorder already has code to mark migrations as applied or not - but there's some more work in deciding the user interface for migrations in terms of commands.

Should the migrate command stay? Should it all be rolled into syncdb? Should they both go in favour of a third option? Some planning is needed. Any opinions are welcome, either via email or Twitter.