buy the book ribbon

Appendix C: Testing Database Migrations

Django-migrations and its predecessor South have been around for ages, so it’s not usually necessary to test database migrations. But it just so happens that we’re introducing a dangerous type of migration—​that is, one that introduces a new integrity constraint on our data. When I first ran the migration script against staging, I saw an error.

On larger projects, where you have sensitive data, you may want the additional confidence that comes from testing your migrations in a safe environment before applying them to production data, so this toy example will hopefully be a useful rehearsal.

Another common reason to want to test migrations is for speed—​migrations often involve downtime, and sometimes, when they’re applied to very large datasets, they can take time. It’s good to know in advance how long that might be.

An Attempted Deploy to Staging

Here’s what happened to me when I first tried to deploy our new validation constraints in [chapter_deploying_validation]:

$ cd deploy_tools
$ fab deploy:[email protected]
[...]
Running migrations:
  Applying lists.0005_list_item_unique_together...Traceback (most recent call
last):
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py",
line 61, in execute
    return self.cursor.execute(sql, params)
  File
"/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py",
line 475, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: columns list_id, text are not unique
[...]

What happened was that some of the existing data in the database violated the integrity constraint, so the database was complaining when I tried to apply it.

In order to deal with this sort of problem, we’ll need to build a "data migration". Let’s first set up a local environment to test against.

Running a Test Migration Locally

We’ll use a copy of the live database to test our migration against.

Be very, very, very careful when using real data for testing. For example, you may have real customer email addresses in there, and you don’t want to accidentally send them a bunch of test emails. Ask me how I know this.

Entering Problematic Data

Start a list with some duplicate items on your live site, as shown in A list with duplicate items.

This list has 3 identical items
Figure 1. A list with duplicate items

Copying Test Data from the Live Site

Copy the database down from live:

$ scp [email protected]:\
/home/elspeth/sites/superlists.ottg.co.uk/database/db.sqlite3 .
$ mv ../database/db.sqlite3 ../database/db.sqlite3.bak
$ mv db.sqlite3 ../database/db.sqlite3

Confirming the Error

We now have a local database that has not been migrated, and that contains some problematic data. We should see an error if we try to run migrate:

$ python manage.py migrate --migrate
python manage.py migrate
Operations to perform:
[...]
Running migrations:
[...]
  Applying lists.0005_list_item_unique_together...Traceback (most recent call
last):
[...]
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: columns list_id, text are not unique

Inserting a Data Migration

Data migrations are a special type of migration that modifies data in the database rather than changing the schema. We need to create one that will run before we apply the integrity constraint, to preventively remove any duplicates. Here’s how we can do that:

$ git rm lists/migrations/0005_list_item_unique_together.py
$ python manage.py makemigrations lists --empty
Migrations for 'lists':
  0005_auto_20140414_2325.py:
$ mv lists/migrations/0005_*.py lists/migrations/0005_remove_duplicates.py

Check out the Django docs on data migrations for more info, but here’s how we add some instructions to change existing data:

lists/migrations/0005_remove_duplicates.py
# encoding: utf8
from django.db import models, migrations

def find_dupes(apps, schema_editor):
    List = apps.get_model("lists", "List")
    for list_ in List.objects.all():
        items = list_.item_set.all()
        texts = set()
        for ix, item in enumerate(items):
            if item.text in texts:
                item.text = '{} ({})'.format(item.text, ix)
                item.save()
            texts.add(item.text)


class Migration(migrations.Migration):

    dependencies = [
        ('lists', '0004_item_list'),
    ]

    operations = [
        migrations.RunPython(find_dupes),
    ]

Re-creating the Old Migration

We re-create the old migration using makemigrations, which will ensure it is now the sixth migration and has an explicit dependency on 0005, the data migration:

$ python manage.py makemigrations
Migrations for 'lists':
  0006_auto_20140415_0018.py:
    - Alter unique_together for item (1 constraints)
$ mv lists/migrations/0006_* lists/migrations/0006_unique_together.py

Testing the New Migrations Together

We’re now ready to run our test against the live data:

$ cd deploy_tools
$ fab deploy:[email protected]
[...]

We’ll need to restart the live Gunicorn job too:

elspeth@server:$ sudo systemctl restart gunicorn-superlists.ottg.co.uk

And we can now run our FTs against staging:

$ STAGING_SERVER=staging.ottg.co.uk python manage.py test functional_tests
[...]
....
 ---------------------------------------------------------------------
Ran 4 tests in 17.308s

OK

Everything seems in order! Let’s do it against live:

$ fab deploy:host=superlists.ottg.co.uk
[superlists.ottg.co.uk] Executing task 'deploy'
[...]

And that’s a wrap. git add lists/migrations, git commit, and so on.

Conclusions

This exercise was primarily aimed at building a data migration and testing it against some real data. Inevitably, this is only a drop in the ocean of the possible testing you could do for a migration. You could imagine building automated tests to check that all your data was preserved, comparing the database contents before and after. You could write individual unit tests for the helper functions in a data migration. You could spend more time measuring the time taken for migrations, and experiment with ways to speed it up by, for example, breaking up migrations into more or fewer component steps.

Remember that this should be a relatively rare case. In my experience, I haven’t felt the need to test 99% of the migrations I’ve worked on. But, should you ever feel the need on your project, I hope you’ve found a few pointers here to get started with.

On Testing Database Migrations
Be wary of migrations which introduce constraints

99% of migrations happen without a hitch, but be wary of any situations, like this one, where you are introducing a new constraint on columns that already exist.

Test migrations for speed

Once you have a larger project, you should think about testing how long your migrations are going to take. Database migrations typically involve downtime, as, depending on your database, the schema update operation may lock the table it’s working on until it completes. It’s a good idea to use your staging site to find out how long a migration will take.

Be extremely careful if using a dump of production data

In order to do so, you’ll want fill your staging site’s database with an amount of data that’s commensurate to the size of your production data. Explaining how to do that is outside of the scope of this book, but I will say this: if you’re tempted to just take a dump of your production database and load it into staging, be very careful. Production data contains real customer details, and I’ve personally been responsible for accidentally sending out a few hundred incorrect invoices after an automated process on my staging server started processing the copied production data I’d just loaded into it. Not a fun afternoon.

Comments