[section_title title=Overview]
I have been working on a Django application that I originally had to deploy on a server with only MySQL. The database has evolved over time, and to prepare for the future we finally installed PostgreSQL on the server. I am not going to go over why I chose PostgreSQL, because this article is focused on the actual migration. Below is some additional background and the requirements.
- The application does not use any MySQL-specific functionality, such as enumerations.
- I only want to migrate the data (i.e. not the structure), because Django is quite good at defining the structure and the corresponding constraints, and migrating the structure directly would lose some of that in translation.
- I will rebuild the migrated database on a separate machine and then copy it back to the final server.
- Note that by default Django also populates some special tables when you run syncdb (e.g. auth_permission). I am going to remove those rows and load the ones from the source database so I don't break any foreign key references.
- Along the way I will be cleaning up some data (e.g. the django_session table).
- I will use EnterpriseDB's MySQL to Postgres Migration Wizard (v1.1) to move the data from MySQL to PostgreSQL.
To accomplish this, here is a summary of the steps (some kung fu required) that we will be performing; rough command sketches for the key steps follow the list.
- Dump the MySQL database and restore it in a local MySQL instance.
- Remove any unnecessary data. This is where we will clean the django_session and django_admin_log tables.
- Create the PostgreSQL database and, using the EnterpriseDB migration tool, do a full migration. Be aware that the migration tool does not create any sequences, and many of the constraints are not propagated either.
- Once the migration is complete, you will have a separate schema in the PostgreSQL database.
- Dump only the data (i.e. no structure) with complete inserts.
- Run the django-admin syncdb utility, now pointed at PostgreSQL, to create the tables in the ‘public’ schema.
- Note that syncdb will also populate the auth_permission and django_content_type tables, which are also in the dump that we created. So we are going to delete the newly inserted rows and stick with the ones from the dump.
- Edit the dump file…
- Rename the schema in the dump so the data loads into the ‘public’ schema.
- Wrap everything in a transaction and defer the constraint checks.
- Source the edited file into the ‘public’ schema (the default).
- Drop the now-unnecessary intermediate schema.
- Update the sequences, because sourcing the inserts does not advance any sequences.
- Export the database so we can reload it on the final destination server.
- Load it on the remote server.
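To make the first two steps concrete, here is a rough sketch of the dump, restore, and cleanup. The database names, host, and credentials (proddb, localdb, prod.example.com, appuser) are placeholders; substitute your own:

```
# Dump the production MySQL database and restore it into a local instance.
mysqldump -h prod.example.com -u appuser -p proddb > proddb.sql
mysql -u root -p -e "CREATE DATABASE localdb"
mysql -u root -p localdb < proddb.sql

# Remove the data we do not want to migrate.
mysql -u root -p localdb <<'SQL'
DELETE FROM django_session;    -- sessions are disposable
DELETE FROM django_admin_log;  -- optional: drop the old admin history
SQL
```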
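The wizard itself is interactive, but the surrounding commands look roughly like this. I am assuming the wizard lands the data in a schema named after the MySQL database (here localdb) inside a PostgreSQL database called myapp; both names are placeholders:

```
# Create the target PostgreSQL database.
createdb myapp

# ... run the EnterpriseDB migration wizard here; it loads everything
# into its own schema, not into 'public' ...

# Dump only the data from the migrated schema, as complete INSERT
# statements (column names included), with no table definitions.
pg_dump --data-only --column-inserts --schema=localdb myapp > data.sql
```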
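Creating the fresh structure and trimming the rows that syncdb inserts might look like the following; myapp is still a placeholder, and your project may invoke syncdb differently:

```
# Create the tables, constraints, and indexes in the 'public' schema.
python manage.py syncdb

# syncdb has just populated these two tables; clear them out so the
# original rows from the dump can be loaded without clashing.
psql myapp <<'SQL'
-- auth_permission references django_content_type, so delete it first.
DELETE FROM auth_permission;
DELETE FROM django_content_type;
SQL
```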
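The edits to the dump file can be scripted. A sketch, again assuming the intermediate schema is called localdb; depending on your pg_dump version, the schema name may appear in the search_path line, in the INSERT statements, or both:

```
# Point the dump at the 'public' schema instead of the wizard's schema.
sed -i -e 's/localdb\./public./g' \
       -e 's/search_path = localdb/search_path = public/' data.sql

# Wrap everything in one transaction and defer constraint checks until
# COMMIT, so the insert order across tables does not matter. (Django
# declares its foreign keys DEFERRABLE, so deferring works here.)
{ echo 'BEGIN;'
  echo 'SET CONSTRAINTS ALL DEFERRED;'
  cat data.sql
  echo 'COMMIT;'; } > data_tx.sql

# Source the edited dump into the 'public' schema, stopping on errors.
psql -v ON_ERROR_STOP=1 myapp < data_tx.sql
```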
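Dropping the intermediate schema is a one-liner, and the sequences can be reset per table with setval(). One table is shown below; every table with a serial primary key needs the same treatment, and the sequence names follow Django's table_column_seq convention:

```
psql myapp <<'SQL'
-- The wizard's schema is no longer needed.
DROP SCHEMA localdb CASCADE;

-- Sourcing plain INSERTs never advances a sequence, so point each one
-- at the highest id actually loaded (repeat for every serial column).
SELECT setval('auth_user_id_seq', COALESCE((SELECT MAX(id) FROM auth_user), 1));
SQL
```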
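Finally, moving the rebuilt database to the destination server; host and user names are again placeholders:

```
# Export the finished database and load it on the destination server.
pg_dump myapp > myapp_final.sql
createdb -h db.example.com -U appuser myapp
psql -h db.example.com -U appuser myapp < myapp_final.sql
```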
Whew! Only after all these steps will we meet our objectives and have a well-defined PostgreSQL database with the existing data from MySQL 🙂
Note that along the way you might encounter some issues, because MySQL isn’t very strict about checking its constraints. For instance, if earlier in development you allowed a column to be null but later decided it shouldn’t be, then, unless you remembered to convert the existing nulls appropriately, the older rows will still contain nulls in those columns, which PostgreSQL is obviously not going to accept. So the local MySQL instance will give you a chance to correct these without impacting any running applications.
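For instance, a column that later became NOT NULL can be backfilled in the local MySQL copy before running the wizard. The table, column, and replacement value below are hypothetical; adapt them to whatever your data needs:

```
mysql -u root -p localdb <<'SQL'
-- Hypothetical example: rows created before the column became required
-- still hold NULLs that PostgreSQL's NOT NULL constraint would reject.
UPDATE app_profile SET bio = '' WHERE bio IS NULL;
SQL
```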