[section_title title=Details]
Let’s dive in and get this done. These steps are tailored to my specific needs, so if yours are different, feel free to jump around. Also, note that I have kept all of the instructions on a single page so you can easily reference back without flipping between multiple pages.
1. Dump the MySQL database and copy to local machine.
mysqldump database_name -u username > database_name.mysqldump
scp user@host:/home/user/temp/database_name.mysqldump .
2. Create the new, local MySQL database and load the data.
CREATE DATABASE database_name;
USE database_name;
SOURCE database_name.mysqldump
3. Clean the database.
DELETE FROM django_session;
DELETE FROM django_admin_log;
4. Create the new postgres database.
psql
\c postgres
DROP DATABASE database_name;
CREATE DATABASE database_name WITH ENCODING='UTF8' OWNER=owner_username;
5. Do a full migration using the MySQL to Postgres Migration Wizard by EnterpriseDB (v1.1).
Unfortunately, this tool doesn’t come as a pre-compiled package, so you have to build it yourself. If you don’t have a Java development environment set up, you will need to install the JDK and set up ant. Once you have these two things in place, build and run the migration tool.
ant dist
java -jar dist\MigrationWizard.jar
Follow the wizard and do a full migration. FYI, this migration will not create any PostgreSQL sequences. Also, the tool will create the new tables in a new schema based on the name of the database that you are importing, and not the usual ‘public’ schema.
6. Dump only the data (i.e. no structure) with complete inserts.
pg_dump database_name -n database_name --data-only --inserts --column-inserts > database_name.pgdump
7. Now we are going to edit a few things in this dump file.
Edit the dump file and change the target schema to the ‘public’ schema, instead of the new schema that the migration tool created. To do this, change…
SET search_path = database_name, pg_catalog;
…at the top of the dump file to…
SET search_path = public, pg_catalog;
In addition to this change, we are going to defer constraint checks and wrap the complete load in a transaction so we can easily re-run it if something goes wrong. (Note that SET CONSTRAINTS ALL DEFERRED only affects constraints that were created as DEFERRABLE; non-deferrable constraints are still checked immediately.) Add the following to the top of the file.
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
Lastly, add the following to the end of the file.
COMMIT;
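If you would rather not edit the dump by hand, all of these edits can be applied in one pass. Here is a minimal Python sketch (the file and schema names are the examples from this post, and the helper itself is hypothetical):

```python
def prepare_dump(text, old_schema="database_name"):
    """Return the dump text with the step-7 edits applied."""
    # Point the load at the 'public' schema instead of the migrated schema.
    text = text.replace(
        f"SET search_path = {old_schema}, pg_catalog;",
        "SET search_path = public, pg_catalog;",
    )
    # Wrap the whole load in a transaction with deferred constraint
    # checks, so a failed load can simply be fixed and re-run.
    return "BEGIN;\nSET CONSTRAINTS ALL DEFERRED;\n" + text + "\nCOMMIT;\n"

# Usage (rewrites the dump in place):
#   text = open("database_name.pgdump").read()
#   open("database_name.pgdump", "w").write(prepare_dump(text))
```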
8. Now that we have the data-only dump the way we like it, let’s go ahead and create the structure.
Note that by default django will create the tables in the ‘public’ schema, which is what we want.
django-admin.py syncdb
9. Let’s remove a few unneeded things from the new schema.
syncdb will populate the auth_permission and django_content_type tables, which are also in the dumps. So, we are going to delete the newly created rows and keep the ones in the dumps (the sequence update code at the end accounts for this).
DELETE FROM auth_permission;
DELETE FROM django_content_type;
10. Load the data into the ‘public’ schema (default).
psql -d database_name -f database_name.pgdump
After this we will be left with the usual ‘public’ schema.
11. Drop the now unneeded schema.
DROP SCHEMA database_name CASCADE;
12. Next, we are going to update the sequences.
I am assuming that there is only one sequence per table. Furthermore, I use vim so this step is based on its functionality. You can modify this to match your favorite editor.
Instead of manually writing the update statements, we are going to get a list of objects from the psql command-line client and then use a regex to create the SQL statements for us. Using psql, connect to your new database and get a list of the objects using the ‘\d’ command. This will produce something like the following.
 Schema |               Name                |   Type   |    Owner
--------+-----------------------------------+----------+-------------
 public | auth_group                        | table    | thebitguru
 public | auth_group_id_seq                 | sequence | thebitguru
 public | auth_group_permissions            | table    | thebitguru
 public | auth_group_permissions_id_seq     | sequence | thebitguru
 public | auth_message                      | table    | thebitguru
 public | auth_message_id_seq               | sequence | thebitguru
 public | auth_permission                   | table    | thebitguru
 public | auth_permission_id_seq            | sequence | thebitguru
 public | auth_user                         | table    | thebitguru
 public | auth_user_groups                  | table    | thebitguru
 public | auth_user_groups_id_seq           | sequence | thebitguru
 public | auth_user_id_seq                  | sequence | thebitguru
 public | auth_user_user_permissions        | table    | thebitguru
 public | auth_user_user_permissions_id_seq | sequence | thebitguru
Make sure that every table is followed by the corresponding sequence. As you can see in the output above, this isn’t the case for auth_user* objects. Also, remove any tables that don’t have any sequences (e.g. django_session).
Once you have those rearranged and removed, massage the remaining text so it looks like the following.
auth_group
auth_group_id_seq
auth_group_permissions
auth_group_permissions_id_seq
auth_message
auth_message_id_seq
auth_permission
auth_permission_id_seq
auth_user
auth_user_id_seq
auth_user_groups
auth_user_groups_id_seq
auth_user_user_permissions
auth_user_user_permissions_id_seq
Once you have this alternating table/sequence text, use the following search/replace regex to merge these lines into SQL statements that will update the sequences.
:%s/^\(.*\)$\n\(.*\)$/SELECT setval('\2', (SELECT MAX(id) FROM \1));/gc
Encapsulate these in a transaction (BEGIN/COMMIT) if you prefer, and you will end up with something like the output below.
BEGIN;
SELECT setval('auth_group_id_seq', (SELECT MAX(id) FROM auth_group));
SELECT setval('auth_group_permissions_id_seq', (SELECT MAX(id) FROM auth_group_permissions));
SELECT setval('auth_message_id_seq', (SELECT MAX(id) FROM auth_message));
SELECT setval('auth_permission_id_seq', (SELECT MAX(id) FROM auth_permission));
SELECT setval('auth_user_id_seq', (SELECT MAX(id) FROM auth_user));
SELECT setval('auth_user_groups_id_seq', (SELECT MAX(id) FROM auth_user_groups));
SELECT setval('auth_user_user_permissions_id_seq', (SELECT MAX(id) FROM auth_user_user_permissions));
COMMIT;
Execute this SQL and all your sequences will be updated to the correct values.
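If you would rather skip the manual rearranging and the vim regex entirely, the same statements can be generated programmatically. The sketch below (a hypothetical helper, not part of the original workflow) parses the raw ‘\d’ listing and pairs each sequence with its table by relying on Django’s usual <table>_id_seq naming convention, which also handles the out-of-order auth_user rows and skips tables without sequences.

```python
def setval_statements(listing):
    """Build the setval SQL from the raw output of psql's \\d command."""
    tables, sequences = set(), []
    for line in listing.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only real object rows: schema | name | type | owner.
        if len(parts) == 4 and parts[2] in ("table", "sequence"):
            if parts[2] == "table":
                tables.add(parts[1])
            else:
                sequences.append(parts[1])
    stmts = ["BEGIN;"]
    for seq in sequences:
        if not seq.endswith("_id_seq"):
            continue  # not a Django-style sequence name; handle manually
        table = seq[: -len("_id_seq")]
        if table in tables:
            stmts.append(
                f"SELECT setval('{seq}', (SELECT MAX(id) FROM {table}));"
            )
    stmts.append("COMMIT;")
    return "\n".join(stmts)
```

Paste the ‘\d’ output into a string, run it through this function, and execute the result in psql.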
13. Finally, dump the new database with the new structure and old data, copy it to the final destination machine and restore.
pg_dump database_name -f database_name.pgdump
Followed by…
scp user@host:/home/user/temp/database_name.pgdump .
psql -d database_name -f database_name.pgdump
Conclusion
There you have it. I am always interested in improvements, so if you have any suggestions, please let me know (by leaving a comment or emailing me directly). I hope I don’t have to do this again, but who knows 🙂