Written By: Steve Zagieboylo, Senior Architect at Calavista


In my last blog, I talked about how we estimate new projects, and I included the offhand comment that Data Migration is always harder than you think it will be. The purpose of this blog is to provide a few examples of why I find this to be the case.

What is Data Migration?

Data Migration, in the context of web projects, is needed when you are making a major change to the product, such that the data model has significantly changed. Tools like Evolve or Liquibase aren’t going to cut it, because those tools are meant for point changes to the application, with small changes to the data model that you could describe in a couple of commands. You need a migration if you are going to be switching your users over to a completely rearchitected version of the application. It’s still the same tool your customers are paying for, but with the major revision you’ve fixed those areas that are clunky, and (hopefully) simplified the abstraction to make lots of new functionality possible.

However, you still have a ton of data that needs to be transferred to the new model. Your customer wants everything that they already appreciate about your product to remain. Of course, the most important part of that, to them, is their data. If any of their data is lost, it won’t matter how much better the new version of your application is, they will be unhappy with it. Getting the migration right, therefore, is an absolute requirement to keeping your customers happy.

Practice, Practice, Practice!

Specific challenges are addressed below, but the first and most important takeaway from this article is that you need to have practiced the migration step dozens of times before you do it for real. If you haven’t done several complete migrations on a true copy of production data, and successfully brought up the new system with the migrated data, then you aren’t ready to try it on production. You need to start with a staging system that is functionally identical to your production system, including versions of peripheral elements and some horizontal scaling of the pieces that scale horizontally. (You can have just three instances instead of nine, but if there are multiple instances in production, you need to test with multiple instances.) You should replicate the entire process on the staging server, all the way from shutting it down (with some active users at the time being politely booted) to switching the DNS entries so that the next users will log on to the new system.

Testing, Testing, Testing!

Part of the importance of doing all the practices is the testing that goes along with it. Identify what the key indicators are that represent correct data and automate the comparisons. For instance, if you have Customers who each have Orders, you should be able to get the count of Orders for each Customer, sorted by customer name on both systems, and the arrays should match exactly. It’s no good to migrate all your data if it introduces inaccuracies in the data when you do it.

Challenges

Data Model Mismatch

There are lots of ways in which the data model can be a mismatch, and the approach to each is very different.

1. Indirection Layer Added.Perhaps a many-to-one relationship is becoming many-to-many, or references are being bundled so they can be changed together. You’ll need code to create the new layer of indirection.

2. Data Being Collapsed. You had a hack in the old system where you made extra records because it couldn’t account for some more complex entities that your new data model can represent. There are two stumbling blocks here. First, make sure you’re not double-processing your complex records, once for each sub-part you run into. Second, make sure you compensate for the differences in record counts when you do your testing.

3. Less-used Columns Abstracted. You may have made a number of columns in a table that are only used in certain cases, and you find yourself making more and more of these. Instead, you roll them all into a separate table of named properties.

4. Special Cases Abstracted. Special cases, such as dependency requirements, which used to exist in the code with hard-coded references are now abstracted in the data. Not only do these need to be captured correctly, but old data that didn’t meet the dependency but was “grandfathered in” now has to be handled. (This is a very real and very painful issue that came up for us recently. Our solution was a spreadsheet import that referenced specific records by ID that had to be modified as they were transferred.)

5. Old Data no Longer Tracked. In a complete rewrite of a client’s product, there were some old features which were no longer supported, but we did not want to lose the old data associated with them. It would only be referenced by internal users in unusual circumstances, but it shouldn’t disappear. Our solution was to dump it into a text file and keep it as ‘attached documentation,’ a field we had anyway.

 

Sheer Quantity of Data

You will need to know how long it will take to migrate the data to know how long you’ll be shutting down theData Migration Anecdote system. Of course, by the time you’re doing it for real, you’ll have practiced the whole process many times, and you’ll know exactly. But before you get there, you might want to perform an experiment or two to have an idea how long it will take. This might surprise you, and it may affect the entire approach. (See the sidebar.)

If you find that your migration is too slow, there are some tricks you can do.

  • Bigger Transactions. If you are wrapping your rows processed each in a transaction, you are spending a lot of time starting and stopping transactions. Instead, bundle a group of rows into a single transaction. You don’t want to get too large, though, because one error means the whole transaction needs to be rolled back. A bundle of 50 or 100 will mean that the transaction overhead is small compared to the real work.
  • Multi-Process. If you are reading from one database, doing work, then writing to another, all done very serially, there are two ways you can process in parallel that are pretty easy to do. First, if your data is pretty independent, you can break it up according to the key of the source data and kick off several threads, each of which are assigned a block of the source data. Second, and this turns out to be easier than you think, is to read and process in one thread and then write in another. Either way, the goal is to make the database the bottleneck (typically the one you’re writing will max out first), rather than your processes. Be careful, because you don’t want to create too many threads, all eating up memory, when all they are doing is waiting for their chance to write to the database. You really only need two, so that one thread is reading from A while the other is writing to B and vice-versa. If there is a fair bit of processing, perhaps use three threads.

 

Corrupted Data

If you’re migrating from a system that has lived and evolved for several years, don’t assume that the data is perfect. It isn’t. Minor errors have crept in, from failed operations that weren’t properly in a transaction that would roll back; from hand-editing the SQL; from requirements changes on the data, where old records are being ignored; from who knows what. Your migration had best not choke and die on some corrupted data. Your first full pass through the data will probably die, and your second, and … That’s why we practice the migration several times.

 

Data Changing While Migration is Occurring

If you do not have the luxury of shutting down the application while you are migrating the data and moving to the new system, then you must deal with the possibility of data changing after the migration has started. This is not too bad if you have immutable data that is all time-stamped, but if you had thought that hard about your data model before, would you really be migrating off of it? There is no one-size-fits-all answer here, but consider these approaches.

  • Take a backup, restore to another instance, and migrate from that. At least then you’ll have a clean snapshot to work from and a very clear timestamp.
  • Try harder to convince the bosses that access to the system can be shut down from 3 to 6 am on Sunday morning. It’s a horrible weekend for you, but it’s better than data loss.
  • If all data-changing events are done through an event bus, you can replay the feed (probably through its own translation to the new system), but if you had that architecture, I wouldn’t have to tell you.
  • A poor man’s version of the last point: Add to the old system code that writes all the changes to a file at the same time it is saving it, such that you can read the file back and replay it. Combined with the first bullet point, this can give you a safe transfer.

 

DevOps

In addition to the coding effort for the migration, there’s going to be a big DevOps challenge for coordinating the migration and then the entire changeover from the old production system to the new one. Part of the practice has to include automating the entire process, from creating the database instances, firing up the migration code with the correct accesses to old and new databases, executing it, firing up the new application code, and switching DNS or load balancer to direct new connections to the new code. If any of these are being done manually, make sure there is a checklist, and stick to it religiously in your practices as well as, of course, in the real thing. In the final practices, don’t let the DevOps person do any task that isn’t on the checklist, no matter how obvious and everyday the task is.
 

Summary

When you’re making a major change to an entirely new version of your web application, you are already going to have some users who metathesiophobic (afraid of change — I had to google it.) so they are already likely to grumble. The last thing you can afford is to have data issues as well. You need to budget lots of extra time to make sure that the data migration goes smoothly, and you need to practice the whole process, end to end, several times before the D-Day when you do it on the production system. Have a well-rehearsed process governed by a checklist that is so complete anyone could follow it. And GOOD LUCK!