Database Migrations
I consider database migrations one of the most annoying problems to deal with during a software engineer's career. Not only that: if a migration goes wrong, as it often does, people tend to develop anxiety around any schema change.
So why is it so annoying? It looks simple at first glance but is deceptively tricky once you start thinking about it.
My primary framework of choice is the Python-based Django, which introduced schema migrations over a decade ago and which I still consider one of the best migration engines I've seen (and I've worked with many different frameworks across many languages).
But even with a good framework that automatically generates migration files, stores the migration references in the database, and allows for easy rollbacks, there are still a ton of things to consider when you do database migrations.
Imagine a naive approach to splitting the 'name' field into 'first_name' and 'last_name' in the user table. All goes fine. You run the schema migration, run the script that splits the old data into the two new fields, and deploy the latest API that works with these changes. However, something goes wrong with the new fields. Your monitoring shows many unique users with failed saves due to some validation error, e.g., not all characters are being accepted. The issue is deemed critical, so you and the team decide to roll back. Sounds good. You revert the schema migration and revert the app to the previous state, but then you notice that writes were happening to the new fields during that small window of deployment, and now some users have missing data in the name field, which leads to data inconsistency.
Now add more complexity with the requirement of doing all that with zero downtime.
This example is hypothetical, but what I'm trying to say is that database migrations are complex problems that should be approached with a multi-step process. I can also relate to those who really don't want to touch such problems: it's a tedious process where you need to verify the data constantly.
So I've mentioned that data migrations are hard and nobody wants to work with them, but why? Here are a few reasons:
- When developing a product, you can only see a few months, maybe a year ahead, how the software will evolve and how you can prepare for that. A year into the future, a product owner might come to you and say, "Okay, our financial app is no longer based on transactions; everything is a subscription," which is obviously a huge database migration to fit the case (or build workarounds around).
- Doing migrations is like working with live wires. You have a new lamp that you need to hang on the ceiling, but you're doing it without turning off the electricity. I picked the lamp as an example because I hung one yesterday, so it came to mind.
- Every migration that you implement (and yes, you need to think of migrations as code, not just database changes) needs to work with three different scenarios:
- Upgrading (migrating up) – a new feature gets built, a data model gets added/modified/removed. Both the new and the old application versions are still expected to work correctly.
- Downgrading (migrating down) – Something went wrong, there are data inconsistencies, and you need a way to go back to the previous stable state in a controlled manner. Not with manual changes in the DB.
- Everything in between – meaning all the data transformation logic needs to be taken care of. Nowadays there are ways to do the data transformation over longer periods of time with dual writes or the concept of "ghost tables" with the help of the gh-ost library from GitHub; we will talk about this later.
- Data migrations are also problematic because it's usually not a one-person job. The bigger the data model change, the more people should be involved. It's best to have different people on standby when deploying, ready to jump in WHEN things go wrong. I wrote "when" because it's highly likely that something will not go as planned on at least one of the steps.
Simple Deployment
Here's the most straightforward approach you can take to deploying a new feature that includes database changes, provided you're small enough and can allow yourself a few seconds of downtime (a script version of these steps is sketched after the list):
- Push your code to Bitbucket/GitHub/GitLab.
- Deployment gets triggered.
- New Docker containers get built.
- Database migrations and all the related scripts are run.
- Docker containers are restarted on the server.
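Condensed into code, the migrate-and-restart part of this pipeline could look something like the minimal Python sketch below. The `docker compose` commands, the `web` service name, and Django's `manage.py migrate` are assumptions about your setup, not a prescription:

```python
import subprocess


def run(cmd):
    """Run a shell command and abort the deploy on any non-zero exit."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def deploy():
    # 1. Build the new containers.
    run(["docker", "compose", "build"])
    # 2. Apply migrations before swapping containers; the few seconds
    #    of downtime happen around this step.
    run(["docker", "compose", "run", "--rm", "web",
         "python", "manage.py", "migrate"])
    # 3. Restart the app with the new image.
    run(["docker", "compose", "up", "-d"])


if __name__ == "__main__":
    deploy()
```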
I've seen a lot of hate for this simple approach, but I have to say: it's fine. Deployment doesn't have to be a headache until you reach a stage where zero downtime is business-critical.
The approach above is perfectly valid if:
- You have a single application instance.
- You can allow yourself a few seconds of downtime.
- You've already tested the migrations on staging.
This is not a valid approach if:
- You're running multiple app instances, which could lead to a race condition between instances running the migrations and an invalid database state (one mitigation is sketched after this list).
- You have a lot of data that needs transforming, which may block the deployment process or time out.
- Downtime is not an option.
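On the race-condition point: a common mitigation, and my own hedged sketch rather than something from the article, is to serialize the migration step with a Postgres advisory lock so that only one instance applies migrations at a time. The lock key is an arbitrary constant:

```python
from django.core.management import call_command
from django.db import connection

# Arbitrary app-wide constant identifying "the migration lock".
MIGRATION_LOCK_KEY = 712000001


def migrate_with_lock():
    """Ensure only one app instance applies migrations at a time."""
    with connection.cursor() as cursor:
        # Blocks until the lock is free, so concurrent deploys queue up
        # instead of racing each other (Postgres-specific).
        cursor.execute("SELECT pg_advisory_lock(%s)", [MIGRATION_LOCK_KEY])
        try:
            call_command("migrate")
        finally:
            cursor.execute("SELECT pg_advisory_unlock(%s)", [MIGRATION_LOCK_KEY])
```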
Now that we've covered the most straightforward way to deploy changes, let's look at the different migration scenarios you'll have to deal with during your career as a software engineer.
Scenarios
Adding a New Field
This is the simplest case, as adding a field to a database is basically a no-op from the perspective of your app and should not impact the existing logic in any way. The new field will be accessed and used only once the new application version gets deployed.
After adding the field to your ORM, generate an immutable migration script. Test this script locally to make sure it behaves as expected; both forward and backward migrations should be tested. Don't forget to add fixtures to your tests for your new data.
If the new field is non-nullable, provide a default value. This is crucial to avoid issues with existing records that won't have this field populated. If logic is involved, e.g., the field is an aggregate of other fields, add that logic to the migration script. (I explain below why this approach only works for databases with low volume, around a few million rows, and won't work for huge databases.)
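For illustration, here's a hedged sketch of such a migration in Django; the app label, model, and field names are hypothetical:

```python
from django.db import migrations, models


def populate_display_name(apps, schema_editor):
    # Aggregate logic: derive the new field from existing columns.
    User = apps.get_model("accounts", "User")
    for user in User.objects.all().iterator():
        user.display_name = f"{user.first_name} {user.last_name}".strip()
        user.save(update_fields=["display_name"])


def reverse_noop(apps, schema_editor):
    # Backward migration: dropping the column discards the data anyway.
    pass


class Migration(migrations.Migration):
    dependencies = [("accounts", "0007_previous")]

    operations = [
        # Non-nullable field, so existing rows need a default.
        migrations.AddField(
            model_name="user",
            name="display_name",
            field=models.CharField(max_length=255, default=""),
        ),
        migrations.RunPython(populate_display_name, reverse_noop),
    ]
```

The row-by-row loop in `RunPython` is also exactly why this pattern only scales to a few million rows.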
Focus on a two-phase deployment. First, deploy the database changes. Since adding a new field should not break anything in your app, this should run smoothly even with the current version of the app. Once you're confident that the migration succeeded and hasn't introduced any issues, deploy the application changes.
After deployment (and before it, too), monitor the application and database performance. Look out for any unexpected behavior related to your new field.
Now, let's return to the non-nullable fields with defaults. Here's why default values can be particularly problematic for large datasets (one standard mitigation is sketched after the list):
- Immediate update of all rows – when you add a new column with a default value, most databases need to update every row with that value. With billions of rows, that can take a while.
- The ALTER TABLE command requires a LOCK on the table for as long as the migration operation is running. (Since Postgres 11, this is no longer true for adding new columns with a default, which became a metadata-only change, but it still holds for altering existing columns.) This LOCK is exclusive and doesn't allow further writes to the relations being modified.
- If you have replicas and billions of rows, they'll start lagging behind.
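The usual mitigation, standard practice rather than anything framework-specific, is to add the column as nullable with no default, backfill it in small batches, and only then enforce NOT NULL. A sketch with hypothetical model and field names:

```python
from django.db import transaction
from django.db.models import Value
from django.db.models.functions import Concat

from accounts.models import User  # hypothetical model

BATCH_SIZE = 1000


def backfill_display_name():
    """Fill the new column in short transactions instead of one huge one."""
    while True:
        with transaction.atomic():
            pks = list(
                User.objects.filter(display_name__isnull=True)
                .values_list("pk", flat=True)[:BATCH_SIZE]
            )
            if not pks:
                break
            User.objects.filter(pk__in=pks).update(
                display_name=Concat("first_name", Value(" "), "last_name")
            )
```

Each batch holds its locks only briefly, so application writes can interleave with the backfill.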

Removing a Field
Now we're getting into slightly more complicated territory: you're removing something that is already being used in the application, so the approach here is somewhat different.
The first step happens long before the database migration itself: marking/highlighting the code that uses the "field-to-be-removed." Once all the places are marked, start phasing out the use of that field in the application, first by commenting it out.
(Optional) Before physically removing a field, consider whether the data in that field might be needed in the future. If so, archive this data in a different location.
Whereas in the first example we deployed the database changes first, in this case we begin by deploying the application changes that no longer use the field. Ensure the application is stable with the commented-out use of the soon-to-be-deleted field. Then, deploy the database migration that removes the field. This two-step order ensures nothing attempts to access the field after it's removed.
Our focus is to avoid data inconsistency at all costs, so if you deploy the application changes and see the field still getting updated, you missed something.
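In Django terms, the second phase is then a plain `RemoveField` migration, shipped only once the field-free application code has proven stable (names hypothetical):

```python
from django.db import migrations


class Migration(migrations.Migration):
    # Deployed only after the app code referencing the field is gone.
    dependencies = [("accounts", "0009_drop_legacy_usage")]

    operations = [
        migrations.RemoveField(model_name="user", name="legacy_code"),
    ]
```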
Changing a Field with Business Logic Attached
Now we're getting to the good stuff. Changing a field intertwined with business logic is one of the most intricate migration scenarios. Related cases include splitting a field from one table into multiple fields in a new table, and moving the data to a different database altogether. All of these cases are similar in how they have to be handled.
These are probably the cases people try to avoid and find workarounds for. The reason: the implications of such changes can ripple through various (unknown to you) parts of the application. That's why this case:
- should be handled as a team
- should be handled with dual-write logic
- should be handled with a multi-phased deployment (which helps keep everything working with zero data inconsistency)
Before any coding is done, understand the full scope of the change. Identify all relevant read paths in the application and mark them as soon-to-be deprecated.
Refactoring all code paths where we mutate subscriptions is arguably the most challenging part of the migration. Stripe's logic for handling subscription operations (e.g., updates, prorations, renewals) spans thousands of lines of code across multiple services.
The key to a successful refactor will be our incremental process: we'll isolate as many code paths into the smallest unit possible so we can apply each change carefully.
Our two tables need to stay consistent with each other at every step. For each code path, we'll need to use a holistic approach to ensure our changes are safe. We can't just substitute new records for old records: each piece of logic needs to be considered carefully.
– From Stripe's article on subscription migrations, linked below
As you can see from the Stripe example, a lot of preparation happens before the migration even starts. The code paths are identified and refactored to support dual writes/reads.
A dual-write migration can essentially be split into these steps:
- Add the new field to the database (zero impact on the running code).
- Deploy new, refactored application code where you start writing to both the old and the new field, with the corresponding new business logic applied. Reading is still done from the old path. Writing to both fields must happen as part of a single transaction (see the sketch after this list).
- Compare the data and make sure it's consistent.
- Write migration code that transforms the rest of the data from the old field into the new field in the correct format. (Or use gh-ost from GitHub.)
- Deploy the migration and switch the read path to the new field. The write path still goes to both fields.
- Verify the application and the data consistency.
- Remove writes to the old field. At this point, reading and writing happen exclusively on the new field. The old field still exists in the database but should receive no writes.
- Verify the application and the data consistency.
- Remove any code related to the old field.
- Verify the application and the data consistency.
- Deploy a migration script that drops the column from the database.
- Shake hands with your teammates.
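To make the dual-write step concrete, here's a hedged sketch of app-level dual writes in Django, assuming a hypothetical case where a subscription amount moves from an old column to a new table:

```python
from django.db import models, transaction


class Subscription(models.Model):
    # Old field: still the read path during the dual-write phase.
    amount = models.DecimalField(max_digits=10, decimal_places=2)


class SubscriptionBilling(models.Model):
    # New home for the value, with the new business logic applied.
    subscription = models.OneToOneField(Subscription, on_delete=models.CASCADE)
    amount_cents = models.BigIntegerField()


def update_amount(subscription, amount):
    """Single write path: mutate the old and the new field in one transaction."""
    with transaction.atomic():
        subscription.amount = amount
        subscription.save(update_fields=["amount"])
        SubscriptionBilling.objects.update_or_create(
            subscription=subscription,
            defaults={"amount_cents": int(amount * 100)},
        )
```

Because both writes share one transaction, a failure rolls back both sides, which is what keeps the two representations consistent at every step.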
I hope this detailed breakdown helps you visualize the flow of executing complex database changes. Each step can be rolled back individually, which offers far greater stability than making all the changes at once. This approach not only reduces the chance of errors but also ensures that if errors do occur, they can be fixed without any data loss or inconsistency.
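The recurring "verify the data consistency" steps can start as simply as a comparison script. Continuing with the hypothetical models from the sketch above:

```python
def find_mismatches(limit=100):
    """Return up to `limit` subscription IDs whose old and new values disagree."""
    mismatches = []
    for sub in Subscription.objects.iterator():
        billing = SubscriptionBilling.objects.filter(subscription=sub).first()
        if billing is None or billing.amount_cents != int(sub.amount * 100):
            mismatches.append(sub.pk)
            if len(mismatches) >= limit:
                break
    return mismatches
```

In practice you'd run something like this continuously (or on samples) while the dual writes are live, and treat any non-empty result as a blocker for the next phase.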
Mobile apps + database migrations
While researching the topic, I found a talk about how DoorDash split their Postgres database into multiple smaller ones, and they suggested a different approach (a variation of dual-writing).
As DoorDash is a mobile application, old versions of the app are still in use when a new version is released. Hence, a database migration becomes a big issue, as you have to stay backward compatible with the data coming from many older app versions.
They tried different variations of dual-write and a surprising third approach:
- API-based dual write – an API wrapper around the old service and the new service
- Database-based dual write – the same API, but writes/reads are routed to two different databases
- New app version + new endpoint + new database – a new application version with a different endpoint that reads/writes to different databases based on old/new logic. This could be called an app-level dual write, as the new app version defines where the old/new data goes.

I find it fascinating how other companies handle complex database changes, so I would highly suggest reading some more articles on this topic:
- Stripe Case: Migrating Subscriptions
- Facebook Case: Migrating Messenger
- Gusto Case: Migrating Waivers
- Box Case: Moving from HBase to Google BigTable
Zero downtime
Let's talk a bit more about zero downtime.
Not all applications, especially those serving a global audience, have the luxury of a maintenance window. For global platforms like Google, Facebook, LinkedIn, and Netflix, there is no "off-peak" time. The sun never sets on their user base, making any downtime detrimental to user experience and revenue. If one of these goes down, it's usually the #1 story on HackerNews.
I'm sometimes surprised by emails from payment gateway providers stating they'll have a maintenance window next week. That's basically telling your customers you won't be getting any money during this time... sorry about that.
My point is that the bigger you are and the more business-critical your service becomes, the less maintenance time you get. If there's no maintenance window, zero downtime is the only way. Yes, zero-downtime migrations usually take longer because they're deployed in multiple phases to ensure that at no point is the service interrupted. But this phased approach, while highly time-consuming, is a necessary trade-off to ensure service continuity and, more importantly, data consistency.
However, not all services are business-critical, and the primary causes of downtime at non-Google-scale companies are long-running DB migrations and the failure to keep code backward compatible with new changes.
If you don't have big teams of SREs to help you with your deployments, I'd suggest making your life just a tiny bit easier with some tools that allow instant, non-blocking schema changes, as well as ghost tables for data migrations:
- If you're on MySQL: consider the framework-agnostic gh-ost from GitHub, MySQL Online DDL, pt-online-schema-change, or Facebook's OnlineSchemaChange, which works in a similar way to gh-ost.
- If you're on Postgres and using Django: take a look at django-pg-zero-downtime-migrations or Yandex's zero-downtime-migrations; there are also some HN comments about the latter library.
- There's also SchemaHero, an open-source database schema migration tool that converts a schema definition into migration scripts that can be applied in any environment.

Conclusion
Let's wrap up the article with some best practices:
- No manual database changes. Always generate immutable migration scripts.
- The database version should be stored in the database itself. (Django does this automatically.)
- If you don't have maintenance windows, focus on the dual-write process.
- When building features with significant database changes, think about backward compatibility and proper abstractions.
- Consider using modern tools to make your migration life easier.
- What else? Suggest some in the comments 🙂
References:
https://gist.github.com/majackson/493c3d6d4476914ca9da63f84247407b
https://pankrat.github.io/2015/django-migrations-without-downtimes/
https://enterprisecraftsmanship.com/posts/database-versioning-best-practices/
https://habr.com/ru/articles/664028/
https://www.martinfowler.com/articles/evodb.html
https://dev.to/kite/django-database-migrations-a-comprehensive-overview-5dk8
https://teamplify.com/blog/zero-downtime-DB-migrations/
https://news.ycombinator.com/item?id=22698154
https://news.ycombinator.com/item?id=16506156
https://news.ycombinator.com/item?id=27473788
https://news.ycombinator.com/item?id=19880334
https://stackoverflow.com/questions/1482738/how-is-database-migration-done
If you're finding my articles helpful, consider sharing them with friends, or buying me a coffee.
Also, you can ask me anything in a personalized question if you'd like my expertise.