For the last few years I’ve been on a journey of breaking down a monolithic application. Quite often, monolithic and legacy applications are subject to a painfully slow release cycle. It’s quite common for products to have a biannual or monthly release cycle. Even with frequent daily releases, there are still some glaring issues with a scheduled deployment cycle.
To set the scene: an application I’ve been working with for a couple of years has two deployment cycles, running across two branches in a Git repository: Beta and Live. Beta sits under a subdirectory for our product and Live is the version of the application that most of our users use. We use Beta as a kind of testing ground for new code and bug fixes. Bug fixes are often then cherry-picked to Live after reaching the end of the Beta deployment cycle.
The first is a daily deployment that happens Monday to Friday, based off the Beta branch and the Live branch. This is a managed deployment with a representative roster from the development teams. The code being deployed every day is the result of a 3-day cycle:
- Code pushed to the repository on day 1.
- A release candidate placed into a test environment on day 2.
- The release on day 3.
The release includes all the code that is in the Beta branch by close of business (COB) on day 1.
The second deployment is a monthly release where the Live branch is rebased onto a commit in the Beta branch from the week before. The code is then deployed midweek. This requires release notes, a fair bit of developer organisation and a representative from each team during the ‘Live deployment’.
I’m aware that daily deployments for such a large SaaS product are somewhat uncommon and there are devs that would kill for deployments this frequent, but there are still issues associated with developing this way that I think are worth discussing.
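As a rough sketch of that cycle, the release date for a given push can be worked out by counting working days. The function names and the dates below are illustrative assumptions, not part of our actual tooling, and the sketch ignores public holidays and assumes the push happens on a working day before COB.

```python
from datetime import date, timedelta

# Day 1: push, day 2: release candidate in test, day 3: release.
CYCLE_WORKING_DAYS = 3


def next_working_day(day: date) -> date:
    """Return the next Monday-to-Friday date after `day` (holidays ignored)."""
    day += timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def release_day(push_day: date) -> date:
    """Release date for code pushed before COB on `push_day` (hypothetical helper)."""
    day = push_day
    for _ in range(CYCLE_WORKING_DAYS - 1):
        day = next_working_day(day)
    return day


# Illustrative dates: 3 January 2024 was a Wednesday.
wednesday = date(2024, 1, 3)
beta_release = release_day(wednesday)      # fix pushed to Beta on Wednesday
live_release = release_day(beta_release)   # cherry-picked to Live on the Beta release day
print(beta_release.strftime("%A"), live_release.strftime("%A"))
print((live_release - wednesday).days, "days from push to Live")
```

Under these assumptions, a fix pushed to Beta on a Wednesday and cherry-picked to Live on the day it reaches Beta-production lands in Live the following Tuesday, which is the timeline walked through under Issue #1.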
Issue #1: Bugs can have a long and eventful life.
Imagine you have found a bug in Live-production on a Wednesday where a component crashes the submit page when certain requirements are met. The bug has been there for a few days now, so it’s too late to roll back the deployment. What you would do is write, review and commit the code for the fix to the Beta branch. On Friday the fix is deployed to Beta-production and you cherry-pick the fix to the Live branch. By Tuesday the next week the bug is fixed in production, a full 6 days after the bug was discovered.
Of course, depending on the developer’s experience and the severity of the bug, certain shortcuts might be taken. Let’s say it’s a very severe bug found sometime in the morning and the developer adds the change straight into the Live branch: the developer checks in their code and rebuilds the release candidate (time permitting). There will still be almost 20 hours between finding the potentially data-corrupting bug and actually releasing a fix. In that time we are looking at potentially hours, if not days, of data being created that will need correcting. This is the shortest turnaround our daily deployment can afford us.
Issue #2: New development can suffer when environments don’t match up.
Back when I first started working on the application, the environments that Development and Production ran on were very different. There were a few times where I would deploy core pieces of functionality only to have them fail spectacularly in production because x regkey wasn’t present or y cookie didn’t quite work with that domain, etc. This meant that the project I was working on was pushed back a full 2 days further than where we were expecting to be.
This was an issue that we solved a while ago by gradually making our development environments almost identical to the production systems (including distributed services). The pain was still there at the start and I’m sure there are plenty of other developers in the same situation.
Issue #3: Commits can stack up and increase the risk of deployments very quickly.
The number of commits to our monolith repository can range from zero to well over 20 commits a day. With each commit, the risk increases that the next day’s deployment has a bug that causes it to be rolled back. In a system with as much complexity and as many years of legacy as some monoliths, bugs are almost inevitable.
OK, so let’s say a severe bug is found: the deployment is rolled back to the previous day’s deployment and then the release candidate for that morning is rebuilt with a bug fix. But wait, we may not have detected every bug from this morning’s release and we have another full day’s worth of commits also in this release. The problem then begins to snowball with more and more risk.
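To put a rough shape on that snowball, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every commit independently has the same small chance of carrying a rollback-worthy bug. The 2% figure is a made-up number, not something measured on our repository, and real commits are neither independent nor equally risky.

```python
def rollback_risk(commits: int, p_bug: float) -> float:
    """Chance that at least one of `commits` commits carries a rollback-worthy bug.

    Assumes each commit is independent with the same probability `p_bug`,
    which is a simplification for illustration only.
    """
    return 1 - (1 - p_bug) ** commits


P_BUG = 0.02  # hypothetical per-commit probability, not a measured value

# One commit, one day's worth, and two days' worth after a rollback.
for commits in (1, 20, 40):
    print(f"{commits:>2} commits -> {rollback_risk(commits, P_BUG):.0%} chance of a bad deployment")
```

With those made-up numbers, a single commit carries a 2% risk, a day of 20 commits carries roughly a one-in-three risk, and a release that has absorbed a second day of commits after a rollback is riskier still.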
Issue #4: A deployment rollback can have a huge impact.
Rolling back a deployment can cause huge delays for development teams: Deadlines can be missed and bugs can be given a longer life span. Rolling back deployments can have ripples across all levels of the business and in a scheduled deployment world, this is almost our only option to deal with site-breaking issues.
How does moving away from scheduled deployments help alleviate some of these issues?
The solution I’m talking about here is continuous delivery: a deployment on each commit to the ‘deployment branch’.
#1 and #4: Bugs can be fixed almost right away. Being able to deploy on demand has a huge benefit when fixing issues. Not only is the bug in production for a much shorter time, but you also don’t need to roll back days’ worth of code to fix a single bug. This allows for the option of rolling forward to fix issues, rather than hitting the rollback panic button, even if that includes reverting the problem commit(s).
#3 and #4: The impact and risk in deploying are significantly reduced. You aren’t piling up commits to release anymore. Each deployment only contains the code you’ve just committed. Code can even be deployed outside of ‘normal’ deployment hours, allowing riskier changes to be tackled with the least user impact possible.
There are more issues to be discussed with a scheduled deployment, but they become more specific to this particular application and this article is quickly becoming a novel. This article has the lens of working on a monolith application, but what’s said here rings true for any application with a scheduled deployment, not just our large, legacy balls of mud.
I’m a big advocate of continuous delivery at the commit level. Throughout all the projects I’ve worked on, the agility that comes with CD has been so valuable to the success of the project that it’s not even a question whether it should be part of the cycle. There are plenty of other benefits associated with CD that aren’t mentioned here, but hopefully I will be able to cover some of them in the next parts of this series!