The Anatomy of a Rollback Deployment Workflow
Your new release tested fine on staging, but it’s not playing nicely with applications and services in the wild. Your monitoring application notices something going wrong and raises the alarm.
But often raising the alarm isn’t enough – to solve complex issues, you might need to roll back to the last good deployment while you figure out the root cause and get multiple people working together on the solution.
That’s where a rollback workflow can help ensure speedy detection, enriched alerts, consistent actions, and quick resolution and restoration.
Detect
It may go without saying, but before you can receive an alert about a problem, something has to detect it. Discovering problems manually, that is, through human intervention, has long been one detection method, but it's not a best practice. Nearly all modern businesses rely on monitoring software to track service uptime and detect problems, but that data needs somewhere to go where it can be acted on effectively.
In a rollback deployment workflow, monitoring tools that can detect problems are the starting point. Once the alarm is raised, the workflow kicks into gear.
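Detection rules differ from one monitoring tool to the next, but the core idea is usually a rule that turns raw metrics into an alarm. The sketch below is a hypothetical example, not any monitoring product's API: it fires when the average error rate over the last few samples crosses a threshold. The class name, the 5% threshold, and the window size are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateDetector:
    """Hypothetical detector: alarm when the mean error rate over a
    sliding window of recent samples exceeds a threshold."""
    threshold: float = 0.05   # assumed: alarm above a 5% error rate
    window_size: int = 5      # assumed: judge the last 5 samples
    samples: deque = field(default_factory=deque)

    def observe(self, error_rate: float) -> bool:
        """Record one sample and return True if the alarm should fire."""
        self.samples.append(error_rate)
        # Drop the oldest sample once the window is over capacity.
        if len(self.samples) > self.window_size:
            self.samples.popleft()
        # Stay quiet until the window has filled up.
        if len(self.samples) < self.window_size:
            return False
        return sum(self.samples) / len(self.samples) > self.threshold
```

Averaging over a window smooths out brief blips at the cost of a slightly slower alarm; a single very large spike can still push the mean over the threshold once the window is full.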
Enrich & Alert
Sure, we'd all like to get to the point where our machine overlords are the ones who spring into action in the middle of the night when an issue occurs. But for now, the vast majority of businesses still rely on humans to take action when an incident occurs. Finding the right people to respond can be difficult, however, especially if the incident happens outside regular working hours.
Here at xMatters, we recommend that customers building rollback deployment workflows target alerts to groups or their associated services. This lets you use your on-call rotations and escalation rules to find the right person with the right knowledge to receive and respond to an alert, whether it happens at 2 am or 2 pm. Because admins or managers set these rules, team members should know when they're on call and not be too perplexed if they get a series of alerts in the middle of the night.
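xMatters manages rotations and escalations for you, so the following is not its API. It is a minimal, hypothetical sketch of the underlying logic, assuming a simple round-robin rotation with fixed-length shifts: work out who is on call when the alert fires, then fall back through the rest of the rotation and finally a named fallback (for example, a manager). The function names, member names, and shift length are illustrative.

```python
from datetime import datetime, timedelta, timezone


def on_call_member(members: list[str], rotation_start: datetime,
                   shift: timedelta, at: datetime) -> str:
    """Return who is on call at `at` in a round-robin rotation (hypothetical)."""
    if not members:
        raise ValueError("rotation has no members")
    elapsed = at - rotation_start
    if elapsed < timedelta(0):
        raise ValueError("time is before the rotation started")
    shift_index = elapsed // shift  # timedelta // timedelta yields an int
    return members[shift_index % len(members)]


def escalation_chain(members: list[str], rotation_start: datetime,
                     shift: timedelta, at: datetime, fallback: str) -> list[str]:
    """Primary on-call first, then the rest of the rotation in order, then a fallback."""
    primary = on_call_member(members, rotation_start, shift, at)
    i = members.index(primary)
    rest = members[i + 1:] + members[:i]
    return [primary, *rest, fallback]
```

With 12-hour shifts starting at midnight UTC, a 2 am alert goes to the first person in the list and a 2 pm alert to the second; if neither responds, the chain continues through the rest of the team before reaching the fallback.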
While quickly alerting the right on-call resolvers is great, what’s even better is enriching those alerts with the details they need to triage and create next steps.
Adding automated enrichment to your workflow ensures that the right information, at the right level of detail, is available to triage the issue. Resolvers get crucial information up front, which helps them determine whether the issue is a high-severity, lights-out concern or something that can wait a few hours or longer. Without enrichment, a resolver would need to go from dashboard to log, collecting and collating information before deciding on next steps.
In Flow Designer, this part of the workflow looks something like the image below. Here, the steps gather information from the original alert, along with details on the latest commit and the most recent deployment, and add those to the notification before it is sent to the on-call resources.
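At its core, enrichment merges several lookups into one notification payload. This hypothetical sketch assumes the commit and deployment details have already been fetched (in practice, from your source control and CI/CD tools) and shows only the merge step. The data classes and field names, including the `rollback_target` field carrying the previous version, are illustrative assumptions, not Flow Designer's step outputs.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str
    summary: str


@dataclass
class Commit:
    sha: str
    author: str
    message: str


@dataclass
class Deployment:
    version: str
    deployed_at: str
    previous_version: str  # assumed: the last known-good release


def enrich(alert: Alert, commit: Commit, deployment: Deployment) -> dict:
    """Combine the original alert with commit and deployment details
    into one notification payload (field names are hypothetical)."""
    return {
        "service": alert.service,
        "summary": alert.summary,
        # Short SHA plus author and message gives quick context on the change.
        "latest_commit": f"{commit.sha[:7]} by {commit.author}: {commit.message}",
        "current_version": deployment.version,
        "deployed_at": deployment.deployed_at,
        # Naming the previous version up front makes "roll back" a concrete option.
        "rollback_target": deployment.previous_version,
    }
```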
With the enriched alert, that person now has the details to triage the issue and determine next steps from your response options – in this case, initiating a rollback.
Roll back, Record & Restore
So, you initiate a rollback. In xMatters, Flow Designer can send the request to your build automation application (Jenkins in this example).
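Jenkins exposes a remote access API for triggering parameterized jobs: a POST to `/job/<name>/buildWithParameters`, authenticated with a username and API token. The sketch below builds such a request with Python's standard library without sending it. The job name and the `TARGET_VERSION` parameter are hypothetical and would need to match a parameterized job in your own Jenkins; this is an illustration, not how Flow Designer's Jenkins steps work internally.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_rollback_request(jenkins_url: str, job: str, target_version: str,
                           user: str, api_token: str) -> Request:
    """Build (but do not send) a POST that queues a parameterized Jenkins job.

    Assumes a top-level job (not inside a folder) that defines a
    hypothetical TARGET_VERSION string parameter.
    """
    query = urlencode({"TARGET_VERSION": target_version})
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job)}/buildWithParameters?{query}"
    # Jenkins accepts HTTP basic auth with a username and API token.
    token = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return Request(url, method="POST",
                   headers={"Authorization": f"Basic {token}"})
```

To actually queue the build, pass the request to `urllib.request.urlopen`.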
While this is going on, you probably want to keep others informed, whether that’s calling on your teammates for help, updating business stakeholders on the progress, or letting customers know you’re aware of an issue and working to resolve it.
You might also need to make notes to your future self, either to improve the development process and service resiliency or to harden the incident response process. For example, you might create tickets in a service desk, post tasks on a virtual board, or add notes to the incident to include in a post-incident report.
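Whatever tools you record into, capturing timestamped notes as you go makes the post-incident report far easier to write. This is a minimal, hypothetical sketch: an in-memory log that flags follow-up items (candidates for service desk tickets or board tasks) and renders a time-ordered timeline. The class and method names are assumptions; a real workflow would post these notes to your service desk or incident record instead.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncidentLog:
    """Hypothetical log: collect timestamped notes during an incident
    and render them as a timeline for a post-incident report."""
    incident_id: str
    entries: list = field(default_factory=list)

    def note(self, when: datetime, author: str, text: str,
             follow_up: bool = False) -> None:
        # follow_up marks work to pick up later, e.g. a ticket to create.
        self.entries.append((when, author, text, follow_up))

    def follow_ups(self) -> list:
        """Return the text of every note flagged for follow-up, in insertion order."""
        return [text for _, _, text, flagged in self.entries if flagged]

    def timeline(self) -> str:
        """Render all notes sorted by time, one per line."""
        lines = [f"Incident {self.incident_id} timeline"]
        for when, author, text, _ in sorted(self.entries, key=lambda e: e[0]):
            lines.append(f"{when:%H:%M} {author}: {text}")
        return "\n".join(lines)
```

Sorting at render time means notes can be added out of order, for example when someone records an earlier action after the fact.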
Your specific rollback workflow will depend on your applications and processes, but this example might give you an idea of what to consider. Learn more about xMatters workflows here. Or, if you’re interested in xMatters workflow templates, find more details and a walkthrough video here!