The Art of Releasing Software
Thoughts on Release Engineering and Management
The act of releasing is something all software engineers deal with. Obviously, context dictates everything here: an on-prem software product that needs to be deployed in 100 different flavors on as many different customer sites requires a very different process than a multi-tenant SaaS product, which is very different again from a mom-and-pop webshop or an open-source command-line tool.
Regardless of context though, in my experience releasing software is often still an afterthought - a necessary evil to get the code to the user, not a repeatable and optimized process. Yet there’s plenty of evidence that the ability to release often and consistently (i.e. without issues) is a critical capability for every software engineering or IT organization.
In particular, I find that in many cases not enough attention is given to the difference between Release Management and Release Engineering and their respective responsibilities. In smaller teams, these two jobs can be performed by the same person (although the skill-sets are orthogonal), but many organizations can benefit from splitting them out and giving deliberate thought to each role.
Release Management deals with the overall coordination of releases - it’s a project management role.
Schedule and cadence: When and where are releases going out, taking into account calendars, resource availability, customer requests, etc.
Plan: Defining the sequence of high-level steps for a single release and tracking these as they occur. The VSCode Endgame(s) is a good light-weight example of this.
Scope: Ensuring everyone understands the feature scope of the release (what’s included and what isn’t). I’ve experienced many cases of mismatch between what (product) managers thought was being deployed and what engineers actually deployed.
Is the release scope really what you think it is?
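The scope mismatch described above can be caught mechanically when both sides are available as lists of ticket identifiers. Below is a minimal sketch, assuming the planned scope comes from a release plan and the deployed scope is derived from the commits in the release; the function name and the ticket IDs are hypothetical and only serve as illustration.

```python
from __future__ import annotations


def scope_mismatch(planned: set[str], deployed: set[str]) -> dict[str, list[str]]:
    """Compare what was planned for a release with what actually shipped."""
    return {
        # Planned items that did not make it into the release.
        "missing_from_release": sorted(planned - deployed),
        # Items that shipped without being part of the agreed scope.
        "unplanned_in_release": sorted(deployed - planned),
    }


# Hypothetical ticket IDs, for illustration only.
planned = {"PROJ-101", "PROJ-102", "PROJ-103"}
deployed = {"PROJ-101", "PROJ-103", "PROJ-107"}

report = scope_mismatch(planned, deployed)
print(report)
```

Running a check like this before the release window gives everyone the same answer to "what is in this release?", rather than discovering the difference afterwards.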
Coordination: Coordinating between all the different teams involved with the release: product, development, QA, RelEng, Ops/SRE, etc. Scheduling meetings, resolving blockers, driving progress. This happens before, during and after the release window.
Communications: Communicating about upcoming, ongoing and performed releases, detailing impact (if any) and changes. Writing good release notes is an art of its own.
Writing Release Notes Basics: 1) Write them before doing the release 2) you might need different versions for different audiences 3) a commit/changelog is not the same as release notes!
Migration: Coordinating any transitions of users that are on older versions of the software to newer versions. If required, coordinating the decommissioning of older versions of the software.
Change management: If your organization uses a formal change management process, ensuring the right paper trails are in place and the required approvals are obtained.
Process documentation: Documenting all of the above on a wiki and creating re-usable templates.
Driving continuous release improvement: Releasing anything but the simplest piece of software is hard work. You’ll get it wrong, fix it, and then get it wrong AGAIN. In my experience, doing it flawlessly requires at least two dozen iterations. Getting better should be a conscious effort.
Kata sounds like a karate scream, but is really a reference to Toyota Kata which embodies the idea of practicing something until it becomes second nature - an important principle when releasing software.
Release Engineering deals with the mechanics of getting code to production - it’s a software engineering role.
Method of Procedure (MOP): Also referred to as playbooks or runbooks, this is the step-by-step guide that outlines how to get a new change into production. This guide should be very tactical and exhaustive, so that anyone with a basic level of technical understanding of the software can repeat the process.
A good MOP contains:
MOP metadata: expected duration, executor, time
Release metadata: pointers to scope, plan, test results, change management
Prerequisites: pointers to how to get system access, assumptions, etc.
Rollback plan: Steps to undo the release, including pointers on how to get help or escalate
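To make the four parts of a MOP more concrete, here is a minimal sketch of one as structured data. All class names, field names and example values are hypothetical; the only idea taken from the list above is that a MOP carries its own metadata, release pointers, prerequisites and a rollback plan. Pairing each step with its undo action is one possible design, not the only one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    description: str  # what the executor does
    rollback: str     # how to undo this specific step


@dataclass
class MOP:
    # MOP metadata
    executor: str
    expected_duration_min: int
    scheduled_time: str
    # Release metadata: pointers to scope, plan, test results, change management
    release_refs: dict[str, str]
    # Prerequisites: system access, assumptions, etc.
    prerequisites: list[str]
    steps: list[Step]
    escalation_contact: str

    def rollback_plan(self, completed: int) -> list[str]:
        """Undo the first `completed` steps in reverse order, then point to help."""
        undo = [s.rollback for s in reversed(self.steps[:completed])]
        return undo + [f"If rollback fails, escalate to {self.escalation_contact}"]


# Hypothetical example values, for illustration only.
mop = MOP(
    executor="release engineer on duty",
    expected_duration_min=45,
    scheduled_time="Tuesday 09:00 UTC",
    release_refs={"scope": "wiki/release-scope", "plan": "wiki/release-plan"},
    prerequisites=["VPN access", "deploy permissions on production"],
    steps=[
        Step("Enable maintenance mode", "Disable maintenance mode"),
        Step("Apply database migration", "Revert database migration"),
        Step("Deploy new application version", "Redeploy previous application version"),
    ],
    escalation_contact="#release-help",
)

# Something went wrong after the second step: what do we undo, and in which order?
for line in mop.rollback_plan(completed=2):
    print(line)
```

Keeping the undo action next to each step makes it harder to add a step to the MOP without also thinking about how to reverse it.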
Release Automation: Developing any tooling or automation to speed up and simplify the release and rollback process. The goal here is to minimize the number of steps in the MOP. Typically this work leans heavily on CI/CD and infrastructure-as-code technology. My blogpost on Useful DevOps Resources contains a bunch of interesting links in this space.
If cutting a release feels like cutting onions, something’s wrong. You should fix that.
Cutting Releases: Coming up with a branching strategy, cutting the release branch(es), collecting and promoting release artifacts, and locking artifact dependencies (e.g. which versions of subcomponents make up the bigger release). The concepts of reproducible and hermetic builds are important here.
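Locking artifact dependencies can be as simple as recording, for every subcomponent, its version and a content hash. The sketch below assumes artifacts are available as bytes and uses a made-up JSON layout; the function names, the layout and the example artifacts are hypothetical, not an existing lock-file format.

```python
from __future__ import annotations

import hashlib
import json


def build_lock(release: str, artifacts: dict[str, tuple[str, bytes]]) -> str:
    """Create a lock manifest: one entry per artifact with version and SHA-256."""
    entries = [
        {
            "name": name,
            "version": version,
            "sha256": hashlib.sha256(content).hexdigest(),
        }
        for name, (version, content) in artifacts.items()
    ]
    # Sort entries so the same inputs always yield byte-identical output.
    entries.sort(key=lambda e: e["name"])
    return json.dumps({"release": release, "artifacts": entries}, indent=2, sort_keys=True)


def verify(lock_json: str, name: str, content: bytes) -> bool:
    """Check that an artifact matches the hash pinned in the lock manifest."""
    lock = json.loads(lock_json)
    pinned = {a["name"]: a["sha256"] for a in lock["artifacts"]}
    return pinned.get(name) == hashlib.sha256(content).hexdigest()


# Hypothetical subcomponents, for illustration only.
lock = build_lock(
    "2.4.0",
    {
        "api-server": ("1.8.2", b"api-server binary contents"),
        "web-ui": ("3.1.0", b"web-ui bundle contents"),
    },
)
print(lock)
print(verify(lock, "web-ui", b"web-ui bundle contents"))
print(verify(lock, "web-ui", b"tampered contents"))
```

The sorted, deterministic output is a small nod to reproducibility: the same set of artifacts always produces the same manifest, so the manifest itself can be checked into version control and diffed between releases.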
Configuration Management: Determining the process of how static configuration should be managed (hint: as code in version control) and pushed with the release. I highly recommend Chapter 15 of the SRE Workbook on this topic or at least looking at Jsonnet for a fresh perspective.
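One common way to keep configuration as code is a shared base with small per-environment overrides, which is also the kind of layering Jsonnet is built around. The following is a plain-Python sketch of that idea, not Jsonnet itself; the `merge` helper and all keys and values are hypothetical.

```python
from __future__ import annotations


def merge(base: dict, override: dict) -> dict:
    """Recursively layer `override` on top of `base` without mutating either."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Nested sections are merged key by key.
            result[key] = merge(result[key], value)
        else:
            # Scalars and lists are replaced wholesale.
            result[key] = value
    return result


# Hypothetical configuration, both parts living in version control.
base_config = {"replicas": 2, "db": {"host": "db.internal", "pool_size": 5}}
production_overrides = {"replicas": 6, "db": {"pool_size": 20}}

production_config = merge(base_config, production_overrides)
print(production_config)
```

Because the production overrides only state what differs from the base, a review of a configuration change shows exactly what will be different in that environment after the release.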
Deployment: Actually pushing releases to all integration, stage and production environments following the MOP and verifying correct deployment. Depending on your context, this might take anywhere from minutes to weeks, and be an all-in-one process or use staged roll-outs.
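A staged roll-out boils down to a loop: deploy to one environment, verify, and only then move on. Here is a minimal sketch, assuming the deploy, health-check and rollback actions are supplied as callables; the function and its parameters are hypothetical, and rolling back every stage already touched on failure is just one possible policy.

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> list[str]:
    """Deploy stage by stage; on a failed check, undo everything in reverse order.

    Returns the list of stages that ended up running the new release.
    """
    done: list[str] = []
    for stage in stages:
        deploy(stage)
        done.append(stage)
        if not healthy(stage):
            # Verification failed: roll back the stages touched so far, newest first.
            for touched in reversed(done):
                rollback(touched)
            return []
    return done


# In-memory stand-ins for real deployment tooling, for illustration only.
log: list[str] = []
result = staged_rollout(
    ["integration", "stage", "production"],
    deploy=lambda env: log.append(f"deploy {env}"),
    healthy=lambda env: env != "stage",  # simulate a failed verification on stage
    rollback=lambda env: log.append(f"rollback {env}"),
)
print(result)
print(log)
```

In this simulated run the verification fails on stage, so production is never touched, which is the whole point of staging the roll-out.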
Side-note: One particular thing I've learned is that rotating engineers into the release engineering role benefits everyone. It shares the load amongst the team, it makes developers aware of release-related work so they address it earlier in the development cycle, and it helps ensure the release process itself is continuously optimized, as fresh eyes typically spot new areas for improvement more easily.
On the fully automated release process
Chances are that after reading the above, you’re thinking that a lot of it is old-skool or corporate overhead. While that might be true to some extent, I’d like to re-emphasize that context is everything. Once you reach a certain amount of complexity (technical debt, deployment heterogeneity, deployment size, whatever), a lot of the aforementioned work will come into play - releasing software is no longer a one-button-push activity. That’s true even if you’re using a fully integrated CI/CD pipeline with a public cloud provider and/or an infrastructure orchestrator like Kubernetes.
It should be clear that there’s also not one way to do it right, although certain ways are certainly better than others. There’s a lot of content out there; here are some things I recommend:
As with any sufficiently complex problem, there’s no silver bullet here, only a lot of hard work - the important part is deliberation. Don’t let releases be an afterthought; give them the attention they need!