Do you own your deployment?

Cliff Brake September 23, 2024 #deployment

Saturday morning, I got a call from a customer -- something was not working due to a bug we had deployed on Friday (no, we don't have very good tests 🤫).

The fix was easy. I tested it locally and then tried to push it to the Git hosting service we use, but the service was down.

Now what? Our Ansible deployment script pulled directly from Git, built the program, and then deployed it.

While I could reverse-engineer the build from the Ansible scripts and do it manually, that would have taken time and introduced the possibility of another error.

So I pushed the repo to my Gitea server, tweaked the repo line in the Ansible script, and deployed the update -- not a big deal.

This raises a question, though. We don't usually think of deployment as critical infrastructure; it's not a big deal if it isn't working -- until you need to fix something quickly in production.

What if the deployment had been wrapped up in a CI/CD workflow that only works in vendor X's cloud service?

Maybe simple deployments are actually the better approach -- a shell script that lives in the project repo and that you can run anywhere. A CI process could still call it for the normal workflow.
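The post suggests a shell script; the same idea is sketched here in Python as an illustration. It builds from the local checkout and pushes the result over SSH, so an outage at the Git host does not block a fix. Everything specific is a hypothetical assumption, not the author's setup: the `make build` target, the `build/app` artifact, the systemd unit `app`, the `/tmp/deploy-upload` staging path, and the placeholder host. By default it only prints the commands; `--apply` runs them.

```python
#!/usr/bin/env python3
# deploy.py: hypothetical repo-local deploy script (a sketch, not the author's
# actual tooling). Assumes the project builds with `make build`, produces
# build/app, and runs as a systemd unit on a host you can SSH into.
from __future__ import annotations

import argparse
import shlex
import subprocess

UPLOAD_PATH = "/tmp/deploy-upload"  # hypothetical staging path on the server


def deploy_steps(host: str, artifact: str, dest: str, service: str) -> list[list[str]]:
    """Return the deploy commands in order.

    The build runs from the local working tree, so no Git hosting
    service is needed at deploy time.
    """
    remote = (
        f"sudo install -m 0755 {UPLOAD_PATH} {shlex.quote(dest)}"
        f" && sudo systemctl restart {shlex.quote(service)}"
    )
    return [
        ["make", "build"],                           # build from the local checkout
        ["scp", artifact, f"{host}:{UPLOAD_PATH}"],  # upload the built artifact
        ["ssh", host, remote],                       # install it and restart the service
    ]


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Deploy from the local checkout.")
    parser.add_argument("--host", default="deploy@example.com",
                        help="SSH destination (placeholder default)")
    parser.add_argument("--artifact", default="build/app")
    parser.add_argument("--dest", default="/usr/local/bin/app")
    parser.add_argument("--service", default="app")
    parser.add_argument("--apply", action="store_true",
                        help="run the commands; without this, only print them")
    args = parser.parse_args(argv)

    for cmd in deploy_steps(args.host, args.artifact, args.dest, args.service):
        print("+", shlex.join(cmd))
        if args.apply:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    main()
```

For example, `python3 deploy.py --host deploy@myserver --apply` (a hypothetical host) would build, upload, and restart in one go, and a CI job could call the same command. Keeping the steps in a plain list means you can always read exactly what a deploy does and run the commands by hand if the script itself breaks.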

Every computing system can fail. It does not matter how big vendor X is -- their stuff can still fail.

Networks occasionally have problems.

DNS can have issues.

Systems get hacked.

This is true no matter how many layers of complexity we pile on top.

In networked computer systems, the simplest path to resiliency is the ability to QUICKLY rebuild systems, whether that is your workstation, laptop, server, or deployment system.