Feeling Like I Have No Release: A Journey Towards Sane Deployments


When I was young and dinosaurs walked the earth, I worked on a software team that developed a web-based product for two years before ever releasing it. I don’t just mean we didn’t make it publicly available; we didn’t deploy it anywhere except a test machine in the office, accessed by two internal testers, and even that required a change to each tester’s hosts file. You don’t have to be an agile evangelist to spot the red flag. There’s “release early, release often,” which seemed revolutionary the first time I heard it after living under a waterfall for years, or there’s building so much while waiting so long to deploy that you guarantee weird surprises the first time you deploy somewhere realistic, let alone when you get feedback from real users. I’m told the first deployment to a web farm was a very special experience.

A tale of a dodgy deployment

Being a junior, I was spared being involved in the first deployment. But towards the end of the first three-month cycle of fixes, the team leader asked me, “Would you be available on Tuesday at 2 a.m. to come to the office and do a deployment?”

“Yep, sure, no worries.” I went home thinking what a funny dude my team leader was.

So on Tuesday at 9 a.m., I show up and say good morning to the team leader and the architect, who sit together staring at one computer. I sit down at my dev machine and start typing.

“Man, what happened?” the team leader says over the partition. “You said you’d be here at 2 a.m.”

I look at him and see he is not smiling. I say, “Oh. I thought you were joking.”

“I was not joking, and we have a massive problem with the deployment.”

Uh-oh.

I was a junior and did not have the combined forty years of engineering experience of the team leader and the architect, but what I had that they lacked was a well-rested brain, so I found the problem rather quickly: a code change the dev manager had made to the way we handled cookies, which didn’t show a problem on the internal test server but broke the world on the real web servers. Perhaps my finding the issue was the only thing that saved me from a stern lecture. By the time I left years later, it was just a funny story the dev manager shared in my farewell speech, along with nice compliments about what I had achieved for the company — and I later accepted an offer to work there again.

Breaking news: Human beings need sleep

I am sure the two seniors would have been capable of spotting the problem under different circumstances. They had a lot working against them: sleep deprivation and the miscommunication about who would be present would have stoked panic, and the outage they caused when they powered through and deployed without me would have made it worse. More importantly, they didn’t know whether the problem was in the new code or in human error somewhere in their manual deployment process: copying zipped binaries and website files to multiple servers, manually updating config files, comparing and updating database schemas — all in the wee hours of the morning.

They were sleepily searching for a needle in a haystack of their own making. The haystack wouldn’t have existed if they’d had a proven automated deployment process, because then they could be sure the problem resided only in the code they had deployed. There was no reason everything they were doing couldn’t be scripted. They could have woken up at 6 a.m. instead of 2 a.m., verified the automated release of the website before shifting traffic to it, and fixed any problems that surfaced without disrupting real users. The company would have gotten a more stable website, and the expensive developers would have had more time to focus on developing.
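To make that concrete, here is a hedged sketch (in Node-flavored TypeScript, purely illustrative) of the kind of script that could have replaced the checklist. The server shares, the config placeholder, and the migration tool are hypothetical stand-ins rather than anything from the real process, but each step the seniors performed by hand at 2 a.m. maps onto a line a machine could run at 6 a.m.

```typescript
// Hypothetical sketch only: the server shares, config placeholder, and
// migration tool are stand-ins, not the real process from the story.
import { execSync } from "node:child_process";
import { cpSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const servers = ["\\\\web01\\site", "\\\\web02\\site"]; // web farm targets

for (const server of servers) {
  // Step 1: copy the built site to every server, identically, every time.
  cpSync("./publish", server, { recursive: true });

  // Step 2: apply the per-environment config change the same way everywhere.
  const configPath = join(server, "web.config");
  const config = readFileSync(configPath, "utf8");
  writeFileSync(
    configPath,
    config.replace("__DB_CONNECTION__", process.env.DB_CONNECTION ?? "")
  );
}

// Step 3: run schema changes from source control instead of eyeballing diffs.
execSync("npx db-migrate up", { stdio: "inherit" }); // placeholder migration tool
```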

If you manually deploy overnight, and then drive, you’re a bloody idiot

A 2 a.m. deployment might seem funny if it isn’t your night to attend and you have a dark sense of humor. In the subsequent years, I attended many 2 a.m. deployments to atone for the one I slept through. The company paid for breakfast on those days, and if we proved the deployment was working, we could leave for the day and catch up on sleep, assuming we survived the drive home and didn’t end up sleeping forever.

The manual deployment checklist was perpetually incomplete and out of date, yet the process was never blamed for snafus on deployment days. In reality, a snafu was sometimes a direct consequence of the fallibility of manually working from an inaccurate checklist. Other times the manual deployment wasn’t directly the culprit, but it made pinpointing the problem, or deciding whether to roll back, unnecessarily hard. And you knew rolling back would mean forgoing your sleep again the next day, so you’d have a mounting sleep debt working against you.

I learned a lot from that team and the complex features I had the opportunity to build. But the deployment process was a step backward from my internship doing Windows programming, because in that job I had to write installers so my code would work on user machines, which, by the nature of the task, I didn’t have access to. When web development removes that inherent limitation, it’s like a devil on your shoulder tempting you to do whatever seems easiest in the moment and update production straight from your dev machine. You know you want to, especially when the full deployment process is hard and people want a fix straightaway. This is why, if you automate deployments, you want to lock things down so the automated process is the only way to deploy changes.

As I became more senior and had more say in how these processes happened at my workplace, I started researching — and I found it easy to relate to the shots taken at manual deployers, such as this presentation titled “Octopus Deploy and how to stop deploying like an idiot” and Octopus Deploy founder Paul Stovell’s sentiments on how to deploy database updates: “Your database isn’t in source control? You don’t deserve one. Go use Excel.” This approach to giving developers a kick in their complacency reminds me of the long-running anti-drunk driving ads here in Australia with the slogan “If you drink then drive, you’re a bloody idiot,” which scared people straight by insulting them for destructive life choices.

In the “Stop deploying like an idiot” talk, Damian Brady insults a hypothetical deployment manager at Acme Corp named Frank, who keeps being a hero by introducing risk and wasted time to a process that could be automated using Octopus, which would never make stupid mistakes like overwriting the config file.

“Frank’s pretty proud of his process in general,” says Damian. “Frank’s an idiot.”

Why are people like this?

Frankly, some of the Franks I have worked with were something worse than idiots. Comedian Jim Jefferies has a bit in which he says he’d take a nice idiot over a clever bastard. Frank’s a cunning bastard wolf in idiotic sheep’s clothing — people who work in software tend to have above-average IQs, and a person appointed “deployment manager” will have googled the options that make the task easier, yet he chose not to use them. The thing is, Frank gets to seem important, make other devs look and feel stupid when they try to follow his process while he’s on leave, and even when he is working, he gets to take overseas trips to hang with clients because he’s the only one who can get the product working on a new server. Companies must be careful which behaviors they reward, and Conway’s law applies to deployment processes.

What I learned by being forced to do deployments manually

To an extent, a process that reflects hierarchy and division of responsibility is normal and necessary, which is why Octopus Deploy has built-in manual intervention and approval steps. But some of the motivations for sticking with manual deployments are nonsense. Complex manual deployments are still more widespread than they need to be, which makes me feel bad for the developers who still live like I did back in the 2000s — if you call that living.

I guess there is an argument for the team-building experiences in those 2 a.m. deployments, much like deployments in the military sense of the word may teach the troops some valuable life lessons, even if the purported reason for the battle isn’t the real reason, and the costs turn out to be higher than anyone expects.

It reminds me of a tour I had the good fortune to take in 2023 of the Adobe San Jose offices, where a “Photoshop floor” includes time-capsule conference rooms representing different periods in Photoshop’s history, including a ’90s room with a working Macintosh Classic running Photoshop 1.0. The past is an interesting and instructive place to visit, but not somewhere you’d want to live in 2025.

Even so, my experience of Flintstones-style deployments gave me an appreciation for the way a tool like Octopus Deploy automates everything I was once forced to do manually, which kept my motivation up while I worked through the teething problems when I was eventually tasked with transforming a manual deployment process into an automated one. That appreciation for the value proposition of a tool like Octopus Deploy is why I later jumped at the opportunity to work for Octopus in 2021.

What I learned working for Octopus Deploy

The first thing I noticed was how friendly the devs were and how smooth the onboarding process was, with only one small manual change needed to make the code run correctly in Docker on my dev box. The second thing I noticed was that this wasn’t heaven: there were flaky integration tests, slow builds, and Cake file output that hid the informative build errors. In fairness, Octopus was in a period of learning how to scale up at the time, and there was a whole project, which I eventually joined, to performance-tune the integration tests and Octopus itself. As an Octopus user, I had seen the product as about as close to magic as we were likely to find, compared to the hell we went through without a proper deployment tool. Yet there’s something heartening about knowing nobody has a flawless codebase, and even Octopus Deploy has smelly code to deal with and some suboptimal deployments of its own.

Once I made my peace with the fact that there’s no silver bullet that magically and perfectly solves any aspect of software, including deployments, my hot take is that deploying like an idiot comes down to a mismatch between the tools you use to deploy and the payoff: the complexity they remove versus the complexity they add. One example of deploying like an idiot is the story I opened with, in which team members routinely drove to the office at 2 a.m. to manually deploy a complicated website involving database changes, background processes, web farms, and SLAs. But another example might be a solo developer with a side project who sets up Azure DevOps to push to Octopus Deploy and pays more than necessary in money and cognitive load. Octopus is a deceptively complex tool that can automate anything, not only deployments, but that power comes at the price of a learning curve and the risk of decision fatigue.

For instance, when I used my “sharpening time” (the Octopus term for side-project time) to explore ways to deploy a JavaScript library, I found at least two different ways to do it in Octopus, depending on whether it’s acceptable to automate upgrading all your consumers to the latest version of your library or whether you need more control of versioning per consumer. Sidenote: the Angry Birds Octopus parody that Octopus marketing created to go with my “consumers of your JavaScript library as tenants” article was a highlight of my time at Octopus — I wish we could have made it playable like a Google Doodle.

Nowadays I see automation as a spectrum of how automatic and sophisticated you need things to be, somewhat separate from the choice of tools. The challenge is locating the sweet spot where automation makes your life easier enough to justify the licensing fees and the time and energy you devote to the deployment process itself. Octopus Deploy might sit at one end of that spectrum, for when you need lots of control over a complicated automatic process. At the other end, the guy who runs Can I Use found that adopting git-ftp was a life upgrade over manually copying modified files to his web server, while keeping his process simple and not spending much energy on more sophisticated deployment systems. Somewhere in the middle sit things like Bitbucket Pipelines or GitHub Actions, which are more automated and sophisticated than git-ftp from your dev machine, but less complicated than Octopus together with TeamCity, which could be overkill on a simple project.

The complexity of deployment might be something to consider when defining your architecture, similar to how planning poker can prompt a business to rethink the value of certain features once it gets holistic feedback from the team on the overall cost. For instance, you might assume you need a database, but when you factor in the complexity it adds to roll-outs, you may be motivated to rethink whether your use case truly needs one.

What about serverless? Does serverless solve our problems given it’s supposed to eliminate the need to worry about how the server works?

Reminder: Serverless isn’t serverless

It should be uncontroversial to say that “serverless” is a misnomer, but how much this inaccuracy matters is debatable. Here’s my analogy for why I think the name “serverless” is a problem: early cars had a right to call themselves “horseless carriages” because they were a paradigm shift that meant your carriage could move without a horse. “Driverless cars” shouldn’t be called that, because they don’t remove the need for a driver; it’s just that the driver is an AI, which makes “self-driving car” the better name. Self-driving cars often work well, but completely ignoring the limitations of how they work can be fatal. When you unpack the term “serverless,” it’s like a purportedly horseless carriage that’s still pulled by a horse — but the driver claims the feeding and handling of the horse will be managed so well, and the carriage so insulated from neighing and horse flatulence, that passengers will feel as if the horse doesn’t exist. My counterargument is that the reality of the horse is bound to affect the passenger experience sooner or later.

For example, one of my hobby projects was a rap battle chat deployed to Firebase. I needed a Firebase cloud function to calculate the score for each post using the same rhyme-detection algorithm that powers the front end. This worked fine in testing when I ran the function in the Cloud Functions emulator — but it performed unacceptably after my first deployment because of cold starts (loading the pronunciation dictionary was the likely culprit, if you’re wondering). Much like my experiences in the 2000s, my code behaved dramatically differently on my dev machine than on the real Firebase, almost as though there were still a server I couldn’t pretend didn’t exist — but now I had limited ability to tweak it. One way to fix it was to throw money at the problem.
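For the curious, here is a minimal sketch of the shape of that function, not my original code. The rhymes module and the Firestore paths are hypothetical stand-ins, and the minInstances option is the “throw money at it” fix: it keeps an instance warm so users rarely hit the cold-start path, at the price of paying for idle time.

```typescript
// A minimal sketch (not my original code) of the cold-start problem.
// loadPronunciationDictionary and scoreRhymes are hypothetical stand-ins
// for the real rhyme-detection module shared with the front end.
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";
import { loadPronunciationDictionary, scoreRhymes } from "./rhymes";

admin.initializeApp();

// This runs once per instance. In the emulator it is effectively instant;
// on a cold start in production it is a noticeable chunk of the first response.
const dictionary = loadPronunciationDictionary();

export const scorePost = functions
  // The "throw money at it" fix: keep one instance warm so users rarely
  // hit the cold-start path. You pay for the idle instance.
  .runWith({ minInstances: 1 })
  .firestore.document("battles/{battleId}/posts/{postId}")
  .onCreate(async (snapshot) => {
    const { text } = snapshot.data();
    const score = scoreRhymes(text, dictionary);
    await snapshot.ref.update({ score });
  });
```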

That serverless experience reminds me of a scene in the science fiction novel Rainbows End in which the curmudgeonly main character cuts open a car that isn’t designed to be serviced, only to find that all the components inside are labeled “No user-serviceable parts within.” He’s assured that even if he could cut open those parts, the car is “Russian dolls all the way down.” One of the other characters asks him: “Who’d want to change them once they’re made? Just trash ’em if they’re not working like you want.”

I don’t want to seem like a curmudgeon — but my point is that while something like Firebase offers many conveniences and can simplify deployment and configuration, it can also move the problem to knowing which services are appropriate to pay extra for. And you may find your options are limited when things go wrong with a deployment or any other part of web development.

Deploying this article

Since I love self-referential twist endings, I’ll point out that even publishing an article like this has a variety of possible “deployment processes.” For instance, Octopus uses Jekyll for its blog: you make a branch with the Markdown of your proposed post, marketing proposes changes, and then they set a publication date and merge, at which point an automated process handles publication. This process has the advantage of using familiar tools for collaborating on changes to a file — but it might not feel approachable to teams that aren’t comfortable with Git, and it might not be immediately apparent how to preview the final article as it will appear on the website.

On the other hand, when I create an article for CSS-Tricks, I use Dropbox Paper for my initial draft, then send it to Geoff Graham, who makes edits, for which I get notifications. Once we’ve confirmed via email that we’re happy with the article, he manually ports it to Markdown in WordPress and sends me a link to a pre-live version on the site to check before the article is scheduled for publication. It’s a manual process, so I sometimes find problems even in this “release” of static content collaborated on by only two people — but you gotta weigh how much risk there is of mistakes against how much value there would be in fully automating the process. With anything you publish on the web, keep searching for that sweet spot of elegance, risk, and the reward-to-effort ratio.

