📦 Release first ⏩ Ship faster ⚡

⚖️ Reliability tradeoffs

Recently, one of our servers running Arch Linux refused to boot after updates. The fix was to comment out a line in the /etc/default/grub file. It took about 15 minutes to find, and everything was done in a rescue terminal, with no need to boot from a rescue image. This server has been running for several years and has been updated hundreds of times, and this is the only time it would not boot.

Many dismiss Arch as a server OS because failures like this one are supposedly more likely. In practice, they rarely happen and are fairly quick to fix. So on one side of the equation:

  • Easy-to-administer OS, latest software, no major-version OS upgrades (Arch is a rolling release), etc. This saves a ton of time.

On the other side:

  • Occasional effort to fix an issue like the above.
  • Downtime while doing the above.

For a high-profile eCommerce site, downtime would matter, but for a small business, 30 minutes of downtime makes no practical difference. The time saved running Arch outweighs the time spent fixing things that break by an order of magnitude or two. In that setting, admin time is many times more valuable than server uptime. Even spending half a day rebuilding the site from scratch (which is mostly automated) would not matter.

There is no one-size-fits-all answer. Just because Summit Federal Bank runs Crucible Linux Enterprise edition does not mean everyone else needs to. Always consider the benefit-cost ratio (BCR).
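To make the BCR idea concrete, here is a rough sketch in Python. Every number in it is a made-up assumption for illustration (hours saved per year, hourly admin rate, cost of an hour of downtime), not a measurement from our server; only the 15-minute fix and 30-minute outage echo the story above. The function name and parameters are likewise hypothetical.

```python
def benefit_cost_ratio(hours_saved, hours_fixing, downtime_hours,
                       admin_rate, downtime_rate):
    """Return benefit / cost for one year of running a given OS.

    benefit: admin time saved, valued at the admin's hourly rate
    cost:    admin time spent on fixes plus the cost of the downtime
    """
    benefit = hours_saved * admin_rate
    cost = hours_fixing * admin_rate + downtime_hours * downtime_rate
    return benefit / cost


# Hypothetical small business: downtime is cheap, admin time is not.
small_business = benefit_cost_ratio(
    hours_saved=20,      # assumed admin hours saved per year
    hours_fixing=0.25,   # one 15-minute fix
    downtime_hours=0.5,  # 30 minutes offline
    admin_rate=100,      # assumed cost of one admin hour
    downtime_rate=10,    # assumed cost of one hour of downtime
)

# Hypothetical high-profile eCommerce site: same incident, costly downtime.
ecommerce = benefit_cost_ratio(
    hours_saved=20,
    hours_fixing=0.25,
    downtime_hours=0.5,
    admin_rate=100,
    downtime_rate=10_000,
)

print(f"small business BCR: {small_business:.1f}")
print(f"eCommerce BCR:      {ecommerce:.2f}")
```

With these assumed inputs, the small business comes out with a BCR of roughly 67, while the eCommerce site lands around 0.4. The incident is identical in both cases; only the price of downtime changes, and that is enough to flip the decision.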


Cliff Brake January 22, 2026 #reliability #linux #server #BCR #risk #infrastructure #maintenance