What Are Special-Purpose Operating Systems in the Cloud Native World?
Disclaimer
I'm one of the maintainers of Kairos, an operating system built for cloud-native workloads. I also served a term as co-chair of what was then the CNCF Special Purpose Operating System Working Group under TAG Runtime. Both of those shape how I see this space. These are my personal views, not the official views of my employer or of the Kairos project. I'll try to be fair to other projects and honest about where my perspective is partial.
GPOS and SPOS
Most people, when they say "operating system," are thinking of a general-purpose operating system (GPOS). Linux, macOS, Windows, the BSDs. The defining property of a GPOS is exactly what it sounds like: it's designed to run a wide range of workloads on a wide range of hardware, configurable in a wide range of ways, for a wide range of users. GPOSes are the right answer when you don't know in advance what the machine is going to do, when the machine is going to do several things at once, or when a human is going to log in and use the machine interactively.
A special-purpose operating system (SPOS) is the opposite. It's designed around a specific operational model, workload, or hardware profile. You give up generality in exchange for fit: the OS matches its purpose more precisely than a general-purpose OS can, because it doesn't have to accommodate every other purpose too.
SPOS is an umbrella term, and the umbrella is wider than most cloud-native conversations acknowledge. It covers:
- Mobile operating systems like Android and iOS, purpose-built for touch-first, battery-constrained, app-store-distributed devices
- Real-time operating systems (RTOS) used in industrial control, avionics, and automotive systems, where deterministic timing is the defining constraint
- Embedded operating systems for routers, appliances, IoT devices, and other devices with fixed functions
- Unikernels: single-application images that collapse the OS/app boundary entirely
- Cloud-native operating systems: the subcategory I'll spend the rest of this post on
These are all SPOSes, and they are not interchangeable. An RTOS would be a bad fit for a phone, Android would be a bad fit for an industrial controller, and a cloud-native OS would be a bad fit for either. The "special" in "special-purpose" means something different in each case.
The SPOS Working Group, and why "cloud-native OS" is a better name for this subcategory
For a while, the CNCF had a working group focused specifically on this space: the Special Purpose Operating System Working Group. It sat under TAG Runtime, and it was where practitioners from Bottlerocket, Flatcar, Talos, Kairos, openSUSE MicroOS, and others compared notes. In the 2025 CNCF TAG restructure, TAG Runtime and several adjacent TAGs were consolidated into a new TAG called Workloads Foundation, and the old working groups, including SPOS, were dissolved. Kairos, Flatcar, and bootc now sit under TAG Workloads Foundation.
The dissolution of the SPOS WG is actually useful for a naming conversation I've been wanting to have for a while. "SPOS" was always a slightly awkward fit for the specific category that working group covered: it's accurate but ambiguous, since SPOS includes things that have nothing to do with cloud-native. The original WG charter scoped itself further to "container OSes," which was more specific but still named the category by its current workload rather than its architecture. Containers are what these systems run today. WASM is what some of them will run tomorrow. Whatever comes after is what they'll run in ten years. Defining the category by the workload makes the definition brittle as workloads evolve.
What I think works better, and what I'll use for the rest of this post and a series that follows it, is cloud-native OS. The term names the design goal rather than the implementation, which makes it robust to how workloads and techniques evolve. And now that the SPOS WG no longer exists as an active body, there's no institutional name to displace. The conversation is free to pick up a clearer term.
What makes an OS "cloud-native"?
Here's where I want to be careful, because it's easy to define cloud-native OS the wrong way.
The tempting definition is to list features: image-based delivery, immutable root filesystem, atomic upgrades, declarative configuration, OCI-artifact transport, trusted boot. These are the properties most of the projects in this space have, and they're the properties I'll spend most of the rest of the series talking about.
But features aren't the definition. Features are how you achieve the definition. If we say a cloud-native OS is one that has these features, then any project that adopts them, even if it doesn't serve cloud-native goals, would qualify, and any project that serves cloud-native goals through different techniques wouldn't. Both of those conclusions seem wrong to me.
A better framing starts with what cloud-native itself is about. The CNCF defines cloud-native as a set of practices for building and running scalable applications in modern, dynamic environments. The original context was public cloud, but the same goals have since been applied on-prem, at the edge, in air-gapped environments, and across sovereign clouds. The goals are:
- Scalability: systems that scale up and down with demand without requiring architectural redesign
- Reliability: systems that tolerate the failure of individual components without taking everything down
- Portability: workloads that can move between environments without being rewritten
- Manageability at scale: fleets that can be operated declaratively rather than node-by-node
- Observability: systems whose state is legible to automated monitoring and human operators
Cloud-native techniques (containers, immutable infrastructure, declarative APIs, orchestration) are how the industry has been achieving those goals. They're not the goals themselves.
So here's the definition I'd propose: a cloud-native OS is an operating system designed to serve cloud-native goals. It's scalable, reliable, portable, manageable at scale, and observable, at the OS layer, for the fleet it's part of. The features most of these projects share (image-based delivery, immutability, atomic upgrades, declarative config) are the current best-known techniques for achieving those goals, but they're not the only possible techniques, and future cloud-native OSes might achieve the same goals in different ways.
I think this framing is better than a features-list definition for three reasons. First, it's honest about what we're actually trying to do: the techniques serve the goals, the goals don't serve the techniques. Second, it's robust: if someone invents a better technique for achieving cloud-native goals at the OS layer ten years from now, we don't have to redefine the category. And third, it makes the evaluation question sharper: instead of asking "does this OS have image-based upgrades," you ask "does this OS actually serve my cloud-native goals in my environment," which is the question you needed answered anyway.
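To make the contrast concrete, here is a minimal Python sketch of the two ways of drawing the category boundary. Everything in it is illustrative: the `Candidate` type, the two made-up candidates, and the idea that "goals served" can be reduced to a set are simplifying assumptions, not a real evaluation method.

```python
from dataclasses import dataclass

# The feature list from the "tempting definition" above.
CHECKLIST = frozenset({
    "image-based delivery",
    "immutable root filesystem",
    "atomic upgrades",
    "declarative configuration",
    "OCI-artifact transport",
    "trusted boot",
})

# The five cloud-native goals listed earlier in the post.
GOALS = frozenset({
    "scalability",
    "reliability",
    "portability",
    "manageability at scale",
    "observability",
})


@dataclass(frozen=True)
class Candidate:
    """A hypothetical OS under evaluation."""
    name: str
    features: frozenset
    goals_served: frozenset  # assumed outcome of evaluating it in your environment


def qualifies_by_features(candidate: Candidate) -> bool:
    """Feature-list definition: the OS has every feature on the checklist."""
    return CHECKLIST <= candidate.features


def qualifies_by_goals(candidate: Candidate) -> bool:
    """Goal-based definition: the OS serves every cloud-native goal."""
    return GOALS <= candidate.goals_served


# Two made-up candidates that expose the difference between the definitions.
checklist_only = Candidate("checklist-only", CHECKLIST, frozenset({"reliability"}))
other_techniques = Candidate("other-techniques", frozenset({"a future technique"}), GOALS)

for c in (checklist_only, other_techniques):
    print(c.name, qualifies_by_features(c), qualifies_by_goals(c))
```

The first candidate passes the checklist without serving the goals, and the second serves the goals without matching the checklist. Those are the two cases where a feature-list definition gives an answer that seems wrong.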
Did the OS stop mattering?
At some point in the cloud-computing timeline, the industry started acting like the OS didn't matter anymore. Platforms-as-a-Service abstracted everything below the application. You wrote code, shipped it, and the platform handled the rest. Kubernetes went a step further: the OS became the thing Kubernetes ran on, and for many teams the answer to "which OS?" was "whichever one the cloud provider picked for us."
That abstraction is genuinely useful for a lot of teams. If you're a startup moving fast, you probably shouldn't be spending time on OS choices. The platform is your operating environment, and that's fine.
But there's a cost to taking the abstraction too far. When the OS is invisible, you lose the ability to reason about things the OS is in the best position to answer:
- How do I know what's actually running on this machine?
- How do I verify nothing has been tampered with between build and boot?
- How do I upgrade a fleet predictably?
- What happens when a node behaves strangely?
- What's my recovery story if a physical device is compromised?
- How do I prove to an auditor that my compliance claims are backed by measurable facts?
These questions don't stop existing because the OS got abstracted. They just stop having good answers. And as platform teams take more direct ownership of edge deployments, sovereign clouds, air-gapped environments, and Kubernetes infrastructure they don't rent from a hyperscaler, the questions come back. Cloud-native OSes are what you get when you take those questions seriously and design the OS layer around them, rather than inheriting a general-purpose distribution and hoping for the best.
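As a deliberately simplified illustration of the "tampered with between build and boot" question, the Python sketch below records a SHA-256 digest when an image is built and compares it again later. The function names and the byte strings are made up for the example. Real systems typically rely on signed artifacts and hardware-backed measured boot rather than a bare hash comparison, and none of that is shown here.

```python
import hashlib
import hmac


def image_digest(image_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of an OS image's bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def matches_build(image_bytes: bytes, expected_digest: str) -> bool:
    """True only if the image hashes to the digest recorded at build time."""
    return hmac.compare_digest(image_digest(image_bytes), expected_digest)


# Made-up stand-in for real image contents.
built_image = b"example image contents"
recorded = image_digest(built_image)  # digest recorded at build time

print(matches_build(built_image, recorded))                # True
print(matches_build(built_image + b"tampered", recorded))  # False
```

The comparison only answers the question because the whole image is treated as a single artifact with one identity. That is much harder when a machine's state is the accumulated result of individual package changes.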
Examples in the current landscape
A few projects in this space, as of now:
- Bottlerocket: AWS's purpose-built OS for container workloads, Apache 2.0 licensed, not in any foundation
- Flatcar Container Linux: CNCF Incubating, Microsoft-stewarded via the Kinvolk acquisition, the first OS to reach CNCF Incubation
- Talos Linux: built by Sidero Labs, API-driven with no shell access, MPL 2.0 for the OS itself
- Kairos: CNCF Sandbox, first operating system accepted into the CNCF, distribution-agnostic and Kubernetes-first
- bootc: CNCF Sandbox, Red Hat-led, bootable OS images using ostree and composefs
- EVE OS: LF Edge project, created by ZEDEDA for distributed edge and fleet-managed deployments
- Garden Linux: SAP-built, donated to the NeoNephos Foundation under Linux Foundation Europe
- Google Container-Optimized OS: Google's internal OS for GCP, not open source in any meaningful sense
- openSUSE MicroOS: a transactional-update variant of openSUSE aimed at container workloads
Each of these makes different trade-offs, and I'll go deeper on several of them in other posts. The point of listing them here is just to make clear that the cloud-native OS category isn't hypothetical. It has real projects, real adopters, and a real body of practice that has accumulated over the last several years.
What this series will cover
This post is the on-ramp to a series I'm starting on cloud-native operating systems: what they are, what they promise, what they actually deliver, how to evaluate them, and where the category is going. Upcoming posts will cover:
- Why image-based delivery is the foundational technical shift, and why most cloud-native-OS properties depend on it
- How to think about governance at the OS layer, since the people who control the OS project end up controlling part of your platform
- What trusted boot and measured boot actually prove, and why the "platform contract" framing matters more than the pure security framing
- How to evaluate security claims in this space without being taken in by marketing that oversells attack-surface arguments
- The honest trade-offs: what you give up moving from a general-purpose OS to a cloud-native one, and when those trade-offs aren't worth it
- A deeper survey of the landscape above, with governance, licensing, and adoption context for each project
If you're evaluating any of these systems for real, or just trying to make sense of the category, I hope the series is useful.
Let's continue the conversation
If you're working in this space too, or just thinking through these ideas, feel free to reach out through my contact page.