Kubernetes Operators are one of those ideas that feel magical when they work: you declare a goal in YAML, and software continuously makes the cluster match it, handling upgrades, failures, drift, and lifecycle cleanup like a purpose-built SRE on autopilot.
It may sound like science fiction, but in practice an Operator is just code that someone wrote using Kubernetes’ extensibility features: it compares the desired state of the cluster with the current state and runs a control loop that reconciles the two.
At a high level, an operator is:
- (e.g. PostgresCluster, Cache, Tenant)
This is “reconciliation”: controllers run repeatedly until the world matches the declared desired state. The key insight is that operators usually reconcile one level higher than built-in Kubernetes controllers (your operator might create a Deployment, and then Kubernetes’ own deployment controller creates ReplicaSets and Pods).
- spec: what the user wants
- status: what’s actually happening (observed state)
If you get this separation right early on, everything else gets easier: upgrades, debugging, user trust, and testing.
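As a rough illustration of that split, the sketch below models a custom resource as plain Go structs. The `Website` type, its fields, and the `Converged` helper are hypothetical names chosen for this example; a Kubebuilder project would generate richer types, but the division of ownership is the same: users write `Spec`, the operator writes `Status`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WebsiteSpec is the desired state, written by the user.
// All field names here are hypothetical.
type WebsiteSpec struct {
	Image    string `json:"image"`
	Replicas int    `json:"replicas"`
}

// WebsiteStatus is the observed state, written only by the operator.
type WebsiteStatus struct {
	ReadyReplicas      int   `json:"readyReplicas"`
	ObservedGeneration int64 `json:"observedGeneration"`
}

// Website pairs the two halves, as a custom resource would.
type Website struct {
	Name       string        `json:"name"`
	Generation int64         `json:"generation"`
	Spec       WebsiteSpec   `json:"spec"`
	Status     WebsiteStatus `json:"status"`
}

// Converged reports whether the operator has seen the latest spec
// and the observed state matches what the user asked for.
func Converged(w Website) bool {
	return w.Status.ObservedGeneration == w.Generation &&
		w.Status.ReadyReplicas == w.Spec.Replicas
}

func main() {
	w := Website{
		Name:       "blog",
		Generation: 2,
		Spec:       WebsiteSpec{Image: "nginx:1.27", Replicas: 3},
		Status:     WebsiteStatus{ReadyReplicas: 1, ObservedGeneration: 1},
	}
	out, _ := json.MarshalIndent(w, "", "  ")
	fmt.Println(string(out))
	fmt.Println("converged:", Converged(w))
}
```

Keeping an observed generation in status is one common way to let users tell "the operator has not looked at my change yet" apart from "the operator looked and is still working on it".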
If you’re building a “classic” Kubernetes operator, the most common and best-supported path is:
If you want extra packaging workflows, especially around bundles for Operator Lifecycle Manager (OLM, a Kubernetes extension that helps install, upgrade, and manage operators in a cluster), use Operator SDK. For Go projects, Operator SDK uses Kubebuilder under the hood and shares the same basic layout and controller-runtime foundation.
You can write operators in other ecosystems, and sometimes you should (team skill set, rapid prototyping, etc.). But always keep in mind that Go is the safest bet because examples are abundant and libraries/tools are very mature.
Before you scaffold anything, answer these questions:
Pick a single primary CR type as the main entry point:
- Website
- Database
- Topic
- BackupJob
A common beginner mistake is trying to make the operator “watch everything” and infer intent. Don’t. Make the CR the user contract.
List the Kubernetes resources the operator will create/maintain, like:
- Deployment / StatefulSet
- ConfigMap / Secret
- Ingress
- PodDisruptionBudget
This list informs your RBAC, your testing, and your reconcile design.
Define your Status Conditions strategy early:
- Available=True?
And last but not least, decide on your cleanup model:
- ownerReferences (ideal when possible)
This pays in test assertions and user experience.
Kubebuilder organizes your project into a fixed structure. The names may evolve across versions, but conceptually you’ll have:
Under the hood, controllers:
You don’t need to implement that plumbing, but you do need to design your reconcile logic to be:
This is where you should slow down and design:
fields that capture intent from users, e.g.:
- (suspend, paused, etc.)
- Available, Progressing, Degraded
A good heuristic: if an SRE would ask for it at 3am, consider putting it in status.
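To show what maintaining conditions such as Available, Progressing, and Degraded can look like, here is a minimal, dependency-free sketch. The `Condition` struct, the `SetCondition` helper, and the `epoch` value are assumptions made for this example; they only loosely imitate the condition type and the `meta.SetStatusCondition` helper from the Kubernetes apimachinery library, which a real operator would normally use instead.

```go
package main

import (
	"fmt"
	"time"
)

// Condition loosely mirrors the shape of a Kubernetes status condition.
type Condition struct {
	Type               string // e.g. "Available", "Progressing", "Degraded"
	Status             string // "True", "False" or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// epoch is an arbitrary fixed timestamp used by the example.
var epoch = time.Date(2024, 1, 1, 3, 0, 0, 0, time.UTC)

// SetCondition inserts or updates the condition with the same Type.
// The transition time only moves when Status actually changes.
// It returns the updated slice and whether Status flipped.
func SetCondition(conds []Condition, c Condition, now time.Time) ([]Condition, bool) {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		changed := conds[i].Status != c.Status
		if changed {
			conds[i].LastTransitionTime = now
		}
		conds[i].Status = c.Status
		conds[i].Reason = c.Reason
		conds[i].Message = c.Message
		return conds, changed
	}
	c.LastTransitionTime = now
	return append(conds, c), true
}

func main() {
	var conds []Condition
	conds, _ = SetCondition(conds, Condition{Type: "Available", Status: "False", Reason: "RollingOut"}, epoch)
	conds, _ = SetCondition(conds, Condition{Type: "Available", Status: "True", Reason: "AllReplicasReady"}, epoch.Add(time.Minute))
	for _, c := range conds {
		fmt.Printf("%s=%s (%s) since %s\n", c.Type, c.Status, c.Reason, c.LastTransitionTime.Format(time.RFC3339))
	}
}
```

Moving the transition time only on a real status flip is what lets someone reading the status at 3am see how long a resource has been unavailable, rather than when it was last reconciled.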
Kubebuilder’s controller style is basically: load the CR, observe cluster state, take actions, update status.
A practical reconcile “recipe”:
The important point isn’t the exact code – it’s the shape:
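To make the load, observe, act, update-status shape concrete, here is a dependency-free sketch. It is not Kubebuilder code: `Cluster`, `Website`, `Deployment`, and this `Reconcile` signature are hypothetical in-memory stand-ins, and a real controller would talk to the API server through a client instead of maps.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical in-memory stand-ins for what the API server would store.
type Website struct {
	Spec   struct{ Replicas int }
	Status struct {
		ReadyReplicas int
		Phase         string
	}
}

type Deployment struct{ Replicas, ReadyReplicas int }

type Cluster struct {
	Websites    map[string]*Website
	Deployments map[string]*Deployment
}

// Reconcile follows the load / observe / act / update-status shape.
// It is safe to call repeatedly: once converged it changes nothing.
// The bool result means "requeue, not converged yet".
func Reconcile(c *Cluster, name string) (bool, error) {
	// 1. Load the CR; a missing CR means it was deleted, so there is nothing to do.
	ws, ok := c.Websites[name]
	if !ok {
		return false, nil
	}
	if ws.Spec.Replicas < 0 {
		return false, errors.New("spec.replicas must not be negative")
	}
	// 2. Observe the child resource.
	dep, ok := c.Deployments[name]
	// 3. Act: create the child, or correct it in place if it drifted.
	if !ok {
		dep = &Deployment{Replicas: ws.Spec.Replicas}
		c.Deployments[name] = dep
	} else if dep.Replicas != ws.Spec.Replicas {
		dep.Replicas = ws.Spec.Replicas
	}
	// 4. Update status from what was observed, not from what was intended.
	ws.Status.ReadyReplicas = dep.ReadyReplicas
	if dep.ReadyReplicas == ws.Spec.Replicas {
		ws.Status.Phase = "Available"
		return false, nil
	}
	ws.Status.Phase = "Progressing"
	return true, nil
}

func main() {
	c := &Cluster{Websites: map[string]*Website{}, Deployments: map[string]*Deployment{}}
	ws := &Website{}
	ws.Spec.Replicas = 2
	c.Websites["blog"] = ws

	requeue, _ := Reconcile(c, "blog") // first pass creates the Deployment
	fmt.Println(ws.Status.Phase, requeue)

	c.Deployments["blog"].ReadyReplicas = 2 // pretend the Pods became ready
	requeue, _ = Reconcile(c, "blog")
	fmt.Println(ws.Status.Phase, requeue)
}
```

The function never assumes it knows why it was called. It recomputes everything from the current state each time, which is what makes repeated or out-of-order invocations harmless.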
Operator testing gets dramatically easier when you split it into three layers:
Test pure functions and deterministic logic:
These should run in milliseconds and cover most branches.
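As an example of the kind of logic that fits this layer, the sketch below pairs a pure function with a table of cases. `Spec`, `DesiredReplicas`, and `CheckCases` are hypothetical names, and the defaulting rules are assumptions made for illustration. In a real project the table would normally live in a `_test.go` file and run under `go test`; it is written as an ordinary function here only so the example is a single runnable file.

```go
package main

import "fmt"

// Spec is a hypothetical slice of a CR spec.
type Spec struct {
	Replicas int
	Paused   bool
}

// DesiredReplicas is pure: same input, same output, no API calls.
// Paused resources scale to zero; unset or invalid replicas default to 1.
func DesiredReplicas(s Spec) int {
	switch {
	case s.Paused:
		return 0
	case s.Replicas <= 0:
		return 1
	default:
		return s.Replicas
	}
}

// CheckCases runs a table of inputs and returns a description of each failure.
func CheckCases() []string {
	cases := []struct {
		name string
		in   Spec
		want int
	}{
		{"default when unset", Spec{}, 1},
		{"explicit value", Spec{Replicas: 3}, 3},
		{"paused wins", Spec{Replicas: 3, Paused: true}, 0},
	}
	var failures []string
	for _, tc := range cases {
		if got := DesiredReplicas(tc.in); got != tc.want {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", len(CheckCases()))
}
```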
envtest is another powerful tool: it starts a local control plane (API server + etcd) so your controller talks to a real Kubernetes API without spinning up a full cluster. This is the sweet spot for most controller tests.
What envtest is great for:
kind runs a real Kubernetes cluster in Docker containers. It’s lightweight enough for CI and is widely used for “real cluster” validation.
Use kind E2E tests to catch issues envtest won’t:
CI gets your operator correct.
CD keeps it safe, upgradeable, and operable over time.
Think of operator CD as three deliverables, not one:
Your CD pipeline should explicitly manage all three.
Before automating CD, think deeply and decide how versions will work.
- (v1.2.3)
CRDs are APIs. Breaking them breaks users!
Operator CD must answer one question: “What happens to existing clusters when this version rolls out?”
Key things your CD process should validate:
a) CRD compatibility
b) Reconcile backward compatibility
Reconcile logic must tolerate:
c) Rolling upgrades
Testing what’s happening in the wild can be really tricky! A practical CD test outline could be something like:
This is where a simple controller grows into a fully fledged operator.
Recreating child resources on every change causes unnecessary disruption. Patch existing resources where possible.
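A minimal sketch of the create-or-mutate idea follows, using a hypothetical in-memory `Store` rather than a real API client. The `UID` field is there only to show that the object survives an update. controller-runtime's `controllerutil.CreateOrUpdate` helper follows a similar pattern against a real cluster; this example does not reproduce its signature.

```go
package main

import "fmt"

// Deployment is a hypothetical, trimmed-down child resource.
type Deployment struct {
	UID      int // changes only if the object is deleted and recreated
	Image    string
	Replicas int
}

// Store is an in-memory stand-in for the API server.
type Store struct {
	nextUID int
	objs    map[string]*Deployment
}

func NewStore() *Store {
	return &Store{nextUID: 1, objs: map[string]*Deployment{}}
}

// Apply creates the object if it is missing, otherwise mutates only
// what drifted. It never deletes and recreates an existing object.
func (s *Store) Apply(name, image string, replicas int) string {
	cur, ok := s.objs[name]
	if !ok {
		s.objs[name] = &Deployment{UID: s.nextUID, Image: image, Replicas: replicas}
		s.nextUID++
		return "created"
	}
	if cur.Image == image && cur.Replicas == replicas {
		return "unchanged" // no write, so no disruption
	}
	cur.Image, cur.Replicas = image, replicas
	return "updated"
}

func main() {
	s := NewStore()
	fmt.Println(s.Apply("web", "nginx:1.26", 2))
	fmt.Println(s.Apply("web", "nginx:1.26", 2))
	fmt.Println(s.Apply("web", "nginx:1.27", 2))
	fmt.Println("uid still:", s.objs["web"].UID)
}
```

Returning "unchanged" without writing also matters for the control loop itself: an operator that writes on every pass can trigger its own watch events and reconcile forever.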
Use owner references so Kubernetes can garbage collect dependents automatically when the CR is deleted.
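The sketch below is a toy model of what that garbage collection does, not operator code: `Object` and `CollectGarbage` are hypothetical, and the real collector is part of Kubernetes itself. In a controller-runtime project the operator's job is only to set the reference on each child it creates, typically with `controllerutil.SetControllerReference`.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a hypothetical stand-in for any Kubernetes object.
type Object struct {
	Name     string
	UID      string
	OwnerUID string // empty means "no owner"
}

// CollectGarbage removes every object whose owner no longer exists,
// repeating until nothing changes so that grandchildren are removed too.
// The map is keyed by UID. It returns the removed names, sorted.
func CollectGarbage(objs map[string]Object) []string {
	var removed []string
	for {
		deleted := false
		for uid, o := range objs {
			if o.OwnerUID == "" {
				continue
			}
			if _, alive := objs[o.OwnerUID]; !alive {
				delete(objs, uid)
				removed = append(removed, o.Name)
				deleted = true
			}
		}
		if !deleted {
			break
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	objs := map[string]Object{
		"cr-1":  {Name: "blog", UID: "cr-1"},
		"dep-1": {Name: "blog-deploy", UID: "dep-1", OwnerUID: "cr-1"},
		"rs-1":  {Name: "blog-rs", UID: "rs-1", OwnerUID: "dep-1"},
		"cm-1":  {Name: "shared-config", UID: "cm-1"},
	}
	delete(objs, "cr-1") // the user deletes the custom resource
	fmt.Println("removed:", CollectGarbage(objs))
	fmt.Println("remaining:", len(objs))
}
```

The unowned `shared-config` object survives, which is the flip side of this model: anything the operator creates without an owner reference needs explicit cleanup logic.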
Start with:
Then add more only when you have a reason to.
Status is your UX:
And remember, in the end an operator is just a “while true” loop with discipline, in a Pod.