How LDAP Authentication Works on Linux: PAM, NSS, and the Login Stack

Reading Time: 9 minutes

The Identity Stack, Episode 3
What Is LDAP · LDAP Internals · PAM, NSS, and the Login Stack · SSSD · …

Focus Keyphrase: LDAP authentication Linux
Search Intent: Informational
Meta Description: Trace a Linux SSH login through PAM, NSS, and LDAP step by step — and understand why LDAP alone is not an authentication protocol. (144 chars)


TL;DR

  • LDAP is a directory protocol — it stores identity information and can verify a password via Bind, but authentication on Linux runs through PAM, not directly through LDAP
  • NSS (/etc/nsswitch.conf) answers “who is this user?” — it resolves UIDs, group memberships, and home directories by querying LDAP (or the local files, or SSSD)
  • PAM (/etc/pam.d/) answers “are they allowed in?” — it enforces authentication, account validity, session setup, and password policy
  • pam_ldap (the old way) opened a direct LDAP connection on every login — fragile, no caching, broken when the LDAP server was unreachable
  • pam_sss (the modern way) delegates to SSSD, which caches credentials and handles failover — SSSD is the layer between Linux and the directory
  • Tracing a single SSH login: sshd → PAM → pam_sss → SSSD → LDAP Bind + Search → session created

The Big Picture: One SSH Login, Four Layers

You type: ssh vamshi@server

  sshd
    │
    ▼
  PAM  (/etc/pam.d/sshd)          ← "Is this user allowed in?"
    │
    ├── pam_sss    (auth)          ← sends credentials to SSSD
    ├── pam_sss    (account)       ← checks account not expired/locked
    ├── pam_sss    (session)       ← logs the session open/close
    └── pam_mkhomedir (session)    ← creates /home/vamshi if it doesn't exist
    │
    ▼
  SSSD  (/etc/sssd/sssd.conf)     ← "Let me check the directory"
    │
    ├── NSS responder              ← answers getent, id, getpwnam
    └── LDAP/Kerberos provider     ← talks to the actual directory
    │
    ▼
  LDAP Server (AD / OpenLDAP)
    │
    ├── Bind: uid=vamshi + password (or Kerberos ticket)
    └── Search: posixAccount attrs for uid=vamshi
    │
    ▼
  Linux session created
  UID=1001, GID=1001, HOME=/home/vamshi, SHELL=/bin/bash

EP02 showed what the directory contains and what travels on the wire. What it left open is how Linux uses that to grant a login — and why LDAP is not, by itself, an authentication protocol.


Why LDAP Is Not an Authentication Protocol

This is the confusion that trips people most. LDAP can verify a password — the Bind operation does exactly that. But authentication on Linux means something broader: checking credentials, checking account validity, enforcing password policy, setting up a session, creating a home directory. LDAP handles one piece of that. PAM handles the rest.

More precisely: LDAP doesn’t know what a Linux session is. It doesn’t know about /etc/pam.d/. It doesn’t enforce login hours, account expiry, or concurrent session limits. It returns directory entries and verifies binds. The intelligence about what to do with those results lives in the Linux authentication stack.

When you run ssh vamshi@server, the OS doesn’t open an LDAP connection and ask “can this user log in?” It calls PAM. PAM consults its configuration, and PAM decides whether to call LDAP (directly or via SSSD), whether to check the shadow file, whether to enforce MFA. LDAP is one possible backend. It’s not the gatekeeper.
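
You can exercise this directly: the pamtester utility (packaged on most distributions) drives a PAM stack without involving sshd at all. A quick sketch, assuming the sshd service and the vamshi user from the examples in this post:

# Run only the auth and account phases of the sshd PAM stack
pamtester sshd vamshi authenticate
pamtester sshd vamshi acct_mgmt
# "pamtester: successfully authenticated" means PAM said yes,
# whichever backend (pam_sss, pam_unix, ...) actually did the work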


NSS: The Traffic Controller

Before PAM runs, Linux needs to know if the user exists at all. That’s NSS’s job.

/etc/nsswitch.conf is a routing table for name resolution. It tells the OS where to look when something asks “who is UID 1001?” or “what groups is vamshi in?”:

# /etc/nsswitch.conf

passwd:     files sss        ← user lookups: check /etc/passwd first, then SSSD
group:      files sss        ← group lookups: check /etc/group first, then SSSD
shadow:     files sss        ← shadow password lookups
hosts:      files dns        ← hostname lookups (not identity-related)
netgroup:   sss              ← NIS netgroups from SSSD only
automount:  sss              ← autofs maps from SSSD

Every call to getpwnam(), getpwuid(), getgrnam(), getgrgid() in any process — including sshd — goes through NSS. The entries in nsswitch.conf control which backends are tried in order.

With passwd: files sss, a lookup for user vamshi:
1. Checks /etc/passwd — not found (vamshi is a domain user, not in local files)
2. Queries SSSD — SSSD checks its cache, or queries LDAP, and returns the posixAccount attributes

Without the sss entry in passwd:, domain users don’t exist on the system — getent passwd vamshi returns nothing, id vamshi fails, SSH login never gets to PAM’s authentication step.

# Verify NSS is routing to SSSD correctly
getent passwd vamshi
# vamshi:*:1001:1001:Vamshi K:/home/vamshi:/bin/bash

# If this returns nothing, NSS isn't reaching SSSD
# Check: systemctl status sssd && grep passwd /etc/nsswitch.conf

# See what groups the user is in (NSS group lookup)
id vamshi
# uid=1001(vamshi) gid=1001(engineers) groups=1001(engineers),1002(ops)

PAM: The Real Gatekeeper

PAM (Pluggable Authentication Modules) is the framework that lets Linux swap authentication backends without recompiling anything. Every service that needs to authenticate users — sshd, sudo, login, su, gdm — has a PAM configuration file in /etc/pam.d/.

Each PAM config defines four stacks:

auth        ← verify credentials (password, key, MFA)
account     ← check if the account is valid (not expired, not locked, login hours)
password    ← password change policy
session     ← set up/tear down the session (home dir, limits, logging)

A typical /etc/pam.d/sshd on a system joined to AD via SSSD:

# /etc/pam.d/sshd

# auth stack — verify the user's credentials
auth    required      pam_sepermit.so
auth    substack      password-auth   ← usually includes pam_sss.so

# account stack — check account validity
account required      pam_nologin.so
account include       password-auth

# password stack — handle password changes
password include      password-auth

# session stack — set up the session
session required      pam_selinux.so close
session required      pam_loginuid.so
session optional      pam_keyinit.so force revoke
session include       password-auth
session optional      pam_motd.so
session optional      pam_mkhomedir.so skel=/etc/skel/ umask=0077
session required      pam_selinux.so open

The include and substack directives pull in shared stacks from other files (like /etc/pam.d/password-auth). On a system with SSSD, password-auth contains:

auth    required      pam_env.so
auth    sufficient    pam_sss.so      ← try SSSD first
auth    required      pam_deny.so     ← if pam_sss fails, deny

account required      pam_unix.so
account sufficient    pam_localuser.so
account sufficient    pam_sss.so      ← SSSD account check
account required      pam_permit.so

session optional      pam_sss.so      ← SSSD session tracking

The sufficient flag means: if this module succeeds, stop checking this stack and consider it passed. required means: this must pass (but continue checking other modules and report failure at the end). requisite means: if this fails, stop immediately.


PAM Control Flags at a Glance

required   — must succeed; failure reported after remaining modules run
requisite  — must succeed; failure reported immediately, stack stops
sufficient — if success, stop stack (ignore remaining); failure continues
optional   — result ignored unless it's the only module in the stack

This matters for debugging. If pam_sss.so is sufficient and SSSD is down, PAM falls through to pam_deny.so — login denied. If it were optional, the login would proceed to the next module. The control flag is the policy decision.
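
When you need to know which module made the call, the auth log names it. A minimal check, assuming systemd-journald (on older setups, grep /var/log/secure or /var/log/auth.log instead):

# Recent PAM decisions recorded by sshd
journalctl -t sshd | grep 'pam_' | tail -n 20
# A denial from SSSD looks roughly like:
#   sshd[2411]: pam_sss(sshd:auth): authentication failure; ... user=vamshi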


The Old Way: pam_ldap

Before SSSD, Linux systems used pam_ldap and nss_ldap directly:

# Old /etc/pam.d/common-auth (Ubuntu pre-SSSD era)
auth    sufficient    pam_ldap.so    ← direct LDAP connection per login
auth    required      pam_unix.so nullok_secure

# Old /etc/nsswitch.conf
passwd: files ldap    ← nss_ldap for user lookups
group:  files ldap

pam_ldap opened a fresh LDAP connection on every login attempt. No caching. If the LDAP server was unreachable for 3 seconds, the login hung for 3 seconds — sometimes much longer. If the LDAP server was down, all domain logins failed immediately. Previously logged-in users with active sessions were fine; new logins simply didn’t work.

nss_ldap had the same problem for NSS lookups: every getpwnam() call hit the LDAP server directly. On a busy system with many processes doing user lookups, this meant hundreds of LDAP queries per second, no connection reuse, and no way to survive a brief network blip.

The problems were structural:
– No credential caching — offline logins impossible
– No connection pooling — LDAP server saw one connection per login attempt
– No failover logic — one LDAP server down meant all logins down
– Slow timeouts that blocked login sessions

SSSD was built to fix all of this.


The Modern Way: pam_sss + SSSD

pam_sss doesn’t talk to LDAP directly. It’s a thin client that passes authentication requests to SSSD over a Unix domain socket. SSSD manages the LDAP connection, the credential cache, and the failover logic.

sshd  →  PAM (pam_sss)  →  SSSD (Unix socket)  →  LDAP server
                                   │
                                   └── credential cache
                                       (survives brief LDAP outages)

When pam_sss sends a credential to SSSD:
1. SSSD checks its local credential cache (stored on disk under /var/lib/sss/db/) — if the credential hash matches a recent successful auth and the server is unreachable (or cached_auth_timeout is set), it can satisfy the request without hitting LDAP
2. If not cached (or cache expired), SSSD sends a Bind to the LDAP server
3. On success, SSSD caches the result and returns success to pam_sss
4. pam_sss returns PAM_SUCCESS, and the auth stack continues

The credential cache is what enables offline logins. If the LDAP server is unreachable and the user has authenticated successfully while it was reachable, SSSD satisfies the auth from cache and the login succeeds. Caching is enabled with cache_credentials = True in sssd.conf, and offline_credentials_expiration controls how many days a cached credential stays usable. The user never knows the LDAP server was down.
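
A minimal sssd.conf sketch showing where those options live; the domain name, server URI, and search base are taken from the examples in this post, and the values are illustrative rather than recommendations:

# /etc/sssd/sssd.conf (fragment)
[sssd]
services = nss, pam
domains = corp.com

[pam]
offline_credentials_expiration = 7   # days a cached credential stays usable offline (0 = never expires)

[domain/corp.com]
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://dc.corp.com
ldap_search_base = dc=corp,dc=com
cache_credentials = True             # store credential hashes for offline logins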


Tracing a Full SSH Login

Here’s every step of an SSH login for a domain user, in order:

1.  sshd accepts the TCP connection
2.  sshd calls PAM: pam_start("sshd", "vamshi", ...)

3.  PAM auth stack runs pam_sss:
      pam_sss sends credentials to SSSD via /var/lib/sss/pipes/pam

4.  SSSD auth provider:
      a. Check credential cache — miss (first login)
      b. Resolve user: NSS lookup for uid=vamshi
         → SSSD LDAP provider searches dc=corp,dc=com for (uid=vamshi)
         → Returns: uidNumber=1001, gidNumber=1001, homeDirectory=/home/vamshi
      c. Authenticate: LDAP Simple Bind as uid=vamshi,ou=engineers,dc=corp,dc=com
         → Server returns: success
      d. Cache the credential hash + POSIX attrs

5.  SSSD returns PAM_SUCCESS to pam_sss

6.  PAM account stack runs pam_sss:
      SSSD checks: account not expired, not locked, login permitted
      → PAM_ACCT_MGMT success

7.  PAM session stack:
      pam_loginuid sets /proc/self/loginuid = 1001
      pam_mkhomedir creates /home/vamshi if missing
      pam_sss opens session (records in SSSD session tracking)

8.  sshd creates the shell, sets environment:
      USER=vamshi, HOME=/home/vamshi, SHELL=/bin/bash, LOGNAME=vamshi

9.  Shell prompt appears

Steps 4b and 4c are the only two LDAP operations in the entire login flow: one Search to resolve the user’s attributes, one Bind to verify the password. Everything else is PAM and SSSD.


Debugging the Stack

When a login fails, the failure could be in any layer. Work top-down:

# 1. Does NSS resolve the user at all?
getent passwd vamshi
# If empty: NSS isn't reaching SSSD, or SSSD isn't finding the user in LDAP

# 2. Is SSSD running and healthy?
systemctl status sssd
sssctl domain-status corp.com      # shows SSSD's view of domain connectivity

# 3. What does SSSD think about the user?
sssctl user-checks vamshi          # runs auth + account checks internally
id vamshi                          # forces NSS resolution and shows group memberships

# 4. What does SSSD's log say?
journalctl -u sssd -f              # tail SSSD logs live, then attempt login

# 5. Can you reach the LDAP server at all?
ldapsearch -x -H ldap://dc.corp.com \
  -D "cn=svc-ldap,ou=services,dc=corp,dc=com" \
  -w "password" \
  -b "dc=corp,dc=com" \
  "(uid=vamshi)" dn

# 6. Force a cache flush if entries are stale
sss_cache -u vamshi                # invalidate this user's cache entry
sss_cache -g engineers             # invalidate a group (-G invalidates all groups)

The sssctl user-checks command is the single most useful diagnostic — it simulates the full PAM auth + account check flow without actually creating a session, and prints exactly what SSSD would do on a real login attempt.


⚠ Common Misconceptions

“If ldapsearch works, SSH login should work.” Not necessarily. ldapsearch tests the LDAP layer. An SSH login requires NSS to resolve the user, PAM to authenticate, SSSD to be running and configured correctly, and pam_mkhomedir to create the home directory if it’s the first login. Any of these can fail independently.

“pam_ldap and pam_sss do the same thing.” They have the same job (authenticate via LDAP) but completely different architectures. pam_ldap is a direct-connect, no-cache module. pam_sss is a client of SSSD, which provides caching, connection pooling, failover, and offline support. On any modern system, you want pam_sss.

“nsswitch.conf order doesn’t matter much.” It matters exactly as much as the order suggests. passwd: files sss means local /etc/passwd is always checked first — if a domain username collides with a local user, the local account wins. This is the intended behavior (local accounts should always be reachable), but it means you’ll never override a local account with a directory entry.
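
You can watch that precedence directly by restricting getent to one NSS source at a time:

# Query local files only, then SSSD only, for the same name
getent -s files passwd vamshi   # empty if vamshi is not in /etc/passwd
getent -s sss passwd vamshi     # the directory-backed entry served by SSSD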

“SSSD cache = security risk.” The cache stores a credential hash, not the cleartext password. An attacker with access to the SSSD cache database (/var/lib/sss/db/) would see hashed credentials — the same situation as /etc/shadow. The real concern is whether offline authentication is appropriate for your security posture; it can be disabled entirely with cache_credentials = False (note that offline_credentials_expiration = 0 means cached credentials never expire, not that caching is off).


Framework Alignment

CISSP Domain 5 (Identity and Access Management): PAM is the enforcement layer for authentication policy on Linux — understanding its stack is foundational to any Linux IAM deployment.
CISSP Domain 3 (Security Architecture and Engineering): The separation between NSS (resolution) and PAM (authentication) is an architectural boundary — misunderstanding it leads to misconfigured systems where account checks are bypassed.
CISSP Domain 4 (Communications and Network Security): pam_ldap vs pam_sss affects whether credentials travel over a direct LDAP connection (one socket per login, no TLS guarantee) or through SSSD’s managed, pooled connection.

Key Takeaways

  • LDAP alone is not an authentication protocol for Linux — authentication flows through PAM, and LDAP is one of PAM’s possible backends
  • NSS (/etc/nsswitch.conf) resolves user identity (who is UID 1001?); PAM enforces it (are they allowed in?)
  • pam_ldap talks to LDAP directly — no cache, no failover, login blocked when LDAP is unreachable
  • pam_sss delegates to SSSD — credential caching, connection pooling, offline login, and failover are all built in
  • A full SSH login touches LDAP exactly twice: one Search for POSIX attributes, one Bind to verify the password
  • When login fails, debug top-down: NSS resolution → SSSD status → LDAP reachability → PAM config

What’s Next

EP03 showed how authentication reaches LDAP — through PAM, through SSSD, through a Bind. What it assumed is that SSSD is healthy and the LDAP server is reachable. The moment either goes wrong, the behavior depends entirely on how SSSD is configured — its cache TTLs, its failover order, its offline credential policy.

EP04 goes inside SSSD: the architecture, the sssd.conf knobs that matter, how to read the logs, and how to break it intentionally and fix it.

Next: SSSD: The Caching Daemon That Powers Every Enterprise Linux Login

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

One Blueprint, Six Clouds — Multi-Provider OS Image Builds

Reading Time: 6 minutes

OS Hardening as Code, Episode 3
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening

Focus Keyphrase: multi-cloud OS hardening
Search Intent: Informational
Meta Description: Maintain one OS hardening baseline across AWS, GCP, and Azure without separate scripts that drift. One HardeningBlueprint YAML, six providers, zero duplication. (155 chars)


TL;DR

  • Multi-cloud OS hardening with separate scripts per provider means three scripts that drift within weeks
  • A HardeningBlueprint YAML separates compliance intent (portable) from provider details (handled by Stratum’s provider layer)
  • The same blueprint builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox with a single --provider flag change
  • Provider-specific differences — disk names, cloud-init ordering, metadata endpoint IPs — are abstracted away from the blueprint author
  • One YAML file becomes the single source of truth for OS security posture across your entire fleet, regardless of cloud
  • Drift detection works fleet-wide: rescan any instance against the original blueprint grade on any provider

The Problem: Three Clouds, Three Scripts, Three Ways to Drift

AWS hardening script          GCP hardening script          Azure hardening script
├── /dev/xvd* disk refs       ├── /dev/sda* disk refs       ├── /dev/sda* disk refs
├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS
├── cloud-init order A        ├── cloud-init order B        ├── cloud-init order C
└── Updated: Jan 2025         └── Updated: Aug 2024         └── Updated: Mar 2024
                                         │
                                         └─ 5 months behind
                                            on CIS updates

Multi-cloud OS hardening starts as a copy-paste of the AWS script. Within a month, the clouds diverge.

EP02 showed that a HardeningBlueprint YAML eliminates the skip-at-2am problem by making hardening a build artifact. What it assumed — quietly — is that you’re building for one provider. The moment you expand to a second cloud, the provider-specific details in the blueprint become a problem: disk names differ, cloud-init fires in a different order, and AWS-specific assumptions break silently on GCP.


We expanded from AWS to GCP six months ago. The EC2 hardening script had been working reliably for over a year. The GCP engineer took the AWS script, made some quick changes, and started building images.

The first GCP images had a subtle problem: the /tmp and /home separate partition entries in /etc/fstab referenced /dev/xvdb — an AWS disk naming convention. GCP uses /dev/sdb. The fstab entries were silently ignored. The directories were there, but they weren’t separate mounts and carried none of the restrictive mount options. The CIS controls for separate filesystem partitions were listed as passing in the scan output because the Ansible task had “run successfully” — it just hadn’t done what we thought.

It took a pentest three months later to catch it. The finding: six production GCP instances with /tmp not mounted with noexec, nosuid, nodev — despite our “CIS L1 hardened” label.
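
A check like the following on any running instance would have surfaced the gap immediately; findmnt ships with util-linux on both families of images:

# Is /tmp actually a separate mount with the hardened options?
findmnt /tmp -o TARGET,SOURCE,OPTIONS
# Expect nodev,nosuid,noexec in OPTIONS; no output at all means /tmp is not a separate mount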

The root cause wasn’t the engineer. It was a hardening approach that required cloud-specific knowledge embedded in the script rather than in a provider abstraction layer.


How Stratum Separates Compliance Intent from Provider Details

Multi-cloud OS hardening works when the compliance intent and the provider details are kept strictly separate.

HardeningBlueprint YAML
(compliance intent — portable)
         │
         ▼
  Stratum Provider Layer
  ┌─────────────────────────────────────────────┐
  │  AWS         │  GCP         │  Azure        │
  │  /dev/xvd*   │  /dev/sda*   │  /dev/sda*    │
  │  IMDS v2     │  GCP IMDS    │  Azure IMDS   │
  │  cloud-init  │  cloud-init  │  waagent       │
  │  order A     │  order B     │  order C       │
  └─────────────────────────────────────────────┘
         │
         ▼
  Ansible-Lockdown + Provider-Aware Configuration
         │
         ▼
  OpenSCAP Scan
         │
         ▼
  Golden Image (AMI / GCP Image / Azure Image)

The blueprint author declares what should be true about the OS. Stratum’s provider layer handles how that’s achieved on each cloud.

The disk naming, cloud-init sequencing, metadata endpoint configuration, and provider-specific package repositories are all abstracted into the provider layer. They never appear in the blueprint file.


The Same Blueprint Across Six Providers

# Build the same baseline on three clouds
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
stratum build --blueprint ubuntu22-cis-l1.yaml --provider gcp
stratum build --blueprint ubuntu22-cis-l1.yaml --provider azure

# The other three supported providers
stratum build --blueprint ubuntu22-cis-l1.yaml --provider digitalocean
stratum build --blueprint ubuntu22-cis-l1.yaml --provider linode
stratum build --blueprint ubuntu22-cis-l1.yaml --provider proxmox

The blueprint file is identical across all six. The output — AMI, GCP machine image, Azure managed image — is equivalent in terms of security posture. The same 144 CIS L1 controls apply. The same OpenSCAP scan runs. The same grade lands in the image metadata.

If you change the blueprint — add a control, update the Ansible role version, add a custom audit logging configuration — you rebuild all providers from the same source and all images come out consistent.


What the Provider Layer Handles

The provider layer is where the cloud-specific knowledge lives, so the blueprint author doesn’t have to carry it:

Disk naming:

Provider        OS disk      Ephemeral              Data
AWS             /dev/xvda    /dev/xvdb              /dev/xvdc+
GCP             /dev/sda     -                      /dev/sdb+
Azure           /dev/sda     /dev/sdb (temp disk)   /dev/sdc+
DigitalOcean    /dev/vda     -                      /dev/vdb+

The CIS controls for separate /tmp and /home partitions reference disk paths that differ across these providers. The provider layer translates the blueprint’s filesystem.tmp declaration into the correct fstab entries for the target cloud.
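
Illustratively, the same filesystem.tmp declaration could come out as fstab lines like these; the device names come from the table above, while the filesystem type and the exact option set Stratum emits are assumptions:

# AWS (ephemeral disk)
/dev/xvdb  /tmp  ext4  defaults,nodev,nosuid,noexec  0 2
# GCP / Azure (second disk)
/dev/sdb   /tmp  ext4  defaults,nodev,nosuid,noexec  0 2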

Cloud-init ordering:

Different providers initialize services in different orders. On AWS, the network is available before cloud-init runs most tasks. On GCP, some network configuration happens after cloud-init starts. On Azure, the waagent handles some configuration that cloud-init handles elsewhere.

The provider layer sequences the hardening steps to run in the correct order for each provider — specifically, it waits for network availability before applying network-level hardening, and ensures the package manager is configured before running Ansible roles that require package installation.

Metadata endpoint configuration:

CIS controls include restrictions on access to the instance metadata service (IMDSv2 enforcement on AWS, equivalent controls on GCP/Azure). The provider layer applies the correct restriction for each cloud — the blueprint just declares compliance: benchmark: cis-l1.
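
On AWS you can confirm the resulting enforcement on any instance built from the blueprint (the instance ID reuses the one from the drift-detection example later in this post; GCP and Azure have their own metadata settings to check):

# IMDSv2 enforcement check: HttpTokens must come back as "required"
aws ec2 describe-instances --instance-ids i-0abc123 \
  --query 'Reservations[].Instances[].MetadataOptions.HttpTokens' --output text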


Building for All Providers Simultaneously

For fleet standardization, you can build all providers in a single operation:

# Build for all providers in parallel
stratum build \
  --blueprint ubuntu22-cis-l1.yaml \
  --provider aws,gcp,azure

# Output:
# [aws]   Launching build instance in ap-south-1...
# [gcp]   Launching build instance in asia-south1...
# [azure] Launching build instance in southindia...
# ...
# [aws]   Grade: A (98/100) — ami-0a7f3c9e82d1b4c05
# [gcp]   Grade: A (98/100) — projects/my-project/global/images/ubuntu22-cis-l1-20260419
# [azure] Grade: A (98/100) — /subscriptions/.../images/ubuntu22-cis-l1-20260419

All three builds run in parallel. All three images carry identical compliance grades. The image names embed the blueprint name and build date for easy identification; the grade lands in the image metadata, as noted above.


Blueprint Versioning and Drift Detection

Version-controlling the blueprint file solves a problem that multi-cloud environments hit consistently: knowing what your OS security posture was six months ago.

# Check the current state of a fleet instance against the blueprint
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Compare against original build grade
# Output:
# Instance: i-0abc123 (aws, ap-south-1)
# Original grade (build): A (98/100) — 2026-01-15
# Current grade (scan):   B (89/100) — 2026-04-19
# 
# Drifted controls (9):
#   3.3.2  — TCP SYN cookies: FAIL (sysctl net.ipv4.tcp_syncookies=0)
#   5.3.2  — sudo log_input: FAIL (removed from /etc/sudoers.d/)
#   ...

Drift detection compares the current instance state against the blueprint that built it. Controls that passed at build time and now fail indicate configuration drift — something changed after the image was deployed. This is how you find the three instances that a sysadmin “temporarily” modified and never reverted.


Production Gotchas

Provider-specific CIS controls exist. CIS AWS Foundations Benchmark and CIS GCP Benchmark include cloud-specific controls (VPC flow logs, CloudTrail, etc.) that are separate from the OS-level CIS controls. The blueprint handles OS-level controls. Cloud-level controls (IAM, logging, network configuration) belong in your cloud security posture management tooling.

Build costs vary by provider. On AWS, the build instance is a t3.medium for 15–20 minutes (~$0.02). On GCP and Azure, equivalent pricing applies. For multi-provider builds, run them in regions close to your primary workloads to minimize image transfer time.

Proxmox builds require a local Stratum agent. Unlike cloud providers, Proxmox doesn’t have an API that Stratum can reach from outside. The Proxmox provider requires the Stratum agent running on the Proxmox host. The build process and blueprint format are identical; only the network topology differs.

GCP image sharing across projects requires explicit IAM. GCP machine images aren’t automatically available to other projects in the organization. After building, run stratum image share --provider gcp --image ubuntu22-cis-l1-20260419 --projects

or configure sharing at the organization level.


Key Takeaways

  • Multi-cloud OS hardening with separate scripts per provider creates inevitable drift; a provider-abstracted blueprint eliminates it
  • The same HardeningBlueprint YAML builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox — the compliance intent is in the file, the provider details are in Stratum’s provider layer
  • Parallel multi-provider builds produce images with identical compliance grades on the same schedule
  • Drift detection works fleet-wide: any instance on any provider can be rescanned against the blueprint that built it
  • Blueprint version control is the single source of truth for OS security posture history — what was true on any given date, across any provider

What’s Next

One blueprint, six clouds, identical compliance grades. EP03 showed that the multi-cloud drift problem disappears when provider details are abstracted away from the blueprint.

What neither EP02 nor EP03 answered is the auditor’s question: how do you know the image is actually compliant? “We ran CIS L1” is not an answer. “Grade A, 98/100 controls, SARIF export attached” is.

EP04 covers automated OpenSCAP compliance: the post-build scan in detail — how the A-F grade is calculated, what controls block an A grade, how SARIF exports work, and how drift detection catches what changed after deployment.

Next: automated OpenSCAP compliance — CIS benchmark grading before deployment

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

Kubernetes CRDs in Production: Finalizers, Status Conditions, and RBAC Patterns

Reading Time: 8 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 10
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Finalizers block deletion until cleanup completes — they prevent orphaned external resources but cause stuck objects if the controller crashes mid-cleanup; always implement a removal timeout
  • Status conditions are the standard communication channel between controller and user: use type, status, reason, message, and observedGeneration on every condition; never invent ad-hoc status fields
  • Owner references wire automatic garbage collection — when the parent custom resource is deleted, Kubernetes deletes owned child objects; use them for every object your controller creates in the same namespace
  • RBAC for CRDs in multi-tenant clusters must include separate ClusterRoles for controller, editor, and viewer; grant status and finalizers as separate sub-resources; never give application teams cluster-scoped create/delete on CRDs
  • The three most common Kubernetes CRD production failure modes: finalizer death loop, status thrash, and CRD deletion cascade — all avoidable with the patterns in this episode
  • Running kubectl get crds on a healthy cluster should show Established: True for every CRD; non-Established CRDs silently reject all create requests

The Big Picture

  PRODUCTION CRD LIFECYCLE: FULL PICTURE

  Create         Reconcile        Suspend/Resume      Delete
  ──────         ─────────        ──────────────      ──────
  User applies   Controller       User patches         User deletes
  BackupPolicy   creates CronJob, spec.suspended=true  BackupPolicy
      │          sets status          │                    │
      ▼              │                ▼                    ▼
  Admission      │           Controller          Finalizer blocks
  webhook        │           suspends CronJob     deletion
  (if any)       │                               Controller:
      │          │                                 1. Delete CronJob
      ▼          ▼                                 2. Remove external state
  Schema       Status                              3. Remove finalizer
  validation   conditions                          Object deleted from etcd
      │        updated
      ▼
  Controller
  reconcile
  triggered

Kubernetes CRD production readiness is not just about making the happy path work — it is about designing for the failure modes: controllers crashing mid-operation, deletion races, and status messages that confuse operators at 2am.


Finalizers: Controlled Deletion

A finalizer is a string in metadata.finalizers. Kubernetes will not delete an object that has finalizers, regardless of who issues the delete command.

metadata:
  name: nightly
  namespace: demo
  finalizers:
    - storage.example.com/backup-cleanup  # ← your controller put this here

When kubectl delete bp nightly runs:

  1. API server sets metadata.deletionTimestamp  (does NOT delete yet)
  2. Object is visible as "Terminating"
  3. Controller sees deletionTimestamp set
  4. Controller runs cleanup:
       - delete backup data from S3
       - delete CronJob (or let owner references handle it)
       - release any external locks
  5. Controller removes the finalizer:
       patch bp nightly --type=json \
         -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'
  6. API server sees finalizers list is now empty → deletes the object

Adding a finalizer in Go

const finalizerName = "storage.example.com/backup-cleanup"

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Deletion path
    if !bp.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(bp, finalizerName) {
            if err := r.cleanupExternalResources(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(bp, finalizerName)
            if err := r.Update(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Normal path: ensure finalizer is present
    if !controllerutil.ContainsFinalizer(bp, finalizerName) {
        controllerutil.AddFinalizer(bp, finalizerName)
        if err := r.Update(ctx, bp); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... rest of reconcile
}

Finalizer death loop and the timeout pattern

If cleanupExternalResources always returns an error (external system down, bug in cleanup code), the object gets stuck in Terminating forever. The operator cannot delete it; kubectl delete --force does not help with finalizers.

Prevention: add a cleanup deadline with status tracking.

func (r *BackupPolicyReconciler) cleanupExternalResources(ctx context.Context, bp *storagev1alpha1.BackupPolicy) error {
    // Check if we've been trying to clean up for too long
    if bp.DeletionTimestamp != nil {
        deadline := bp.DeletionTimestamp.Add(10 * time.Minute)
        if time.Now().After(deadline) {
            // Log the failure, abandon cleanup, let the object be deleted.
            log.FromContext(ctx).Error(nil, "cleanup deadline exceeded, removing finalizer anyway",
                "name", bp.Name)
            return nil   // returning nil removes the finalizer
        }
    }
    // ... actual cleanup
}

Recovery for a stuck object (use only when cleanup truly cannot succeed):

kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]'

Status Conditions: The Right Way

The Kubernetes standard condition format is defined in k8s.io/apimachinery/pkg/apis/meta/v1.Condition:

type Condition struct {
    Type               string          // e.g. "Ready", "Synced", "Degraded"
    Status             ConditionStatus // "True", "False", "Unknown"
    ObservedGeneration int64           // the .metadata.generation this condition reflects
    LastTransitionTime metav1.Time     // when Status last changed
    Reason             string          // machine-readable, CamelCase, e.g. "CronJobCreated"
    Message            string          // human-readable, may contain details
}

Standard condition types

Type          Meaning
Ready         The resource is fully reconciled and operational
Synced        The resource has been synced with an external system
Progressing   An operation is actively in progress
Degraded      The resource is operating in a reduced capacity

Use Ready: True only when the full reconcile is complete and the resource is functional. Use Ready: False with a clear Message when reconcile fails or is blocked.

Setting conditions in Go

meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    ObservedGeneration: bp.Generation,
    Reason:             "CronJobCreateFailed",
    Message:            fmt.Sprintf("failed to create CronJob: %v", err),
})

meta.SetStatusCondition handles deduplication — it updates an existing condition of the same Type rather than appending a duplicate.

observedGeneration is critical

metadata.generation      = 5   (increments on every spec change)
status.observedGeneration = 3  (set by controller on each reconcile)

If observedGeneration < generation:
  → controller has not yet reconciled the latest spec change
  → status.conditions reflect an older state
  → do NOT alert based on conditions that lag generation

Always set ObservedGeneration: bp.Generation when writing status conditions. Tooling (Argo CD, Flux, kubectl wait) depends on this to know whether status is current.
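
A one-liner to check whether status is current, in the same jsonpath style as the Quick Reference at the end of this episode:

# Print metadata.generation and the Ready condition's observedGeneration side by side
kubectl get bp nightly -n demo -o jsonpath='{.metadata.generation} {.status.conditions[?(@.type=="Ready")].observedGeneration}{"\n"}'
# If the second number is smaller, the controller has not yet caught up with the latest spec change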

kubectl wait uses conditions

# Wait until BackupPolicy is Ready
kubectl wait bp/nightly -n demo \
  --for=condition=Ready \
  --timeout=60s

This works because kubectl wait reads the status.conditions array.


Owner References: Automatic Garbage Collection

Owner references wire a parent-child relationship between Kubernetes objects. When the parent is deleted, Kubernetes garbage-collects all owned children automatically.

metadata:
  name: nightly-backup       # CronJob
  ownerReferences:
    - apiVersion: storage.example.com/v1alpha1
      kind: BackupPolicy
      name: nightly
      uid: a1b2c3d4-...
      controller: true          # only one owner can be the controller
      blockOwnerDeletion: true  # the GC waits for this owner before deleting child

Set in Go using ctrl.SetControllerReference:

if err := ctrl.SetControllerReference(bp, cronJob, r.Scheme); err != nil {
    return ctrl.Result{}, err
}

Owner reference rules

  • A namespaced owner and its dependent must be in the same namespace — cross-namespace owner references are not allowed (a namespaced dependent can, however, have a cluster-scoped owner)
  • Only one object can be the controller: true owner; others can be non-controller owners
  • Deleting the owner cascades to deleting owned objects — this is garbage collection, not finalizer-based cleanup

Without owner references, deleting a BackupPolicy leaves the CronJob as an orphan. This is hard to detect and accumulates over time.
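
A quick sweep for orphans, assuming jq is available and that this operator's children are CronJobs as in the examples above:

# CronJobs in the namespace with no ownerReferences at all
kubectl get cronjobs -n demo -o json \
  | jq -r '.items[] | select(.metadata.ownerReferences == null) | .metadata.name'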


RBAC Patterns for Multi-Tenant CRD Usage

A production CRD deployment needs three distinct RBAC roles:

# 1. Controller role — full access for the operator
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# 2. Editor role — for application teams (namespaced binding)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # No status write — only the controller writes status
  # No finalizers write — prevents deletion blocking by non-controllers
---
# 3. Viewer role — for audit, monitoring
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]

Bind editor/viewer roles at namespace scope, not cluster scope:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-backup-editor
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: backuppolicy-editor
  apiGroup: rbac.authorization.k8s.io

This pattern gives team-alpha full control over BackupPolicies in their namespace but no access to other namespaces — standard Kubernetes multi-tenancy.
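
You can verify those boundaries with kubectl auth can-i and impersonation (run as a user allowed to impersonate; alice and team-beta are placeholders):

# Allowed: create BackupPolicies in their own namespace
kubectl auth can-i create backuppolicies.storage.example.com -n team-alpha --as alice --as-group team-alpha
# Denied: writing the status subresource (controller-only)
kubectl auth can-i update backuppolicies.storage.example.com --subresource=status -n team-alpha --as alice --as-group team-alpha
# Denied: the same create in another team's namespace
kubectl auth can-i create backuppolicies.storage.example.com -n team-beta --as alice --as-group team-alpha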


The Three Production Failure Modes

1. Finalizer death loop

Symptoms: Object stuck in Terminating for hours; kubectl get bp nightly shows DeletionTimestamp set but object exists.

Cause: cleanupExternalResources always returns an error.

Detection:

kubectl get bp nightly -n demo -o jsonpath='{.metadata.deletionTimestamp}'
# non-empty = stuck in termination
kubectl describe bp nightly -n demo
# look for repeated reconcile error events

Fix: Add cleanup deadline in controller; use kubectl patch to remove finalizer as last resort.

2. Status thrash

Symptoms: Controller sets Ready: True, then Ready: False, then Ready: True in a rapid loop. Alert noise, confusing dashboards.

Cause: Each reconcile compares actual state incorrectly due to cache lag — it sees its own status write as a change, re-reconciles, and flips the status again.

Fix: Set ObservedGeneration on every condition. Compare generation with observedGeneration before re-reconciling. Use meta.IsStatusConditionTrue to check current condition before overwriting it with the same value.

// Only update status if it changed
current := meta.FindStatusCondition(bp.Status.Conditions, "Ready")
if current == nil || current.Status != desired.Status || current.Reason != desired.Reason {
    meta.SetStatusCondition(&bpCopy.Status.Conditions, desired)
    r.Status().Update(ctx, bpCopy)
}

3. CRD deletion cascade

Symptoms: A team deletes a CRD for cleanup purposes; all instances across all namespaces disappear silently.

Cause: kubectl delete crd backuppolicies.storage.example.com — the API server cascades the deletion to all custom resources of that type.

Prevention:
– Protect production CRDs with a deletion-blocking admission policy (for example, a Kyverno or Gatekeeper rule that denies delete on the CRD)
– Use GitOps (Argo CD, Flux) to manage CRD installation — a deleted CRD is automatically re-applied from the Git source
– Back up CRDs and instances with velero or equivalent before any CRD management operations
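
A minimal velero sketch for that backup step; the backup name and resource list are placeholders to adapt to your own CRDs:

# Back up the CRD definition and every instance before touching the CRD
velero backup create backuppolicy-crd-backup \
  --include-cluster-resources=true \
  --include-resources customresourcedefinitions.apiextensions.k8s.io,backuppolicies.storage.example.com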


Production Readiness Checklist

CRD DEFINITION
  □ spec.versions has exactly one storage: true version
  □ Status subresource enabled (subresources.status: {})
  □ additionalPrinterColumns includes Ready column from status.conditions
  □ OpenAPI schema defines required fields and types
  □ CEL rules cover cross-field constraints

CONTROLLER
  □ Owner references set on all child resources
  □ Finalizer logic includes cleanup deadline
  □ Status conditions use standard format with observedGeneration
  □ Reconcile function is idempotent
  □ Not-found errors handled cleanly (return nil, not error)
  □ At least 2 replicas with leader election enabled

RBAC
  □ Three ClusterRoles: controller, editor, viewer
  □ Status and finalizers are separate RBAC sub-resources
  □ Editor/viewer bound at namespace scope, not cluster scope
  □ Controller ServiceAccount has only necessary permissions

OPERATIONS
  □ CRD installed via GitOps or Helm (not manual kubectl apply)
  □ Backup of CRDs and instances included in cluster backup
  □ kubectl get crds shows Established: True for all CRDs
  □ Monitoring for stuck Terminating objects (finalizer deadlock)
  □ Alert on controller reconcile error rate, not just pod health

⚠ Common Mistakes

Granting update on backuppolicies but not backuppolicies/status to the controller. If the controller cannot write status, status updates silently fail. The controller appears to run but status conditions never update. Grant both backuppolicies (for spec/metadata writes) and backuppolicies/status (for the status subresource path).

Setting Ready: True before all owned resources are healthy. If the controller sets Ready: True after creating the CronJob but before verifying the CronJob is actually active, users see a false-positive health signal. Only set Ready: True when you have confirmed the desired state is actually achieved.

Not setting observedGeneration on status conditions. Tools like Argo CD and kubectl wait --for=condition=Ready will report incorrect health status if observedGeneration is stale. Always set ObservedGeneration: obj.Generation in every condition write.

Using kubectl delete crd in a production cluster without a backup. This is irreversible. Treat CRDs as production-critical infrastructure — require GitOps review, backup verification, and team approval before any CRD deletion.


Quick Reference

# Check for stuck Terminating objects
# (field selectors don't support deletionTimestamp on custom resources, so filter with jq)
kubectl get backuppolicies -A -o json \
  | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.namespace + "/" + .metadata.name'

# Force-remove a stuck finalizer (use only when cleanup is truly impossible)
kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'

# Check all CRDs are Established
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name} {.status.conditions[?(@.type=="Established")].status}{"\n"}{end}'

# Watch status conditions update during reconcile
kubectl get bp nightly -n demo -w -o \
  jsonpath='{.status.conditions[?(@.type=="Ready")].status} {.status.conditions[?(@.type=="Ready")].message}{"\n"}'

# Verify owner references are set on child CronJob
kubectl get cronjob nightly-backup -n demo \
  -o jsonpath='{.metadata.ownerReferences}'

# List all objects owned by a BackupPolicy (by label)
kubectl get all -n demo -l backuppolicy=nightly

Key Takeaways

  • Finalizers block deletion until cleanup completes — always implement a cleanup deadline to prevent permanent stuck objects
  • Status conditions must use the standard format with observedGeneration — tooling depends on it for correctness
  • Owner references enable automatic garbage collection of child resources when the parent is deleted
  • RBAC needs three roles (controller, editor, viewer) with status and finalizers as separate sub-resources
  • The three production failure modes — finalizer death loop, status thrash, CRD deletion cascade — are all preventable with the patterns covered in this episode

Series Complete

You now have the full picture of Kubernetes CRDs and Operators: from understanding what a CRD is (EP01), through real examples (EP02), schema design (EP03), hands-on YAML (EP04), CEL validation (EP05), the controller loop (EP06), building an operator (EP07), versioning (EP08), admission webhooks (EP09), to production patterns in this episode.

The next series in the Kubernetes learning arc on linuxcent.com covers Kubernetes Networking Deep Dive — Services, Ingress, Gateway API, CNI, and eBPF networking. Subscribe below to get it when it launches.

Stay subscribed → linuxcent.com

Admission Webhooks: Validating and Mutating Requests Before They Reach etcd

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 9
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes admission webhooks are HTTPS endpoints called by the API server synchronously on every create/update/delete — before the object reaches etcd
    (two types: mutating webhooks modify the object; validating webhooks approve or reject it — mutating runs first, then validating)
  • Use a validating webhook when you need to reject objects based on state you cannot express in CEL: checking if a referenced Secret exists, enforcing cross-resource quotas, consulting an external policy engine
  • Use a mutating webhook when you need to inject defaults or sidecar containers that depend on context you cannot express in the CRD schema (environment-specific defaults, sidecar injection)
  • Admission webhooks are an availability dependency — if your webhook is unreachable, the API requests it covers will fail. failurePolicy: Ignore is the safety valve; use it only for non-critical webhooks
  • OPA/Gatekeeper and Kyverno are admission webhook platforms — they let you write policy as code (Rego, YAML) instead of writing Go webhook handlers
  • For CRD-specific validation that only depends on the object itself, prefer CEL (EP05) — webhooks are for rules that require external lookups or cross-resource checks

The Big Picture

  KUBERNETES ADMISSION CHAIN (full picture)

  kubectl apply -f backuppolicy.yaml
        │
        ▼
  API Server: authentication + authorization
        │
        ▼
  1. Mutating admission webhooks
     ┌───────────────────────────────────────┐
     │ Receive object, return modified object │
     │ Examples: inject annotations,          │
     │ set defaults, add sidecars            │
     └───────────────────────────────────────┘
        │
        ▼
  2. Schema validation (OpenAPI + CEL)
        │
        ▼
  3. Validating admission webhooks
     ┌───────────────────────────────────────┐
     │ Receive object, return allow/deny     │
     │ Examples: quota checks, cross-        │
     │ resource validation, policy engines   │
     └───────────────────────────────────────┘
        │
        ▼ (allowed)
  etcd storage

Kubernetes admission webhooks are how tools like Istio inject sidecars, Kyverno enforces policies, and OPA/Gatekeeper applies organizational guardrails — all without modifying Kubernetes source code. Understanding them completes the picture of how Kubernetes is extended beyond CRDs.


Validating vs Mutating: When to Use Each

  DECISION TREE: CEL vs Validating Webhook vs Mutating Webhook

  "I need to validate a field value"
      │
      ├── Depends only on the object being submitted?
      │   → Use CEL (x-kubernetes-validations) — EP05
      │
      └── Needs to look up another resource, quota, or external system?
          → Use Validating Admission Webhook

  "I need to set default values or inject content"
      │
      ├── Defaults depend only on other fields in the same object?
      │   → Use OpenAPI schema defaults or CEL
      │
      └── Defaults depend on environment, namespace labels, or external config?
          → Use Mutating Admission Webhook

Practical examples:

Rule                                                               Right tool
retentionDays must be ≤ 365                                        CEL
if storageClass=premium then retentionDays ≤ 90                    CEL
Referenced SecretStore must exist in the same namespace            Validating webhook
BackupPolicy count per namespace must not exceed team quota        Validating webhook
Inject costCenter annotation from namespace labels                 Mutating webhook
Inject backup-agent sidecar into all Pods in labeled namespaces    Mutating webhook
Enforce that all BackupPolicies have a team label                  Kyverno or OPA policy

The Webhook Request/Response Contract

Both webhook types receive an AdmissionReview object and return an AdmissionReview response.

Request (from API server to webhook):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "kind": {"group": "storage.example.com", "version": "v1alpha1", "kind": "BackupPolicy"},
    "resource": {"group": "storage.example.com", "version": "v1alpha1", "resource": "backuppolicies"},
    "operation": "CREATE",
    "userInfo": {"username": "alice", "groups": ["system:authenticated"]},
    "object": { /* full BackupPolicy JSON */ },
    "oldObject": null
  }
}

Response for a validating webhook (allow):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "allowed": true
  }
}

Response for a validating webhook (deny):

{
  "response": {
    "uid": "...",
    "allowed": false,
    "status": {
      "code": 422,
      "message": "referenced SecretStore 'aws-secrets-manager' not found in namespace 'production'"
    }
  }
}

Response for a mutating webhook (allow + patch):

{
  "response": {
    "uid": "...",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiJhZGQiLCJwYXRoIjoiL21ldGFkYXRhL2Fubm90YXRpb25zL2Nvc3RDZW50ZXIiLCJ2YWx1ZSI6ImVuZ2luZWVyaW5nIn1d"
    // base64-encoded JSON patch:
    // [{"op":"add","path":"/metadata/annotations/costCenter","value":"engineering"}]
  }
}

Writing a Validating Webhook with kubebuilder

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --programmatic-validation

Edit api/v1alpha1/backuppolicy_webhook.go:

package v1alpha1

import (
    "context"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/webhook/admission"
    esov1beta1 "github.com/external-secrets/external-secrets/apis/externalsecrets/v1beta1"
)

type BackupPolicyCustomValidator struct {
    Client client.Client
}

//+kubebuilder:webhook:path=/validate-storage-example-com-v1alpha1-backuppolicy,mutating=false,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create;update,versions=v1alpha1,name=vbackuppolicy.kb.io,admissionReviewVersions=v1

func (v *BackupPolicyCustomValidator) SetupWebhookWithManager(mgr ctrl.Manager) error {
    v.Client = mgr.GetClient()
    return ctrl.NewWebhookManagedBy(mgr).
        For(&BackupPolicy{}).
        WithValidator(v).
        Complete()
}

// ValidateCreate validates a new BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    bp := obj.(*BackupPolicy)
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateUpdate validates an updated BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
    bp := newObj.(*BackupPolicy)
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateDelete is a no-op here.
func (v *BackupPolicyCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    return nil, nil
}

// validateSecretStoreRef checks that the referenced SecretStore exists in the same namespace.
func (v *BackupPolicyCustomValidator) validateSecretStoreRef(ctx context.Context, bp *BackupPolicy) error {
    ref := bp.Spec.SecretStoreRef
    if ref == "" {
        return nil  // optional field; CEL handles it if required
    }

    store := &esov1beta1.SecretStore{}
    err := v.Client.Get(ctx, types.NamespacedName{Name: ref, Namespace: bp.Namespace}, store)
    if apierrors.IsNotFound(err) {
        return fmt.Errorf("referenced SecretStore %q not found in namespace %q",
            ref, bp.Namespace)
    }
    return err  // nil on found, real error on API failure
}

Writing a Mutating Webhook: Cost Center Injection

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --defaulting

Edit the defaulting webhook:

//+kubebuilder:webhook:path=/mutate-storage-example-com-v1alpha1-backuppolicy,mutating=true,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create,versions=v1alpha1,name=mbackuppolicy.kb.io,admissionReviewVersions=v1

func (r *BackupPolicy) Default() {
    // Default is called by kubebuilder's webhook framework on admission.
    // The webhook handler calls this and patches the object.
    //
    // This runs AFTER API server schema defaults — use it for context-dependent defaults.
}

// For namespace-label-based injection, implement the full webhook handler instead:
type BackupPolicyMutator struct {
    Client client.Client
}

func (m *BackupPolicyMutator) Handle(ctx context.Context, req admission.Request) admission.Response {
    bp := &BackupPolicy{}
    if err := json.Unmarshal(req.Object.Raw, bp); err != nil {
        return admission.Errored(http.StatusBadRequest, err)
    }

    // Fetch the namespace to read its labels
    ns := &corev1.Namespace{}
    if err := m.Client.Get(ctx, types.NamespacedName{Name: bp.Namespace}, ns); err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }

    // Inject costCenter annotation from namespace label
    if costCenter, ok := ns.Labels["billing/cost-center"]; ok {
        if bp.Annotations == nil {
            bp.Annotations = make(map[string]string)
        }
        bp.Annotations["billing/cost-center"] = costCenter
    }

    marshaled, err := json.Marshal(bp)
    if err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }
    return admission.PatchResponseFromRaw(req.Object.Raw, marshaled)
}

The WebhookConfiguration Resource

The ValidatingWebhookConfiguration tells the API server which webhooks exist and which resources/operations they handle:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: backup-operator-validating-webhook
  annotations:
    cert-manager.io/inject-ca-from: backup-operator-system/backup-operator-serving-cert
webhooks:
  - name: vbackuppolicy.kb.io
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: backup-operator-webhook-service
        namespace: backup-operator-system
        path: /validate-storage-example-com-v1alpha1-backuppolicy
    rules:
      - apiGroups:   ["storage.example.com"]
        apiVersions: ["v1alpha1"]
        operations:  ["CREATE", "UPDATE"]
        resources:   ["backuppolicies"]
    failurePolicy: Fail          # Fail = reject request if webhook unreachable
    sideEffects: None
    timeoutSeconds: 10
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]  # never webhook kube-system objects

failurePolicy: Fail vs Ignore

  failurePolicy: Fail (default)
  ──────────────────────────────
  If webhook is unreachable → API request fails with 500
  Use when: the validation is critical (quota enforcement, policy)
  Risk: your webhook becoming unavailable breaks all covered API operations

  failurePolicy: Ignore
  ──────────────────────────────
  If webhook is unreachable → API request proceeds as if webhook allowed it
  Use when: the webhook is advisory or can be bypassed safely
  Risk: policy is silently not enforced during webhook outage

For production operators, use failurePolicy: Fail but ensure high availability:
– Run at least 2 webhook pod replicas with PodDisruptionBudget
– Use cert-manager for automatic TLS certificate rotation
– Set timeoutSeconds to a value that allows graceful degradation (5–10s)
– Exclude system namespaces with namespaceSelector


OPA/Gatekeeper and Kyverno: Webhooks as Policy Platforms

Writing raw webhook handlers in Go is powerful but heavyweight for policy enforcement. OPA/Gatekeeper and Kyverno are webhook-based policy engines that let you express policies as code:

Kyverno (YAML-based policies):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-backup-label
spec:
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["BackupPolicy"]
      validate:
        message: "BackupPolicy must have a 'team' label"
        pattern:
          metadata:
            labels:
              team: "?*"

OPA/Gatekeeper (Rego-based policies):

package backuppolicy

deny[msg] {
    input.request.kind.kind == "BackupPolicy"
    not input.request.object.metadata.labels["team"]
    msg := "BackupPolicy must have a 'team' label"
}

Both run as admission webhooks that the API server calls. The policy language sits on top of the webhook plumbing. For organizational policy enforcement across many resource types, these tools outperform custom Go webhook handlers.


⚠ Common Mistakes

Webhook covering * resources or * operations. A webhook covering all resources in the cluster is a reliability risk — a bug in the webhook or an outage breaks everything. Scope webhooks to exactly the resources and operations they need with rules[].resources and rules[].operations.

No TLS certificate rotation. Webhook endpoints require a TLS certificate that the API server trusts. Certificates expire. Using cert-manager with the cert-manager.io/inject-ca-from annotation automates this. Without it, expired certificates cause silent webhook outages (the API server rejects the TLS handshake, triggering failurePolicy behavior).

Not excluding system namespaces. If a validating webhook covers Pods and has failurePolicy: Fail, and the webhook pod itself crashes, the API server cannot create a new webhook pod because the webhook rejects the creation. Use namespaceSelector to exclude kube-system and your operator’s own namespace.

Treating webhook latency as free. Every API operation covered by a webhook adds a synchronous HTTP round-trip. On a busy cluster creating thousands of objects per minute, a 100ms webhook latency becomes significant. Set timeoutSeconds, profile webhook performance, and scope rules narrowly.


Quick Reference

# List all webhook configurations
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

# Inspect webhook rules and failure policy
kubectl describe validatingwebhookconfiguration backup-operator-validating-webhook

# Temporarily disable a webhook for debugging (dangerous in production)
kubectl delete validatingwebhookconfiguration backup-operator-validating-webhook

# Check webhook endpoint certificate
kubectl get secret backup-operator-webhook-server-cert \
  -n backup-operator-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

# Test webhook is reachable from a cluster node
kubectl run webhook-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -k https://backup-operator-webhook-service.backup-operator-system.svc:443/healthz

Key Takeaways

  • Mutating webhooks modify objects at admission; validating webhooks approve or reject them — mutating runs before validating
  • Use CEL for rules that depend only on the submitted object; use webhooks when you need external lookups or cross-resource checks
  • failurePolicy: Fail blocks API requests if the webhook is unreachable — ensure high availability before using it
  • Always exclude system namespaces and scope rules to specific resource types to minimize the blast radius of webhook failures
  • OPA/Gatekeeper and Kyverno are admission webhook platforms for policy-as-code — prefer them over custom Go handlers for organizational policy enforcement

What’s Next

EP10: Kubernetes CRDs in Production ties the full series together — finalizer design patterns, status condition conventions, owner references, RBAC for multi-tenant CRD usage, and the production failure modes that catch teams off guard.

Get EP10 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD Versioning: From v1alpha1 to v1 Without Breaking Clients

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 8
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes CRD versioning lets you evolve your API from v1alpha1 to v1 without deleting existing custom resources or breaking clients still using the old version
    (storage version = the version etcd actually stores objects in; served versions = the versions the API server responds to; you can serve v1alpha1 and v1 simultaneously while migrating)
  • The hub-and-spoke model is the recommended conversion architecture: one “hub” version (usually v1) that every other version converts to/from
  • Without a conversion webhook, the API server performs no field conversion between versions (only the apiVersion string changes), so serving multiple versions is only safe when their schemas are effectively identical; any real schema difference requires a webhook
  • The kube-storage-version-migrator controller (or a manual re-apply) migrates existing objects from the old storage version to the new one after you update storage: true
  • Changing field names between versions without a conversion webhook corrupts data silently — always test conversion round-trips before promoting a version

The Big Picture

  CRD VERSION LIFECYCLE

  Stage 1: Alpha                 Stage 2: Beta              Stage 3: Stable
  ──────────────────             ──────────────             ──────────────
  v1alpha1                       v1alpha1 (deprecated)      v1alpha1 (removed)
    served: true                   served: true               served: false
    storage: true                  storage: false             storage: false
                                 v1beta1                    v1beta1 (deprecated)
                                   served: true               served: true
                                   storage: false             storage: false
                                 v1                         v1
                                   served: true               served: true
                                   storage: true              storage: true

  Clients using v1alpha1:         The API server converts     Eventually remove
  still work via conversion       on the fly                  old served versions
  webhook

Kubernetes CRD versioning is what allows you to ship BackupPolicy v1alpha1 today, learn from real usage, evolve the schema to v1 with renamed fields and new constraints, and keep existing clusters running without a migration window.


Why Versioning Is Necessary

When BackupPolicy v1alpha1 shipped, the spec used retentionDays. After six months of production use, the team learns:

  • retentionDays should be renamed to retention.days (nested under a retention object for future extensibility)
  • A new required field backupFormat needs to be added with a default of tar.gz
  • The targets field should be renamed to includedNamespaces

These are breaking changes. Clients (GitOps repos, Helm charts, other operators) still have YAML referencing v1alpha1 with the old field names. You cannot simply rename the fields.

The solution: add v1 with the new schema, run both versions simultaneously via a conversion webhook, migrate objects to the new storage version, then deprecate v1alpha1.


Simple Case: Non-Breaking Addition (No Webhook Needed)

If you only add new optional fields to the schema — no renames, no removals — you can add a new version without a conversion webhook, as long as only one version is served at a time.

versions:
  - name: v1alpha1
    served: false      # stop serving old version
    storage: false
    schema: ...
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              schedule:
                type: string
              retentionDays:
                type: integer
              backupFormat:          # new optional field
                type: string
                default: "tar.gz"

Existing objects stored as v1alpha1 are served as v1 with the new field defaulted. This works for purely additive changes because the stored bytes are compatible with the new schema.

When this is not enough: field renames, type changes, field removal, or structural reorganization all require a conversion webhook.


The Hub-and-Spoke Model

For breaking schema changes, the API server needs a conversion webhook. The recommended architecture is hub-and-spoke:

  HUB-AND-SPOKE CONVERSION

       v1alpha1
          │
          ▼ convert to hub
         v1  (hub)
          ▲
          │ convert to hub
       v1beta1

  Every version converts TO the hub and FROM the hub.
  The hub is always the storage version.
  Converting between two spokes goes through the hub: v1alpha1 → v1 → v1beta1
  Never directly: v1alpha1 → v1beta1

This means you write one ConvertTo/ConvertFrom pair per spoke version (each converting to and from the hub) rather than a converter for every pair of versions. As you add versions, the conversion code grows linearly instead of quadratically.


Writing a Conversion Webhook

The conversion webhook is an HTTPS endpoint that the API server calls when it needs to convert an object between versions.

1. Define the conversion hub

In the kubebuilder project, mark v1 as the hub:

In api/v1/backuppolicy_conversion.go:

package v1

// Hub marks this type as the conversion hub.
func (*BackupPolicy) Hub() {}

2. Implement conversion in v1alpha1

In api/v1alpha1/backuppolicy_conversion.go:

package v1alpha1

import (
    v1 "github.com/example/backup-operator/api/v1"
    "sigs.k8s.io/controller-runtime/pkg/conversion"
)

// ConvertTo converts v1alpha1 BackupPolicy to v1 (the hub).
func (src *BackupPolicy) ConvertTo(dstRaw conversion.Hub) error {
    dst := dstRaw.(*v1.BackupPolicy)

    // Metadata
    dst.ObjectMeta = src.ObjectMeta

    // Field mapping: v1alpha1 → v1
    dst.Spec.Schedule      = src.Spec.Schedule
    dst.Spec.StorageClass  = src.Spec.StorageClass
    dst.Spec.Suspended     = src.Spec.Suspended

    // New field: default for old objects, unless ConvertFrom previously
    // preserved a non-default value in the annotation (see ConvertFrom below).
    dst.Spec.BackupFormat = "tar.gz"
    if bf, ok := src.Annotations["storage.example.com/backup-format"]; ok {
        dst.Spec.BackupFormat = bf
    }

    // Renamed field: retentionDays → retention.days
    dst.Spec.Retention = v1.RetentionSpec{
        Days: src.Spec.RetentionDays,
    }

    // Renamed field: targets → includedNamespaces
    for _, t := range src.Spec.Targets {
        dst.Spec.IncludedNamespaces = append(dst.Spec.IncludedNamespaces,
            v1.NamespaceTarget{
                Namespace:      t.Namespace,
                IncludeSecrets: t.IncludeSecrets,
            })
    }

    dst.Status = v1.BackupPolicyStatus(src.Status)
    return nil
}

// ConvertFrom converts v1 (hub) BackupPolicy back to v1alpha1.
func (dst *BackupPolicy) ConvertFrom(srcRaw conversion.Hub) error {
    src := srcRaw.(*v1.BackupPolicy)

    dst.ObjectMeta = src.ObjectMeta

    dst.Spec.Schedule      = src.Spec.Schedule
    dst.Spec.StorageClass  = src.Spec.StorageClass
    dst.Spec.Suspended     = src.Spec.Suspended
    dst.Spec.RetentionDays = src.Spec.Retention.Days

    for _, n := range src.Spec.IncludedNamespaces {
        dst.Spec.Targets = append(dst.Spec.Targets, BackupTarget{
            Namespace:      n.Namespace,
            IncludeSecrets: n.IncludeSecrets,
        })
    }

    // backupFormat cannot be round-tripped to v1alpha1 (no such field)
    // Store it in an annotation to preserve the value if the object is
    // re-converted back to v1.
    if src.Spec.BackupFormat != "" && src.Spec.BackupFormat != "tar.gz" {
        if dst.Annotations == nil {
            dst.Annotations = make(map[string]string)
        }
        dst.Annotations["storage.example.com/backup-format"] = src.Spec.BackupFormat
    }

    dst.Status = BackupPolicyStatus(src.Status)
    return nil
}

3. Register the webhook

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --conversion

This generates the webhook server setup. Deploy it with a TLS certificate the API server trusts; kubebuilder's default kustomize config wires in cert-manager through the cert-manager.io/inject-ca-from annotation, so the certificate is issued and its CA bundle injected automatically.


Updating the CRD to Reference the Webhook

spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: backup-operator-webhook-service
          namespace: backup-operator-system
          path: /convert
      conversionReviewVersions: ["v1", "v1beta1"]
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema: ...
    - name: v1
      served: true
      storage: true
      schema: ...

Once applied, kubectl get backuppolicies.v1alpha1.storage.example.com/nightly and kubectl get backuppolicies.v1.storage.example.com/nightly both work — the API server converts transparently.


Migrating Existing Objects to the New Storage Version

After changing storage: true from v1alpha1 to v1, existing objects in etcd are still stored as v1alpha1 bytes. They are served correctly (via conversion) but are not yet migrated.

Migrate them:

# Option 1: Manual re-apply (works for small object counts)
kubectl get backuppolicies -A --no-headers \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
while read -r ns name; do
  kubectl get backuppolicy "$name" -n "$ns" -o yaml | kubectl apply -f -
done

# Option 2: Storage Version Migrator (automated, for large clusters)
# Install: https://github.com/kubernetes-sigs/kube-storage-version-migrator
kubectl apply -f storageVersionMigration.yaml

After migration, all objects in etcd are stored as v1. You can then set v1alpha1 served: false to stop serving the old version.


Storage Version Migration Checklist

  SAFE VERSION PROMOTION CHECKLIST

  □ New version (v1) has served: true, storage: true
  □ Old version (v1alpha1) has served: true, storage: false
  □ Conversion webhook deployed and healthy
  □ Round-trip conversion tested (v1alpha1 → v1 → v1alpha1 preserves all data)
  □ kubectl get backuppolicies works at both versions
  □ Existing objects migrated (re-applied or migration job run)
  □ Old version set to served: false (stop serving)
  □ Old version removed from CRD after N release cycles

⚠ Common Mistakes

Changing the storage version without a conversion webhook. If you flip storage: true from v1alpha1 to v1 while still serving v1alpha1, the API server reads stored v1alpha1 bytes as if they were v1, so renamed or restructured fields come back empty or wrong. Always deploy the conversion webhook before changing the storage version.

Lossy conversion. If ConvertFrom (v1 → v1alpha1) drops a field that exists in v1, objects are silently corrupted when a v1alpha1 client reads and re-saves them. Round-trip test every conversion: original → hub → original must produce identical objects (or use annotations to preserve fields that cannot round-trip).
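
A round-trip test can live next to the conversion code. A minimal sketch, assuming the v1alpha1 and v1 types shown earlier (the spec values here are illustrative):

package v1alpha1

import (
    "reflect"
    "testing"

    v1 "github.com/example/backup-operator/api/v1"
)

func TestBackupPolicyRoundTrip(t *testing.T) {
    orig := &BackupPolicy{
        Spec: BackupPolicySpec{
            Schedule:      "0 2 * * *",
            RetentionDays: 30,
            Targets:       []BackupTarget{{Namespace: "production"}},
        },
    }

    // v1alpha1 → v1 (hub)
    hub := &v1.BackupPolicy{}
    if err := orig.ConvertTo(hub); err != nil {
        t.Fatalf("ConvertTo: %v", err)
    }

    // v1 (hub) → v1alpha1
    back := &BackupPolicy{}
    if err := back.ConvertFrom(hub); err != nil {
        t.Fatalf("ConvertFrom: %v", err)
    }

    // The round trip must not lose or alter any v1alpha1 data.
    if !reflect.DeepEqual(orig.Spec, back.Spec) {
        t.Errorf("round trip changed spec:\n  before: %+v\n  after:  %+v", orig.Spec, back.Spec)
    }
}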

Forgetting to migrate existing objects. After changing the storage version, existing objects are still stored in the old format. They convert on read, but etcd still holds old bytes, and the CRD's status.storedVersions keeps listing the old version, so you cannot drop it from the CRD. Every restore from an etcd backup also brings back old-format bytes that still depend on the conversion webhook being present.


Quick Reference

# Check which version is currently the storage version
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.storedVersions}'
# output: ["v1alpha1"]  or  ["v1alpha1","v1"]  or  ["v1"]

# Verify conversion webhook is reachable
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.spec.conversion.webhook.clientConfig}'

# Read an object at a specific version
kubectl get backuppolicies.v1alpha1.storage.example.com/nightly -n demo -o yaml
kubectl get backuppolicies.v1.storage.example.com/nightly -n demo -o yaml

# Check CRD conditions (NamesAccepted, Established)
kubectl describe crd backuppolicies.storage.example.com | grep -A5 Conditions

Key Takeaways

  • CRD versioning lets you evolve the schema without a migration window — old and new versions coexist via a conversion webhook
  • The hub-and-spoke model minimizes conversion code: N functions, not N² — the hub version is always the storage version
  • Never change the storage version without a deployed conversion webhook for breaking schema changes
  • Conversion must be lossless — fields that cannot round-trip should be preserved in annotations
  • Migrate existing objects to the new storage version after promoting it, then deprecate the old served version

What’s Next

EP09: Admission Webhooks completes the Kubernetes extension picture — validating and mutating webhooks that intercept API requests before they reach etcd, when to use them alongside CRDs, and how they differ from CEL validation.

Get EP09 in your inbox when it publishes → subscribe at linuxcent.com

Build a Simple Kubernetes Operator with controller-runtime and kubebuilder

Reading Time: 7 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 7
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Building a Kubernetes operator means writing a Go reconciler with controller-runtime — kubebuilder scaffolds the project structure, RBAC markers, and Makefile targets so you focus on the reconcile logic
    (kubebuilder = a CLI and framework that generates the operator project scaffold; controller-runtime = the Go library that provides the informer cache, work queue, and reconciler interface)
  • The reconciler for BackupPolicy in this episode creates and manages a CronJob — it is the behavior layer for the CRD built in EP03–EP05
  • RBAC is expressed as Go code comments (//+kubebuilder:rbac:...) — kubebuilder generates the ClusterRole YAML from them
  • Run the operator locally with make run during development; no cluster deployment needed until ready
  • The same project that builds the operator also builds and installs the CRD — make install applies the CRD YAML generated from your Go types
  • Testing: the operator ships with envtest — a local API server + etcd for controller testing without a real cluster

The Big Picture

  OPERATOR PROJECT STRUCTURE (kubebuilder scaffold)

  backup-operator/
  ├── api/v1alpha1/
  │   ├── backuppolicy_types.go     ← Go types that define CRD schema
  │   └── groupversion_info.go
  ├── internal/controller/
  │   └── backuppolicy_controller.go ← reconcile logic (our main focus)
  ├── config/
  │   ├── crd/                       ← generated CRD YAML
  │   ├── rbac/                      ← generated RBAC YAML
  │   └── manager/                   ← controller Deployment YAML
  ├── cmd/main.go                    ← entrypoint, sets up the manager
  └── Makefile                       ← build, test, install, deploy targets

  FLOW:
  Go types → kubebuilder generate → CRD YAML + RBAC YAML
  Reconcile function → runs in cluster → watches BackupPolicy → manages CronJobs

Building a Kubernetes operator with controller-runtime is where CRDs become living infrastructure — the BackupPolicy objects created in EP04 now get actual behavior attached to them.


Prerequisites

# Go 1.22+
go version

# kubebuilder CLI
curl -L -o kubebuilder \
  https://github.com/kubernetes-sigs/kubebuilder/releases/latest/download/kubebuilder_linux_amd64
chmod +x kubebuilder
sudo mv kubebuilder /usr/local/bin/

# A running cluster (kind works well for development)
kind create cluster --name operator-dev

# Verify kubectl works
kubectl cluster-info --context kind-operator-dev

Step 1: Scaffold the Project

mkdir backup-operator && cd backup-operator

# Initialize the Go module and project structure
kubebuilder init \
  --domain example.com \
  --repo github.com/example/backup-operator

# Create the API (Go types + controller scaffold)
kubebuilder create api \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --resource \
  --controller

When prompted:

Create Resource [y/n]: y
Create Controller [y/n]: y

The generated directory tree:

backup-operator/
├── api/
│   └── v1alpha1/
│       ├── backuppolicy_types.go
│       └── groupversion_info.go
├── internal/
│   └── controller/
│       └── backuppolicy_controller.go
├── cmd/
│   └── main.go
├── config/
│   ├── crd/bases/
│   ├── rbac/
│   └── manager/
├── go.mod
├── go.sum
└── Makefile

Step 2: Define the Go Types

Edit api/v1alpha1/backuppolicy_types.go to match the schema from EP03:

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BackupTarget specifies a namespace to include in the backup.
type BackupTarget struct {
    Namespace      string `json:"namespace"`
    IncludeSecrets bool   `json:"includeSecrets,omitempty"`
}

// BackupPolicySpec defines the desired state of BackupPolicy.
type BackupPolicySpec struct {
    // Schedule is a cron expression for when to run backups.
    // +kubebuilder:validation:Pattern=`^(\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+)$`
    Schedule string `json:"schedule"`

    // RetentionDays is how long to keep backup snapshots.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=365
    RetentionDays int32 `json:"retentionDays"`

    // StorageClass is the storage class to use for backup volumes.
    // +kubebuilder:default=standard
    // +kubebuilder:validation:Enum=standard;premium;encrypted;archive
    StorageClass string `json:"storageClass,omitempty"`

    // Targets lists the namespaces and resources to include.
    // +kubebuilder:validation:MaxItems=20
    Targets []BackupTarget `json:"targets,omitempty"`

    // Suspended pauses backup execution when true.
    // +kubebuilder:default=false
    Suspended bool `json:"suspended,omitempty"`
}

// BackupPolicyStatus defines the observed state of BackupPolicy.
type BackupPolicyStatus struct {
    // Conditions reflect the current state of the BackupPolicy.
    Conditions []metav1.Condition `json:"conditions,omitempty"`

    // LastBackupTime is when the most recent backup completed.
    LastBackupTime *metav1.Time `json:"lastBackupTime,omitempty"`

    // CronJobName is the name of the managed CronJob.
    CronJobName string `json:"cronJobName,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Schedule",type=string,JSONPath=`.spec.schedule`
// +kubebuilder:printcolumn:name="Retention",type=integer,JSONPath=`.spec.retentionDays`
// +kubebuilder:printcolumn:name="Suspended",type=boolean,JSONPath=`.spec.suspended`
// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=='Ready')].status`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// BackupPolicy is the Schema for the backuppolicies API.
type BackupPolicy struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   BackupPolicySpec   `json:"spec,omitempty"`
    Status BackupPolicyStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// BackupPolicyList contains a list of BackupPolicy.
type BackupPolicyList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []BackupPolicy `json:"items"`
}

func init() {
    SchemeBuilder.Register(&BackupPolicy{}, &BackupPolicyList{})
}

Regenerate the CRD YAML and DeepCopy methods:

make generate   # regenerates zz_generated.deepcopy.go
make manifests  # regenerates CRD YAML under config/crd/bases/

Step 3: Write the Reconciler

Edit internal/controller/backuppolicy_controller.go:

package controller

import (
    "context"
    "fmt"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    storagev1alpha1 "github.com/example/backup-operator/api/v1alpha1"
)

// BackupPolicyReconciler reconciles BackupPolicy objects.
type BackupPolicyReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// RBAC markers — kubebuilder generates ClusterRole YAML from these comments.
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/finalizers,verbs=update
//+kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Step 1: Fetch the BackupPolicy
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        if apierrors.IsNotFound(err) {
            // Object deleted before we could reconcile — nothing to do.
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, fmt.Errorf("fetching BackupPolicy: %w", err)
    }

    // Step 2: Define the desired CronJob name
    cronJobName := fmt.Sprintf("%s-backup", bp.Name)

    // Step 3: Fetch the existing CronJob (if any)
    existing := &batchv1.CronJob{}
    err := r.Get(ctx, types.NamespacedName{Name: cronJobName, Namespace: bp.Namespace}, existing)
    notFound := apierrors.IsNotFound(err)
    if err != nil && !notFound {
        return ctrl.Result{}, fmt.Errorf("fetching CronJob: %w", err)
    }

    // Step 4: Build the desired CronJob
    desired := r.buildCronJob(bp, cronJobName)

    // Step 5: Create or update
    if notFound {
        logger.Info("Creating CronJob", "name", cronJobName)
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, fmt.Errorf("creating CronJob: %w", err)
        }
    } else {
        // Update schedule and suspend state if they differ
        if existing.Spec.Schedule != desired.Spec.Schedule ||
            existing.Spec.Suspend != desired.Spec.Suspend {
            existing.Spec.Schedule = desired.Spec.Schedule
            existing.Spec.Suspend = desired.Spec.Suspend
            logger.Info("Updating CronJob", "name", cronJobName)
            if err := r.Update(ctx, existing); err != nil {
                return ctrl.Result{}, fmt.Errorf("updating CronJob: %w", err)
            }
        }
    }

    // Step 6: Update status
    bpCopy := bp.DeepCopy()
    meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
        Type:               "Ready",
        Status:             metav1.ConditionTrue,
        Reason:             "CronJobReady",
        Message:            fmt.Sprintf("CronJob %s is configured", cronJobName),
        ObservedGeneration: bp.Generation,
    })
    bpCopy.Status.CronJobName = cronJobName

    if err := r.Status().Update(ctx, bpCopy); err != nil {
        return ctrl.Result{}, fmt.Errorf("updating status: %w", err)
    }

    return ctrl.Result{}, nil
}

func (r *BackupPolicyReconciler) buildCronJob(bp *storagev1alpha1.BackupPolicy, name string) *batchv1.CronJob {
    suspend := bp.Spec.Suspended
    retentionArg := fmt.Sprintf("--retention-days=%d", bp.Spec.RetentionDays)

    cj := &batchv1.CronJob{
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: bp.Namespace,
            Labels: map[string]string{
                "app.kubernetes.io/managed-by": "backup-operator",
                "backuppolicy":                 bp.Name,
            },
        },
        Spec: batchv1.CronJobSpec{
            Schedule: bp.Spec.Schedule,
            Suspend:  &suspend,
            JobTemplate: batchv1.JobTemplateSpec{
                Spec: batchv1.JobSpec{
                    Template: corev1.PodTemplateSpec{
                        Spec: corev1.PodSpec{
                            RestartPolicy: corev1.RestartPolicyOnFailure,
                            Containers: []corev1.Container{
                                {
                                    Name:    "backup",
                                    Image:   "backup-tool:latest",
                                    Args:    []string{retentionArg},
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    // Set owner reference — CronJob is garbage-collected when BackupPolicy is deleted
    _ = ctrl.SetControllerReference(bp, cj, r.Scheme)
    return cj
}

// SetupWithManager registers the controller with the manager and declares what to watch.
func (r *BackupPolicyReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&storagev1alpha1.BackupPolicy{}).
        Owns(&batchv1.CronJob{}).    // reconcile BackupPolicy when owned CronJob changes
        Complete(r)
}

Step 4: Install the CRD and Run Locally

# Install the CRD into the cluster
make install
customresourcedefinition.apiextensions.k8s.io/backuppolicies.storage.example.com created
# Run the controller locally (outside the cluster)
make run
2026-04-25T08:00:00Z  INFO  Starting manager
2026-04-25T08:00:00Z  INFO  Starting workers  {"controller": "backuppolicy", "worker count": 1}

In a separate terminal:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: default
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
EOF

Watch the controller output:

2026-04-25T08:01:00Z  INFO  Creating CronJob  {"name": "nightly-backup"}

Check the result:

kubectl get bp nightly
NAME      SCHEDULE    RETENTION   SUSPENDED   READY   AGE
nightly   0 2 * * *   30          false       True    10s
kubectl get cronjob nightly-backup
NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        <none>          10s

Test self-healing — delete the CronJob and watch the controller recreate it:

kubectl delete cronjob nightly-backup
# Controller output:
# 2026-04-25T08:02:00Z  INFO  Creating CronJob  {"name": "nightly-backup"}

kubectl get cronjob nightly-backup
# Back within seconds

Test suspend:

kubectl patch bp nightly --type=merge -p '{"spec":{"suspended":true}}'
kubectl get cronjob nightly-backup -o jsonpath='{.spec.suspend}'
# true

Step 5: Deploy to Cluster

When ready for in-cluster deployment:

# Build and push the controller image
make docker-build docker-push IMG=your-registry/backup-operator:v0.1.0

# Deploy to cluster (creates Deployment, RBAC, CRD)
make deploy IMG=your-registry/backup-operator:v0.1.0
kubectl get pods -n backup-operator-system
NAME                                          READY   STATUS    RESTARTS   AGE
backup-operator-controller-manager-abc123     2/2     Running   0          30s

Understanding the RBAC Markers

The //+kubebuilder:rbac:... comments in the controller generate the ClusterRole YAML when you run make manifests:

//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

Generated YAML under config/rbac/role.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

This approach keeps RBAC co-located with the code that needs it — if you add a new resource access in the controller, you add the marker next to it.


⚠ Common Mistakes

Not setting an owner reference on child resources. Without ctrl.SetControllerReference(parent, child, scheme), deleting the BackupPolicy leaves orphaned CronJobs. Owner references enable automatic garbage collection of child resources.

Updating the object after r.Get() without handling conflicts. If two reconciles run concurrently (possible after a controller restart), both may try to update the same resource. The API server uses resource version for optimistic concurrency — you will get a conflict error. Retry the reconcile on conflict errors rather than failing.
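
One way to keep that retry local to the update is client-go's conflict helper (k8s.io/client-go/util/retry). A sketch against the CronJob update from this episode, not part of the kubebuilder scaffold, using the variable names from the reconciler above:

// RetryOnConflict re-runs the function while the API server returns a
// 409 Conflict, so a stale resourceVersion triggers a fresh Get instead
// of failing the whole reconcile.
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    current := &batchv1.CronJob{}
    if err := r.Get(ctx, types.NamespacedName{Name: cronJobName, Namespace: bp.Namespace}, current); err != nil {
        return err
    }
    current.Spec.Schedule = desired.Spec.Schedule
    current.Spec.Suspend = desired.Spec.Suspend
    return r.Update(ctx, current)
})
if err != nil {
    return ctrl.Result{}, fmt.Errorf("updating CronJob: %w", err)
}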

Writing to bp directly instead of bp.DeepCopy() for status updates. If the status update fails and you retry, the original bp object now has the modified status in memory. Always update a deep copy when writing status so the in-memory state stays consistent with what was actually persisted.

Not watching owned resources. If you forget .Owns(&batchv1.CronJob{}) in SetupWithManager, the controller will not reconcile when a CronJob is deleted. Self-healing requires watching the resources you manage.


Quick Reference

# Scaffold a new API + controller
kubebuilder create api --group mygroup --version v1alpha1 --kind MyKind

# Regenerate deep copy methods after changing types
make generate

# Regenerate CRD YAML + RBAC from markers
make manifests

# Install CRD into current cluster
make install

# Run controller locally (outside cluster)
make run

# Build + push image, then deploy to cluster
make docker-build docker-push IMG=registry/operator:tag
make deploy IMG=registry/operator:tag

# Uninstall CRD (WARNING: deletes all instances)
make uninstall

Key Takeaways

  • kubebuilder scaffolds the project; you write the types and the reconcile function
  • Go struct markers (//+kubebuilder:...) generate the CRD YAML and RBAC — keep them close to the code they describe
  • ctrl.SetControllerReference enables automatic garbage collection of child resources
  • Always deep-copy the object before writing status; retry on conflict errors
  • make run runs the controller locally — no Docker build needed during development

What’s Next

EP08: Kubernetes CRD Versioning covers how to evolve the BackupPolicy schema from v1alpha1 to v1 without breaking existing clients — storage versions, conversion webhooks, and the hub-and-spoke model for safe API evolution in production clusters.

Get EP08 in your inbox when it publishes → subscribe at linuxcent.com

The Kubernetes Controller Reconcile Loop: How CRDs Come Alive at Runtime

Reading Time: 7 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 6
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • The Kubernetes controller reconcile loop is the mechanism that makes CRDs do something — it watches custom resources, compares desired state (spec) to actual state, and takes actions to close the gap
    (reconcile = “make actual match desired”; the loop runs repeatedly because the world is not static — things drift, fail, and change)
  • Controllers do not receive events like webhooks — they receive object names from a work queue, then re-read the full object from the API server cache
  • The reconcile function is idempotent: calling it ten times with the same object must produce the same result as calling it once
  • controller-runtime is the Go library that provides the informer cache, work queue, and reconciler interface — kubebuilder scaffolds controllers on top of it
  • Kubernetes uses the same reconcile loop internally — the Deployment controller, ReplicaSet controller, and node lifecycle controller all follow this exact pattern
  • A failed reconcile returns an error or explicit requeue request; the controller retries with exponential backoff, not an infinite tight loop

The Big Picture

  THE KUBERNETES CONTROLLER RECONCILE LOOP

  etcd
   │ change event
   ▼
  Informer cache
  (list+watch against the API server,
   local in-memory replica)
   │ cache update → enqueue object name
   ▼
  Work queue
  (rate-limited, deduplicating)
   │ dequeue: "demo/nightly"
   ▼
  Reconcile(ctx, Request{Name, Namespace})
   │
   ├── 1. Fetch object from cache
   │        if not found → ignore (already deleted)
   │
   ├── 2. Read spec (desired state)
   │
   ├── 3. Read actual state
   │        (check child resources, external systems)
   │
   ├── 4. Compare: actual vs desired
   │
   ├── 5. Act: create/update/delete child resources
   │        OR update external system
   │
   └── 6. Update status with outcome
            └── return Result{}, nil                 → done
                return Result{RequeueAfter: d}, nil  → retry after delay d
                return Result{}, err                 → retry with exponential backoff

The Kubernetes controller reconcile loop is what separates a CRD (validated storage) from an operator (automated behavior). Understanding this loop is the prerequisite for writing controllers that work correctly under failure, partial completion, and concurrent modification.


What “Reconcile” Actually Means

Reconcile means: look at what the user asked for (spec), look at what actually exists, and do whatever is needed to make actual match desired.

The key insight is that this is not event-driven in the traditional sense. A controller does not receive a “diff” — it receives a name. It reads the full current state of the object and acts accordingly.

This matters because:

  1. Multiple events get deduplicated. If a BackupPolicy is updated five times in one second, the work queue delivers one reconcile call, not five.
  2. The reconcile is stateless. The controller should not maintain in-memory state about what it “did last time.” It re-reads everything on each reconcile.
  3. Partial failure is safe. If the reconcile fails halfway through, the next reconcile re-reads actual state and continues from where it left off.

The Informer Cache

Controllers do not call the API server directly for every read. They use an informer — a list-and-watch mechanism that maintains a local in-memory copy of all objects of a given type.

  HOW THE INFORMER CACHE WORKS

  Controller startup:
  ┌─────────────────────────────────────────────────────┐
  │ 1. List all BackupPolicies from API server          │
  │    → populate local cache                           │
  │ 2. Establish a Watch stream                         │
  │    → receive incremental updates                    │
  │ 3. For each update: update cache + enqueue object   │
  └─────────────────────────────────────────────────────┘

  On reconcile:
  ┌─────────────────────────────────────────────────────┐
  │ controller reads from LOCAL cache (not API server)  │
  │ → fast, no network round-trip per reconcile         │
  │ → cache is eventually consistent                    │
  └─────────────────────────────────────────────────────┘

Cache consistency: After writing a change (creating a child Secret, for example), re-reading from the cache may return the old state for a brief period. This is normal and expected. Well-written controllers handle this by returning a requeue rather than assuming the write is immediately visible.
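
A sketch of that pattern, assuming the BackupPolicy reconciler from this series (the interval is arbitrary and "time" is assumed to be imported):

// Create the child object, then come back shortly instead of re-reading
// the not-yet-consistent cache in the same reconcile pass.
if err := r.Create(ctx, cronJob); err != nil {
    return ctrl.Result{}, err
}
return ctrl.Result{RequeueAfter: 2 * time.Second}, nil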


Walking Through a Reconcile for BackupPolicy

Suppose a user creates this BackupPolicy:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
  targets:
    - namespace: production

The controller’s reconcile function runs. Here is what it does conceptually:

Reconcile(ctx, {Namespace: "demo", Name: "nightly"})

Step 1: Fetch BackupPolicy "demo/nightly" from cache
  → found; spec.schedule = "0 2 * * *", spec.retentionDays = 30

Step 2: Check if a CronJob for this BackupPolicy exists
  → kubectl get cronjob nightly-backup -n demo
  → not found

Step 3: Gap detected: CronJob should exist but doesn't
  → Create CronJob "nightly-backup" in namespace "demo"
    spec.schedule = "0 2 * * *"
    spec.jobTemplate.spec.template.spec.containers[0].args = ["--retention=30"]

Step 4: Set owner reference on CronJob pointing to BackupPolicy
  → CronJob is now garbage-collected if BackupPolicy is deleted

Step 5: Update BackupPolicy status
  → conditions: [{type: Ready, status: True, reason: CronJobCreated}]
  → lastScheduleTime: null (not yet run)

Step 6: Return Result{}, nil   → reconcile complete

Next time the BackupPolicy is modified (e.g., suspended: true):

Reconcile(ctx, {Namespace: "demo", Name: "nightly"})

Step 1: Fetch → spec.suspended = true

Step 2: Fetch CronJob "nightly-backup"
  → found; spec.suspend = false  ← actual state

Step 3: Gap: CronJob.spec.suspend should be true but is false
  → Patch CronJob: set spec.suspend = true

Step 4: Update status
  → conditions: [{type: Ready, status: True, reason: Suspended}]

Step 5: Return Result{}, nil

Idempotency: The Essential Property

The reconcile function must be idempotent. If it runs ten times with the same object state, the result must be the same as if it ran once.

Why? Because the controller framework delivers at-least-once semantics — your reconcile function will be called more than once for the same object state, especially at startup (the informer re-lists all objects) and after controller restarts.

Non-idempotent (wrong):

// Creates a new CronJob every time, even if one already exists
err := r.Create(ctx, cronJob)

Idempotent (correct):

// Only creates if it doesn't exist; updates if it does
existing := &batchv1.CronJob{}
err := r.Get(ctx, types.NamespacedName{Name: jobName, Namespace: ns}, existing)
if apierrors.IsNotFound(err) {
    err = r.Create(ctx, cronJob)
} else if err == nil {
    // update if spec differs
    existing.Spec = cronJob.Spec
    err = r.Update(ctx, existing)
}

The get-before-create pattern is the most basic idempotency mechanism. controller-runtime provides CreateOrUpdate helpers that codify this.
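
The helper lives in sigs.k8s.io/controller-runtime/pkg/controller/controllerutil. A sketch of the same CronJob logic written with it (variable names are assumed from the examples in this series, and the exact signature can differ slightly between controller-runtime releases):

cj := &batchv1.CronJob{ObjectMeta: metav1.ObjectMeta{Name: jobName, Namespace: ns}}
op, err := controllerutil.CreateOrUpdate(ctx, r.Client, cj, func() error {
    // Runs against the live object (or an empty one if it does not exist
    // yet); set only the fields this controller owns.
    cj.Spec.Schedule = bp.Spec.Schedule
    cj.Spec.Suspend = &bp.Spec.Suspended
    return ctrl.SetControllerReference(bp, cj, r.Scheme)
})
if err != nil {
    return ctrl.Result{}, err
}
logger.Info("CronJob reconciled", "operation", op) // created, updated, or unchanged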


Requeue and Retry Semantics

The reconcile function returns a (Result, error) pair:

return Result{}, nil
  → Reconcile succeeded. Re-run only if object changes again.

return Result{RequeueAfter: 5 * time.Minute}, nil
  → Reconcile succeeded, but requeue in 5 minutes regardless.
  → Used for: polling external system, TTL-based refresh.

return Result{Requeue: true}, nil
  → Requeue immediately (with rate limiting).
  → Used for: cache not yet consistent after a write.

return Result{}, err
  → Reconcile failed. Retry with exponential backoff.
  → Used for: API errors, transient failures.

  RETRY BEHAVIOR

  First failure  → retry after ~1s
  Second failure → retry after ~2s
  Third failure  → retry after ~4s
  ...
  Max backoff    → ~16min (controller-runtime default)

  Object changes (new version from informer) → reset backoff, reconcile immediately

Do not return Result{Requeue: true}, nil in a tight loop — this saturates the work queue and starves other objects. If you need to poll, use RequeueAfter with a meaningful interval.


Watches: What Triggers a Reconcile

The controller does not only watch the primary resource (BackupPolicy). It also watches child resources and maps child changes back to the parent:

  WATCH CONFIGURATION (conceptual)

  Controller watches:
    BackupPolicy (primary) → reconcile when BackupPolicy changes
    CronJob (child/owned)  → reconcile BackupPolicy owner when CronJob changes
    ConfigMap (watched)    → reconcile BackupPolicy when referenced ConfigMap changes

If a user accidentally deletes the CronJob that the controller created:

  1. CronJob deletion event arrives in the informer
  2. Controller maps the deleted CronJob → its owner BackupPolicy
  3. BackupPolicy is enqueued
  4. Reconcile runs, detects missing CronJob, recreates it

This “self-healing” behavior — where controllers reconcile the world back to desired state — is the core operational value of operators. It is not magic; it is the result of watching child resources and re-running reconcile when they drift.
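
In controller-runtime terms, the first two watch rows come from For() and Owns(); watching a referenced-but-not-owned resource like the ConfigMap needs an explicit Watches() with a function that maps the changed object back to the BackupPolicies to reconcile. A sketch only: the mapping here is illustrative, and the Watches form shown matches controller-runtime v0.15+ (older releases use a different signature).

// handler is sigs.k8s.io/controller-runtime/pkg/handler,
// reconcile is sigs.k8s.io/controller-runtime/pkg/reconcile.
return ctrl.NewControllerManagedBy(mgr).
    For(&storagev1alpha1.BackupPolicy{}). // primary
    Owns(&batchv1.CronJob{}).             // child/owned
    Watches(&corev1.ConfigMap{},          // referenced, not owned
        handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
            // Illustrative mapping only: enqueue one hard-coded policy in the
            // ConfigMap's namespace. Real code would list BackupPolicies and
            // match the ones that actually reference this ConfigMap.
            return []reconcile.Request{{NamespacedName: types.NamespacedName{
                Namespace: obj.GetNamespace(), Name: "nightly",
            }}}
        })).
    Complete(r)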


Level-Triggered vs Edge-Triggered

Kubernetes controllers are level-triggered, not edge-triggered. This distinction matters:

  EDGE-TRIGGERED (not what Kubernetes uses)
  → "BackupPolicy was updated FROM retained-30 TO retained-7"
  → If event is lost, the update is lost forever

  LEVEL-TRIGGERED (what Kubernetes uses)
  → "BackupPolicy exists with retentionDays=7"
  → On every reconcile, the controller reads the current level (state)
  → Missing an event is safe — the next reconcile corrects the state

Level-triggered design is why controllers survive restarts, network partitions, and lost events gracefully. The reconcile does not need to track “what changed” — it only needs to know “what is the desired state right now.”


The Same Pattern in Kubernetes Core

Every built-in Kubernetes controller follows this loop:

  Controller                   Watches      Manages           Reconciles
  Deployment controller        Deployment   ReplicaSets       desired replicas ↔ actual ReplicaSet count
  ReplicaSet controller        ReplicaSet   Pods              desired replicas ↔ running Pod count
  Node lifecycle controller    Node         Node conditions   NotReady nodes → taint, evict pods
  Service controller (cloud)   Service      LoadBalancer      cloud LB exists ↔ Service spec

The BackupPolicy controller you will build in EP07 follows exactly the same structure as the Deployment controller.


⚠ Common Mistakes

Reading from the API server directly instead of the cache. Every reconcile reading directly from the API server (not the informer cache) creates N×M load on the API server as the number of objects and reconcile frequency grows. Always read via the controller’s cached client.

Not handling “not found” on object fetch. If a reconcile is triggered but the object has been deleted by the time reconcile runs, the cache returns “not found.” This is normal — the correct response is to return Result{}, nil, not an error.

Tight requeue loop on recoverable error. Returning Result{Requeue: true}, nil or Result{}, err on every call creates an infinite busy-loop. Use RequeueAfter for expected wait conditions, and only return errors for unexpected failures that should back off.

Mutable reconcile state. Do not store reconcile state in struct fields on the reconciler. The reconciler is shared across goroutines; mutable fields cause race conditions. Everything transient must be local to the reconcile function.


Quick Reference

Reconcile input:
  ctx context.Context
  req ctrl.Request   → {Namespace: "demo", Name: "nightly"}

Reconcile output:
  (ctrl.Result, error)

Common returns:
  Result{}, nil                        → done, wait for next change
  Result{Requeue: true}, nil           → retry now (rate limited)
  Result{RequeueAfter: 5*time.Minute}, nil → retry in 5 minutes
  Result{}, err                        → retry with backoff

Key operations:
  r.Get(ctx, req.NamespacedName, &obj)     → fetch from cache
  r.Create(ctx, &obj)                      → create in API server
  r.Update(ctx, &obj)                      → full update
  r.Patch(ctx, &obj, patch)                → partial update
  r.Delete(ctx, &obj)                      → delete
  r.Status().Update(ctx, &obj)             → update status only

Key Takeaways

  • The reconcile loop reads desired state from spec, reads actual state from the cluster, and closes the gap — on every trigger, not just on changes
  • Controllers use an informer cache for reads — fast, eventually consistent, does not hammer the API server
  • Idempotency is not optional: the reconcile function will be called multiple times with the same state
  • Level-triggered design means missing events is safe — the next reconcile corrects any drift
  • Return values from reconcile control retry behavior: RequeueAfter for polling, err for failures, nil for success

What’s Next

EP07: Build a Simple Kubernetes Operator with controller-runtime puts the reconcile loop into practice — kubebuilder scaffold, a complete reconciler for BackupPolicy, RBAC markers, and running the operator locally against a real cluster.

Get EP07 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD CEL Validation: Replace Admission Webhooks for Schema Rules

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 5
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes CRD CEL validation (x-kubernetes-validations) lets you write arbitrary validation rules in the CRD schema — no admission webhook needed
    (CEL = Common Expression Language, a lightweight expression language built into the API server; available for CRD validation rules since Kubernetes 1.25 and GA in 1.29; replaces most reasons you would write a validating admission webhook)
  • CEL rules are evaluated by the API server at admit time — the same place as OpenAPI schema validation, before etcd
  • self refers to the current object’s field; oldSelf refers to the previous value (for update rules)
  • Cross-field validation: “if storageClass is premium, retentionDays must be ≤ 90” — impossible with plain OpenAPI schema, trivial with CEL
  • Immutable fields: a self == oldSelf rule prevents users from changing a value after creation
  • CEL rules run in ~microseconds inside the API server; no external service, no TLS, no latency budget to manage

The Big Picture

  CEL VALIDATION: WHERE IT FITS IN THE ADMISSION CHAIN

  kubectl apply -f backup.yaml
         │
         ▼
  API Server admission chain
  ┌────────────────────────────────────────────────────┐
  │                                                    │
  │  1. Mutating admission webhooks (modify object)    │
  │  2. Schema validation (OpenAPI types, required,    │
  │     minimum/maximum, pattern)                      │
  │  3. CEL validation (x-kubernetes-validations)  ←  │ THIS EPISODE
  │  4. Validating admission webhooks (external)       │
  │                                                    │
  └────────────────────────────────────────────────────┘
         │
         ▼ (passes all checks)
  etcd storage

Kubernetes CRD CEL validation sits between schema validation and external webhooks. For most validation requirements, CEL eliminates the need for a webhook entirely — which means no separate deployment to maintain, no TLS certificates to rotate, no availability dependency between your CRD and a webhook server.


Why CEL Replaces Most Admission Webhooks

Before CEL validation rules (enabled by default since Kubernetes 1.25, GA in 1.29), the only way to express “if field A has value X, field B must be present” was an admission webhook — a separate HTTP server that Kubernetes called synchronously during every API request.

Webhooks work, but they have real costs:

  • Availability dependency: if the webhook is down, creates/updates for that resource type fail
  • TLS management: webhook endpoints require valid TLS certs that must be rotated
  • Deployment overhead: another Deployment, Service, and certificate to manage
  • Latency: every API operation waits for an HTTP round-trip

CEL runs inside the API server process. There is no network call, no certificate, no separate deployment. Rules are compiled once and evaluated in microseconds.

The trade-off: CEL cannot make network calls or access state outside the object being validated. For rules that need to look up other resources (e.g., “does this referenced Secret exist?”), you still need a webhook or a controller that validates via status conditions.


CEL Syntax Basics

CEL expressions are small programs. In Kubernetes CRD validation, the key variables are:

  Variable   Meaning
  self       The current field value (or the root object when the rule sits at the top level)
  oldSelf    The previous value of the field (only available in update rules; rules that reference it are skipped on create)

CEL returns true (validation passes) or false (validation fails, API returns error).

Common patterns:

# String not empty
self.size() > 0

# String matches format
self.matches('^[a-z][a-z0-9-]*$')

# Integer in range
self >= 1 && self <= 365

# Field present (for optional fields)
has(self.fieldName)

# Conditional: if A then B
!has(self.premium) || self.retentionDays <= 90

# List not empty
self.size() > 0

# All items in list satisfy condition
self.all(item, item.namespace.size() > 0)

# Cross-field: access sibling field via parent
self.retentionDays >= self.minRetentionDays

Adding CEL Rules to the BackupPolicy CRD

Start from the CRD built in EP04. Add x-kubernetes-validations at the levels where you need them.

Rule 1: Cron expression validation

The OpenAPI pattern field can validate basic structure, but a proper cron regex is unwieldy. CEL is cleaner:

spec:
  type: object
  required: ["schedule", "retentionDays"]
  x-kubernetes-validations:
    - rule: "self.schedule.matches('^(\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+)$')"
      message: "schedule must be a valid 5-field cron expression"

Rule 2: Cross-field validation

spec:
  type: object
  x-kubernetes-validations:
    - rule: "!(self.storageClass == 'premium') || self.retentionDays <= 90"
      message: "premium storage class supports at most 90 days retention"
    - rule: "!self.suspended || !has(self.pausedBy) || self.pausedBy.size() > 0"
      message: "when suspended is true, pausedBy must be non-empty if provided"

Rule 3: Immutable fields

Once a BackupPolicy is created, the schedule field should not be changeable without deleting and recreating:

schedule:
  type: string
  x-kubernetes-validations:
    - rule: "self == oldSelf"
      message: "schedule is immutable after creation"

reason field: A rule can set reason to FieldValueInvalid (the default), FieldValueForbidden, FieldValueRequired, or FieldValueDuplicate; the API server rejects other values, so there is no dedicated Immutable reason. A failed self == oldSelf rule returns HTTP 422 with the rule's message, and that message is what tells users the field cannot be changed after creation.

Rule 4: Conditional required field

If storageClass is encrypted, then encryptionKeyRef must be present:

spec:
  type: object
  x-kubernetes-validations:
    - rule: "self.storageClass != 'encrypted' || has(self.encryptionKeyRef)"
      message: "encryptionKeyRef is required when storageClass is 'encrypted'"

Rule 5: List element validation

Ensure each target namespace is a valid RFC 1123 DNS label:

targets:
  type: array
  items:
    type: object
    x-kubernetes-validations:
      - rule: "self.namespace.matches('^[a-z0-9]([-a-z0-9]*[a-z0-9])?$')"
        message: "namespace must be a valid DNS label"

The Complete Updated CRD with CEL

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  scope: Namespaced
  names:
    plural:     backuppolicies
    singular:   backuppolicy
    kind:       BackupPolicy
    shortNames: [bp]
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            spec:
              type: object
              required: ["schedule", "retentionDays"]
              x-kubernetes-validations:
                - rule: "!(self.storageClass == 'premium') || self.retentionDays <= 90"
                  message: "premium storage class supports at most 90 days retention"
              properties:
                schedule:
                  type: string
                  x-kubernetes-validations:
                    - rule: "self == oldSelf"
                      message: "schedule is immutable after creation"
                retentionDays:
                  type: integer
                  minimum: 1
                  maximum: 365
                storageClass:
                  type: string
                  default: "standard"
                  enum: ["standard", "premium", "encrypted", "archive"]
                encryptionKeyRef:
                  type: string
                targets:
                  type: array
                  maxItems: 20
                  items:
                    type: object
                    required: ["namespace"]
                    x-kubernetes-validations:
                      - rule: "self.namespace.matches('^[a-z0-9]([-a-z0-9]*[a-z0-9])?$')"
                        message: "namespace must be a valid DNS label"
                    properties:
                      namespace:
                        type: string
                      includeSecrets:
                        type: boolean
                        default: false
                suspended:
                  type: boolean
                  default: false
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Schedule
          type: string
          jsonPath: .spec.schedule
        - name: Retention
          type: integer
          jsonPath: .spec.retentionDays
        - name: Ready
          type: string
          jsonPath: .status.conditions[?(@.type=='Ready')].status
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Testing CEL Rules

Apply the updated CRD:

kubectl apply -f backuppolicies-crd-cel.yaml

Test cross-field validation:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: premium-long
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 180          # violates: premium + > 90 days
  storageClass: premium
EOF
The BackupPolicy "premium-long" is invalid:
  spec: Invalid value: "object":
    premium storage class supports at most 90 days retention

Test immutability:

# Create valid policy
kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: immutable-test
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
EOF

# Try to change the schedule
kubectl patch bp immutable-test -n demo \
  --type=merge -p '{"spec":{"schedule":"0 3 * * *"}}'
The BackupPolicy "immutable-test" is invalid:
  spec.schedule: Invalid value: "0 3 * * *":
    schedule is immutable after creation

Test list element validation:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad-namespace
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 7
  targets:
    - namespace: "UPPERCASE_IS_INVALID"
EOF
The BackupPolicy "bad-namespace" is invalid:
  spec.targets[0]: Invalid value: "object":
    namespace must be a valid DNS label

CEL Cost and Limits

CEL expressions are evaluated at admission time in the API server. Kubernetes imposes cost limits to prevent expressions from consuming excessive CPU:

  • Each expression is assigned a cost based on its operations (string matches, list iteration, etc.)
  • If the expression cost exceeds the per-validation limit, the API server rejects the CRD itself when you apply it
  • Complex all() over large lists is the most common way to hit cost limits

If you hit a cost limit error:

CustomResourceDefinition is invalid: spec.versions[0].schema.openAPIV3Schema...
  CEL expression cost exceeds budget

Solutions:
– Reduce list traversal in CEL rules; enforce list length with maxItems instead
– Split one expensive rule into multiple simpler rules
– Move the expensive validation to a controller (status condition) rather than admission
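
Applying the first two suggestions together, here is a sketch of a bounded rule against the targets list from this episode's CRD; the maxItems and maxLength bounds are what keep the estimated cost low, since the estimator otherwise assumes worst-case sizes:

targets:
  type: array
  maxItems: 20                       # bounds the number of iterations the estimator assumes
  items:
    type: object
    properties:
      namespace:
        type: string
        maxLength: 63                # bounding string length also lowers per-item cost
  x-kubernetes-validations:
    - rule: "self.all(t, has(t.namespace) && t.namespace.size() > 0)"
      message: "every target must set a non-empty namespace"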


⚠ Common Mistakes

Using oldSelf on create. On a create operation there is no previous object, so any rule that references oldSelf is treated as a transition rule and is only evaluated on updates; self == oldSelf therefore enforces immutability without breaking creation. The flip side is that transition rules never validate the initial value, so if the first value also needs checking, add a separate rule that does not reference oldSelf.

Forgetting has() checks for optional fields. If encryptionKeyRef is optional (not in required) and you write a rule like self.encryptionKeyRef.size() > 0, it will fail with a “no such key” error when the field is absent. Always guard optional field access with has(self.fieldName).

Overloading CEL for what a controller should do. CEL validates fields at admission. If your rule needs to verify that a referenced Secret actually exists, CEL cannot do that — it only sees the object being submitted. Use a controller status condition for existence checks, not CEL.


Quick Reference: Common CEL Patterns

# String not empty
self.size() > 0

# String matches regex
self.matches('^[a-z][a-z0-9-]{1,62}$')

# Optional field guard
!has(self.fieldName) || self.fieldName.size() > 0

# Conditional requirement
!(condition) || has(self.requiredWhenConditionIsTrue)

# Immutable field (update only)
self == oldSelf

# All list items satisfy condition
self.all(item, item.namespace.size() > 0)

# At least one list item satisfies condition
self.exists(item, item.type == 'primary')

# Cross-field comparison
self.minReplicas <= self.maxReplicas

# Enum-style check
self in ['standard', 'premium', 'archive']

Key Takeaways

  • x-kubernetes-validations with CEL rules replaces most validating admission webhooks for CRD-specific logic
  • CEL runs inside the API server — no external service, no TLS, no separate deployment
  • Cross-field validation, immutable fields, and conditional requirements are all expressible in CEL
  • Use has() guards for optional fields; rules that reference oldSelf are transition rules and run only on updates
  • CEL has cost limits — avoid unbounded list iteration; use maxItems to bound lists first

What’s Next

EP06: The Kubernetes Controller Reconcile Loop explains how a controller watches BackupPolicy objects and acts on them — the mechanism that makes CRDs useful beyond validated configuration storage. Before writing code in EP07, you need to understand the reconcile loop conceptually.

Get EP06 in your inbox when it publishes → subscribe at linuxcent.com

Write Your First Kubernetes CRD: A Hands-On YAML Walkthrough

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 4
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Writing a Kubernetes CRD takes three YAML files in this walkthrough: the CRD itself, a set of RBAC ClusterRoles for the controller, editors, and viewers, and a sample custom resource
  • The BackupPolicy CRD built in this episode is the running example throughout the rest of the series — operators, versioning, and production patterns all use it
  • Apply the CRD, verify it with kubectl get crds, create a custom resource, and watch the API server validate your spec
  • RBAC for CRDs follows the same Role/ClusterRole model as built-in resources: rules name the API group and the plural resource separately, while {plural}.{group} is the CRD object's own name
  • Schema validation fires at apply time: bad field types, missing required fields, and out-of-range values all return clear errors before anything reaches etcd
  • Without a controller, a BackupPolicy is stored in etcd but nothing acts on it — that is the topic of EP05 and EP07

The Big Picture

  WHAT WE'RE BUILDING IN THIS EPISODE

  1. backuppolicies-crd.yaml        ← registers the BackupPolicy type
  2. backuppolicies-rbac.yaml       ← controls who can create/view/delete
  3. nightly-backup.yaml            ← our first custom resource instance

  After applying:

  kubectl get crds | grep backup      ← BackupPolicy type exists
  kubectl get backuppolicies -n demo  ← nightly instance exists
  kubectl describe bp nightly -n demo ← spec visible, status empty
  kubectl apply -f bad-backup.yaml    ← schema validation rejects bad data

Writing your first Kubernetes CRD is the step that takes you from understanding CRDs conceptually to operating them in a real cluster. This episode is hands-on — every block of YAML is something you apply and verify.


Prerequisites

You need a running Kubernetes cluster and kubectl configured. Any of these work:

# Local options
kind create cluster --name crd-demo
# or
minikube start

# Verify cluster access
kubectl cluster-info
kubectl get nodes

Step 1: Write the CRD

Save this as backuppolicies-crd.yaml:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  scope: Namespaced
  names:
    plural:     backuppolicies
    singular:   backuppolicy
    kind:       BackupPolicy
    shortNames:
      - bp
    categories:
      - storage
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            spec:
              type: object
              required: ["schedule", "retentionDays"]
              properties:
                schedule:
                  type: string
                  description: "Cron expression (e.g. '0 2 * * *' for 02:00 daily)"
                retentionDays:
                  type: integer
                  minimum: 1
                  maximum: 365
                  description: "How many days to retain backup snapshots"
                storageClass:
                  type: string
                  default: "standard"
                  description: "StorageClass to use for backup volumes"
                targets:
                  type: array
                  description: "Namespaces and resources to include in the backup"
                  maxItems: 20
                  items:
                    type: object
                    required: ["namespace"]
                    properties:
                      namespace:
                        type: string
                      includeSecrets:
                        type: boolean
                        default: false
                suspended:
                  type: boolean
                  default: false
                  description: "Set to true to pause backup execution"
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Schedule
          type: string
          jsonPath: .spec.schedule
        - name: Retention
          type: integer
          jsonPath: .spec.retentionDays
        - name: Suspended
          type: boolean
          jsonPath: .spec.suspended
        - name: Ready
          type: string
          jsonPath: .status.conditions[?(@.type=='Ready')].status
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Apply it:

kubectl apply -f backuppolicies-crd.yaml

Verify it registered correctly:

kubectl get crds backuppolicies.storage.example.com
NAME                                    CREATED AT
backuppolicies.storage.example.com      2026-04-25T08:00:00Z

Check the API server now knows about it:

kubectl api-resources | grep backuppolic
backuppolicies    bp    storage.example.com/v1alpha1    true    BackupPolicy

Check it is Established:

kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'
True

If you see False or empty output, wait a few seconds and retry — the API server takes a moment to register new CRDs.
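
Instead of polling by hand, kubectl wait can block until the condition flips:

kubectl wait --for=condition=Established \
  crd/backuppolicies.storage.example.com --timeout=60s
customresourcedefinition.apiextensions.k8s.io/backuppolicies.storage.example.com condition met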


Step 2: Write RBAC

CRDs follow the same RBAC model as built-in resources. Rules reference the API group (storage.example.com) and the plural resource name (backuppolicies) separately; the combined {plural}.{group} form is the name of the CRD object itself, not an RBAC resource.

Save this as backuppolicies-rbac.yaml:

# ClusterRole for operators/controllers that manage BackupPolicy objects
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
---
# ClusterRole for application teams; bind it into their namespace with a RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Read-only role for auditors
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]
Apply it:

kubectl apply -f backuppolicies-rbac.yaml

Verify the roles exist:

kubectl get clusterrole | grep backuppolicy
backuppolicy-controller   2026-04-25T08:01:00Z
backuppolicy-editor       2026-04-25T08:01:00Z
backuppolicy-viewer       2026-04-25T08:01:00Z

Note on backuppolicies/status: The separate status RBAC rule is only meaningful if you enabled the status subresource (we did). Without it, status and spec share the same update path.
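
The ClusterRoles above are not bound to anyone yet. To hand the editor role to a team inside a single namespace, bind it with a RoleBinding. A minimal sketch, assuming a group named app-team and the demo namespace created in the next step:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backuppolicy-editor-binding
  namespace: demo
subjects:
  - kind: Group
    name: app-team                       # hypothetical group; substitute your own users/groups
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: backuppolicy-editor
  apiGroup: rbac.authorization.k8s.io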


Step 3: Create a Namespace and Your First Custom Resource

kubectl create namespace demo

Save this as nightly-backup.yaml:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: demo
  labels:
    app.kubernetes.io/managed-by: manual
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
  storageClass: standard
  targets:
    - namespace: production
      includeSecrets: false
    - namespace: staging
      includeSecrets: false
  suspended: false

Apply it:

kubectl apply -f nightly-backup.yaml

Get it back:

kubectl get backuppolicies -n demo
NAME      SCHEDULE    RETENTION   SUSPENDED   READY   AGE
nightly   0 2 * * *   30          false       <none>  5s

The Ready column is <none> because there is no controller writing status yet. The custom resource exists and is stored in etcd, but nothing is acting on it.

Describe it:

kubectl describe bp nightly -n demo
Name:         nightly
Namespace:    demo
Labels:       app.kubernetes.io/managed-by=manual
Annotations:  <none>
API Version:  storage.example.com/v1alpha1
Kind:         BackupPolicy
Metadata:
  Creation Timestamp:  2026-04-25T08:05:00Z
  ...
Spec:
  Retention Days:  30
  Schedule:        0 2 * * *
  Storage Class:   standard
  Suspended:       false
  Targets:
    Include Secrets:  false
    Namespace:        production
    Include Secrets:  false
    Namespace:        staging
Status:
Events:  <none>
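
There is no controller yet, but you can imitate one for a moment. Because the status subresource is enabled, kubectl (v1.24 and newer) can write to the status path directly; this is an experiment to see the plumbing, not something a user should do in normal operation:

kubectl patch bp nightly -n demo --subresource=status --type=merge \
  -p '{"status":{"conditions":[{"type":"Ready","status":"True","reason":"ManualTest","message":"written by hand","lastTransitionTime":"2026-04-25T08:10:00Z"}]}}'

After this, kubectl get bp nightly -n demo should show True in the Ready column, because that printer column reads .status.conditions[?(@.type=='Ready')].status.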

Step 4: Test Schema Validation

The API server now validates every BackupPolicy against the schema. Try creating an invalid one:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad-policy
  namespace: demo
spec:
  schedule: "not-a-cron"
  retentionDays: 500
EOF
The BackupPolicy "bad-policy" is invalid:
  spec.retentionDays: Invalid value: 500:
    spec.retentionDays in body should be less than or equal to 365

Missing required field:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: missing-schedule
  namespace: demo
spec:
  retentionDays: 7
EOF
The BackupPolicy "missing-schedule" is invalid:
  spec.schedule: Required value

Wrong type:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: wrong-type
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: "thirty"
EOF
The BackupPolicy "wrong-type" is invalid:
  spec.retentionDays: Invalid value: "string":
    spec.retentionDays in body must be of type integer: "string"

All validation fires at the API boundary — before etcd, before any controller sees the object.


Step 5: Verify Default Values Apply

The schema defines storageClass: default: "standard" and suspended: default: false. Verify they are applied even when not specified:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: minimal
  namespace: demo
spec:
  schedule: "0 0 * * 0"
  retentionDays: 7
EOF

kubectl get bp minimal -n demo -o jsonpath='{.spec.storageClass}'
standard
kubectl get bp minimal -n demo -o jsonpath='{.spec.suspended}'
false

Defaults are injected by the API server at admission time. They appear in etcd and in every kubectl get -o yaml output — the stored object includes the defaults even if the user did not specify them.


Step 6: Explore the API Endpoints

Your custom resource is now available at standard REST endpoints:

kubectl proxy --port=8001 &

# List all BackupPolicies in the demo namespace
curl -s http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies \
  | jq '.items[].metadata.name'
"nightly"
"minimal"
# Get a specific BackupPolicy
curl -s http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies/nightly \
  | jq '.spec'

This is how controllers discover and watch custom resources — via the same API server endpoints, using informers that wrap these REST calls with efficient list-and-watch semantics.
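
The watch half of list-and-watch is visible on the same endpoint. With the proxy still running, this streams ADDED/MODIFIED/DELETED events as JSON lines until you interrupt it:

# Stream change events for BackupPolicies in the demo namespace
curl -sN "http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies?watch=true"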


Step 7: Clean Up

kubectl delete namespace demo
kubectl delete -f backuppolicies-rbac.yaml
kubectl delete -f backuppolicies-crd.yaml   # WARNING: this also deletes every remaining BackupPolicy instance

⚠ Common Mistakes

metadata.name does not match {plural}.{group}. The most common error. If you name the CRD backuppolicy.storage.example.com (singular) but the spec says plural: backuppolicies, the API server rejects it. The name must always be {plural}.{group}.

No required fields on spec. Without required constraints, kubectl apply accepts an empty spec: {}. The controller then receives objects with no configuration and has to handle the nil case. Define required fields in the schema.

Forgetting subresources: status: {}. Without this, controllers writing .status also overwrite .spec on full PUT updates. This causes status updates to reset user edits. Enable the status subresource from day one.

Not testing validation errors. Schema validation is the first line of defense. Always explicitly test that your required fields are required, types are enforced, and range constraints work — before deploying the controller.


Quick Reference

# All kubectl operations work on custom resources
kubectl get      backuppolicies -n demo
kubectl get      bp -n demo                  # shortName
kubectl describe bp nightly -n demo
kubectl edit     bp nightly -n demo
kubectl delete   bp nightly -n demo

# Output formats
kubectl get bp -n demo -o yaml
kubectl get bp -n demo -o json
kubectl get bp -n demo -o jsonpath='{.items[*].metadata.name}'

# Watch for changes
kubectl get bp -n demo -w

# List across all namespaces
kubectl get bp -A

# Patch spec
kubectl patch bp nightly -n demo \
  --type=merge -p '{"spec":{"suspended":true}}'

Key Takeaways

  • A working CRD deployment needs: the CRD YAML, RBAC ClusterRoles, and at least one sample custom resource
  • The API server validates all custom resources against the schema at apply time — errors are surfaced immediately, not inside the controller
  • Default values in the schema are injected at admission time and appear in every stored object
  • RBAC rules for custom resources name the API group and the plural resource separately; status and finalizers are granted as separate subresources
  • Without a controller, custom resources are stored in etcd and serve as validated configuration — nothing acts on them until a controller is deployed

What’s Next

EP05: Kubernetes CRD CEL Validation extends schema validation beyond simple type and range checks — cross-field rules (“if storageClass is premium, retentionDays must be at most 90”), regex validation beyond pattern, and immutable field enforcement. All without an admission webhook.

Get EP05 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD Schema Explained: Versions, Validation, and Status Subresource

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 3
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • The Kubernetes CRD schema is defined in spec.versions[].schema.openAPIV3Schema — the API server uses it to validate every custom resource create and update before storing in etcd
    (OpenAPI v3 schema = a JSON Schema dialect that describes the structure, types, and constraints of your resource’s fields)
  • spec.versions is a list — CRDs can serve multiple API versions simultaneously; exactly one version must have storage: true
  • scope: Namespaced vs scope: Cluster controls whether custom resources live inside a namespace or at cluster level (like PersistentVolumeClaim vs PersistentVolume)
  • spec.names defines the plural, singular, kind, and optional shortNames used in kubectl and RBAC
  • The status subresource (subresources.status: {}) separates user writes (spec) from controller writes (status) — enabling optimistic concurrency and kubectl status support
  • The scale subresource (subresources.scale) makes your custom resource compatible with kubectl scale and the HorizontalPodAutoscaler

The Big Picture

  ANATOMY OF A CUSTOMRESOURCEDEFINITION

  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: {plural}.{group}        ← MUST be exactly this format
  spec:
    group: {group}                ← API group (e.g. storage.example.com)
    scope: Namespaced | Cluster   ← where instances live
    names:                        ← how kubectl refers to this resource
      plural: backuppolicies
      singular: backuppolicy
      kind: BackupPolicy
      shortNames: [bp]
    versions:                     ← can be a list; one must have storage: true
      - name: v1alpha1
        served: true              ← API server responds to this version
        storage: true             ← etcd stores objects in this version
        schema:
          openAPIV3Schema:        ← validation schema for ALL objects of this type
            type: object
            properties:
              spec: {...}
              status: {...}
        subresources:
          status: {}              ← enables separate status write path
          scale:                  ← enables kubectl scale + HPA
            specReplicasPath: .spec.replicas
            statusReplicasPath: .status.replicas
        additionalPrinterColumns: ← extra columns in kubectl get output
          - name: Schedule
            type: string
            jsonPath: .spec.schedule

Understanding the Kubernetes CRD schema is the prerequisite for writing a CRD that behaves correctly in production — validation catches bad data at the API boundary, the status subresource prevents controller race conditions, and scope determines your entire RBAC and multi-tenancy model.


spec.group and metadata.name

The group is a reverse-DNS identifier for your API. Convention:

storage.example.com     ← domain you control + functional area
monitoring.myteam.io
databases.platform.company.com

The CRD’s metadata.name must be exactly {plural}.{group}:

metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  names:
    plural: backuppolicies

If these do not match, the API server rejects the CRD with a validation error. This is the most common first-timer mistake.


spec.scope: Namespaced vs Cluster

  SCOPE DETERMINES WHERE INSTANCES LIVE

  Namespaced (scope: Namespaced)       Cluster (scope: Cluster)
  ─────────────────────────────         ──────────────────────────
  kubectl get backuppolicies -n prod    kubectl get clusterbackuppolicies
  kubectl get backuppolicies -A         (no -n flag, no namespace)

  Analogous to: Pod, Deployment,        Analogous to: PersistentVolume,
                ConfigMap                             ClusterRole, Node

Namespaced: Use when instances are per-tenant or per-application. Users with namespace-scoped RBAC can manage their own instances without cluster-admin. Most CRDs should be namespaced.

Cluster-scoped: Use when instances represent cluster-wide configuration — a ClusterIssuer (cert-manager), ClusterSecretStore (ESO), a StorageClass-like concept. Requires cluster-level RBAC to create/modify.

You cannot change scope after a CRD is created without deleting and recreating it (which deletes all instances). Choose carefully.
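
To check the scope of a CRD you did not write before depending on it:

kubectl get crd backuppolicies.storage.example.com -o jsonpath='{.spec.scope}'
Namespaced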


spec.versions: Serving Multiple API Versions

spec:
  versions:
    - name: v1alpha1
      served: true
      storage: false       # not stored; converted on read
      schema:
        openAPIV3Schema: {...}
    - name: v1beta1
      served: true
      storage: false
      schema:
        openAPIV3Schema: {...}
    - name: v1
      served: true
      storage: true        # etcd stores in this version
      schema:
        openAPIV3Schema: {...}

Rules:
served: true means the API server accepts requests at this version
served: false means the API server returns 404 for that version — use to deprecate
– Exactly one version must have storage: true — this is what gets written to etcd
– When a client requests a non-storage version, the API server converts on the fly (or calls your conversion webhook — see EP08)

Early in development, start with v1alpha1 storage: true. Promote to v1 when the schema is stable. EP08 covers how to do this without losing data.


spec.names: What kubectl Sees

spec:
  names:
    plural:     backuppolicies     # kubectl get backuppolicies
    singular:   backuppolicy       # kubectl get backuppolicy (also works)
    kind:       BackupPolicy       # used in YAML apiVersion/kind
    listKind:   BackupPolicyList   # optional; auto-derived if omitted
    shortNames:                    # kubectl get bp
      - bp
    categories:                    # kubectl get all includes this type
      - all

categories is worth noting: if you add all to categories, your custom resources appear when someone runs kubectl get all -n mynamespace. Most CRDs deliberately do not add this — it clutters get all output. Only add it if your resource is a primary operational concern.


schema.openAPIV3Schema: Validation

The schema is where you define field types, required fields, constraints, and descriptions. The API server validates every create and update against this schema before writing to etcd.

schema:
  openAPIV3Schema:
    type: object
    required: ["spec"]
    properties:
      spec:
        type: object
        required: ["schedule", "retentionDays"]
        properties:
          schedule:
            type: string
            description: "Cron expression for backup schedule"
            pattern: '^(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)$'
          retentionDays:
            type: integer
            minimum: 1
            maximum: 365
          storageClass:
            type: string
            default: "standard"        # default value (Kubernetes 1.17+)
          targets:
            type: array
            maxItems: 10
            items:
              type: object
              required: ["name"]
              properties:
                name:
                  type: string
                namespace:
                  type: string
                  default: "default"
      status:
        type: object
        x-kubernetes-preserve-unknown-fields: true   # controllers write arbitrary status

Field types available

  Type      Usage
  string    Text values; supports format, pattern, enum, minLength, maxLength
  integer   Whole numbers; supports minimum, maximum
  number    Floating point
  boolean   true/false
  object    Nested structure; use properties to define fields
  array     List; use items to define element schema; supports minItems, maxItems

x-kubernetes-preserve-unknown-fields: true

This tells the API server not to prune fields it does not know about. Use it on status (controllers write whatever they need) and on fields that are intentionally free-form (like a config field that accepts arbitrary YAML). Avoid it on spec — it bypasses validation.
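
If you would rather type the status than preserve unknown fields, a minimal conditions schema modeled on the standard metav1.Condition fields looks like this (a sketch; shape it around whatever your controller actually writes):

status:
  type: object
  properties:
    observedGeneration:
      type: integer
    conditions:
      type: array
      items:
        type: object
        required: ["type", "status"]
        properties:
          type:
            type: string
          status:
            type: string
            enum: ["True", "False", "Unknown"]
          reason:
            type: string
          message:
            type: string
          lastTransitionTime:
            type: string
            format: date-time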

Validation behavior in practice

# This will fail with a clear error:
kubectl apply -f - <<EOF
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad
  namespace: default
spec:
  schedule: "not-a-cron"    # fails pattern validation
  retentionDays: 500         # fails maximum: 365
EOF
The BackupPolicy "bad" is invalid:
  spec.schedule: Invalid value: "not-a-cron": spec.schedule in body should match
    '^(\*|[0-9,\-\/]+)\s+...'
  spec.retentionDays: Invalid value: 500: spec.retentionDays in body should be
    less than or equal to 365

Schema validation catches configuration mistakes at apply time, not at runtime inside a pod. This is one of the core advantages of expressing domain configuration as CRDs rather than ConfigMaps.


additionalPrinterColumns: What kubectl get Shows

By default, kubectl get backuppolicies shows only NAME and AGE. You can add columns:

additionalPrinterColumns:
  - name: Schedule
    type: string
    jsonPath: .spec.schedule
    description: Cron schedule for backups
  - name: Retention
    type: integer
    jsonPath: .spec.retentionDays
    priority: 1          # 0 = always shown; 1 = only with -o wide
  - name: Ready
    type: string
    jsonPath: .status.conditions[?(@.type=='Ready')].status
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp

Result:

NAME        SCHEDULE      READY   AGE
nightly     0 2 * * *     True    3d
weekly      0 0 * * 0     False   7d

Good printer columns turn kubectl get into a useful operational dashboard. Include Ready (from status conditions) so operators can immediately see which custom resources are healthy without running kubectl describe.


The Status Subresource

subresources:
  status: {}

Without the status subresource, spec and status are part of the same object. Any user with update permission on the CRD can modify both. Controllers write status through the same path as users write spec.

With the status subresource enabled:
kubectl apply / kubectl patch only update spec — the status block is stripped
– Controllers use the /status subresource endpoint to write status
– RBAC can grant update on backuppolicies (spec) independently from update on backuppolicies/status

  WITHOUT status subresource:         WITH status subresource:
  ─────────────────────────            ──────────────────────────
  PUT /backuppolicies/nightly          PUT /backuppolicies/nightly
  → updates spec AND status            → updates spec only

                                       PUT /backuppolicies/nightly/status
                                       → updates status only (controller path)

Always enable the status subresource on production CRDs. The split between spec and status is fundamental to the Kubernetes API contract. Without it, a controller updating status can accidentally overwrite spec changes made by a user at the same time.


The Scale Subresource

subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    labelSelectorPath: .status.labelSelector

This makes your custom resource compatible with:

kubectl scale backuppolicy nightly --replicas=3

And with HorizontalPodAutoscaler targeting your custom resource. If your CRD manages something replica-based (workers, shards, connections), enabling the scale subresource lets it plug into the standard Kubernetes autoscaling ecosystem without extra plumbing.
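
As an illustration of what that compatibility buys you, an autoscaling/v2 HorizontalPodAutoscaler can target the custom resource directly. A sketch, assuming a CRD that exposes .spec.replicas and the scale subresource above (the BackupPolicy built in this series does not):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backup-workers
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: storage.example.com/v1alpha1
    kind: BackupPolicy                 # any kind that serves the scale subresource
    name: nightly
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70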


⚠ Common Mistakes

Forgetting x-kubernetes-preserve-unknown-fields: true on status. If you validate the status field with a strict schema but do not add this, the API server will prune any status fields the controller writes that are not in the schema. The controller’s status updates will silently lose fields. Either define the full status schema or use x-kubernetes-preserve-unknown-fields: true.

Using scope: Cluster for resources that should be namespaced. Once a CRD is created as cluster-scoped, you cannot make it namespaced without deleting and recreating it. Plan scope before deploying to production.

Not enabling the status subresource. Without it, controllers writing status can race with users updating spec. It also means kubectl patch --subresource=status does not work and some tooling behaves unexpectedly. Enable it from the start.

Loose schema with no required fields. An openAPIV3Schema with no required constraint accepts objects with empty spec. This usually means your controller gets called with a resource that is missing mandatory configuration. Define required fields and validate them at the API boundary, not inside the controller.


Quick Reference

# Inspect the full schema of a CRD
kubectl get crd backuppolicies.storage.example.com -o yaml | \
  yq '.spec.versions[0].schema'

# Check what subresources are enabled
kubectl get crd certificates.cert-manager.io -o jsonpath=\
  '{.spec.versions[0].subresources}'

# See all served versions for a CRD
kubectl get crd prometheuses.monitoring.coreos.com \
  -o jsonpath='{.spec.versions[*].name}'

# Check which version is the storage version
kubectl get crd certificates.cert-manager.io \
  -o jsonpath='{.spec.versions[?(@.storage==true)].name}'

# Describe the printer columns for a CRD
kubectl get crd scaledobjects.keda.sh \
  -o jsonpath='{.spec.versions[0].additionalPrinterColumns}'

Key Takeaways

  • spec.versions allows serving and storing multiple API versions; only one version has storage: true
  • scope (Namespaced vs Cluster) cannot be changed after creation — choose deliberately
  • openAPIV3Schema validates every CR at the API boundary, before etcd storage
  • The status subresource separates the user write path (spec) from the controller write path (status) — always enable it
  • additionalPrinterColumns makes kubectl get operationally useful; include a Ready column from status conditions

What’s Next

EP04: Write Your First Kubernetes CRD puts the anatomy into practice — a complete hands-on walkthrough building a BackupPolicy CRD from scratch, applying it to a cluster, creating instances, and verifying validation, RBAC, and status behavior.

Get EP04 in your inbox when it publishes → subscribe at linuxcent.com

CRDs You Already Use: cert-manager, KEDA, and External Secrets Explained

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 2
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • cert-manager, KEDA, and External Secrets Operator are all CRD-based systems — understanding their custom resources shows you what a well-designed CRD looks like before you build one
  • cert-manager’s Certificate CRD expresses desired TLS state; the cert-manager controller reconciles that state by issuing, renewing, and storing certificates in Secrets
  • KEDA’s ScaledObject extends the HorizontalPodAutoscaler with external metrics (queue depth, Kafka lag, Prometheus queries) — the KEDA operator translates ScaledObjects into native HPA objects
  • External Secrets Operator’s ExternalSecret abstracts over secret backends (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) — the controller pulls values and writes Kubernetes Secrets
  • All three follow the same pattern: you describe desired state in a custom resource; the operator reconciles actual state to match
  • Kubernetes custom resources examples like these are the fastest way to internalize the CRD mental model before writing your own

The Big Picture

  THREE CRD-BASED OPERATORS AND WHAT THEY MANAGE

  ┌─────────────────────────────────────────────────────────────┐
  │  cert-manager                                               │
  │  Certificate CR  →  controller issues cert  →  TLS Secret  │
  └─────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────┐
  │  KEDA                                                       │
  │  ScaledObject CR  →  controller creates HPA  →  Pod count  │
  └─────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────┐
  │  External Secrets Operator                                  │
  │  ExternalSecret CR  →  controller pulls  →  K8s Secret      │
  │                         from Vault/AWS/GCP                  │
  └─────────────────────────────────────────────────────────────┘

  In every case:
  User creates CR  →  Operator watches CR  →  Operator acts  →  Status updated

Kubernetes custom resources examples from real tools like these reveal the design pattern you will use in every CRD you build: express desired state declaratively, let the controller bridge the gap to actual state, surface the outcome in the status subresource.


Why Look at Existing CRDs First?

Before designing your own CRD, you want to understand what good CRD design looks like from the user’s perspective. The maintainers of cert-manager (Jetstack), KEDA (kedacore), and External Secrets Operator have collectively solved the same problems you will face:

  • What goes in spec vs status?
  • How do you reference other Kubernetes objects?
  • How do you handle secrets and credentials securely?
  • What does a healthy vs unhealthy custom resource look like?

Studying these before writing your own saves you from the most common first-timer mistakes.


cert-manager: The Certificate CRD

cert-manager is the most widely deployed CRD-based system in Kubernetes. It manages TLS certificates from Let’s Encrypt, internal CAs, and cloud providers.

The core CRDs

kubectl get crds | grep cert-manager
certificates.cert-manager.io
certificaterequests.cert-manager.io
challenges.acme.cert-manager.io
clusterissuers.cert-manager.io
issuers.cert-manager.io
orders.acme.cert-manager.io

The one you interact with most is Certificate. Here is a real example:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-cert        # cert-manager writes the TLS Secret here
  duration: 2160h                 # 90 days
  renewBefore: 720h               # renew 30 days before expiry
  subject:
    organizations:
      - example.com
  dnsNames:
    - api.example.com
    - api-internal.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

What happens after you apply this:

  1. cert-manager controller sees the new Certificate object
  2. It contacts the referenced ClusterIssuer (Let’s Encrypt in this case)
  3. It completes the ACME challenge, obtains the certificate
  4. It writes the certificate and private key into the api-tls-cert Secret
  5. It updates the Certificate object’s status to reflect success
kubectl describe certificate api-tls -n production
Status:
  Conditions:
    Last Transition Time:  2026-04-10T08:00:00Z
    Message:               Certificate is up to date and has not expired
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2026-07-09T08:00:00Z
  Not Before:              2026-04-10T08:00:00Z
  Renewal Time:            2026-06-09T08:00:00Z

What this teaches you about CRD design

  • spec.secretName — the CR references an output object by name. The controller creates or updates that object.
  • spec.issuerRef — the CR references another custom resource (ClusterIssuer) by name. This is a common pattern for separating configuration concerns.
  • status.conditions — the standard Kubernetes condition pattern: type, status, reason, message. You will use the same structure in your own CRDs.
  • The controller owns status — users own spec. This separation is a core convention.
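
The issuerRef above points at another custom resource. For completeness, a minimal Let’s Encrypt ClusterIssuer looks roughly like this (the email and ingress class are placeholders):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]                  # placeholder contact for expiry notices
    privateKeySecretRef:
      name: letsencrypt-prod-account-key        # Secret cert-manager creates for the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx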

KEDA: The ScaledObject CRD

KEDA (Kubernetes Event-Driven Autoscaling) extends Kubernetes autoscaling beyond CPU and memory. It can scale deployments based on queue depth, Kafka consumer lag, Prometheus metric values, and dozens of other event sources.

The core CRDs

kubectl get crds | grep keda
clustertriggerauthentications.keda.sh
scaledjobs.keda.sh
scaledobjects.keda.sh
triggerauthentications.keda.sh

A ScaledObject ties a Deployment to an external scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor        # the Deployment to scale
  minReplicaCount: 0             # scale to zero when idle
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"         # target: 5 messages per pod
        awsRegion: us-east-1
      authenticationRef:
        name: keda-sqs-auth      # TriggerAuthentication for AWS credentials

What KEDA does with this:

  1. KEDA controller sees the ScaledObject
  2. It creates a native HorizontalPodAutoscaler object targeting the order-processor Deployment
  3. KEDA’s metrics adapter polls the SQS queue depth and exposes it as a custom metric
  4. The HPA scales replicas on that metric; KEDA itself handles the activation step between zero and one replica, since the native HPA cannot scale to zero
kubectl get scaledobject order-processor-scaler -n production
NAME                       SCALETARGETKIND      SCALETARGETNAME    MIN   MAX   TRIGGERS         READY   ACTIVE
order-processor-scaler     apps/Deployment      order-processor    0     50    aws-sqs-queue    True    True
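
The authenticationRef in the ScaledObject points at another KEDA custom resource. A minimal TriggerAuthentication for this scaler, sketched on the assumption that the AWS keys live in a Secret named aws-credentials:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-sqs-auth
  namespace: production
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID              # parameter name the aws-sqs-queue scaler expects
      name: aws-credentials                  # hypothetical Secret holding the credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-credentials
      key: AWS_SECRET_ACCESS_KEY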

What this teaches you about CRD design

  • spec.scaleTargetRef — targeting another object by name. The controller acts on that object, not on the CR itself.
  • spec.triggers — a list of trigger specifications. Lists of typed sub-objects are a recurring CRD pattern.
  • spec.minReplicaCount: 0 — expressing scale-to-zero as a first-class concept in the API. Built-in HPA does not support this; KEDA’s CRD extends the vocabulary of what is expressible.
  • The KEDA operator translates ScaledObject → native HPA. The CRD is an abstraction over a more complex Kubernetes object. This “translate and manage child resources” pattern is extremely common in operators.

External Secrets Operator: The ExternalSecret CRD

External Secrets Operator (ESO) solves a specific problem: secrets live in external systems (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager), but Kubernetes workloads need them as Kubernetes Secrets. ESO bridges the gap.

The core CRDs

kubectl get crds | grep external-secrets
clusterexternalsecrets.external-secrets.io
clustersecretstores.external-secrets.io
externalsecrets.external-secrets.io
secretstores.external-secrets.io

A SecretStore defines the backend connection:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: eso-sa            # uses IRSA/workload identity

An ExternalSecret defines what to pull and how to map it:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-creds
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: database-secret          # Kubernetes Secret to create/update
    creationPolicy: Owner
  data:
    - secretKey: username          # key in the K8s Secret
      remoteRef:
        key: prod/database         # path in AWS Secrets Manager
        property: username         # property within that secret
    - secretKey: password
      remoteRef:
        key: prod/database
        property: password

After ESO reconciles this:

kubectl get secret database-secret -n production -o jsonpath='{.data.username}' | base64 -d
# outputs: db_user
kubectl describe externalsecret database-creds -n production
Status:
  Conditions:
    Last Transition Time:   2026-04-10T08:00:00Z
    Message:                Secret was synced
    Reason:                 SecretSynced
    Status:                 True
    Type:                   Ready
  Refresh Time:             2026-04-10T09:00:00Z
  Synced Resource Version:  1-abc123

What this teaches you about CRD design

  • spec.secretStoreRef — referencing a configuration CRD (SecretStore) from an operational CRD (ExternalSecret). This layering of CRDs to separate concerns is a mature pattern.
  • spec.refreshInterval — the CR expresses a desired behavior (periodic sync), not just a desired state snapshot. CRDs can express temporal behaviors.
  • spec.target.creationPolicy: Owner — ESO will set an owner reference on the created Secret, so deleting the ExternalSecret cascades to deleting the Secret. This is how controllers manage lifecycle.
  • Sensitive values never appear in the CR — only paths and references. The controller handles the actual secret retrieval. This is a key security pattern in CRD design.

The Common Pattern Across All Three

  OPERATOR PATTERN (cert-manager / KEDA / ESO / every other operator)

  User applies CR
        │
        ▼
  Controller watches CRDs
  (informer cache, events queue)
        │
        ▼
  Controller reconciles:
  actual state ──→ compare ──→ desired state
        │              │
        │         (gap found)
        │              │
        ▼              ▼
  Takes action      Updates status
  (issue cert,      conditions in CR
   create HPA,
   sync Secret)
        │
        └──── loops back, watches for next change

The design contract:
– Users write spec — what they want
– Controllers read spec, write status — what actually happened
– Status conditions are truth — Ready: True/False with reason and message tell operators what the controller knows

This pattern, explained in depth in EP06, is why CRDs and controllers are designed the way they are.


⚠ Common Mistakes

Installing CRDs without the controller. If you install cert-manager’s CRDs from the crds.yaml manifest without installing cert-manager itself, Certificate objects will be accepted by the API server but never reconciled. The Ready condition will never appear. Always install the operator alongside its CRDs.

Editing status fields directly. Many teams try kubectl patch or kubectl edit to update a custom resource’s status to work around a stuck controller. Most well-written controllers overwrite status every reconcile loop — your manual change will be wiped. Fix the underlying issue, not the status display.

Assuming CRD deletion is safe. Covered in EP01 but worth repeating: deleting a CRD cascades to deleting all instances. If you kubectl delete crd certificates.cert-manager.io, every Certificate object in every namespace is gone and cert-manager will stop issuing. Back up CRDs and their instances before any CRD deletion.


Quick Reference

# See all CRDs installed by cert-manager
kubectl get crds | grep cert-manager.io

# Get all Certificates across all namespaces
kubectl get certificates -A

# Watch cert-manager reconcile a new Certificate
kubectl get certificate api-tls -n production -w

# See all ScaledObjects and their current state
kubectl get scaledobjects -A

# Check ESO sync status for all ExternalSecrets
kubectl get externalsecrets -A

# Inspect what APIs a CRD exposes
kubectl api-resources | grep cert-manager

Key Takeaways

  • cert-manager, KEDA, and ESO are canonical examples of well-designed CRD-based operators
  • All three follow the same pattern: user writes spec, controller reconciles to actual state, status reflects outcome
  • spec expresses desired state declaratively; the controller figures out how to achieve it
  • Status conditions (type, status, reason, message) are the standard way to surface controller outcomes
  • Sensitive values never appear in the CR — controllers retrieve them from external systems using references and credentials

What’s Next

EP03: CRD Anatomy opens the YAML of a CRD itself — spec.versions, OpenAPI schema properties, scope, names, and subresources. You have seen CRDs from the outside; next we look at how they are structured on the inside.

Get EP03 in your inbox when it publishes → subscribe at linuxcent.com