Kubernetes CRDs in Production: Finalizers, Status Conditions, and RBAC Patterns

Reading Time: 8 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 10
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Finalizers block deletion until cleanup completes — they prevent orphaned external resources but cause stuck objects if the controller crashes mid-cleanup; always implement a removal timeout
  • Status conditions are the standard communication channel between controller and user: use type, status, reason, message, and observedGeneration on every condition; never invent ad-hoc status fields
  • Owner references wire automatic garbage collection — when the parent custom resource is deleted, Kubernetes deletes owned child objects; use them for every object your controller creates in the same namespace
  • RBAC for CRDs in multi-tenant clusters must include separate ClusterRoles for controller, editor, and viewer; grant status and finalizers as separate sub-resources; never give application teams cluster-scoped create/delete on CRDs
  • The three most common Kubernetes CRD production failure modes: finalizer death loop, status thrash, and CRD deletion cascade — all avoidable with the patterns in this episode
  • Every CRD on a healthy cluster should report the Established condition as True in status.conditions (the default kubectl get crds output does not show it); a CRD that never becomes Established rejects all create requests for its resources

The Big Picture

  PRODUCTION CRD LIFECYCLE: FULL PICTURE

  Create            Reconcile           Suspend/Resume         Delete
  ──────            ─────────           ──────────────         ──────
  User applies      Controller          User patches           User deletes
  BackupPolicy      creates CronJob,    spec.suspended=true    BackupPolicy
      │             sets status             │                      │
      ▼                 │                   ▼                      ▼
  Admission             │               Controller            Finalizer blocks
  webhook               │               suspends CronJob      deletion
  (if any)              │                                     Controller:
      │                 │                                       1. Delete CronJob
      ▼                 ▼                                       2. Remove external state
  Schema            Status                                      3. Remove finalizer
  validation        conditions                                Object deleted from etcd
      │             updated
      ▼
  Controller
  reconcile
  triggered

Kubernetes CRD production readiness is not just about making the happy path work — it is about designing for the failure modes: controllers crashing mid-operation, deletion races, and status messages that confuse operators at 2am.


Finalizers: Controlled Deletion

A finalizer is a string in metadata.finalizers. Kubernetes will not delete an object that has finalizers, regardless of who issues the delete command.

metadata:
  name: nightly
  namespace: demo
  finalizers:
    - storage.example.com/backup-cleanup  # ← your controller put this here

When kubectl delete bp nightly runs:

  1. API server sets metadata.deletionTimestamp  (does NOT delete yet)
  2. Object is visible as "Terminating"
  3. Controller sees deletionTimestamp set
  4. Controller runs cleanup:
       - delete backup data from S3
       - delete CronJob (or let owner references handle it)
       - release any external locks
  5. Controller removes the finalizer:
       patch bp nightly --type=json \
         -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'
  6. API server sees finalizers list is now empty → deletes the object
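The API server's side of this flow reduces to one rule. A minimal sketch as a pure function (illustrative only — the function name and return values are this article's invention, not real apiserver code):

```go
package main

import "fmt"

// deletionOutcome models what the API server does when a delete request
// arrives: if finalizers remain, it only marks the object Terminating by
// setting deletionTimestamp; once the list is empty, the object is removed.
func deletionOutcome(finalizers []string, deletionRequested bool) string {
	if !deletionRequested {
		return "kept"
	}
	if len(finalizers) > 0 {
		return "terminating" // deletionTimestamp set, object still visible
	}
	return "deleted" // finalizers empty → removed from etcd
}

func main() {
	fmt.Println(deletionOutcome([]string{"storage.example.com/backup-cleanup"}, true)) // terminating
	fmt.Println(deletionOutcome(nil, true))                                            // deleted
}
```

Note that the outcome depends only on the finalizers list — not on who issued the delete or with what flags, which is why --force does not help.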

Adding a finalizer in Go

const finalizerName = "storage.example.com/backup-cleanup"

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Deletion path
    if !bp.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(bp, finalizerName) {
            if err := r.cleanupExternalResources(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(bp, finalizerName)
            if err := r.Update(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Normal path: ensure finalizer is present
    if !controllerutil.ContainsFinalizer(bp, finalizerName) {
        controllerutil.AddFinalizer(bp, finalizerName)
        if err := r.Update(ctx, bp); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... rest of reconcile
    return ctrl.Result{}, nil
}

Finalizer death loop and the timeout pattern

If cleanupExternalResources always returns an error (external system down, bug in cleanup code), the object stays stuck in Terminating forever. No delete command can finish it — kubectl delete --force does not bypass finalizers.

Prevention: add a cleanup deadline with status tracking.

func (r *BackupPolicyReconciler) cleanupExternalResources(ctx context.Context, bp *storagev1alpha1.BackupPolicy) error {
    // Check if we've been trying to clean up for too long
    if bp.DeletionTimestamp != nil {
        deadline := bp.DeletionTimestamp.Add(10 * time.Minute)
        if time.Now().After(deadline) {
            // Log the failure, abandon cleanup, let the object be deleted.
            log.FromContext(ctx).Error(nil, "cleanup deadline exceeded, removing finalizer anyway",
                "name", bp.Name)
            return nil   // returning nil removes the finalizer
        }
    }
    // ... actual cleanup
    return nil
}

Recovery for a stuck object (use only when cleanup truly cannot succeed):

kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]'

Status Conditions: The Right Way

The Kubernetes standard condition format is defined in k8s.io/apimachinery/pkg/apis/meta/v1.Condition:

type Condition struct {
    Type               string          // e.g. "Ready", "Synced", "Degraded"
    Status             ConditionStatus // "True", "False", "Unknown"
    ObservedGeneration int64           // the .metadata.generation this condition reflects
    LastTransitionTime metav1.Time     // when Status last changed
    Reason             string          // machine-readable, CamelCase, e.g. "CronJobCreated"
    Message            string          // human-readable, may contain details
}

Standard condition types

Type          Meaning
────          ───────
Ready         The resource is fully reconciled and operational
Synced        The resource has been synced with an external system
Progressing   An operation is actively in progress
Degraded      The resource is operating in a reduced capacity

Use Ready: True only when the full reconcile is complete and the resource is functional. Use Ready: False with a clear Message when reconcile fails or is blocked.

Setting conditions in Go

meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    ObservedGeneration: bp.Generation,
    Reason:             "CronJobCreateFailed",
    Message:            fmt.Sprintf("failed to create CronJob: %v", err),
})

meta.SetStatusCondition handles deduplication — it updates an existing condition of the same Type rather than appending a duplicate.
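A simplified stand-in makes the dedup behavior concrete: replace the condition with the matching Type if present, append otherwise. This is a sketch with illustrative types — the real helper also manages LastTransitionTime for you:

```go
package main

import "fmt"

// Condition is a stand-in for metav1.Condition, trimmed for illustration.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// setStatusCondition is a simplified sketch of meta.SetStatusCondition:
// it updates the existing condition of the same Type instead of
// appending a duplicate entry.
func setStatusCondition(conditions []Condition, c Condition) []Condition {
	for i := range conditions {
		if conditions[i].Type == c.Type {
			conditions[i] = c
			return conditions
		}
	}
	return append(conditions, c)
}

func main() {
	conds := setStatusCondition(nil, Condition{Type: "Ready", Status: "False", Reason: "Creating"})
	conds = setStatusCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "CronJobCreated"})
	fmt.Println(len(conds), conds[0].Status) // one entry, now True
}
```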

observedGeneration is critical

metadata.generation      = 5   (increments on every spec change)
status.observedGeneration = 3  (set by controller on each reconcile)

If observedGeneration < generation:
  → controller has not yet reconciled the latest spec change
  → status.conditions reflect an older state
  → do NOT alert based on conditions that lag generation

Always set ObservedGeneration: bp.Generation when writing status conditions. Tooling (Argo CD, Flux, kubectl wait) depends on this to know whether status is current.
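The staleness test that tooling performs amounts to a single comparison. A sketch of the rule above (the function name is ours):

```go
package main

import "fmt"

// statusIsCurrent reports whether status reflects the latest spec: the
// controller has observed at least the generation the user last wrote.
func statusIsCurrent(generation, observedGeneration int64) bool {
	return observedGeneration >= generation
}

func main() {
	fmt.Println(statusIsCurrent(5, 3)) // false: conditions lag the spec change
	fmt.Println(statusIsCurrent(5, 5)) // true: status is current
}
```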

kubectl wait uses conditions

# Wait until BackupPolicy is Ready
kubectl wait bp/nightly -n demo \
  --for=condition=Ready \
  --timeout=60s

This works because kubectl wait reads the status.conditions array.
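Internally that check amounts to scanning the conditions array for a matching type and testing its status. A minimal sketch with stand-in types (not kubectl's actual implementation):

```go
package main

import "fmt"

// Condition is a stand-in for an entry in status.conditions.
type Condition struct {
	Type   string
	Status string
}

// isConditionTrue mimics the check behind `--for=condition=Ready`:
// find the condition with the given type and test whether it is "True".
func isConditionTrue(conditions []Condition, condType string) bool {
	for _, c := range conditions {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false // an absent condition counts as not satisfied
}

func main() {
	conds := []Condition{{Type: "Synced", Status: "True"}, {Type: "Ready", Status: "False"}}
	fmt.Println(isConditionTrue(conds, "Ready"))  // false
	fmt.Println(isConditionTrue(conds, "Synced")) // true
}
```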


Owner References: Automatic Garbage Collection

Owner references wire a parent-child relationship between Kubernetes objects. When the parent is deleted, Kubernetes garbage-collects all owned children automatically.

metadata:
  name: nightly-backup       # CronJob
  ownerReferences:
    - apiVersion: storage.example.com/v1alpha1
      kind: BackupPolicy
      name: nightly
      uid: a1b2c3d4-...
      controller: true          # only one owner can be the controller
      blockOwnerDeletion: true  # the GC waits for this owner before deleting child

Set in Go using ctrl.SetControllerReference:

if err := ctrl.SetControllerReference(bp, cronJob, r.Scheme); err != nil {
    return ctrl.Result{}, err
}

Owner reference rules

  • Cross-namespace owner references are disallowed: a namespaced owner must be in the same namespace as the object it owns. Namespaced objects may have cluster-scoped owners, but cluster-scoped objects can only be owned by other cluster-scoped objects
  • Only one object can be the controller: true owner; others can be non-controller owners
  • Deleting the owner cascades to deleting owned objects — this is garbage collection, not finalizer-based cleanup

Without owner references, deleting a BackupPolicy leaves the CronJob as an orphan. This is hard to detect and accumulates over time.
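The garbage collector's ownership check can be sketched as filtering children whose ownerReferences contain the parent's UID. Illustrative stand-in types, not the real GC code:

```go
package main

import "fmt"

// OwnerReference is a stand-in for the uid field of metadata.ownerReferences.
type OwnerReference struct {
	UID string
}

// Object is a stand-in for any Kubernetes object with owner references.
type Object struct {
	Name   string
	Owners []OwnerReference
}

// ownedBy returns the names of objects carrying an owner reference to the
// given parent UID — the objects garbage collection deletes when the
// parent goes away.
func ownedBy(objects []Object, parentUID string) []string {
	var owned []string
	for _, o := range objects {
		for _, ref := range o.Owners {
			if ref.UID == parentUID {
				owned = append(owned, o.Name)
				break
			}
		}
	}
	return owned
}

func main() {
	objs := []Object{
		{Name: "nightly-backup", Owners: []OwnerReference{{UID: "a1b2"}}},
		{Name: "unrelated-job"},
	}
	fmt.Println(ownedBy(objs, "a1b2")) // [nightly-backup]
}
```

Matching on UID rather than name is what makes this safe across delete/recreate cycles: a recreated parent gets a new UID, so stale children never attach to it.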


RBAC Patterns for Multi-Tenant CRD Usage

A production CRD deployment needs three distinct RBAC roles:

# 1. Controller role — full access for the operator
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# 2. Editor role — for application teams (namespaced binding)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # No status write — only the controller writes status
  # No finalizers write — prevents deletion blocking by non-controllers
---
# 3. Viewer role — for audit, monitoring
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]

Bind editor/viewer roles at namespace scope, not cluster scope:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-backup-editor
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: backuppolicy-editor
  apiGroup: rbac.authorization.k8s.io

This pattern gives team-alpha full control over BackupPolicies in their namespace but no access to other namespaces — standard Kubernetes multi-tenancy.


The Three Production Failure Modes

1. Finalizer death loop

Symptoms: Object stuck in Terminating for hours; kubectl get bp nightly shows DeletionTimestamp set but object exists.

Cause: cleanupExternalResources always returns an error.

Detection:

kubectl get bp nightly -n demo -o jsonpath='{.metadata.deletionTimestamp}'
# non-empty = stuck in termination
kubectl describe bp nightly -n demo
# look for repeated reconcile error events

Fix: Add cleanup deadline in controller; use kubectl patch to remove finalizer as last resort.

2. Status thrash

Symptoms: Controller sets Ready: True, then Ready: False, then Ready: True in a rapid loop. Alert noise, confusing dashboards.

Cause: Each reconcile compares actual state incorrectly due to cache lag — it sees its own status write as a change, re-reconciles, and flips the status again.

Fix: Set ObservedGeneration on every condition. Compare generation with observedGeneration before re-reconciling. Use meta.IsStatusConditionTrue to check current condition before overwriting it with the same value.

// Only update status if it changed
current := meta.FindStatusCondition(bp.Status.Conditions, "Ready")
if current == nil || current.Status != desired.Status || current.Reason != desired.Reason {
    meta.SetStatusCondition(&bpCopy.Status.Conditions, desired)
    if err := r.Status().Update(ctx, bpCopy); err != nil {
        return ctrl.Result{}, err
    }
}
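The comparison can live in a small pure helper so it is easy to unit-test. A sketch mirroring the check above with stand-in types (function and type names are ours):

```go
package main

import "fmt"

// Condition is a stand-in for metav1.Condition, trimmed for illustration.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// conditionChanged reports whether writing desired would actually change
// the stored condition — repeatedly writing an identical condition is
// what drives status thrash.
func conditionChanged(current *Condition, desired Condition) bool {
	return current == nil ||
		current.Status != desired.Status ||
		current.Reason != desired.Reason
}

func main() {
	cur := &Condition{Type: "Ready", Status: "True", Reason: "CronJobCreated"}
	fmt.Println(conditionChanged(cur, Condition{Type: "Ready", Status: "True", Reason: "CronJobCreated"}))  // false
	fmt.Println(conditionChanged(cur, Condition{Type: "Ready", Status: "False", Reason: "CronJobDeleted"})) // true
}
```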

3. CRD deletion cascade

Symptoms: A team deletes a CRD for cleanup purposes; all instances across all namespaces disappear silently.

Cause: kubectl delete crd backuppolicies.storage.example.com — the API server cascades the deletion to all custom resources of that type.

Prevention:
– Block CRD deletion with an admission policy (e.g. a Kyverno or Gatekeeper rule that denies DELETE on production CRDs)
– Use GitOps (Argo CD, Flux) to manage CRD installation — a deleted CRD is automatically re-applied from the Git source
– Back up CRDs and instances with velero or equivalent before any CRD management operations


Production Readiness Checklist

CRD DEFINITION
  □ spec.versions has exactly one storage: true version
  □ Status subresource enabled (subresources.status: {})
  □ additionalPrinterColumns includes Ready column from status.conditions
  □ OpenAPI schema defines required fields and types
  □ CEL rules cover cross-field constraints

CONTROLLER
  □ Owner references set on all child resources
  □ Finalizer logic includes cleanup deadline
  □ Status conditions use standard format with observedGeneration
  □ Reconcile function is idempotent
  □ Not-found errors handled cleanly (return nil, not error)
  □ At least 2 replicas with leader election enabled

RBAC
  □ Three ClusterRoles: controller, editor, viewer
  □ Status and finalizers are separate RBAC sub-resources
  □ Editor/viewer bound at namespace scope, not cluster scope
  □ Controller ServiceAccount has only necessary permissions

OPERATIONS
  □ CRD installed via GitOps or Helm (not manual kubectl apply)
  □ Backup of CRDs and instances included in cluster backup
  □ All CRDs report the Established condition as True in status.conditions
  □ Monitoring for stuck Terminating objects (finalizer deadlock)
  □ Alert on controller reconcile error rate, not just pod health

⚠ Common Mistakes

Granting update on backuppolicies but not backuppolicies/status to the controller. If the controller cannot write status, status updates silently fail. The controller appears to run but status conditions never update. Grant both backuppolicies (for spec/metadata writes) and backuppolicies/status (for the status subresource path).

Setting Ready: True before all owned resources are healthy. If the controller sets Ready: True after creating the CronJob but before verifying the CronJob is actually active, users see a false-positive health signal. Only set Ready: True when you have confirmed the desired state is actually achieved.

Not setting observedGeneration on status conditions. Tools like Argo CD and kubectl wait --for=condition=Ready will report incorrect health status if observedGeneration is stale. Always set ObservedGeneration: obj.Generation in every condition write.

Using kubectl delete crd in a production cluster without a backup. This is irreversible. Treat CRDs as production-critical infrastructure — require GitOps review, backup verification, and team approval before any CRD deletion.


Quick Reference

# Check for stuck Terminating objects
# (field selectors do not support metadata.deletionTimestamp, so filter with jq)
kubectl get backuppolicies -A -o json | \
  jq -r '.items[] | select(.metadata.deletionTimestamp != null) | "\(.metadata.namespace)/\(.metadata.name)"'

# Force-remove a stuck finalizer (use only when cleanup is truly impossible)
kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'

# Check all CRDs are Established
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name} {.status.conditions[?(@.type=="Established")].status}{"\n"}{end}'

# Watch status conditions update during reconcile
kubectl get bp nightly -n demo -w -o \
  jsonpath='{.status.conditions[?(@.type=="Ready")].status} {.status.conditions[?(@.type=="Ready")].message}{"\n"}'

# Verify owner references are set on child CronJob
kubectl get cronjob nightly-backup -n demo \
  -o jsonpath='{.metadata.ownerReferences}'

# List all objects owned by a BackupPolicy (by label)
kubectl get all -n demo -l backuppolicy=nightly

Key Takeaways

  • Finalizers block deletion until cleanup completes — always implement a cleanup deadline to prevent permanent stuck objects
  • Status conditions must use the standard format with observedGeneration — tooling depends on it for correctness
  • Owner references enable automatic garbage collection of child resources when the parent is deleted
  • RBAC needs three roles (controller, editor, viewer) with status and finalizers as separate sub-resources
  • The three production failure modes — finalizer death loop, status thrash, CRD deletion cascade — are all preventable with the patterns covered in this episode

Series Complete

You now have the full picture of Kubernetes CRDs and Operators: from understanding what a CRD is (EP01), through real examples (EP02), schema design (EP03), hands-on YAML (EP04), CEL validation (EP05), the controller loop (EP06), building an operator (EP07), versioning (EP08), admission webhooks (EP09), to production patterns in this episode.

The next series in the Kubernetes learning arc on linuxcent.com covers Kubernetes Networking Deep Dive — Services, Ingress, Gateway API, CNI, and eBPF networking. Subscribe below to get it when it launches.

Stay subscribed → linuxcent.com

Admission Webhooks: Validating and Mutating Requests Before They Reach etcd

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 9
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes admission webhooks are HTTPS endpoints called by the API server synchronously on every create/update/delete — before the object reaches etcd
    (two types: mutating webhooks modify the object; validating webhooks approve or reject it — mutating runs first, then validating)
  • Use a validating webhook when you need to reject objects based on state you cannot express in CEL: checking if a referenced Secret exists, enforcing cross-resource quotas, consulting an external policy engine
  • Use a mutating webhook when you need to inject defaults or sidecar containers that depend on context you cannot express in the CRD schema (environment-specific defaults, sidecar injection)
  • Admission webhooks are an availability dependency — if your webhook is unreachable, the API requests it covers will fail. failurePolicy: Ignore is the safety valve; use it only for non-critical webhooks
  • OPA/Gatekeeper and Kyverno are admission webhook platforms — they let you write policy as code (Rego, YAML) instead of writing Go webhook handlers
  • For CRD-specific validation that only depends on the object itself, prefer CEL (EP05) — webhooks are for rules that require external lookups or cross-resource checks

The Big Picture

  KUBERNETES ADMISSION CHAIN (full picture)

  kubectl apply -f backuppolicy.yaml
        │
        ▼
  API Server: authentication + authorization
        │
        ▼
  1. Mutating admission webhooks
     ┌────────────────────────────────────────┐
     │ Receive object, return modified object │
     │ Examples: inject annotations,          │
     │ set defaults, add sidecars             │
     └────────────────────────────────────────┘
        │
        ▼
  2. Schema validation (OpenAPI + CEL)
        │
        ▼
  3. Validating admission webhooks
     ┌────────────────────────────────────────┐
     │ Receive object, return allow/deny      │
     │ Examples: quota checks, cross-         │
     │ resource validation, policy engines    │
     └────────────────────────────────────────┘
        │
        ▼ (allowed)
  etcd storage

Kubernetes admission webhooks are how tools like Istio inject sidecars, Kyverno enforces policies, and OPA/Gatekeeper applies organizational guardrails — all without modifying Kubernetes source code. Understanding them completes the picture of how Kubernetes is extended beyond CRDs.


Validating vs Mutating: When to Use Each

  DECISION TREE: CEL vs Validating Webhook vs Mutating Webhook

  "I need to validate a field value"
      │
      ├── Depends only on the object being submitted?
      │   → Use CEL (x-kubernetes-validations) — EP05
      │
      └── Needs to look up another resource, quota, or external system?
          → Use Validating Admission Webhook

  "I need to set default values or inject content"
      │
      ├── Defaults depend only on other fields in the same object?
      │   → Use OpenAPI schema defaults or CEL
      │
      └── Defaults depend on environment, namespace labels, or external config?
          → Use Mutating Admission Webhook

Practical examples:

Rule                                                              Right tool
────                                                              ──────────
retentionDays must be ≤ 365                                       CEL
if storageClass=premium then retentionDays ≤ 90                   CEL
Referenced SecretStore must exist in the same namespace           Validating webhook
BackupPolicy count per namespace must not exceed team quota       Validating webhook
Inject costCenter annotation from namespace labels                Mutating webhook
Inject backup-agent sidecar into all Pods in labeled namespaces   Mutating webhook
Enforce that all BackupPolicies have a team label                 Kyverno or OPA policy

The Webhook Request/Response Contract

Both webhook types receive an AdmissionReview object and return an AdmissionReview response.

Request (from API server to webhook):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "kind": {"group": "storage.example.com", "version": "v1alpha1", "kind": "BackupPolicy"},
    "resource": {"group": "storage.example.com", "version": "v1alpha1", "resource": "backuppolicies"},
    "operation": "CREATE",
    "userInfo": {"username": "alice", "groups": ["system:authenticated"]},
    "object": { /* full BackupPolicy JSON */ },
    "oldObject": null
  }
}

Response for a validating webhook (allow):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "allowed": true
  }
}

Response for a validating webhook (deny):

{
  "response": {
    "uid": "...",
    "allowed": false,
    "status": {
      "code": 422,
      "message": "referenced SecretStore 'aws-secrets-manager' not found in namespace 'production'"
    }
  }
}

Response for a mutating webhook (allow + patch):

{
  "response": {
    "uid": "...",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiJhZGQiLCJwYXRoIjoiL21ldGFkYXRhL2Fubm90YXRpb25zL2Nvc3RDZW50ZXIiLCJ2YWx1ZSI6ImVuZ2luZWVyaW5nIn1d"
    // base64-encoded JSON patch:
    // [{"op":"add","path":"/metadata/annotations/costCenter","value":"engineering"}]
  }
}
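You can verify the patch payload above with a few lines of standard-library Go: decode the base64 string, then unmarshal the JSON patch operations (decodePatch is a helper name of our choosing):

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// decodePatch decodes a base64-encoded JSON patch from an AdmissionReview
// response into its list of patch operations.
func decodePatch(patch string) ([]map[string]string, error) {
	raw, err := base64.StdEncoding.DecodeString(patch)
	if err != nil {
		return nil, err
	}
	var ops []map[string]string
	if err := json.Unmarshal(raw, &ops); err != nil {
		return nil, err
	}
	return ops, nil
}

func main() {
	// The patch value from the mutating webhook response above.
	patch := "W3sib3AiOiJhZGQiLCJwYXRoIjoiL21ldGFkYXRhL2Fubm90YXRpb25zL2Nvc3RDZW50ZXIiLCJ2YWx1ZSI6ImVuZ2luZWVyaW5nIn1d"
	ops, err := decodePatch(patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(ops[0]["op"], ops[0]["path"], ops[0]["value"])
	// add /metadata/annotations/costCenter engineering
}
```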

Writing a Validating Webhook with kubebuilder

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --programmatic-validation

Edit api/v1alpha1/backuppolicy_webhook.go:

package v1alpha1

import (
    "context"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/webhook/admission"
    esov1beta1 "github.com/external-secrets/external-secrets/apis/externalsecrets/v1beta1"
)

type BackupPolicyCustomValidator struct {
    Client client.Client
}

//+kubebuilder:webhook:path=/validate-storage-example-com-v1alpha1-backuppolicy,mutating=false,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create;update,versions=v1alpha1,name=vbackuppolicy.kb.io,admissionReviewVersions=v1

func (v *BackupPolicyCustomValidator) SetupWebhookWithManager(mgr ctrl.Manager) error {
    v.Client = mgr.GetClient()
    return ctrl.NewWebhookManagedBy(mgr).
        For(&BackupPolicy{}).
        WithValidator(v).
        Complete()
}

// ValidateCreate validates a new BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    bp := obj.(*BackupPolicy)
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateUpdate validates an updated BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
    bp := newObj.(*BackupPolicy)
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateDelete is a no-op here.
func (v *BackupPolicyCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    return nil, nil
}

// validateSecretStoreRef checks that the referenced SecretStore exists in the same namespace.
func (v *BackupPolicyCustomValidator) validateSecretStoreRef(ctx context.Context, bp *BackupPolicy) error {
    ref := bp.Spec.SecretStoreRef
    if ref == "" {
        return nil  // optional field; CEL handles it if required
    }

    store := &esov1beta1.SecretStore{}
    err := v.Client.Get(ctx, types.NamespacedName{Name: ref, Namespace: bp.Namespace}, store)
    if apierrors.IsNotFound(err) {
        return fmt.Errorf("referenced SecretStore %q not found in namespace %q",
            ref, bp.Namespace)
    }
    return err  // nil on found, real error on API failure
}

Writing a Mutating Webhook: Cost Center Injection

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --defaulting

Edit the defaulting webhook:

//+kubebuilder:webhook:path=/mutate-storage-example-com-v1alpha1-backuppolicy,mutating=true,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create,versions=v1alpha1,name=mbackuppolicy.kb.io,admissionReviewVersions=v1

func (r *BackupPolicy) Default() {
    // Default is called by kubebuilder's webhook framework on admission.
    // The webhook handler calls this and patches the object.
    //
    // This runs AFTER API server schema defaults — use it for context-dependent defaults.
}

// For namespace-label-based injection, implement the full webhook handler instead
// (also needs encoding/json, net/http, and corev1 "k8s.io/api/core/v1" imports):
type BackupPolicyMutator struct {
    Client client.Client
}

func (m *BackupPolicyMutator) Handle(ctx context.Context, req admission.Request) admission.Response {
    bp := &BackupPolicy{}
    if err := json.Unmarshal(req.Object.Raw, bp); err != nil {
        return admission.Errored(http.StatusBadRequest, err)
    }

    // Fetch the namespace to read its labels
    ns := &corev1.Namespace{}
    if err := m.Client.Get(ctx, types.NamespacedName{Name: bp.Namespace}, ns); err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }

    // Inject costCenter annotation from namespace label
    if costCenter, ok := ns.Labels["billing/cost-center"]; ok {
        if bp.Annotations == nil {
            bp.Annotations = make(map[string]string)
        }
        bp.Annotations["billing/cost-center"] = costCenter
    }

    marshaled, err := json.Marshal(bp)
    if err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }
    return admission.PatchResponseFromRaw(req.Object.Raw, marshaled)
}

The WebhookConfiguration Resource

The ValidatingWebhookConfiguration tells the API server which webhooks exist and which resources/operations they handle:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: backup-operator-validating-webhook
  annotations:
    cert-manager.io/inject-ca-from: backup-operator-system/backup-operator-serving-cert
webhooks:
  - name: vbackuppolicy.kb.io
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: backup-operator-webhook-service
        namespace: backup-operator-system
        path: /validate-storage-example-com-v1alpha1-backuppolicy
    rules:
      - apiGroups:   ["storage.example.com"]
        apiVersions: ["v1alpha1"]
        operations:  ["CREATE", "UPDATE"]
        resources:   ["backuppolicies"]
    failurePolicy: Fail          # Fail = reject request if webhook unreachable
    sideEffects: None
    timeoutSeconds: 10
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]  # never webhook kube-system objects

failurePolicy: Fail vs Ignore

  failurePolicy: Fail (default)
  ──────────────────────────────
  If webhook is unreachable → API request fails with 500
  Use when: the validation is critical (quota enforcement, policy)
  Risk: your webhook becoming unavailable breaks all covered API operations

  failurePolicy: Ignore
  ──────────────────────────────
  If webhook is unreachable → API request proceeds as if webhook allowed it
  Use when: the webhook is advisory or can be bypassed safely
  Risk: policy is silently not enforced during webhook outage

For production operators, use failurePolicy: Fail but ensure high availability:
– Run at least 2 webhook pod replicas with PodDisruptionBudget
– Use cert-manager for automatic TLS certificate rotation
– Set timeoutSeconds to a value that allows graceful degradation (5–10s)
– Exclude system namespaces with namespaceSelector


OPA/Gatekeeper and Kyverno: Webhooks as Policy Platforms

Writing raw webhook handlers in Go is powerful but heavyweight for policy enforcement. OPA/Gatekeeper and Kyverno are webhook-based policy engines that let you express policies as code:

Kyverno (YAML-based policies):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-backup-label
spec:
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["BackupPolicy"]
      validate:
        message: "BackupPolicy must have a 'team' label"
        pattern:
          metadata:
            labels:
              team: "?*"

OPA/Gatekeeper (Rego-based policies):

package backuppolicy

deny[msg] {
    input.request.kind.kind == "BackupPolicy"
    not input.request.object.metadata.labels["team"]
    msg := "BackupPolicy must have a 'team' label"
}

Both run as admission webhooks that the API server calls. The policy language sits on top of the webhook plumbing. For organizational policy enforcement across many resource types, these tools outperform custom Go webhook handlers.


⚠ Common Mistakes

Webhook covering * resources or * operations. A webhook covering all resources in the cluster is a reliability risk — a bug in the webhook or an outage breaks everything. Scope webhooks to exactly the resources and operations they need with rules[].resources and rules[].operations.

No TLS certificate rotation. Webhook endpoints require a TLS certificate that the API server trusts. Certificates expire. Using cert-manager with the cert-manager.io/inject-ca-from annotation automates this. Without it, expired certificates cause silent webhook outages (the API server rejects the TLS handshake, triggering failurePolicy behavior).

Not excluding system namespaces. If a validating webhook covers Pods and has failurePolicy: Fail, and the webhook pod itself crashes, the API server cannot create a new webhook pod because the webhook rejects the creation. Use namespaceSelector to exclude kube-system and your operator’s own namespace.

Treating webhook latency as free. Every API operation covered by a webhook adds a synchronous HTTP round-trip. On a busy cluster creating thousands of objects per minute, a 100ms webhook latency becomes significant. Set timeoutSeconds, profile webhook performance, and scope rules narrowly.


Quick Reference

# List all webhook configurations
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

# Inspect webhook rules and failure policy
kubectl describe validatingwebhookconfiguration backup-operator-validating-webhook

# Temporarily disable a webhook for debugging (dangerous in production)
kubectl delete validatingwebhookconfiguration backup-operator-validating-webhook

# Check webhook endpoint certificate
kubectl get secret backup-operator-webhook-server-cert \
  -n backup-operator-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

# Test webhook is reachable from a cluster node
kubectl run webhook-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -k https://backup-operator-webhook-service.backup-operator-system.svc:443/healthz

Key Takeaways

  • Mutating webhooks modify objects at admission; validating webhooks approve or reject them — mutating runs before validating
  • Use CEL for rules that depend only on the submitted object; use webhooks when you need external lookups or cross-resource checks
  • failurePolicy: Fail blocks API requests if the webhook is unreachable — ensure high availability before using it
  • Always exclude system namespaces and scope rules to specific resource types to minimize the blast radius of webhook failures
  • OPA/Gatekeeper and Kyverno are admission webhook platforms for policy-as-code — prefer them over custom Go handlers for organizational policy enforcement

What’s Next

EP10: Kubernetes CRDs in Production ties the full series together — finalizer design patterns, status condition conventions, owner references, RBAC for multi-tenant CRD usage, and the production failure modes that catch teams off guard.

Get EP10 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD Versioning: From v1alpha1 to v1 Without Breaking Clients

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 8
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes CRD versioning lets you evolve your API from v1alpha1 to v1 without deleting existing custom resources or breaking clients still using the old version
    (storage version = the version etcd actually stores objects in; served versions = the versions the API server responds to; you can serve v1alpha1 and v1 simultaneously while migrating)
  • The hub-and-spoke model is the recommended conversion architecture: one “hub” version (usually v1) that every other version converts to/from
  • Without a conversion webhook the API server falls back to conversion strategy None, which merely rewrites apiVersion — serving multiple versions whose schemas differ requires a webhook
  • kube-storage-version-migrator (or a manual re-apply) migrates existing objects from the old storage version to the new one after you update storage: true
  • Changing field names between versions without a conversion webhook corrupts data silently — always test conversion round-trips before promoting a version

The Big Picture

  CRD VERSION LIFECYCLE

  Stage 1: Alpha                 Stage 2: Beta              Stage 3: Stable
  ──────────────────             ──────────────             ──────────────
  v1alpha1                       v1alpha1 (deprecated)      v1alpha1 (removed)
    served: true                   served: true               served: false
    storage: true                  storage: false             storage: false
                                 v1beta1                    v1beta1 (deprecated)
                                   served: true               served: true
                                   storage: false             storage: false
                                 v1                         v1
                                   served: true               served: true
                                   storage: true              storage: true

  Clients using v1alpha1:         The API server converts     Eventually remove
  still work via conversion       on the fly                  old served versions
  webhook

Kubernetes CRD versioning is what allows you to ship BackupPolicy v1alpha1 today, learn from real usage, evolve the schema to v1 with renamed fields and new constraints, and keep existing clusters running without a migration window.


Why Versioning Is Necessary

When BackupPolicy v1alpha1 shipped, the spec used retentionDays. After six months of production use, the team learns:

  • retentionDays should be renamed to retention.days (nested under a retention object for future extensibility)
  • A new required field backupFormat needs to be added with a default of tar.gz
  • The targets field should be renamed to includedNamespaces

These are breaking changes. Clients (GitOps repos, Helm charts, other operators) still have YAML referencing v1alpha1 with the old field names. You cannot simply rename the fields.

The solution: add v1 with the new schema, run both versions simultaneously via a conversion webhook, migrate objects to the new storage version, then deprecate v1alpha1.


Simple Case: Non-Breaking Addition (No Webhook Needed)

If you only add new optional fields to the schema — no renames, no removals — you can add a new version without a conversion webhook: the default conversion strategy (None) simply rewrites apiVersion, which is safe because the stored bytes fit both schemas.

versions:
  - name: v1alpha1
    served: false      # stop serving old version
    storage: false
    schema: ...
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              schedule:
                type: string
              retentionDays:
                type: integer
              backupFormat:          # new optional field
                type: string
                default: "tar.gz"

Existing objects stored as v1alpha1 are served as v1 with the new field defaulted. This works for purely additive changes because the stored bytes are compatible with the new schema.

When this is not enough: field renames, type changes, field removal, or structural reorganization all require a conversion webhook.


The Hub-and-Spoke Model

For breaking schema changes, the API server needs a conversion webhook. The recommended architecture is hub-and-spoke:

  HUB-AND-SPOKE CONVERSION

       v1alpha1
          │
          ▼ convert to hub
         v1  (hub)
          ▲
          │ convert to hub
       v1beta1

  Every version converts TO the hub and FROM the hub.
  The hub is always the storage version.
  Spoke-to-spoke conversion: v1alpha1 → v1 → v1beta1
  Never directly: v1alpha1 → v1beta1

This means you write one ConvertTo/ConvertFrom pair per spoke version rather than a converter for every ordered pair of versions. As you add versions, the conversion code grows linearly instead of quadratically.
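The linear-versus-quadratic claim is simple arithmetic. A toy sketch (helper names are hypothetical, n counts versions including the hub):

```go
package main

import "fmt"

// directConversions: converters needed if every ordered pair of n versions
// converts directly. hubConversions: a ConvertTo + ConvertFrom pair for
// each of the n-1 spoke versions under the hub-and-spoke model.
func directConversions(n int) int { return n * (n - 1) }
func hubConversions(n int) int    { return 2 * (n - 1) }

func main() {
	for _, n := range []int{2, 3, 4, 5} {
		fmt.Printf("versions=%d  direct=%d  hub=%d\n",
			n, directConversions(n), hubConversions(n))
	}
}
```

With five versions the difference is already 20 converters versus 8 — and every one of those converters is code you must round-trip test.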


Writing a Conversion Webhook

The conversion webhook is an HTTPS endpoint that the API server calls when it needs to convert an object between versions.

1. Define the conversion hub

In the kubebuilder project, mark v1 as the hub:

In api/v1/backuppolicy_conversion.go:

package v1

// Hub marks this type as the conversion hub.
func (*BackupPolicy) Hub() {}

2. Implement conversion in v1alpha1

In api/v1alpha1/backuppolicy_conversion.go:

package v1alpha1

import (
    "fmt"
    v1 "github.com/example/backup-operator/api/v1"
    "sigs.k8s.io/controller-runtime/pkg/conversion"
)

// ConvertTo converts v1alpha1 BackupPolicy to v1 (the hub).
func (src *BackupPolicy) ConvertTo(dstRaw conversion.Hub) error {
    dst := dstRaw.(*v1.BackupPolicy)

    // Metadata
    dst.ObjectMeta = src.ObjectMeta

    // Field mapping: v1alpha1 → v1
    dst.Spec.Schedule      = src.Spec.Schedule
    dst.Spec.StorageClass  = src.Spec.StorageClass
    dst.Spec.Suspended     = src.Spec.Suspended

    // New field: default for old objects — unless a previous hub→spoke
    // conversion parked the real value in an annotation (see ConvertFrom).
    dst.Spec.BackupFormat = "tar.gz"
    if f, ok := src.Annotations["storage.example.com/backup-format"]; ok {
        dst.Spec.BackupFormat = f
    }

    // Renamed field: retentionDays → retention.days
    dst.Spec.Retention = v1.RetentionSpec{
        Days: src.Spec.RetentionDays,
    }

    // Renamed field: targets → includedNamespaces
    for _, t := range src.Spec.Targets {
        dst.Spec.IncludedNamespaces = append(dst.Spec.IncludedNamespaces,
            v1.NamespaceTarget{
                Namespace:      t.Namespace,
                IncludeSecrets: t.IncludeSecrets,
            })
    }

    dst.Status = v1.BackupPolicyStatus(src.Status)
    return nil
}

// ConvertFrom converts v1 (hub) BackupPolicy back to v1alpha1.
func (dst *BackupPolicy) ConvertFrom(srcRaw conversion.Hub) error {
    src := srcRaw.(*v1.BackupPolicy)

    dst.ObjectMeta = src.ObjectMeta

    dst.Spec.Schedule      = src.Spec.Schedule
    dst.Spec.StorageClass  = src.Spec.StorageClass
    dst.Spec.Suspended     = src.Spec.Suspended
    dst.Spec.RetentionDays = src.Spec.Retention.Days

    for _, n := range src.Spec.IncludedNamespaces {
        dst.Spec.Targets = append(dst.Spec.Targets, BackupTarget{
            Namespace:      n.Namespace,
            IncludeSecrets: n.IncludeSecrets,
        })
    }

    // backupFormat cannot be round-tripped to v1alpha1 (no such field)
    // Store it in an annotation to preserve the value if the object is
    // re-converted back to v1.
    if src.Spec.BackupFormat != "" && src.Spec.BackupFormat != "tar.gz" {
        if dst.Annotations == nil {
            dst.Annotations = make(map[string]string)
        }
        dst.Annotations["storage.example.com/backup-format"] = src.Spec.BackupFormat
    }

    dst.Status = BackupPolicyStatus(src.Status)
    return nil
}
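The losslessness requirement is easy to check mechanically with a round-trip test: hub → spoke → hub must preserve every field. A self-contained sketch using simplified stand-ins for the two versions (no Kubernetes dependencies; field and annotation names follow this article):

```go
package main

import "fmt"

// Simplified stand-ins for the real API types.
type V1Alpha1Spec struct {
	Schedule      string
	RetentionDays int32
	Annotations   map[string]string // stands in for metadata.annotations
}

type V1Spec struct {
	Schedule      string
	RetentionDays int32 // nested under retention.days in the real v1
	BackupFormat  string
	Annotations   map[string]string
}

const formatAnnotation = "storage.example.com/backup-format"

// convertTo mirrors ConvertTo: v1alpha1 → v1 (hub).
func convertTo(src V1Alpha1Spec) V1Spec {
	dst := V1Spec{
		Schedule:      src.Schedule,
		RetentionDays: src.RetentionDays,
		BackupFormat:  "tar.gz", // default for old objects
		Annotations:   src.Annotations,
	}
	// Restore a value preserved by an earlier hub→spoke conversion.
	if f, ok := src.Annotations[formatAnnotation]; ok {
		dst.BackupFormat = f
	}
	return dst
}

// convertFrom mirrors ConvertFrom: v1 (hub) → v1alpha1.
func convertFrom(src V1Spec) V1Alpha1Spec {
	dst := V1Alpha1Spec{
		Schedule:      src.Schedule,
		RetentionDays: src.RetentionDays,
		Annotations:   map[string]string{},
	}
	for k, v := range src.Annotations {
		dst.Annotations[k] = v
	}
	// backupFormat has no v1alpha1 field; park it in an annotation.
	if src.BackupFormat != "" && src.BackupFormat != "tar.gz" {
		dst.Annotations[formatAnnotation] = src.BackupFormat
	}
	return dst
}

func main() {
	orig := V1Spec{Schedule: "0 2 * * *", RetentionDays: 30,
		BackupFormat: "zstd", Annotations: map[string]string{}}
	back := convertTo(convertFrom(orig))
	fmt.Println(back.BackupFormat == orig.BackupFormat) // true
}
```

In the real project the same property is asserted in an envtest or unit test over the actual generated types; the annotation is what makes a non-default backupFormat survive the trip through v1alpha1.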

3. Register the webhook

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --conversion

This generates the webhook server setup. Deploy with a TLS certificate (cert-manager can manage this automatically via the kubebuilder //+kubebuilder:webhook:... marker).


Updating the CRD to Reference the Webhook

spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: backup-operator-webhook-service
          namespace: backup-operator-system
          path: /convert
      conversionReviewVersions: ["v1", "v1beta1"]
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema: ...
    - name: v1
      served: true
      storage: true
      schema: ...

Once applied, kubectl get backuppolicies.v1alpha1.storage.example.com/nightly and kubectl get backuppolicies.v1.storage.example.com/nightly both work — the API server converts transparently.


Migrating Existing Objects to the New Storage Version

After changing storage: true from v1alpha1 to v1, existing objects in etcd are still stored as v1alpha1 bytes. They are served correctly (via conversion) but are not yet migrated.

Migrate them:

# Option 1: Manual re-apply (works for small object counts)
# Note: -o name drops the namespace, so list namespace and name explicitly.
kubectl get backuppolicies -A --no-headers \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name | \
while read ns name; do
  kubectl get backuppolicy "$name" -n "$ns" -o yaml | kubectl apply -f -
done

# Option 2: Storage Version Migrator (automated, for large clusters)
# Install: https://github.com/kubernetes-sigs/kube-storage-version-migrator
kubectl apply -f storageVersionMigration.yaml

After migration, all objects in etcd are stored as v1. You can then set v1alpha1 served: false to stop serving the old version.


Storage Version Migration Checklist

  SAFE VERSION PROMOTION CHECKLIST

  □ New version (v1) has served: true, storage: true
  □ Old version (v1alpha1) has served: true, storage: false
  □ Conversion webhook deployed and healthy
  □ Round-trip conversion tested (v1alpha1 → v1 → v1alpha1 preserves all data)
  □ kubectl get backuppolicies works at both versions
  □ Existing objects migrated (re-applied or migration job run)
  □ Old version set to served: false (stop serving)
  □ Old version dropped from the CRD's status.storedVersions after migration
  □ Old version removed from CRD after N release cycles

⚠ Common Mistakes

Changing the storage version without a conversion webhook. If you flip storage: true from v1alpha1 to v1 while still serving v1alpha1, conversion falls back to strategy None: the API server merely rewrites apiVersion on the stored v1alpha1 bytes, so renamed or restructured fields silently vanish from the served objects. Always deploy the conversion webhook before changing the storage version.

Lossy conversion. If ConvertFrom (v1 → v1alpha1) drops a field that exists in v1, objects are silently corrupted when a v1alpha1 client reads and re-saves them. Round-trip test every conversion: original → hub → original must produce identical objects (or use annotations to preserve fields that cannot round-trip).

Forgetting to migrate existing objects. After changing the storage version, existing objects are still stored in the old format. They convert on read, but etcd still holds old bytes. Until migrated, your etcd backup/restore story is broken — restoring from backup would restore old-format bytes that need conversion.


Quick Reference

# Check which version is currently the storage version
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.storedVersions}'
# output: ["v1alpha1"]  or  ["v1alpha1","v1"]  or  ["v1"]

# Verify conversion webhook is reachable
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.spec.conversion.webhook.clientConfig}'

# Read an object at a specific version
kubectl get backuppolicies.v1alpha1.storage.example.com/nightly -n demo -o yaml
kubectl get backuppolicies.v1.storage.example.com/nightly -n demo -o yaml

# Check CRD conditions (NamesAccepted, Established)
kubectl describe crd backuppolicies.storage.example.com | grep -A5 Conditions

Key Takeaways

  • CRD versioning lets you evolve the schema without a migration window — old and new versions coexist via a conversion webhook
  • The hub-and-spoke model minimizes conversion code: N functions, not N² — the hub version is always the storage version
  • Never change the storage version without a deployed conversion webhook for breaking schema changes
  • Conversion must be lossless — fields that cannot round-trip should be preserved in annotations
  • Migrate existing objects to the new storage version after promoting it, then deprecate the old served version

What’s Next

EP09: Admission Webhooks completes the Kubernetes extension picture — validating and mutating webhooks that intercept API requests before they reach etcd, when to use them alongside CRDs, and how they differ from CEL validation.

Get EP09 in your inbox when it publishes → subscribe at linuxcent.com

Build a Simple Kubernetes Operator with controller-runtime and kubebuilder

Reading Time: 7 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 7
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Building a Kubernetes operator means writing a Go reconciler with controller-runtime — kubebuilder scaffolds the project structure, RBAC markers, and Makefile targets so you focus on the reconcile logic
    (kubebuilder = a CLI and framework that generates the operator project scaffold; controller-runtime = the Go library that provides the informer cache, work queue, and reconciler interface)
  • The reconciler for BackupPolicy in this episode creates and manages a CronJob — it is the behavior layer for the CRD built in EP03–EP05
  • RBAC is expressed as Go code comments (//+kubebuilder:rbac:...) — kubebuilder generates the ClusterRole YAML from them
  • Run the operator locally with make run during development; no cluster deployment needed until ready
  • The same project that builds the operator also builds and installs the CRD — make install applies the CRD YAML generated from your Go types
  • Testing: the operator ships with envtest — a local API server + etcd for controller testing without a real cluster

The Big Picture

  OPERATOR PROJECT STRUCTURE (kubebuilder scaffold)

  backup-operator/
  ├── api/v1alpha1/
  │   ├── backuppolicy_types.go     ← Go types that define CRD schema
  │   └── groupversion_info.go
  ├── internal/controller/
  │   └── backuppolicy_controller.go ← reconcile logic (our main focus)
  ├── config/
  │   ├── crd/                       ← generated CRD YAML
  │   ├── rbac/                      ← generated RBAC YAML
  │   └── manager/                   ← controller Deployment YAML
  ├── cmd/main.go                    ← entrypoint, sets up the manager
  └── Makefile                       ← build, test, install, deploy targets

  FLOW:
  Go types → kubebuilder generate → CRD YAML + RBAC YAML
  Reconcile function → runs in cluster → watches BackupPolicy → manages CronJobs

Building a Kubernetes operator with controller-runtime is where CRDs become living infrastructure — the BackupPolicy objects created in EP04 now get actual behavior attached to them.


Prerequisites

# Go 1.22+
go version

# kubebuilder CLI
curl -L -o kubebuilder \
  https://github.com/kubernetes-sigs/kubebuilder/releases/latest/download/kubebuilder_linux_amd64
chmod +x kubebuilder
sudo mv kubebuilder /usr/local/bin/

# A running cluster (kind works well for development)
kind create cluster --name operator-dev

# Verify kubectl works
kubectl cluster-info --context kind-operator-dev

Step 1: Scaffold the Project

mkdir backup-operator && cd backup-operator

# Initialize the Go module and project structure
kubebuilder init \
  --domain storage.example.com \
  --repo github.com/example/backup-operator

# Create the API (Go types + controller scaffold)
kubebuilder create api \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --resource \
  --controller

The --resource and --controller flags pre-answer the interactive prompts; without them, kubebuilder asks:

Create Resource [y/n]: y
Create Controller [y/n]: y

The generated directory tree:

backup-operator/
├── api/
│   └── v1alpha1/
│       ├── backuppolicy_types.go
│       └── groupversion_info.go
├── internal/
│   └── controller/
│       └── backuppolicy_controller.go
├── cmd/
│   └── main.go
├── config/
│   ├── crd/bases/
│   ├── rbac/
│   └── manager/
├── go.mod
├── go.sum
└── Makefile

Step 2: Define the Go Types

Edit api/v1alpha1/backuppolicy_types.go to match the schema from EP03:

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BackupTarget specifies a namespace to include in the backup.
type BackupTarget struct {
    Namespace      string `json:"namespace"`
    IncludeSecrets bool   `json:"includeSecrets,omitempty"`
}

// BackupPolicySpec defines the desired state of BackupPolicy.
type BackupPolicySpec struct {
    // Schedule is a cron expression for when to run backups.
    // +kubebuilder:validation:Pattern=`^(\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+)$`
    Schedule string `json:"schedule"`

    // RetentionDays is how long to keep backup snapshots.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=365
    RetentionDays int32 `json:"retentionDays"`

    // StorageClass is the storage class to use for backup volumes.
    // +kubebuilder:default=standard
    // +kubebuilder:validation:Enum=standard;premium;encrypted;archive
    StorageClass string `json:"storageClass,omitempty"`

    // Targets lists the namespaces and resources to include.
    // +kubebuilder:validation:MaxItems=20
    Targets []BackupTarget `json:"targets,omitempty"`

    // Suspended pauses backup execution when true.
    // +kubebuilder:default=false
    Suspended bool `json:"suspended,omitempty"`
}

// BackupPolicyStatus defines the observed state of BackupPolicy.
type BackupPolicyStatus struct {
    // Conditions reflect the current state of the BackupPolicy.
    Conditions []metav1.Condition `json:"conditions,omitempty"`

    // LastBackupTime is when the most recent backup completed.
    LastBackupTime *metav1.Time `json:"lastBackupTime,omitempty"`

    // CronJobName is the name of the managed CronJob.
    CronJobName string `json:"cronJobName,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:resource:shortName=bp
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Schedule",type=string,JSONPath=`.spec.schedule`
// +kubebuilder:printcolumn:name="Retention",type=integer,JSONPath=`.spec.retentionDays`
// +kubebuilder:printcolumn:name="Suspended",type=boolean,JSONPath=`.spec.suspended`
// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=='Ready')].status`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// BackupPolicy is the Schema for the backuppolicies API.
type BackupPolicy struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   BackupPolicySpec   `json:"spec,omitempty"`
    Status BackupPolicyStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// BackupPolicyList contains a list of BackupPolicy.
type BackupPolicyList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []BackupPolicy `json:"items"`
}

func init() {
    SchemeBuilder.Register(&BackupPolicy{}, &BackupPolicyList{})
}

Regenerate the CRD YAML and DeepCopy methods:

make generate   # regenerates zz_generated.deepcopy.go
make manifests  # regenerates CRD YAML under config/crd/bases/

Step 3: Write the Reconciler

Edit internal/controller/backuppolicy_controller.go:

package controller

import (
    "context"
    "fmt"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    storagev1alpha1 "github.com/example/backup-operator/api/v1alpha1"
)

// BackupPolicyReconciler reconciles BackupPolicy objects.
type BackupPolicyReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// RBAC markers — kubebuilder generates ClusterRole YAML from these comments.
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/finalizers,verbs=update
//+kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Step 1: Fetch the BackupPolicy
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        if apierrors.IsNotFound(err) {
            // Object deleted before we could reconcile — nothing to do.
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, fmt.Errorf("fetching BackupPolicy: %w", err)
    }

    // Step 2: Define the desired CronJob name
    cronJobName := fmt.Sprintf("%s-backup", bp.Name)

    // Step 3: Fetch the existing CronJob (if any)
    existing := &batchv1.CronJob{}
    err := r.Get(ctx, types.NamespacedName{Name: cronJobName, Namespace: bp.Namespace}, existing)
    notFound := apierrors.IsNotFound(err)
    if err != nil && !notFound {
        return ctrl.Result{}, fmt.Errorf("fetching CronJob: %w", err)
    }

    // Step 4: Build the desired CronJob
    desired := r.buildCronJob(bp, cronJobName)

    // Step 5: Create or update
    if notFound {
        logger.Info("Creating CronJob", "name", cronJobName)
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, fmt.Errorf("creating CronJob: %w", err)
        }
    } else {
        // Update schedule and suspend state if they differ.
        // Spec.Suspend is a *bool — compare the values, not the pointers,
        // or the update fires on every reconcile.
        suspendChanged := existing.Spec.Suspend == nil ||
            *existing.Spec.Suspend != *desired.Spec.Suspend
        if existing.Spec.Schedule != desired.Spec.Schedule || suspendChanged {
            existing.Spec.Schedule = desired.Spec.Schedule
            existing.Spec.Suspend = desired.Spec.Suspend
            logger.Info("Updating CronJob", "name", cronJobName)
            if err := r.Update(ctx, existing); err != nil {
                return ctrl.Result{}, fmt.Errorf("updating CronJob: %w", err)
            }
        }
    }

    // Step 6: Update status
    bpCopy := bp.DeepCopy()
    meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
        Type:               "Ready",
        Status:             metav1.ConditionTrue,
        Reason:             "CronJobReady",
        Message:            fmt.Sprintf("CronJob %s is configured", cronJobName),
        ObservedGeneration: bp.Generation,
    })
    bpCopy.Status.CronJobName = cronJobName

    if err := r.Status().Update(ctx, bpCopy); err != nil {
        return ctrl.Result{}, fmt.Errorf("updating status: %w", err)
    }

    return ctrl.Result{}, nil
}

func (r *BackupPolicyReconciler) buildCronJob(bp *storagev1alpha1.BackupPolicy, name string) *batchv1.CronJob {
    suspend := bp.Spec.Suspended
    retentionArg := fmt.Sprintf("--retention-days=%d", bp.Spec.RetentionDays)

    cj := &batchv1.CronJob{
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: bp.Namespace,
            Labels: map[string]string{
                "app.kubernetes.io/managed-by": "backup-operator",
                "backuppolicy":                 bp.Name,
            },
        },
        Spec: batchv1.CronJobSpec{
            Schedule: bp.Spec.Schedule,
            Suspend:  &suspend,
            JobTemplate: batchv1.JobTemplateSpec{
                Spec: batchv1.JobSpec{
                    Template: corev1.PodTemplateSpec{
                        Spec: corev1.PodSpec{
                            RestartPolicy: corev1.RestartPolicyOnFailure,
                            Containers: []corev1.Container{
                                {
                                    Name:    "backup",
                                    Image:   "backup-tool:latest", // pin a versioned tag in production
                                    Args:    []string{retentionArg},
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    // Set owner reference — CronJob is garbage-collected when BackupPolicy is deleted
    _ = ctrl.SetControllerReference(bp, cj, r.Scheme)
    return cj
}

// SetupWithManager registers the controller with the manager and declares what to watch.
func (r *BackupPolicyReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&storagev1alpha1.BackupPolicy{}).
        Owns(&batchv1.CronJob{}).    // reconcile BackupPolicy when owned CronJob changes
        Complete(r)
}

Step 4: Install the CRD and Run Locally

# Install the CRD into the cluster
make install
customresourcedefinition.apiextensions.k8s.io/backuppolicies.storage.example.com created
# Run the controller locally (outside the cluster)
make run
2026-04-25T08:00:00Z  INFO  Starting manager
2026-04-25T08:00:00Z  INFO  Starting workers  {"controller": "backuppolicy", "worker count": 1}

In a separate terminal:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: default
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
EOF

Watch the controller output:

2026-04-25T08:01:00Z  INFO  Creating CronJob  {"name": "nightly-backup"}

Check the result:

kubectl get bp nightly
NAME      SCHEDULE    RETENTION   SUSPENDED   READY   AGE
nightly   0 2 * * *   30          false       True    10s
kubectl get cronjob nightly-backup
NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        <none>          10s

Test self-healing — delete the CronJob and watch the controller recreate it:

kubectl delete cronjob nightly-backup
# Controller output:
# 2026-04-25T08:02:00Z  INFO  Creating CronJob  {"name": "nightly-backup"}

kubectl get cronjob nightly-backup
# Back within seconds

Test suspend:

kubectl patch bp nightly --type=merge -p '{"spec":{"suspended":true}}'
kubectl get cronjob nightly-backup -o jsonpath='{.spec.suspend}'
# true

Step 5: Deploy to Cluster

When ready for in-cluster deployment:

# Build and push the controller image
make docker-build docker-push IMG=your-registry/backup-operator:v0.1.0

# Deploy to cluster (creates Deployment, RBAC, CRD)
make deploy IMG=your-registry/backup-operator:v0.1.0
kubectl get pods -n backup-operator-system
NAME                                          READY   STATUS    RESTARTS   AGE
backup-operator-controller-manager-abc123     2/2     Running   0          30s

Understanding the RBAC Markers

The //+kubebuilder:rbac:... comments in the controller generate the ClusterRole YAML when you run make manifests:

//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

Generated YAML under config/rbac/role.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

This approach keeps RBAC co-located with the code that needs it — if you add a new resource access in the controller, you add the marker next to it.


⚠ Common Mistakes

Not setting an owner reference on child resources. Without ctrl.SetControllerReference(parent, child, scheme), deleting the BackupPolicy leaves orphaned CronJobs. Owner references enable automatic garbage collection of child resources.

Updating the object after r.Get() without handling conflicts. If two reconciles run concurrently (possible after a controller restart), both may try to update the same resource. The API server uses resource version for optimistic concurrency — you will get a conflict error. Retry the reconcile on conflict errors rather than failing.

Writing to bp directly instead of bp.DeepCopy() for status updates. If the status update fails and you retry, the original bp object now has the modified status in memory. Always update a deep copy when writing status so the in-memory state stays consistent with what was actually persisted.

Not watching owned resources. If you forget .Owns(&batchv1.CronJob{}) in SetupWithManager, the controller will not reconcile when a CronJob is deleted. Self-healing requires watching the resources you manage.


Quick Reference

# Scaffold a new API + controller
kubebuilder create api --group mygroup --version v1alpha1 --kind MyKind

# Regenerate deep copy methods after changing types
make generate

# Regenerate CRD YAML + RBAC from markers
make manifests

# Install CRD into current cluster
make install

# Run controller locally (outside cluster)
make run

# Build + push image, then deploy to cluster
make docker-build docker-push IMG=registry/operator:tag
make deploy IMG=registry/operator:tag

# Uninstall CRD (WARNING: deletes all instances)
make uninstall

Key Takeaways

  • kubebuilder scaffolds the project; you write the types and the reconcile function
  • Go struct markers (//+kubebuilder:...) generate the CRD YAML and RBAC — keep them close to the code they describe
  • ctrl.SetControllerReference enables automatic garbage collection of child resources
  • Always deep-copy the object before writing status; retry on conflict errors
  • make run runs the controller locally — no Docker build needed during development

What’s Next

EP08: Kubernetes CRD Versioning covers how to evolve the BackupPolicy schema from v1alpha1 to v1 without breaking existing clients — storage versions, conversion webhooks, and the hub-and-spoke model for safe API evolution in production clusters.

Get EP08 in your inbox when it publishes → subscribe at linuxcent.com

The Kubernetes Controller Reconcile Loop: How CRDs Come Alive at Runtime

Reading Time: 7 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 6
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • The Kubernetes controller reconcile loop is the mechanism that makes CRDs do something — it watches custom resources, compares desired state (spec) to actual state, and takes actions to close the gap
    (reconcile = “make actual match desired”; the loop runs repeatedly because the world is not static — things drift, fail, and change)
  • Controllers do not receive events like webhooks — they receive object names from a work queue, then re-read the full object from the API server cache
  • The reconcile function is idempotent: calling it ten times with the same object must produce the same result as calling it once
  • controller-runtime is the Go library that provides the informer cache, work queue, and reconciler interface — kubebuilder scaffolds controllers on top of it
  • Kubernetes uses the same reconcile loop internally — the Deployment controller, ReplicaSet controller, and node lifecycle controller all follow this exact pattern
  • A failed reconcile returns an error or explicit requeue request; the controller retries with exponential backoff, not an infinite tight loop

The Big Picture

  THE KUBERNETES CONTROLLER RECONCILE LOOP

  etcd
   │ change event
   ▼
  Informer cache
  (API server-side list+watch,
   local in-memory replica)
   │ cache update → enqueue object name
   ▼
  Work queue
  (rate-limited, deduplicating)
   │ dequeue: "demo/nightly"
   ▼
  Reconcile(ctx, Request{Name, Namespace})
   │
   ├── 1. Fetch object from cache
   │        if not found → ignore (already deleted)
   │
   ├── 2. Read spec (desired state)
   │
   ├── 3. Read actual state
   │        (check child resources, external systems)
   │
   ├── 4. Compare: actual vs desired
   │
   ├── 5. Act: create/update/delete child resources
   │        OR update external system
   │
   └── 6. Update status with outcome
           └── return Result{}, nil      → done
               return Result{Requeue}, nil → retry after delay
               return Result{}, err     → immediate retry + backoff

The Kubernetes controller reconcile loop is what separates a CRD (validated storage) from an operator (automated behavior). Understanding this loop is the prerequisite for writing controllers that work correctly under failure, partial completion, and concurrent modification.


What “Reconcile” Actually Means

Reconcile means: look at what the user asked for (spec), look at what actually exists, and do whatever is needed to make actual match desired.

The key insight is that this is not event-driven in the traditional sense. A controller does not receive a “diff” — it receives a name. It reads the full current state of the object and acts accordingly.

This matters because:

  1. Multiple events get deduplicated. If a BackupPolicy is updated five times in one second, the work queue delivers one reconcile call, not five.
  2. The reconcile is stateless. The controller should not maintain in-memory state about what it “did last time.” It re-reads everything on each reconcile.
  3. Partial failure is safe. If the reconcile fails halfway through, the next reconcile re-reads actual state and continues from where it left off.

The Informer Cache

Controllers do not call the API server directly for every read. They use an informer — a list-and-watch mechanism that maintains a local in-memory copy of all objects of a given type.

  HOW THE INFORMER CACHE WORKS

  Controller startup:
  ┌─────────────────────────────────────────────────────┐
  │ 1. List all BackupPolicies from API server          │
  │    → populate local cache                           │
  │ 2. Establish a Watch stream                         │
  │    → receive incremental updates                    │
  │ 3. For each update: update cache + enqueue object   │
  └─────────────────────────────────────────────────────┘

  On reconcile:
  ┌─────────────────────────────────────────────────────┐
  │ controller reads from LOCAL cache (not API server)  │
  │ → fast, no network round-trip per reconcile         │
  │ → cache is eventually consistent                    │
  └─────────────────────────────────────────────────────┘

Cache consistency: After writing a change (creating a child Secret, for example), re-reading from the cache may return the old state for a brief period. This is normal and expected. Well-written controllers handle this by returning a requeue rather than assuming the write is immediately visible.


Walking Through a Reconcile for BackupPolicy

Suppose a user creates this BackupPolicy:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
  targets:
    - namespace: production

The controller’s reconcile function runs. Here is what it does conceptually:

Reconcile(ctx, {Namespace: "demo", Name: "nightly"})

Step 1: Fetch BackupPolicy "demo/nightly" from cache
  → found; spec.schedule = "0 2 * * *", spec.retentionDays = 30

Step 2: Check if a CronJob for this BackupPolicy exists
  → cached read, equivalent to: kubectl get cronjob nightly-backup -n demo
  → not found

Step 3: Gap detected: CronJob should exist but doesn't
  → Create CronJob "nightly-backup" in namespace "demo"
    spec.schedule = "0 2 * * *"
    spec.jobTemplate.spec.template.spec.containers[0].args = ["--retention=30"]

Step 4: Set owner reference on CronJob pointing to BackupPolicy
  → CronJob is now garbage-collected if BackupPolicy is deleted

Step 5: Update BackupPolicy status
  → conditions: [{type: Ready, status: True, reason: CronJobCreated}]
  → lastScheduleTime: null (not yet run)

Step 6: Return Result{}, nil   → reconcile complete

Next time the BackupPolicy is modified (e.g., suspended: true):

Reconcile(ctx, {Namespace: "demo", Name: "nightly"})

Step 1: Fetch → spec.suspended = true

Step 2: Fetch CronJob "nightly-backup"
  → found; spec.suspend = false  ← actual state

Step 3: Gap: CronJob.spec.suspend should be true but is false
  → Patch CronJob: set spec.suspend = true

Step 4: Update status
  → conditions: [{type: Ready, status: True, reason: Suspended}]

Step 5: Return Result{}, nil

Idempotency: The Essential Property

The reconcile function must be idempotent. If it runs ten times with the same object state, the result must be the same as if it ran once.

Why? Because the controller framework delivers at-least-once semantics — your reconcile function will be called more than once for the same object state, especially at startup (the informer re-lists all objects) and after controller restarts.

Non-idempotent (wrong):

// Creates a new CronJob every time, even if one already exists
err := r.Create(ctx, cronJob)

Idempotent (correct):

// Only creates if it doesn't exist; updates if it does
existing := &batchv1.CronJob{}
err := r.Get(ctx, types.NamespacedName{Name: jobName, Namespace: ns}, existing)
if apierrors.IsNotFound(err) {
    err = r.Create(ctx, cronJob)
} else if err == nil {
    // found: overwrite the spec with the desired state
    existing.Spec = cronJob.Spec
    err = r.Update(ctx, existing)
}

The get-before-create pattern is the most basic idempotency mechanism. controller-runtime provides CreateOrUpdate helpers that codify this.


Requeue and Retry Semantics

The reconcile function returns a (Result, error) pair:

return Result{}, nil
  → Reconcile succeeded. Re-run only if object changes again.

return Result{RequeueAfter: 5 * time.Minute}, nil
  → Reconcile succeeded, but requeue in 5 minutes regardless.
  → Used for: polling external system, TTL-based refresh.

return Result{Requeue: true}, nil
  → Requeue immediately (with rate limiting).
  → Used for: cache not yet consistent after a write.

return Result{}, err
  → Reconcile failed. Retry with exponential backoff.
  → Used for: API errors, transient failures.

  RETRY BEHAVIOR (controller-runtime defaults)

  First failure  → retry after ~5ms
  Second failure → retry after ~10ms
  Third failure  → retry after ~20ms
  ...doubling each failure...
  Max backoff    → ~1000s (≈16.7 minutes)

  Object changes (new version from informer) → reset backoff, reconcile immediately

Do not return Result{Requeue: true}, nil in a tight loop — this saturates the work queue and starves other objects. If you need to poll, use RequeueAfter with a meaningful interval.


Watches: What Triggers a Reconcile

The controller does not only watch the primary resource (BackupPolicy). It also watches child resources and maps child changes back to the parent:

  WATCH CONFIGURATION (conceptual)

  Controller watches:
    BackupPolicy (primary) → reconcile when BackupPolicy changes
    CronJob (child/owned)  → reconcile BackupPolicy owner when CronJob changes
    ConfigMap (watched)    → reconcile BackupPolicy when referenced ConfigMap changes

If a user accidentally deletes the CronJob that the controller created:

  1. CronJob deletion event arrives in the informer
  2. Controller maps the deleted CronJob → its owner BackupPolicy
  3. BackupPolicy is enqueued
  4. Reconcile runs, detects missing CronJob, recreates it

This “self-healing” behavior — where controllers reconcile the world back to desired state — is the core operational value of operators. It is not magic; it is the result of watching child resources and re-running reconcile when they drift.


Level-Triggered vs Edge-Triggered

Kubernetes controllers are level-triggered, not edge-triggered. This distinction matters:

  EDGE-TRIGGERED (not what Kubernetes uses)
  → "BackupPolicy was updated FROM retained-30 TO retained-7"
  → If event is lost, the update is lost forever

  LEVEL-TRIGGERED (what Kubernetes uses)
  → "BackupPolicy exists with retentionDays=7"
  → On every reconcile, the controller reads the current level (state)
  → Missing an event is safe — the next reconcile corrects the state

Level-triggered design is why controllers survive restarts, network partitions, and lost events gracefully. The reconcile does not need to track “what changed” — it only needs to know “what is the desired state right now.”


The Same Pattern in Kubernetes Core

Every built-in Kubernetes controller follows this loop:

  Controller                   Watches      Manages           Reconciles
  Deployment controller        Deployment   ReplicaSets       desired replicas ↔ actual ReplicaSet count
  ReplicaSet controller        ReplicaSet   Pods              desired replicas ↔ running Pod count
  Node lifecycle controller    Node         Node conditions   NotReady nodes → taint, evict pods
  Service controller (cloud)   Service      LoadBalancer      cloud LB exists ↔ Service spec

The BackupPolicy controller you will build in EP07 follows exactly the same structure as the Deployment controller.


⚠ Common Mistakes

Reading from the API server directly instead of the cache. Every reconcile reading directly from the API server (not the informer cache) creates N×M load on the API server as the number of objects and reconcile frequency grows. Always read via the controller’s cached client.

Not handling “not found” on object fetch. If a reconcile is triggered but the object has been deleted by the time reconcile runs, the cache returns “not found.” This is normal — the correct response is to return Result{}, nil, not an error.

Keeping an object permanently hot in the work queue. Returning Result{Requeue: true}, nil or Result{}, err on every call keeps the object requeued forever — even with rate limiting, it crowds out other objects. Use RequeueAfter for expected wait conditions, and return errors only for unexpected failures that should back off.

Mutable reconcile state. Do not store reconcile state in struct fields on the reconciler. The reconciler is shared across goroutines; mutable fields cause race conditions. Everything transient must be local to the reconcile function.


Quick Reference

Reconcile input:
  ctx context.Context
  req ctrl.Request   → {Namespace: "demo", Name: "nightly"}

Reconcile output:
  (ctrl.Result, error)

Common returns:
  Result{}, nil                        → done, wait for next change
  Result{Requeue: true}, nil           → retry now (rate limited)
  Result{RequeueAfter: 5*time.Minute}, nil → retry in 5 minutes
  Result{}, err                        → retry with backoff

Key operations:
  r.Get(ctx, req.NamespacedName, &obj)     → fetch from cache
  r.Create(ctx, &obj)                      → create in API server
  r.Update(ctx, &obj)                      → full update
  r.Patch(ctx, &obj, patch)                → partial update
  r.Delete(ctx, &obj)                      → delete
  r.Status().Update(ctx, &obj)             → update status only

Key Takeaways

  • The reconcile loop reads desired state from spec, reads actual state from the cluster, and closes the gap — on every trigger, not just on changes
  • Controllers use an informer cache for reads — fast, eventually consistent, does not hammer the API server
  • Idempotency is not optional: the reconcile function will be called multiple times with the same state
  • Level-triggered design means missing events is safe — the next reconcile corrects any drift
  • Return values from reconcile control retry behavior: RequeueAfter for polling, err for failures, nil for success

What’s Next

EP07: Build a Simple Kubernetes Operator with controller-runtime puts the reconcile loop into practice — kubebuilder scaffold, a complete reconciler for BackupPolicy, RBAC markers, and running the operator locally against a real cluster.

Get EP07 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD CEL Validation: Replace Admission Webhooks for Schema Rules

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 5


TL;DR

  • Kubernetes CRD CEL validation (x-kubernetes-validations) lets you write arbitrary validation rules in the CRD schema — no admission webhook needed
    (CEL = Common Expression Language, a lightweight expression language built into Kubernetes — beta and enabled by default since 1.25, stable since 1.29; replaces most reasons you would write a validating admission webhook)
  • CEL rules are evaluated by the API server at admit time — the same place as OpenAPI schema validation, before etcd
  • self refers to the current object’s field; oldSelf refers to the previous value (for update rules)
  • Cross-field validation: “if storageClass is premium, retentionDays must be ≤ 90” — impossible with plain OpenAPI schema, trivial with CEL
  • Immutable fields: a self == oldSelf transition rule prevents users from changing values after creation
  • CEL rules run in ~microseconds inside the API server; no external service, no TLS, no latency budget to manage

The Big Picture

  CEL VALIDATION: WHERE IT FITS IN THE ADMISSION CHAIN

  kubectl apply -f backup.yaml
         │
         ▼
  API Server admission chain
  ┌────────────────────────────────────────────────────┐
  │                                                    │
  │  1. Mutating admission webhooks (modify object)    │
  │  2. Schema validation (OpenAPI types, required,    │
  │     minimum/maximum, pattern)                      │
  │  3. CEL validation (x-kubernetes-validations)  ←  │ THIS EPISODE
  │  4. Validating admission webhooks (external)       │
  │                                                    │
  └────────────────────────────────────────────────────┘
         │
         ▼ (passes all checks)
  etcd storage

Kubernetes CRD CEL validation sits between schema validation and external webhooks. For most validation requirements, CEL eliminates the need for a webhook entirely — which means no separate deployment to maintain, no TLS certificates to rotate, no availability dependency between your CRD and a webhook server.


Why CEL Replaces Most Admission Webhooks

Before CEL validation (beta and enabled by default in Kubernetes 1.25, stable in 1.29), the only way to express “if field A has value X, field B must be present” was an admission webhook — a separate HTTP server that Kubernetes called synchronously during every API request.

Webhooks work, but they have real costs:

  • Availability dependency: if the webhook is down, creates/updates for that resource type fail
  • TLS management: webhook endpoints require valid TLS certs that must be rotated
  • Deployment overhead: another Deployment, Service, and certificate to manage
  • Latency: every API operation waits for an HTTP round-trip

CEL runs inside the API server process. There is no network call, no certificate, no separate deployment. Rules are compiled once and evaluated in microseconds.

The trade-off: CEL cannot make network calls or access state outside the object being validated. For rules that need to look up other resources (e.g., “does this referenced Secret exist?”), you still need a webhook or a controller that validates via status conditions.


CEL Syntax Basics

CEL expressions are small programs. In Kubernetes CRD validation, the key variables are:

  Variable   Meaning
  self       The current field value (or the root object at the top level)
  oldSelf    The previous value of the field (update-only; rules that reference it are skipped on create)

CEL returns true (validation passes) or false (validation fails, API returns error).

Common patterns:

# String not empty
self.size() > 0

# String matches format
self.matches('^[a-z][a-z0-9-]*$')

# Integer in range
self >= 1 && self <= 365

# Field present (for optional fields)
has(self.fieldName)

# Conditional: if A then B
!has(self.premium) || self.retentionDays <= 90

# List not empty
self.size() > 0

# All items in list satisfy condition
self.all(item, item.namespace.size() > 0)

# Cross-field: access sibling field via parent
self.retentionDays >= self.minRetentionDays

Adding CEL Rules to the BackupPolicy CRD

Start from the CRD built in EP04. Add x-kubernetes-validations at the levels where you need them.

Rule 1: Cron expression validation

The OpenAPI pattern field can validate basic structure, but a proper cron regex is unwieldy. CEL is cleaner:

spec:
  type: object
  required: ["schedule", "retentionDays"]
  x-kubernetes-validations:
    - rule: "self.schedule.matches('^(\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+) (\\\\*|[0-9,\\\\-\\\\/]+)$')"
      message: "schedule must be a valid 5-field cron expression"

Rule 2: Cross-field validation

spec:
  type: object
  x-kubernetes-validations:
    - rule: "!(self.storageClass == 'premium') || self.retentionDays <= 90"
      message: "premium storage class supports at most 90 days retention"
    - rule: "!self.suspended || !has(self.pausedBy) || self.pausedBy.size() > 0"
      message: "when suspended is true, pausedBy must be non-empty if provided"

(The second rule assumes the schema also declares an optional pausedBy string field — CEL expressions compile against the schema, so referencing a field the schema does not declare is rejected when the CRD itself is applied.)

Rule 3: Immutable fields

Once a BackupPolicy is created, the schedule field should not be changeable without deleting and recreating:

schedule:
  type: string
  x-kubernetes-validations:
    - rule: "self == oldSelf"
      message: "schedule is immutable after creation"
      reason: FieldValueForbidden

reason field: Available reasons are FieldValueInvalid (the default), FieldValueForbidden, FieldValueRequired, and FieldValueDuplicate. A failed rule returns HTTP 422 either way; for immutability rules, FieldValueForbidden plus a clear message communicates that the field cannot be changed.

Rule 4: Conditional required field

If storageClass is encrypted, then encryptionKeyRef must be present:

spec:
  type: object
  x-kubernetes-validations:
    - rule: "self.storageClass != 'encrypted' || has(self.encryptionKeyRef)"
      message: "encryptionKeyRef is required when storageClass is 'encrypted'"

Rule 5: List element validation

Ensure each target namespace is a valid RFC 1123 DNS label:

targets:
  type: array
  items:
    type: object
    x-kubernetes-validations:
      - rule: "self.namespace.matches('^[a-z0-9]([-a-z0-9]*[a-z0-9])?$')"
        message: "namespace must be a valid DNS label"

The Complete Updated CRD with CEL

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  scope: Namespaced
  names:
    plural:     backuppolicies
    singular:   backuppolicy
    kind:       BackupPolicy
    shortNames: [bp]
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            spec:
              type: object
              required: ["schedule", "retentionDays"]
              x-kubernetes-validations:
                - rule: "!(self.storageClass == 'premium') || self.retentionDays <= 90"
                  message: "premium storage class supports at most 90 days retention"
              properties:
                schedule:
                  type: string
                  x-kubernetes-validations:
                    - rule: "self == oldSelf"
                      message: "schedule is immutable after creation"
                      reason: FieldValueForbidden
                retentionDays:
                  type: integer
                  minimum: 1
                  maximum: 365
                storageClass:
                  type: string
                  default: "standard"
                  enum: ["standard", "premium", "encrypted", "archive"]
                encryptionKeyRef:
                  type: string
                targets:
                  type: array
                  maxItems: 20
                  items:
                    type: object
                    required: ["namespace"]
                    x-kubernetes-validations:
                      - rule: "self.namespace.matches('^[a-z0-9]([-a-z0-9]*[a-z0-9])?$')"
                        message: "namespace must be a valid DNS label"
                    properties:
                      namespace:
                        type: string
                      includeSecrets:
                        type: boolean
                        default: false
                suspended:
                  type: boolean
                  default: false
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Schedule
          type: string
          jsonPath: .spec.schedule
        - name: Retention
          type: integer
          jsonPath: .spec.retentionDays
        - name: Ready
          type: string
          jsonPath: .status.conditions[?(@.type=='Ready')].status
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Testing CEL Rules

Apply the updated CRD:

kubectl apply -f backuppolicies-crd-cel.yaml

Test cross-field validation:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: premium-long
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 180          # violates: premium + > 90 days
  storageClass: premium
EOF
The BackupPolicy "premium-long" is invalid:
  spec: Invalid value: "object":
    premium storage class supports at most 90 days retention

Test immutability:

# Create valid policy
kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: immutable-test
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
EOF

# Try to change the schedule
kubectl patch bp immutable-test -n demo \
  --type=merge -p '{"spec":{"schedule":"0 3 * * *"}}'
The BackupPolicy "immutable-test" is invalid:
  spec.schedule: Invalid value: "0 3 * * *":
    schedule is immutable after creation

Test list element validation:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad-namespace
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: 7
  targets:
    - namespace: "UPPERCASE_IS_INVALID"
EOF
The BackupPolicy "bad-namespace" is invalid:
  spec.targets[0]: Invalid value: "object":
    namespace must be a valid DNS label

CEL Cost and Limits

CEL expressions are evaluated at admission time in the API server. Kubernetes imposes cost limits to prevent expressions from consuming excessive CPU:

  • Each expression is assigned a cost based on its operations (string matches, list iteration, etc.)
  • If the expression cost exceeds the per-validation limit, the API server rejects the CRD itself when you apply it
  • Complex all() over large lists is the most common way to hit cost limits

If you hit a cost limit error:

CustomResourceDefinition is invalid: spec.validation.openAPIV3Schema...
  CEL expression cost exceeds budget

Solutions:
– Reduce list traversal in CEL rules; enforce list length with maxItems instead
– Split one expensive rule into multiple simpler rules
– Move the expensive validation to a controller (status condition) rather than admission


⚠ Common Mistakes

Expecting oldSelf rules to fire on create. A rule that references oldSelf is a transition rule: the API server evaluates it only on updates, where a previous value exists, and skips it entirely on create. So self == oldSelf never sees a missing oldSelf — but it also cannot reject a bad initial value. If creation needs validation too, pair the transition rule with a separate create-time rule.

Forgetting has() checks for optional fields. If encryptionKeyRef is optional (not in required) and you write a rule like self.encryptionKeyRef.size() > 0, it will fail with a “no such key” error when the field is absent. Always guard optional field access with has(self.fieldName).

Overloading CEL for what a controller should do. CEL validates fields at admission. If your rule needs to verify that a referenced Secret actually exists, CEL cannot do that — it only sees the object being submitted. Use a controller status condition for existence checks, not CEL.


Quick Reference: Common CEL Patterns

# String not empty
self.size() > 0

# String matches regex
self.matches('^[a-z][a-z0-9-]{1,62}$')

# Optional field guard
!has(self.fieldName) || self.fieldName.size() > 0

# Conditional requirement
!(condition) || has(self.requiredWhenConditionIsTrue)

# Immutable field (update only)
self == oldSelf

# All list items satisfy condition
self.all(item, item.namespace.size() > 0)

# At least one list item satisfies condition
self.exists(item, item.type == 'primary')

# Cross-field comparison
self.minReplicas <= self.maxReplicas

# Enum-style check (in is an operator, not a method)
self in ['standard', 'premium', 'archive']

Key Takeaways

  • x-kubernetes-validations with CEL rules replaces most validating admission webhooks for CRD-specific logic
  • CEL runs inside the API server — no external service, no TLS, no separate deployment
  • Cross-field validation, immutable fields, and conditional requirements are all expressible in CEL
  • Use has() guards for optional fields; rules that reference oldSelf are transition rules, evaluated only on update
  • CEL has cost limits — avoid unbounded list iteration; use maxItems to bound lists first

What’s Next

EP06: The Kubernetes Controller Reconcile Loop explains how a controller watches BackupPolicy objects and acts on them — the mechanism that makes CRDs useful beyond validated configuration storage. Before writing code in EP07, you need to understand the reconcile loop conceptually.

Get EP06 in your inbox when it publishes → subscribe at linuxcent.com

Write Your First Kubernetes CRD: A Hands-On YAML Walkthrough

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 4


TL;DR

  • Writing a Kubernetes CRD takes three YAML files: the CRD itself, an RBAC manifest (a ClusterRole for the controller plus a namespaced Role for consumers), and a sample custom resource
  • The BackupPolicy CRD built in this episode is the running example throughout the rest of the series — operators, versioning, and production patterns all use it
  • Apply the CRD, verify it with kubectl get crds, create a custom resource, and watch the API server validate your spec
  • RBAC for CRDs follows the same Role/ClusterRole model as built-in resources — rules reference the plural resource name under the CRD's API group
  • Schema validation fires at apply time: bad field types, missing required fields, and out-of-range values all return clear errors before anything reaches etcd
  • Without a controller, a BackupPolicy is stored in etcd but nothing acts on it — that is the topic of EP05 and EP07

The Big Picture

  WHAT WE'RE BUILDING IN THIS EPISODE

  1. backuppolicies-crd.yaml        ← registers the BackupPolicy type
  2. backuppolicies-rbac.yaml       ← controls who can create/view/delete
  3. nightly-backup.yaml            ← our first custom resource instance

  After applying:

  kubectl get crds | grep backup      ← BackupPolicy type exists
  kubectl get backuppolicies -n demo  ← nightly instance exists
  kubectl describe bp nightly -n demo ← spec visible, status empty
  kubectl apply -f bad-backup.yaml    ← schema validation rejects bad data

Writing your first Kubernetes CRD is the step that bridges understanding CRDs conceptually to operating them in a real cluster. This episode is hands-on — every block of YAML is something you apply and verify.


Prerequisites

You need a running Kubernetes cluster and kubectl configured. Any of these work:

# Local options
kind create cluster --name crd-demo
# or
minikube start

# Verify cluster access
kubectl cluster-info
kubectl get nodes

Step 1: Write the CRD

Save this as backuppolicies-crd.yaml:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  scope: Namespaced
  names:
    plural:     backuppolicies
    singular:   backuppolicy
    kind:       BackupPolicy
    shortNames:
      - bp
    categories:
      - storage
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            spec:
              type: object
              required: ["schedule", "retentionDays"]
              properties:
                schedule:
                  type: string
                  description: "Cron expression (e.g. '0 2 * * *' for 02:00 daily)"
                retentionDays:
                  type: integer
                  minimum: 1
                  maximum: 365
                  description: "How many days to retain backup snapshots"
                storageClass:
                  type: string
                  default: "standard"
                  description: "StorageClass to use for backup volumes"
                targets:
                  type: array
                  description: "Namespaces and resources to include in the backup"
                  maxItems: 20
                  items:
                    type: object
                    required: ["namespace"]
                    properties:
                      namespace:
                        type: string
                      includeSecrets:
                        type: boolean
                        default: false
                suspended:
                  type: boolean
                  default: false
                  description: "Set to true to pause backup execution"
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Schedule
          type: string
          jsonPath: .spec.schedule
        - name: Retention
          type: integer
          jsonPath: .spec.retentionDays
        - name: Suspended
          type: boolean
          jsonPath: .spec.suspended
        - name: Ready
          type: string
          jsonPath: .status.conditions[?(@.type=='Ready')].status
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Apply it:

kubectl apply -f backuppolicies-crd.yaml

Verify it registered correctly:

kubectl get crds backuppolicies.storage.example.com
NAME                                    CREATED AT
backuppolicies.storage.example.com      2026-04-25T08:00:00Z

Check the API server now knows about it:

kubectl api-resources | grep backuppolic
backuppolicies    bp    storage.example.com/v1alpha1    true    BackupPolicy

Check it is Established:

kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'
True

If you see False or empty output, wait a few seconds and retry — the API server takes a moment to register new CRDs.
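
In scripts and CI pipelines you can block on the Established condition instead of polling by hand. A sketch using kubectl wait:

```shell
kubectl wait --for condition=established --timeout=60s \
  crd/backuppolicies.storage.example.com
```

The command exits 0 once the condition is True, or non-zero after the timeout, which makes it a natural gate before applying custom resources that depend on the CRD.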


Step 2: Write RBAC

CRDs follow the same RBAC model as built-in resources: a rule names the CRD's API group in apiGroups and its plural in resources (for BackupPolicy, apiGroups: ["storage.example.com"] and resources: ["backuppolicies"]).

Save this as backuppolicies-rbac.yaml:

# ClusterRole for operators/controllers that manage BackupPolicy objects
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
---
# Role for application teams to manage BackupPolicies in their namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Read-only role for auditors
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]

Apply it:

kubectl apply -f backuppolicies-rbac.yaml

Verify the roles exist:

kubectl get clusterrole | grep backuppolicy
backuppolicy-controller   2026-04-25T08:01:00Z
backuppolicy-editor       2026-04-25T08:01:00Z
backuppolicy-viewer       2026-04-25T08:01:00Z

Note on backuppolicies/status: The separate status RBAC rule is only meaningful if you enabled the status subresource (we did). Without it, status and spec share the same update path.
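
These ClusterRoles grant nothing until they are bound. A sketch of granting the editor role to one team inside a single namespace (the group name team-alpha is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding                # namespaced binding: permissions apply only in demo
metadata:
  name: backuppolicy-editor
  namespace: demo
subjects:
  - kind: Group
    name: team-alpha             # hypothetical team group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole              # reuse the cluster-wide role definition...
  name: backuppolicy-editor      # ...scoped to this namespace by the RoleBinding
  apiGroup: rbac.authorization.k8s.io
```

Binding a ClusterRole through a namespaced RoleBinding is the standard multi-tenant pattern: one role definition, many per-namespace grants.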


Step 3: Create a Namespace and Your First Custom Resource

kubectl create namespace demo

Save this as nightly-backup.yaml:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: demo
  labels:
    app.kubernetes.io/managed-by: manual
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
  storageClass: standard
  targets:
    - namespace: production
      includeSecrets: false
    - namespace: staging
      includeSecrets: false
  suspended: false

Apply it:

kubectl apply -f nightly-backup.yaml

Get it back:

kubectl get backuppolicies -n demo
NAME      SCHEDULE    RETENTION   SUSPENDED   READY   AGE
nightly   0 2 * * *   30          false       <none>  5s

The Ready column is <none> because there is no controller writing status yet. The custom resource exists and is stored in etcd, but nothing is acting on it.
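
Once a controller exists, it will populate .status through the status subresource. A sketch of the condition shape it would write, following the standard metav1.Condition fields used across Kubernetes:

```yaml
status:
  conditions:
    - type: Ready
      status: "True"
      reason: BackupScheduled                       # machine-readable, CamelCase
      message: "CronJob demo/nightly-backup created"
      observedGeneration: 1                         # ties the condition to the spec generation it saw
      lastTransitionTime: "2026-04-25T08:06:00Z"
```

With that condition in place, the Ready printer column would show True instead of <none>.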

Describe it:

kubectl describe bp nightly -n demo
Name:         nightly
Namespace:    demo
Labels:       app.kubernetes.io/managed-by=manual
Annotations:  <none>
API Version:  storage.example.com/v1alpha1
Kind:         BackupPolicy
Metadata:
  Creation Timestamp:  2026-04-25T08:05:00Z
  ...
Spec:
  Retention Days:  30
  Schedule:        0 2 * * *
  Storage Class:   standard
  Suspended:       false
  Targets:
    Include Secrets:  false
    Namespace:        production
    Include Secrets:  false
    Namespace:        staging
Status:
Events:  <none>

Step 4: Test Schema Validation

The API server now validates every BackupPolicy against the schema. Try creating an invalid one:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad-policy
  namespace: demo
spec:
  schedule: "not-a-cron"
  retentionDays: 500
EOF
The BackupPolicy "bad-policy" is invalid:
  spec.retentionDays: Invalid value: 500:
    spec.retentionDays in body should be less than or equal to 365

Missing required field:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: missing-schedule
  namespace: demo
spec:
  retentionDays: 7
EOF
The BackupPolicy "missing-schedule" is invalid:
  spec.schedule: Required value

Wrong type:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: wrong-type
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: "thirty"
EOF
The BackupPolicy "wrong-type" is invalid:
  spec.retentionDays: Invalid value: "string":
    spec.retentionDays in body must be of type integer: "string"

All validation fires at the API boundary — before etcd, before any controller sees the object.
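
You can exercise exactly this validation path without persisting anything by using server-side dry-run: the API server runs defaulting, admission, and schema validation, then discards the object. Useful in CI for validating every manifest against the live schema:

```shell
kubectl apply --dry-run=server -f nightly-backup.yaml
```

A valid manifest prints the usual "created/configured (server dry run)" output; an invalid one fails with the same schema errors shown above.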


Step 5: Verify Default Values Apply

The schema gives storageClass a default of "standard" and suspended a default of false. Verify the defaults are applied even when those fields are not specified:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: minimal
  namespace: demo
spec:
  schedule: "0 0 * * 0"
  retentionDays: 7
EOF

kubectl get bp minimal -n demo -o jsonpath='{.spec.storageClass}'
standard
kubectl get bp minimal -n demo -o jsonpath='{.spec.suspended}'
false

Defaults are injected by the API server at admission time. They appear in etcd and in every kubectl get -o yaml output — the stored object includes the defaults even if the user did not specify them.
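
To see defaulting concretely, here is a sketch of what is stored for the minimal object above, with the injected fields marked:

```yaml
spec:
  schedule: "0 0 * * 0"      # as submitted
  retentionDays: 7           # as submitted
  storageClass: standard     # injected by the API server at admission
  suspended: false           # injected by the API server at admission
```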


Step 6: Explore the API Endpoints

Your custom resource is now available at standard REST endpoints:

kubectl proxy --port=8001 &

# List all BackupPolicies in the demo namespace
curl -s http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies \
  | jq '.items[].metadata.name'
"minimal"
"nightly"

# Get a specific BackupPolicy
curl -s http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies/nightly \
  | jq '.spec'

This is how controllers discover and watch custom resources — via the same API server endpoints, using informers that wrap these REST calls with efficient list-and-watch semantics.
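
The watch half of list-and-watch is visible through the same proxy: add ?watch=true and the API server holds the connection open, streaming one JSON event per change (sketch; assumes the kubectl proxy above is still running):

```shell
curl -sN "http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies?watch=true"
# Streams events such as:
# {"type":"ADDED","object":{...}}
# {"type":"MODIFIED","object":{...}}
```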


Step 7: Clean Up

kubectl delete namespace demo
kubectl delete -f backuppolicies-rbac.yaml
kubectl delete -f backuppolicies-crd.yaml   # WARNING: deleting the CRD also deletes every remaining BackupPolicy instance

⚠ Common Mistakes

metadata.name does not match {plural}.{group}. The most common error. If you name the CRD backuppolicy.storage.example.com (singular) but the spec says plural: backuppolicies, the API server rejects it. The name must always be {plural}.{group}.

No required fields on spec. Without required constraints, kubectl apply accepts an empty spec: {}. The controller then receives objects with no configuration and has to handle the nil case. Define required fields in the schema.

Forgetting subresources: status: {}. Without this, controllers writing .status also overwrite .spec on full PUT updates. This causes status updates to reset user edits. Enable the status subresource from day one.

Not testing validation errors. Schema validation is the first line of defense. Always explicitly test that your required fields are required, types are enforced, and range constraints work — before deploying the controller.


Quick Reference

# All kubectl operations work on custom resources
kubectl get      backuppolicies -n demo
kubectl get      bp -n demo                  # shortName
kubectl describe bp nightly -n demo
kubectl edit     bp nightly -n demo
kubectl delete   bp nightly -n demo

# Output formats
kubectl get bp -n demo -o yaml
kubectl get bp -n demo -o json
kubectl get bp -n demo -o jsonpath='{.items[*].metadata.name}'

# Watch for changes
kubectl get bp -n demo -w

# List across all namespaces
kubectl get bp -A

# Patch spec
kubectl patch bp nightly -n demo \
  --type=merge -p '{"spec":{"suspended":true}}'

Key Takeaways

  • A working CRD deployment needs: the CRD YAML, RBAC ClusterRoles, and at least one sample custom resource
  • The API server validates all custom resources against the schema at apply time — errors are surfaced immediately, not inside the controller
  • Default values in the schema are injected at admission time and appear in every stored object
  • RBAC for custom resources names the CRD's API group in apiGroups and its plural in resources — status and finalizers are separate sub-resources
  • Without a controller, custom resources are stored in etcd and serve as validated configuration — nothing acts on them until a controller is deployed

What’s Next

EP05: Kubernetes CRD CEL Validation extends schema validation beyond simple type and range checks — cross-field rules ("if storageClass is premium, retentionDays must be at most 90"), regex validation beyond pattern, and immutable field enforcement. All without an admission webhook.

Get EP05 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD Schema Explained: Versions, Validation, and Status Subresource

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 3
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • The Kubernetes CRD schema is defined in spec.versions[].schema.openAPIV3Schema — the API server uses it to validate every custom resource create and update before storing in etcd
    (OpenAPI v3 schema = a JSON Schema dialect that describes the structure, types, and constraints of your resource’s fields)
  • spec.versions is a list — CRDs can serve multiple API versions simultaneously; exactly one version must have storage: true
  • scope: Namespaced vs scope: Cluster controls whether custom resources live inside a namespace or at cluster level (like PersistentVolume vs PersistentVolumeClaim)
  • spec.names defines the plural, singular, kind, and optional shortNames used in kubectl and RBAC
  • The status subresource (subresources.status: {}) separates user writes (spec) from controller writes (status) — enabling optimistic concurrency and kubectl status support
  • The scale subresource (subresources.scale) makes your custom resource compatible with kubectl scale and the HorizontalPodAutoscaler

The Big Picture

  ANATOMY OF A CUSTOMRESOURCEDEFINITION

  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: {plural}.{group}        ← MUST be exactly this format
  spec:
    group: {group}                ← API group (e.g. storage.example.com)
    scope: Namespaced | Cluster   ← where instances live
    names:                        ← how kubectl refers to this resource
      plural: backuppolicies
      singular: backuppolicy
      kind: BackupPolicy
      shortNames: [bp]
    versions:                     ← can be a list; one must have storage: true
      - name: v1alpha1
        served: true              ← API server responds to this version
        storage: true             ← etcd stores objects in this version
        schema:
          openAPIV3Schema:        ← validation schema for ALL objects of this type
            type: object
            properties:
              spec: {...}
              status: {...}
        subresources:
          status: {}              ← enables separate status write path
          scale:                  ← enables kubectl scale + HPA
            specReplicasPath: .spec.replicas
            statusReplicasPath: .status.replicas
        additionalPrinterColumns: ← extra columns in kubectl get output
          - name: Schedule
            type: string
            jsonPath: .spec.schedule

Understanding the Kubernetes CRD schema is the prerequisite for writing a CRD that behaves correctly in production — validation catches bad data at the API boundary, the status subresource prevents controller race conditions, and scope determines your entire RBAC and multi-tenancy model.


spec.group and metadata.name

The group is a reverse-DNS identifier for your API. Convention:

storage.example.com     ← domain you control + functional area
monitoring.myteam.io
databases.platform.company.com

The CRD’s metadata.name must be exactly {plural}.{group}:

metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  names:
    plural: backuppolicies

If these do not match, the API server rejects the CRD with a validation error. This is the most common first-timer mistake.


spec.scope: Namespaced vs Cluster

  SCOPE DETERMINES WHERE INSTANCES LIVE

  Namespaced (scope: Namespaced)       Cluster (scope: Cluster)
  ─────────────────────────────         ──────────────────────────
  kubectl get backuppolicies -n prod    kubectl get clusterbackuppolicies
  kubectl get backuppolicies -A         (no -n flag, no namespace)

  Analogous to: Pod, Deployment,        Analogous to: PersistentVolume,
                ConfigMap                             ClusterRole, Node

Namespaced: Use when instances are per-tenant or per-application. Users with namespace-scoped RBAC can manage their own instances without cluster-admin. Most CRDs should be namespaced.

Cluster-scoped: Use when instances represent cluster-wide configuration — a ClusterIssuer (cert-manager), ClusterSecretStore (ESO), a StorageClass-like concept. Requires cluster-level RBAC to create/modify.

You cannot change scope after a CRD is created without deleting and recreating it (which deletes all instances). Choose carefully.


spec.versions: Serving Multiple API Versions

spec:
  versions:
    - name: v1alpha1
      served: true
      storage: false       # not stored; converted on read
      schema:
        openAPIV3Schema: {...}
    - name: v1beta1
      served: true
      storage: false
      schema:
        openAPIV3Schema: {...}
    - name: v1
      served: true
      storage: true        # etcd stores in this version
      schema:
        openAPIV3Schema: {...}

Rules:
served: true means the API server accepts requests at this version
served: false means the API server returns 404 for that version — use to deprecate
– Exactly one version must have storage: true — this is what gets written to etcd
– When a client requests a non-storage version, the API server converts on the fly (or calls your conversion webhook — see EP08)

Early in development, start with v1alpha1 storage: true. Promote to v1 when the schema is stable. EP08 covers how to do this without losing data.
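
apiextensions.k8s.io/v1 also lets you mark a still-served version as deprecated, so clients receive a Warning header on every request to it. A sketch:

```yaml
versions:
  - name: v1alpha1
    served: true
    storage: false
    deprecated: true                  # clients get a Warning header on every request
    deprecationWarning: "storage.example.com/v1alpha1 BackupPolicy is deprecated; use v1"
    schema:
      openAPIV3Schema: {...}
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema: {...}
```

This gives users a migration window before you flip the old version to served: false.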


spec.names: What kubectl Sees

spec:
  names:
    plural:     backuppolicies     # kubectl get backuppolicies
    singular:   backuppolicy       # kubectl get backuppolicy (also works)
    kind:       BackupPolicy       # used in YAML apiVersion/kind
    listKind:   BackupPolicyList   # optional; auto-derived if omitted
    shortNames:                    # kubectl get bp
      - bp
    categories:                    # kubectl get all includes this type
      - all

categories is worth noting: if you add all to categories, your custom resources appear when someone runs kubectl get all -n mynamespace. Most CRDs deliberately do not add this — it clutters get all output. Only add it if your resource is a primary operational concern.


schema.openAPIV3Schema: Validation

The schema is where you define field types, required fields, constraints, and descriptions. The API server validates every create and update against this schema before writing to etcd.

schema:
  openAPIV3Schema:
    type: object
    required: ["spec"]
    properties:
      spec:
        type: object
        required: ["schedule", "retentionDays"]
        properties:
          schedule:
            type: string
            description: "Cron expression for backup schedule"
            pattern: '^(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)$'
          retentionDays:
            type: integer
            minimum: 1
            maximum: 365
          storageClass:
            type: string
            default: "standard"        # default value (Kubernetes 1.17+)
          targets:
            type: array
            maxItems: 10
            items:
              type: object
              required: ["name"]
              properties:
                name:
                  type: string
                namespace:
                  type: string
                  default: "default"
      status:
        type: object
        x-kubernetes-preserve-unknown-fields: true   # controllers write arbitrary status

Field types available

Type      Usage
────      ─────
string    Text values; supports format, pattern, enum, minLength, maxLength
integer   Whole numbers; supports minimum, maximum
number    Floating-point numbers
boolean   true/false
object    Nested structure; use properties to define fields
array     List; use items to define the element schema; supports minItems, maxItems

x-kubernetes-preserve-unknown-fields: true

This tells the API server not to prune fields it does not know about. Use it on status (controllers write whatever they need) and on fields that are intentionally free-form (like a config field that accepts arbitrary YAML). Avoid it on spec — it bypasses validation.

Validation behavior in practice

# This will fail with a clear error:
kubectl apply -f - <<EOF
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad
  namespace: default
spec:
  schedule: "not-a-cron"    # fails pattern validation
  retentionDays: 500         # fails maximum: 365
EOF
The BackupPolicy "bad" is invalid:
  spec.schedule: Invalid value: "not-a-cron": spec.schedule in body should match
    '^(\*|[0-9,\-\/]+)\s+...'
  spec.retentionDays: Invalid value: 500: spec.retentionDays in body should be
    less than or equal to 365

Schema validation catches configuration mistakes at apply time, not at runtime inside a pod. This is one of the core advantages of expressing domain configuration as CRDs rather than ConfigMaps.
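
Before shipping a pattern, you can sanity-check it locally. A sketch using grep, with the schema's regex adapted to portable POSIX ERE (\s replaced by [[:space:]]); note that this simple five-field pattern rejects step syntax such as */5, which may or may not be what you want:

```shell
# Same five-field structure as the schema's pattern, expressed as POSIX ERE
cron_re='^(\*|[0-9,/-]+)[[:space:]]+(\*|[0-9,/-]+)[[:space:]]+(\*|[0-9,/-]+)[[:space:]]+(\*|[0-9,/-]+)[[:space:]]+(\*|[0-9,/-]+)$'

check() {
  printf '%s\n' "$1" | grep -Eq "$cron_re" && echo valid || echo invalid
}

check '0 2 * * *'    # nightly at 02:00      -> valid
check 'not-a-cron'   # free text             -> invalid
check '*/5 * * * *'  # step syntax           -> invalid (not covered by this pattern)
```

Testing the regex outside the cluster is much faster than an apply/fail loop against the API server.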


additionalPrinterColumns: What kubectl get Shows

By default, kubectl get backuppolicies shows only NAME and AGE. You can add columns:

additionalPrinterColumns:
  - name: Schedule
    type: string
    jsonPath: .spec.schedule
    description: Cron schedule for backups
  - name: Retention
    type: integer
    jsonPath: .spec.retentionDays
    priority: 1          # 0 = always shown; 1 = only with -o wide
  - name: Ready
    type: string
    jsonPath: .status.conditions[?(@.type=='Ready')].status
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp

Result:

NAME        SCHEDULE      READY   AGE
nightly     0 2 * * *     True    3d
weekly      0 0 * * 0     False   7d

Good printer columns turn kubectl get into a useful operational dashboard. Include Ready (from status conditions) so operators can immediately see which custom resources are healthy without running kubectl describe.


The Status Subresource

subresources:
  status: {}

Without the status subresource, spec and status are part of the same object. Any user with update permission on the CRD can modify both. Controllers write status through the same path as users write spec.

With the status subresource enabled:
kubectl apply / kubectl patch only update spec — the status block is stripped
– Controllers use the /status subresource endpoint to write status
– RBAC can grant update on backuppolicies (spec) independently from update on backuppolicies/status

  WITHOUT status subresource:         WITH status subresource:
  ─────────────────────────            ──────────────────────────
  PUT /backuppolicies/nightly          PUT /backuppolicies/nightly
  → updates spec AND status            → updates spec only

                                       PUT /backuppolicies/nightly/status
                                       → updates status only (controller path)

Always enable the status subresource on production CRDs. The split between spec and status is fundamental to the Kubernetes API contract. Without it, a controller updating status can accidentally overwrite spec changes made by a user at the same time.
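
With recent kubectl (v1.24 and later) you can also exercise the controller's write path by hand, which is handy for testing printer columns before any controller exists. A sketch (object names illustrative):

```shell
kubectl patch backuppolicy nightly -n demo --subresource=status --type=merge \
  -p '{"status":{"conditions":[{"type":"Ready","status":"True","reason":"ManualTest","message":"patched by hand","lastTransitionTime":"2026-04-25T09:00:00Z"}]}}'
```

The same patch without --subresource=status would be silently stripped, because the main endpoint only accepts spec changes once the subresource is enabled.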


The Scale Subresource

subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    labelSelectorPath: .status.labelSelector

This makes your custom resource compatible with:

kubectl scale backuppolicy nightly --replicas=3

And with HorizontalPodAutoscaler targeting your custom resource. If your CRD manages something replica-based (workers, shards, connections), enabling the scale subresource lets it plug into the standard Kubernetes autoscaling ecosystem without extra plumbing.
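
A sketch of a standard autoscaling/v2 HPA targeting a custom resource through the scale subresource (this assumes the CRD actually exposes .spec.replicas, which our BackupPolicy example does not; shown for a hypothetical worker-style resource):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nightly-workers
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: storage.example.com/v1alpha1
    kind: BackupPolicy           # any custom resource with the scale subresource works here
    name: nightly
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```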


⚠ Common Mistakes

Forgetting x-kubernetes-preserve-unknown-fields: true on status. If you validate the status field with a strict schema but do not add this, the API server will prune any status fields the controller writes that are not in the schema. The controller’s status updates will silently lose fields. Either define the full status schema or use x-kubernetes-preserve-unknown-fields: true.

Using scope: Cluster for resources that should be namespaced. Once a CRD is created as cluster-scoped, you cannot make it namespaced without deleting and recreating it. Plan scope before deploying to production.

Not enabling the status subresource. Without it, controllers writing status can race with users updating spec. It also means kubectl patch --subresource=status does not work and some tooling behaves unexpectedly. Enable it from the start.

Loose schema with no required fields. An openAPIV3Schema with no required constraint accepts objects with empty spec. This usually means your controller gets called with a resource that is missing mandatory configuration. Define required fields and validate them at the API boundary, not inside the controller.


Quick Reference

# Inspect the full schema of a CRD
kubectl get crd backuppolicies.storage.example.com -o yaml | \
  yq '.spec.versions[0].schema'

# Check what subresources are enabled
kubectl get crd certificates.cert-manager.io -o jsonpath=\
  '{.spec.versions[0].subresources}'

# See all served versions for a CRD
kubectl get crd prometheuses.monitoring.coreos.com \
  -o jsonpath='{.spec.versions[*].name}'

# Check which version is the storage version
kubectl get crd certificates.cert-manager.io \
  -o jsonpath='{.spec.versions[?(@.storage==true)].name}'

# Describe the printer columns for a CRD
kubectl get crd scaledobjects.keda.sh \
  -o jsonpath='{.spec.versions[0].additionalPrinterColumns}'

Key Takeaways

  • spec.versions allows serving and storing multiple API versions; only one version has storage: true
  • scope (Namespaced vs Cluster) cannot be changed after creation — choose deliberately
  • openAPIV3Schema validates every CR at the API boundary, before etcd storage
  • The status subresource separates the user write path (spec) from the controller write path (status) — always enable it
  • additionalPrinterColumns makes kubectl get operationally useful; include a Ready column from status conditions

What’s Next

EP04: Write Your First Kubernetes CRD puts the anatomy into practice — a complete hands-on walkthrough building a BackupPolicy CRD from scratch, applying it to a cluster, creating instances, and verifying validation, RBAC, and status behavior.

Get EP04 in your inbox when it publishes → subscribe at linuxcent.com

CRDs You Already Use: cert-manager, KEDA, and External Secrets Explained

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 2
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • cert-manager, KEDA, and External Secrets Operator are all CRD-based systems — understanding their custom resources shows you what a well-designed CRD looks like before you build one
  • cert-manager’s Certificate CRD expresses desired TLS state; the cert-manager controller reconciles that state by issuing, renewing, and storing certificates in Secrets
  • KEDA’s ScaledObject extends the HorizontalPodAutoscaler with external metrics (queue depth, Kafka lag, Prometheus queries) — the KEDA operator translates ScaledObjects into native HPA objects
  • External Secrets Operator’s ExternalSecret abstracts over secret backends (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) — the controller pulls values and writes Kubernetes Secrets
  • All three follow the same pattern: you describe desired state in a custom resource; the operator reconciles actual state to match
  • Kubernetes custom resources examples like these are the fastest way to internalize the CRD mental model before writing your own

The Big Picture

  THREE CRD-BASED OPERATORS AND WHAT THEY MANAGE

  ┌─────────────────────────────────────────────────────────────┐
  │  cert-manager                                               │
  │  Certificate CR  →  controller issues cert  →  TLS Secret  │
  └─────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────┐
  │  KEDA                                                       │
  │  ScaledObject CR  →  controller creates HPA  →  Pod count  │
  └─────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────┐
  │  External Secrets Operator                                  │
  │  ExternalSecret CR  →  controller pulls  →  K8s Secret      │
  │                         from Vault/AWS/GCP                  │
  └─────────────────────────────────────────────────────────────┘

  In every case:
  User creates CR  →  Operator watches CR  →  Operator acts  →  Status updated

Kubernetes custom resources examples from real tools like these reveal the design pattern you will use in every CRD you build: express desired state declaratively, let the controller bridge the gap to actual state, surface the outcome in the status subresource.


Why Look at Existing CRDs First?

Before designing your own CRD, you want to understand what good CRD design looks like from the user's perspective. The maintainers of cert-manager (started at Jetstack), KEDA (the kedacore project), and External Secrets Operator have collectively solved the same problems you will face:

  • What goes in spec vs status?
  • How do you reference other Kubernetes objects?
  • How do you handle secrets and credentials securely?
  • What does a healthy vs unhealthy custom resource look like?

Studying these before writing your own saves you from the most common first-timer mistakes.


cert-manager: The Certificate CRD

cert-manager is the most widely deployed CRD-based system in Kubernetes. It manages TLS certificates from Let’s Encrypt, internal CAs, and cloud providers.

The core CRDs

kubectl get crds | grep cert-manager
certificates.cert-manager.io
certificaterequests.cert-manager.io
challenges.acme.cert-manager.io
clusterissuers.cert-manager.io
issuers.cert-manager.io
orders.acme.cert-manager.io

The one you interact with most is Certificate. Here is a real example:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-cert        # cert-manager writes the TLS Secret here
  duration: 2160h                 # 90 days
  renewBefore: 720h               # renew 30 days before expiry
  subject:
    organizations:
      - example.com
  dnsNames:
    - api.example.com
    - api-internal.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

What happens after you apply this:

  1. cert-manager controller sees the new Certificate object
  2. It contacts the referenced ClusterIssuer (Let’s Encrypt in this case)
  3. It completes the ACME challenge, obtains the certificate
  4. It writes the certificate and private key into the api-tls-cert Secret
  5. It updates the Certificate object’s status to reflect success

kubectl describe certificate api-tls -n production
Status:
  Conditions:
    Last Transition Time:  2026-04-10T08:00:00Z
    Message:               Certificate is up to date and has not expired
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2026-07-09T08:00:00Z
  Not Before:              2026-04-10T08:00:00Z
  Renewal Time:            2026-06-09T08:00:00Z

What this teaches you about CRD design

  • spec.secretName — the CR references an output object by name. The controller creates or updates that object.
  • spec.issuerRef — the CR references another custom resource (ClusterIssuer) by name. This is a common pattern for separating configuration concerns.
  • status.conditions — the standard Kubernetes condition pattern: type, status, reason, message. You will use the same structure in your own CRDs.
  • The controller owns status — users own spec. This separation is a core convention.

KEDA: The ScaledObject CRD

KEDA (Kubernetes Event-Driven Autoscaling) extends Kubernetes autoscaling beyond CPU and memory. It can scale deployments based on queue depth, Kafka consumer lag, Prometheus metric values, and dozens of other event sources.

The core CRDs

kubectl get crds | grep keda
clustertriggerauthentications.keda.sh
scaledjobs.keda.sh
scaledobjects.keda.sh
triggerauthentications.keda.sh

A ScaledObject ties a Deployment to an external scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor        # the Deployment to scale
  minReplicaCount: 0             # scale to zero when idle
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"         # target: 5 messages per pod
        awsRegion: us-east-1
      authenticationRef:
        name: keda-sqs-auth      # TriggerAuthentication for AWS credentials

What KEDA does with this:

  1. KEDA controller sees the ScaledObject
  2. It creates a native HorizontalPodAutoscaler object targeting the order-processor Deployment
  3. KEDA’s metrics adapter polls the SQS queue depth and exposes it as a custom metric
  4. The HPA uses that metric to scale replicas — including to zero when the queue is empty

kubectl get scaledobject order-processor-scaler -n production
NAME                       SCALETARGETKIND      SCALETARGETNAME    MIN   MAX   TRIGGERS         READY   ACTIVE
order-processor-scaler     apps/Deployment      order-processor    0     50    aws-sqs-queue    True    True

What this teaches you about CRD design

  • spec.scaleTargetRef — targeting another object by name. The controller acts on that object, not on the CR itself.
  • spec.triggers — a list of trigger specifications. Lists of typed sub-objects are a recurring CRD pattern.
  • spec.minReplicaCount: 0 — expressing scale-to-zero as a first-class concept in the API. Built-in HPA does not support this; KEDA’s CRD extends the vocabulary of what is expressible.
  • The KEDA operator translates ScaledObject → native HPA. The CRD is an abstraction over a more complex Kubernetes object. This “translate and manage child resources” pattern is extremely common in operators.

External Secrets Operator: The ExternalSecret CRD

External Secrets Operator (ESO) solves a specific problem: secrets live in external systems (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager), but Kubernetes workloads need them as Kubernetes Secrets. ESO bridges the gap.

The core CRDs

kubectl get crds | grep external-secrets
clusterexternalsecrets.external-secrets.io
clustersecretstores.external-secrets.io
externalsecrets.external-secrets.io
secretstores.external-secrets.io

A SecretStore defines the backend connection:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: eso-sa            # uses IRSA/workload identity

An ExternalSecret defines what to pull and how to map it:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-creds
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: database-secret          # Kubernetes Secret to create/update
    creationPolicy: Owner
  data:
    - secretKey: username          # key in the K8s Secret
      remoteRef:
        key: prod/database         # path in AWS Secrets Manager
        property: username         # property within that secret
    - secretKey: password
      remoteRef:
        key: prod/database
        property: password

After ESO reconciles this:

kubectl get secret database-secret -n production -o jsonpath='{.data.username}' | base64 -d
# outputs: db_user

kubectl describe externalsecret database-creds -n production
Status:
  Conditions:
    Last Transition Time:   2026-04-10T08:00:00Z
    Message:                Secret was synced
    Reason:                 SecretSynced
    Status:                 True
    Type:                   Ready
  Refresh Time:             2026-04-10T09:00:00Z
  Synced Resource Version:  1-abc123

What this teaches you about CRD design

  • spec.secretStoreRef — referencing a configuration CRD (SecretStore) from an operational CRD (ExternalSecret). This layering of CRDs to separate concerns is a mature pattern.
  • spec.refreshInterval — the CR expresses a desired behavior (periodic sync), not just a desired state snapshot. CRDs can express temporal behaviors.
  • spec.target.creationPolicy: Owner — ESO will set an owner reference on the created Secret, so deleting the ExternalSecret cascades to deleting the Secret. This is how controllers manage lifecycle.
  • Sensitive values never appear in the CR — only paths and references. The controller handles the actual secret retrieval. This is a key security pattern in CRD design.
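The `creationPolicy: Owner` behavior is observable on the generated Secret. A sketch of how to inspect the owner reference, assuming the ExternalSecret above has reconciled:

```shell
# The generated Secret carries an ownerReference back to the ExternalSecret,
# which is what makes deleting the ExternalSecret cascade to the Secret
kubectl get secret database-secret -n production \
  -o jsonpath='{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}'
# expected shape: ExternalSecret/database-creds
```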

The Common Pattern Across All Three

  OPERATOR PATTERN (cert-manager / KEDA / ESO / every other operator)

  User applies CR
        │
        ▼
  Controller watches CRDs
  (informer cache, events queue)
        │
        ▼
  Controller reconciles:
  actual state ──→ compare ──→ desired state
        │              │
        │         (gap found)
        │              │
        ▼              ▼
  Takes action      Updates status
  (issue cert,      conditions in CR
   create HPA,
   sync Secret)
        │
        └──── loops back, watches for next change

The design contract:
Users write spec — what they want
Controllers read spec, write status — what actually happened
Status conditions are the source of truth: a Ready condition of True or False, with reason and message, tells operators what the controller knows

This pattern, explained in depth in EP06, is why CRDs and controllers are designed the way they are.
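Because the contract is uniform, conditions can be consumed generically. For example, a deploy script can block until a custom resource reports Ready — the same invocation works for any CRD that follows the convention:

```shell
# Wait on the standard Ready condition, regardless of which operator owns the CR
kubectl wait --for=condition=Ready certificate/api-tls -n production --timeout=120s
kubectl wait --for=condition=Ready externalsecret/database-creds -n production --timeout=120s
```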


⚠ Common Mistakes

Installing CRDs without the controller. If you install cert-manager’s CRDs from the crds.yaml manifest without installing cert-manager itself, Certificate objects will be accepted by the API server but never reconciled. The Ready condition will never appear. Always install the operator alongside its CRDs.

Editing status fields directly. Many teams try kubectl patch or kubectl edit to update a custom resource’s status to work around a stuck controller. Most well-written controllers overwrite status every reconcile loop — your manual change will be wiped. Fix the underlying issue, not the status display.

Assuming CRD deletion is safe. Covered in EP01 but worth repeating: deleting a CRD cascades to deleting all instances. If you kubectl delete crd certificates.cert-manager.io, every Certificate object in every namespace is gone and cert-manager will stop issuing. Back up CRDs and their instances before any CRD deletion.


Quick Reference

# See all CRDs installed by cert-manager
kubectl get crds | grep cert-manager.io

# Get all Certificates across all namespaces
kubectl get certificates -A

# Watch cert-manager reconcile a new Certificate
kubectl get certificate api-tls -n production -w

# See all ScaledObjects and their current state
kubectl get scaledobjects -A

# Check ESO sync status for all ExternalSecrets
kubectl get externalsecrets -A

# Inspect what APIs a CRD exposes
kubectl api-resources | grep cert-manager

Key Takeaways

  • cert-manager, KEDA, and ESO are canonical examples of well-designed CRD-based operators
  • All three follow the same pattern: user writes spec, controller reconciles to actual state, status reflects outcome
  • spec expresses desired state declaratively; the controller figures out how to achieve it
  • Status conditions (type, status, reason, message) are the standard way to surface controller outcomes
  • Sensitive values never appear in the CR — controllers retrieve them from external systems using references and credentials

What’s Next

EP03: CRD Anatomy opens the YAML of a CRD itself — spec.versions, OpenAPI schema properties, scope, names, and subresources. You have seen CRDs from the outside; next we look at how they are structured on the inside.

Get EP03 in your inbox when it publishes → subscribe at linuxcent.com

What Is a Kubernetes CRD? How Custom Resources Extend the API

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 1
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • A Kubernetes CRD (Custom Resource Definition) is how you add new resource types to the Kubernetes API — the same way Deployment and Service exist natively, you can make BackupPolicy or Certificate exist too
    (CRD = the schema/blueprint; Custom Resource = an instance of that schema, just like a Pod is an instance of the Pod schema)
  • Every kubectl get crds on a real cluster shows dozens of them — cert-manager, KEDA, Prometheus Operator, Crossplane all ship their own CRDs
  • CRDs are served by the same API server as built-in resources — kubectl, RBAC, watches, and events all work identically
  • A CRD alone does nothing — a controller watches the custom resources and acts on them; together they form an Operator
  • CRDs live in etcd just like Pods and Deployments — they survive API server restarts and cluster upgrades
  • You do not need to modify Kubernetes source code or restart the API server to add a CRD

The Big Picture

  HOW KUBERNETES CRDs EXTEND THE API

  ┌──────────────────────────────────────────────────────────────┐
  │  Kubernetes API Server                                       │
  │                                                              │
  │  Built-in resources          Custom resources (via CRD)      │
  │  ─────────────────           ──────────────────────────      │
  │  Pod                         Certificate     (cert-manager)  │
  │  Deployment                  ScaledObject    (KEDA)          │
  │  Service                     ExternalSecret  (ESO)           │
  │  ConfigMap                   BackupPolicy    (your team)     │
  │  ...                         ...                             │
  │                                                              │
  │  All resources: same API, same kubectl, same RBAC, same etcd │
  └──────────────────────────────────────────────────────────────┘
            ▲                          ▲
            │ built in                 │ registered at runtime
            │                          │
         Kubernetes              CustomResourceDefinition
          binary                    (a YAML you apply)

What is a Kubernetes CRD? It is a resource that defines resources — a schema registration that teaches the API server about a new object type you want to use in your cluster.


What Problem CRDs Solve

Kubernetes ships with roughly 50 resource types: Pods, Deployments, Services, ConfigMaps, Secrets, PersistentVolumes, and so on. These cover the general-purpose building blocks for running containerized workloads.

But the moment you operate real infrastructure, you hit the edges. You want to express:

  • “This database should have three replicas with point-in-time recovery enabled” — not a Deployment
  • “This TLS certificate for api.example.com should renew 30 days before expiry” — not a Secret
  • “This queue consumer should scale to zero when the queue is empty” — not a HorizontalPodAutoscaler

Before CRDs (pre-2017), the options were limited: use ConfigMaps as a poor substitute (no schema, no validation, no dedicated RBAC), use the short-lived alpha ThirdPartyResource API that CRDs eventually replaced, or fork Kubernetes and add the resource natively (impractical for everyone outside the core team).

CRDs, introduced in Kubernetes 1.7 and promoted to stable in 1.16, solved this by letting you register a new resource type with the API server at runtime — without touching Kubernetes source code, without restarting the API server, without any special access beyond being able to create cluster-scoped resources.


The Kubernetes API: A Brief Mental Model

Before CRDs make sense, the API model needs to be clear.

  KUBERNETES API STRUCTURE

  apiVersion: apps/v1       ← API group (apps) + version (v1)
  kind: Deployment          ← resource type
  metadata:
    name: web               ← instance name
    namespace: default      ← namespace scope
  spec:
    replicas: 3             ← desired state

Every Kubernetes resource has:
– A group (e.g., apps, batch, networking.k8s.io) — or no group for core resources
– A version (e.g., v1, v1beta1)
– A kind (e.g., Deployment, Pod)
– A scope: namespaced or cluster-wide

The API server is a registry. Each group/version/kind combination maps to a Go struct that knows how to validate, store, and serve that resource type.

A CRD registers a new entry in that registry. You supply the group, version, kind, and schema. The API server handles everything else — serving it via REST, storing it in etcd, exposing it to kubectl.


What a CRD Looks Like

Here is the smallest possible CRD — it creates a new BackupPolicy resource type in the storage.example.com API group:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string
                retentionDays:
                  type: integer
  scope: Namespaced
  names:
    plural: backuppolicies
    singular: backuppolicy
    kind: BackupPolicy
    shortNames:
      - bp

Apply it:

kubectl apply -f backuppolicy-crd.yaml

Now create an instance:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: default
spec:
  schedule: "0 2 * * *"
  retentionDays: 30

kubectl apply -f nightly-backup.yaml
kubectl get backuppolicies
kubectl get bp            # shortName works
kubectl describe bp nightly

The API server validates the spec against the schema, stores it in etcd, and returns it via all the standard API endpoints — all without a single line of custom code.
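A quick way to see that schema enforcement in action: submit a BackupPolicy with the wrong type and watch the API server reject it. The error text below is a sketch of the shape, not verbatim output:

```shell
# retentionDays is declared as integer in the CRD schema, so a string is rejected
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad-policy
spec:
  schedule: "0 2 * * *"
  retentionDays: "thirty"
EOF
# The API server returns a validation error along the lines of:
#   spec.retentionDays: Invalid value: "string": ... expected integer
```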


CRD vs Built-In Resource: What Is Different?

Not much, deliberately.

  Capability                        Built-in resource   Custom resource (CRD)
  kubectl get / describe / delete   Yes                 Yes
  RBAC (Roles, ClusterRoles)        Yes                 Yes
  Watch (informers, events)         Yes                 Yes
  Stored in etcd                    Yes                 Yes
  OpenAPI schema validation         Yes                 Yes (you define the schema)
  Admission webhooks                Yes                 Yes
  Status subresource                Yes                 Optional (you enable it)
  Scale subresource                 Yes                 Optional (you enable it)
  Built-in controller behavior      Yes                 No — you write the controller

The last row is the critical one. When you create a Deployment, the deployment controller immediately starts managing ReplicaSets. When you create a BackupPolicy, nothing happens — until you write and deploy a controller that watches BackupPolicy objects and acts on them.

That controller + the CRD is what people call an Operator.


A Real Cluster: What You Actually See

Run this on any cluster running cert-manager, Prometheus Operator, or any other tooling:

kubectl get crds

Sample output (abbreviated):

NAME                                                  CREATED AT
certificates.cert-manager.io                          2024-11-01T08:12:00Z
certificaterequests.cert-manager.io                   2024-11-01T08:12:00Z
issuers.cert-manager.io                               2024-11-01T08:12:00Z
clusterissuers.cert-manager.io                        2024-11-01T08:12:00Z
scaledobjects.keda.sh                                 2024-11-01T08:13:00Z
scaledjobs.keda.sh                                    2024-11-01T08:13:00Z
externalsecrets.external-secrets.io                   2024-11-01T08:14:00Z
prometheuses.monitoring.coreos.com                    2024-11-01T08:15:00Z
servicemonitors.monitoring.coreos.com                 2024-11-01T08:15:00Z

Every tool that ships as a CRD-based system registers its resource types here first. The count often surprises engineers: a production cluster with a typical toolchain easily has 40–80 CRDs.

Check how many are on your cluster:

kubectl get crds --no-headers | wc -l

How the API Server Handles a CRD

When you apply a CRD, the API server does three things:

  CRD REGISTRATION FLOW

  kubectl apply -f my-crd.yaml
          │
          ▼
  1. API server validates the CRD manifest
     (is the schema valid OpenAPI v3? are names correct?)
          │
          ▼
  2. CRD stored in etcd
     (under /registry/apiextensions.k8s.io/customresourcedefinitions/)
          │
          ▼
  3. New REST endpoints activated immediately:
     GET  /apis/storage.example.com/v1alpha1/namespaces/{ns}/backuppolicies
     POST /apis/storage.example.com/v1alpha1/namespaces/{ns}/backuppolicies
     ...

From this point, any kubectl get backuppolicies or API call to those endpoints is handled exactly like a built-in resource call — the API server serves it from etcd, applies RBAC, runs admission webhooks, and returns standard JSON.

No restart required. The new endpoints appear within seconds.
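You can confirm the new endpoints exist by asking the API server directly — a sketch, assuming the BackupPolicy CRD from earlier is installed:

```shell
# Discovery document for the new group/version, served like any built-in API
kubectl get --raw /apis/storage.example.com/v1alpha1

# Or via the standard discovery tooling
kubectl api-resources --api-group=storage.example.com
```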


The Difference Between CRD and CR

Two terms that are easily confused:

  • CRD (CustomResourceDefinition) — the schema/blueprint. There is one CRD per resource type. certificates.cert-manager.io is a CRD.
  • CR (Custom Resource) — an instance of a CRD. Every Certificate object you create is a custom resource. You can have thousands of CRs per CRD.

  CRD (one)          →  Custom Resource (many)
  ─────────             ─────────────────────
  certificates          web-tls           (namespace: production)
  .cert-manager.io      api-tls           (namespace: production)
                        admin-tls         (namespace: staging)
                        ...

The CRD is applied once (usually by the tool’s Helm chart). Custom resources are created by your users, your CI pipeline, or your GitOps system throughout the life of the cluster.
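The one-to-many relationship is easy to verify on a live cluster — a sketch, assuming cert-manager is installed:

```shell
# One CRD...
kubectl get crd certificates.cert-manager.io

# ...many custom resources conforming to it
kubectl get certificates -A --no-headers | wc -l
```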


Where CRDs Fit in the Kubernetes Extension Model

CRDs are one of three ways to extend Kubernetes:

  KUBERNETES EXTENSION MECHANISMS

  1. CRDs + Controllers (Operators)
     Add new resource types + behavior
     → cert-manager, KEDA, Argo CD, Crossplane
     Used for: domain-specific abstractions, infrastructure management

  2. Admission Webhooks
     Intercept API requests to validate or mutate objects
     → OPA/Gatekeeper, Kyverno, Istio injection
     Used for: policy enforcement, sidecar injection, defaulting

  3. API Aggregation (AA)
     Register a fully separate API server behind the main API server
     → metrics-server, custom autoscalers
     Used for: when you need non-CRUD semantics (e.g. exec, attach, streaming)

For 95% of use cases, CRDs + controllers are the right mechanism. API aggregation is complex and only warranted for non-standard API semantics. Admission webhooks are complementary to CRDs, not an alternative.


⚠ Common Mistakes

Confusing the CRD with the controller. The CRD is just a schema registration — it does not execute code. If you apply a CRD but do not deploy its controller, creating custom resources will succeed (the API server accepts them) but nothing will happen. This catches many people the first time they try to use cert-manager by only applying the CRDs without installing the cert-manager controller.

Assuming CRD deletion is safe. Deleting a CRD deletes all custom resources of that type from etcd. There is no “are you sure?” prompt. If you delete the certificates.cert-manager.io CRD, every Certificate object in every namespace is gone.

Treating CRDs as ConfigMap replacements. Some teams store configuration in CRDs purely to get schema validation. This works, but without a controller, the custom resources are inert data. If you only need configuration storage with validation, a CRD is viable — just be explicit that there is no reconciliation loop.


Quick Reference

# List all CRDs in the cluster
kubectl get crds

# Inspect a specific CRD's schema
kubectl get crd certificates.cert-manager.io -o yaml

# List all custom resources of a type
kubectl get certificates -A

# Get details on a specific custom resource
kubectl describe certificate web-tls -n production

# Delete a CRD (WARNING: deletes all instances)
kubectl delete crd backuppolicies.storage.example.com

# Check if a CRD is established (ready to use)
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'
# Returns: True

Key Takeaways

  • A Kubernetes CRD registers a new resource type with the API server — no source code changes, no restart required
  • Custom resources behave identically to built-in resources: kubectl, RBAC, watches, etcd, admission webhooks all work the same way
  • The CRD is just the schema; a controller gives custom resources behavior — together they form an Operator
  • Every production cluster running modern tooling already uses dozens of CRDs
  • Deleting a CRD deletes all its instances — treat CRDs as production-critical objects

What’s Next

EP02: CRDs You Already Use makes this concrete before we go deeper — we walk through cert-manager’s Certificate, KEDA’s ScaledObject, and External Secrets’ ExternalSecret as working examples, so you understand what a well-designed CRD looks like from a user’s perspective before you design your own.

Get EP02 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes Today: v1.33 to v1.35, In-Place Resize GA, and What Comes Next

Reading Time: 6 minutes


Introduction

Ten years after the first commit, Kubernetes is not exciting in the way it was in 2015. That’s a compliment. The system is stable. The APIs are mature. The migrations — dockershim, PSP, cloud provider code — are behind us.

What the 1.33–1.35 cycle shows is a project focused on precision: removing edge cases, promoting long-running alpha features to stable, and making the scheduler, storage, and security model more correct rather than more powerful. That’s what a mature infrastructure platform looks like.

Here’s what happened and where the project is headed.


Kubernetes 1.33 — Sidecar Resize, In-Place Resize Beta (April 2025)

Code name: Octarine

In-Place Pod Vertical Scaling reaches Beta

After landing as alpha in 1.27, in-place pod resource resizing became beta in 1.33 — enabled by default via the InPlacePodVerticalScaling feature gate.

The capability: change CPU and memory requests/limits on a running container without terminating and restarting the pod.

# Resize a running container's CPU allocation without a restart
# (resize requests go through the pod's resize subresource)
kubectl patch pod api-pod-xyz --subresource resize --type='json' -p='[
  {
    "op": "replace",
    "path": "/spec/containers/0/resources/requests/cpu",
    "value": "2"
  },
  {
    "op": "replace",
    "path": "/spec/containers/0/resources/limits/cpu",
    "value": "4"
  }
]'

# Verify the resize was applied
kubectl get pod api-pod-xyz -o jsonpath='{.status.containerStatuses[0].resources}'

Why this matters operationally: Before in-place resize, vertical scaling meant terminating the pod, losing in-memory state, waiting for a new pod to become ready. For databases with warm buffer pools, JVM applications with loaded heap caches, or any workload where startup cost is significant, this was a serious limitation. Vertical Pod Autoscaler (VPA) worked around it by restarting pods — acceptable for stateless workloads, problematic for stateful ones.

In 1.33, resizing also works for sidecar containers, combining two features that matured in the same release.
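Whether a resize restarts the container is controlled per resource via `resizePolicy`. A sketch — container name and image are illustrative; `NotRequired` is the default, while `RestartContainer` suits runtimes that cannot apply the change live (e.g. JVM heap sizing):

```yaml
spec:
  containers:
  - name: api
    image: myapp:latest            # illustrative image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # apply CPU changes live
    - resourceName: memory
      restartPolicy: RestartContainer # restart this container on memory resize
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 2Gi
```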

Sidecar Containers — Full Maturity

With both features mature, sidecar and in-place resize formally combine: you can now vertically scale a service mesh proxy (an Envoy sidecar) without restarting the application pod. For high-traffic services where the proxy itself becomes the CPU bottleneck, this is directly actionable.


Gateway API v1.4 (October 2025)

Gateway API continued its rapid iteration with v1.4:

BackendTLSPolicy (Standard channel): Configure TLS between the gateway and the backend service — not just TLS termination at the gateway, but end-to-end encryption:

apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: api-backend-tls
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: api-service
  validation:
    caCertificateRefs:
    - name: internal-ca
      group: ""
      kind: ConfigMap
    hostname: api.internal.corp

Gateway Client Certificate Validation: The gateway can now validate client certificates — mutual TLS for ingress traffic, not just between services.

TLSRoute to Standard: TLS routing (based on SNI, not HTTP host headers) graduated to the standard channel — enabling TCP workloads with TLS passthrough through the Gateway API model.

ListenerSet: Group multiple Gateway listeners — useful for shared infrastructure where multiple teams need to attach routes to the same gateway without managing separate Gateway resources.


Kubernetes 1.34 — Scheduler Improvements, DRA Continues (August 2025)

The 1.34 release focused on the scheduler and Dynamic Resource Allocation:

DRA structured parameters stabilization: The Dynamic Resource Allocation API matured its parameter model — resource drivers can expose structured claims that the scheduler understands, enabling topology-aware placement of GPU workloads:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
      selectors:
      - cel:
          expression: device.attributes["nvidia.com/gpu-product"].string() == "A100-SXM4-80GB"
      count: 2
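A pod consumes the claim by referencing it — a sketch of the wiring, with pod and container names invented for illustration; the claim name matches the ResourceClaim above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer                     # illustrative name
spec:
  resourceClaims:
  - name: gpu                       # pod-local handle for the claim
    resourceClaimName: gpu-claim    # the ResourceClaim object above
  containers:
  - name: train
    image: training-image:latest    # illustrative image
    resources:
      claims:
      - name: gpu                   # binds this container to the claimed devices
```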

Scheduler QueueingHint stable: Plugins can now tell the scheduler when to re-queue a pod for scheduling — instead of the scheduler periodically retrying all unschedulable pods, plugins signal when relevant cluster state has changed. This significantly reduces scheduler CPU consumption in large clusters with many unschedulable pods.

Fine-grained node authorization improvements: Kubelets can now be restricted from accessing Service resources they don’t need — further reducing the blast radius of a compromised kubelet.


Kubernetes 1.35 — In-Place Resize GA, Memory Limits Unlocked (December 2025)

In-Place Pod Vertical Scaling Graduates to Stable

After alpha in 1.27 and beta in 1.33, in-place resize graduated to GA in 1.35. Two significant improvements accompanied GA:

Memory limit decreases now permitted: Previously, you could increase memory limits in-place but not decrease them. The restriction existed because the kernel doesn’t immediately reclaim memory when the limit is lowered — the OOM killer would need to run. 1.35 lifts this restriction with proper handling: the kernel is instructed to reclaim, and the pod status reflects the resize progress.

Pod-Level Resources (alpha in 1.35): Specify resource requests and limits at the pod level rather than per-container — with in-place resize support. Useful for init containers and sidecar patterns where total pod resources matter more than per-container allocation.

spec:
  # Pod-level resources (alpha) — total budget for all containers
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
  containers:
  - name: application
    image: myapp:latest
    # No per-container resources; the pod-level budget applies
  initContainers:
  - name: log-collector
    image: fluentbit:latest
    restartPolicy: Always  # native sidecar (restartPolicy is only valid on init containers)

Other 1.35 Highlights

Topology Spread Constraints improvements: Better handling of unschedulable scenarios — whenUnsatisfiable: ScheduleAnyway now has smarter fallback behavior.

VolumeAttributesClass stable: Change storage performance characteristics (IOPS, throughput) of a PersistentVolume without re-provisioning — the storage equivalent of in-place pod resize.

# Change volume IOPS without re-provisioning
kubectl patch pvc database-pvc --type='merge' -p='
  {"spec": {"volumeAttributesClassName": "high-performance"}}'
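The class referenced by that patch is itself a small object. A sketch for an EBS-backed volume — the driver name and parameter keys are driver-specific assumptions, not universal:

```yaml
apiVersion: storage.k8s.io/v1
kind: VolumeAttributesClass
metadata:
  name: high-performance
driverName: ebs.csi.aws.com   # CSI driver; parameters below are interpreted by it
parameters:
  iops: "16000"
  throughput: "1000"
```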

Job success policy improvements: Declare a Job successful when a subset of pods complete successfully — for distributed training jobs where not all workers need to finish.
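A sketch of the success-policy shape for an indexed training job — Job name and image are illustrative; here the Job is declared successful once index 0 (a leader/coordinator) completes:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-training    # illustrative name
spec:
  completionMode: Indexed       # successPolicy requires an indexed Job
  completions: 8
  parallelism: 8
  successPolicy:
    rules:
    - succeededIndexes: "0"     # the coordinator index
      succeededCount: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: trainer:latest   # illustrative image
```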


What’s in Kubernetes 1.36 (April 22, 2026)

Kubernetes 1.36 is on track for April 22, 2026 release. Based on the enhancement tracking and KEP (Kubernetes Enhancement Proposal) pipeline, expected highlights include:

  • DRA continuing toward stable
  • Pod-level resources moving to beta
  • Scheduler improvements for AI/ML workload placement
  • Further Gateway API integration as core networking model

The project has reached a rhythm: four releases per year, each focused on advancing a predictable set of features through alpha → beta → stable. The drama of the 2019–2022 period (PSP, dockershim, API removals) is behind it.


The State of the Ecosystem in 2026

Control Plane Deployment Models

  Model                      Examples              Best For
  Managed (cloud provider)   GKE, EKS, AKS         Most organizations; no control plane ops
  Self-managed               kubeadm, k3s, Talos   Air-gapped, on-prem, specific compliance requirements
  Managed (platform)         Rancher, OpenShift    Enterprises that need multi-cluster management + vendor support

CNI Landscape

  CNI       Model              Notable Feature
  Cilium    eBPF               kube-proxy replacement, network policy at kernel, Hubble observability
  Calico    eBPF or iptables   BGP-based networking, hybrid cloud routing
  Flannel   VXLAN/host-gw      Simple, low overhead, no network policy
  Weave     Mesh overlay       Easy multi-host setup

eBPF-based CNIs (Cilium, Calico in eBPF mode) are now the default recommendation for production clusters. The iptables era of Kubernetes networking is ending.

Security Stack in 2026

A hardened Kubernetes cluster in 2026 runs:

Cluster provisioning:    Cluster API + GitOps (Flux/ArgoCD)
Admission control:       Pod Security Admission (restricted) + Kyverno or OPA/Gatekeeper
Runtime security:        Falco (eBPF-based syscall monitoring)
Network security:        Cilium NetworkPolicy + Cilium Cluster Mesh for multi-cluster
Image security:          Cosign signing in CI + admission webhook for signature verification
Secret management:       External Secrets Operator → HashiCorp Vault or cloud KMS
Observability:           Prometheus + Grafana + Hubble (network flows) + OpenTelemetry

The Permanent Principles That Haven’t Changed

Looking across twelve years and 35 minor versions, some things have not changed:

The API as the universal interface: Everything in Kubernetes is a resource. This remains the most important architectural decision — it makes every tool, every controller, every GitOps system work with the same model.

Reconciliation loops: Every Kubernetes controller watches actual state and drives it toward desired state. The controller pattern from 2014 is unchanged. CRDs and Operators are just more instances of it.

Labels and selectors: The flexible grouping mechanism from 1.0 is still the primary way Kubernetes components find each other. Services find pods. HPA finds Deployments. Operators find their managed resources.

Declarative, not imperative: You describe what you want. Kubernetes figures out how to achieve and maintain it. This principle, inherited from Borg’s BCL configuration, underlies everything from Deployments to Crossplane’s cloud resource management.


What’s Coming: The Next Five Years

WebAssembly on Kubernetes: The Wasm ecosystem (wasmCloud, SpinKube) is building toward running WebAssembly workloads as first-class Kubernetes pods — near-native performance, smaller images, stronger isolation than containers. Still early, but gaining real adoption.

AI inference as infrastructure: LLM serving is becoming a cluster primitive. Tools like KServe and vLLM on Kubernetes are moving from research to production. The scheduler, resource model, and networking will continue adapting to inference workload patterns.

Confidential computing: AMD SEV, Intel TDX, and ARM CCA provide hardware-level memory encryption for pods. The RuntimeClass mechanism and ongoing kernel work are making confidential Kubernetes workloads operational rather than experimental.

Leaner distributions: k3s, k0s, Talos, and Flatcar-based minimal Kubernetes distributions are growing in adoption for edge, IoT, and resource-constrained environments. The pressure is toward smaller, more auditable control planes.


Key Takeaways

  • In-place pod vertical scaling went from alpha (1.27) to stable (1.35) — live CPU and memory resize without pod restart changes the economics of stateful workload management
  • Gateway API v1.4 completes the ingress replacement story: BackendTLSPolicy, client certificate validation, and TLSRoute in standard channel
  • VolumeAttributesClass stable (1.35): Change storage performance in-place — the storage parallel to pod resource resize
  • The eBPF era of Kubernetes networking is established: Cilium as default CNI in GKE, growing in EKS/AKS, replacing iptables-based kube-proxy
  • The Kubernetes project in 2026 is focused on precision — promoting mature features to stable, reducing edge cases, improving scheduler efficiency — not adding new abstractions
  • WebAssembly, confidential computing, and AI inference scheduling are the frontiers to watch

Series Wrap-Up

  Era         Defining Change
  2003–2014   Borg and Omega build the playbook internally at Google
  2014–2016   Kubernetes 1.0, CNCF, and winning the container orchestration wars
  2016–2018   RBAC stable, CRDs, cloud providers all-in on managed K8s
  2018–2020   Operators, service mesh, OPA/Gatekeeper — the extensibility era
  2020–2022   Supply chain crisis, PSP deprecated, API removals, dockershim exit
  2022–2023   Dockershim and PSP removed, eBPF networking takes over
  2023–2025   GitOps standard, sidecar stable, DRA, AI/ML workloads
  2025–2026   In-place resize GA, VolumeAttributesClass, Gateway API complete

From 47,501 lines of Go in a 250-file GitHub commit to the operating system of the cloud — and still reconciling.


← EP07: Platform Engineering Era

Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com