Kubernetes CRDs in Production: Finalizers, Status Conditions, and RBAC Patterns

Reading Time: 8 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 10
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Finalizers block deletion until cleanup completes — they prevent orphaned external resources but cause stuck objects if the controller crashes mid-cleanup; always implement a removal timeout
  • Status conditions are the standard communication channel between controller and user: use type, status, reason, message, and observedGeneration on every condition; never invent ad-hoc status fields
  • Owner references wire automatic garbage collection — when the parent custom resource is deleted, Kubernetes deletes owned child objects; use them for every object your controller creates in the same namespace
  • RBAC for CRDs in multi-tenant clusters must include separate ClusterRoles for controller, editor, and viewer; grant status and finalizers as separate sub-resources; never give application teams cluster-scoped create/delete on CRDs
  • The three most common Kubernetes CRD production failure modes: finalizer death loop, status thrash, and CRD deletion cascade — all avoidable with the patterns in this episode
  • Every CRD on a healthy cluster should report the Established condition as True — a CRD that never becomes Established rejects all create requests for its resources

The Big Picture

  PRODUCTION CRD LIFECYCLE: FULL PICTURE

  Create            Reconcile           Suspend/Resume        Delete
  ──────            ─────────           ──────────────        ──────
  User applies      Controller          User patches          User deletes
  BackupPolicy      creates CronJob,    spec.suspended=true   BackupPolicy
      │             sets status             │                     │
      ▼                 │                   ▼                     ▼
  Admission             │               Controller            Finalizer blocks
  webhook               │               suspends CronJob      deletion
  (if any)              │                                     Controller:
      │                 │                                       1. Delete CronJob
      ▼                 ▼                                       2. Remove external state
  Schema            Status                                      3. Remove finalizer
  validation        conditions                                Object deleted from etcd
      │             updated
      ▼
  Controller
  reconcile
  triggered

Kubernetes CRD production readiness is not just about making the happy path work — it is about designing for the failure modes: controllers crashing mid-operation, deletion races, and status messages that confuse operators at 2am.


Finalizers: Controlled Deletion

A finalizer is a string in metadata.finalizers. Kubernetes will not delete an object that has finalizers, regardless of who issues the delete command.

metadata:
  name: nightly
  namespace: demo
  finalizers:
    - storage.example.com/backup-cleanup  # ← your controller put this here

When kubectl delete bp nightly runs:

  1. API server sets metadata.deletionTimestamp  (does NOT delete yet)
  2. Object is visible as "Terminating"
  3. Controller sees deletionTimestamp set
  4. Controller runs cleanup:
       - delete backup data from S3
       - delete CronJob (or let owner references handle it)
       - release any external locks
  5. Controller removes the finalizer:
       patch bp nightly --type=json \
         -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'
  6. API server sees finalizers list is now empty → deletes the object

Adding a finalizer in Go

const finalizerName = "storage.example.com/backup-cleanup"

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Deletion path
    if !bp.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(bp, finalizerName) {
            if err := r.cleanupExternalResources(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(bp, finalizerName)
            if err := r.Update(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Normal path: ensure finalizer is present
    if !controllerutil.ContainsFinalizer(bp, finalizerName) {
        controllerutil.AddFinalizer(bp, finalizerName)
        if err := r.Update(ctx, bp); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... rest of reconcile: ensure the CronJob exists, update status conditions
    return ctrl.Result{}, nil
}

Finalizer death loop and the timeout pattern

If cleanupExternalResources always returns an error (external system down, bug in cleanup code), the object gets stuck in Terminating forever. The operator cannot delete it; kubectl delete --force does not help with finalizers.

Prevention: add a cleanup deadline with status tracking.

func (r *BackupPolicyReconciler) cleanupExternalResources(ctx context.Context, bp *storagev1alpha1.BackupPolicy) error {
    // Check if we've been trying to clean up for too long
    if bp.DeletionTimestamp != nil {
        deadline := bp.DeletionTimestamp.Add(10 * time.Minute)
        if time.Now().After(deadline) {
            // Log the failure, abandon cleanup, let the object be deleted.
            log.FromContext(ctx).Error(nil, "cleanup deadline exceeded, removing finalizer anyway",
                "name", bp.Name)
            return nil   // returning nil removes the finalizer
        }
    }
    // ... actual cleanup
}

Recovery for a stuck object (use only when cleanup truly cannot succeed):

kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]'

Status Conditions: The Right Way

The Kubernetes standard condition format is defined in k8s.io/apimachinery/pkg/apis/meta/v1.Condition:

type Condition struct {
    Type               string          // e.g. "Ready", "Synced", "Degraded"
    Status             ConditionStatus // "True", "False", "Unknown"
    ObservedGeneration int64           // the .metadata.generation this condition reflects
    LastTransitionTime metav1.Time     // when Status last changed
    Reason             string          // machine-readable, CamelCase, e.g. "CronJobCreated"
    Message            string          // human-readable, may contain details
}

Standard condition types

  Type          Meaning
  ────          ───────
  Ready         The resource is fully reconciled and operational
  Synced        The resource has been synced with an external system
  Progressing   An operation is actively in progress
  Degraded      The resource is operating in a reduced capacity

Use Ready: True only when the full reconcile is complete and the resource is functional. Use Ready: False with a clear Message when reconcile fails or is blocked.

Setting conditions in Go

meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    ObservedGeneration: bp.Generation,
    Reason:             "CronJobCreateFailed",
    Message:            fmt.Sprintf("failed to create CronJob: %v", err),
})

meta.SetStatusCondition handles deduplication — it updates an existing condition of the same Type rather than appending a duplicate.
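To see what that implies, here is a stdlib-only model of the behavior — a hypothetical local Condition type standing in for metav1.Condition, not the real apimachinery implementation. Note that LastTransitionTime is only refreshed when Status actually flips:

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	ObservedGeneration int64
	LastTransitionTime time.Time
}

// setStatusCondition models meta.SetStatusCondition: update the existing
// condition of the same Type in place (never append a duplicate), and only
// refresh LastTransitionTime when Status actually changes.
func setStatusCondition(conds *[]Condition, c Condition) {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			if (*conds)[i].Status != c.Status {
				c.LastTransitionTime = time.Now()
			} else {
				c.LastTransitionTime = (*conds)[i].LastTransitionTime
			}
			(*conds)[i] = c
			return
		}
	}
	c.LastTransitionTime = time.Now()
	*conds = append(*conds, c)
}

func main() {
	var conds []Condition
	setStatusCondition(&conds, Condition{Type: "Ready", Status: "False", Reason: "Reconciling"})
	setStatusCondition(&conds, Condition{Type: "Ready", Status: "True", Reason: "CronJobCreated"})
	fmt.Println(len(conds)) // still one "Ready" condition, not two
}
```

Preserving LastTransitionTime across same-status writes is what makes "how long has this been Degraded?" queries meaningful.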

observedGeneration is critical

metadata.generation       = 5   (increments on every spec change)
status.observedGeneration = 3   (set by controller on each reconcile)

If observedGeneration < generation:
  → controller has not yet reconciled the latest spec change
  → status.conditions reflect an older state
  → do NOT alert based on conditions that lag generation

Always set ObservedGeneration: bp.Generation when writing status conditions. Tooling (Argo CD, Flux, kubectl wait) depends on this to know whether status is current.
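The staleness rule fits in a one-line helper. A stdlib sketch (hypothetical function name, plain int64 generations):

```go
package main

import "fmt"

// conditionIsCurrent reports whether a condition can be trusted: it must
// reflect the generation the controller last observed. If observedGeneration
// lags metadata.generation, the spec has changed since the controller wrote
// the condition, so the condition describes an older state.
func conditionIsCurrent(generation, observedGeneration int64) bool {
	return observedGeneration >= generation
}

func main() {
	fmt.Println(conditionIsCurrent(5, 3)) // false: spec changed, status is stale
	fmt.Println(conditionIsCurrent(5, 5)) // true: status reflects the latest spec
}
```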

kubectl wait uses conditions

# Wait until BackupPolicy is Ready
kubectl wait bp/nightly -n demo \
  --for=condition=Ready \
  --timeout=60s

This works because kubectl wait reads the status.conditions array.
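The check kubectl wait effectively performs on each watch event is a lookup-and-compare. A minimal sketch with a hypothetical Condition type (not kubectl's actual implementation):

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for a status condition entry.
type Condition struct {
	Type   string
	Status string
}

// conditionMet mirrors what --for=condition=Ready checks: find the condition
// with the requested Type and compare its Status to the wanted value
// (kubectl defaults the wanted value to "True").
func conditionMet(conds []Condition, condType, want string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == want
		}
	}
	return false // condition not present yet → keep waiting
}

func main() {
	conds := []Condition{{Type: "Ready", Status: "True"}}
	fmt.Println(conditionMet(conds, "Ready", "True")) // true
}
```

This is also why an absent Ready condition keeps kubectl wait blocked until timeout: a controller that never writes conditions looks identical to one that is still reconciling.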


Owner References: Automatic Garbage Collection

Owner references wire a parent-child relationship between Kubernetes objects. When the parent is deleted, Kubernetes garbage-collects all owned children automatically.

metadata:
  name: nightly-backup       # CronJob
  ownerReferences:
    - apiVersion: storage.example.com/v1alpha1
      kind: BackupPolicy
      name: nightly
      uid: a1b2c3d4-...
      controller: true          # only one owner can be the controller
      blockOwnerDeletion: true  # with foreground deletion, the owner is not removed until this child is gone

Set in Go using ctrl.SetControllerReference:

if err := ctrl.SetControllerReference(bp, cronJob, r.Scheme); err != nil {
    return ctrl.Result{}, err
}

Owner reference rules

  • Cross-namespace owner references are disallowed — a namespaced owner must be in the same namespace as the dependent, and a cluster-scoped dependent can only have cluster-scoped owners
  • Only one object can be the controller: true owner; others can be non-controller owners
  • Deleting the owner cascades to deleting owned objects — this is garbage collection, not finalizer-based cleanup
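These scope rules can be encoded as a small validation helper. A sketch with a hypothetical ref type — an illustration of the documented garbage-collection rules, not the API server's actual admission logic:

```go
package main

import (
	"errors"
	"fmt"
)

// ref describes just enough of an owner or dependent to check the
// garbage-collection scope rules (hypothetical type for illustration).
type ref struct {
	namespace     string // ignored for cluster-scoped objects
	clusterScoped bool
}

// validateOwnerRef encodes the rules: cluster-scoped dependents may only
// have cluster-scoped owners, and namespaced owners must live in the same
// namespace as the dependent.
func validateOwnerRef(owner, dependent ref) error {
	if dependent.clusterScoped && !owner.clusterScoped {
		return errors.New("cluster-scoped dependent cannot have a namespaced owner")
	}
	if !owner.clusterScoped && !dependent.clusterScoped && owner.namespace != dependent.namespace {
		return errors.New("cross-namespace owner references are disallowed")
	}
	return nil
}

func main() {
	err := validateOwnerRef(ref{namespace: "demo"}, ref{namespace: "other"})
	fmt.Println(err != nil) // true: cross-namespace reference is rejected
}
```

A violating reference is not rejected at admission time — the GC treats the owner as absent and may delete the dependent, which is why getting these rules right matters.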

Without owner references, deleting a BackupPolicy leaves the CronJob as an orphan. This is hard to detect and accumulates over time.
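Detecting orphans comes down to scanning for objects with an empty ownerReferences list. A sketch over hypothetical decoded metadata (real code would walk the typed CronJob list from the API):

```go
package main

import "fmt"

// objMeta is a hypothetical minimal mirror of object metadata —
// just the fields the orphan scan needs.
type objMeta struct {
	Name            string
	OwnerReferences []string // owner names only; enough for the sketch
}

// orphans returns the names of objects with no ownerReferences at all —
// candidates a controller created but never adopted.
func orphans(items []objMeta) []string {
	var out []string
	for _, m := range items {
		if len(m.OwnerReferences) == 0 {
			out = append(out, m.Name)
		}
	}
	return out
}

func main() {
	items := []objMeta{
		{Name: "nightly-backup", OwnerReferences: []string{"nightly"}},
		{Name: "stale-backup"},
	}
	fmt.Println(orphans(items)) // [stale-backup]
}
```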


RBAC Patterns for Multi-Tenant CRD Usage

A production CRD deployment needs three distinct RBAC roles:

# 1. Controller role — full access for the operator
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# 2. Editor role — for application teams (namespaced binding)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # No status write — only the controller writes status
  # No finalizers write — prevents deletion blocking by non-controllers
---
# 3. Viewer role — for audit, monitoring
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]

Bind editor/viewer roles at namespace scope, not cluster scope:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-backup-editor
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: backuppolicy-editor
  apiGroup: rbac.authorization.k8s.io

This pattern gives team-alpha full control over BackupPolicies in their namespace but no access to other namespaces — standard Kubernetes multi-tenancy.


The Three Production Failure Modes

1. Finalizer death loop

Symptoms: Object stuck in Terminating for hours; kubectl get bp nightly -o yaml shows metadata.deletionTimestamp set, but the object still exists.

Cause: cleanupExternalResources always returns an error.

Detection:

kubectl get bp nightly -n demo -o jsonpath='{.metadata.deletionTimestamp}'
# non-empty = stuck in termination
kubectl describe bp nightly -n demo
# look for repeated reconcile error events

Fix: Add cleanup deadline in controller; use kubectl patch to remove finalizer as last resort.

2. Status thrash

Symptoms: Controller sets Ready: True, then Ready: False, then Ready: True in a rapid loop. Alert noise, confusing dashboards.

Cause: Each reconcile compares actual state incorrectly due to cache lag — it sees its own status write as a change, re-reconciles, and flips the status again.

Fix: Set ObservedGeneration on every condition. Compare generation with observedGeneration before re-reconciling. Use meta.IsStatusConditionTrue to check current condition before overwriting it with the same value.

// Only update status if it changed
current := meta.FindStatusCondition(bp.Status.Conditions, "Ready")
if current == nil || current.Status != desired.Status || current.Reason != desired.Reason {
    meta.SetStatusCondition(&bpCopy.Status.Conditions, desired)
    if err := r.Status().Update(ctx, bpCopy); err != nil {
        return ctrl.Result{}, err
    }
}

3. CRD deletion cascade

Symptoms: A team deletes a CRD for cleanup purposes; all instances across all namespaces disappear silently.

Cause: kubectl delete crd backuppolicies.storage.example.com — the API server cascades the deletion to all custom resources of that type.

Prevention:
– Protect production CRDs from accidental deletion — e.g. Helm's helm.sh/resource-policy: keep annotation, or an admission policy that denies DELETE on CustomResourceDefinitions
– Use GitOps (Argo CD, Flux) to manage CRD installation — a deleted CRD is automatically re-applied from the Git source
– Back up CRDs and instances with velero or equivalent before any CRD management operations


Production Readiness Checklist

CRD DEFINITION
  □ spec.versions has exactly one storage: true version
  □ Status subresource enabled (subresources.status: {})
  □ additionalPrinterColumns includes Ready column from status.conditions
  □ OpenAPI schema defines required fields and types
  □ CEL rules cover cross-field constraints

CONTROLLER
  □ Owner references set on all child resources
  □ Finalizer logic includes cleanup deadline
  □ Status conditions use standard format with observedGeneration
  □ Reconcile function is idempotent
  □ Not-found errors handled cleanly (return nil, not error)
  □ At least 2 replicas with leader election enabled

RBAC
  □ Three ClusterRoles: controller, editor, viewer
  □ Status and finalizers are separate RBAC sub-resources
  □ Editor/viewer bound at namespace scope, not cluster scope
  □ Controller ServiceAccount has only necessary permissions

OPERATIONS
  □ CRD installed via GitOps or Helm (not manual kubectl apply)
  □ Backup of CRDs and instances included in cluster backup
  □ All CRDs report the Established condition as True (see the jsonpath check in Quick Reference)
  □ Monitoring for stuck Terminating objects (finalizer deadlock)
  □ Alert on controller reconcile error rate, not just pod health

⚠ Common Mistakes

Granting update on backuppolicies but not backuppolicies/status to the controller. If the controller cannot write status, status updates silently fail. The controller appears to run but status conditions never update. Grant both backuppolicies (for spec/metadata writes) and backuppolicies/status (for the status subresource path).

Setting Ready: True before all owned resources are healthy. If the controller sets Ready: True after creating the CronJob but before verifying the CronJob is actually active, users see a false-positive health signal. Only set Ready: True when you have confirmed the desired state is actually achieved.

Not setting observedGeneration on status conditions. Tools like Argo CD and kubectl wait --for=condition=Ready will report incorrect health status if observedGeneration is stale. Always set ObservedGeneration: obj.Generation in every condition write.

Using kubectl delete crd in a production cluster without a backup. This is irreversible. Treat CRDs as production-critical infrastructure — require GitOps review, backup verification, and team approval before any CRD deletion.


Quick Reference

# Check for stuck Terminating objects
# (metadata.deletionTimestamp is not a supported field selector, so filter client-side)
kubectl get backuppolicies -A -o json \
  | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | "\(.metadata.namespace)/\(.metadata.name)"'

# Force-remove a stuck finalizer (use only when cleanup is truly impossible)
kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'

# Check all CRDs are Established
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name} {.status.conditions[?(@.type=="Established")].status}{"\n"}{end}'

# Watch status conditions update during reconcile
kubectl get bp nightly -n demo -w -o \
  jsonpath='{.status.conditions[?(@.type=="Ready")].status} {.status.conditions[?(@.type=="Ready")].message}{"\n"}'

# Verify owner references are set on child CronJob
kubectl get cronjob nightly-backup -n demo \
  -o jsonpath='{.metadata.ownerReferences}'

# List all objects owned by a BackupPolicy (by label)
kubectl get all -n demo -l backuppolicy=nightly

Key Takeaways

  • Finalizers block deletion until cleanup completes — always implement a cleanup deadline to prevent permanent stuck objects
  • Status conditions must use the standard format with observedGeneration — tooling depends on it for correctness
  • Owner references enable automatic garbage collection of child resources when the parent is deleted
  • RBAC needs three roles (controller, editor, viewer) with status and finalizers as separate sub-resources
  • The three production failure modes — finalizer death loop, status thrash, CRD deletion cascade — are all preventable with the patterns covered in this episode

Series Complete

You now have the full picture of Kubernetes CRDs and Operators: from understanding what a CRD is (EP01), through real examples (EP02), schema design (EP03), hands-on YAML (EP04), CEL validation (EP05), the controller loop (EP06), building an operator (EP07), versioning (EP08), admission webhooks (EP09), to production patterns in this episode.

The next series in the Kubernetes learning arc on linuxcent.com covers Kubernetes Networking Deep Dive — Services, Ingress, Gateway API, CNI, and eBPF networking. Subscribe below to get it when it launches.

Stay subscribed → linuxcent.com
