Kubernetes CRDs & Operators: Extending the API, Episode 10
TL;DR
- Finalizers block deletion until cleanup completes — they prevent orphaned external resources but cause stuck objects if the controller crashes mid-cleanup; always implement a cleanup deadline
- Status conditions are the standard communication channel between controller and user: set type, status, reason, message, and observedGeneration on every condition; never invent ad-hoc status fields
- Owner references wire automatic garbage collection — when the parent custom resource is deleted, Kubernetes deletes owned child objects; use them for every object your controller creates in the same namespace
- RBAC for CRDs in multi-tenant clusters must include separate ClusterRoles for controller, editor, and viewer; grant status and finalizers as separate sub-resources; never give application teams cluster-scoped create/delete on CRDs
- The three most common CRD production failure modes: finalizer death loop, status thrash, and CRD deletion cascade — all avoidable with the patterns in this episode
- Every CRD on a healthy cluster should report the Established condition as True; a non-Established CRD silently rejects all create requests
The Big Picture
PRODUCTION CRD LIFECYCLE: FULL PICTURE

Create              Reconcile           Suspend/Resume        Delete
──────              ─────────           ──────────────        ──────
User applies        Controller          User patches          User deletes
BackupPolicy        creates CronJob,    spec.suspended=true   BackupPolicy
     │              sets status              │                     │
     ▼                   │                   ▼                     ▼
Admission                │              Controller            Finalizer blocks
webhook                  │              suspends CronJob      deletion
(if any)                 │                                         │
     │                   │                                         ▼
     ▼                   ▼                                    Controller:
Schema              Status                                    1. Delete CronJob
validation          conditions                                2. Remove external state
     │              updated                                   3. Remove finalizer
     ▼                                                             │
Controller                                                         ▼
reconcile                                                     Object deleted
triggered                                                     from etcd
Kubernetes CRD production readiness is not just about making the happy path work — it is about designing for the failure modes: controllers crashing mid-operation, deletion races, and status messages that confuse operators at 2am.
Finalizers: Controlled Deletion
A finalizer is a string in metadata.finalizers. Kubernetes will not delete an object that has finalizers, regardless of who issues the delete command.
metadata:
  name: nightly
  namespace: demo
  finalizers:
  - storage.example.com/backup-cleanup   # ← your controller put this here
When kubectl delete bp nightly runs:
1. API server sets metadata.deletionTimestamp (does NOT delete yet)
2. Object is visible as "Terminating"
3. Controller sees deletionTimestamp set
4. Controller runs cleanup:
- delete backup data from S3
- delete CronJob (or let owner references handle it)
- release any external locks
5. Controller removes the finalizer, the programmatic equivalent of:
   kubectl patch bp nightly -n demo --type=json \
     -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'
6. API server sees finalizers list is now empty → deletes the object
Adding a finalizer in Go
const finalizerName = "storage.example.com/backup-cleanup"

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Deletion path
    if !bp.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(bp, finalizerName) {
            if err := r.cleanupExternalResources(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(bp, finalizerName)
            if err := r.Update(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Normal path: ensure finalizer is present
    if !controllerutil.ContainsFinalizer(bp, finalizerName) {
        controllerutil.AddFinalizer(bp, finalizerName)
        if err := r.Update(ctx, bp); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... rest of reconcile
}
Finalizer death loop and the timeout pattern
If cleanupExternalResources always returns an error (external system down, a bug in the cleanup code), the object is stuck in Terminating forever. It cannot be deleted by hand, and kubectl delete --force does not bypass finalizers.
Prevention: add a cleanup deadline with status tracking.
func (r *BackupPolicyReconciler) cleanupExternalResources(ctx context.Context, bp *storagev1alpha1.BackupPolicy) error {
    // Check if we've been trying to clean up for too long
    if bp.DeletionTimestamp != nil {
        deadline := bp.DeletionTimestamp.Add(10 * time.Minute)
        if time.Now().After(deadline) {
            // Log the failure, abandon cleanup, let the object be deleted.
            log.FromContext(ctx).Error(nil, "cleanup deadline exceeded, removing finalizer anyway",
                "name", bp.Name)
            return nil // returning nil lets the caller remove the finalizer
        }
    }
    // ... actual cleanup
}
Recovery for a stuck object (use only when cleanup truly cannot succeed):
kubectl patch bp nightly -n demo --type=json \
-p '[{"op":"remove","path":"/metadata/finalizers"}]'
Status Conditions: The Right Way
The Kubernetes standard condition format is defined in k8s.io/apimachinery/pkg/apis/meta/v1.Condition:
type Condition struct {
    Type               string          // e.g. "Ready", "Synced", "Degraded"
    Status             ConditionStatus // "True", "False", "Unknown"
    ObservedGeneration int64           // the .metadata.generation this condition reflects
    LastTransitionTime metav1.Time     // when Status last changed
    Reason             string          // machine-readable, CamelCase, e.g. "CronJobCreated"
    Message            string          // human-readable, may contain details
}
Standard condition types
| Type | Meaning |
|---|---|
| Ready | The resource is fully reconciled and operational |
| Synced | The resource has been synced with an external system |
| Progressing | An operation is actively in progress |
| Degraded | The resource is operating in a reduced capacity |
Use Ready: True only when the full reconcile is complete and the resource is functional. Use Ready: False with a clear Message when reconcile fails or is blocked.
Setting conditions in Go
meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    ObservedGeneration: bp.Generation,
    Reason:             "CronJobCreateFailed",
    Message:            fmt.Sprintf("failed to create CronJob: %v", err),
})
meta.SetStatusCondition handles deduplication — it updates an existing condition of the same Type rather than appending a duplicate, and it refreshes LastTransitionTime only when Status actually changes.
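A minimal sketch of how this looks in practice, assuming the usual imports (k8s.io/apimachinery/pkg/api/meta and metav1); setReady is a hypothetical helper, not a library function:

// Hypothetical helper: derive the Ready condition from the reconcile
// outcome and let meta.SetStatusCondition handle deduplication.
func setReady(bp *storagev1alpha1.BackupPolicy, reconcileErr error) {
    cond := metav1.Condition{
        Type:               "Ready",
        Status:             metav1.ConditionTrue,
        ObservedGeneration: bp.Generation,
        Reason:             "ReconcileSucceeded",
        Message:            "all owned resources are healthy",
    }
    if reconcileErr != nil {
        cond.Status = metav1.ConditionFalse
        cond.Reason = "ReconcileFailed"
        cond.Message = reconcileErr.Error()
    }
    meta.SetStatusCondition(&bp.Status.Conditions, cond)
}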
observedGeneration is critical
metadata.generation = 5 (increments on every spec change)
status.observedGeneration = 3 (set by controller on each reconcile)
If observedGeneration < generation:
→ controller has not yet reconciled the latest spec change
→ status.conditions reflect an older state
→ do NOT alert based on conditions that lag generation
Always set ObservedGeneration: bp.Generation when writing status conditions. Tooling (Argo CD, Flux, kubectl wait) depends on this to know whether status is current.
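A sketch of the guard this implies, using the same condition helpers (isStatusCurrent is a hypothetical name):

// Hypothetical helper: trust conditions only when they describe the
// generation of the spec we are currently looking at.
func isStatusCurrent(bp *storagev1alpha1.BackupPolicy) bool {
    ready := meta.FindStatusCondition(bp.Status.Conditions, "Ready")
    if ready == nil {
        return false // never reconciled yet
    }
    // A condition written for an older generation describes an older
    // spec; do not alert or gate automation on it.
    return ready.ObservedGeneration == bp.Generation
}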
kubectl wait uses conditions
# Wait until BackupPolicy is Ready
kubectl wait bp/nightly -n demo \
--for=condition=Ready \
--timeout=60s
This works because kubectl wait reads the status.conditions array.
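The programmatic equivalent polls the same array. A sketch with apimachinery's wait helpers and a controller-runtime client (the c client and ctx are assumed from context):

// Poll until Ready is True, the same check kubectl wait performs
// against status.conditions.
err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 60*time.Second, true,
    func(ctx context.Context) (bool, error) {
        bp := &storagev1alpha1.BackupPolicy{}
        if err := c.Get(ctx, types.NamespacedName{Namespace: "demo", Name: "nightly"}, bp); err != nil {
            return false, client.IgnoreNotFound(err) // keep polling if not created yet
        }
        return meta.IsStatusConditionTrue(bp.Status.Conditions, "Ready"), nil
    })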
Owner References: Automatic Garbage Collection
Owner references wire a parent-child relationship between Kubernetes objects. When the parent is deleted, Kubernetes garbage-collects all owned children automatically.
metadata:
  name: nightly-backup                # CronJob
  ownerReferences:
  - apiVersion: storage.example.com/v1alpha1
    kind: BackupPolicy
    name: nightly
    uid: a1b2c3d4-...
    controller: true                  # only one owner can be the controller
    blockOwnerDeletion: true          # owner's foreground deletion waits for this child
Set in Go using ctrl.SetControllerReference:
if err := ctrl.SetControllerReference(bp, cronJob, r.Scheme); err != nil {
return ctrl.Result{}, err
}
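In a reconciler this usually happens inside the create-or-update flow. A sketch with controllerutil.CreateOrUpdate, where buildCronJobSpec is a hypothetical helper deriving the CronJob spec from the BackupPolicy:

cronJob := &batchv1.CronJob{
    ObjectMeta: metav1.ObjectMeta{Name: bp.Name + "-backup", Namespace: bp.Namespace},
}
op, err := controllerutil.CreateOrUpdate(ctx, r.Client, cronJob, func() error {
    cronJob.Spec = buildCronJobSpec(bp) // hypothetical helper
    // Fails if a different controller already owns this object.
    return ctrl.SetControllerReference(bp, cronJob, r.Scheme)
})
if err != nil {
    return ctrl.Result{}, err
}
log.FromContext(ctx).Info("reconciled child CronJob", "operation", op)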
Owner reference rules
- Cross-namespace owner references are disallowed — a namespaced owner must be in the same namespace as the object it owns
- Only one owner reference can set controller: true; the others can be non-controller owners
- Deleting the owner cascades to deleting owned objects — this is garbage collection, not finalizer-based cleanup
Without owner references, deleting a BackupPolicy leaves the CronJob as an orphan. This is hard to detect and accumulates over time.
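Orphans are easy to spot once you look for missing controller owners. A sketch that lists CronJobs without one (the demo namespace is assumed):

var jobs batchv1.CronJobList
if err := r.List(ctx, &jobs, client.InNamespace("demo")); err != nil {
    return err
}
for _, cj := range jobs.Items {
    if metav1.GetControllerOf(&cj) == nil {
        log.FromContext(ctx).Info("orphaned CronJob: no controller owner", "name", cj.Name)
    }
}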
RBAC Patterns for Multi-Tenant CRD Usage
A production CRD deployment needs three distinct RBAC roles:
# 1. Controller role — full access for the operator
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
- apiGroups: ["storage.example.com"]
  resources: ["backuppolicies"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["storage.example.com"]
  resources: ["backuppolicies/status"]
  verbs: ["get", "update", "patch"]
- apiGroups: ["storage.example.com"]
  resources: ["backuppolicies/finalizers"]
  verbs: ["update"]
- apiGroups: ["batch"]
  resources: ["cronjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# 2. Editor role — for application teams (namespaced binding)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
- apiGroups: ["storage.example.com"]
  resources: ["backuppolicies"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # No status write — only the controller writes status
  # No finalizers write — prevents deletion blocking by non-controllers
---
# 3. Viewer role — for audit, monitoring
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
- apiGroups: ["storage.example.com"]
  resources: ["backuppolicies"]
  verbs: ["get", "list", "watch"]
Bind editor/viewer roles at namespace scope, not cluster scope:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-backup-editor
  namespace: team-alpha
subjects:
- kind: Group
  name: team-alpha
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: backuppolicy-editor
  apiGroup: rbac.authorization.k8s.io
This pattern gives team-alpha full control over BackupPolicies in their namespace but no access to other namespaces — standard Kubernetes multi-tenancy.
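To confirm a binding does what you intend, ask the API server. A sketch using client-go's SubjectAccessReview, the programmatic form of kubectl auth can-i (the clientset variable and the user name are assumptions):

sar := &authorizationv1.SubjectAccessReview{
    Spec: authorizationv1.SubjectAccessReviewSpec{
        User:   "alice",                // hypothetical member of team-alpha
        Groups: []string{"team-alpha"},
        ResourceAttributes: &authorizationv1.ResourceAttributes{
            Namespace: "team-alpha",
            Verb:      "create",
            Group:     "storage.example.com",
            Resource:  "backuppolicies",
        },
    },
}
resp, err := clientset.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
if err != nil {
    return err
}
// Expect Allowed=true here and Allowed=false for any other namespace.
fmt.Println("allowed:", resp.Status.Allowed)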
The Three Production Failure Modes
1. Finalizer death loop
Symptoms: Object stuck in Terminating for hours; kubectl get bp nightly shows DeletionTimestamp set but object exists.
Cause: cleanupExternalResources always returns an error.
Detection:
kubectl get bp nightly -n demo -o jsonpath='{.metadata.deletionTimestamp}'
# non-empty = stuck in termination
kubectl describe bp nightly -n demo
# look for repeated reconcile error events
Fix: Add cleanup deadline in controller; use kubectl patch to remove finalizer as last resort.
2. Status thrash
Symptoms: Controller sets Ready: True, then Ready: False, then Ready: True in a rapid loop. Alert noise, confusing dashboards.
Cause: Each reconcile compares actual state incorrectly due to cache lag — it sees its own status write as a change, re-reconciles, and flips the status again.
Fix: Set ObservedGeneration on every condition. Compare generation with observedGeneration before re-reconciling. Use meta.IsStatusConditionTrue to check current condition before overwriting it with the same value.
// Only update status if it actually changed
current := meta.FindStatusCondition(bp.Status.Conditions, "Ready")
if current == nil || current.Status != desired.Status || current.Reason != desired.Reason {
    meta.SetStatusCondition(&bpCopy.Status.Conditions, desired)
    if err := r.Status().Update(ctx, bpCopy); err != nil {
        return ctrl.Result{}, err
    }
}
3. CRD deletion cascade
Symptoms: A team deletes a CRD for cleanup purposes; all instances across all namespaces disappear silently.
Cause: kubectl delete crd backuppolicies.storage.example.com — the API server cascades the deletion to all custom resources of that type.
Prevention:
- Block CRD deletion at admission with a ValidatingAdmissionPolicy or webhook that denies DELETE on your production CRDs
- Use GitOps (Argo CD, Flux) to manage CRD installation — a deleted CRD is automatically re-applied from the Git source, though the deleted instances are not restored
- Back up CRDs and instances with Velero or equivalent before any CRD management operations
Production Readiness Checklist
CRD DEFINITION
□ spec.versions has exactly one storage: true version
□ Status subresource enabled (subresources.status: {})
□ additionalPrinterColumns includes Ready column from status.conditions
□ OpenAPI schema defines required fields and types
□ CEL rules cover cross-field constraints
CONTROLLER
□ Owner references set on all child resources
□ Finalizer logic includes cleanup deadline
□ Status conditions use standard format with observedGeneration
□ Reconcile function is idempotent
□ Not-found errors handled cleanly (return nil, not error)
□ At least 2 replicas with leader election enabled
RBAC
□ Three ClusterRoles: controller, editor, viewer
□ Status and finalizers are separate RBAC sub-resources
□ Editor/viewer bound at namespace scope, not cluster scope
□ Controller ServiceAccount has only necessary permissions
OPERATIONS
□ CRD installed via GitOps or Helm (not manual kubectl apply)
□ Backup of CRDs and instances included in cluster backup
□ All CRDs report the Established condition as True (see Quick Reference for the check)
□ Monitoring for stuck Terminating objects (finalizer deadlock)
□ Alert on controller reconcile error rate, not just pod health
⚠ Common Mistakes
Granting update on backuppolicies but not backuppolicies/status to the controller. Status writes then fail with a Forbidden error; if the controller drops that error, the failure is invisible: the controller appears to run, but status conditions never update. Grant both backuppolicies (for spec/metadata writes) and backuppolicies/status (for the status subresource path), and propagate the error as sketched below.
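A minimal guard, so an RBAC gap surfaces in logs instead of passing silently:

if err := r.Status().Update(ctx, bpCopy); err != nil {
    // A missing backuppolicies/status rule shows up here as Forbidden.
    log.FromContext(ctx).Error(err, "status update failed (check RBAC for backuppolicies/status)")
    return ctrl.Result{}, err
}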
Setting Ready: True before all owned resources are healthy. If the controller sets Ready: True after creating the CronJob but before verifying the CronJob is actually active, users see a false-positive health signal. Only set Ready: True when you have confirmed the desired state is actually achieved.
Not setting observedGeneration on status conditions. Health tooling built on the kstatus conventions (Flux, and Argo CD health checks that read conditions) can report the wrong status when observedGeneration lags metadata.generation. Always set ObservedGeneration: obj.Generation in every condition write.
Using kubectl delete crd in a production cluster without a backup. This is irreversible. Treat CRDs as production-critical infrastructure — require GitOps review, backup verification, and team approval before any CRD deletion.
Quick Reference
# Check for stuck Terminating objects (custom resources don't support
# a deletionTimestamp field selector, so filter client-side)
kubectl get backuppolicies -A -o json \
  | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | "\(.metadata.namespace)/\(.metadata.name)"'
# Force-remove a stuck finalizer (use only when cleanup is truly impossible)
kubectl patch bp nightly -n demo --type=json \
-p '[{"op":"remove","path":"/metadata/finalizers/0"}]'
# Check all CRDs are Established
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name} {.status.conditions[?(@.type=="Established")].status}{"\n"}{end}'
# Watch status conditions update during reconcile
kubectl get bp nightly -n demo -w -o \
jsonpath='{.status.conditions[?(@.type=="Ready")].status} {.status.conditions[?(@.type=="Ready")].message}{"\n"}'
# Verify owner references are set on child CronJob
kubectl get cronjob nightly-backup -n demo \
-o jsonpath='{.metadata.ownerReferences}'
# List all objects owned by a BackupPolicy (assumes the controller
# labels its children with backuppolicy=<name>)
kubectl get all -n demo -l backuppolicy=nightly
Key Takeaways
- Finalizers block deletion until cleanup completes — always implement a cleanup deadline to prevent permanently stuck objects
- Status conditions must use the standard format with observedGeneration — tooling depends on it for correctness
- Owner references enable automatic garbage collection of child resources when the parent is deleted
- RBAC needs three roles (controller, editor, viewer) with status and finalizers as separate sub-resources
- The three production failure modes — finalizer death loop, status thrash, CRD deletion cascade — are all preventable with the patterns covered in this episode
Series Complete
You now have the full picture of Kubernetes CRDs and Operators: from understanding what a CRD is (EP01), through real examples (EP02), schema design (EP03), hands-on YAML (EP04), CEL validation (EP05), the controller loop (EP06), building an operator (EP07), versioning (EP08), admission webhooks (EP09), to production patterns in this episode.
The next series in the Kubernetes learning arc on linuxcent.com covers Kubernetes Networking Deep Dive — Services, Ingress, Gateway API, CNI, and eBPF networking. Subscribe below to get it when it launches.
Stay subscribed → linuxcent.com