Admission Webhooks: Validating and Mutating Requests Before They Reach etcd


Kubernetes CRDs & Operators: Extending the API, Episode 9


TL;DR

  • Kubernetes admission webhooks are HTTPS endpoints that the API server calls synchronously for each request matching their rules (create, update, delete, or connect), before the object reaches etcd
    (two types: mutating webhooks modify the object; validating webhooks approve or reject it; mutating runs first, then validating)
  • Use a validating webhook when you need to reject objects based on state you cannot express in CEL: checking if a referenced Secret exists, enforcing cross-resource quotas, consulting an external policy engine
  • Use a mutating webhook when you need to inject defaults or sidecar containers that depend on context you cannot express in the CRD schema (environment-specific defaults, sidecar injection)
  • Admission webhooks are an availability dependency: if your webhook is unreachable and failurePolicy is Fail (the default), the API requests it covers will fail. failurePolicy: Ignore is the safety valve; use it only for non-critical webhooks
  • OPA/Gatekeeper and Kyverno are admission webhook platforms — they let you write policy as code (Rego, YAML) instead of writing Go webhook handlers
  • For CRD-specific validation that only depends on the object itself, prefer CEL (EP05) — webhooks are for rules that require external lookups or cross-resource checks

The Big Picture

  KUBERNETES ADMISSION CHAIN (full picture)

  kubectl apply -f backuppolicy.yaml
        │
        ▼
  API Server: authentication + authorization
        │
        ▼
  1. Mutating admission webhooks
     ┌───────────────────────────────────────┐
     │ Receive object, return modified object │
     │ Examples: inject annotations,          │
     │ set defaults, add sidecars            │
     └───────────────────────────────────────┘
        │
        ▼
  2. Schema validation (OpenAPI + CEL)
        │
        ▼
  3. Validating admission webhooks
     ┌───────────────────────────────────────┐
     │ Receive object, return allow/deny     │
     │ Examples: quota checks, cross-        │
     │ resource validation, policy engines   │
     └───────────────────────────────────────┘
        │
        ▼ (allowed)
  etcd storage

Kubernetes admission webhooks are how tools like Istio inject sidecars, Kyverno enforces policies, and OPA/Gatekeeper applies organizational guardrails — all without modifying Kubernetes source code. Understanding them completes the picture of how Kubernetes is extended beyond CRDs.
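
The ordering in the diagram can be modeled in a few lines of Go. This is only a toy sketch of the three phases, not how the API server is implemented; the Object type, the admit function, and the two example rules are hypothetical names made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a stand-in for any Kubernetes object; only labels matter here.
type Object struct {
	Labels map[string]string
}

// Mutator and Validator model the two webhook kinds.
type Mutator func(o *Object)
type Validator func(o Object) error

// admit runs the phases in the order shown in the diagram:
// mutating webhooks, then schema validation, then validating webhooks.
func admit(o Object, mutators []Mutator, schema Validator, validators []Validator) (Object, error) {
	for _, m := range mutators {
		m(&o)
	}
	if err := schema(o); err != nil {
		return o, fmt.Errorf("schema validation: %w", err)
	}
	for _, v := range validators {
		if err := v(o); err != nil {
			return o, fmt.Errorf("validating webhook: %w", err)
		}
	}
	return o, nil // at this point the object would be persisted
}

// addDefaultTeam is a hypothetical mutating rule.
func addDefaultTeam(o *Object) {
	if o.Labels == nil {
		o.Labels = map[string]string{}
	}
	if _, ok := o.Labels["team"]; !ok {
		o.Labels["team"] = "unassigned"
	}
}

// requireTeam is a hypothetical validating rule.
func requireTeam(o Object) error {
	if o.Labels["team"] == "" {
		return errors.New("missing 'team' label")
	}
	return nil
}

// noSchemaErrors stands in for OpenAPI and CEL validation.
func noSchemaErrors(Object) error { return nil }

func main() {
	out, err := admit(Object{}, []Mutator{addDefaultTeam}, noSchemaErrors, []Validator{requireTeam})
	fmt.Println(out.Labels, err)
}
```

Because validators see the object after every mutator has run, the same request that passes here would be rejected if the mutator were removed.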


Validating vs Mutating: When to Use Each

  DECISION TREE: CEL vs Validating Webhook vs Mutating Webhook

  "I need to validate a field value"
      │
      ├── Depends only on the object being submitted?
      │   → Use CEL (x-kubernetes-validations) — EP05
      │
      └── Needs to look up another resource, quota, or external system?
          → Use Validating Admission Webhook

  "I need to set default values or inject content"
      │
      ├── Default is a fixed value?
      │   → Use OpenAPI schema defaults (CEL rules validate; they cannot set values)
      │
      └── Default depends on other fields, environment, namespace labels, or external config?
          → Use Mutating Admission Webhook

Practical examples:

  Rule                                                             Right tool
  ───────────────────────────────────────────────────────────────  ──────────────────
  retentionDays must be ≤ 365                                      CEL
  if storageClass=premium then retentionDays ≤ 90                  CEL
  Referenced SecretStore must exist in the same namespace          Validating webhook
  BackupPolicy count per namespace must not exceed team quota      Validating webhook
  Inject costCenter annotation from namespace labels               Mutating webhook
  Inject backup-agent sidecar into all Pods in labeled namespaces  Mutating webhook
  Enforce that all BackupPolicies have a team label                Kyverno or OPA policy

The Webhook Request/Response Contract

Both webhook types receive an AdmissionReview object and return an AdmissionReview response.

Request (from API server to webhook):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "kind": {"group": "storage.example.com", "version": "v1alpha1", "kind": "BackupPolicy"},
    "resource": {"group": "storage.example.com", "version": "v1alpha1", "resource": "backuppolicies"},
    "operation": "CREATE",
    "userInfo": {"username": "alice", "groups": ["system:authenticated"]},
    "object": { /* full BackupPolicy JSON */ },
    "oldObject": null
  }
}

Response for a validating webhook (allow):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "allowed": true
  }
}

Response for a validating webhook (deny):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "...",
    "allowed": false,
    "status": {
      "code": 422,
      "message": "referenced SecretStore 'aws-secrets-manager' not found in namespace 'production'"
    }
  }
}
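
To make the contract concrete, here is a minimal sketch of a validating endpoint written against the standard library only. It is not what kubebuilder generates: the struct types are hand-written mirrors of the fields shown above (a real handler would normally use k8s.io/api/admission/v1), and the knownStores map and the secretStoreRef JSON field name are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hand-written mirrors of the AdmissionReview fields used in this sketch.
type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID       string          `json:"uid"`
	Namespace string          `json:"namespace"`
	Object    json.RawMessage `json:"object"`
}

type admissionResponse struct {
	UID     string  `json:"uid"`
	Allowed bool    `json:"allowed"`
	Status  *status `json:"status,omitempty"`
}

type status struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// knownStores is a hypothetical in-memory stand-in for a cluster lookup.
var knownStores = map[string]bool{"aws-secrets-manager": true}

// handleValidate decodes the review, applies one rule, and echoes the uid back.
func handleValidate(w http.ResponseWriter, r *http.Request) {
	var in admissionReview
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil || in.Request == nil {
		http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
		return
	}
	var obj struct {
		Spec struct {
			SecretStoreRef string `json:"secretStoreRef"` // assumed field name
		} `json:"spec"`
	}
	resp := &admissionResponse{UID: in.Request.UID, Allowed: true}
	if len(in.Request.Object) > 0 {
		if err := json.Unmarshal(in.Request.Object, &obj); err != nil {
			resp.Allowed = false
			resp.Status = &status{Code: http.StatusBadRequest, Message: err.Error()}
		}
	}
	if ref := obj.Spec.SecretStoreRef; resp.Allowed && ref != "" && !knownStores[ref] {
		resp.Allowed = false
		resp.Status = &status{Code: 422, Message: fmt.Sprintf(
			"referenced SecretStore %q not found in namespace %q", ref, in.Request.Namespace)}
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(admissionReview{
		APIVersion: "admission.k8s.io/v1", Kind: "AdmissionReview", Response: resp,
	})
}

// review sends one AdmissionReview body through the handler and decodes the reply.
func review(body string) admissionReview {
	rec := httptest.NewRecorder()
	handleValidate(rec, httptest.NewRequest(http.MethodPost, "/validate", strings.NewReader(body)))
	var out admissionReview
	_ = json.Unmarshal(rec.Body.Bytes(), &out)
	return out
}

func main() {
	out := review(`{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview","request":{
		"uid":"705ab4f5","namespace":"production",
		"object":{"spec":{"secretStoreRef":"vault"}}}}`)
	fmt.Println(out.Response.Allowed, out.Response.Status.Message)
}
```

The two details that matter for the contract are that the response repeats the request uid and that apiVersion and kind are set on the returned AdmissionReview.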

Response for a mutating webhook (allow + patch):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "...",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiJhZGQiLCJwYXRoIjoiL21ldGFkYXRhL2Fubm90YXRpb25zL2Nvc3RDZW50ZXIiLCJ2YWx1ZSI6ImVuZ2luZWVyaW5nIn1d"
    // base64-encoded JSON patch:
    // [{"op":"add","path":"/metadata/annotations/costCenter","value":"engineering"}]
  }
}

Writing a Validating Webhook with kubebuilder

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --programmatic-validation

Edit api/v1alpha1/backuppolicy_webhook.go:

package v1alpha1

import (
    "context"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/webhook/admission"
    esov1beta1 "github.com/external-secrets/external-secrets/apis/externalsecrets/v1beta1"
)

type BackupPolicyCustomValidator struct {
    Client client.Client
}

//+kubebuilder:webhook:path=/validate-storage-example-com-v1alpha1-backuppolicy,mutating=false,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create;update,versions=v1alpha1,name=vbackuppolicy.kb.io,admissionReviewVersions=v1

func (v *BackupPolicyCustomValidator) SetupWebhookWithManager(mgr ctrl.Manager) error {
    v.Client = mgr.GetClient()
    return ctrl.NewWebhookManagedBy(mgr).
        For(&BackupPolicy{}).
        WithValidator(v).
        Complete()
}

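// ValidateCreate validates a new BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    // Use a checked type assertion so an unexpected type returns an error instead of panicking.
    bp, ok := obj.(*BackupPolicy)
    if !ok {
        return nil, fmt.Errorf("expected a BackupPolicy but got %T", obj)
    }
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateUpdate validates an updated BackupPolicy. Only the new object is checked.
func (v *BackupPolicyCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
    bp, ok := newObj.(*BackupPolicy)
    if !ok {
        return nil, fmt.Errorf("expected a BackupPolicy but got %T", newObj)
    }
    return nil, v.validateSecretStoreRef(ctx, bp)
}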

// ValidateDelete is a no-op here.
func (v *BackupPolicyCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    return nil, nil
}

// validateSecretStoreRef checks that the referenced SecretStore exists in the same namespace.
func (v *BackupPolicyCustomValidator) validateSecretStoreRef(ctx context.Context, bp *BackupPolicy) error {
    ref := bp.Spec.SecretStoreRef
    if ref == "" {
        return nil  // optional field; CEL handles it if required
    }

    store := &esov1beta1.SecretStore{}
    err := v.Client.Get(ctx, types.NamespacedName{Name: ref, Namespace: bp.Namespace}, store)
    if apierrors.IsNotFound(err) {
        return fmt.Errorf("referenced SecretStore %q not found in namespace %q",
            ref, bp.Namespace)
    }
    return err  // nil on found, real error on API failure
}

Writing a Mutating Webhook: Cost Center Injection

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --defaulting

Edit the defaulting webhook:

//+kubebuilder:webhook:path=/mutate-storage-example-com-v1alpha1-backuppolicy,mutating=true,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create,versions=v1alpha1,name=mbackuppolicy.kb.io,admissionReviewVersions=v1

func (r *BackupPolicy) Default() {
    // Default is called by kubebuilder's webhook framework on admission.
    // The webhook handler calls this and patches the object.
    //
    // This runs AFTER API server schema defaults — use it for context-dependent defaults.
}

// For namespace-label-based injection, implement the full webhook handler instead.
// Besides the imports shown earlier, this needs "encoding/json", "net/http",
// and corev1 "k8s.io/api/core/v1".
type BackupPolicyMutator struct {
    Client client.Client
}

func (m *BackupPolicyMutator) Handle(ctx context.Context, req admission.Request) admission.Response {
    bp := &BackupPolicy{}
    if err := json.Unmarshal(req.Object.Raw, bp); err != nil {
        return admission.Errored(http.StatusBadRequest, err)
    }

    // Fetch the namespace to read its labels. req.Namespace comes from the request
    // itself, so it is set even when the manifest omits metadata.namespace.
    ns := &corev1.Namespace{}
    if err := m.Client.Get(ctx, types.NamespacedName{Name: req.Namespace}, ns); err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }

    // Inject costCenter annotation from namespace label
    if costCenter, ok := ns.Labels["billing/cost-center"]; ok {
        if bp.Annotations == nil {
            bp.Annotations = make(map[string]string)
        }
        bp.Annotations["billing/cost-center"] = costCenter
    }

    marshaled, err := json.Marshal(bp)
    if err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }
    return admission.PatchResponseFromRaw(req.Object.Raw, marshaled)
}
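
The handler above lets PatchResponseFromRaw compute the patch by diffing the two JSON documents. If you ever build the patch by hand, two details are easy to miss: an annotation key such as billing/cost-center contains a slash that must be escaped in the JSON Pointer path (RFC 6901), and an add under /metadata/annotations fails when the object has no annotations map yet. The following is a minimal sketch of both; annotationPatch and escapePointer are illustrative helpers, not controller-runtime functions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// escapePointer escapes one JSON Pointer segment: "~" becomes "~0", "/" becomes "~1".
func escapePointer(segment string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(segment)
}

// annotationPatch returns the operations needed to set one annotation.
func annotationPatch(existing map[string]string, key, value string) []patchOp {
	if existing == nil {
		// The parent object is missing, so create the whole map in one operation.
		// The key is not escaped here because it is a JSON value, not a path.
		return []patchOp{{Op: "add", Path: "/metadata/annotations", Value: map[string]string{key: value}}}
	}
	return []patchOp{{Op: "add", Path: "/metadata/annotations/" + escapePointer(key), Value: value}}
}

func main() {
	for _, existing := range []map[string]string{nil, {"owner": "alice"}} {
		raw, err := json.Marshal(annotationPatch(existing, "billing/cost-center", "engineering"))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(raw))
	}
}
```

In most handlers the diff-based helper is the safer choice, since it covers these cases without any hand-written paths.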

The WebhookConfiguration Resource

The ValidatingWebhookConfiguration tells the API server which webhooks exist and which resources/operations they handle:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: backup-operator-validating-webhook
  annotations:
    cert-manager.io/inject-ca-from: backup-operator-system/backup-operator-serving-cert
webhooks:
  - name: vbackuppolicy.kb.io
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: backup-operator-webhook-service
        namespace: backup-operator-system
        path: /validate-storage-example-com-v1alpha1-backuppolicy
    rules:
      - apiGroups:   ["storage.example.com"]
        apiVersions: ["v1alpha1"]
        operations:  ["CREATE", "UPDATE"]
        resources:   ["backuppolicies"]
    failurePolicy: Fail          # Fail = reject request if webhook unreachable
    sideEffects: None
    timeoutSeconds: 10
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]  # never webhook kube-system objects
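
How the rules stanza selects requests can be approximated in a few lines. This is a deliberately simplified model, assuming exact string matches plus the "*" wildcard; it ignores subresources, scope, namespaceSelector, and objectSelector, all of which the real API server also evaluates. The rule and request types are made up for this sketch.

```go
package main

import "fmt"

// rule mirrors the four list fields of one entry under webhooks[].rules.
type rule struct {
	APIGroups, APIVersions, Operations, Resources []string
}

// request carries the attributes of an incoming API request that rules look at.
type request struct {
	Group, Version, Operation, Resource string
}

// matchAny reports whether value equals one of the patterns or a pattern is "*".
func matchAny(patterns []string, value string) bool {
	for _, p := range patterns {
		if p == "*" || p == value {
			return true
		}
	}
	return false
}

// matches is true only when every field of the rule matches the request.
func (r rule) matches(req request) bool {
	return matchAny(r.APIGroups, req.Group) &&
		matchAny(r.APIVersions, req.Version) &&
		matchAny(r.Operations, req.Operation) &&
		matchAny(r.Resources, req.Resource)
}

func main() {
	backupRule := rule{
		APIGroups:   []string{"storage.example.com"},
		APIVersions: []string{"v1alpha1"},
		Operations:  []string{"CREATE", "UPDATE"},
		Resources:   []string{"backuppolicies"},
	}
	create := request{"storage.example.com", "v1alpha1", "CREATE", "backuppolicies"}
	del := request{"storage.example.com", "v1alpha1", "DELETE", "backuppolicies"}
	fmt.Println(backupRule.matches(create), backupRule.matches(del))
}
```

A DELETE of a BackupPolicy never reaches this webhook because DELETE is not listed in operations, which is the narrow scoping the configuration above aims for.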

failurePolicy: Fail vs Ignore

  failurePolicy: Fail (default)
  ──────────────────────────────
  If webhook is unreachable → API request fails with 500
  Use when: the validation is critical (quota enforcement, policy)
  Risk: your webhook becoming unavailable breaks all covered API operations

  failurePolicy: Ignore
  ──────────────────────────────
  If webhook is unreachable → API request proceeds as if webhook allowed it
  Use when: the webhook is advisory or can be bypassed safely
  Risk: policy is silently not enforced during webhook outage

For production operators, use failurePolicy: Fail but ensure high availability:
– Run at least 2 webhook pod replicas with PodDisruptionBudget
– Use cert-manager for automatic TLS certificate rotation
– Set timeoutSeconds to a value that allows graceful degradation (5–10s)
– Exclude system namespaces with namespaceSelector


OPA/Gatekeeper and Kyverno: Webhooks as Policy Platforms

Writing raw webhook handlers in Go is powerful but heavyweight for policy enforcement. OPA/Gatekeeper and Kyverno are webhook-based policy engines that let you express policies as code:

Kyverno (YAML-based policies):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-backup-label
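spec:
  # Without Enforce, Kyverno's default (Audit) only reports violations instead of blocking.
  # Newer Kyverno releases prefer setting failureAction under each rule's validate block.
  validationFailureAction: Enforce
  rules: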
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["BackupPolicy"]
      validate:
        message: "BackupPolicy must have a 'team' label"
        pattern:
          metadata:
            labels:
              team: "?*"

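OPA/Gatekeeper (Rego-based policies; in Gatekeeper this rule is the rego body of a ConstraintTemplate, and the request arrives under input.review):

package backuppolicy

violation[{"msg": msg}] {
    input.review.kind.kind == "BackupPolicy"
    not input.review.object.metadata.labels["team"]
    msg := "BackupPolicy must have a 'team' label"
}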

Both run as admission webhooks that the API server calls. The policy language sits on top of the webhook plumbing. For organizational policy enforcement across many resource types, these tools are usually simpler to write, review, and roll out than custom Go webhook handlers.


⚠ Common Mistakes

Webhook covering * resources or * operations. A webhook covering all resources in the cluster is a reliability risk — a bug in the webhook or an outage breaks everything. Scope webhooks to exactly the resources and operations they need with rules[].resources and rules[].operations.

No TLS certificate rotation. Webhook endpoints require a TLS certificate that the API server trusts. Certificates expire. Using cert-manager with the cert-manager.io/inject-ca-from annotation automates this. Without it, expired certificates cause silent webhook outages (the API server rejects the TLS handshake, triggering failurePolicy behavior).

Not excluding system namespaces. If a validating webhook covers Pods and has failurePolicy: Fail, and the webhook pod itself crashes, the API server cannot admit the replacement webhook pod: admitting it requires calling the webhook, which is down, so the request is rejected. Use namespaceSelector to exclude kube-system and your operator’s own namespace.

Treating webhook latency as free. Every API operation covered by a webhook adds a synchronous HTTP round-trip. On a busy cluster creating thousands of objects per minute, a 100ms webhook latency becomes significant. Set timeoutSeconds, profile webhook performance, and scope rules narrowly.


Quick Reference

# List all webhook configurations
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

# Inspect webhook rules and failure policy
kubectl describe validatingwebhookconfiguration backup-operator-validating-webhook

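# Disable a webhook for debugging (dangerous in production).
# Deleting is not temporary by itself: save the configuration first so you can re-apply it.
kubectl get validatingwebhookconfiguration backup-operator-validating-webhook -o yaml > webhook-backup.yaml
kubectl delete validatingwebhookconfiguration backup-operator-validating-webhook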

# Check webhook endpoint certificate
kubectl get secret backup-operator-webhook-server-cert \
  -n backup-operator-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

# Test that the webhook Service is reachable from a pod inside the cluster
# (any HTTP response, even a 404, shows the TLS listener is up)
kubectl run webhook-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -k https://backup-operator-webhook-service.backup-operator-system.svc:443/healthz

Key Takeaways

  • Mutating webhooks modify objects at admission; validating webhooks approve or reject them — mutating runs before validating
  • Use CEL for rules that depend only on the submitted object; use webhooks when you need external lookups or cross-resource checks
  • failurePolicy: Fail blocks API requests if the webhook is unreachable — ensure high availability before using it
  • Always exclude system namespaces and scope rules to specific resource types to minimize the blast radius of webhook failures
  • OPA/Gatekeeper and Kyverno are admission webhook platforms for policy-as-code — prefer them over custom Go handlers for organizational policy enforcement

What’s Next

EP10: Kubernetes CRDs in Production ties the full series together — finalizer design patterns, status condition conventions, owner references, RBAC for multi-tenant CRD usage, and the production failure modes that catch teams off guard.
