Build a Simple Kubernetes Operator with controller-runtime and kubebuilder

Reading Time: 7 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 7
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Building a Kubernetes operator means writing a Go reconciler with controller-runtime — kubebuilder scaffolds the project structure, RBAC markers, and Makefile targets so you focus on the reconcile logic
    (kubebuilder = a CLI and framework that generates the operator project scaffold; controller-runtime = the Go library that provides the informer cache, work queue, and reconciler interface)
  • The reconciler for BackupPolicy in this episode creates and manages a CronJob — it is the behavior layer for the CRD built in EP03–EP05
  • RBAC is expressed as Go code comments (//+kubebuilder:rbac:...) — kubebuilder generates the ClusterRole YAML from them
  • Run the operator locally with make run during development; no cluster deployment needed until ready
  • The same project that builds the operator also builds and installs the CRD — make install applies the CRD YAML generated from your Go types
  • Testing: the operator ships with envtest — a local API server + etcd for controller testing without a real cluster

The Big Picture

  OPERATOR PROJECT STRUCTURE (kubebuilder scaffold)

  backup-operator/
  ├── api/v1alpha1/
  │   ├── backuppolicy_types.go     ← Go types that define CRD schema
  │   └── groupversion_info.go
  ├── internal/controller/
  │   └── backuppolicy_controller.go ← reconcile logic (our main focus)
  ├── config/
  │   ├── crd/                       ← generated CRD YAML
  │   ├── rbac/                      ← generated RBAC YAML
  │   └── manager/                   ← controller Deployment YAML
  ├── cmd/main.go                    ← entrypoint, sets up the manager
  └── Makefile                       ← build, test, install, deploy targets

  FLOW:
  Go types → kubebuilder generate → CRD YAML + RBAC YAML
  Reconcile function → runs in cluster → watches BackupPolicy → manages CronJobs

Building a Kubernetes operator with controller-runtime is where CRDs become living infrastructure — the BackupPolicy objects created in EP04 now get actual behavior attached to them.


Prerequisites

# Go 1.22+
go version

# kubebuilder CLI
curl -L -o kubebuilder \
  https://github.com/kubernetes-sigs/kubebuilder/releases/latest/download/kubebuilder_linux_amd64
chmod +x kubebuilder
sudo mv kubebuilder /usr/local/bin/

# A running cluster (kind works well for development)
kind create cluster --name operator-dev

# Verify kubectl works
kubectl cluster-info --context kind-operator-dev

Step 1: Scaffold the Project

mkdir backup-operator && cd backup-operator

# Initialize the Go module and project structure
kubebuilder init \
  --domain storage.example.com \
  --repo github.com/example/backup-operator

# Create the API (Go types + controller scaffold)
kubebuilder create api \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --resource \
  --controller

The --resource and --controller flags pre-answer the interactive prompts. If you omit them, kubebuilder asks:

Create Resource [y/n]: y
Create Controller [y/n]: y

The generated directory tree:

backup-operator/
├── api/
│   └── v1alpha1/
│       ├── backuppolicy_types.go
│       └── groupversion_info.go
├── internal/
│   └── controller/
│       └── backuppolicy_controller.go
├── cmd/
│   └── main.go
├── config/
│   ├── crd/bases/
│   ├── rbac/
│   └── manager/
├── go.mod
├── go.sum
└── Makefile

Step 2: Define the Go Types

Edit api/v1alpha1/backuppolicy_types.go to match the schema built in EP03–EP05:

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BackupTarget specifies a namespace to include in the backup.
type BackupTarget struct {
    Namespace      string `json:"namespace"`
    IncludeSecrets bool   `json:"includeSecrets,omitempty"`
}

// BackupPolicySpec defines the desired state of BackupPolicy.
type BackupPolicySpec struct {
    // Schedule is a cron expression for when to run backups.
    // +kubebuilder:validation:Pattern=`^(\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+) (\*|[0-9,\-\/]+)$`
    Schedule string `json:"schedule"`

    // RetentionDays is how long to keep backup snapshots.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=365
    RetentionDays int32 `json:"retentionDays"`

    // StorageClass is the storage class to use for backup volumes.
    // +kubebuilder:default=standard
    // +kubebuilder:validation:Enum=standard;premium;encrypted;archive
    StorageClass string `json:"storageClass,omitempty"`

    // Targets lists the namespaces and resources to include.
    // +kubebuilder:validation:MaxItems=20
    Targets []BackupTarget `json:"targets,omitempty"`

    // Suspended pauses backup execution when true.
    // +kubebuilder:default=false
    Suspended bool `json:"suspended,omitempty"`
}

// BackupPolicyStatus defines the observed state of BackupPolicy.
type BackupPolicyStatus struct {
    // Conditions reflect the current state of the BackupPolicy.
    Conditions []metav1.Condition `json:"conditions,omitempty"`

    // LastBackupTime is when the most recent backup completed.
    LastBackupTime *metav1.Time `json:"lastBackupTime,omitempty"`

    // CronJobName is the name of the managed CronJob.
    CronJobName string `json:"cronJobName,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:resource:shortName=bp,categories=storage
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Schedule",type=string,JSONPath=`.spec.schedule`
// +kubebuilder:printcolumn:name="Retention",type=integer,JSONPath=`.spec.retentionDays`
// +kubebuilder:printcolumn:name="Suspended",type=boolean,JSONPath=`.spec.suspended`
// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=='Ready')].status`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// BackupPolicy is the Schema for the backuppolicies API.
type BackupPolicy struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   BackupPolicySpec   `json:"spec,omitempty"`
    Status BackupPolicyStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// BackupPolicyList contains a list of BackupPolicy.
type BackupPolicyList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []BackupPolicy `json:"items"`
}

func init() {
    SchemeBuilder.Register(&BackupPolicy{}, &BackupPolicyList{})
}

Regenerate the CRD YAML and DeepCopy methods:

make generate   # regenerates zz_generated.deepcopy.go
make manifests  # regenerates CRD YAML under config/crd/bases/

Step 3: Write the Reconciler

Edit internal/controller/backuppolicy_controller.go:

package controller

import (
    "context"
    "fmt"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    storagev1alpha1 "github.com/example/backup-operator/api/v1alpha1"
)

// BackupPolicyReconciler reconciles BackupPolicy objects.
type BackupPolicyReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// RBAC markers — kubebuilder generates ClusterRole YAML from these comments.
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/finalizers,verbs=update
//+kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Step 1: Fetch the BackupPolicy
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        if apierrors.IsNotFound(err) {
            // Object deleted before we could reconcile — nothing to do.
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, fmt.Errorf("fetching BackupPolicy: %w", err)
    }

    // Step 2: Define the desired CronJob name
    cronJobName := fmt.Sprintf("%s-backup", bp.Name)

    // Step 3: Fetch the existing CronJob (if any)
    existing := &batchv1.CronJob{}
    err := r.Get(ctx, types.NamespacedName{Name: cronJobName, Namespace: bp.Namespace}, existing)
    notFound := apierrors.IsNotFound(err)
    if err != nil && !notFound {
        return ctrl.Result{}, fmt.Errorf("fetching CronJob: %w", err)
    }

    // Step 4: Build the desired CronJob
    desired := r.buildCronJob(bp, cronJobName)

    // Step 5: Create or update
    if notFound {
        logger.Info("Creating CronJob", "name", cronJobName)
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, fmt.Errorf("creating CronJob: %w", err)
        }
    } else {
        // Update schedule and suspend state if they differ. Suspend is a
        // *bool, so compare the pointed-to values; comparing the pointers
        // themselves would report a difference on every reconcile.
        if existing.Spec.Schedule != desired.Spec.Schedule ||
            ptr.Deref(existing.Spec.Suspend, false) != ptr.Deref(desired.Spec.Suspend, false) {
            existing.Spec.Schedule = desired.Spec.Schedule
            existing.Spec.Suspend = desired.Spec.Suspend
            logger.Info("Updating CronJob", "name", cronJobName)
            if err := r.Update(ctx, existing); err != nil {
                return ctrl.Result{}, fmt.Errorf("updating CronJob: %w", err)
            }
        }
    }

    // Step 6: Update status
    bpCopy := bp.DeepCopy()
    meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
        Type:               "Ready",
        Status:             metav1.ConditionTrue,
        Reason:             "CronJobReady",
        Message:            fmt.Sprintf("CronJob %s is configured", cronJobName),
        ObservedGeneration: bp.Generation,
    })
    bpCopy.Status.CronJobName = cronJobName

    if err := r.Status().Update(ctx, bpCopy); err != nil {
        return ctrl.Result{}, fmt.Errorf("updating status: %w", err)
    }

    return ctrl.Result{}, nil
}

func (r *BackupPolicyReconciler) buildCronJob(bp *storagev1alpha1.BackupPolicy, name string) *batchv1.CronJob {
    suspend := bp.Spec.Suspended
    retentionArg := fmt.Sprintf("--retention-days=%d", bp.Spec.RetentionDays)

    cj := &batchv1.CronJob{
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: bp.Namespace,
            Labels: map[string]string{
                "app.kubernetes.io/managed-by": "backup-operator",
                "backuppolicy":                 bp.Name,
            },
        },
        Spec: batchv1.CronJobSpec{
            Schedule: bp.Spec.Schedule,
            Suspend:  &suspend,
            JobTemplate: batchv1.JobTemplateSpec{
                Spec: batchv1.JobSpec{
                    Template: corev1.PodTemplateSpec{
                        Spec: corev1.PodSpec{
                            RestartPolicy: corev1.RestartPolicyOnFailure,
                            Containers: []corev1.Container{
                                {
                                    Name:    "backup",
                                    Image:   "backup-tool:latest",
                                    Args:    []string{retentionArg},
                                },
                            },
                        },
                    },
                },
            },
        },
    }

    // Set owner reference — CronJob is garbage-collected when BackupPolicy is deleted.
    // The error is non-nil only if the scheme does not know the owner's type,
    // which cannot happen here, so it is safe to discard.
    _ = ctrl.SetControllerReference(bp, cj, r.Scheme)
    return cj
}

// SetupWithManager registers the controller with the manager and declares what to watch.
func (r *BackupPolicyReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&storagev1alpha1.BackupPolicy{}).
        Owns(&batchv1.CronJob{}).    // reconcile BackupPolicy when owned CronJob changes
        Complete(r)
}
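
The scaffolded cmd/main.go wires this reconciler into a manager. Trimmed to its essentials (the generated file also configures logging flags, metrics, health probes, and leader election), it looks roughly like this sketch:

package main

import (
    "os"

    "k8s.io/apimachinery/pkg/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/log/zap"

    storagev1alpha1 "github.com/example/backup-operator/api/v1alpha1"
    "github.com/example/backup-operator/internal/controller"
)

var scheme = runtime.NewScheme()

func init() {
    // The manager's scheme must know both the built-in types and our CRD types.
    _ = clientgoscheme.AddToScheme(scheme)
    _ = storagev1alpha1.AddToScheme(scheme)
}

func main() {
    ctrl.SetLogger(zap.New())

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
    if err != nil {
        os.Exit(1)
    }

    if err := (&controller.BackupPolicyReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        os.Exit(1)
    }

    // Start blocks until the signal handler's context is cancelled.
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        os.Exit(1)
    }
}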

Step 4: Install the CRD and Run Locally

# Install the CRD into the cluster
make install
customresourcedefinition.apiextensions.k8s.io/backuppolicies.storage.example.com created
# Run the controller locally (outside the cluster)
make run
2026-04-25T08:00:00Z  INFO  Starting manager
2026-04-25T08:00:00Z  INFO  Starting workers  {"controller": "backuppolicy", "worker count": 1}

In a separate terminal:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: default
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
EOF

Watch the controller output:

2026-04-25T08:01:00Z  INFO  Creating CronJob  {"name": "nightly-backup"}

Check the result:

kubectl get bp nightly
NAME      SCHEDULE    RETENTION   SUSPENDED   READY   AGE
nightly   0 2 * * *   30          false       True    10s
kubectl get cronjob nightly-backup
NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        <none>          10s

Test self-healing — delete the CronJob and watch the controller recreate it:

kubectl delete cronjob nightly-backup
# Controller output:
# 2026-04-25T08:02:00Z  INFO  Creating CronJob  {"name": "nightly-backup"}

kubectl get cronjob nightly-backup
# Back within seconds

Test suspend:

kubectl patch bp nightly --type=merge -p '{"spec":{"suspended":true}}'
kubectl get cronjob nightly-backup -o jsonpath='{.spec.suspend}'
# true

Step 5: Deploy to Cluster

When ready for in-cluster deployment:

# Build and push the controller image
make docker-build docker-push IMG=your-registry/backup-operator:v0.1.0

# Deploy to cluster (creates Deployment, RBAC, CRD)
make deploy IMG=your-registry/backup-operator:v0.1.0
kubectl get pods -n backup-operator-system
NAME                                          READY   STATUS    RESTARTS   AGE
backup-operator-controller-manager-abc123     2/2     Running   0          30s
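
Testing with envtest

The scaffold also ships an envtest setup: a local kube-apiserver plus etcd that exercise the reconciler without any cluster. The kubebuilder-generated suite uses Ginkgo; the pared-down go-test sketch below shows the same idea, assuming the envtest binaries are installed and KUBEBUILDER_ASSETS points at them (the scaffold's make test target handles that):

package controller

import (
    "context"
    "path/filepath"
    "testing"

    batchv1 "k8s.io/api/batch/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/envtest"

    storagev1alpha1 "github.com/example/backup-operator/api/v1alpha1"
)

func TestReconcileCreatesCronJob(t *testing.T) {
    // Start a local API server + etcd with our CRD installed.
    testEnv := &envtest.Environment{
        CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
    }
    cfg, err := testEnv.Start()
    if err != nil {
        t.Fatal(err)
    }
    defer func() { _ = testEnv.Stop() }()

    scheme := runtime.NewScheme()
    _ = clientgoscheme.AddToScheme(scheme)
    _ = storagev1alpha1.AddToScheme(scheme)

    k8sClient, err := client.New(cfg, client.Options{Scheme: scheme})
    if err != nil {
        t.Fatal(err)
    }

    // Create a BackupPolicy, then invoke the reconciler directly —
    // no manager is needed for a single reconcile pass.
    ctx := context.Background()
    bp := &storagev1alpha1.BackupPolicy{
        ObjectMeta: metav1.ObjectMeta{Name: "nightly", Namespace: "default"},
        Spec: storagev1alpha1.BackupPolicySpec{
            Schedule:      "0 2 * * *",
            RetentionDays: 7,
        },
    }
    if err := k8sClient.Create(ctx, bp); err != nil {
        t.Fatal(err)
    }

    r := &BackupPolicyReconciler{Client: k8sClient, Scheme: scheme}
    req := ctrl.Request{NamespacedName: types.NamespacedName{Name: "nightly", Namespace: "default"}}
    if _, err := r.Reconcile(ctx, req); err != nil {
        t.Fatal(err)
    }

    // The reconcile pass should have created the managed CronJob.
    cj := &batchv1.CronJob{}
    if err := k8sClient.Get(ctx, types.NamespacedName{Name: "nightly-backup", Namespace: "default"}, cj); err != nil {
        t.Fatalf("expected CronJob to exist: %v", err)
    }
}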

Understanding the RBAC Markers

The //+kubebuilder:rbac:... comments in the controller generate the ClusterRole YAML when you run make manifests:

//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=storage.example.com,resources=backuppolicies/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete

Generated YAML under config/rbac/role.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

This approach keeps RBAC co-located with the code that needs it — if you add a new resource access in the controller, you add the marker next to it.
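
For example, if the reconciler later records Kubernetes Events through an EventRecorder (a hypothetical extension; the scaffold in this episode does not wire one up), the new access rule lives right next to the code that needs it:

//+kubebuilder:rbac:groups="",resources=events,verbs=create;patch

// Hypothetical recorder usage guarded by the marker above:
// r.Recorder.Event(bp, corev1.EventTypeNormal, "CronJobReady",
//     "CronJob nightly-backup is configured")

Re-running make manifests then adds the events rule to config/rbac/role.yaml.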


⚠ Common Mistakes

Not setting an owner reference on child resources. Without ctrl.SetControllerReference(parent, child, scheme), deleting the BackupPolicy leaves orphaned CronJobs. Owner references enable automatic garbage collection of child resources.

Updating the object after r.Get() without handling conflicts. If two reconciles run concurrently (possible after a controller restart), both may try to update the same resource. The API server uses resource version for optimistic concurrency — you will get a conflict error. Retry the reconcile on conflict errors rather than failing.
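
A common way to implement that retry is client-go's helper, which re-runs a mutation with backoff whenever the API server answers 409 Conflict. A sketch of the Step 6 status write wrapped this way (it slots into the reconciler above; readyCondition stands for the metav1.Condition built in Step 6):

import "k8s.io/client-go/util/retry"

// Re-fetch inside the closure so every attempt works against the
// latest resourceVersion instead of the stale copy that conflicted.
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    latest := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, latest); err != nil {
        return err
    }
    meta.SetStatusCondition(&latest.Status.Conditions, readyCondition)
    latest.Status.CronJobName = cronJobName
    return r.Status().Update(ctx, latest)
})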

Writing to bp directly instead of bp.DeepCopy() for status updates. If the status update fails and you retry, the original bp object now has the modified status in memory. Always update a deep copy when writing status so the in-memory state stays consistent with what was actually persisted.

Not watching owned resources. If you forget .Owns(&batchv1.CronJob{}) in SetupWithManager, the controller will not reconcile when a CronJob is deleted. Self-healing requires watching the resources you manage.
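
For reference, .Owns() is shorthand for a Watches() clause with an owner-based event handler. A sketch of the expanded form under recent controller-runtime versions (v0.15+), which needs the sigs.k8s.io/controller-runtime/pkg/handler import:

func (r *BackupPolicyReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&storagev1alpha1.BackupPolicy{}).
        // Equivalent to Owns(&batchv1.CronJob{}): enqueue the owning
        // BackupPolicy whenever a CronJob it controls changes.
        Watches(
            &batchv1.CronJob{},
            handler.EnqueueRequestForOwner(
                mgr.GetScheme(), mgr.GetRESTMapper(),
                &storagev1alpha1.BackupPolicy{},
                handler.OnlyControllerOwner(),
            ),
        ).
        Complete(r)
}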


Quick Reference

# Scaffold a new API + controller
kubebuilder create api --group mygroup --version v1alpha1 --kind MyKind

# Regenerate deep copy methods after changing types
make generate

# Regenerate CRD YAML + RBAC from markers
make manifests

# Install CRD into current cluster
make install

# Run controller locally (outside cluster)
make run

# Build + push image, then deploy to cluster
make docker-build docker-push IMG=registry/operator:tag
make deploy IMG=registry/operator:tag

# Uninstall CRD (WARNING: deletes all instances)
make uninstall

Key Takeaways

  • kubebuilder scaffolds the project; you write the types and the reconcile function
  • Go struct markers (//+kubebuilder:...) generate the CRD YAML and RBAC — keep them close to the code they describe
  • ctrl.SetControllerReference enables automatic garbage collection of child resources
  • Always deep-copy the object before writing status; retry on conflict errors
  • make run runs the controller locally — no Docker build needed during development

What’s Next

EP08: Kubernetes CRD Versioning covers how to evolve the BackupPolicy schema from v1alpha1 to v1 without breaking existing clients — storage versions, conversion webhooks, and the hub-and-spoke model for safe API evolution in production clusters.

Get EP08 in your inbox when it publishes → subscribe at linuxcent.com

Write Your First Kubernetes CRD: A Hands-On YAML Walkthrough

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 4
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Writing a Kubernetes CRD takes three YAML files: the CRD definition, an RBAC file with controller/editor/viewer ClusterRoles, and a sample custom resource
  • The BackupPolicy CRD built in this episode is the running example throughout the rest of the series — operators, versioning, and production patterns all use it
  • Apply the CRD, verify it with kubectl get crds, create a custom resource, and watch the API server validate your spec
  • RBAC for CRDs follows the same Role/ClusterRole model as built-in resources — rules list the API group (storage.example.com) and the plural resource name (backuppolicies) separately
  • Schema validation fires at apply time: bad field types, missing required fields, and out-of-range values all return clear errors before anything reaches etcd
  • Without a controller, a BackupPolicy is stored in etcd but nothing acts on it — that is the topic of EP06 and EP07

The Big Picture

  WHAT WE'RE BUILDING IN THIS EPISODE

  1. backuppolicies-crd.yaml        ← registers the BackupPolicy type
  2. backuppolicies-rbac.yaml       ← controls who can create/view/delete
  3. nightly-backup.yaml            ← our first custom resource instance

  After applying:

  kubectl get crds | grep backup      ← BackupPolicy type exists
  kubectl get backuppolicies -n demo  ← nightly instance exists
  kubectl describe bp nightly -n demo ← spec visible, status empty
  kubectl apply -f bad-backup.yaml    ← schema validation rejects bad data

Writing your first Kubernetes CRD is the step that bridges understanding CRDs conceptually to operating them in a real cluster. This episode is hands-on — every block of YAML is something you apply and verify.


Prerequisites

You need a running Kubernetes cluster and kubectl configured. Any of these work:

# Local options
kind create cluster --name crd-demo
# or
minikube start

# Verify cluster access
kubectl cluster-info
kubectl get nodes

Step 1: Write the CRD

Save this as backuppolicies-crd.yaml:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  scope: Namespaced
  names:
    plural:     backuppolicies
    singular:   backuppolicy
    kind:       BackupPolicy
    shortNames:
      - bp
    categories:
      - storage
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required: ["spec"]
          properties:
            spec:
              type: object
              required: ["schedule", "retentionDays"]
              properties:
                schedule:
                  type: string
                  description: "Cron expression (e.g. '0 2 * * *' for 02:00 daily)"
                retentionDays:
                  type: integer
                  minimum: 1
                  maximum: 365
                  description: "How many days to retain backup snapshots"
                storageClass:
                  type: string
                  default: "standard"
                  description: "StorageClass to use for backup volumes"
                targets:
                  type: array
                  description: "Namespaces and resources to include in the backup"
                  maxItems: 20
                  items:
                    type: object
                    required: ["namespace"]
                    properties:
                      namespace:
                        type: string
                      includeSecrets:
                        type: boolean
                        default: false
                suspended:
                  type: boolean
                  default: false
                  description: "Set to true to pause backup execution"
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Schedule
          type: string
          jsonPath: .spec.schedule
        - name: Retention
          type: integer
          jsonPath: .spec.retentionDays
        - name: Suspended
          type: boolean
          jsonPath: .spec.suspended
        - name: Ready
          type: string
          jsonPath: .status.conditions[?(@.type=='Ready')].status
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp

Apply it:

kubectl apply -f backuppolicies-crd.yaml

Verify it registered correctly:

kubectl get crds backuppolicies.storage.example.com
NAME                                    CREATED AT
backuppolicies.storage.example.com      2026-04-25T08:00:00Z

Check the API server now knows about it:

kubectl api-resources | grep backuppolic
backuppolicies    bp    storage.example.com/v1alpha1    true    BackupPolicy

Check it is Established:

kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'
True

If you see False or empty output, wait a few seconds and retry — the API server takes a moment to register new CRDs.


Step 2: Write RBAC

CRDs follow the same RBAC model as built-in resources. In the rules, the API group and the plural resource name are listed separately: apiGroups: ["storage.example.com"], resources: ["backuppolicies"].

Save this as backuppolicies-rbac.yaml:

# ClusterRole for operators/controllers that manage BackupPolicy objects
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
---
# Role for application teams to manage BackupPolicies in their namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Read-only role for auditors
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]

Apply it:

kubectl apply -f backuppolicies-rbac.yaml

Verify the roles exist:

kubectl get clusterrole | grep backuppolicy
backuppolicy-controller   2026-04-25T08:01:00Z
backuppolicy-editor       2026-04-25T08:01:00Z
backuppolicy-viewer       2026-04-25T08:01:00Z

Note on backuppolicies/status: The separate status RBAC rule is only meaningful if you enabled the status subresource (we did). Without it, status and spec share the same update path.


Step 3: Create a Namespace and Your First Custom Resource

kubectl create namespace demo

Save this as nightly-backup.yaml:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: demo
  labels:
    app.kubernetes.io/managed-by: manual
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
  storageClass: standard
  targets:
    - namespace: production
      includeSecrets: false
    - namespace: staging
      includeSecrets: false
  suspended: false

Apply it:

kubectl apply -f nightly-backup.yaml

Get it back:

kubectl get backuppolicies -n demo
NAME      SCHEDULE    RETENTION   SUSPENDED   READY   AGE
nightly   0 2 * * *   30          false       <none>  5s

The Ready column is <none> because there is no controller writing status yet. The custom resource exists and is stored in etcd, but nothing is acting on it.

Describe it:

kubectl describe bp nightly -n demo
Name:         nightly
Namespace:    demo
Labels:       app.kubernetes.io/managed-by=manual
Annotations:  <none>
API Version:  storage.example.com/v1alpha1
Kind:         BackupPolicy
Metadata:
  Creation Timestamp:  2026-04-25T08:05:00Z
  ...
Spec:
  Retention Days:  30
  Schedule:        0 2 * * *
  Storage Class:   standard
  Suspended:       false
  Targets:
    Include Secrets:  false
    Namespace:        production
    Include Secrets:  false
    Namespace:        staging
Status:
Events:  <none>

Step 4: Test Schema Validation

The API server now validates every BackupPolicy against the schema. Try creating an invalid one:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad-policy
  namespace: demo
spec:
  schedule: "not-a-cron"
  retentionDays: 500
EOF
The BackupPolicy "bad-policy" is invalid:
  spec.retentionDays: Invalid value: 500:
    spec.retentionDays in body should be less than or equal to 365

Missing required field:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: missing-schedule
  namespace: demo
spec:
  retentionDays: 7
EOF
The BackupPolicy "missing-schedule" is invalid:
  spec.schedule: Required value

Wrong type:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: wrong-type
  namespace: demo
spec:
  schedule: "0 2 * * *"
  retentionDays: "thirty"
EOF
The BackupPolicy "wrong-type" is invalid:
  spec.retentionDays: Invalid value: "string":
    spec.retentionDays in body must be of type integer: "string"

All validation fires at the API boundary — before etcd, before any controller sees the object.


Step 5: Verify Default Values Apply

The schema declares default: "standard" for storageClass and default: false for suspended. Verify the defaults are applied even when the fields are omitted:

kubectl apply -f - <<'EOF'
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: minimal
  namespace: demo
spec:
  schedule: "0 0 * * 0"
  retentionDays: 7
EOF

kubectl get bp minimal -n demo -o jsonpath='{.spec.storageClass}'
standard
kubectl get bp minimal -n demo -o jsonpath='{.spec.suspended}'
false

Defaults are injected by the API server at admission time. They appear in etcd and in every kubectl get -o yaml output — the stored object includes the defaults even if the user did not specify them.


Step 6: Explore the API Endpoints

Your custom resource is now available at standard REST endpoints:

kubectl proxy --port=8001 &

# List all BackupPolicies in the demo namespace
curl -s http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies \
  | jq '.items[].metadata.name'
"nightly"
"minimal"
# Get a specific BackupPolicy
curl -s http://localhost:8001/apis/storage.example.com/v1alpha1/namespaces/demo/backuppolicies/nightly \
  | jq '.spec'

This is how controllers discover and watch custom resources — via the same API server endpoints, using informers that wrap these REST calls with efficient list-and-watch semantics.
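
A minimal Go sketch of the same access using client-go's dynamic client (it assumes a kubeconfig at the default ~/.kube/config path; the group/version/resource triple matches the CRD from Step 1):

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load the kubeconfig the same way kubectl does.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    dyn, err := dynamic.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    // Custom resources are addressed by group/version/resource, like built-ins.
    gvr := schema.GroupVersionResource{
        Group:    "storage.example.com",
        Version:  "v1alpha1",
        Resource: "backuppolicies",
    }
    list, err := dyn.Resource(gvr).Namespace("demo").List(context.Background(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, item := range list.Items {
        fmt.Println(item.GetName())
    }
}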


Step 7: Clean Up

kubectl delete namespace demo
kubectl delete -f backuppolicies-rbac.yaml
kubectl delete -f backuppolicies-crd.yaml   # WARNING: deletes all BackupPolicy instances first

⚠ Common Mistakes

metadata.name does not match {plural}.{group}. The most common error. If you name the CRD backuppolicy.storage.example.com (singular) but the spec says plural: backuppolicies, the API server rejects it. The name must always be {plural}.{group}.

No required fields on spec. Without required constraints, kubectl apply accepts an empty spec: {}. The controller then receives objects with no configuration and has to handle the nil case. Define required fields in the schema.

Forgetting subresources: status: {}. Without this, controllers writing .status also overwrite .spec on full PUT updates. This causes status updates to reset user edits. Enable the status subresource from day one.

Not testing validation errors. Schema validation is the first line of defense. Always explicitly test that your required fields are required, types are enforced, and range constraints work — before deploying the controller.


Quick Reference

# All kubectl operations work on custom resources
kubectl get      backuppolicies -n demo
kubectl get      bp -n demo                  # shortName
kubectl describe bp nightly -n demo
kubectl edit     bp nightly -n demo
kubectl delete   bp nightly -n demo

# Output formats
kubectl get bp -n demo -o yaml
kubectl get bp -n demo -o json
kubectl get bp -n demo -o jsonpath='{.items[*].metadata.name}'

# Watch for changes
kubectl get bp -n demo -w

# List across all namespaces
kubectl get bp -A

# Patch spec
kubectl patch bp nightly -n demo \
  --type=merge -p '{"spec":{"suspended":true}}'

Key Takeaways

  • A working CRD deployment needs: the CRD YAML, RBAC ClusterRoles, and at least one sample custom resource
  • The API server validates all custom resources against the schema at apply time — errors are surfaced immediately, not inside the controller
  • Default values in the schema are injected at admission time and appear in every stored object
  • RBAC rules for custom resources list the API group and the plural resource name separately — status and finalizers are sub-resources with their own rules
  • Without a controller, custom resources are stored in etcd and serve as validated configuration — nothing acts on them until a controller is deployed

What’s Next

EP05: Kubernetes CRD CEL Validation extends schema validation beyond simple type and range checks — cross-field rules (“if storageClass is premium, retentionDays must be at most 90”), regex validation beyond pattern, and immutable field enforcement. All without an admission webhook.

Get EP05 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD Schema Explained: Versions, Validation, and Status Subresource

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 3
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • The Kubernetes CRD schema is defined in spec.versions[].schema.openAPIV3Schema — the API server uses it to validate every custom resource create and update before storing in etcd
    (OpenAPI v3 schema = a JSON Schema dialect that describes the structure, types, and constraints of your resource’s fields)
  • spec.versions is a list — CRDs can serve multiple API versions simultaneously; exactly one version must have storage: true
  • scope: Namespaced vs scope: Cluster controls whether custom resources live inside a namespace or at cluster level (like PersistentVolume vs PersistentVolumeClaim)
  • spec.names defines the plural, singular, kind, and optional shortNames used in kubectl and RBAC
  • The status subresource (subresources.status: {}) separates user writes (spec) from controller writes (status) — enabling optimistic concurrency and kubectl status support
  • The scale subresource (subresources.scale) makes your custom resource compatible with kubectl scale and the HorizontalPodAutoscaler

The Big Picture

  ANATOMY OF A CUSTOMRESOURCEDEFINITION

  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: {plural}.{group}        ← MUST be exactly this format
  spec:
    group: {group}                ← API group (e.g. storage.example.com)
    scope: Namespaced | Cluster   ← where instances live
    names:                        ← how kubectl refers to this resource
      plural: backuppolicies
      singular: backuppolicy
      kind: BackupPolicy
      shortNames: [bp]
    versions:                     ← can be a list; one must have storage: true
      - name: v1alpha1
        served: true              ← API server responds to this version
        storage: true             ← etcd stores objects in this version
        schema:
          openAPIV3Schema:        ← validation schema for ALL objects of this type
            type: object
            properties:
              spec: {...}
              status: {...}
        subresources:
          status: {}              ← enables separate status write path
          scale:                  ← enables kubectl scale + HPA
            specReplicasPath: .spec.replicas
            statusReplicasPath: .status.replicas
        additionalPrinterColumns: ← extra columns in kubectl get output
          - name: Schedule
            type: string
            jsonPath: .spec.schedule

Understanding the Kubernetes CRD schema is the prerequisite for writing a CRD that behaves correctly in production — validation catches bad data at the API boundary, the status subresource prevents controller race conditions, and scope determines your entire RBAC and multi-tenancy model.


spec.group and metadata.name

The group is a reverse-DNS identifier for your API. Convention:

storage.example.com     ← domain you control + functional area
monitoring.myteam.io
databases.platform.company.com

The CRD’s metadata.name must be exactly {plural}.{group}:

metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  names:
    plural: backuppolicies

If these do not match, the API server rejects the CRD with a validation error. This is the most common first-timer mistake.


spec.scope: Namespaced vs Cluster

  SCOPE DETERMINES WHERE INSTANCES LIVE

  Namespaced (scope: Namespaced)       Cluster (scope: Cluster)
  ─────────────────────────────         ──────────────────────────
  kubectl get backuppolicies -n prod    kubectl get clusterbackuppolicies
  kubectl get backuppolicies -A         (no -n flag, no namespace)

  Analogous to: Pod, Deployment,        Analogous to: PersistentVolume,
                ConfigMap                             ClusterRole, Node

Namespaced: Use when instances are per-tenant or per-application. Users with namespace-scoped RBAC can manage their own instances without cluster-admin. Most CRDs should be namespaced.

Cluster-scoped: Use when instances represent cluster-wide configuration — a ClusterIssuer (cert-manager), ClusterSecretStore (ESO), a StorageClass-like concept. Requires cluster-level RBAC to create/modify.

You cannot change scope after a CRD is created without deleting and recreating it (which deletes all instances). Choose carefully.


spec.versions: Serving Multiple API Versions

spec:
  versions:
    - name: v1alpha1
      served: true
      storage: false       # not stored; converted on read
      schema:
        openAPIV3Schema: {...}
    - name: v1beta1
      served: true
      storage: false
      schema:
        openAPIV3Schema: {...}
    - name: v1
      served: true
      storage: true        # etcd stores in this version
      schema:
        openAPIV3Schema: {...}

Rules:
served: true means the API server accepts requests at this version
served: false means the API server returns 404 for that version — use to deprecate
– Exactly one version must have storage: true — this is what gets written to etcd
– When a client requests a non-storage version, the API server converts on the fly (or calls your conversion webhook — see EP08)

Early in development, start with v1alpha1 storage: true. Promote to v1 when the schema is stable. EP08 covers how to do this without losing data.


spec.names: What kubectl Sees

spec:
  names:
    plural:     backuppolicies     # kubectl get backuppolicies
    singular:   backuppolicy       # kubectl get backuppolicy (also works)
    kind:       BackupPolicy       # used in YAML apiVersion/kind
    listKind:   BackupPolicyList   # optional; auto-derived if omitted
    shortNames:                    # kubectl get bp
      - bp
    categories:                    # kubectl get all includes this type
      - all

categories is worth noting: if you add all to categories, your custom resources appear when someone runs kubectl get all -n mynamespace. Most CRDs deliberately do not add this — it clutters get all output. Only add it if your resource is a primary operational concern.


schema.openAPIV3Schema: Validation

The schema is where you define field types, required fields, constraints, and descriptions. The API server validates every create and update against this schema before writing to etcd.

schema:
  openAPIV3Schema:
    type: object
    required: ["spec"]
    properties:
      spec:
        type: object
        required: ["schedule", "retentionDays"]
        properties:
          schedule:
            type: string
            description: "Cron expression for backup schedule"
            pattern: '^(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)\s+(\*|[0-9,\-\/]+)$'
          retentionDays:
            type: integer
            minimum: 1
            maximum: 365
          storageClass:
            type: string
            default: "standard"        # default value (Kubernetes 1.17+)
          targets:
            type: array
            maxItems: 10
            items:
              type: object
              required: ["name"]
              properties:
                name:
                  type: string
                namespace:
                  type: string
                  default: "default"
      status:
        type: object
        x-kubernetes-preserve-unknown-fields: true   # controllers write arbitrary status

Field types available

Type      Usage
string    Text values; supports format, pattern, enum, minLength, maxLength
integer   Whole numbers; supports minimum, maximum
number    Floating point
boolean   true/false
object    Nested structure; use properties to define fields
array     List; use items to define the element schema; supports minItems, maxItems

x-kubernetes-preserve-unknown-fields: true

This tells the API server not to prune fields it does not know about. Use it on status (controllers write whatever they need) and on fields that are intentionally free-form (like a config field that accepts arbitrary YAML). Avoid it on spec — it bypasses validation.

Validation behavior in practice

# This will fail with a clear error:
kubectl apply -f - <<EOF
apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: bad
  namespace: default
spec:
  schedule: "not-a-cron"    # fails pattern validation
  retentionDays: 500         # fails maximum: 365
EOF
The BackupPolicy "bad" is invalid:
  spec.schedule: Invalid value: "not-a-cron": spec.schedule in body should match
    '^(\*|[0-9,\-\/]+)\s+...'
  spec.retentionDays: Invalid value: 500: spec.retentionDays in body should be
    less than or equal to 365

Schema validation catches configuration mistakes at apply time, not at runtime inside a pod. This is one of the core advantages of expressing domain configuration as CRDs rather than ConfigMaps.


additionalPrinterColumns: What kubectl get Shows

By default, kubectl get backuppolicies shows only NAME and AGE. You can add columns:

additionalPrinterColumns:
  - name: Schedule
    type: string
    jsonPath: .spec.schedule
    description: Cron schedule for backups
  - name: Retention
    type: integer
    jsonPath: .spec.retentionDays
    priority: 1          # 0 = always shown; 1 = only with -o wide
  - name: Ready
    type: string
    jsonPath: .status.conditions[?(@.type=='Ready')].status
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp

Result:

NAME        SCHEDULE      READY   AGE
nightly     0 2 * * *     True    3d
weekly      0 0 * * 0     False   7d

Good printer columns turn kubectl get into a useful operational dashboard. Include Ready (from status conditions) so operators can immediately see which custom resources are healthy without running kubectl describe.


The Status Subresource

subresources:
  status: {}

Without the status subresource, spec and status are part of the same object. Any user with update permission on the CRD can modify both. Controllers write status through the same path as users write spec.

With the status subresource enabled:
kubectl apply / kubectl patch only update spec — the status block is stripped
– Controllers use the /status subresource endpoint to write status
– RBAC can grant update on backuppolicies (spec) independently from update on backuppolicies/status

  WITHOUT status subresource:         WITH status subresource:
  ─────────────────────────            ──────────────────────────
  PUT /backuppolicies/nightly          PUT /backuppolicies/nightly
  → updates spec AND status            → updates spec only

                                       PUT /backuppolicies/nightly/status
                                       → updates status only (controller path)

Always enable the status subresource on production CRDs. The split between spec and status is fundamental to the Kubernetes API contract. Without it, a controller updating status can accidentally overwrite spec changes made by a user at the same time.
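
In controller code, the split shows up as two distinct client calls. A sketch using controller-runtime's client (the library the operator episodes use):

import (
    "context"

    "sigs.k8s.io/controller-runtime/pkg/client"

    storagev1alpha1 "github.com/example/backup-operator/api/v1alpha1"
)

// With the status subresource enabled, each call persists only its half.
func writeBothHalves(ctx context.Context, c client.Client, bp *storagev1alpha1.BackupPolicy) error {
    // PUT /backuppolicies/{name}: persists spec and metadata; status changes are dropped.
    if err := c.Update(ctx, bp); err != nil {
        return err
    }
    // PUT /backuppolicies/{name}/status: persists status only; spec changes are dropped.
    return c.Status().Update(ctx, bp)
}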


The Scale Subresource

subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    labelSelectorPath: .status.labelSelector

This makes your custom resource compatible with:

kubectl scale backuppolicy nightly --replicas=3

And with HorizontalPodAutoscaler targeting your custom resource. If your CRD manages something replica-based (workers, shards, connections), enabling the scale subresource lets it plug into the standard Kubernetes autoscaling ecosystem without extra plumbing.
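
If you define your types in Go with kubebuilder (the toolchain EP07 uses), the same stanza comes from a marker on the type. A sketch with hypothetical WorkerPool types — the names and fields are illustrative, not part of this series' BackupPolicy:

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type WorkerPoolSpec struct {
    // Replicas is the field kubectl scale and the HPA write.
    Replicas int32 `json:"replicas"`
}

type WorkerPoolStatus struct {
    // Replicas reports how many workers the controller currently runs.
    Replicas int32 `json:"replicas"`
    // LabelSelector lets the HPA find the pods behind this resource.
    LabelSelector string `json:"labelSelector,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.replicas,selectorpath=.status.labelSelector

type WorkerPool struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   WorkerPoolSpec   `json:"spec,omitempty"`
    Status WorkerPoolStatus `json:"status,omitempty"`
}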


⚠ Common Mistakes

Forgetting x-kubernetes-preserve-unknown-fields: true on status. If you validate the status field with a strict schema but do not add this, the API server will prune any status fields the controller writes that are not in the schema. The controller’s status updates will silently lose fields. Either define the full status schema or use x-kubernetes-preserve-unknown-fields: true.

Using scope: Cluster for resources that should be namespaced. Once a CRD is created as cluster-scoped, you cannot make it namespaced without deleting and recreating it. Plan scope before deploying to production.

Not enabling the status subresource. Without it, controllers writing status can race with users updating spec. It also means kubectl patch --subresource=status does not work and some tooling behaves unexpectedly. Enable it from the start.

Loose schema with no required fields. An openAPIV3Schema with no required constraint accepts objects with empty spec. This usually means your controller gets called with a resource that is missing mandatory configuration. Define required fields and validate them at the API boundary, not inside the controller.


Quick Reference

# Inspect the full schema of a CRD
kubectl get crd backuppolicies.storage.example.com -o yaml | \
  yq '.spec.versions[0].schema'

# Check what subresources are enabled
kubectl get crd certificates.cert-manager.io -o jsonpath=\
  '{.spec.versions[0].subresources}'

# See all served versions for a CRD
kubectl get crd prometheuses.monitoring.coreos.com \
  -o jsonpath='{.spec.versions[*].name}'

# Check which version is the storage version
kubectl get crd certificates.cert-manager.io \
  -o jsonpath='{.spec.versions[?(@.storage==true)].name}'

# Describe the printer columns for a CRD
kubectl get crd scaledobjects.keda.sh \
  -o jsonpath='{.spec.versions[0].additionalPrinterColumns}'

Key Takeaways

  • spec.versions allows serving and storing multiple API versions; only one version has storage: true
  • scope (Namespaced vs Cluster) cannot be changed after creation — choose deliberately
  • openAPIV3Schema validates every CR at the API boundary, before etcd storage
  • The status subresource separates the user write path (spec) from the controller write path (status) — always enable it
  • additionalPrinterColumns makes kubectl get operationally useful; include a Ready column from status conditions

What’s Next

EP04: Write Your First Kubernetes CRD puts the anatomy into practice — a complete hands-on walkthrough building a BackupPolicy CRD from scratch, applying it to a cluster, creating instances, and verifying validation, RBAC, and status behavior.

Get EP04 in your inbox when it publishes → subscribe at linuxcent.com

What Is a Kubernetes CRD? How Custom Resources Extend the API

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 1
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • A Kubernetes CRD (Custom Resource Definition) is how you add new resource types to the Kubernetes API — the same way Deployment and Service exist natively, you can make BackupPolicy or Certificate exist too
    (CRD = the schema/blueprint; Custom Resource = an instance of that schema, just like a Pod is an instance of the Pod schema)
  • Every kubectl get crds on a real cluster shows dozens of them — cert-manager, KEDA, Prometheus Operator, Crossplane all ship their own CRDs
  • CRDs are served by the same API server as built-in resources — kubectl, RBAC, watches, and events all work identically
  • A CRD alone does nothing — a controller watches the custom resources and acts on them; together they form an Operator
  • CRDs live in etcd just like Pods and Deployments — they survive API server restarts and cluster upgrades
  • You do not need to modify Kubernetes source code or restart the API server to add a CRD

The Big Picture

  HOW KUBERNETES CRDs EXTEND THE API

  ┌──────────────────────────────────────────────────────────────┐
  │  Kubernetes API Server                                       │
  │                                                              │
  │  Built-in resources          Custom resources (via CRD)      │
  │  ─────────────────           ──────────────────────────      │
  │  Pod                         Certificate     (cert-manager)  │
  │  Deployment                  ScaledObject    (KEDA)          │
  │  Service                     ExternalSecret  (ESO)           │
  │  ConfigMap                   BackupPolicy    (your team)     │
  │  ...                         ...                             │
  │                                                              │
  │  All resources: same API, same kubectl, same RBAC, same etcd │
  └──────────────────────────────────────────────────────────────┘
            ▲                          ▲
            │ built in                 │ registered at runtime
            │                          │
         Kubernetes              CustomResourceDefinition
          binary                    (a YAML you apply)

What is a Kubernetes CRD? It is a resource that defines resources — a schema registration that teaches the API server about a new object type you want to use in your cluster.


What Problem CRDs Solve

Kubernetes ships with roughly 50 resource types: Pods, Deployments, Services, ConfigMaps, Secrets, PersistentVolumes, and so on. These cover the general-purpose building blocks for running containerized workloads.

But the moment you operate real infrastructure, you hit the edges. You want to express:

  • “This database should have three replicas with point-in-time recovery enabled” — not a Deployment
  • “This TLS certificate for api.example.com should renew 30 days before expiry” — not a Secret
  • “This queue consumer should scale to zero when the queue is empty” — not a HorizontalPodAutoscaler

Before CRDs (pre-2017), the only options were: use ConfigMaps as a poor substitute (no schema, no validation, no dedicated RBAC), or fork Kubernetes and add the resource natively (impractical for everyone outside the core team).

CRDs, introduced in Kubernetes 1.7 and promoted to stable in 1.16, solved this by letting you register a new resource type with the API server at runtime — without touching Kubernetes source code, without restarting the API server, without any special access beyond being able to create cluster-scoped resources.


The Kubernetes API: A Brief Mental Model

Before CRDs make sense, the API model needs to be clear.

  KUBERNETES API STRUCTURE

  apiVersion: apps/v1       ← API group (apps) + version (v1)
  kind: Deployment          ← resource type
  metadata:
    name: web               ← instance name
    namespace: default      ← namespace scope
  spec:
    replicas: 3             ← desired state

Every Kubernetes resource has:
– A group (e.g., apps, batch, networking.k8s.io) — or no group for core resources
– A version (e.g., v1, v1beta1)
– A kind (e.g., Deployment, Pod)
– A scope: namespaced or cluster-wide

The API server is a registry. Each group/version/kind combination maps to a Go struct that knows how to validate, store, and serve that resource type.

A CRD registers a new entry in that registry. You supply the group, version, kind, and schema. The API server handles everything else — serving it via REST, storing it in etcd, exposing it to kubectl.


What a CRD Looks Like

Here is the smallest possible CRD — it creates a new BackupPolicy resource type in the storage.example.com API group:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backuppolicies.storage.example.com
spec:
  group: storage.example.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string
                retentionDays:
                  type: integer
  scope: Namespaced
  names:
    plural: backuppolicies
    singular: backuppolicy
    kind: BackupPolicy
    shortNames:
      - bp

Apply it:

kubectl apply -f backuppolicy-crd.yaml

Now create an instance:

apiVersion: storage.example.com/v1alpha1
kind: BackupPolicy
metadata:
  name: nightly
  namespace: default
spec:
  schedule: "0 2 * * *"
  retentionDays: 30
kubectl apply -f nightly-backup.yaml
kubectl get backuppolicies
kubectl get bp            # shortName works
kubectl describe bp nightly

The API server validates the spec against the schema, stores it in etcd, and returns it via all the standard API endpoints — all without a single line of custom code.


CRD vs Built-In Resource: What Is Different?

Not much, deliberately.

Capability                        Built-in resource   Custom resource (CRD)
kubectl get / describe / delete   Yes                 Yes
RBAC (Roles, ClusterRoles)        Yes                 Yes
Watch (informers, events)         Yes                 Yes
Stored in etcd                    Yes                 Yes
OpenAPI schema validation         Yes                 Yes (you define the schema)
Admission webhooks                Yes                 Yes
Status subresource                Yes                 Optional (you enable it)
Scale subresource                 Yes                 Optional (you enable it)
Built-in controller behavior      Yes                 No — you write the controller

The last row is the critical one. When you create a Deployment, the deployment controller immediately starts managing ReplicaSets. When you create a BackupPolicy, nothing happens — until you write and deploy a controller that watches BackupPolicy objects and acts on them.

That controller + the CRD is what people call an Operator.


A Real Cluster: What You Actually See

Run this on any cluster running cert-manager, Prometheus Operator, or any other tooling:

kubectl get crds

Sample output (abbreviated):

NAME                                                  CREATED AT
certificates.cert-manager.io                          2024-11-01T08:12:00Z
certificaterequests.cert-manager.io                   2024-11-01T08:12:00Z
issuers.cert-manager.io                               2024-11-01T08:12:00Z
clusterissuers.cert-manager.io                        2024-11-01T08:12:00Z
scaledobjects.keda.sh                                 2024-11-01T08:13:00Z
scaledjobs.keda.sh                                    2024-11-01T08:13:00Z
externalsecrets.external-secrets.io                   2024-11-01T08:14:00Z
prometheuses.monitoring.coreos.com                    2024-11-01T08:15:00Z
servicemonitors.monitoring.coreos.com                 2024-11-01T08:15:00Z

Every tool that ships as a CRD-based system registers its resource types here first. The count often surprises engineers: a production cluster with a typical toolchain easily has 40–80 CRDs.

Check how many are on your cluster:

kubectl get crds --no-headers | wc -l

How the API Server Handles a CRD

When you apply a CRD, the API server does three things:

  CRD REGISTRATION FLOW

  kubectl apply -f my-crd.yaml
          │
          ▼
  1. API server validates the CRD manifest
     (is the schema valid OpenAPI v3? are names correct?)
          │
          ▼
  2. CRD stored in etcd
     (under /registry/apiextensions.k8s.io/customresourcedefinitions/)
          │
          ▼
  3. New REST endpoints activated immediately:
     GET  /apis/storage.example.com/v1alpha1/namespaces/{ns}/backuppolicies
     POST /apis/storage.example.com/v1alpha1/namespaces/{ns}/backuppolicies
     ...

From this point, any kubectl get backuppolicies or API call to those endpoints is handled exactly like a built-in resource call — the API server serves it from etcd, applies RBAC, runs admission webhooks, and returns standard JSON.

No restart required. The new endpoints appear within seconds.


The Difference Between CRD and CR

Two terms that are easily confused:

  • CRD (CustomResourceDefinition) — the schema/blueprint. There is one CRD per resource type. certificates.cert-manager.io is a CRD.
  • CR (Custom Resource) — an instance of a CRD. Every Certificate object you create is a custom resource. You can have thousands of CRs per CRD.

  CRD (one)          →  Custom Resource (many)
  ─────────             ─────────────────────
  certificates          web-tls           (namespace: production)
  .cert-manager.io      api-tls           (namespace: production)
                        admin-tls         (namespace: staging)
                        ...

The CRD is applied once (usually by the tool’s Helm chart). Custom resources are created by your users, your CI pipeline, or your GitOps system throughout the life of the cluster.


Where CRDs Fit in the Kubernetes Extension Model

CRDs are one of three ways to extend Kubernetes:

  KUBERNETES EXTENSION MECHANISMS

  1. CRDs + Controllers (Operators)
     Add new resource types + behavior
     → cert-manager, KEDA, Argo CD, Crossplane
     Used for: domain-specific abstractions, infrastructure management

  2. Admission Webhooks
     Intercept API requests to validate or mutate objects
     → OPA/Gatekeeper, Kyverno, Istio injection
     Used for: policy enforcement, sidecar injection, defaulting

  3. API Aggregation (AA)
     Register a fully separate API server behind the main API server
     → metrics-server, custom autoscalers
     Used for: when you need non-CRUD semantics (e.g. exec, attach, streaming)

For 95% of use cases, CRDs + controllers are the right mechanism. API aggregation is complex and only warranted for non-standard API semantics. Admission webhooks are complementary to CRDs, not an alternative.


⚠ Common Mistakes

Confusing the CRD with the controller. The CRD is just a schema registration — it does not execute code. If you apply a CRD but never deploy its controller, creating custom resources will succeed (the API server accepts them) but nothing will happen. This catches many people the first time they install cert-manager: they apply the CRDs and skip the controller.

Assuming CRD deletion is safe. Deleting a CRD deletes all custom resources of that type from etcd. There is no “are you sure?” prompt. If you delete the certificates.cert-manager.io CRD, every Certificate object in every namespace is gone.
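
Before deleting any CRD, count what you are about to destroy. A minimal sanity check, using the backuppolicies CRD from this series as the example (substitute the plural name of whatever type you are deleting):

# How many custom resources would be wiped out, across all namespaces?
kubectl get backuppolicies --all-namespaces --no-headers | wc -l

# Safer still: export them before touching the CRD
kubectl get backuppolicies --all-namespaces -o yaml > backuppolicies-backup.yaml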

Treating CRDs as ConfigMap replacements. Some teams store configuration in CRDs purely to get schema validation. This works, but without a controller, the custom resources are inert data. If you only need configuration storage with validation, a CRD is viable — just be explicit that there is no reconciliation loop.


Quick Reference

# List all CRDs in the cluster
kubectl get crds

# Inspect a specific CRD's schema
kubectl get crd certificates.cert-manager.io -o yaml

# List all custom resources of a type
kubectl get certificates -A

# Get details on a specific custom resource
kubectl describe certificate web-tls -n production

# Delete a CRD (WARNING: deletes all instances)
kubectl delete crd backuppolicies.storage.example.com

# Check if a CRD is established (ready to use)
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.conditions[?(@.type=="Established")].status}'
# Returns: True

Key Takeaways

  • A Kubernetes CRD registers a new resource type with the API server — no source code changes, no restart required
  • Custom resources behave identically to built-in resources: kubectl, RBAC, watches, etcd, admission webhooks all work the same way
  • The CRD is just the schema; a controller gives custom resources behavior — together they form an Operator
  • Every production cluster running modern tooling already uses dozens of CRDs
  • Deleting a CRD deletes all its instances — treat CRDs as production-critical objects

What’s Next

EP02: CRDs You Already Use makes this concrete before we go deeper — we walk through cert-manager’s Certificate, KEDA’s ScaledObject, and External Secrets’ ExternalSecret as working examples, so you understand what a well-designed CRD looks like from a user’s perspective before you design your own.

Get EP02 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes Today: v1.33 to v1.35, In-Place Resize GA, and What Comes Next

Reading Time: 6 minutes


Introduction

More than a decade after the first commit, Kubernetes is not exciting in the way it was in 2015. That’s a compliment. The system is stable. The APIs are mature. The migrations — dockershim, PSP, cloud provider code — are behind us.

What the 1.33–1.35 cycle shows is a project focused on precision: removing edge cases, promoting long-running alpha features to stable, and making the scheduler, storage, and security model more correct rather than more powerful. That’s what a mature infrastructure platform looks like.

Here’s what happened and where the project is headed.


Kubernetes 1.33 — Sidecar Resize, In-Place Resize Beta (April 2025)

Code name: Octarine

In-Place Pod Vertical Scaling reaches Beta

After landing as alpha in 1.27, in-place pod resource resizing became beta in 1.33 — enabled by default via the InPlacePodVerticalScaling feature gate.

The capability: change CPU and memory requests/limits on a running container without terminating and restarting the pod.

# Resize a running container's CPU limit without a restart.
# As of 1.33, resource changes go through the pod's "resize" subresource:
kubectl patch pod api-pod-xyz --subresource resize --type='json' -p='[
  {
    "op": "replace",
    "path": "/spec/containers/0/resources/requests/cpu",
    "value": "2"
  },
  {
    "op": "replace",
    "path": "/spec/containers/0/resources/limits/cpu",
    "value": "4"
  }
]'

# Verify the resize was applied
kubectl get pod api-pod-xyz -o jsonpath='{.status.containerStatuses[0].resources}'

Why this matters operationally: Before in-place resize, vertical scaling meant terminating the pod, losing in-memory state, waiting for a new pod to become ready. For databases with warm buffer pools, JVM applications with loaded heap caches, or any workload where startup cost is significant, this was a serious limitation. Vertical Pod Autoscaler (VPA) worked around it by restarting pods — acceptable for stateless workloads, problematic for stateful ones.

In 1.33, resizing also works for sidecar containers, which graduated to stable in the same release.

Sidecar Containers — Full Maturity

Sidecar containers (alpha in 1.28, beta in 1.29) reached stable in 1.33, and this is the first release to formally combine sidecars with in-place resize: you can now vertically scale a service mesh proxy (an Envoy sidecar) without restarting the application pod. For high-traffic services where the proxy itself becomes the CPU bottleneck, this is directly actionable.
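
As a sketch, assuming a 1.33+ cluster and a pod whose proxy is declared as initContainers[0] with restartPolicy: Always (the native sidecar pattern); the pod name is hypothetical:

# Give the proxy sidecar more CPU; the application container is untouched
kubectl patch pod api-pod-xyz --subresource resize --type='json' -p='[
  {"op": "replace", "path": "/spec/initContainers/0/resources/limits/cpu", "value": "2"}
]'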


Gateway API v1.4 (October 2025)

Gateway API continued its rapid iteration with v1.4:

BackendTLSPolicy (Standard channel): Configure TLS between the gateway and the backend service — not just TLS termination at the gateway, but end-to-end encryption:

apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: api-backend-tls
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: api-service
  validation:
    caCertificateRefs:
    - name: internal-ca
      group: ""
      kind: ConfigMap
    hostname: api.internal.corp

Gateway Client Certificate Validation: The gateway can now validate client certificates — mutual TLS for ingress traffic, not just between services.

TLSRoute to Standard: TLS routing (based on SNI, not HTTP host headers) graduated to the standard channel — enabling TCP workloads with TLS passthrough through the Gateway API model.
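
A minimal TLSRoute sketch for SNI-based passthrough. The gateway, hostname, and backend names are hypothetical, and the apiVersion may differ depending on which channel your cluster has installed:

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: postgres-passthrough
spec:
  parentRefs:
  - name: shared-gateway          # a Gateway with a TLS passthrough listener
    sectionName: tls-passthrough
  hostnames:
  - db.example.com                # matched against the client's SNI
  rules:
  - backendRefs:
    - name: postgres
      port: 5432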

ListenerSet: Group multiple Gateway listeners — useful for shared infrastructure where multiple teams need to attach routes to the same gateway without managing separate Gateway resources.


Kubernetes 1.34 — Scheduler Improvements, DRA Continues (August 2025)

The 1.34 release focused on the scheduler and Dynamic Resource Allocation:

DRA structured parameters stabilization: The Dynamic Resource Allocation API matured its parameter model — resource drivers can expose structured claims that the scheduler understands, enabling topology-aware placement of GPU workloads:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
      selectors:
      - cel:
          expression: device.attributes["nvidia.com/gpu-product"].string() == "A100-SXM4-80GB"
      count: 2

Scheduler QueueingHint stable: Plugins can now tell the scheduler when to re-queue a pod for scheduling — instead of the scheduler periodically retrying all unschedulable pods, plugins signal when relevant cluster state has changed. This significantly reduces scheduler CPU consumption in large clusters with many unschedulable pods.

Fine-grained node authorization improvements: Kubelets can now be restricted from accessing Service resources they don’t need — further reducing the blast radius of a compromised kubelet.


Kubernetes 1.35 — In-Place Resize GA, Memory Limits Unlocked (December 2025)

In-Place Pod Vertical Scaling Graduates to Stable

After alpha in 1.27 and beta in 1.33, in-place resize graduated to GA in 1.35. Two significant improvements accompanied GA:

Memory limit decreases now permitted: Previously, you could increase memory limits in-place but not decrease them. The restriction existed because the kernel doesn’t immediately reclaim memory when the limit is lowered — the OOM killer would need to run. 1.35 lifts this restriction with proper handling: the kernel is instructed to reclaim, and the pod status reflects the resize progress.
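
What a decrease looks like in practice, as a sketch (the pod name is hypothetical; requires a 1.35+ cluster):

# Lower a running container's memory limit through the resize subresource
kubectl patch pod cache-pod-xyz --subresource resize --type='json' -p='[
  {"op": "replace", "path": "/spec/containers/0/resources/limits/memory", "value": "2Gi"}
]'

# Pod conditions report whether the resize is pending or still in progress
kubectl get pod cache-pod-xyz -o jsonpath='{.status.conditions}'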

Pod-Level Resources (alpha in 1.35): Specify resource requests and limits at the pod level rather than per-container — with in-place resize support. Useful for init containers and sidecar patterns where total pod resources matter more than per-container allocation.

spec:
  # Pod-level resources (alpha) — total budget for all containers
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
  initContainers:
  - name: log-collector
    image: fluentbit:latest
    restartPolicy: Always  # native sidecar; per-container restartPolicy is only valid on init containers
  containers:
  - name: application
    image: myapp:latest
    # No per-container resources; the pod-level budget applies

Other 1.35 Highlights

Topology Spread Constraints improvements: Better handling of unschedulable scenarios — whenUnsatisfiable: ScheduleAnyway now has smarter fallback behavior.

VolumeAttributesClass stable: Change storage performance characteristics (IOPS, throughput) of a PersistentVolume without re-provisioning — the storage equivalent of in-place pod resize.

# Change volume IOPS without re-provisioning
kubectl patch pvc database-pvc --type='merge' -p='
  {"spec": {"volumeAttributesClassName": "high-performance"}}'

Job success policy improvements: Declare a Job successful when a subset of pods complete successfully — for distributed training jobs where not all workers need to finish.
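
A sketch of the success policy in Job terms; the image, counts, and leader index are illustrative. The policy requires an Indexed job:

apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-training
spec:
  completionMode: Indexed
  completions: 8
  parallelism: 8
  successPolicy:
    rules:
    - succeededIndexes: "0"   # the job is successful once the leader (index 0) completes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: training-image:latest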


What’s in Kubernetes 1.36 (April 22, 2026)

Kubernetes 1.36 is on track for release on April 22, 2026. Based on the enhancement tracking and KEP (Kubernetes Enhancement Proposal) pipeline, expected highlights include:

  • DRA continuing toward stable
  • Pod-level resources moving to beta
  • Scheduler improvements for AI/ML workload placement
  • Further Gateway API integration as core networking model

The project has reached a rhythm: three releases per year, each focused on advancing a predictable set of features through alpha → beta → stable. The drama of the 2019–2022 period (PSP, dockershim, API removals) is behind it.


The State of the Ecosystem in 2026

Control Plane Deployment Models

  Model                     Examples             Best For
  ────────────────────────  ───────────────────  ──────────────────────────────────────────────────────
  Managed (cloud provider)  GKE, EKS, AKS        Most organizations; no control plane ops
  Self-managed              kubeadm, k3s, Talos  Air-gapped, on-prem, specific compliance requirements
  Managed (platform)        Rancher, OpenShift   Enterprises that need multi-cluster management + vendor support

CNI Landscape

  CNI      Model             Notable Feature
  ───────  ────────────────  ─────────────────────────────────────────────────────────────────
  Cilium   eBPF              kube-proxy replacement, network policy in the kernel, Hubble observability
  Calico   eBPF or iptables  BGP-based networking, hybrid cloud routing
  Flannel  VXLAN/host-gw     Simple, low overhead, no network policy
  Weave    Mesh overlay      Easy multi-host setup

eBPF-based CNIs (Cilium, Calico in eBPF mode) are now the default recommendation for production clusters. The iptables era of Kubernetes networking is ending.

Security Stack in 2026

A hardened Kubernetes cluster in 2026 runs:

Cluster provisioning:    Cluster API + GitOps (Flux/ArgoCD)
Admission control:       Pod Security Admission (restricted) + Kyverno or OPA/Gatekeeper
Runtime security:        Falco (eBPF-based syscall monitoring)
Network security:        Cilium NetworkPolicy + Cilium Cluster Mesh for multi-cluster
Image security:          Cosign signing in CI + admission webhook for signature verification
Secret management:       External Secrets Operator → HashiCorp Vault or cloud KMS
Observability:           Prometheus + Grafana + Hubble (network flows) + OpenTelemetry

The Permanent Principles That Haven’t Changed

Looking across twelve years and 35 minor versions, some things have not changed:

The API as the universal interface: Everything in Kubernetes is a resource. This remains the most important architectural decision — it makes every tool, every controller, every GitOps system work with the same model.

Reconciliation loops: Every Kubernetes controller watches actual state and drives it toward desired state. The controller pattern from 2014 is unchanged. CRDs and Operators are just more instances of it.

Labels and selectors: The flexible grouping mechanism from 1.0 is still the primary way Kubernetes components find each other. Services find pods. HPA finds Deployments. Operators find their managed resources.

Declarative, not imperative: You describe what you want. Kubernetes figures out how to achieve and maintain it. This principle, inherited from Borg’s BCL configuration, underlies everything from Deployments to Crossplane’s cloud resource management.


What’s Coming: The Next Five Years

WebAssembly on Kubernetes: The Wasm ecosystem (wasmCloud, SpinKube) is building toward running WebAssembly workloads as first-class Kubernetes pods — near-native performance, smaller images, stronger isolation than containers. Still early, but gaining real adoption.

AI inference as infrastructure: LLM serving is becoming a cluster primitive. Tools like KServe and vLLM on Kubernetes are moving from research to production. The scheduler, resource model, and networking will continue adapting to inference workload patterns.

Confidential computing: AMD SEV, Intel TDX, and ARM CCA provide hardware-level memory encryption for pods. The RuntimeClass mechanism and ongoing kernel work are making confidential Kubernetes workloads operational rather than experimental.

Leaner distributions: k3s, k0s, Talos, and Flatcar-based minimal Kubernetes distributions are growing in adoption for edge, IoT, and resource-constrained environments. The pressure is toward smaller, more auditable control planes.


Key Takeaways

  • In-place pod vertical scaling went from alpha (1.27) to stable (1.35) — live CPU and memory resize without pod restart changes the economics of stateful workload management
  • Gateway API v1.4 completes the ingress replacement story: BackendTLSPolicy, client certificate validation, and TLSRoute in standard channel
  • VolumeAttributesClass stable (1.35): Change storage performance in-place — the storage parallel to pod resource resize
  • The eBPF era of Kubernetes networking is established: Cilium as default CNI in GKE, growing in EKS/AKS, replacing iptables-based kube-proxy
  • The Kubernetes project in 2026 is focused on precision — promoting mature features to stable, reducing edge cases, improving scheduler efficiency — not adding new abstractions
  • WebAssembly, confidential computing, and AI inference scheduling are the frontiers to watch

Series Wrap-Up

  Era          Defining Change
  ───────────  ──────────────────────────────────────────────────────────────────
  2003–2014    Borg and Omega build the playbook internally at Google
  2014–2016    Kubernetes 1.0, CNCF, and winning the container orchestration wars
  2016–2018    RBAC stable, CRDs, cloud providers all-in on managed K8s
  2018–2020    Operators, service mesh, OPA/Gatekeeper — the extensibility era
  2020–2022    Supply chain crisis, PSP deprecated, API removals, dockershim exit
  2022–2023    Dockershim and PSP removed, eBPF networking takes over
  2023–2025    GitOps standard, sidecar stable, DRA, AI/ML workloads
  2025–2026    In-place resize GA, VolumeAttributesClass, Gateway API complete

From 47,501 lines of Go in a 250-file GitHub commit to the operating system of the cloud — and still reconciling.


← EP07: Platform Engineering Era

Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com

The Platform Engineering Era: GitOps, AI Workloads, and Leaner Kubernetes (2023–2025)

Reading Time: 6 minutes


Introduction

By 2023, the question had shifted from “how do we run Kubernetes?” to “how do we let other engineers run their workloads on Kubernetes without becoming a bottleneck?”

This is the platform engineering problem. And it drove the tooling that defined 2023–2025: GitOps as the deployment standard, Cluster API for Kubernetes-on-Kubernetes provisioning, AI/ML workloads forcing new scheduling capabilities, and the Kubernetes project itself shedding more weight to become faster to release and operate.


GitOps: Principle Becomes Practice

GitOps as a term was coined by Weaveworks in 2017. By 2023, it was no longer a debate — it was the default deployment model for organizations running Kubernetes at scale.

The principle: the desired state of your cluster lives in Git. A controller watches the repository and reconciles the cluster state to match. Every deployment is a PR merge. The audit trail is the Git history.

Flux v2 and ArgoCD (both CNCF graduated projects) became the two dominant implementations:

# Flux: GitRepository + Kustomization
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: production-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/org/k8s-config
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production-apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/production
  prune: true          # Remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: production-config
  healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: api
    namespace: production

The prune: true behavior is critical: resources deleted from Git are deleted from the cluster. This is what makes GitOps a security control — unknown resources that aren’t in Git get removed. No more accumulation of forgotten test deployments, rogue debug pods, or unauthorized configuration changes that outlive the engineer who made them.

ArgoCD’s Application model added a UI, synchronization policies, and multi-cluster management:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-api
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/org/apps
    targetRevision: HEAD
    path: api/production
  destination:
    server: https://kubernetes.default.svc
    namespace: api
  syncPolicy:
    automated:
      prune: true
      selfHeal: true    # Revert manual kubectl changes
    syncOptions:
    - CreateNamespace=true

The selfHeal: true option is where GitOps becomes enforceable: any manual change made with kubectl is automatically reverted within the sync interval. For compliance-sensitive environments, this is a configuration drift prevention control.


Cluster API: Kubernetes Managing Kubernetes

Cluster API (kubernetes-sigs/cluster-api) flipped the usual model: instead of using tools like Terraform or Ansible to provision Kubernetes clusters, Cluster API lets you manage Kubernetes clusters as Kubernetes resources — using a management cluster to provision and manage workload clusters.

# Create a new Kubernetes cluster as a Kubernetes resource
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-cluster-prod
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: workload-cluster-prod
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: workload-cluster-prod-control-plane

Cluster API reconciliation handles cluster provisioning, scaling, upgrades, and deletion — all through the Kubernetes API, with all the tooling (RBAC, audit logging, GitOps integration) that entails. Multi-cluster platform teams could now manage hundreds of workload clusters from a single management cluster.
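
Day-2 operations follow the same pattern. A sketch against the management cluster; the MachineDeployment and control plane names are hypothetical, matching the naming of the example above:

# Scale the workload cluster's worker pool
kubectl scale machinedeployment workload-cluster-prod-md-0 --replicas=5

# Upgrade Kubernetes by editing a version field; reconciliation does the rest
kubectl patch kubeadmcontrolplane workload-cluster-prod-control-plane \
  --type='merge' -p='{"spec":{"version":"v1.31.0"}}'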


Kubernetes 1.28 — Sidecar Containers Alpha (August 2023)

Sidecar containers had been a Kubernetes pattern since 2015 — a helper container in the same pod as the main application. But there was no native sidecar lifecycle management. Sidecars were just regular init containers or additional containers, which meant:
– Init container sidecars ran before the application and had to block until they succeeded
– Regular container sidecars had no ordering guarantees at startup
– At pod termination, sidecars could die before the application finished draining

1.28 introduced native sidecar support: a new restartPolicy field for init containers:

spec:
  initContainers:
  - name: log-collector
    image: fluentbit:latest
    restartPolicy: Always    # This makes it a sidecar
    # Starts before main containers, stays running, stops after main containers exit
  containers:
  - name: application
    image: myapp:latest

A sidecar container (init container with restartPolicy: Always):
– Starts before application containers
– Stays running throughout the pod lifecycle
– Terminates automatically after all main containers exit
– Restarts if it crashes (unlike regular init containers)

This solved the service mesh sidecar problem: Istio and Linkerd injected Envoy proxies as regular containers, leading to race conditions where the proxy hadn’t started when the application tried to make outbound connections. Native sidecar lifecycle guarantees the proxy is ready before the application starts.

Also in 1.28:
  • Retroactive default StorageClass assignment: Existing PVCs without a StorageClass get the default applied retroactively — useful for migrations
  • Non-graceful node shutdown stable: Handle node power failures without manual pod cleanup
  • Recovery from volume expansion failure: Previously, a failed volume expansion left the PVC in a broken state; 1.28 introduced a mechanism to recover


AI/ML Workloads Force New Kubernetes Capabilities

The LLM wave of 2023 drove GPU workloads onto Kubernetes at a scale and urgency the project hadn’t anticipated. Running LLM inference on Kubernetes required solving problems that CPU-centric cluster scheduling hadn’t encountered:

GPU topology awareness: Inference across multiple GPUs requires GPUs connected by NVLink or on the same PCIe switch, not arbitrary GPUs from different nodes or different PCIe buses. The Dynamic Resource Allocation API (1.26 alpha) was designed exactly for this.

Fractional GPU allocation: NVIDIA’s time-slicing and MIG (Multi-Instance GPU) allow multiple pods to share a single GPU. The GPU operator (NVIDIA) manages this at the node level:

# Check GPU resources visible to Kubernetes
kubectl get nodes -o custom-columns=\
  "NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
# NODE       GPU
# gpu-node-1   8
# gpu-node-2   8

Batch scheduling for training jobs: Training runs require all workers to start simultaneously — a single missing GPU makes the entire job stall. The Kubernetes Job API doesn’t guarantee this. Projects like Volcano (CNCF incubating) and Kueue (Kubernetes SIG Scheduling) added gang scheduling: a job only starts when all requested resources are available.

# Kueue: queue AI training jobs with resource quotas
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu", "cpu", "memory"]
    flavors:
    - name: a100-80gb
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 16
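
Jobs opt in through a queue label, and Kueue admits a job only when its entire resource request fits the quota — that is the gang-scheduling guarantee. A sketch, assuming a LocalQueue named team-a-queue that points at the gpu-queue above:

apiVersion: batch/v1
kind: Job
metadata:
  name: train-llm
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
spec:
  suspend: true                # Kueue unsuspends the job once quota is admitted
  completions: 4
  parallelism: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: training-image:latest
        resources:
          requests:
            nvidia.com/gpu: 2
          limits:
            nvidia.com/gpu: 2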

Kubernetes 1.29 — Sidecar to Beta, Load Balancer IP Mode (December 2023)

  • Sidecar containers beta: The lifecycle semantics were refined based on 1.28 alpha feedback
  • Load balancer IP mode alpha: Distinguish between load balancers that use virtual IPs (kube-proxy handles the traffic) vs. those that handle traffic directly (no need for kube-proxy rules) — important for eBPF-based load balancers
  • ReadWriteOncePod volume access stable

Kubernetes 1.30 — Structured Authorization Config (April 2024)

  • Structured authorization configuration beta: Define multiple authorization webhooks with explicit ordering, failure modes, and connection settings — replacing the flat --authorization-mode flag
  • Sidecar containers beta continues
  • Node memory swap support beta: Allow pods to use swap memory — controversial but necessary for workloads with bursty memory patterns that prefer using swap over an OOM kill

# Node with swap enabled — kubelet config
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
memorySwap:
  swapBehavior: LimitedSwap

The swap support feature reversed a long-standing Kubernetes hard stance: the kubelet had required swap to be disabled since 1.0 because its interaction with Kubernetes memory accounting was unpredictable. The 1.30 approach adds proper accounting and policies.


Kubernetes 1.31 — Cloud Provider Code Removal Complete (August 2024)

1.31 marked the completion of the cloud provider code removal — the 1.5 million line migration that had been running since 1.26. Core binaries are 40% smaller. The API server, controller manager, and scheduler no longer contain vendor-specific code.

Also in 1.31:
  • Persistent Volume health monitor stable
  • AppArmor support stable: AppArmor profiles for pods using the native Kubernetes field (not annotations)
  • Traffic distribution for Services beta: Express topology preferences for Service routing (prefer local node, prefer same zone)

# Traffic distribution: prefer endpoints in the same zone
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  trafficDistribution: PreferClose
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080

Kubernetes 1.32 — DRA Beta, CRD Field Selectors (December 2024)

  • Sidecar containers: final beta refinements ahead of graduation to stable in 1.33 — after nearly a decade of workarounds, the sidecar pattern became a first-class Kubernetes primitive
  • Dynamic Resource Allocation beta: GPU and specialized hardware scheduling ready for production evaluation
  • Job API improvements: Success and failure policies for indexed jobs — granular control over batch workload behavior
  • Custom Resource field selectors: Filter custom resources on arbitrary declared fields — making large CRD-based systems more efficient to query (see the sketch below)
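
A sketch of how field selectors wire up, using the BackupPolicy CRD from the operator series; the selectable field is an assumption about that schema. The CRD declares which fields are selectable, and clients can then filter server-side:

# In the CRD manifest, per served version:
spec:
  versions:
  - name: v1alpha1
    selectableFields:
    - jsonPath: .spec.schedule

# Clients can then filter custom resources server-side:
kubectl get backuppolicies -A --field-selector spec.schedule="0 2 * * *"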

Crossplane: Kubernetes as the Control Plane for Everything

Crossplane (CNCF graduated) extended the Kubernetes API model beyond the cluster itself. Using CRDs and controllers, Crossplane lets you manage cloud resources (RDS databases, S3 buckets, VPCs, IAM roles) as Kubernetes resources — provisioned, updated, and deleted through the Kubernetes API.

# Crossplane: provision an RDS PostgreSQL instance as a Kubernetes resource
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: production-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.r6g.xlarge
    masterUsername: admin
    engine: postgres
    engineVersion: "15"
    allocatedStorage: 100
    multiAZ: true
  writeConnectionSecretToRef:
    name: production-db-credentials
    namespace: production

For platform teams, Crossplane means a single control plane — the Kubernetes API — for both compute workloads and cloud infrastructure. GitOps tools (Flux, ArgoCD) manage both.


Key Takeaways

  • GitOps (Flux, ArgoCD) became the production deployment standard — not for ideological reasons, but because the audit trail, drift detection, and self-healing properties solve real operational and compliance problems
  • Cluster API made Kubernetes cluster lifecycle (provisioning, upgrades, deletion) a Kubernetes-native operation — the same API, tooling, and audit trail
  • Native sidecar containers (1.28 alpha → 1.33 stable) finally resolved the lifecycle ordering problem that service meshes and log collectors had worked around for years
  • AI/ML workloads drove new scheduling capabilities (DRA, gang scheduling via Kueue/Volcano) and made GPU topology awareness a first-class concern
  • Crossplane generalized the Kubernetes API model to cloud infrastructure — the cluster is now a control plane for everything, not just containers

What’s Next

← EP06: The Runtime Reckoning | EP08: Kubernetes Today →

Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com