## CISSP Domain Mapping
| Domain | Relevance |
|---|---|
| Domain 3 — Security Architecture | Cluster API for declarative cluster lifecycle; Crossplane for cloud resource governance |
| Domain 7 — Security Operations | GitOps audit trail as compliance evidence; ArgoCD + Flux for policy-driven deployment |
| Domain 8 — Software Security | AI/ML workload supply chain; GPU operator image provenance |
## Introduction
By 2023, the question had shifted from “how do we run Kubernetes?” to “how do we let other engineers run their workloads on Kubernetes without becoming a bottleneck?”
This is the platform engineering problem. And it drove the tooling that defined 2023–2025: GitOps as the deployment standard, Cluster API for Kubernetes-on-Kubernetes provisioning, AI/ML workloads forcing new scheduling capabilities, and the Kubernetes project itself shedding more weight to become faster to release and operate.
## GitOps: Principle Becomes Practice
GitOps as a term was coined by Weaveworks in 2017. By 2023, it was no longer a debate — it was the default deployment model for organizations running Kubernetes at scale.
The principle: the desired state of your cluster lives in Git. A controller watches the repository and reconciles the cluster state to match. Every deployment is a PR merge. The audit trail is the Git history.
Flux v2 and ArgoCD (both CNCF graduated) became the two dominant implementations:
```yaml
# Flux: GitRepository + Kustomization
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: production-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/org/k8s-config
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production-apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/production
  prune: true  # Remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: production-config
  healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: api
    namespace: production
```
The `prune: true` behavior is critical: resources deleted from Git are deleted from the cluster. This is what makes GitOps a security control — anything running in the cluster that isn't declared in Git gets removed. No more accumulation of forgotten test deployments, rogue debug pods, or unauthorized configuration changes that outlive the engineer who made them.
ArgoCD’s Application model added a UI, synchronization policies, and multi-cluster management:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-api
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/org/apps
    targetRevision: HEAD
    path: api/production
  destination:
    server: https://kubernetes.default.svc
    namespace: api
  syncPolicy:
    automated:
      prune: true
      selfHeal: true  # Revert manual kubectl changes
    syncOptions:
    - CreateNamespace=true
```
The `selfHeal: true` option is where GitOps becomes enforceable: any manual change made with kubectl is automatically reverted within the sync interval. For compliance-sensitive environments, this is a configuration drift prevention control.
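The `project: production` reference points at an ArgoCD AppProject, which constrains what the Application may deploy and from where. A minimal sketch (the repository URL and namespace are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  sourceRepos:
  - https://github.com/org/apps        # only this repo may be deployed
  destinations:
  - server: https://kubernetes.default.svc
    namespace: api                     # only into this namespace
  clusterResourceWhitelist: []         # no cluster-scoped resources allowed
```

Combined with self-heal and pruning, the project boundary limits the blast radius: even a compromised application repository can only touch the namespaces its project is scoped to.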
## Cluster API: Kubernetes Managing Kubernetes
Cluster API (kubernetes-sigs/cluster-api) flipped the usual model: instead of using tools like Terraform or Ansible to provision Kubernetes clusters, Cluster API lets you manage Kubernetes clusters as Kubernetes resources — using a management cluster to provision and manage workload clusters.
```yaml
# Create a new Kubernetes cluster as a Kubernetes resource
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-cluster-prod
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: workload-cluster-prod
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: workload-cluster-prod-control-plane
```
Cluster API reconciliation handles cluster provisioning, scaling, upgrades, and deletion — all through the Kubernetes API, with all the tooling (RBAC, audit logging, GitOps integration) that entails. Multi-cluster platform teams could now manage hundreds of workload clusters from a single management cluster.
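Worker nodes follow the same pattern: a MachineDeployment manages machines the way a Deployment manages pods. A sketch (resource names and the Kubernetes version are illustrative):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-cluster-prod-md-0
spec:
  clusterName: workload-cluster-prod
  replicas: 3                    # scale workers by editing this field
  template:
    spec:
      clusterName: workload-cluster-prod
      version: v1.28.0           # roll node upgrades by bumping this
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: workload-cluster-prod-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: workload-cluster-prod-md-0
```

Scaling workers is a one-line `replicas` change in Git, and a version bump triggers a rolling replacement of nodes, exactly like a Deployment rollout.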
## Kubernetes 1.28 — Sidecar Containers Alpha (August 2023)
Sidecar containers had been a Kubernetes pattern since 2015 — a helper container in the same pod as the main application. But there was no native sidecar lifecycle management. Sidecars were just regular init containers or additional containers, which meant:
- Init containers had to run to completion before the application started, so a long-running sidecar couldn't be one
- Regular container sidecars had no startup ordering guarantees
- At pod termination, sidecars could die before the application finished draining
1.28 introduced native sidecar support: a new `restartPolicy` field for init containers:
```yaml
spec:
  initContainers:
  - name: log-collector
    image: fluentbit:latest
    restartPolicy: Always  # This makes it a sidecar
    # Starts before main containers, stays running, stops after main containers exit
  containers:
  - name: application
    image: myapp:latest
```
A sidecar container (an init container with `restartPolicy: Always`):
- Starts before application containers
- Stays running throughout the pod lifecycle
- Terminates automatically after all main containers exit
- Restarts if it crashes (unlike regular init containers)
This solved the service mesh sidecar problem: Istio and Linkerd injected Envoy proxies as regular containers, leading to race conditions where the proxy hadn’t started when the application tried to make outbound connections. Native sidecar lifecycle guarantees the proxy is ready before the application starts.
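Native sidecars also honor startup probes: the kubelet does not start subsequent containers until a sidecar's startup probe succeeds, so a mesh proxy can gate application startup. A sketch (the proxy image and probe port are illustrative):

```yaml
spec:
  initContainers:
  - name: proxy
    image: envoyproxy/envoy:v1.28.0   # illustrative proxy image
    restartPolicy: Always             # native sidecar
    startupProbe:                     # app containers wait for this to pass
      httpGet:
        path: /ready
        port: 15021
  containers:
  - name: application
    image: myapp:latest
```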
Also in 1.28:
- Retroactive default StorageClass assignment: Existing PVCs without a StorageClass get the cluster default applied retroactively — useful for migrations
- Non-graceful node shutdown stable: Handle node power failures without manual pod cleanup
- Recovery from volume expansion failure: Previously, a failed volume expansion left the PVC in a broken state; 1.28 introduced a recovery mechanism
## AI/ML Workloads Force New Kubernetes Capabilities
The LLM wave of 2023 drove GPU workloads onto Kubernetes at a scale and urgency the project hadn’t anticipated. Running LLM inference on Kubernetes required solving problems that CPU-centric cluster scheduling hadn’t encountered:
GPU topology awareness: Inference across multiple GPUs requires GPUs connected by NVLink or on the same PCIe switch, not arbitrary GPUs from different nodes or different PCIe buses. The Dynamic Resource Allocation API (1.26 alpha) was designed exactly for this.
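As a sketch of the DRA model (using the beta API shape from resource.k8s.io; the claim name and device class are illustrative), a workload claims devices instead of requesting an opaque counted resource:

```yaml
# DRA sketch — beta API shape; names are illustrative
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: llm-inference-gpus
spec:
  devices:
    requests:
    - name: gpus
      deviceClassName: gpu.example.com
      allocationMode: ExactCount
      count: 2   # two devices, allocated together by the driver
```

A pod then references the claim under `spec.resourceClaims`, and the DRA driver — not the scheduler's counting logic — decides which physical devices satisfy it. That is where topology constraints like NVLink adjacency can be expressed.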
Fractional GPU allocation: NVIDIA’s time-slicing and MIG (Multi-Instance GPU) allow multiple pods to share a single GPU. The GPU operator (NVIDIA) manages this at the node level:
```shell
# Check GPU resources visible to Kubernetes
kubectl get nodes -o custom-columns=\
"NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
# NODE          GPU
# gpu-node-1    8
# gpu-node-2    8
```
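The fractional-sharing side is configured through the device plugin rather than the pod spec. With NVIDIA's time-slicing configuration (a sketch; the replica count is illustrative), one physical GPU is advertised as several allocatable `nvidia.com/gpu` units:

```yaml
# NVIDIA device plugin sharing config, applied via the GPU operator
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # each physical GPU appears as 4 schedulable GPUs
```

Note that time-slicing provides no memory isolation between pods sharing a GPU; MIG partitions the hardware itself and is the safer choice for multi-tenant clusters.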
Batch scheduling for training jobs: Training runs require all workers to start simultaneously — a single missing GPU makes the entire job stall. The Kubernetes Job API doesn’t guarantee this. Projects like Volcano (CNCF incubating) and Kueue (Kubernetes SIG Scheduling) added gang scheduling: a job only starts when all requested resources are available.
```yaml
# Kueue: queue AI training jobs with resource quotas
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu", "cpu", "memory"]
    flavors:
    - name: a100-80gb
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 16
```
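Workloads enter the queue through a namespaced LocalQueue and a label on the Job. Kueue keeps the Job suspended until the full quota is available — the gang-scheduling guarantee. A sketch (the namespace, job name, and image are illustrative):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-team-queue
  namespace: ml-team
spec:
  clusterQueue: gpu-queue
---
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-finetune
  namespace: ml-team
  labels:
    kueue.x-k8s.io/queue-name: ml-team-queue
spec:
  suspend: true        # Kueue unsuspends only when quota admits the job
  parallelism: 8
  completions: 8
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: trainer:latest
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
```

All eight workers start together or not at all — no more half-started training jobs holding GPUs while waiting for stragglers.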
## Kubernetes 1.29 — Sidecar to Beta, Load Balancer IP Mode (December 2023)
- Sidecar containers beta: The lifecycle semantics were refined based on 1.28 alpha feedback
- Load balancer IP mode alpha: Distinguish between load balancers that use virtual IPs (kube-proxy handles the traffic) vs. those that handle traffic directly (no need for kube-proxy rules) — important for eBPF-based load balancers
- ReadWriteOncePod volume access stable
## Kubernetes 1.30 — Structured Authorization Config (April 2024)
- Structured authorization configuration beta: Define multiple authorization webhooks with explicit ordering, failure modes, and connection settings — replacing the flat `--authorization-mode` flag
- Sidecar containers beta continues
- Node memory swap support beta: Allow pods to use swap memory — controversial but necessary for workloads with bursty memory patterns that prefer using swap over OOM kill
```yaml
# Node with swap enabled — kubelet config
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
memorySwap:
  swapBehavior: LimitedSwap
```
The swap support feature reversed a long-standing Kubernetes hard stance: swap had been disabled since 1.0 because its interaction with Kubernetes memory accounting was unpredictable. The 1.30 approach adds proper accounting and policies.
## Kubernetes 1.31 — Cloud Provider Code Removal Complete (August 2024)
1.31 marked the completion of the cloud provider code removal — the 1.5 million line migration that had been running since 1.26. Core binaries are 40% smaller. The API server, controller manager, and scheduler no longer contain vendor-specific code.
Also in 1.31:
- Persistent Volume health monitor stable
- AppArmor support stable: AppArmor profiles for pods using the native Kubernetes field (not annotations)
- Traffic distribution for Services beta: Express topology preferences for Service routing (prefer local node, prefer same zone)
```yaml
# Traffic distribution: prefer endpoints in the same zone
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  trafficDistribution: PreferClose
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
```
## Kubernetes 1.32 — DRA Beta, Sidecars Near Stable (December 2024)
- Sidecar containers: Still beta in 1.32; the pattern graduated to stable in 1.33 (April 2025), finally becoming a first-class Kubernetes primitive after nearly a decade of workarounds
- Dynamic Resource Allocation beta: GPU and specialized hardware scheduling ready for production evaluation
- Job API improvements: Success and failure policies for indexed jobs — granular control over batch workload behavior
- Custom Resource field selectors: Filter CRDs on arbitrary fields — making large CRD-based systems more efficient to query
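Field selectors on custom resources require the CRD to declare which fields are selectable. A sketch (the `Database` CRD and its `environment` field are hypothetical):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    singular: database
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    selectableFields:
    - jsonPath: .spec.environment   # now usable with --field-selector
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              environment:
                type: string
```

With that declared, `kubectl get databases --field-selector spec.environment=production` filters on the server side instead of listing every object and filtering in the client.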
## Crossplane: Kubernetes as the Control Plane for Everything
Crossplane (CNCF incubating) extended the Kubernetes API model beyond the cluster itself. Using CRDs and controllers, Crossplane lets you manage cloud resources (RDS databases, S3 buckets, VPCs, IAM roles) as Kubernetes resources — provisioned, updated, and deleted through the Kubernetes API.
```yaml
# Crossplane: provision an RDS PostgreSQL instance as a Kubernetes resource
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: production-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.r6g.xlarge
    masterUsername: admin
    engine: postgres
    engineVersion: "15"
    allocatedStorage: 100
    multiAZ: true
  writeConnectionSecretToRef:
    name: production-db-credentials
    namespace: production
```
For platform teams, Crossplane means a single control plane — the Kubernetes API — for both compute workloads and cloud infrastructure. GitOps tools (Flux, ArgoCD) manage both.
## Key Takeaways
- GitOps (Flux, ArgoCD) became the production deployment standard — not for ideological reasons, but because the audit trail, drift detection, and self-healing properties solve real operational and compliance problems
- Cluster API made Kubernetes cluster lifecycle (provisioning, upgrades, deletion) a Kubernetes-native operation — the same API, tooling, and audit trail
- Native sidecar containers (alpha in 1.28, stable in 1.33) finally resolved the lifecycle ordering problem that service meshes and log collectors had worked around for years
- AI/ML workloads drove new scheduling capabilities (DRA, gang scheduling via Kueue/Volcano) and made GPU topology awareness a first-class concern
- Crossplane generalized the Kubernetes API model to cloud infrastructure — the cluster is now a control plane for everything, not just containers
## What’s Next
← EP06: The Runtime Reckoning | EP08: Kubernetes Today →
Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com