eBPF: From Kernel to Cloud, Episode 4
Earlier in this series: What Is eBPF? · The BPF Verifier · eBPF vs Kernel Modules
By Episode 3, we’d covered what eBPF is, why the verifier makes it safe for production, and why it has replaced kernel modules for observability workloads. What we hadn’t answered — and what a 2am incident eventually forced me to learn — is what kinds of eBPF programs are actually running on your nodes, and why the difference matters when something breaks.
A pod in production was dropping roughly one in fifty outbound TCP connections. Not all of them — just enough to cause intermittent timeouts in the application logs. NetworkPolicy showed egress allowed. Cilium reported no violations. Running curl manually from inside the pod worked every time.
I spent the better part of three hours eliminating possibilities. DNS. MTU. Node-level conntrack table exhaustion. Upstream firewall rules. Nothing.
Eventually, almost as an afterthought, I ran this:
sudo bpftool prog list
There were two TC programs attached to that pod’s veth interface. One from the current Cilium version. One from the previous version — left behind by a rolling upgrade that hadn’t cleaned up properly. Two programs. Different policy state. One was occasionally dropping packets based on rules that no longer existed in the current policy model.
The answer had been sitting in the kernel the whole time. I just didn’t know where to look.
That incident forced me to actually understand something I’d been hand-waving for two years: eBPF isn’t a single hook. It’s a family of program types, each attached to a different location in the kernel, each seeing different data, each suited for different problems. Understanding the difference is what separates “I run Cilium and Falco” from “I understand what Cilium and Falco are actually doing on my nodes” — and that difference matters when something breaks at 2am.
The Command You Should Run on Your Cluster Right Now
Before getting into the theory, do this:
# See every eBPF program loaded on the node
sudo bpftool prog list
# See every eBPF program attached to a network interface
sudo bpftool net list
On a node running Cilium and Falco, you’ll see something like this:
42: xdp name cil_xdp_entry loaded_at 2026-04-01T09:23:41
43: sched_cls name cil_from_netdev loaded_at 2026-04-01T09:23:41
44: sched_cls name cil_to_netdev loaded_at 2026-04-01T09:23:41
51: cgroup_sock_addr name cil_sock4_connect loaded_at 2026-04-01T09:23:41
88: raw_tracepoint name sys_enter loaded_at 2026-04-01T09:23:55
89: raw_tracepoint name sys_exit loaded_at 2026-04-01T09:23:55
Each line is a different program type. Each one fires at a different point in the kernel. The type column — xdp, sched_cls, raw_tracepoint, cgroup_sock_addr — tells you where in the kernel execution path that program is attached and therefore what it can and cannot see.
If you see more programs than you expect on a specific interface — like I did — that’s your first clue.
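For a quick inventory, the type field tallies easily with awk. A self-contained sketch — it feeds the sample output above through the pipeline you’d normally attach to `sudo bpftool prog list`:

```shell
# Tally loaded eBPF programs by type. Real usage:
#   sudo bpftool prog list | awk '...'
# Here we feed the sample listing from above so the sketch runs anywhere.
sample='42: xdp name cil_xdp_entry loaded_at 2026-04-01T09:23:41
43: sched_cls name cil_from_netdev loaded_at 2026-04-01T09:23:41
44: sched_cls name cil_to_netdev loaded_at 2026-04-01T09:23:41
51: cgroup_sock_addr name cil_sock4_connect loaded_at 2026-04-01T09:23:41
88: raw_tracepoint name sys_enter loaded_at 2026-04-01T09:23:55
89: raw_tracepoint name sys_exit loaded_at 2026-04-01T09:23:55'

# The second field of each "ID:" line is the program type.
echo "$sample" | awk '/^[0-9]+:/ {count[$2]++} END {for (t in count) print t, count[t]}' | sort
```

On a real node, a type you don’t recognize — or a count that doesn’t match your mental model of what’s installed — is exactly the kind of anomaly worth chasing.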
Why Program Types Exist
The Linux kernel isn’t a single pipeline. Network packets, system calls, file operations, process scheduling — these all run through different subsystems with different execution contexts and different available data.
eBPF lets you attach programs to specific points within those subsystems. The “program type” is the contract: it defines where the hook fires, what data the program receives, and what it’s allowed to do with it. A program designed to process network packets before they hit the kernel stack looks completely different from one designed to intercept system calls across all containers simultaneously.
Most of us will interact with four or five program types through the tools we already run. Understanding what each one actually is — where it sits, what it sees — is what makes you effective when those tools behave unexpectedly.
The Types Behind the Tools You Already Use
TC — Why Cilium Can Tell Which Pod Sent a Packet
TC stands for Traffic Control. It’s where Cilium enforces your NetworkPolicy, and it’s what caused my incident.
TC programs attach to network interfaces — specifically to the ingress and egress directions of the pod’s virtual interface (lxcXXXXX in Cilium’s naming). They fire after the kernel has already processed the packet enough to know its context: which socket created it, which cgroup that socket belongs to. Cgroup maps to container, container maps to pod.
This is the critical piece: TC is how Cilium knows which pod a packet belongs to. Without that cgroup context, per-pod policy enforcement isn’t possible.
# See TC programs on a pod's veth interface
sudo tc filter show dev lxc12345 ingress
sudo tc filter show dev lxc12345 egress
# If you see two entries on the same direction — that's the incident I described
# The priority number (pref 1, pref 2) tells you the order they run
When there are two TC programs on the same interface, the first one to return “drop” wins. The second program never runs. This is why the issue was intermittent rather than consistent — the stale program only matched specific connection patterns.
Fixing it is straightforward once you know what to look for:
# Remove a stale TC filter by its priority number
sudo tc filter del dev lxc12345 egress pref 2
Add this check to your post-upgrade runbook. Cilium upgrades are generally clean but not always.
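The runbook check can be scripted. A minimal sketch — the `tc filter show` lines below are abbreviated, illustrative samples, and in real usage you would pipe the live command output (`sudo tc filter show dev "$dev" egress`) into the function instead:

```shell
# Post-upgrade sanity check: warn when more than one filter priority is
# attached in a single direction on an interface (the stacked-program
# situation from the incident above).
check_stacked_filters() {
  # Counts distinct "pref N" entries in tc filter output read from stdin.
  prefs=$(grep -o 'pref [0-9]*' | sort -u | wc -l)
  if [ "$prefs" -gt 1 ]; then
    echo "WARN: $prefs stacked filters"
  else
    echo "OK"
  fi
}

# Healthy interface: one bpf filter at pref 1 (sample, abbreviated)
one_filter='filter protocol all pref 1 bpf chain 0 handle 0x1 cil_from_container direct-action'
# The incident: a stale filter left behind at pref 2
two_filters="$one_filter
filter protocol all pref 2 bpf chain 0 handle 0x1 cil_from_container direct-action"

echo "$one_filter" | check_stacked_filters    # OK
echo "$two_filters" | check_stacked_filters   # WARN: 2 stacked filters
```

Looping this over every `lxc*` interface after an upgrade takes seconds and would have caught my incident before the first timeout.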
XDP — Why Cilium Doesn’t Use TC for Everything
If TC is good enough for pod-level policy, why does Cilium also run an XDP program on the node’s main interface? Look at the bpftool prog list output again — there’s an xdp program loaded alongside the TC programs.
XDP fires earlier. Much earlier. Before the kernel allocates any memory for the packet. Before routing. Before connection tracking. Before anything.
The tradeoff is exactly what you’d expect: XDP is fast but context-poor. It sees raw packet bytes. It doesn’t know which pod the packet came from. It can’t read cgroup information because no socket buffer has been allocated yet.
Cilium uses XDP to accelerate service load balancing for traffic entering the node from outside — NodePort and LoadBalancer traffic. When a packet arrives at the node destined for a service frontend, XDP rewrites the destination to an actual backend pod IP in a single map lookup and sends it on its way. No iptables. No conntrack. The work is done before the kernel stack is involved. (ClusterIP connections from local pods take a different path entirely: the cgroup_sock_addr program in the earlier listing — cil_sock4_connect — translates the service IP at connect() time, so those packets never carry the VIP at all.)
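The operation itself is simple enough to picture as one key-value lookup followed by a destination rewrite. A toy model in shell — the addresses and map contents here are made up for illustration, and Cilium’s real datapath is C compiled to eBPF, not a script:

```shell
# Toy model of the XDP service path: one lookup from service
# frontend (VIP:port) to a backend pod IP, then a destination rewrite.
# Not Cilium's actual datapath — just the shape of the operation.
lb_map='10.96.0.10:53 10.0.1.23
10.96.7.4:443 10.0.2.41'

pkt_dst="10.96.7.4:443"
backend=$(echo "$lb_map" | awk -v k="$pkt_dst" '$1 == k {print $2}')
echo "rewrite dst $pkt_dst -> $backend"
```

One hash lookup, one rewrite, done before the kernel has allocated anything for the packet — that is the entire performance story.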
There’s a silent failure mode worth knowing about here. XDP runs in one of two modes:
- Native mode — runs inside the NIC driver itself, before any kernel allocation. This is where the performance comes from.
- Generic mode — fallback when the NIC driver doesn’t support XDP. Runs later, after `sk_buff` allocation. No performance benefit over iptables.
If your NIC doesn’t support native XDP, Cilium silently falls back to generic mode. The policy still works — but the performance characteristics you assumed aren’t there.
# Check which XDP mode is active on your node's main interface
ip link show eth0 | grep xdp
# xdpdrv ← native mode (fast)
# xdpgeneric ← generic mode (no perf benefit)
Most cloud provider instance types with modern Mellanox/Intel NICs support native mode. Worth verifying rather than assuming.
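Checking one interface is easy to forget; scanning all of them is scriptable. A sketch, fed canned `ip link`-style lines so it runs anywhere — note that real output varies by iproute2 version (native mode may print as `xdp` or `xdpdrv`), so treat the patterns as a starting point:

```shell
# Flag interfaces running XDP in generic (fallback) mode.
# Real usage: ip link show | check_xdp_mode
# The sample below is abbreviated, illustrative ip-link output.
check_xdp_mode() {
  awk '/xdpgeneric/ {print $2, "GENERIC (no perf benefit)"}
       /xdp / && !/xdpgeneric/ {print $2, "native"}'
}

sample='2: eth0: <BROADCAST,MULTICAST,UP> mtu 9001 xdp qdisc mq state UP
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 xdpgeneric qdisc mq state UP'

echo "$sample" | check_xdp_mode
```

A GENERIC hit on an interface you sized for native XDP throughput is worth a ticket, not a shrug.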
Tracepoints — How Falco Sees Every Container
Falco loads two programs: sys_enter and sys_exit. These are raw tracepoints — they fire on every single system call, from every process, in every container on the node.
Tracepoints are explicitly defined and maintained instrumentation points in the kernel. Unlike hooks that attach to specific internal function names (which can be renamed or inlined between kernel versions), tracepoints are stable interfaces. They’re part of the kernel’s public contract with tooling that wants to instrument it.
This matters operationally. When you patch your nodes — and cloud-managed nodes get patched frequently — tools built on tracepoints keep working. Tools built on kprobes (internal function hooks) may silently stop firing if the function they’re attached to gets renamed or inlined by the compiler in a new kernel build.
# Verify what Falco is actually using
sudo bpftool prog list | grep -E "kprobe|tracepoint"
# Falco's current eBPF driver should show raw_tracepoint entries
# If you see kprobe entries from Falco, you're on the older driver
# Check: falco --version and the driver being loaded at startup
If you’re running Falco on a cluster that gets regular OS patch upgrades and you haven’t verified the driver mode, check it. The older kprobe-based driver has a real failure mode on certain kernel versions.
LSM — How Tetragon Blocks Operations at the Kernel Level
LSM hooks run at the kernel’s security decision points: file opens, socket connections, process execution, capability checks. The defining characteristic is that they can deny an operation. Return an error from an LSM hook and the kernel refuses the syscall before it completes.
This is qualitatively different from observability hooks. kprobes and tracepoints watch. LSM hooks enforce.
When you see Tetragon configured to kill a process attempting a privileged operation, or block a container from writing to a specific path, that’s an LSM hook making the decision inside the kernel — not a sidecar watching traffic, not an admission webhook running before pod creation, not a userspace agent trying to act fast enough. The enforcement is in the kernel itself.
# See if any LSM eBPF programs are active on the node
sudo bpftool prog list | grep lsm
# Verify LSM eBPF support on your kernel (required for Tetragon enforcement mode)
grep CONFIG_BPF_LSM /boot/config-$(uname -r)
# CONFIG_BPF_LSM=y ← required
# "bpf" must also appear in the active LSM list (set via the lsm= kernel boot parameter)
cat /sys/kernel/security/lsm
The Practical Summary
| What’s happening on your node | Program type | Where to look |
|---|---|---|
| Cilium service load balancing | XDP | ip link show eth0 \| grep xdp |
| Cilium pod network policy | TC (sched_cls) | tc filter show dev lxcXXXX egress |
| Falco syscall monitoring | Tracepoint | bpftool prog list \| grep tracepoint |
| Tetragon enforcement | LSM | bpftool prog list \| grep lsm |
| Anything unexpected | All types | bpftool prog list, bpftool net list |
The Incident, Revisited
Three hours of debugging. The answer was a stale TC program sitting at priority 2 on a pod’s veth interface, left behind by an incomplete Cilium upgrade.
# What I should have run first
sudo bpftool net list
sudo tc filter show dev lxc12345 egress
Two commands. Thirty seconds. If I’d known that TC programs can stack on the same interface, I’d have started there.
That’s the point of understanding program types — not to write eBPF programs yourself, but to know where to look when the tools you depend on don’t behave the way you expect. The programs are already there, running on your nodes right now. bpftool prog list shows you all of them.
Key Takeaways
- `bpftool prog list` and `bpftool net list` show every eBPF program on a node — run these before anything else when debugging eBPF-based tool behavior
- TC programs can stack on the same interface; stale programs from incomplete Cilium upgrades cause intermittent drops — check `tc filter show` after every Cilium upgrade
- XDP runs before the kernel stack — fastest hook, but no pod identity; Cilium uses it for service load balancing, not pod policy
- XDP silently falls back to generic mode on unsupported NICs — verify with `ip link show | grep xdp`
- Tracepoints are stable across kernel versions; kprobe-based tools may silently break after node OS patches — verify your Falco driver mode
- LSM hooks enforce at the kernel level — this is what makes Tetragon’s enforcement mode fundamentally different from sidecar-based approaches
What’s Next
Every eBPF program fires, does its work, and exits — but the work always involves data. Counting connections. Tracking processes. Streaming events to a detection engine. In EP05, I’ll cover eBPF maps: the persistent data layer that connects kernel programs to the tools consuming their output. Understanding maps explains a class of production issues — and makes bpftool map dump useful rather than cryptic.