How Active Directory Works: LDAP, Kerberos, and Group Policy Under the Hood

Reading Time: 6 minutes

The Identity Stack, Episode 9
EP08: FreeIPAEP09EP10: SAML/OIDC → …

Focus Keyphrase: Active Directory LDAP
Search Intent: Informational
Meta Description: Active Directory is LDAP + Kerberos + DNS + Group Policy — all tightly integrated. Here’s how AD replication, Sites, GPO, and Linux domain join actually work. (160 chars)


TL;DR

  • Active Directory is not a product that happens to use LDAP — it is an LDAP directory with a Microsoft-extended schema, a built-in Kerberos KDC, and DNS tightly integrated
  • Replication uses USNs (Update Sequence Numbers) and GUIDs — the Knowledge Consistency Checker (KCC) automatically builds the replication topology
  • Sites and site links tell AD which DCs are physically close — AD prefers to authenticate users against a DC in the same site to minimize WAN latency
  • Group Policy Objects (GPOs) are stored as LDAP entries (in the CN=Policies container) and Sysvol files — LDAP tells clients which GPOs apply; Sysvol delivers the policy files
  • Linux joins AD via realm join (uses adcli + SSSD) or net ads join (Samba + winbind) — both register a machine account in AD and get a Kerberos keytab
  • The difference between Linux in AD and Linux in FreeIPA: AD is optimized for Windows; FreeIPA is optimized for Linux — both interoperate

The Big Picture: What AD Actually Is

Active Directory Domain: corp.com
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  LDAP directory          Kerberos KDC                      │
│  ─────────────           ──────────                        │
│  Schema: 1000+ classes   Realm: CORP.COM                   │
│  Objects: users, groups, Issues TGTs + service tickets     │
│  computers, GPOs, OUs    Uses LDAP as the account DB       │
│                                                            │
│  DNS                     Sysvol (DFS share)                │
│  ────                    ────────────────                  │
│  SRV records for KDC     GPO templates                     │
│  and LDAP discovery      Login scripts                     │
│                          Replicated via DFSR               │
│                                                            │
│  Replication engine: USN + GUID + KCC                      │
└────────────────────────────────────────────────────────────┘
          │ replicates to          │ replicates to
          ▼                        ▼
   DC: dc02.corp.com        DC: dc03.corp.com

EP08 showed FreeIPA as the Linux-native answer to enterprise identity. AD is the Microsoft answer — and because most enterprises run Windows clients, understanding AD is unavoidable for Linux infrastructure engineers. This episode goes behind the LDAP and Kerberos protocols to explain what makes AD specifically work.


The AD Schema: LDAP With 1000+ Object Classes

AD’s schema extends the base LDAP schema with Microsoft-specific classes and attributes. Every user object is a user class (which extends organizationalPerson which extends person which extends top) with additional attributes like:

sAMAccountName   ← the pre-Windows 2000 login name (vamshi)
userPrincipalName ← the modern UPN ([email protected])
objectGUID       ← a globally unique 128-bit identifier (never changes, even if DN changes)
objectSid        ← Windows Security Identifier (used for ACL enforcement on Windows)
whenCreated      ← creation timestamp
pwdLastSet       ← password change timestamp
userAccountControl ← bitmask: disabled, locked, password never expires, etc.
memberOf         ← back-link: groups this user belongs to

objectGUID is the authoritative identifier in AD — not the DN. When a user is renamed or moved to a different OU, the GUID stays the same. Applications that store a user’s DN will break on rename; applications that store the GUID won’t.

userAccountControl is the bitmask that controls account state:

Flag          Value   Meaning
ACCOUNTDISABLE  2     Account disabled
LOCKOUT         16    Account locked out
PASSWD_NOTREQD  32    Password not required
NORMAL_ACCOUNT  512   Normal user account (set on almost all accounts)
DONT_EXPIRE_PASSWD 65536  Password never expires
# Query AD from a Linux machine
ldapsearch -x -H ldap://dc.corp.com \
  -D "[email protected]" -w password \
  -b "dc=corp,dc=com" \
  "(sAMAccountName=vamshi)" \
  sAMAccountName userPrincipalName objectGUID memberOf userAccountControl

Replication: USN + GUID + KCC

AD replication is multi-master — every DC accepts writes. The replication engine uses:

USN (Update Sequence Number) — a per-DC counter that increments on every local write. Each attribute in the directory stores the USN at which it was last modified (uSNChanged, uSNCreated). When DC-A replicates to DC-B, DC-B asks: “give me everything you’ve changed since the last USN I saw from you.”

GUID — each object has a globally unique identifier. If the same attribute is modified on two DCs before replication (a conflict), the conflict is resolved: last-writer-wins at the attribute level, based on the modification timestamp. If timestamps are equal, the attribute value from the DC with the lexicographically higher GUID wins.

KCC (Knowledge Consistency Checker) — a component that runs on every DC and automatically constructs the replication topology. You don’t configure which DCs replicate to which — the KCC builds a minimum spanning tree that ensures every DC is connected to every other within a set number of hops. You configure Sites and site links; the KCC does the rest.

# Check replication status from a Linux machine (requires rpcclient or adcli)
# Or on the DC: repadmin /showrepl (Windows tool)

# Simulate: query the highestCommittedUSN from a DC
ldapsearch -x -H ldap://dc.corp.com \
  -D "[email protected]" -w password \
  -b "" -s base highestCommittedUSN

Sites are AD’s concept of physical network topology. A site is a set of IP subnets with high-bandwidth connectivity between them. Site links represent the WAN connections between sites.

Site: Mumbai              Site: Hyderabad
┌────────────────┐        ┌────────────────┐
│ DC: dc-mum-01  │        │ DC: dc-hyd-01  │
│ DC: dc-mum-02  │        │ DC: dc-hyd-02  │
│ subnet: 10.1/16│        │ subnet: 10.2/16│
└───────┬────────┘        └────────┬───────┘
        │                          │
        └──── Site Link ───────────┘
              Cost: 100
              Replication interval: 15 min

When a user in Mumbai authenticates, AD’s KDC locates a DC in the same site using DNS SRV records. The SRV records include the site name in the service name: _ldap._tcp.Mumbai._sites.dc._msdcs.corp.com. SSSD and Windows clients query site-local SRV records first.

If no DC is available in the local site, authentication falls back to a DC in another site across the WAN link. Configuring sites correctly prevents remote authentication failures from killing local operations.


Group Policy: LDAP + Sysvol

GPOs are stored in two places:

LDAP — the CN=Policies,CN=System,DC=corp,DC=com container holds GPO metadata objects. Each GPO has a GUID, a display name, and version numbers. The gPLink attribute on OUs and the domain root links GPOs to where they apply.

Sysvol — the actual policy templates and scripts live in \\corp.com\SYSVOL\corp.com\Policies\{GPO-GUID}\. Sysvol is a DFS-R (Distributed File System Replication) share replicated to every DC.

When a Windows client applies Group Policy:
1. LDAP query: what GPOs are linked to my OU chain?
2. Sysvol fetch: download the policy templates from the GPO’s Sysvol path
3. Apply: process Registry settings, Security settings, Scripts

Linux clients don’t process GPOs natively. The adcli and sssd tools interpret a small subset of AD policy (password policy, account lockout) via LDAP. Full GPO processing on Linux requires Samba’s samba-gpupdate or third-party tools.


Joining Linux to AD

# Install required packages
dnf install -y realmd sssd adcli samba-common

# Discover the domain
realm discover corp.com
# corp.com
#   type: kerberos
#   realm-name: CORP.COM
#   domain-name: corp.com
#   configured: no
#   server-software: active-directory
#   client-software: sssd

# Join
realm join corp.com -U Administrator
# Prompts for Administrator password
# Creates machine account in AD
# Configures sssd.conf, krb5.conf, nsswitch.conf, pam.d automatically

# Verify
realm list
id [email protected]

What the join does:

  1. Creates a machine account HOSTNAME$ in CN=Computers,DC=corp,DC=com
  2. Sets a machine password (rotated automatically by SSSD)
  3. Retrieves a Kerberos keytab to /etc/krb5.keytab
  4. Configures SSSD with id_provider = ad, auth_provider = ad
  5. Updates /etc/nsswitch.conf to include sss
  6. Updates /etc/pam.d/ to include pam_sss

After joining, SSSD uses the machine’s Kerberos keytab to authenticate to the DC and query LDAP — no hardcoded service account credentials required.


LDAP Queries Against AD from Linux

# Find a user (after kinit or with -w password)
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(sAMAccountName=vamshi)" \
  sAMAccountName mail memberOf

# Find all members of a group
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(cn=engineers)" \
  member

# Find all AD-joined Linux machines
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(&(objectClass=computer)(operatingSystem=*Linux*))" \
  cn operatingSystem lastLogonTimestamp

# Find disabled accounts
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(userAccountControl:1.2.840.113556.1.4.803:=2)" \
  sAMAccountName

The last filter uses an LDAP extensible match (1.2.840.113556.1.4.803 is the OID for bitwise AND). userAccountControl:1.2.840.113556.1.4.803:=2 means “entries where userAccountControl AND 2 equals 2” — i.e., the ACCOUNTDISABLE bit is set. This is a Microsoft AD extension not in standard LDAP.


⚠ Common Misconceptions

“AD is just Microsoft’s LDAP.” AD is LDAP + Kerberos + DNS + DFS-R + GPO, all tightly integrated and with a schema that the Microsoft ecosystem depends on. You can query AD with standard ldapsearch. You cannot replace it with OpenLDAP without breaking every Windows client.

“Linux machines in AD get GPO.” Linux machines appear in AD and can be organized into OUs. Standard GPOs don’t apply to them. Samba’s samba-gpupdate can process a subset of AD policy for Linux — mostly Registry and Security settings mapped to Linux equivalents.

“realm leave removes the machine cleanly.” realm leave removes local configuration but does not delete the machine account from AD. The stale computer object stays in CN=Computers until an AD admin deletes it. Always run realm leave && adcli delete-computer -U Administrator for a clean removal.


Framework Alignment

Domain Relevance
CISSP Domain 5: Identity and Access Management AD is the dominant enterprise identity store — understanding its LDAP structure, Kerberos realm, and GPO model is essential for IAM in mixed environments
CISSP Domain 4: Communications and Network Security AD replication traffic (RPC, LDAP, Kerberos) is a significant portion of enterprise WAN traffic — Sites and site links are a network security and performance design decision
CISSP Domain 3: Security Architecture and Engineering AD forest/domain/OU hierarchy is an architectural decision with long-term security consequences — getting OU structure wrong constrains GPO delegation for years

Key Takeaways

  • AD is LDAP + Kerberos + DNS + GPO + DFS-R — not a product that “uses” these; they’re the implementation
  • Replication is multi-master via USN + GUID; the KCC builds the topology automatically from Sites configuration
  • objectGUID is the stable identifier — not the DN, which changes on rename/move
  • realm join is the correct way to join Linux to AD — it configures SSSD, Kerberos, PAM, and NSS correctly in one command
  • userAccountControl is the bitmask that controls account state — (userAccountControl:1.2.840.113556.1.4.803:=2) finds disabled accounts

What’s Next

EP09 covered AD — LDAP and Kerberos inside the corporate network. EP10 covers what happens when identity needs to work across the internet, where Kerberos doesn’t reach: SAML, OAuth2, and OIDC — the protocols that let identity leave the building.

Next: SAML vs OIDC vs OAuth2: Which Protocol Handles Which Identity Problem

Get EP10 in your inbox when it publishes → linuxcent.com/subscribe

FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack

Reading Time: 5 minutes

The Identity Stack, Episode 8
EP07: LDAP HAEP08EP09: Active Directory → …

Focus Keyphrase: FreeIPA setup
Search Intent: Investigational
Meta Description: FreeIPA integrates 389-DS, MIT Kerberos, Dogtag PKI, and SSSD into one Linux identity stack. Here’s what it gives you and how to use it effectively. (153 chars)


TL;DR

  • FreeIPA is 389-DS (LDAP) + MIT Kerberos + Dogtag PKI + Bind DNS + SSSD — one ipa-server-install command gets you an enterprise identity platform
  • Host-Based Access Control (HBAC) lets you define centrally: which users can SSH to which hosts — no more managing /etc/security/access.conf per machine
  • Sudo rules from the directory: define sudo policy centrally, have every machine pull it — no /etc/sudoers.d/ files scattered across the fleet
  • ipa CLI is the management interface — ipa user-add, ipa group-add, ipa hbacrule-add — everything that took five LDAP commands takes one ipa command
  • FreeIPA trusts with Active Directory let Linux machines authenticate AD users without joining the AD domain
  • The right choice for Linux-centric environments; AD is the right choice when Windows clients dominate

The Big Picture: What FreeIPA Integrates

┌─────────────────────────────────────────────────────────┐
│                    FreeIPA Server                        │
│                                                         │
│  389-DS (LDAP)    MIT Kerberos    Dogtag PKI            │
│  ─────────────    ───────────     ─────────             │
│  User/group       TGT + service   Machine certs         │
│  storage          ticket issuing  User certs             │
│                                   OCSP / CRL            │
│  Bind DNS         SSSD (client)   Apache (WebUI)        │
│  ──────────       ────────────    ──────────────        │
│  SRV records      Enrollment      Management UI         │
│  for KDC/LDAP     automation      REST API              │
└─────────────────────────────────────────────────────────┘
              ▲                  ▲
              │ enrollment       │ SSH + sudo rules
   ┌──────────┴──────────┐  ┌───┴──────────────────┐
   │  Linux client        │  │  Linux client         │
   │  (ipa-client-install)│  │  (ipa-client-install) │
   └─────────────────────┘  └──────────────────────┘

EP06 and EP07 built OpenLDAP from components. FreeIPA gives you all of that plus Kerberos, PKI, DNS, and HBAC — opinionated, integrated, and managed through a single CLI and WebUI. This episode shows what you actually get from it.


Why FreeIPA Instead of Bare OpenLDAP

Running bare OpenLDAP requires you to:
– Configure schema for POSIX accounts, SSH keys, sudo rules, HBAC manually
– Set up MIT Kerberos separately and integrate it with LDAP
– Build your own PKI for machine certificates
– Maintain DNS SRV records for Kerberos discovery
– Write client enrollment scripts
– Build a management interface (or live in LDIF)

FreeIPA does all of this in one installer, with a consistent data model across all components. The trade-off is opacity — FreeIPA makes decisions for you (schema, replication topology, Kerberos realm name) that bare OpenLDAP leaves to you.


Installing FreeIPA Server

# RHEL / Rocky / AlmaLinux
dnf install -y freeipa-server freeipa-server-dns

# Run the installer (interactive)
ipa-server-install

# Or non-interactive:
ipa-server-install \
  --realm=CORP.COM \
  --domain=corp.com \
  --ds-password=DM_password \
  --admin-password=Admin_password \
  --setup-dns \
  --forwarder=8.8.8.8 \
  --unattended

# After install: get an admin Kerberos ticket
kinit admin

The installer creates:
– 389-DS instance with the FreeIPA schema
– MIT KDC with realm CORP.COM
– Dogtag CA and all certificate infrastructure
– Bind DNS with SRV records for the KDC and LDAP server
– Apache WebUI at https://ipa.corp.com/ipa/ui/
– SSSD configured on the server itself

Time: 5–10 minutes. What used to take a week of manual configuration.


The ipa CLI

Every management action goes through ipa. It talks to the IPA server’s REST API and handles Kerberos authentication transparently (it uses your kinit session).

# Users
ipa user-add vamshi \
  --first=Vamshi --last=Krishna \
  [email protected] \
  --password

ipa user-show vamshi
ipa user-find --all              # search all users
ipa user-disable vamshi          # lock account without deleting
ipa user-mod vamshi --shell=/bin/zsh

# Groups
ipa group-add engineers --desc "Engineering team"
ipa group-add-member engineers --users=vamshi,alice

# Password policy
ipa pwpolicy-mod --minlength=12 --maxlife=90 --history=10

# SSH public keys — stored centrally, pushed to every host
ipa user-mod vamshi --sshpubkey="ssh-ed25519 AAAA..."
# SSSD on enrolled hosts will use this key for SSH login — no authorized_keys file needed

Host-Based Access Control (HBAC)

HBAC is the feature that justifies FreeIPA for most Linux shops. It lets you define centrally: which users (or groups) can log in to which hosts (or host groups), using which services (SSH, sudo, FTP).

Without HBAC, access control is per-machine: /etc/security/access.conf or PAM pam_access rules, replicated across every server, managed inconsistently.

With HBAC: one rule, enforced everywhere.

# Create host groups
ipa hostgroup-add production-servers --desc "Production Linux hosts"
ipa hostgroup-add-member production-servers --hosts=web01.corp.com,db01.corp.com

# Create user groups
ipa group-add sre-team
ipa group-add-member sre-team --users=vamshi,alice

# Create an HBAC rule
ipa hbacrule-add allow-sre-to-prod \
  --desc "SRE team can SSH to production"
ipa hbacrule-add-user allow-sre-to-prod --groups=sre-team
ipa hbacrule-add-host allow-sre-to-prod --hostgroups=production-servers
ipa hbacrule-add-service allow-sre-to-prod --hbacsvcs=sshd

# Test the rule before applying
ipa hbactest \
  --user=vamshi \
  --host=web01.corp.com \
  --service=sshd
# Access granted: True
# Matched rules: allow-sre-to-prod

SSSD on each enrolled host enforces the HBAC rules at login time by querying the IPA server. No per-machine configuration. Add a new server to the production-servers host group and the HBAC rules apply immediately.


Sudo Rules from the Directory

# Create a sudo rule
ipa sudorule-add allow-sre-sudo \
  --cmdcat=all \
  --desc "SRE team gets full sudo on production"
ipa sudorule-add-user allow-sre-sudo --groups=sre-team
ipa sudorule-add-host allow-sre-sudo --hostgroups=production-servers

# Or a scoped rule — only specific commands
ipa sudorule-add allow-service-restart
ipa sudocmdgroup-add service-commands
ipa sudocmd-add /usr/bin/systemctl
ipa sudocmdgroup-add-member service-commands --sudocmds="/usr/bin/systemctl"
ipa sudorule-add-allow-command allow-service-restart --sudocmdgroups=service-commands

On enrolled hosts, SSSD’s sssd_sudo responder pulls these rules and the sudo command evaluates them locally. No /etc/sudoers.d/ files. Central policy, local enforcement.


Enrolling a Client

# On the client machine
dnf install -y freeipa-client

ipa-client-install \
  --domain=corp.com \
  --server=ipa.corp.com \
  --realm=CORP.COM \
  --principal=admin \
  --password=Admin_password \
  --unattended

# What this does:
# 1. Registers the host in IPA as a machine principal
# 2. Retrieves a host Kerberos keytab (/etc/krb5.keytab)
# 3. Configures SSSD (sssd.conf, nsswitch.conf, pam.d)
# 4. Configures Kerberos (/etc/krb5.conf)
# 5. Optionally configures NTP and DNS

After enrollment: getent passwd vamshi returns the IPA user. SSH with an IPA password works. HBAC rules are enforced. Sudo rules from the directory apply. SSH public keys from the user’s IPA profile work without authorized_keys files.


FreeIPA Trust with Active Directory

In mixed environments (Linux servers + Windows clients), you can establish a trust between FreeIPA and AD without joining the Linux servers to the AD domain directly.

# On the IPA server (after installing ipa-server-trust-ad)
ipa-adtrust-install --netbios-name=CORP

# Establish the trust
ipa trust-add ad.corp.com \
  --admin=Administrator \
  --password \
  --type=ad

# AD users can now log in to IPA-enrolled Linux hosts
# They appear as: CORP.COM\username or [email protected]

Under the hood: FreeIPA acts as an SSSD-enabled Samba DC for the trust relationship. AD users’ Kerberos tickets from the AD KDC are accepted by the FreeIPA KDC, which maps them to POSIX attributes stored in IPA (or automatically generated via ID mapping).


⚠ Common Misconceptions

“FreeIPA is just OpenLDAP with a UI.” FreeIPA uses 389-DS (not OpenLDAP), adds a full Kerberos KDC, a certificate authority, DNS, HBAC enforcement, and sudo management — all with a consistent schema designed for these use cases. It’s an integrated identity platform, not a wrapper.

“HBAC rules replace firewall rules.” HBAC controls who can log in to a host at the authentication layer — not network access. A blocked HBAC rule means the SSH session is rejected after TCP connection. You still need firewall rules to block TCP access.

“FreeIPA replicas are identical.” FreeIPA uses 389-DS Multi-Supplier replication. All replicas accept reads and writes. But the CA is separate — only the initial server (and explicitly designated CA replicas) run the CA. If the CA goes down, certificate operations stop; authentication does not.


Framework Alignment

Domain Relevance
CISSP Domain 5: Identity and Access Management FreeIPA is an enterprise IAM platform — HBAC, sudo policy, SSH key management, and certificate-based authentication are all IAM controls
CISSP Domain 3: Security Architecture and Engineering FreeIPA’s integrated CA enables certificate-based authentication for machines and users — a stronger authentication factor than passwords
CISSP Domain 1: Security and Risk Management Centralized HBAC and sudo policy reduces the attack surface of privilege escalation — no more inconsistent sudoers files that drift across the fleet

Key Takeaways

  • FreeIPA = 389-DS + MIT Kerberos + Dogtag PKI + Bind DNS — one installer, one management interface
  • HBAC rules define centrally who can SSH to which host groups — enforced by SSSD on every enrolled client, no per-machine config
  • Sudo rules from the directory replace scattered /etc/sudoers.d/ files — central policy, SSSD-enforced locally
  • ipa hbactest lets you verify access rules before a user hits a blocked login — use it before every policy change
  • For Linux-centric environments: FreeIPA. For Windows-dominant environments: AD. For mixed: FreeIPA trust with AD.

What’s Next

FreeIPA is the Linux answer to enterprise identity. EP09 covers the Microsoft answer — Active Directory — which extended LDAP and Kerberos into a complete enterprise platform with Group Policy, Sites, and a replication model built for global scale.

Next: How Active Directory Works: LDAP, Kerberos, and Group Policy Under the Hood

Get EP09 in your inbox when it publishes → linuxcent.com/subscribe

LDAP High Availability: Load Balancing and Production Architecture

Reading Time: 6 minutes

The Identity Stack, Episode 7
EP06: OpenLDAPEP07EP08: FreeIPA → …

Focus Keyphrase: LDAP high availability
Search Intent: Informational
Meta Description: Design LDAP high availability for production: HAProxy load balancing, read/write split, connection pooling, monitoring with cn=monitor, and 389-DS at scale. (157 chars)


TL;DR

  • LDAP HA means multiple directory servers behind a load balancer — clients connect to a VIP, not to individual servers
  • Read/write split: all writes go to the provider, reads are distributed across consumers — the load balancer enforces this by routing on port or backend check
  • SSSD handles multi-server failover natively (ldap_uri accepts a comma-separated list) — for apps without built-in failover, HAProxy with health checks does the work
  • Connection pooling is critical at scale — nss_ldap and pam_ldap opened a new connection per login; SSSD maintains a pool; apps that use libldap directly must implement their own
  • cn=monitor is the built-in monitoring endpoint — exposes connection counts, operation rates, and backend stats readable via ldapsearch
  • 389-DS (Red Hat Directory Server) is the production choice for >1M entries — purpose-built for large directories with a dedicated replication engine

The Big Picture: Production LDAP Topology

         Clients (SSSD, apps, VPN concentrators)
                      │
              ┌───────▼───────┐
              │   HAProxy VIP  │   ← single endpoint, port 389/636
              │  10.0.0.10     │
              └───────┬───────┘
                      │
          ┌───────────┼───────────┐
          ▼           ▼           ▼
   ldap1.corp.com  ldap2.corp.com  ldap3.corp.com
   (Provider)      (Consumer)      (Consumer)
   Reads + Writes  Reads only      Reads only
          │           ▲               ▲
          └───────────┴───────────────┘
               SyncRepl replication

EP06 built a two-node replicated directory. This episode covers what happens when the directory becomes infrastructure — when it needs to survive a node failure, handle thousands of connections, and be monitored like any other critical service.


HAProxy for LDAP

HAProxy is the standard choice for LDAP load balancing. Unlike HTTP, LDAP is a stateful protocol — once a client binds, subsequent operations on that connection share the authenticated session. The load balancer must use connection persistence, not per-request routing.

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 50000

defaults
    mode tcp                  # LDAP is TCP, not HTTP
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    option tcplog

# ── LDAP read/write split ─────────────────────────────────────────────

# Writes → provider only
frontend ldap-write
    bind *:389
    default_backend ldap-provider

backend ldap-provider
    balance first                   # always use first available (provider)
    option tcp-check
    tcp-check connect
    server ldap1 ldap1.corp.com:389 check inter 5s rise 2 fall 3
    server ldap2 ldap2.corp.com:389 check inter 5s rise 2 fall 3 backup

# Reads → all nodes round-robin
frontend ldap-read
    bind *:3389                     # internal read port
    default_backend ldap-consumers

backend ldap-consumers
    balance roundrobin
    option tcp-check
    tcp-check connect
    server ldap1 ldap1.corp.com:389 check inter 5s
    server ldap2 ldap2.corp.com:389 check inter 5s
    server ldap3 ldap3.corp.com:389 check inter 5s

# LDAPS (TLS)
frontend ldaps
    bind *:636
    default_backend ldap-consumers-tls

backend ldap-consumers-tls
    balance roundrobin
    server ldap1 ldap1.corp.com:636 check inter 5s ssl verify required ca-file /etc/ssl/certs/ca.pem
    server ldap2 ldap2.corp.com:636 check inter 5s ssl verify required ca-file /etc/ssl/certs/ca.pem

The health check (tcp-check connect) just verifies TCP connectivity. For a more precise check — verifying that slapd is actually responding to LDAP requests — use a custom script that runs ldapsearch and checks the result code.


SSSD Multi-Server Failover

SSSD has native failover — no load balancer required for SSSD-based clients:

# /etc/sssd/sssd.conf
[domain/corp.com]
ldap_uri = ldap://ldap1.corp.com, ldap://ldap2.corp.com, ldap://ldap3.corp.com
# SSSD tries them in order; switches to next on failure
# Switches back to primary after ldap_recovery_interval (default: 30s)

# For AD, discovery via DNS SRV records is even better:
ad_server = _srv_
# SSSD queries _ldap._tcp.corp.com SRV records and gets all DCs automatically

SSSD monitors the connection health. If the current server becomes unreachable, it switches to the next in the list within seconds. Existing cached data keeps serving during the switchover. Clients using SSSD don’t need a load balancer for basic HA.


Connection Pooling

Every LDAP bind creates an authenticated session on the server. A server with connection limits (olcConnMaxPending, olcConnMaxPendingAuth in OLC) will reject new connections when those limits are hit.

The problem: applications that use libldap directly tend to open a new connection per operation. At 500 requests/second, that’s 500 new TCP connections, 500 binds, 500 TLS handshakes per second — a directory that can handle 5000 concurrent connections starts refusing new ones.

The solutions:

SSSD — handles this automatically. SSSD maintains one or a small number of persistent connections per domain and multiplexes all PAM/NSS queries through them.

Application-level pooling — frameworks like python-ldap with connection pooling, ldap3 with connection strategies, or dedicated middleware like 389-DS‘s Directory Proxy Server.

ldap_maxconnections in OpenLDAP — sets a hard limit. When hit, new connections block until existing ones close. Set this to something reasonable (olcConnMaxPending: 100 in OLC) so you get a controlled failure mode instead of unbounded queuing.


Monitoring with cn=monitor

OpenLDAP exposes live operational statistics via the cn=monitor database — a virtual LDAP subtree that reflects the server’s current state. Enable it:

# enable-monitor.ldif
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/ldap
olcModuleLoad: back_monitor

dn: olcDatabase=monitor,cn=config
objectClass: olcDatabaseConfig
olcDatabase: monitor
olcAccess: to *
  by dn="cn=admin,dc=corp,dc=com" read
  by * none

Query it:

# Overall statistics
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=monitor" -s sub "(objectClass=*)" \
  monitorOpInitiated monitorOpCompleted

# Connection counts
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=Connections,cn=monitor" -s one \
  monitorConnectionNumber

# Operations by type
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=Operations,cn=monitor" -s one \
  monitorOpInitiated monitorOpCompleted

Useful metrics to export to Prometheus (via prometheus-openldap-exporter or similar):
monitorOpCompleted per operation type (bind, search, modify)
monitorConnectionNumber — current connection count
– Backend-specific: olmMDBEntries, olmMDBPagesMax, olmMDBPagesUsed


389-DS: LDAP at Scale

OpenLDAP is excellent for directories up to a few million entries. When you need:
– 10M+ entries
– High write throughput (more than a few hundred writes/second)
– Fine-grained replication filtering
– A dedicated web-based admin UI

…389-DS (Red Hat Directory Server, community edition) is the production answer. It’s what FreeIPA uses under the hood.

Key architectural differences from OpenLDAP:

Multi-supplier replication — 389-DS’s replication engine uses a dedicated changelog (stored in LMDB) and Change Sequence Numbers (CSNs) for conflict resolution. Multi-supplier (multi-master) replication is first-class, not a bolted-on feature.

Changelog — every change is written to a persistent changelog before being applied. This enables precise replication: a consumer can reconnect after a network partition and get exactly the changes it missed, rather than doing a full resync.

Plugin architecture — 389-DS functionality (replication, managed entries, DNA for automatic UID allocation, memberOf, password policy) is all implemented as plugins that can be enabled/disabled per directory instance.

# Install 389-DS
dnf install -y 389-ds-base

# Create a new instance
dscreate interactive
# — or use a template:
dscreate from-file /path/to/instance.inf

# Manage with dsctl
dsctl slapd-corp status
dsctl slapd-corp start
dsctl slapd-corp stop

# Admin with dsconf
dsconf slapd-corp backend suffix list
dsconf slapd-corp replication status -suffix "dc=corp,dc=com"

The dsconf replication status command gives a live view of replication lag across all suppliers and consumers — something OpenLDAP requires you to compute manually from contextCSN comparisons.


Global Catalog: Cross-Domain Search in AD

When your directory spans multiple AD domains in a forest, the Global Catalog solves a specific problem: a user in emea.corp.com needs to be found by an app that only knows corp.com.

Forest: corp.com
  ├── corp.com       → DC port 389    full directory: 500K entries
  ├── emea.corp.com  → DC port 389    full directory: 200K entries
  └── Global Catalog → GC port 3268  partial replica: 700K entries
                                       (not all attributes — just the most queried ones)

The GC replicates a subset of attributes from every domain in the forest. By default: cn, mail, sAMAccountName, userPrincipalName, memberOf, and about 150 others. Attributes marked with isMemberOfPartialAttributeSet in the schema are replicated to the GC.

If an application is configured to use port 3268 instead of 389, it’s using the GC — and it won’t see attributes not included in the partial attribute set. This surprises teams that add a custom attribute to AD and then wonder why their application can’t see it on 3268 but can on 389.


⚠ Production Gotchas

HAProxy TCP health checks don’t verify LDAP is responsive. A server can accept TCP connections but have slapd in a degraded state (database corruption, out-of-memory). Build a proper LDAP health check: a script that binds and searches a known entry and checks the result.

replication lag under write load. SyncRepl consumers can fall behind under sustained write load. Monitor the contextCSN difference between provider and consumers. If consumers are more than a few seconds behind, investigate the provider’s write throughput and the consumer’s processing speed.

Directory size and the MDB mapsize. LMDB requires a pre-configured maximum database size (olcDbMaxSize). If the database grows beyond this, slapd starts failing writes. Set it to 2–4x your expected data size and monitor olmMDBPagesUsed / olmMDBPagesMax.


Key Takeaways

  • HAProxy in TCP mode provides LDAP load balancing — use balance first for write routing (provider only), balance roundrobin for reads
  • SSSD has native failover via ldap_uri — for SSSD clients, a load balancer adds HA but isn’t strictly required
  • cn=monitor is the built-in OpenLDAP monitoring endpoint — export its counters to Prometheus for operational visibility
  • 389-DS is the right choice for >1M entries, high write throughput, or multi-supplier replication as a first-class feature
  • Global Catalog (port 3268/3269) is a partial replica of all AD domains — useful for forest-wide searches, but missing non-replicated attributes

What’s Next

EP07 covers the infrastructure layer. EP08 zooms out to FreeIPA — what you get when LDAP, Kerberos, DNS, PKI, and HBAC are integrated into a single Linux-native identity stack, and why most Linux shops running their own directory should be running FreeIPA instead of bare OpenLDAP.

Next: FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack

Get EP08 in your inbox when it publishes → linuxcent.com/subscribe

OpenLDAP Setup and Replication: Running Your Own Directory

Reading Time: 5 minutes

The Identity Stack, Episode 6
EP01 → … → EP05: KerberosEP06EP07: LDAP HA → …

Focus Keyphrase: OpenLDAP setup
Search Intent: Navigational
Meta Description: Set up OpenLDAP with the MDB backend, configure it via cn=config (OLC), and wire up SyncRepl replication — a complete walkthrough for running your own directory. (162 chars)


TL;DR

  • OpenLDAP’s server process is slapd — the backend that stores data is MDB (LMDB), a memory-mapped B-tree that replaced the old Berkeley DB backend
  • Configuration lives in the directory itself: cn=config (OLC — Online Configuration) lets you modify slapd at runtime without restarting
  • SyncRepl is the replication protocol: a consumer subscribes to a provider and stays in sync via either polling (refreshOnly) or a persistent connection (refreshAndPersist)
  • Multi-Provider (formerly Multi-Master) lets multiple nodes accept writes — conflict resolution uses CSN (Change Sequence Number), last-writer-wins
  • The essential tools: slapd, ldapadd, ldapmodify, ldapsearch, slapcat, slaptest
  • Always build indexes on the attributes you search most — uid, cn, memberOf — or every search is a full scan

The Big Picture: slapd Architecture

ldapsearch / ldapadd / SSSD / any LDAP client
              │ TCP 389 / 636
              ▼
         ┌─────────────────────────────────┐
         │  slapd (OpenLDAP server)         │
         │                                 │
         │  Frontend (protocol layer)       │
         │    • parse BER requests          │
         │    • ACL enforcement             │
         │    • schema validation           │
         │                                 │
         │  Backend (storage layer)         │
         │    • MDB (LMDB) — default       │
         │    • memory-mapped file I/O      │
         │    • ACID transactions           │
         └────────────┬────────────────────┘
                      │
              /var/lib/ldap/
              data.mdb   (the directory data)
              lock.mdb   (LMDB lock file)

EP05 showed Kerberos in isolation. OpenLDAP is where you run the identity store that Kerberos references — and where SSSD looks up user and group attributes. This episode builds a working two-node replicated directory from scratch.


Installation

# Ubuntu / Debian
apt-get install -y slapd ldap-utils

# RHEL / Rocky / AlmaLinux
dnf install -y openldap-servers openldap-clients

# After install — Ubuntu runs a configuration wizard
# Skip it: dpkg-reconfigure slapd
# Or answer it and then switch to OLC management

On RHEL-family systems, slapd is not configured after install — you work entirely through OLC from the start.


OLC: The Directory Configures Itself

The old way was slapd.conf — a static file that required a full restart on every change. OLC (Online Configuration) replaced it: slapd‘s own configuration is stored as LDAP entries under cn=config. You modify configuration the same way you modify data — with ldapmodify. Changes take effect immediately.

cn=config                        ← root config entry
├── cn=schema,cn=config          ← schema definitions
│     ├── cn={0}core             ← core schema
│     ├── cn={1}cosine           ← RFC 1274 attributes
│     └── cn={2}inetorgperson    ← inetOrgPerson object class
├── olcDatabase={-1}frontend     ← default settings for all databases
├── olcDatabase={0}config        ← the config database itself
└── olcDatabase={1}mdb           ← your actual directory data
      ├── olcAccess              ← ACLs
      ├── olcSuffix              ← base DN (e.g., dc=corp,dc=com)
      └── olcDbIndex             ← search indexes

Everything under cn=config has attributes prefixed with olc (OpenLDAP Configuration). You query and modify it just like any other LDAP subtree — with one restriction: only the cn=config admin (usually gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth — the local root via SASL EXTERNAL) can write to it.


Bootstrapping a Directory

The quickest way to get a working directory is a set of LDIF files applied in order.

1. Load schemas

# Apply the schemas OpenLDAP ships with
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/inetorgperson.ldif
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/nis.ldif       # adds posixAccount, posixGroup

2. Configure the MDB database

# mdb-config.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=corp,dc=com
-
replace: olcRootDN
olcRootDN: cn=admin,dc=corp,dc=com
-
replace: olcRootPW
olcRootPW: {SSHA}hashed_password_here

Generate the hash: slappasswd -s yourpassword

ldapmodify -Y EXTERNAL -H ldapi:/// -f mdb-config.ldif

3. Add indexes

# indexes.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: uid eq,pres
olcDbIndex: cn eq,sub
olcDbIndex: sn eq,sub
olcDbIndex: mail eq
olcDbIndex: memberOf eq
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq

The last two (entryCSN, entryUUID) are required for SyncRepl replication to work efficiently.

4. Load initial data

# base.ldif
dn: dc=corp,dc=com
objectClass: top
objectClass: dcObject
objectClass: organization
o: Corp
dc: corp

dn: ou=people,dc=corp,dc=com
objectClass: organizationalUnit
ou: people

dn: ou=groups,dc=corp,dc=com
objectClass: organizationalUnit
ou: groups

dn: uid=vamshi,ou=people,dc=corp,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
cn: Vamshi Krishna
sn: Krishna
uid: vamshi
uidNumber: 1001
gidNumber: 1001
homeDirectory: /home/vamshi
loginShell: /bin/bash
mail: [email protected]
userPassword: {SSHA}hashed_password_here
ldapadd -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" \
  -w adminpassword \
  -f base.ldif

ACLs: Who Can Read What

OpenLDAP ACLs are evaluated top-to-bottom; first match wins.

# acls.ldif — set via OLC
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcAccess
# Users can change their own passwords
olcAccess: to attrs=userPassword
  by self write
  by anonymous auth
  by * none
# Users can read their own entry
olcAccess: to dn.base="ou=people,dc=corp,dc=com"
  by self read
  by users read
  by * none
# Service accounts can read everything (for SSSD)
olcAccess: to *
  by dn="cn=svc-ldap,ou=services,dc=corp,dc=com" read
  by self read
  by * none

A service account (cn=svc-ldap) that SSSD uses to search the directory needs read access to ou=people and ou=groups. Never give SSSD admin (write) access.


SyncRepl Replication

SyncRepl is a pull-based replication protocol built on the LDAP Sync operation (RFC 4533). A consumer connects to a provider and requests changes. The provider sends them. The consumer stays in sync.

On the Provider: Enable the syncprov overlay

# syncprov.ldif
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpCheckpoint: 100 10     # checkpoint every 100 ops or 10 minutes
olcSpSessionLog: 100        # keep last 100 changes for delta-sync
ldapadd -Y EXTERNAL -H ldapi:/// -f syncprov.ldif

On the Consumer: Configure syncrepl

# consumer-config.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=001
  provider=ldap://ldap1.corp.com:389
  bindmethod=simple
  binddn="cn=repl-svc,dc=corp,dc=com"
  credentials=replication-password
  searchbase="dc=corp,dc=com"
  scope=sub
  schemachecking=on
  type=refreshAndPersist    # persistent connection (vs refreshOnly = polling)
  retry="5 5 60 +"          # retry: 5 times every 5s, then every 60s forever
  interval=00:00:05:00      # (for refreshOnly) sync every 5 minutes
-
add: olcUpdateRef
olcUpdateRef: ldap://ldap1.corp.com   # redirect writes to provider

refreshAndPersist keeps a persistent connection open. Changes replicate within milliseconds. refreshOnly polls on an interval — simpler, but adds latency.

Verify Replication

# On provider: check the contextCSN (the sync state token)
ldapsearch -x -H ldap://ldap1.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "dc=corp,dc=com" -s base contextCSN
# contextCSN: 20260427010000.000000Z#000000#000#000000

# On consumer: should match after sync
ldapsearch -x -H ldap://ldap2.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "dc=corp,dc=com" -s base contextCSN
# Same CSN = in sync

Multi-Provider: Accepting Writes on Both Nodes

Standard SyncRepl has one provider and one or more consumers — only the provider accepts writes. Multi-Provider (formerly Multi-Master) lets every node accept writes.

# On each node — add mirrormode to the database config
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcMirrorMode
olcMirrorMode: TRUE

With mirrormode enabled and each node configured as both provider and consumer of the other, writes on either node replicate to the other. Conflict resolution is CSN-based (Change Sequence Number) — a monotonically increasing timestamp. Last write wins at the attribute level.

Multi-Provider does not prevent split-brain conflicts — if two clients write the same attribute on two different nodes during a network partition, the higher CSN wins when the partition heals. For most directory use cases (user passwords, group memberships), this is acceptable. For others, it requires careful thought.


⚠ Production Gotchas

MDB data file grows monotonically. LMDB never shrinks the data file automatically. Deleted entries leave free space inside the file that gets reused, but the file on disk doesn’t shrink. Use slapcat to export and slapadd to reimport if you need to reclaim disk space.

slapcat is the only safe backup. slapcat reads the MDB database directly and exports LDIF — it does not go through slapd. Run it while slapd is running (LMDB is MVCC-safe for readers), but never copy the raw MDB files while slapd is running.

Schema changes on a replicated directory require coordination. Load the new schema on the provider first. SyncRepl will propagate it to consumers — but if a consumer gets a new entry using the new schema before the schema itself is replicated, the import will fail. Load schemas manually on all nodes before adding entries that use them.


Key Takeaways

  • OpenLDAP uses LMDB (MDB backend) — a memory-mapped, ACID-compliant storage engine with no external dependency
  • OLC (cn=config) is the right way to configure slapd — changes apply without restarts
  • SyncRepl pulls changes from a provider to a consumer — refreshAndPersist for near-real-time, refreshOnly for poll-based
  • Always index uid, cn, entryCSN, and entryUUID — unindexed searches are full scans
  • Multi-Provider allows writes on all nodes with CSN-based last-write-wins conflict resolution

What’s Next

A single OpenLDAP server works. Two nodes with SyncRepl work better. EP07 goes further: how you put multiple LDAP servers behind a load balancer, how connection pooling works, what to monitor, and how 389-DS handles directories with tens of millions of entries.

Next: LDAP High Availability: Load Balancing and Production Architecture

Get EP07 in your inbox when it publishes → linuxcent.com/subscribe

How Kerberos Works: Tickets, KDC, and Why Enterprises Use It With LDAP

Reading Time: 7 minutes

The Identity Stack, Episode 5
EP01EP02EP03EP04: SSSDEP05EP06: OpenLDAP → …

Focus Keyphrase: how Kerberos works
Search Intent: Informational
Meta Description: How Kerberos works: the KDC, ticket-granting tickets, and the three-step flow that lets enterprises authenticate without sending passwords on the wire. (157 chars)


TL;DR

  • Kerberos is a network authentication protocol — it proves identity without sending passwords over the network, using time-limited cryptographic tickets
  • Three actors: the client, the KDC (Key Distribution Center), and the service — the KDC issues tickets; clients use tickets to authenticate to services
  • The ticket flow: AS-REQ (get a TGT) → TGS-REQ (exchange TGT for a service ticket) → AP-REQ (present service ticket to the target service)
  • A TGT (Ticket-Granting Ticket) is a session credential — it lets you request service tickets without re-entering your password for the lifetime of the ticket (default 10 hours)
  • LDAP + Kerberos together: LDAP stores identity (who you are), Kerberos authenticates it (proves you are who you say you are) — Active Directory is exactly this combination
  • kinit, klist, kdestroy are the hands-on tools — run them and read the ticket output

The Big Picture: Three Actors, Three Steps

         1. AS-REQ / AS-REP
Client ◄────────────────────► AS (Authentication Server)
  │                                     │
  │    (part of KDC)                    │
  │                                     ▼
  │         2. TGS-REQ / TGS-REP   TGS (Ticket-Granting Server)
  ├───────────────────────────────────►│
  │         (part of KDC)              │
  │                                    │
  │    3. AP-REQ / AP-REP              │
  └─────────────────────────────► Service (SSH, LDAP, NFS, HTTP...)

KDC = AS + TGS (usually the same process, same machine)

EP04 mentioned Kerberos tickets and clock skew requirements without explaining the protocol. This episode explains why Kerberos was invented, what a ticket actually is, and how the three-step flow works — so that when SSSD says “KDC unreachable” or kinit fails with “pre-authentication required,” you know exactly what’s happening.


The Problem Kerberos Was Built to Solve

MIT’s Project Athena started in 1983 — a campus-wide computing initiative giving students access to thousands of workstations. The problem: how do you authenticate a student at workstation 847 to a file server across campus without sending their password over the network?

In 1988, Steve Miller and Clifford Neuman published Kerberos version 4. The core insight: a trusted third party (the KDC) can issue cryptographic proof that a user has authenticated, and that proof can be presented to any service on the network without the service ever seeing the user’s password.

The password never leaves the client machine after the initial authentication. Every subsequent authentication — to a different service, to the same service again — uses a ticket. The KDC knows both the client and the service. The client and service only need to trust the KDC.


Keys, Tickets, and Sessions

Before the protocol, the primitives:

Long-term keys — derived from passwords. When you set a password in Kerberos, it’s hashed into a key stored in the KDC database (in the krbtgt account on AD, in /var/lib/krb5kdc/principal on MIT Kerberos). The client also derives this key from the password at authentication time. Neither ever sends the raw password.

Session keys — temporary symmetric keys created by the KDC for a specific session. They’re valid for the ticket’s lifetime. After the ticket expires, the session key is useless.

Tickets — encrypted blobs issued by the KDC. A ticket contains the session key, the client identity, the expiry time, and optional flags. It’s encrypted with the target service’s long-term key — only the service can decrypt it. The client carries the ticket but can’t read the contents.


The Three-Step Flow

Step 1: AS-REQ / AS-REP — Getting a TGT

Client                        KDC (AS component)
  │                                │
  │── AS-REQ ──────────────────────►
  │   {username, timestamp}         │
  │   (timestamp encrypted with     │
  │    client's long-term key)       │
  │                                 │
  │   KDC verifies: decrypts        │
  │   timestamp with stored key.    │
  │   If valid → issues TGT         │
  │                                 │
  ◄── AS-REP ──────────────────────│
      {session_key_enc_with_client, │
       TGT_enc_with_krbtgt_key}     │

The client decrypts the session key using its long-term key (derived from the password). The TGT is encrypted with the KDC’s own key (krbtgt) — the client can’t read it, but carries it.

This is the step that requires the password. After this, the TGT is what the client uses for everything else.

Step 2: TGS-REQ / TGS-REP — Getting a Service Ticket

Client                        KDC (TGS component)
  │                                │
  │── TGS-REQ ─────────────────────►
  │   {TGT, authenticator,         │
  │    target_service_name}        │
  │   (authenticator encrypted      │
  │    with TGT session key)        │
  │                                 │
  │   KDC: decrypts TGT,           │
  │   verifies authenticator,       │
  │   issues service ticket         │
  │                                 │
  ◄── TGS-REP ────────────────────│
      {service_session_key_enc,    │
       service_ticket_enc_with_    │
       service_long_term_key}      │

No password involved. The client proves its identity by presenting the TGT (which only the KDC can issue) and an authenticator (a timestamp encrypted with the TGT’s session key, proving the client holds the session key without revealing it).

Step 3: AP-REQ / AP-REP — Authenticating to the Service

Client                        Service (sshd, LDAP, NFS...)
  │                                │
  │── AP-REQ ──────────────────────►
  │   {service_ticket,             │
  │    authenticator_enc_with_      │
  │    service_session_key}        │
  │                                 │
  │   Service: decrypts ticket      │
  │   with its long-term key,       │
  │   verifies authenticator        │
  │                                 │
  ◄── AP-REP (optional) ───────────│
      {mutual authentication}       │

The service decrypts the ticket using its own key. It extracts the client identity and session key. It verifies the authenticator. No communication with the KDC required — the service trusts what the KDC signed.


Why Clock Skew Matters

Every Kerberos authenticator contains a timestamp. The service rejects authenticators older than 5 minutes (by default) — this prevents replay attacks where an attacker captures an authenticator and replays it later.

This is why clock skew over 5 minutes breaks Kerberos authentication entirely. If your machine’s clock drifts 6 minutes from the KDC, every authenticator you generate is rejected as too old or too far in the future. No tickets. No AD logins. No SSSD authentication.

# Check time sync status
timedatectl status
chronyc tracking        # if using chrony
ntpq -p                 # if using ntpd

# If clock is off: force a sync
chronyc makestep        # immediate step correction (chrony)

Hands-On: kinit, klist, kdestroy

# Get a TGT (will prompt for password)
kinit [email protected]

# Show current tickets
klist
# Credentials cache: FILE:/tmp/krb5cc_1001
# Principal: [email protected]
#
# Valid starting     Expires            Service principal
# 04/27/26 01:00:00  04/27/26 11:00:00  krbtgt/[email protected]
#   renew until 05/04/26 01:00:00

# Show encryption types used (the -e flag)
klist -e
# 04/27/26 01:00:00  04/27/26 11:00:00  krbtgt/[email protected]
#         Etype: aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

# Get a service ticket for a specific service
kvno host/[email protected]
# host/[email protected]: kvno = 3

# Show all tickets including service tickets
klist -f
# Flags: F=forwardable, f=forwarded, P=proxiable, p=proxy, D=postdated,
#        d=postdated, R=renewable, I=initial, i=invalid, H=hardware auth

# Destroy all tickets
kdestroy

The Valid starting and Expires fields are the ticket lifetime. After expiry, you need to re-authenticate (or renew the ticket if it’s within the renew until window). The renew until date is when even renewal stops working.


/etc/krb5.conf

[libdefaults]
    default_realm = CORP.COM
    dns_lookup_realm = false
    dns_lookup_kdc = true         # find KDCs via DNS SRV records
    ticket_lifetime = 10h
    renew_lifetime = 7d
    forwardable = true            # tickets can be forwarded to remote hosts (needed for SSH forwarding)
    rdns = false

[realms]
    CORP.COM = {
        kdc = dc01.corp.com
        kdc = dc02.corp.com       # failover KDC
        admin_server = dc01.corp.com
    }

[domain_realm]
    .corp.com = CORP.COM
    corp.com = CORP.COM

With dns_lookup_kdc = true, Kerberos finds KDCs by querying DNS SRV records (_kerberos._tcp.corp.com). AD sets these up automatically. On MIT Kerberos, you add them manually. DNS-based discovery is the recommended approach for AD environments — it picks up new DCs automatically.


Kerberos + LDAP: Why Enterprises Run Both

LDAP and Kerberos solve different problems and are almost always deployed together:

LDAP answers:  "Who is vamshi? What groups is he in? What's his home directory?"
Kerberos answers: "Is this really vamshi? Prove it without sending a password."

Active Directory is exactly this combination — the directory is LDAP-based, the authentication is Kerberos. When a Linux machine joins an AD domain via realm join or adcli, it gets:
– LDAP access to the AD directory (for NSS: user and group lookups)
– A Kerberos principal registered in AD (for PAM: ticket-based authentication)
– A machine account (the machine’s identity in the directory)

When you SSH into an AD-joined Linux machine:
1. SSSD issues a Kerberos AS-REQ for the user’s TGT
2. SSSD uses the TGT to get a service ticket for the Linux machine’s PAM service
3. Authentication is verified via the service ticket — no LDAP Bind with a password
4. SSSD does an LDAP Search to get POSIX attributes (UID, GID, home dir)

Password-based LDAP Bind is the fallback when Kerberos isn’t available. Kerberos is the default on AD-joined systems — and it’s more secure because the password never leaves the client.


⚠ Common Misconceptions

“Kerberos sends your password to the KDC.” It doesn’t. The client derives a key from the password locally and uses that key to encrypt a timestamp (the pre-authentication data). The KDC verifies the timestamp using the stored key. The raw password never travels.

“Kerberos is an authorization protocol.” Kerberos authenticates — it proves who you are. Authorization (what you can do) is a separate decision, usually handled by ACLs on the service or directory group membership.

“Once you have a TGT, you’re authenticated to everything.” A TGT only proves your identity to the KDC. Each service requires a separate service ticket. The TGT is what lets you get those service tickets without re-entering your password.

“Kerberos requires AD.” MIT Kerberos 5 is a standalone implementation. FreeIPA (EP08) runs MIT Kerberos. Heimdal is another implementation. AD uses a Microsoft-extended version of Kerberos 5, but the core protocol is the same RFC.


Framework Alignment

Domain Relevance
CISSP Domain 5: Identity and Access Management Kerberos is the de facto enterprise authentication protocol — SSO, delegation, and service account authentication all depend on it
CISSP Domain 4: Communications and Network Security Kerberos prevents credential sniffing and replay attacks — two of the core network authentication threat categories
CISSP Domain 3: Security Architecture and Engineering The KDC is a critical single point of trust — its availability, key management, and account (krbtgt) rotation are architectural security decisions

Key Takeaways

  • Kerberos is a ticket-based protocol — the password is used once to get a TGT; from then on, tickets prove identity without the password
  • The three-step flow: get a TGT from the AS, exchange it for a service ticket at the TGS, present the service ticket to the target service
  • Clock skew over 5 minutes breaks Kerberos — time synchronization is a hard dependency
  • LDAP stores identity; Kerberos authenticates it — Active Directory is exactly this combination, and so is FreeIPA
  • klist -e shows the encryption types in use — aes256-cts-hmac-sha1-96 is what you want to see; arcfour-hmac (RC4) is legacy and should be disabled

What’s Next

EP05 covered Kerberos as a protocol. EP06 goes hands-on: building a real LDAP directory with OpenLDAP, configuring replication, and understanding how the server-side components — slapd, the MDB backend, SyncRepl — fit together.

Next: OpenLDAP Setup and Replication: Running Your Own Directory

Get EP06 in your inbox when it publishes → linuxcent.com/subscribe

SSSD: The Caching Daemon That Powers Every Enterprise Linux Login

Reading Time: 7 minutes

The Identity Stack, Episode 4
EP01: What Is LDAPEP02: LDAP InternalsEP03: LDAP Auth on LinuxEP04EP05: Kerberos → …

Focus Keyphrase: SSSD Linux
Search Intent: Informational
Meta Description: SSSD powers every enterprise Linux login — but most engineers only interact with it when it breaks. Here’s the architecture, the config knobs that matter, and how to debug it. (185 chars — trim to: SSSD powers every enterprise Linux login. Here’s the architecture, the sssd.conf knobs that matter, and how to debug it when it breaks. (137 chars))
Meta Description (final): SSSD powers every enterprise Linux login. Here’s the architecture, the sssd.conf knobs that matter, and how to debug it when it breaks. (137 chars)


TL;DR

  • SSSD (System Security Services Daemon) is the caching and brokering layer between Linux and directory services — it handles LDAP, Kerberos, and AD so PAM and NSS don’t have to
  • Architecture: three tiers — responders (answer PAM/NSS queries), providers (talk to AD/LDAP/Kerberos), and a shared cache (LDB database on disk)
  • Credential caching means offline logins work — a user who authenticated yesterday can log in today even if the domain controller is unreachable
  • Key config: sssd.conf — the [domain] section is where almost all tuning happens
  • Debugging toolkit: sssctl, sss_cache, id, getent, journalctl -u sssd
  • The most common failure modes are: SSSD not running, stale cache, misconfigured ldap_search_base, and clock skew breaking Kerberos

The Big Picture: SSSD as the Identity Broker

PAM (pam_sss)         NSS (sss module)
      │                      │
      └──────────┬───────────┘
                 ▼
          SSSD Responders
          ┌────────────────────────────────────┐
          │  PAM responder   NSS responder      │
          │  (auth, account, (passwd, group,    │
          │   session)        shadow lookups)   │
          └────────────┬───────────────────────┘
                       │  shared cache (LDB)
                       ▼
          SSSD Providers
          ┌────────────────────────────────────┐
          │  identity provider  auth provider   │
          │  (user/group attrs) (credentials)   │
          └────────────┬───────────────────────┘
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
       LDAP          Kerberos    Local files
    (AD / OpenLDAP)  (KDC / AD)

EP03 showed that SSSD sits between PAM and LDAP. This episode goes inside it — the architecture, the config, and how to tell exactly what it’s doing on any given login attempt.


Why SSSD Exists

The problem before SSSD: nss_ldap and pam_ldap made direct LDAP connections for every query. No caching, no connection pooling, no failover, no offline support. On a system that makes dozens of getpwuid() calls per second (every ls -l, every process spawn), this meant dozens of LDAP roundtrips per second hitting the domain controller.

SSSD solved this with a single daemon that:
– Maintains a persistent connection pool to the directory
– Caches identity and credential data in an LDB (LDAP-like) database on disk
– Handles failover across multiple directory servers
– Satisfies PAM and NSS queries from cache when the directory is unreachable

The credential cache is the key insight. When you authenticate successfully, SSSD stores a hash of your credentials locally. If the domain controller is unreachable on your next login — network outage, laptop offline, VPN not connected — SSSD can verify your credentials against the local cache. You log in. You never knew the DC was down.


SSSD Architecture

SSSD is a set of cooperating processes sharing a cache:

Monitor — the parent process. Starts and restarts all other SSSD processes. If a responder or provider crashes, the monitor restarts it.

Responders — answer queries from PAM and NSS. Each responder handles a specific interface:
sssd_nss — answers getpwnam(), getpwuid(), getgrnam(), initgroups() calls
sssd_pam — handles PAM authentication, account checks, and session management
sssd_autofs, sssd_ssh, sssd_sudo — optional responders for specific services

Providers — the backend processes that talk to the actual directory:
– Each domain gets its own provider process (sssd_be[domain_name])
– The provider connects to LDAP/Kerberos/AD, fetches data, and writes it to the shared cache
– If the provider crashes or loses connectivity, responders fall back to serving from cache

Cache — LDB files in /var/lib/sss/db/. One database per configured domain, plus a cache for negative results (lookups that returned “not found”). The cache is an LDAP-like directory stored on disk — SSSD uses the same hierarchical structure for local storage as the remote directory uses.

# See the cache files
ls -la /var/lib/sss/db/
# cache_corp.com.ldb         ← user/group data for domain corp.com
# ccache_corp.com            ← Kerberos credential cache
# timestamps_corp.com.ldb   ← when entries were last refreshed

sssd.conf: The Config That Matters

/etc/sssd/sssd.conf has a [sssd] section (global) and one [domain/name] section per directory. The domain section is where almost all tuning happens.

[sssd]
services = nss, pam, sudo
domains = corp.com
config_file_version = 2

[domain/corp.com]
# What type of directory this is
id_provider = ad               # or: ldap, ipa, files
auth_provider = ad             # or: ldap, krb5, none
access_provider = ad           # controls who can log in

# The AD/LDAP server (can be a list for failover)
ad_domain = corp.com
ad_server = dc01.corp.com, dc02.corp.com

# Where to look for users and groups
ldap_search_base = dc=corp,dc=com

# Cache behavior
cache_credentials = true       # enable offline login
entry_cache_timeout = 5400     # how long before re-querying (seconds)
offline_credentials_expiration = 1  # days cached credentials stay valid offline

# What uid/gid range belongs to this domain (prevents UID conflicts)
ldap_id_mapping = true         # auto-map AD SIDs to UIDs (no uidNumber needed)
# OR for classical POSIX LDAP:
# ldap_id_mapping = false      # use uidNumber/gidNumber from directory

# Restrict logins to specific AD groups
# access_provider = simple
# simple_allow_groups = linux-admins, sre-team

# Home directory and shell defaults
override_homedir = /home/%u
default_shell = /bin/bash
fallback_homedir = /home/%u

# Enumerate all users (expensive on large dirs — disable unless needed)
enumerate = false

The two most commonly wrong settings:

ldap_search_base — if this doesn’t include the OU where your users live, SSSD won’t find them. On AD, the default searches the entire domain, which is usually correct. On OpenLDAP, you may need ou=people,dc=corp,dc=com.

ldap_id_mapping — on AD, users typically don’t have uidNumber attributes. Setting ldap_id_mapping = true tells SSSD to derive a UID from the user’s SID algorithmically. This produces consistent UIDs across machines. Setting it to false requires actual uidNumber attributes in the directory.


Credential Caching and Offline Logins

The cache is what separates SSSD from a simple proxy. When cache_credentials = true:

  1. On successful authentication, SSSD stores a hash of the credential in the LDB cache
  2. On the next authentication attempt, SSSD first tries the domain controller
  3. If the DC is unreachable, SSSD falls back to the local credential hash
  4. If the hash matches, login succeeds — even with no network

The credential hash is not the cleartext password — it’s a salted hash stored in /var/lib/sss/db/cache_corp.com.ldb. The security model is the same as /etc/shadow: someone with root access to the machine can access the hashes.

offline_credentials_expiration controls how long cached credentials stay valid when the DC is unreachable. 0 means forever (not recommended for high-security environments). 1 means one day — after 24 hours offline, even cached credentials expire and the user must authenticate online.


The Debugging Toolkit

# 1. Is SSSD running?
systemctl status sssd
pgrep -a sssd    # shows all SSSD processes (monitor + responders + providers)

# 2. Domain connectivity status
sssctl domain-status corp.com
# Domain: corp.com
# Active servers:
#   LDAP: dc01.corp.com
#   KDC: dc01.corp.com
# Discovered servers:
#   LDAP: dc01.corp.com, dc02.corp.com

# 3. Can SSSD find a specific user?
sssctl user-checks vamshi
# user: vamshi
# user name: [email protected]
# POSIX attributes: UID=1001, GID=1001, ...
# Authentication: success (uses actual PAM auth stack)

# 4. What does NSS see?
getent passwd vamshi          # full passwd entry
id vamshi                     # uid, gid, groups

# 5. Flush stale cache entries
sss_cache -u vamshi           # invalidate one user
sss_cache -G engineers        # invalidate one group
sss_cache -E                  # invalidate everything (nuclear option)

# 6. Live logs
journalctl -u sssd -f         # tail all SSSD logs
# Then attempt login in another terminal — watch the auth flow in real time

# 7. Increase log verbosity temporarily
sssctl config-check            # validate sssd.conf syntax
# Edit sssd.conf: add debug_level = 6 under [domain/corp.com]
systemctl restart sssd
journalctl -u sssd -f          # now shows LDAP queries, cache hits/misses

The single most useful command is sssctl user-checks <username>. It runs the full NSS + PAM auth stack internally and prints what SSSD would do on a real login — without creating a session or touching the running system.


Breaking SSSD (and What Each Failure Looks Like)

SSSD not running:

ssh vamshi@server
# Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
# getent passwd vamshi → (empty)
# Fix: systemctl start sssd

Stale cache after AD password change:

# User changed password in AD but SSSD still has old credential hash
ssh vamshi@server  # password accepted (wrong!) — cache hit with old hash
# Fix: sss_cache -u vamshi, then attempt login again

Clock skew > 5 minutes (breaks Kerberos):

journalctl -u sssd | grep -i "clock skew\|KDC\|kinit"
# sssd_be[corp.com]: Kerberos authentication failed: Clock skew too great
# Fix: systemctl restart chronyd (or ntpd), verify time sync

ldap_search_base wrong:

getent passwd vamshi  # empty, but user exists in AD
sssctl user-checks vamshi  # "User not found"
# Check: ldap_search_base must include the OU containing users
# Test: ldapsearch -x -H ldap://dc -b "ou=engineers,dc=corp,dc=com" "(uid=vamshi)"

⚠ Common Misconceptions

“Restarting SSSD logs everyone out.” Restarting SSSD doesn’t affect existing authenticated sessions. Active shell sessions, running processes — all unaffected. Only new authentication attempts are disrupted during the restart window, which takes a few seconds.

“sss_cache -E fixes everything.” Flushing the entire cache forces SSSD to re-fetch all entries from the domain controller on the next lookup. On a system with many users or enumeration enabled, this can cause a brief spike in LDAP traffic and slow lookups. Use targeted flushes (-u username, -G group) when possible.

“debug_level should always be high.” SSSD at debug_level = 9 logs every LDAP packet. On a production system with active logins, this generates gigabytes of logs quickly. Set it temporarily for debugging, then remove it and restart.


Framework Alignment

Domain Relevance
CISSP Domain 5: Identity and Access Management SSSD is the runtime implementation of enterprise identity integration on Linux — understanding its caching model, failover behavior, and credential storage is foundational to IAM operations
CISSP Domain 3: Security Architecture and Engineering The credential cache design (/var/lib/sss/db/) creates a local credential store with specific security properties — architects need to understand the offline login trade-off
CISSP Domain 7: Security Operations SSSD is a critical security service — monitoring it, understanding its failure modes, and knowing how to recover it quickly are operational security skills

Key Takeaways

  • SSSD is a three-tier system: responders (serve PAM/NSS), providers (talk to AD/LDAP), and a shared LDB cache — each tier is independently restartable
  • Credential caching enables offline logins — the security trade-off is a local hash store in /var/lib/sss/db/
  • sssctl user-checks is the first tool to reach for when a login fails — it simulates the full auth flow and shows exactly where it breaks
  • ldap_id_mapping = true is the right choice for AD environments without POSIX attributes; false requires actual uidNumber/gidNumber in the directory
  • Clock skew over 5 minutes silently breaks Kerberos authentication — time sync is a hard dependency

What’s Next

EP04 showed SSSD’s role as the caching and brokering layer. What it referenced repeatedly — “Kerberos ticket”, “KDC”, “GSSAPI” — is the authentication protocol that sits underneath AD-joined Linux logins. SSSD uses Kerberos to authenticate. LDAP carries the identity data. EP05 explains how Kerberos works.

Next: How Kerberos Works: Tickets, KDC, and Why Enterprises Use It With LDAP

Get EP05 in your inbox when it publishes → linuxcent.com/subscribe

How LDAP Authentication Works on Linux: PAM, NSS, and the Login Stack

Reading Time: 9 minutes

The Identity Stack, Episode 3
EP01: What Is LDAPEP02: LDAP InternalsEP03EP04: SSSD → …

Focus Keyphrase: LDAP authentication Linux
Search Intent: Informational
Meta Description: Trace a Linux SSH login through PAM, NSS, and LDAP step by step — and understand why LDAP alone is not an authentication protocol. (144 chars)


TL;DR

  • LDAP is a directory protocol — it stores identity information and can verify a password via Bind, but authentication on Linux runs through PAM, not directly through LDAP
  • NSS (/etc/nsswitch.conf) answers “who is this user?” — it resolves UIDs, group memberships, and home directories by querying LDAP (or the local files, or SSSD)
  • PAM (/etc/pam.d/) answers “are they allowed in?” — it enforces authentication, account validity, session setup, and password policy
  • pam_ldap (the old way) opened a direct LDAP connection on every login — fragile, no caching, broken when the LDAP server was unreachable
  • pam_sss (the modern way) delegates to SSSD, which caches credentials and handles failover — SSSD is the layer between Linux and the directory
  • Tracing a single SSH login: sshd → PAM → pam_sss → SSSD → LDAP Bind + Search → session created

The Big Picture: One SSH Login, Four Layers

You type: ssh [email protected]

  sshd
    │
    ▼
  PAM  (/etc/pam.d/sshd)          ← "Is this user allowed in?"
    │
    ├── pam_sss    (auth)          ← sends credentials to SSSD
    ├── pam_sss    (account)       ← checks account not expired/locked
    ├── pam_sss    (session)       ← logs the session open/close
    └── pam_mkhomedir (session)    ← creates /home/vamshi if it doesn't exist
    │
    ▼
  SSSD  (/etc/sssd/sssd.conf)     ← "Let me check the directory"
    │
    ├── NSS responder              ← answers getent, id, getpwnam
    └── LDAP/Kerberos provider     ← talks to the actual directory
    │
    ▼
  LDAP Server (AD / OpenLDAP)
    │
    ├── Bind: uid=vamshi + password (or Kerberos ticket)
    └── Search: posixAccount attrs for uid=vamshi
    │
    ▼
  Linux session created
  UID=1001, GID=1001, HOME=/home/vamshi, SHELL=/bin/bash

EP02 showed what the directory contains and what travels on the wire. What it left open is how Linux uses that to grant a login — and why LDAP is not, by itself, an authentication protocol.


Why LDAP Is Not an Authentication Protocol

This is the confusion that trips people most. LDAP can verify a password — the Bind operation does exactly that. But authentication on Linux means something broader: checking credentials, checking account validity, enforcing password policy, setting up a session, creating a home directory. LDAP handles one piece of that. PAM handles the rest.

More precisely: LDAP doesn’t know what a Linux session is. It doesn’t know about /etc/pam.d/. It doesn’t enforce login hours, account expiry, or concurrent session limits. It returns directory entries and verifies binds. The intelligence about what to do with those results lives in the Linux authentication stack.

When you run ssh vamshi@server, the OS doesn’t open an LDAP connection and ask “can this user log in?” It calls PAM. PAM consults its configuration, and PAM decides whether to call LDAP (directly or via SSSD), whether to check the shadow file, whether to enforce MFA. LDAP is one possible backend. It’s not the gatekeeper.


NSS: The Traffic Controller

Before PAM runs, Linux needs to know if the user exists at all. That’s NSS’s job.

/etc/nsswitch.conf is a routing table for name resolution. It tells the OS where to look when something asks “who is UID 1001?” or “what groups is vamshi in?”:

# /etc/nsswitch.conf

passwd:     files sss        ← user lookups: check /etc/passwd first, then SSSD
group:      files sss        ← group lookups: check /etc/group first, then SSSD
shadow:     files sss        ← shadow password lookups
hosts:      files dns        ← hostname lookups (not identity-related)
netgroup:   sss              ← NIS netgroups from SSSD only
automount:  sss              ← autofs maps from SSSD

Every call to getpwnam(), getpwuid(), getgrnam(), getgrgid() in any process — including sshd — goes through NSS. The entries in nsswitch.conf control which backends are tried in order.

With passwd: files sss, a lookup for user vamshi:
1. Checks /etc/passwd — not found (vamshi is a domain user, not in local files)
2. Queries SSSD — SSSD checks its cache, or queries LDAP, and returns the posixAccount attributes

Without the sss entry in passwd:, domain users don’t exist on the system — getent passwd vamshi returns nothing, id vamshi fails, SSH login never gets to PAM’s authentication step.

# Verify NSS is routing to SSSD correctly
getent passwd vamshi
# vamshi:*:1001:1001:Vamshi K:/home/vamshi:/bin/bash

# If this returns nothing, NSS isn't reaching SSSD
# Check: systemctl status sssd && grep passwd /etc/nsswitch.conf

# See what groups the user is in (NSS group lookup)
id vamshi
# uid=1001(vamshi) gid=1001(engineers) groups=1001(engineers),1002(ops)

PAM: The Real Gatekeeper

PAM (Pluggable Authentication Modules) is the framework that lets Linux swap authentication backends without recompiling anything. Every service that needs to authenticate users — sshd, sudo, login, su, gdm — has a PAM configuration file in /etc/pam.d/.

Each PAM config defines four stacks:

auth        ← verify credentials (password, key, MFA)
account     ← check if the account is valid (not expired, not locked, login hours)
password    ← password change policy
session     ← set up/tear down the session (home dir, limits, logging)

A typical /etc/pam.d/sshd on a system joined to AD via SSSD:

# /etc/pam.d/sshd

# auth stack — verify the user's credentials
auth    required      pam_sepermit.so
auth    substack      password-auth   ← usually includes pam_sss.so

# account stack — check account validity
account required      pam_nologin.so
account include       password-auth

# password stack — handle password changes
password include      password-auth

# session stack — set up the session
session required      pam_selinux.so close
session required      pam_loginuid.so
session optional      pam_keyinit.so force revoke
session include       password-auth
session optional      pam_motd.so
session optional      pam_mkhomedir.so skel=/etc/skel/ umask=0077
session required      pam_selinux.so open

The include and substack directives pull in shared stacks from other files (like /etc/pam.d/password-auth). On a system with SSSD, password-auth contains:

auth    required      pam_env.so
auth    sufficient    pam_sss.so      ← try SSSD first
auth    required      pam_deny.so     ← if pam_sss fails, deny

account required      pam_unix.so
account sufficient    pam_localuser.so
account sufficient    pam_sss.so      ← SSSD account check
account required      pam_permit.so

session optional      pam_sss.so      ← SSSD session tracking

The sufficient flag means: if this module succeeds, stop checking this stack and consider it passed. required means: this must pass (but continue checking other modules and report failure at the end). requisite means: if this fails, stop immediately.


PAM Control Flags at a Glance

required   — must succeed; failure reported after remaining modules run
requisite  — must succeed; failure reported immediately, stack stops
sufficient — if success, stop stack (ignore remaining); failure continues
optional   — result ignored unless it's the only module in the stack

This matters for debugging. If pam_sss.so is sufficient and SSSD is down, PAM falls through to pam_deny.so — login denied. If it were optional, the login would proceed to the next module. The control flag is the policy decision.


The Old Way: pam_ldap

Before SSSD, Linux systems used pam_ldap and nss_ldap directly:

# Old /etc/pam.d/common-auth (Ubuntu pre-SSSD era)
auth    sufficient    pam_ldap.so    ← direct LDAP connection per login
auth    required      pam_unix.so nullok_secure

# Old /etc/nsswitch.conf
passwd: files ldap    ← nss_ldap for user lookups
group:  files ldap

pam_ldap opened a fresh LDAP connection on every login attempt. No caching. If the LDAP server was unreachable for 3 seconds, the login hung for 3 seconds — sometimes much longer. If the LDAP server was down, all domain logins failed immediately. Previously logged-in users with active sessions were fine; new logins simply didn’t work.

nss_ldap had the same problem for NSS lookups: every getpwnam() call hit the LDAP server directly. On a busy system with many processes doing user lookups, this meant hundreds of LDAP queries per second, no connection reuse, and no way to survive a brief network blip.

The problems were structural:
– No credential caching — offline logins impossible
– No connection pooling — LDAP server saw one connection per login attempt
– No failover logic — one LDAP server down meant all logins down
– Slow timeouts that blocked login sessions

SSSD was built to fix all of this.


The Modern Way: pam_sss + SSSD

pam_sss doesn’t talk to LDAP directly. It’s a thin client that passes authentication requests to SSSD over a Unix domain socket. SSSD manages the LDAP connection, the credential cache, and the failover logic.

sshd  →  PAM (pam_sss)  →  SSSD (Unix socket)  →  LDAP server
                                   │
                                   └── credential cache
                                       (survives brief LDAP outages)

When pam_sss sends a credential to SSSD:
1. SSSD checks its in-memory cache — if the credential hash matches a recent successful auth, it can satisfy the request without hitting LDAP
2. If not cached (or cache expired), SSSD sends a Bind to the LDAP server
3. On success, SSSD caches the result and returns success to pam_sss
4. pam_sss returns PAM_SUCCESS, and the auth stack continues

The credential cache is what enables offline logins. If the LDAP server is unreachable and a user has authenticated successfully within the cache TTL (default: 1 day for credentials, configurable via cache_credentials = True in sssd.conf), SSSD satisfies the auth from cache and the login succeeds. The user never knows the LDAP server was down.


Tracing a Full SSH Login

Here’s every step of an SSH login for a domain user, in order:

1.  sshd accepts the TCP connection
2.  sshd calls PAM: pam_start("sshd", "vamshi", ...)

3.  PAM auth stack runs pam_sss:
      pam_sss sends credentials to SSSD via /var/lib/sss/pipes/pam

4.  SSSD auth provider:
      a. Check credential cache — miss (first login)
      b. Resolve user: NSS lookup for uid=vamshi
         → SSSD LDAP provider searches dc=corp,dc=com for (uid=vamshi)
         → Returns: uidNumber=1001, gidNumber=1001, homeDirectory=/home/vamshi
      c. Authenticate: LDAP Simple Bind as uid=vamshi,ou=engineers,dc=corp,dc=com
         → Server returns: success
      d. Cache the credential hash + POSIX attrs

5.  SSSD returns PAM_SUCCESS to pam_sss

6.  PAM account stack runs pam_sss:
      SSSD checks: account not expired, not locked, login permitted
      → PAM_ACCT_MGMT success

7.  PAM session stack:
      pam_loginuid sets /proc/self/loginuid = 1001
      pam_mkhomedir creates /home/vamshi if missing
      pam_sss opens session (records in SSSD session tracking)

8.  sshd creates the shell, sets environment:
      USER=vamshi, HOME=/home/vamshi, SHELL=/bin/bash, LOGNAME=vamshi

9.  Shell prompt appears

Steps 4b and 4c are the only two LDAP operations in the entire login flow: one Search to resolve the user’s attributes, one Bind to verify the password. Everything else is PAM and SSSD.


Debugging the Stack

When a login fails, the failure could be in any layer. Work top-down:

# 1. Does NSS resolve the user at all?
getent passwd vamshi
# If empty: NSS isn't reaching SSSD, or SSSD isn't finding the user in LDAP

# 2. Is SSSD running and healthy?
systemctl status sssd
sssctl domain-status corp.com      # shows SSSD's view of domain connectivity

# 3. What does SSSD think about the user?
sssctl user-checks vamshi          # runs auth + account checks internally
id vamshi                          # forces NSS resolution and shows group memberships

# 4. What does SSSD's log say?
journalctl -u sssd -f              # tail SSSD logs live, then attempt login

# 5. Can you reach the LDAP server at all?
ldapsearch -x -H ldap://dc.corp.com \
  -D "cn=svc-ldap,ou=services,dc=corp,dc=com" \
  -w "password" \
  -b "dc=corp,dc=com" \
  "(uid=vamshi)" dn

# 6. Force a cache flush if entries are stale
sss_cache -u vamshi                # invalidate this user's cache entry
sss_cache -G engineers             # invalidate a group

The sssctl user-checks command is the single most useful diagnostic — it simulates the full PAM auth + account check flow without actually creating a session, and prints exactly what SSSD would do on a real login attempt.


⚠ Common Misconceptions

“If ldapsearch works, SSH login should work.” Not necessarily. ldapsearch tests the LDAP layer. An SSH login requires NSS to resolve the user, PAM to authenticate, SSSD to be running and configured correctly, and pam_mkhomedir to create the home directory if it’s the first login. Any of these can fail independently.

“pam_ldap and pam_sss do the same thing.” They have the same job (authenticate via LDAP) but completely different architectures. pam_ldap is a direct-connect, no-cache module. pam_sss is a client of SSSD, which provides caching, connection pooling, failover, and offline support. On any modern system, you want pam_sss.

“nsswitch.conf order doesn’t matter much.” It matters exactly as much as the order suggests. passwd: files sss means local /etc/passwd is always checked first — if a domain username collides with a local user, the local account wins. This is the intended behavior (local accounts should always be reachable), but it means you’ll never override a local account with a directory entry.

“SSSD cache = security risk.” The cache stores a credential hash, not the cleartext password. An attacker with access to the SSSD cache database (/var/lib/sss/db/) would see hashed credentials — the same situation as /etc/shadow. The real concern is whether offline authentication is appropriate for your security posture; it can be disabled with offline_credentials_expiration = 0.


Framework Alignment

Domain Relevance
CISSP Domain 5: Identity and Access Management PAM is the enforcement layer for authentication policy on Linux — understanding its stack is foundational to any Linux IAM deployment
CISSP Domain 3: Security Architecture and Engineering The separation between NSS (resolution) and PAM (authentication) is an architectural boundary — misunderstanding it leads to misconfigured systems where account checks are bypassed
CISSP Domain 4: Communications and Network Security pam_ldap vs pam_sss affects whether credentials travel over a direct LDAP connection (one socket per login, no TLS guarantee) or through SSSD’s managed, pooled connection

Key Takeaways

  • LDAP alone is not an authentication protocol for Linux — authentication flows through PAM, and LDAP is one of PAM’s possible backends
  • NSS (/etc/nsswitch.conf) resolves user identity (who is UID 1001?); PAM enforces it (are they allowed in?)
  • pam_ldap talks to LDAP directly — no cache, no failover, login blocked when LDAP is unreachable
  • pam_sss delegates to SSSD — credential caching, connection pooling, offline login, and failover are all built in
  • A full SSH login touches LDAP exactly twice: one Search for POSIX attributes, one Bind to verify the password
  • When login fails, debug top-down: NSS resolution → SSSD status → LDAP reachability → PAM config

What’s Next

EP03 showed how authentication reaches LDAP — through PAM, through SSSD, through a Bind. What it assumed is that SSSD is healthy and the LDAP server is reachable. The moment either goes wrong, the behavior depends entirely on how SSSD is configured — its cache TTLs, its failover order, its offline credential policy.

EP04 goes inside SSSD: the architecture, the sssd.conf knobs that matter, how to read the logs, and how to break it intentionally and fix it.

Next: SSSD: The Caching Daemon That Powers Every Enterprise Linux Login

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

One Blueprint, Six Clouds — Multi-Provider OS Image Builds

Reading Time: 6 minutes

OS Hardening as Code, Episode 3
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening**

Focus Keyphrase: multi-cloud OS hardening
Search Intent: Informational
Meta Description: Maintain one OS hardening baseline across AWS, GCP, and Azure without separate scripts that drift. One HardeningBlueprint YAML, six providers, zero duplication. (155 chars)


TL;DR

  • Multi-cloud OS hardening with separate scripts per provider means three scripts that drift within weeks
  • A HardeningBlueprint YAML separates compliance intent (portable) from provider details (handled by Stratum’s provider layer)
  • The same blueprint builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox with a single --provider flag change
  • Provider-specific differences — disk names, cloud-init ordering, metadata endpoint IPs — are abstracted away from the blueprint author
  • One YAML file becomes the single source of truth for OS security posture across your entire fleet, regardless of cloud
  • Drift detection works fleet-wide: rescan any instance against the original blueprint grade on any provider

The Problem: Three Clouds, Three Scripts, Three Ways to Drift

AWS hardening script          GCP hardening script          Azure hardening script
├── /dev/xvd* disk refs       ├── /dev/sda* disk refs       ├── /dev/sda* disk refs
├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS
├── cloud-init order A        ├── cloud-init order B        ├── cloud-init order C
└── Updated: Jan 2025         └── Updated: Aug 2024         └── Updated: Mar 2024
                                         │
                                         └─ 5 months behind
                                            on CIS updates

Multi-cloud OS hardening starts as a copy-paste of the AWS script. Within a month, the clouds diverge.

EP02 showed that a HardeningBlueprint YAML eliminates the skip-at-2am problem by making hardening a build artifact. What it assumed — quietly — is that you’re building for one provider. The moment you expand to a second cloud, the provider-specific details in the blueprint become a problem: disk names differ, cloud-init fires in a different order, and AWS-specific assumptions break silently on GCP.


We expanded from AWS to GCP six months ago. The EC2 hardening script had been working reliably for over a year. The GCP engineer took the AWS script, made some quick changes, and started building images.

The first GCP images had a subtle problem: the /tmp and /home separate partition entries in /etc/fstab referenced /dev/xvdb — an AWS disk naming convention. GCP uses /dev/sdb. The fstab entries were silently ignored. The mounts existed but weren’t restricted. The CIS controls for separate filesystem partitions were listed as passing in the scan output because the Ansible task had “run successfully” — it just hadn’t done what we thought.

It took a pentest three months later to catch it. The finding: six production GCP instances with /tmp not mounted with noexec, nosuid, nodev — despite our “CIS L1 hardened” label.

The root cause wasn’t the engineer. It was a hardening approach that required cloud-specific knowledge embedded in the script rather than in a provider abstraction layer.


How Stratum Separates Compliance Intent from Provider Details

Multi-cloud OS hardening works when the compliance intent and the provider details are kept strictly separate.

HardeningBlueprint YAML
(compliance intent — portable)
         │
         ▼
  Stratum Provider Layer
  ┌─────────────────────────────────────────────┐
  │  AWS         │  GCP         │  Azure        │
  │  /dev/xvd*   │  /dev/sda*   │  /dev/sda*    │
  │  IMDS v2     │  GCP IMDS    │  Azure IMDS   │
  │  cloud-init  │  cloud-init  │  waagent       │
  │  order A     │  order B     │  order C       │
  └─────────────────────────────────────────────┘
         │
         ▼
  Ansible-Lockdown + Provider-Aware Configuration
         │
         ▼
  OpenSCAP Scan
         │
         ▼
  Golden Image (AMI / GCP Image / Azure Image)

The blueprint author declares what should be true about the OS. Stratum’s provider layer handles how that’s achieved on each cloud.

The disk naming, cloud-init sequencing, metadata endpoint configuration, and provider-specific package repositories are all abstracted into the provider layer. They never appear in the blueprint file.


The Same Blueprint Across Six Providers

# Build the same baseline on three clouds
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
stratum build --blueprint ubuntu22-cis-l1.yaml --provider gcp
stratum build --blueprint ubuntu22-cis-l1.yaml --provider azure

# The other three supported providers
stratum build --blueprint ubuntu22-cis-l1.yaml --provider digitalocean
stratum build --blueprint ubuntu22-cis-l1.yaml --provider linode
stratum build --blueprint ubuntu22-cis-l1.yaml --provider proxmox

The blueprint file is identical across all six. The output — AMI, GCP machine image, Azure managed image — is equivalent in terms of security posture. The same 144 CIS L1 controls apply. The same OpenSCAP scan runs. The same grade lands in the image metadata.

If you change the blueprint — add a control, update the Ansible role version, add a custom audit logging configuration — you rebuild all providers from the same source and all images come out consistent.


What the Provider Layer Handles

The provider layer is where the cloud-specific knowledge lives, so the blueprint author doesn’t have to carry it:

Disk naming:

Provider OS disk Ephemeral Data
AWS /dev/xvda /dev/xvdb /dev/xvdc+
GCP /dev/sda /dev/sdb+
Azure /dev/sda /dev/sdb (temp disk) /dev/sdc+
DigitalOcean /dev/vda /dev/vdb+

The CIS controls for separate /tmp and /home partitions reference disk paths that differ across these providers. The provider layer translates the blueprint’s filesystem.tmp declaration into the correct fstab entries for the target cloud.

Cloud-init ordering:

Different providers initialize services in different orders. On AWS, the network is available before cloud-init runs most tasks. On GCP, some network configuration happens after cloud-init starts. On Azure, the waagent handles some configuration that cloud-init handles elsewhere.

The provider layer sequences the hardening steps to run in the correct order for each provider — specifically, it waits for network availability before applying network-level hardening, and ensures the package manager is configured before running Ansible roles that require package installation.

Metadata endpoint configuration:

CIS controls include restrictions on access to the instance metadata service (IMDSv2 enforcement on AWS, equivalent controls on GCP/Azure). The provider layer applies the correct restriction for each cloud — the blueprint just declares compliance: benchmark: cis-l1.


Building for All Providers Simultaneously

For fleet standardization, you can build all providers in a single operation:

# Build for all providers in parallel
stratum build \
  --blueprint ubuntu22-cis-l1.yaml \
  --provider aws,gcp,azure

# Output:
# [aws]   Launching build instance in ap-south-1...
# [gcp]   Launching build instance in asia-south1...
# [azure] Launching build instance in southindia...
# ...
# [aws]   Grade: A (98/100) — ami-0a7f3c9e82d1b4c05
# [gcp]   Grade: A (98/100) — projects/my-project/global/images/ubuntu22-cis-l1-20260419
# [azure] Grade: A (98/100) — /subscriptions/.../images/ubuntu22-cis-l1-20260419

All three builds run in parallel. All three images carry identical compliance grades. The image names embed the date and grade for easy identification.


Blueprint Versioning and Drift Detection

Version-controlling the blueprint file solves a problem that multi-cloud environments hit consistently: knowing what your OS security posture was six months ago.

# Check the current state of a fleet instance against the blueprint
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Compare against original build grade
# Output:
# Instance: i-0abc123 (aws, ap-south-1)
# Original grade (build): A (98/100) — 2026-01-15
# Current grade (scan):   B (89/100) — 2026-04-19
# 
# Drifted controls (9):
#   3.3.2  — TCP SYN cookies: FAIL (sysctl net.ipv4.tcp_syncookies=0)
#   5.3.2  — sudo log_input: FAIL (removed from /etc/sudoers.d/)
#   ...

Drift detection compares the current instance state against the blueprint that built it. Controls that passed at build time and now fail indicate configuration drift — something changed after the image was deployed. This is how you find the three instances that a sysadmin “temporarily” modified and never reverted.


Production Gotchas

Provider-specific CIS controls exist. CIS AWS Foundations Benchmark and CIS GCP Benchmark include cloud-specific controls (VPC flow logs, CloudTrail, etc.) that are separate from the OS-level CIS controls. The blueprint handles OS-level controls. Cloud-level controls (IAM, logging, network configuration) belong in your cloud security posture management tooling.

Build costs vary by provider. On AWS, the build instance is a t3.medium for 15–20 minutes (~$0.02). On GCP and Azure, equivalent pricing applies. For multi-provider builds, run them in regions close to your primary workloads to minimize image transfer time.

Proxmox builds require a local Stratum agent. Unlike cloud providers, Proxmox doesn’t have an API that Stratum can reach from outside. The Proxmox provider requires the Stratum agent running on the Proxmox host. The build process and blueprint format are identical; only the network topology differs.

GCP image sharing across projects requires explicit IAM. GCP machine images aren’t automatically available to other projects in the organization. After building, run stratum image share --provider gcp --image ubuntu22-cis-l1-20260419 --projects

or configure sharing at the organization level.


Key Takeaways

  • Multi-cloud OS hardening with separate scripts per provider creates inevitable drift; a provider-abstracted blueprint eliminates it
  • The same HardeningBlueprint YAML builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox — the compliance intent is in the file, the provider details are in Stratum’s provider layer
  • Parallel multi-provider builds produce images with identical compliance grades on the same schedule
  • Drift detection works fleet-wide: any instance on any provider can be rescanned against the blueprint that built it
  • Blueprint version control is the single source of truth for OS security posture history — what was true on any given date, across any provider

What’s Next

One blueprint, six clouds, identical compliance grades. EP03 showed that the multi-cloud drift problem disappears when provider details are abstracted away from the blueprint.

What neither EP02 nor EP03 answered is the auditor’s question: how do you know the image is actually compliant? “We ran CIS L1” is not an answer. “Grade A, 98/100 controls, SARIF export attached” is.

EP04 covers automated OpenSCAP compliance: the post-build scan in detail — how the A-F grade is calculated, what controls block an A grade, how SARIF exports work, and how drift detection catches what changed after deployment.

Next: automated OpenSCAP compliance — CIS benchmark grading before deployment

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

Kubernetes CRDs in Production: Finalizers, Status Conditions, and RBAC Patterns

Reading Time: 8 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 10
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Finalizers block deletion until cleanup completes — they prevent orphaned external resources but cause stuck objects if the controller crashes mid-cleanup; always implement a removal timeout
  • Status conditions are the standard communication channel between controller and user: use type, status, reason, message, and observedGeneration on every condition; never invent ad-hoc status fields
  • Owner references wire automatic garbage collection — when the parent custom resource is deleted, Kubernetes deletes owned child objects; use them for every object your controller creates in the same namespace
  • RBAC for CRDs in multi-tenant clusters must include separate ClusterRoles for controller, editor, and viewer; grant status and finalizers as separate sub-resources; never give application teams cluster-scoped create/delete on CRDs
  • The three most common Kubernetes CRD production failure modes: finalizer death loop, status thrash, and CRD deletion cascade — all avoidable with the patterns in this episode
  • Running kubectl get crds on a healthy cluster should show Established: True for every CRD; non-Established CRDs silently reject all create requests

The Big Picture

  PRODUCTION CRD LIFECYCLE: FULL PICTURE

  Create         Reconcile        Suspend/Resume      Delete
  ──────         ─────────        ──────────────      ──────
  User applies   Controller       User patches         User deletes
  BackupPolicy   creates CronJob, spec.suspended=true  BackupPolicy
      │          sets status          │                    │
      ▼              │                ▼                    ▼
  Admission      │           Controller          Finalizer blocks
  webhook        │           suspends CronJob     deletion
  (if any)       │                               Controller:
      │          │                                 1. Delete CronJob
      ▼          ▼                                 2. Remove external state
  Schema       Status                              3. Remove finalizer
  validation   conditions                          Object deleted from etcd
      │        updated
      ▼
  Controller
  reconcile
  triggered

Kubernetes CRD production readiness is not just about making the happy path work — it is about designing for the failure modes: controllers crashing mid-operation, deletion races, and status messages that confuse operators at 2am.


Finalizers: Controlled Deletion

A finalizer is a string in metadata.finalizers. Kubernetes will not delete an object that has finalizers, regardless of who issues the delete command.

metadata:
  name: nightly
  namespace: demo
  finalizers:
    - storage.example.com/backup-cleanup  # ← your controller put this here

When kubectl delete bp nightly runs:

  1. API server sets metadata.deletionTimestamp  (does NOT delete yet)
  2. Object is visible as "Terminating"
  3. Controller sees deletionTimestamp set
  4. Controller runs cleanup:
       - delete backup data from S3
       - delete CronJob (or let owner references handle it)
       - release any external locks
  5. Controller removes the finalizer:
       patch bp nightly --type=json \
         -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'
  6. API server sees finalizers list is now empty → deletes the object

Adding a finalizer in Go

const finalizerName = "storage.example.com/backup-cleanup"

func (r *BackupPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    bp := &storagev1alpha1.BackupPolicy{}
    if err := r.Get(ctx, req.NamespacedName, bp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Deletion path
    if !bp.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(bp, finalizerName) {
            if err := r.cleanupExternalResources(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(bp, finalizerName)
            if err := r.Update(ctx, bp); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Normal path: ensure finalizer is present
    if !controllerutil.ContainsFinalizer(bp, finalizerName) {
        controllerutil.AddFinalizer(bp, finalizerName)
        if err := r.Update(ctx, bp); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... rest of reconcile
}

Finalizer death loop and the timeout pattern

If cleanupExternalResources always returns an error (external system down, bug in cleanup code), the object gets stuck in Terminating forever. The operator cannot delete it; kubectl delete --force does not help with finalizers.

Prevention: add a cleanup deadline with status tracking.

func (r *BackupPolicyReconciler) cleanupExternalResources(ctx context.Context, bp *storagev1alpha1.BackupPolicy) error {
    // Check if we've been trying to clean up for too long
    if bp.DeletionTimestamp != nil {
        deadline := bp.DeletionTimestamp.Add(10 * time.Minute)
        if time.Now().After(deadline) {
            // Log the failure, abandon cleanup, let the object be deleted.
            log.FromContext(ctx).Error(nil, "cleanup deadline exceeded, removing finalizer anyway",
                "name", bp.Name)
            return nil   // returning nil removes the finalizer
        }
    }
    // ... actual cleanup
}

Recovery for a stuck object (use only when cleanup truly cannot succeed):

kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]'

Status Conditions: The Right Way

The Kubernetes standard condition format is defined in k8s.io/apimachinery/pkg/apis/meta/v1.Condition:

type Condition struct {
    Type               string          // e.g. "Ready", "Synced", "Degraded"
    Status             ConditionStatus // "True", "False", "Unknown"
    ObservedGeneration int64           // the .metadata.generation this condition reflects
    LastTransitionTime metav1.Time     // when Status last changed
    Reason             string          // machine-readable, CamelCase, e.g. "CronJobCreated"
    Message            string          // human-readable, may contain details
}

Standard condition types

Type Meaning
Ready The resource is fully reconciled and operational
Synced The resource has been synced with an external system
Progressing An operation is actively in progress
Degraded The resource is operating in a reduced capacity

Use Ready: True only when the full reconcile is complete and the resource is functional. Use Ready: False with a clear Message when reconcile fails or is blocked.

Setting conditions in Go

meta.SetStatusCondition(&bpCopy.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    ObservedGeneration: bp.Generation,
    Reason:             "CronJobCreateFailed",
    Message:            fmt.Sprintf("failed to create CronJob: %v", err),
})

meta.SetStatusCondition handles deduplication — it updates an existing condition of the same Type rather than appending a duplicate.

observedGeneration is critical

metadata.generation      = 5   (increments on every spec change)
status.observedGeneration = 3  (set by controller on each reconcile)

If observedGeneration < generation:
  → controller has not yet reconciled the latest spec change
  → status.conditions reflect an older state
  → do NOT alert based on conditions that lag generation

Always set ObservedGeneration: bp.Generation when writing status conditions. Tooling (Argo CD, Flux, kubectl wait) depends on this to know whether status is current.

kubectl wait uses conditions

# Wait until BackupPolicy is Ready
kubectl wait bp/nightly -n demo \
  --for=condition=Ready \
  --timeout=60s

This works because kubectl wait reads the status.conditions array.


Owner References: Automatic Garbage Collection

Owner references wire a parent-child relationship between Kubernetes objects. When the parent is deleted, Kubernetes garbage-collects all owned children automatically.

metadata:
  name: nightly-backup       # CronJob
  ownerReferences:
    - apiVersion: storage.example.com/v1alpha1
      kind: BackupPolicy
      name: nightly
      uid: a1b2c3d4-...
      controller: true          # only one owner can be the controller
      blockOwnerDeletion: true  # the GC waits for this owner before deleting child

Set in Go using ctrl.SetControllerReference:

if err := ctrl.SetControllerReference(bp, cronJob, r.Scheme); err != nil {
    return ctrl.Result{}, err
}

Owner reference rules

  • Owner and owned object must be in the same namespace — cluster-scoped objects cannot own namespaced objects
  • Only one object can be the controller: true owner; others can be non-controller owners
  • Deleting the owner cascades to deleting owned objects — this is garbage collection, not finalizer-based cleanup

Without owner references, deleting a BackupPolicy leaves the CronJob as an orphan. This is hard to detect and accumulates over time.


RBAC Patterns for Multi-Tenant CRD Usage

A production CRD deployment needs three distinct RBAC roles:

# 1. Controller role — full access for the operator
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-controller
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies/finalizers"]
    verbs: ["update"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# 2. Editor role — for application teams (namespaced binding)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-editor
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # No status write — only the controller writes status
  # No finalizers write — prevents deletion blocking by non-controllers
---
# 3. Viewer role — for audit, monitoring
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backuppolicy-viewer
rules:
  - apiGroups: ["storage.example.com"]
    resources: ["backuppolicies"]
    verbs: ["get", "list", "watch"]

Bind editor/viewer roles at namespace scope, not cluster scope:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-backup-editor
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: backuppolicy-editor
  apiGroup: rbac.authorization.k8s.io

This pattern gives team-alpha full control over BackupPolicies in their namespace but no access to other namespaces — standard Kubernetes multi-tenancy.


The Three Production Failure Modes

1. Finalizer death loop

Symptoms: Object stuck in Terminating for hours; kubectl get bp nightly shows DeletionTimestamp set but object exists.

Cause: cleanupExternalResources always returns an error.

Detection:

kubectl get bp nightly -n demo -o jsonpath='{.metadata.deletionTimestamp}'
# non-empty = stuck in termination
kubectl describe bp nightly -n demo
# look for repeated reconcile error events

Fix: Add cleanup deadline in controller; use kubectl patch to remove finalizer as last resort.

2. Status thrash

Symptoms: Controller sets Ready: True, then Ready: False, then Ready: True in a rapid loop. Alert noise, confusing dashboards.

Cause: Each reconcile compares actual state incorrectly due to cache lag — it sees its own status write as a change, re-reconciles, and flips the status again.

Fix: Set ObservedGeneration on every condition. Compare generation with observedGeneration before re-reconciling. Use meta.IsStatusConditionTrue to check current condition before overwriting it with the same value.

// Only update status if it changed
current := meta.FindStatusCondition(bp.Status.Conditions, "Ready")
if current == nil || current.Status != desired.Status || current.Reason != desired.Reason {
    meta.SetStatusCondition(&bpCopy.Status.Conditions, desired)
    r.Status().Update(ctx, bpCopy)
}

3. CRD deletion cascade

Symptoms: A team deletes a CRD for cleanup purposes; all instances across all namespaces disappear silently.

Cause: kubectl delete crd backuppolicies.storage.example.com — the API server cascades the deletion to all custom resources of that type.

Prevention:
– Add a resourcelock annotation on production CRDs managed by your operator
– Use GitOps (Argo CD, Flux) to manage CRD installation — a deleted CRD is automatically re-applied from the Git source
– Back up CRDs and instances with velero or equivalent before any CRD management operations


Production Readiness Checklist

CRD DEFINITION
  □ spec.versions has exactly one storage: true version
  □ Status subresource enabled (subresources.status: {})
  □ additionalPrinterColumns includes Ready column from status.conditions
  □ OpenAPI schema defines required fields and types
  □ CEL rules cover cross-field constraints

CONTROLLER
  □ Owner references set on all child resources
  □ Finalizer logic includes cleanup deadline
  □ Status conditions use standard format with observedGeneration
  □ Reconcile function is idempotent
  □ Not-found errors handled cleanly (return nil, not error)
  □ At least 2 replicas with leader election enabled

RBAC
  □ Three ClusterRoles: controller, editor, viewer
  □ Status and finalizers are separate RBAC sub-resources
  □ Editor/viewer bound at namespace scope, not cluster scope
  □ Controller ServiceAccount has only necessary permissions

OPERATIONS
  □ CRD installed via GitOps or Helm (not manual kubectl apply)
  □ Backup of CRDs and instances included in cluster backup
  □ kubectl get crds shows Established: True for all CRDs
  □ Monitoring for stuck Terminating objects (finalizer deadlock)
  □ Alert on controller reconcile error rate, not just pod health

⚠ Common Mistakes

Granting update on backuppolicies but not backuppolicies/status to the controller. If the controller cannot write status, status updates silently fail. The controller appears to run but status conditions never update. Grant both backuppolicies (for spec/metadata writes) and backuppolicies/status (for the status subresource path).

Setting Ready: True before all owned resources are healthy. If the controller sets Ready: True after creating the CronJob but before verifying the CronJob is actually active, users see a false-positive health signal. Only set Ready: True when you have confirmed the desired state is actually achieved.

Not setting observedGeneration on status conditions. Tools like Argo CD and kubectl wait --for=condition=Ready will report incorrect health status if observedGeneration is stale. Always set ObservedGeneration: obj.Generation in every condition write.

Using kubectl delete crd in a production cluster without a backup. This is irreversible. Treat CRDs as production-critical infrastructure — require GitOps review, backup verification, and team approval before any CRD deletion.


Quick Reference

# Check for stuck Terminating objects
kubectl get backuppolicies -A --field-selector metadata.deletionTimestamp!=''

# Force-remove a stuck finalizer (use only when cleanup is truly impossible)
kubectl patch bp nightly -n demo --type=json \
  -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'

# Check all CRDs are Established
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name} {.status.conditions[?(@.type=="Established")].status}{"\n"}{end}'

# Watch status conditions update during reconcile
kubectl get bp nightly -n demo -w -o \
  jsonpath='{.status.conditions[?(@.type=="Ready")].status} {.status.conditions[?(@.type=="Ready")].message}{"\n"}'

# Verify owner references are set on child CronJob
kubectl get cronjob nightly-backup -n demo \
  -o jsonpath='{.metadata.ownerReferences}'

# List all objects owned by a BackupPolicy (by label)
kubectl get all -n demo -l backuppolicy=nightly

Key Takeaways

  • Finalizers block deletion until cleanup completes — always implement a cleanup deadline to prevent permanent stuck objects
  • Status conditions must use the standard format with observedGeneration — tooling depends on it for correctness
  • Owner references enable automatic garbage collection of child resources when the parent is deleted
  • RBAC needs three roles (controller, editor, viewer) with status and finalizers as separate sub-resources
  • The three production failure modes — finalizer death loop, status thrash, CRD deletion cascade — are all preventable with the patterns covered in this episode

Series Complete

You now have the full picture of Kubernetes CRDs and Operators: from understanding what a CRD is (EP01), through real examples (EP02), schema design (EP03), hands-on YAML (EP04), CEL validation (EP05), the controller loop (EP06), building an operator (EP07), versioning (EP08), admission webhooks (EP09), to production patterns in this episode.

The next series in the Kubernetes learning arc on linuxcent.com covers Kubernetes Networking Deep Dive — Services, Ingress, Gateway API, CNI, and eBPF networking. Subscribe below to get it when it launches.

Stay subscribed → linuxcent.com

Admission Webhooks: Validating and Mutating Requests Before They Reach etcd

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 9
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes admission webhooks are HTTPS endpoints called by the API server synchronously on every create/update/delete — before the object reaches etcd
    (two types: mutating webhooks modify the object; validating webhooks approve or reject it — mutating runs first, then validating)
  • Use a validating webhook when you need to reject objects based on state you cannot express in CEL: checking if a referenced Secret exists, enforcing cross-resource quotas, consulting an external policy engine
  • Use a mutating webhook when you need to inject defaults or sidecar containers that depend on context you cannot express in the CRD schema (environment-specific defaults, sidecar injection)
  • Admission webhooks are an availability dependency — if your webhook is unreachable, the API requests it covers will fail. failurePolicy: Ignore is the safety valve; use it only for non-critical webhooks
  • OPA/Gatekeeper and Kyverno are admission webhook platforms — they let you write policy as code (Rego, YAML) instead of writing Go webhook handlers
  • For CRD-specific validation that only depends on the object itself, prefer CEL (EP05) — webhooks are for rules that require external lookups or cross-resource checks

The Big Picture

  KUBERNETES ADMISSION CHAIN (full picture)

  kubectl apply -f backuppolicy.yaml
        │
        ▼
  API Server: authentication + authorization
        │
        ▼
  1. Mutating admission webhooks
     ┌───────────────────────────────────────┐
     │ Receive object, return modified object │
     │ Examples: inject annotations,          │
     │ set defaults, add sidecars            │
     └───────────────────────────────────────┘
        │
        ▼
  2. Schema validation (OpenAPI + CEL)
        │
        ▼
  3. Validating admission webhooks
     ┌───────────────────────────────────────┐
     │ Receive object, return allow/deny     │
     │ Examples: quota checks, cross-        │
     │ resource validation, policy engines   │
     └───────────────────────────────────────┘
        │
        ▼ (allowed)
  etcd storage

Kubernetes admission webhooks are how tools like Istio inject sidecars, Kyverno enforces policies, and OPA/Gatekeeper applies organizational guardrails — all without modifying Kubernetes source code. Understanding them completes the picture of how Kubernetes is extended beyond CRDs.


Validating vs Mutating: When to Use Each

  DECISION TREE: CEL vs Validating Webhook vs Mutating Webhook

  "I need to validate a field value"
      │
      ├── Depends only on the object being submitted?
      │   → Use CEL (x-kubernetes-validations) — EP05
      │
      └── Needs to look up another resource, quota, or external system?
          → Use Validating Admission Webhook

  "I need to set default values or inject content"
      │
      ├── Defaults depend only on other fields in the same object?
      │   → Use OpenAPI schema defaults or CEL
      │
      └── Defaults depend on environment, namespace labels, or external config?
          → Use Mutating Admission Webhook

Practical examples:

Rule Right tool
retentionDays must be ≤ 365 CEL
if storageClass=premium then retentionDays ≤ 90 CEL
Referenced SecretStore must exist in the same namespace Validating webhook
BackupPolicy count per namespace must not exceed team quota Validating webhook
Inject costCenter annotation from namespace labels Mutating webhook
Inject backup-agent sidecar into all Pods in labeled namespaces Mutating webhook
Enforce that all BackupPolicies have a team label Kyverno or OPA policy

The Webhook Request/Response Contract

Both webhook types receive an AdmissionReview object and return an AdmissionReview response.

Request (from API server to webhook):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "kind": {"group": "storage.example.com", "version": "v1alpha1", "kind": "BackupPolicy"},
    "resource": {"group": "storage.example.com", "version": "v1alpha1", "resource": "backuppolicies"},
    "operation": "CREATE",
    "userInfo": {"username": "alice", "groups": ["system:authenticated"]},
    "object": { /* full BackupPolicy JSON */ },
    "oldObject": null
  }
}

Response for a validating webhook (allow):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "allowed": true
  }
}

Response for a validating webhook (deny):

{
  "response": {
    "uid": "...",
    "allowed": false,
    "status": {
      "code": 422,
      "message": "referenced SecretStore 'aws-secrets-manager' not found in namespace 'production'"
    }
  }
}

Response for a mutating webhook (allow + patch):

{
  "response": {
    "uid": "...",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiJhZGQiLCJwYXRoIjoiL21ldGFkYXRhL2Fubm90YXRpb25zL2Nvc3RDZW50ZXIiLCJ2YWx1ZSI6ImVuZ2luZWVyaW5nIn1d"
    // base64-encoded JSON patch:
    // [{"op":"add","path":"/metadata/annotations/costCenter","value":"engineering"}]
  }
}

Writing a Validating Webhook with kubebuilder

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --programmatic-validation

Edit api/v1alpha1/backuppolicy_webhook.go:

package v1alpha1

import (
    "context"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/webhook/admission"
    esov1beta1 "github.com/external-secrets/external-secrets/apis/externalsecrets/v1beta1"
)

type BackupPolicyCustomValidator struct {
    Client client.Client
}

//+kubebuilder:webhook:path=/validate-storage-example-com-v1alpha1-backuppolicy,mutating=false,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create;update,versions=v1alpha1,name=vbackuppolicy.kb.io,admissionReviewVersions=v1

func (v *BackupPolicyCustomValidator) SetupWebhookWithManager(mgr ctrl.Manager) error {
    v.Client = mgr.GetClient()
    return ctrl.NewWebhookManagedBy(mgr).
        For(&BackupPolicy{}).
        WithValidator(v).
        Complete()
}

// ValidateCreate validates a new BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    bp := obj.(*BackupPolicy)
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateUpdate validates an updated BackupPolicy.
func (v *BackupPolicyCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
    bp := newObj.(*BackupPolicy)
    return nil, v.validateSecretStoreRef(ctx, bp)
}

// ValidateDelete is a no-op here.
func (v *BackupPolicyCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
    return nil, nil
}

// validateSecretStoreRef checks that the referenced SecretStore exists in the same namespace.
func (v *BackupPolicyCustomValidator) validateSecretStoreRef(ctx context.Context, bp *BackupPolicy) error {
    ref := bp.Spec.SecretStoreRef
    if ref == "" {
        return nil  // optional field; CEL handles it if required
    }

    store := &esov1beta1.SecretStore{}
    err := v.Client.Get(ctx, types.NamespacedName{Name: ref, Namespace: bp.Namespace}, store)
    if apierrors.IsNotFound(err) {
        return fmt.Errorf("referenced SecretStore %q not found in namespace %q",
            ref, bp.Namespace)
    }
    return err  // nil on found, real error on API failure
}

Writing a Mutating Webhook: Cost Center Injection

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --defaulting

Edit the defaulting webhook:

//+kubebuilder:webhook:path=/mutate-storage-example-com-v1alpha1-backuppolicy,mutating=true,failurePolicy=fail,sideEffects=None,groups=storage.example.com,resources=backuppolicies,verbs=create,versions=v1alpha1,name=mbackuppolicy.kb.io,admissionReviewVersions=v1

func (r *BackupPolicy) Default() {
    // Default is called by kubebuilder's webhook framework on admission.
    // The webhook handler calls this and patches the object.
    //
    // This runs AFTER API server schema defaults — use it for context-dependent defaults.
}

// For namespace-label-based injection, implement the full webhook handler instead:
type BackupPolicyMutator struct {
    Client client.Client
}

func (m *BackupPolicyMutator) Handle(ctx context.Context, req admission.Request) admission.Response {
    bp := &BackupPolicy{}
    if err := json.Unmarshal(req.Object.Raw, bp); err != nil {
        return admission.Errored(http.StatusBadRequest, err)
    }

    // Fetch the namespace to read its labels
    ns := &corev1.Namespace{}
    if err := m.Client.Get(ctx, types.NamespacedName{Name: bp.Namespace}, ns); err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }

    // Inject costCenter annotation from namespace label
    if costCenter, ok := ns.Labels["billing/cost-center"]; ok {
        if bp.Annotations == nil {
            bp.Annotations = make(map[string]string)
        }
        bp.Annotations["billing/cost-center"] = costCenter
    }

    marshaled, err := json.Marshal(bp)
    if err != nil {
        return admission.Errored(http.StatusInternalServerError, err)
    }
    return admission.PatchResponseFromRaw(req.Object.Raw, marshaled)
}

The WebhookConfiguration Resource

The ValidatingWebhookConfiguration tells the API server which webhooks exist and which resources/operations they handle:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: backup-operator-validating-webhook
  annotations:
    cert-manager.io/inject-ca-from: backup-operator-system/backup-operator-serving-cert
webhooks:
  - name: vbackuppolicy.kb.io
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: backup-operator-webhook-service
        namespace: backup-operator-system
        path: /validate-storage-example-com-v1alpha1-backuppolicy
    rules:
      - apiGroups:   ["storage.example.com"]
        apiVersions: ["v1alpha1"]
        operations:  ["CREATE", "UPDATE"]
        resources:   ["backuppolicies"]
    failurePolicy: Fail          # Fail = reject request if webhook unreachable
    sideEffects: None
    timeoutSeconds: 10
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]  # never webhook kube-system objects

failurePolicy: Fail vs Ignore

  failurePolicy: Fail (default)
  ──────────────────────────────
  If webhook is unreachable → API request fails with 500
  Use when: the validation is critical (quota enforcement, policy)
  Risk: your webhook becoming unavailable breaks all covered API operations

  failurePolicy: Ignore
  ──────────────────────────────
  If webhook is unreachable → API request proceeds as if webhook allowed it
  Use when: the webhook is advisory or can be bypassed safely
  Risk: policy is silently not enforced during webhook outage

For production operators, use failurePolicy: Fail but ensure high availability:
– Run at least 2 webhook pod replicas with PodDisruptionBudget
– Use cert-manager for automatic TLS certificate rotation
– Set timeoutSeconds to a value that allows graceful degradation (5–10s)
– Exclude system namespaces with namespaceSelector


OPA/Gatekeeper and Kyverno: Webhooks as Policy Platforms

Writing raw webhook handlers in Go is powerful but heavyweight for policy enforcement. OPA/Gatekeeper and Kyverno are webhook-based policy engines that let you express policies as code:

Kyverno (YAML-based policies):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-backup-label
spec:
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["BackupPolicy"]
      validate:
        message: "BackupPolicy must have a 'team' label"
        pattern:
          metadata:
            labels:
              team: "?*"

OPA/Gatekeeper (Rego-based policies):

package backuppolicy

deny[msg] {
    input.request.kind.kind == "BackupPolicy"
    not input.request.object.metadata.labels["team"]
    msg := "BackupPolicy must have a 'team' label"
}

Both run as admission webhooks that the API server calls. The policy language sits on top of the webhook plumbing. For organizational policy enforcement across many resource types, these tools outperform custom Go webhook handlers.


⚠ Common Mistakes

Webhook covering * resources or * operations. A webhook covering all resources in the cluster is a reliability risk — a bug in the webhook or an outage breaks everything. Scope webhooks to exactly the resources and operations they need with rules[].resources and rules[].operations.

No TLS certificate rotation. Webhook endpoints require a TLS certificate that the API server trusts. Certificates expire. Using cert-manager with the cert-manager.io/inject-ca-from annotation automates this. Without it, expired certificates cause silent webhook outages (the API server rejects the TLS handshake, triggering failurePolicy behavior).

Not excluding system namespaces. If a validating webhook covers Pods and has failurePolicy: Fail, and the webhook pod itself crashes, the API server cannot create a new webhook pod because the webhook rejects the creation. Use namespaceSelector to exclude kube-system and your operator’s own namespace.

Treating webhook latency as free. Every API operation covered by a webhook adds a synchronous HTTP round-trip. On a busy cluster creating thousands of objects per minute, a 100ms webhook latency becomes significant. Set timeoutSeconds, profile webhook performance, and scope rules narrowly.


Quick Reference

# List all webhook configurations
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

# Inspect webhook rules and failure policy
kubectl describe validatingwebhookconfiguration backup-operator-validating-webhook

# Temporarily disable a webhook for debugging (dangerous in production)
kubectl delete validatingwebhookconfiguration backup-operator-validating-webhook

# Check webhook endpoint certificate
kubectl get secret backup-operator-webhook-server-cert \
  -n backup-operator-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

# Test webhook is reachable from a cluster node
kubectl run webhook-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -k https://backup-operator-webhook-service.backup-operator-system.svc:443/healthz

Key Takeaways

  • Mutating webhooks modify objects at admission; validating webhooks approve or reject them — mutating runs before validating
  • Use CEL for rules that depend only on the submitted object; use webhooks when you need external lookups or cross-resource checks
  • failurePolicy: Fail blocks API requests if the webhook is unreachable — ensure high availability before using it
  • Always exclude system namespaces and scope rules to specific resource types to minimize the blast radius of webhook failures
  • OPA/Gatekeeper and Kyverno are admission webhook platforms for policy-as-code — prefer them over custom Go handlers for organizational policy enforcement

What’s Next

EP10: Kubernetes CRDs in Production ties the full series together — finalizer design patterns, status condition conventions, owner references, RBAC for multi-tenant CRD usage, and the production failure modes that catch teams off guard.

Get EP10 in your inbox when it publishes → subscribe at linuxcent.com

Kubernetes CRD Versioning: From v1alpha1 to v1 Without Breaking Clients

Reading Time: 6 minutes

Kubernetes CRDs & Operators: Extending the API, Episode 8
What Is a CRD? · CRDs You Already Use · CRD Anatomy · Write Your First CRD · CEL Validation · Controller Loop · Build an Operator · CRD Versioning · Admission Webhooks · CRDs in Production


TL;DR

  • Kubernetes CRD versioning lets you evolve your API from v1alpha1 to v1 without deleting existing custom resources or breaking clients still using the old version
    (storage version = the version etcd actually stores objects in; served versions = the versions the API server responds to; you can serve v1alpha1 and v1 simultaneously while migrating)
  • The hub-and-spoke model is the recommended conversion architecture: one “hub” version (usually v1) that every other version converts to/from
  • Without a conversion webhook, the API server only allows one served version at a time — you must use a webhook to serve multiple versions with schema differences
  • kubectl storage-version-migrator (or manual re-apply) migrates existing objects from the old storage version to the new one after you update storage: true
  • Changing field names between versions without a conversion webhook corrupts data silently — always test conversion round-trips before promoting a version

The Big Picture

  CRD VERSION LIFECYCLE

  Stage 1: Alpha                 Stage 2: Beta              Stage 3: Stable
  ──────────────────             ──────────────             ──────────────
  v1alpha1                       v1alpha1 (deprecated)      v1alpha1 (removed)
    served: true                   served: true               served: false
    storage: true                  storage: false             storage: false
                                 v1beta1                    v1beta1 (deprecated)
                                   served: true               served: true
                                   storage: false             storage: false
                                 v1                         v1
                                   served: true               served: true
                                   storage: true              storage: true

  Clients using v1alpha1:         The API server converts     Eventually remove
  still work via conversion       on the fly                  old served versions
  webhook

Kubernetes CRD versioning is what allows you to ship BackupPolicy v1alpha1 today, learn from real usage, evolve the schema to v1 with renamed fields and new constraints, and keep existing clusters running without a migration window.


Why Versioning Is Necessary

When BackupPolicy v1alpha1 shipped, the spec used retentionDays. After six months of production use, the team learns:

  • retentionDays should be renamed to retention.days (nested under a retention object for future extensibility)
  • A new required field backupFormat needs to be added with a default of tar.gz
  • The targets field should be renamed to includedNamespaces

These are breaking changes. Clients (GitOps repos, Helm charts, other operators) still have YAML referencing v1alpha1 with the old field names. You cannot simply rename the fields.

The solution: add v1 with the new schema, run both versions simultaneously via a conversion webhook, migrate objects to the new storage version, then deprecate v1alpha1.


Simple Case: Non-Breaking Addition (No Webhook Needed)

If you only add new optional fields to the schema — no renames, no removals — you can add a new version without a conversion webhook, as long as only one version is served at a time.

versions:
  - name: v1alpha1
    served: false      # stop serving old version
    storage: false
    schema: ...
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              schedule:
                type: string
              retentionDays:
                type: integer
              backupFormat:          # new optional field
                type: string
                default: "tar.gz"

Existing objects stored as v1alpha1 are served as v1 with the new field defaulted. This works for purely additive changes because the stored bytes are compatible with the new schema.

When this is not enough: field renames, type changes, field removal, or structural reorganization all require a conversion webhook.


The Hub-and-Spoke Model

For breaking schema changes, the API server needs a conversion webhook. The recommended architecture is hub-and-spoke:

  HUB-AND-SPOKE CONVERSION

       v1alpha1
          │
          ▼ convert to hub
         v1  (hub)
          ▲
          │ convert to hub
       v1beta1

  Every version converts TO the hub and FROM the hub.
  The hub is always the storage version.
  Two-version conversion: v1alpha1 → v1 → v1beta1
  Never directly: v1alpha1 → v1beta1

This means you only write N conversion functions (one per version) rather than N² (one per version pair). As you add versions, the conversion complexity grows linearly.


Writing a Conversion Webhook

The conversion webhook is an HTTPS endpoint that the API server calls when it needs to convert an object between versions.

1. Define the conversion hub

In the kubebuilder project, mark v1 as the hub:

In api/v1/backuppolicy_conversion.go:

package v1

// Hub marks this type as the conversion hub.
func (*BackupPolicy) Hub() {}

2. Implement conversion in v1alpha1

In api/v1alpha1/backuppolicy_conversion.go:

package v1alpha1

import (
    "fmt"
    v1 "github.com/example/backup-operator/api/v1"
    "sigs.k8s.io/controller-runtime/pkg/conversion"
)

// ConvertTo converts v1alpha1 BackupPolicy to v1 (the hub).
func (src *BackupPolicy) ConvertTo(dstRaw conversion.Hub) error {
    dst := dstRaw.(*v1.BackupPolicy)

    // Metadata
    dst.ObjectMeta = src.ObjectMeta

    // Field mapping: v1alpha1 → v1
    dst.Spec.Schedule      = src.Spec.Schedule
    dst.Spec.BackupFormat  = "tar.gz"           // new field: default for old objects
    dst.Spec.StorageClass  = src.Spec.StorageClass
    dst.Spec.Suspended     = src.Spec.Suspended

    // Renamed field: retentionDays → retention.days
    dst.Spec.Retention = v1.RetentionSpec{
        Days: src.Spec.RetentionDays,
    }

    // Renamed field: targets → includedNamespaces
    for _, t := range src.Spec.Targets {
        dst.Spec.IncludedNamespaces = append(dst.Spec.IncludedNamespaces,
            v1.NamespaceTarget{
                Namespace:      t.Namespace,
                IncludeSecrets: t.IncludeSecrets,
            })
    }

    dst.Status = v1.BackupPolicyStatus(src.Status)
    return nil
}

// ConvertFrom converts v1 (hub) BackupPolicy back to v1alpha1.
func (dst *BackupPolicy) ConvertFrom(srcRaw conversion.Hub) error {
    src := srcRaw.(*v1.BackupPolicy)

    dst.ObjectMeta = src.ObjectMeta

    dst.Spec.Schedule      = src.Spec.Schedule
    dst.Spec.StorageClass  = src.Spec.StorageClass
    dst.Spec.Suspended     = src.Spec.Suspended
    dst.Spec.RetentionDays = src.Spec.Retention.Days

    for _, n := range src.Spec.IncludedNamespaces {
        dst.Spec.Targets = append(dst.Spec.Targets, BackupTarget{
            Namespace:      n.Namespace,
            IncludeSecrets: n.IncludeSecrets,
        })
    }

    // backupFormat cannot be round-tripped to v1alpha1 (no such field)
    // Store it in an annotation to preserve the value if the object is
    // re-converted back to v1.
    if src.Spec.BackupFormat != "" && src.Spec.BackupFormat != "tar.gz" {
        if dst.Annotations == nil {
            dst.Annotations = make(map[string]string)
        }
        dst.Annotations["storage.example.com/backup-format"] = src.Spec.BackupFormat
    }

    dst.Status = BackupPolicyStatus(src.Status)
    return nil
}

3. Register the webhook

kubebuilder create webhook \
  --group storage \
  --version v1alpha1 \
  --kind BackupPolicy \
  --conversion

This generates the webhook server setup. Deploy with a TLS certificate (cert-manager can manage this automatically via the kubebuilder //+kubebuilder:webhook:... marker).


Updating the CRD to Reference the Webhook

spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: backup-operator-webhook-service
          namespace: backup-operator-system
          path: /convert
      conversionReviewVersions: ["v1", "v1beta1"]
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema: ...
    - name: v1
      served: true
      storage: true
      schema: ...

Once applied, kubectl get backuppolicies.v1alpha1.storage.example.com/nightly and kubectl get backuppolicies.v1.storage.example.com/nightly both work — the API server converts transparently.


Migrating Existing Objects to the New Storage Version

After changing storage: true from v1alpha1 to v1, existing objects in etcd are still stored as v1alpha1 bytes. They are served correctly (via conversion) but are not yet migrated.

Migrate them:

# Option 1: Manual re-apply (works for small object counts)
kubectl get backuppolicies -A -o name | while read name; do
  kubectl apply -f <(kubectl get $name -o yaml)
done

# Option 2: Storage Version Migrator (automated, for large clusters)
# Install: https://github.com/kubernetes-sigs/kube-storage-version-migrator
kubectl apply -f storageVersionMigration.yaml

After migration, all objects in etcd are stored as v1. You can then set v1alpha1 served: false to stop serving the old version.


Storage Version Migration Checklist

  SAFE VERSION PROMOTION CHECKLIST

  □ New version (v1) has served: true, storage: true
  □ Old version (v1alpha1) has served: true, storage: false
  □ Conversion webhook deployed and healthy
  □ Round-trip conversion tested (v1alpha1 → v1 → v1alpha1 preserves all data)
  □ kubectl get backuppolicies works at both versions
  □ Existing objects migrated (re-applied or migration job run)
  □ Old version set to served: false (stop serving)
  □ Old version removed from CRD after N release cycles

⚠ Common Mistakes

Changing the storage version without a conversion webhook. If you flip storage: true from v1alpha1 to v1 while still serving v1alpha1, the API server tries to read stored v1alpha1 bytes as v1 and fails. Always deploy the conversion webhook before changing the storage version.

Lossy conversion. If ConvertFrom (v1 → v1alpha1) drops a field that exists in v1, objects are silently corrupted when a v1alpha1 client reads and re-saves them. Round-trip test every conversion: original → hub → original must produce identical objects (or use annotations to preserve fields that cannot round-trip).

Forgetting to migrate existing objects. After changing the storage version, existing objects are still stored in the old format. They convert on read, but etcd still holds old bytes. Until migrated, your etcd backup/restore story is broken — restoring from backup would restore old-format bytes that need conversion.


Quick Reference

# Check which version is currently the storage version
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.status.storedVersions}'
# output: ["v1alpha1"]  or  ["v1alpha1","v1"]  or  ["v1"]

# Verify conversion webhook is reachable
kubectl get crd backuppolicies.storage.example.com \
  -o jsonpath='{.spec.conversion.webhook.clientConfig}'

# Read an object at a specific version
kubectl get backuppolicies.v1alpha1.storage.example.com/nightly -n demo -o yaml
kubectl get backuppolicies.v1.storage.example.com/nightly -n demo -o yaml

# Check CRD conditions (NamesAccepted, Established)
kubectl describe crd backuppolicies.storage.example.com | grep -A5 Conditions

Key Takeaways

  • CRD versioning lets you evolve the schema without a migration window — old and new versions coexist via a conversion webhook
  • The hub-and-spoke model minimizes conversion code: N functions, not N² — the hub version is always the storage version
  • Never change the storage version without a deployed conversion webhook for breaking schema changes
  • Conversion must be lossless — fields that cannot round-trip should be preserved in annotations
  • Migrate existing objects to the new storage version after promoting it, then deprecate the old served version

What’s Next

EP09: Admission Webhooks completes the Kubernetes extension picture — validating and mutating webhooks that intercept API requests before they reach etcd, when to use them alongside CRDs, and how they differ from CEL validation.

Get EP09 in your inbox when it publishes → subscribe at linuxcent.com