Zero Trust Identity: SPIFFE, SPIRE, mTLS, and Continuous Verification

Reading Time: 7 minutes

The Identity Stack, Episode 13


TL;DR

  • Zero Trust means “never trust, always verify” — identity is verified continuously, not just at login time; network location provides no implicit trust
  • Human identity (users) and workload identity (services, pods, jobs) are separate problems — LDAP/Kerberos/OIDC solve the human side; SPIFFE/SPIRE solve the workload side
  • SPIFFE (Secure Production Identity Framework For Everyone) defines a standard for workload identity — a SPIFFE ID is a URI like spiffe://corp.com/ns/prod/sa/payments-svc
  • SPIRE (SPIFFE Runtime Environment) issues short-lived X.509 SVIDs (SPIFFE Verifiable Identity Documents) to workloads — certificates that rotate automatically, every hour
  • mTLS (mutual TLS) is how workloads prove identity to each other — both sides present certificates; no passwords, no API keys
  • The evolution: /etc/passwd (1970) → NIS → LDAP → Kerberos → SAML → OIDC → SPIFFE/SPIRE — the problem has always been the same; the trust boundary keeps moving outward

The Big Picture: From /etc/passwd to Zero Trust

1970s  /etc/passwd              ← trust: the local machine
       One machine, one user list

1984   NIS / Yellow Pages       ← trust: the local network
       Centralized, but cleartext, flat

1988   Kerberos                 ← trust: the KDC
       Tickets instead of passwords, network-wide

1993   LDAP                     ← trust: the directory server
       Hierarchical, scalable, encrypted (eventually)

2002   SAML                     ← trust: the IdP assertion
       Identity crosses the internet

2014   OIDC / OAuth2            ← trust: the JWT signature
       API-native, mobile-native, developer-native

2017   SPIFFE / SPIRE           ← trust: the workload certificate
       Automated identity for services, not humans

2026   Zero Trust               ← trust: nothing, verify everything
       Continuous verification, short-lived credentials,
       device posture, behavioral signals

EP01 of this series started with the chaos of per-machine /etc/passwd. This episode — EP13 — closes the loop: from that chaos to a model where identity is verified continuously, credentials expire in hours not years, and the network provides no implicit trust.


The Assumption That Zero Trust Rejects

Traditional security assumed: if you’re on the internal network, you’re trusted. A VPN user was treated as equivalent to someone at a desk in the office. A service running on the same Kubernetes node as another service was implicitly trusted.

That assumption broke in practice:

  • Compromised VPN credentials gave attackers full internal access
  • Lateral movement after initial compromise was easy — once inside, everything trusted you
  • Perimeter-based security had no visibility into east-west traffic (service-to-service)

Zero Trust inverts the model: the network provides no trust. Every access request is verified — user or service, internal or external, first request or hundredth. Trust is dynamic, contextual, and short-lived.


Human Zero Trust: Continuous Verification

For human users, Zero Trust extends OIDC and Conditional Access:

Short-lived tokens. Access tokens typically expire in about an hour (the lifetime is set by the IdP, not by the OIDC specification; Entra ID defaults to roughly 60 to 90 minutes). Refresh tokens are revocable. A user who is terminated can have their refresh tokens revoked in Entra ID, so the next time their app tries to use the refresh token, it fails. The blast radius of a stolen access token is bounded by its remaining lifetime.

Device posture. The device the user authenticates from is part of the identity assertion. Conditional Access can require: device is managed (Intune-enrolled), device is compliant (no malware, full-disk encryption enabled, OS patched). A valid user credential from an unmanaged device is denied.

Behavioral signals. Entra ID Identity Protection and similar systems analyze login patterns — unusual location, impossible travel (login from Mumbai, then New York 5 minutes later), unfamiliar device. High-risk sign-ins trigger step-up authentication or are blocked automatically.
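
The impossible-travel signal can be sketched as a speed check between two consecutive sign-ins. This is a minimal illustration, not how Entra ID Identity Protection actually scores risk; the 1000 km/h threshold and the tuple layout are assumptions made for the example.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumed threshold, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_kmh=MAX_PLAUSIBLE_KMH):
    """prev and curr are (lat, lon, datetime) tuples for consecutive sign-ins."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600
    if hours <= 0:
        # Same instant (or out-of-order events): any distance is suspicious
        return distance > 0
    return distance / hours > max_kmh


mumbai = (19.08, 72.88, datetime(2026, 4, 27, 1, 0, tzinfo=timezone.utc))
new_york = (40.71, -74.01, datetime(2026, 4, 27, 1, 5, tzinfo=timezone.utc))
print(is_impossible_travel(mumbai, new_york))  # True
```

A real system would combine this with other signals (VPN egress points, known devices) before blocking, since IP geolocation alone is noisy.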

Privileged Access Management (PAM). For privileged operations (production shell access, AD admin), Zero Trust adds time-bounded just-in-time access:

Request:  "I need admin access to db01.corp.com for 2 hours to investigate an incident"
Approval: Manager approves via Slack/email/ticketing system
Grant:    Temporary role assignment or password checkout from the PAM vault
Access:   User SSHes with a one-time or time-limited credential
Expire:   Credential automatically revoked after 2 hours
Audit:    Full session recording available for review

CyberArk, BeyondTrust, and HashiCorp Vault implement this model. Vault’s SSH Secrets Engine issues short-lived SSH certificates:

# Request a signed SSH certificate (valid 30 minutes)
vault ssh \
  -role=prod-admin \
  -mode=ca \
  -mount-point=ssh-client-signer \
  [email protected]

# Vault issues a certificate signed by the server's trusted CA
# sshd on db01 trusts that CA — no authorized_keys needed
# Certificate expires in 30 minutes — no cleanup required

Workload Identity: The Non-Human Problem

Services don’t have passwords they can type. A microservice calling another microservice needs to prove its identity — but you can’t give a Kubernetes pod a static API key (it’ll be in a config file, in a git repo, or in a crash dump within 6 months).

Workload identity solves this with short-lived, automatically rotated certificates — the service’s identity is its certificate, issued by a trusted CA, expiring in minutes to hours.

Traditional:                     Zero Trust:
  payments-svc → orders-svc        payments-svc → orders-svc
  Authentication: API key           Authentication: mTLS (X.509 cert)
  "Bearer sk_live_abc123"           cert: spiffe://corp.com/ns/prod/sa/payments-svc
  Rotation: manual (rarely done)    Rotation: automatic, every hour
  Revocation: change the key        Revocation: cert expires; new cert issued
  Audit: "API key was used"         Audit: "spiffe://payments-svc → spiffe://orders-svc"

SPIFFE: The Standard

SPIFFE (Secure Production Identity Framework For Everyone) defines what a workload identity looks like. The core concept is the SPIFFE ID — a URI in the format:

spiffe://<trust-domain>/<workload-path>

Examples:
  spiffe://corp.com/ns/prod/sa/payments-svc
  spiffe://corp.com/region/us-east/service/auth-api
  spiffe://corp.com/k8s/cluster-prod/namespace/payments/pod/payments-svc-abc123

The trust domain (corp.com) is the organizational boundary. The path is the workload identifier — typically encoding namespace, service account, or cluster information.
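
To make the format concrete, here is a small Python sketch that splits a SPIFFE ID into trust domain and path and rejects obviously malformed values. It covers only the basic rules (spiffe scheme, lowercase trust domain without port or userinfo, non-empty path segments); the function name is made up for this example, and production code would normally rely on an official SPIFFE library instead.

```python
import re
from urllib.parse import urlsplit

_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
_PATH_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a workload SPIFFE ID into (trust_domain, path) or raise ValueError."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be 'spiffe'")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    # The regex also rejects userinfo ('@') and ports (':') in the authority
    if not _TRUST_DOMAIN.fullmatch(parts.netloc):
        raise ValueError("invalid trust domain")
    segments = parts.path.split("/")[1:]
    if not segments:
        raise ValueError("workload path is required")
    for segment in segments:
        if segment in (".", "..") or not _PATH_SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return parts.netloc, parts.path


print(parse_spiffe_id("spiffe://corp.com/ns/prod/sa/payments-svc"))
```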

A SPIFFE ID is embedded in an SVID (SPIFFE Verifiable Identity Document) — either an X.509 certificate (X.509-SVID) or a JWT (JWT-SVID). The X.509-SVID is the standard form: the SPIFFE ID appears in the certificate’s Subject Alternative Name (SAN) field.

X.509 Certificate (SVID):
  Subject: CN=payments-svc
  SAN: URI=spiffe://corp.com/ns/prod/sa/payments-svc
  Validity: 1 hour
  Issuer: SPIRE Intermediate CA
  Chains to: a CA in the corp.com trust bundle

Any service that has the corp.com trust bundle (the CA certificate chain) can verify that a certificate with spiffe://corp.com/... in the SAN was issued by the authorized CA for that trust domain.


SPIRE: The Runtime

SPIRE (SPIFFE Runtime Environment) is the reference implementation that issues SVIDs to workloads.

SPIRE Server
  ├── Node attestation: verifies the identity of the node/VM running each agent
  │   (AWS instance identity document, GCP instance identity token, Kubernetes service account token)
  └── Signs X.509 SVIDs for registered workloads (short-lived, auto-rotated)
         │
         ▼
SPIRE Agent (runs on every node)
  └── Workload attestation: verifies the identity of the calling process
      (Kubernetes SA, Unix UID/GID, Docker container labels)
         │
         │ SPIFFE Workload API (Unix socket)
         ▼
Workload (your service)
  → gets its own certificate
  → gets the trust bundle (CA certs of trusted domains)
  → uses cert for mTLS with other services
The workload fetches its identity via the Workload API socket, so no long-lived secret has to be baked into environment variables or mounted files. The SPIRE Agent streams new certificates before the old ones expire. Rotation is transparent to the workload.

# On a node with SPIRE Agent running:
# Fetch the SVID for the current workload
spire-agent api fetch x509 \
  -socketPath /run/spire/sockets/agent.sock

# Output shows:
# SPIFFE ID: spiffe://corp.com/ns/prod/sa/payments-svc
# Certificate: (PEM)
# Trust bundle: (PEM of issuing CA chain)
# Expires: 2026-04-27T02:00:00Z (1 hour from now)
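
Rotating "before the old ones expire" boils down to a timing rule. The sketch below renews once a configurable fraction of the certificate lifetime has elapsed. The half-lifetime default is an assumption chosen for illustration, not a statement of SPIRE's exact renewal threshold, and should_rotate is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def should_rotate(not_before, not_after, now, fraction=0.5):
    """Return True once `fraction` of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


issued = datetime(2026, 4, 27, 1, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)

# 20 minutes in: still inside the first half of the lifetime
print(should_rotate(issued, expires, issued + timedelta(minutes=20)))
# 30 minutes in: half the lifetime is gone, so a new SVID should be requested
print(should_rotate(issued, expires, issued + timedelta(minutes=30)))
```

Renewing well before expiry leaves room for retries if the issuing CA is briefly unreachable.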

mTLS: Both Sides Show ID

Mutual TLS (mTLS) is what makes SPIFFE useful operationally. In standard TLS, only the server presents a certificate — the client just verifies it. In mTLS, both sides present certificates. Both sides verify the other’s certificate against the trust bundle.

payments-svc → orders-svc connection:

TLS handshake:
  payments-svc presents: spiffe://corp.com/ns/prod/sa/payments-svc cert
  orders-svc presents:   spiffe://corp.com/ns/prod/sa/orders-svc cert

  Both verify:
    • cert signed by trusted CA (the corp.com SPIRE CA)
    • cert not expired
    • SPIFFE ID in SAN matches what's expected

  After handshake: encrypted channel, both sides verified
  Authorization: orders-svc checks its policy:
    "is spiffe://corp.com/ns/prod/sa/payments-svc allowed to call /api/orders?"

Service meshes (Istio, Linkerd, Consul Connect) implement mTLS transparently — the application doesn’t handle certificates; the sidecar proxy does. In Istio’s case, Citadel (now istiod) acts as the SPIFFE-compatible CA, issuing certificates to envoy sidecars. The application code doesn’t change.
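
Without a mesh, the post-handshake check is the application’s job. The sketch below uses Python’s standard ssl module: one function builds a server context that requires a client certificate, and another pulls the SPIFFE ID out of the dictionary returned by SSLSocket.getpeercert(). The file paths passed to the context builder and the ALLOWED_CALLERS set are hypothetical; in practice the certificate material would come from the Workload API and the policy from something like OPA.

```python
import ssl

# Hypothetical allow-list; a real service would consult its policy engine
ALLOWED_CALLERS = {"spiffe://corp.com/ns/prod/sa/payments-svc"}


def make_mtls_server_context(cert_file, key_file, bundle_file):
    """Server-side TLS context that rejects clients without a trusted cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED            # the "mutual" part of mTLS
    ctx.load_cert_chain(cert_file, key_file)       # this service's own SVID
    ctx.load_verify_locations(cafile=bundle_file)  # the trust bundle
    return ctx


def spiffe_id_from_peercert(peercert):
    """Extract the SPIFFE ID from the dict returned by SSLSocket.getpeercert()."""
    uris = [value for kind, value in peercert.get("subjectAltName", ()) if kind == "URI"]
    # An X.509-SVID carries exactly one URI SAN, and it must be a SPIFFE ID
    if len(uris) != 1 or not uris[0].startswith("spiffe://"):
        raise ValueError("peer certificate is not a valid X.509-SVID")
    return uris[0]


def is_caller_allowed(peercert):
    return spiffe_id_from_peercert(peercert) in ALLOWED_CALLERS


sample = {"subjectAltName": (("URI", "spiffe://corp.com/ns/prod/sa/payments-svc"),)}
print(is_caller_allowed(sample))  # True
```

The TLS library handles signature and expiry checks during the handshake; the SPIFFE ID comparison is the extra step the application adds.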


Open Policy Agent: Authorization After Identity

Zero Trust separates identity from authorization. Once you know who the caller is (SPIFFE ID, OIDC token, user cert), a policy engine decides what they can do.

OPA (Open Policy Agent) is a widely used open-source policy engine for this:

# opa-policy.rego
package authz

import rego.v1

# Deny by default; only the rules below can grant access
default allow := false

# payments-svc may read (GET) orders; every other caller or method is denied
allow if {
  input.caller == "spiffe://corp.com/ns/prod/sa/payments-svc"
  input.method == "GET"
  startswith(input.path, "/api/orders")
}

The service checks OPA on each request: “caller=X wants to do Y to Z — allowed?” OPA evaluates the policy and returns a decision. The policy is version-controlled, tested, and deployed independently of the service.
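
A per-request check against OPA’s REST Data API could look roughly like this. The sketch assumes an OPA instance listening on localhost:8181 with the authz package loaded; the URL and function names are assumptions for the example. The opener parameter exists only so the HTTP call can be swapped out in tests.

```python
import json
import urllib.request

# Assumed address of a local OPA instance; /v1/data/<package>/<rule> is OPA's Data API
OPA_URL = "http://localhost:8181/v1/data/authz/allow"


def build_opa_request(caller, method, path, url=OPA_URL):
    """Build the POST request carrying the input document for the policy."""
    body = json.dumps({"input": {"caller": caller, "method": method, "path": path}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(caller, method, path, opener=urllib.request.urlopen):
    """Ask OPA for a decision; anything other than an explicit true is a deny."""
    with opener(build_opa_request(caller, method, path)) as resp:
        decision = json.load(resp)
    # OPA omits "result" when the rule is undefined, so fail closed
    return decision.get("result", False) is True
```

Treating a missing or non-boolean result as a deny keeps the service fail-closed if the policy is not loaded.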


⚠ Common Misconceptions

“Zero Trust means no trust.” Zero Trust means trust is earned dynamically through verification, not granted by network location. A verified user with a valid, compliant device and MFA is trusted — for the scope and duration of the verified session. The “zero” refers to implicit trust, not trust itself.

“SPIFFE replaces OIDC.” SPIFFE is for workload (service) identity. OIDC is for human (user) identity. They complement each other — a service has a SPIFFE identity; a user has an OIDC identity; the authorization layer accepts both.

“mTLS is complex to implement.” With a service mesh (Istio, Linkerd), mTLS is transparent — the sidecar handles it. Without a service mesh, the application needs to use the SPIFFE Workload API. The complexity is real but manageable, especially compared to the alternative of static API keys.


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | Zero Trust extends IAM to workloads (SPIFFE) and continuous verification (short-lived tokens, device posture); it’s the current frontier of identity architecture
CISSP Domain 3: Security Architecture and Engineering | The separation of identity (SPIFFE ID), authentication (mTLS), and authorization (OPA) is a clean architectural decomposition that scales to complex multi-service environments
CISSP Domain 4: Communications and Network Security | mTLS encrypts and authenticates every service-to-service connection, eliminating the assumption that east-west traffic on the internal network is safe
CISSP Domain 1: Security and Risk Management | Zero Trust is a risk management posture: it accepts that perimeter breach is inevitable and limits blast radius through continuous verification and least privilege

Key Takeaways

  • Zero Trust rejects network-based implicit trust — every request is verified regardless of source
  • Human identity: short-lived OIDC tokens, device posture checks, Conditional Access, JIT privileged access (Vault, CyberArk)
  • Workload identity: SPIFFE IDs in X.509 certificates, issued by SPIRE, rotated automatically every hour — no static API keys
  • mTLS lets services verify each other’s identity at the TLS layer — service meshes (Istio, Linkerd) implement it transparently
  • OPA handles authorization after identity is established — who you are ≠ what you can do
  • The series arc: /etc/passwd → NIS → LDAP → Kerberos → SAML → OIDC → SPIFFE/SPIRE — the problem has always been “how do you know who someone is, at scale, without trusting the network?” The answer keeps getting better.

What does identity look like at your organization — still static API keys and shared service accounts, or moving toward SPIFFE and short-lived credentials? 👇


The Identity Stack: From LDAP to Zero Trust — 13 episodes complete.


Entra ID Linux Login: SSH Authentication with Azure AD Credentials

Reading Time: 6 minutes

The Identity Stack, Episode 12


TL;DR

  • Entra ID (Azure AD) Linux login lets you SSH into a VM using your Azure AD credentials — no local Linux accounts, no SSH keys to distribute
  • The stack: aad-auth package + pam_aad.so + SSSD — Azure authenticates via OIDC device code flow or password, then maps the identity to a local Linux UID
  • Entra ID is not AD — it’s OIDC/OAuth2 native, with no LDAP and no Kerberos (unless you add Azure AD DS, a separate managed service)
  • Conditional Access Policies can gate Linux logins — MFA, device compliance, location restrictions — the same policies as for web apps
  • Two login modes: interactive (browser-based device code, for non-Azure VMs) and integrated (Azure IMDS-based, for Azure VMs)
  • Required roles: Virtual Machine Administrator Login or Virtual Machine User Login on the VM — IAM, not local sudoers

The Big Picture: How Entra ID Linux Login Works

User: ssh [email protected]

  sshd on Linux VM
      │
      ▼
  PAM (/etc/pam.d/sshd)
      │
      ├── pam_aad.so (auth)
      │     │
      │     │  OIDC device code flow:
      │     │  "Go to microsoft.com/devicelogin and enter code ABCD-1234"
      │     │  User authenticates in browser with MFA
      │     │  Entra ID issues id_token + access_token
      │     ▼
      │   pam_aad validates token:
      │     • signature (JWKS from Entra ID)
      │     • tenant ID (iss claim)
      │     • VM resource audience (aud claim)
      │     • group membership (groups claim)
      │
      └── pam_mkhomedir (session)
            Creates /home/[email protected] on first login

  Shell session created
  whoami → vamshi_corp_com (sanitized UPN for Linux username)
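
The claim checks in the diagram can be sketched in Python as follows. This covers only the checks that happen after the token signature has been verified against the JWKS; the v2.0 issuer URL format, the group name, and the function name are assumptions for illustration, not pam_aad’s actual code.

```python
import time


def check_claims(claims, tenant_id, audience, required_group, now=None):
    """Return the names of failed checks; an empty list means the claims pass.

    Assumes the token signature was already verified.
    """
    now = time.time() if now is None else now
    failures = []

    # Issuer must be this tenant (v2.0 endpoint format assumed here)
    expected_issuer = f"https://login.microsoftonline.com/{tenant_id}/v2.0"
    if claims.get("iss") != expected_issuer:
        failures.append("iss")

    # aud may be a single string or a list of strings
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        failures.append("aud")

    # Reject expired tokens (and tokens with no exp at all)
    if claims.get("exp", 0) <= now:
        failures.append("exp")

    if required_group not in claims.get("groups", []):
        failures.append("groups")

    return failures


tenant = "12345678-1234-1234-1234-123456789abc"
claims = {
    "iss": f"https://login.microsoftonline.com/{tenant}/v2.0",
    "aud": "vm-resource",        # hypothetical audience value
    "exp": 2000,
    "groups": ["linux-admins"],  # hypothetical group
}
print(check_claims(claims, tenant, "vm-resource", "linux-admins", now=1000))  # []
```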

EP11 mapped the IdP landscape. This episode gets specific: Entra ID and Linux. Understanding this matters because Entra ID is increasingly where enterprise identities live, and cloud VMs that users SSH into with local accounts are an operational and security liability.


Entra ID vs Active Directory: What’s Different

This distinction matters before configuring anything.

Aspect | Active Directory (on-prem) | Entra ID (cloud)
Protocol | LDAP + Kerberos | OIDC + OAuth2
Directory queries | ldapsearch | Microsoft Graph API
Linux join | realm join (adcli + SSSD) | aad-auth package
Authentication | Kerberos tickets | JWT tokens
Group policy | GPO via SYSVOL | Conditional Access + Intune
Network requirement | DC reachable on LAN/VPN | HTTPS to login.microsoftonline.com

Entra ID has no LDAP interface and no Kerberos realm. You cannot run ldapsearch against it. You cannot kinit to it. The authentication protocol is entirely OIDC/OAuth2 — the same protocol your browser uses to “Login with Microsoft.”

If you need LDAP and Kerberos from Azure, that’s Azure AD Domain Services (since renamed Microsoft Entra Domain Services), a separate managed service that Microsoft runs, which does speak LDAP and Kerberos. It’s not Entra ID; it’s a managed AD replica in Azure. This episode covers the Entra ID path: the modern, protocol-native approach.


Prerequisites

# Azure side:
# 1. The VM's managed identity must be enabled (System-assigned)
# 2. Two Entra ID roles assigned on the VM resource:
#    - "Virtual Machine Administrator Login" (for sudo access)
#    - "Virtual Machine User Login" (for regular access)
# 3. Conditional Access policies that apply to the VM login scope

# VM side (Ubuntu 20.04+ / RHEL 8+):
# Install the aad-auth package (Microsoft-maintained)
curl -sSL https://packages.microsoft.com/keys/microsoft.asc \
  | gpg --dearmor -o /usr/share/keyrings/microsoft.gpg
echo "deb [signed-by=/usr/share/keyrings/microsoft.gpg] \
  https://packages.microsoft.com/ubuntu/22.04/prod jammy main" \
  > /etc/apt/sources.list.d/microsoft.list
apt-get update && apt-get install -y aad-auth

Configuration

# Configure the aad-auth package
aad-auth configure \
  --tenant-id 12345678-1234-1234-1234-123456789abc \
  --app-id 87654321-4321-4321-4321-cba987654321

# This writes /etc/aad.conf:
# [aad]
# tenant_id = 12345678-...
# app_id = 87654321-...
# version = 1

# Verify the PAM configuration was updated
grep pam_aad /etc/pam.d/common-auth
# auth [success=1 default=ignore] pam_aad.so

The aad-auth package installs pam_aad.so and configures PAM automatically. It also modifies /etc/nsswitch.conf to add aad as a source for passwd lookups, so getent passwd vamshi@corp.com works after the first login.


The Login Flow

On an Azure VM (Integrated mode)

Azure VMs have access to the Instance Metadata Service (IMDS) at 169.254.169.254. pam_aad uses the VM’s managed identity to get a token from IMDS, which proves the VM is trusted, then validates the user’s token against the tenant.

# User SSHes with username as UPN ([email protected] or [email protected])
ssh [email protected]@vm.eastus.cloudapp.azure.com

# Or use the short form if the tenant is configured:
ssh [email protected]@vm.eastus.cloudapp.azure.com

On first connection, pam_aad initiates the device code flow:

To sign in, use a web browser to open https://microsoft.com/devicelogin
and enter the code ABCD-1234 to authenticate.

The user opens the URL in any browser (on any device), enters the code, and authenticates with their Entra ID credentials + MFA. The SSH session gets a token. Subsequent logins within the token cache TTL skip the device code step.
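
While the user is in the browser, the client polls the token endpoint. The sketch below follows the standard OAuth 2.0 device authorization grant (RFC 8628) rather than pam_aad’s actual implementation. The post and sleep parameters are injected so the loop can be exercised without a network; http_post_form is one possible real implementation of post.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def http_post_form(url, data):
    """POST form-encoded data and return the parsed JSON body."""
    req = urllib.request.Request(url, data=urllib.parse.urlencode(data).encode(), method="POST")
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Token endpoints answer HTTP 400 with a JSON error body while pending
        return json.load(err)


def poll_for_token(post, token_url, client_id, device_code,
                   interval=5, expires_in=900, sleep=time.sleep):
    """Poll until the user finishes signing in, the code expires, or an error occurs."""
    waited = 0
    while waited < expires_in:
        sleep(interval)
        waited += interval
        resp = post(token_url, {
            "grant_type": DEVICE_GRANT,
            "client_id": client_id,
            "device_code": device_code,
        })
        error = resp.get("error")
        if error is None:
            return resp                # contains the issued tokens
        if error == "authorization_pending":
            continue                   # user has not finished yet
        if error == "slow_down":
            interval += 5              # RFC 8628: back off by 5 seconds
            continue
        raise RuntimeError(f"device code flow failed: {error}")
    raise TimeoutError("device code expired before the user signed in")
```

In real use, interval and expires_in come from the device authorization response that also carries the user code shown to the user.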

Username format on the Linux system

Entra ID usernames (UPNs) contain @ — not valid in Linux usernames. aad-auth sanitizes the UPN:

vamshi@corp.com → vamshi_corp_com    (default)
# or, with shorter_username enabled in /etc/aad.conf:
vamshi@corp.com → vamshi

The UID is derived from the Azure AD Object ID (a deterministic hash) — stable across logins, same UID on every VM in the tenant.
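
Both mappings can be illustrated in a few lines of Python. The sanitizer reproduces the behaviour described above (vamshi_corp_com as the default form, vamshi as the short form). The UID function is purely hypothetical: it shows what a deterministic hash from Object ID to UID could look like, not the algorithm aad-auth actually uses, and the UID range is an assumption.

```python
import hashlib
import re

UID_MIN = 100_000         # assumed floor, to stay clear of local accounts
UID_MAX = 2**31 - 1       # assumed ceiling


def sanitize_upn(upn, short=False):
    """Turn a UPN into a Linux-safe username as described in the text."""
    upn = upn.lower()
    if short:
        return upn.split("@", 1)[0]
    # Replace every character outside [a-z0-9_-] (including @ and .) with _
    return re.sub(r"[^a-z0-9_-]", "_", upn)


def stable_uid(object_id):
    """Hypothetical deterministic mapping from an Object ID to a UID."""
    digest = hashlib.sha256(object_id.lower().encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return UID_MIN + value % (UID_MAX - UID_MIN + 1)


print(sanitize_upn("vamshi@corp.com"))              # vamshi_corp_com
print(sanitize_upn("vamshi@corp.com", short=True))  # vamshi
```

Because the input is the same on every VM, any hash-based scheme of this kind yields the same UID everywhere, which is what keeps file ownership consistent across machines.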


Conditional Access for Linux Logins

Conditional Access Policies in Entra ID apply to Linux VM logins the same way they apply to web app logins.

Policy: Require MFA for Linux VM Login
  Conditions:
    Cloud apps: "Microsoft Azure Linux Virtual Machine Sign-In"
    Users: All users (or specific groups)
  Grant:
    Require multi-factor authentication
    Require compliant device (optional)

With this policy, every SSH login triggers MFA — regardless of whether the client machine supports it. The MFA challenge appears in the device code flow (the browser window the user opens).

You can also enforce:
  • Location restrictions: only from corporate IP ranges
  • Device compliance: device must be Intune-managed
  • Sign-in risk: block logins flagged as risky by Entra ID Identity Protection

This is the operational shift: Linux login security is now managed in the same Conditional Access policy engine as every other Entra ID-protected resource. No more per-machine PAM configuration for MFA.


Role-Based Access: Who Can Log In

Access to the VM is controlled by Azure RBAC — not by local Linux groups or sudoers.

# Grant a user SSH access to the VM
az role assignment create \
  --assignee vamshi@corp.com \
  --role "Virtual Machine User Login" \
  --scope /subscriptions/SUB_ID/resourceGroups/RG/providers/Microsoft.Compute/virtualMachines/VM_NAME

# Grant admin (sudo) access
az role assignment create \
  --assignee vamshi@corp.com \
  --role "Virtual Machine Administrator Login" \
  --scope /subscriptions/SUB_ID/...

Virtual Machine Administrator Login maps to the sudo group on the Linux VM. Users with this role get passwordless sudo. Users with Virtual Machine User Login get a regular shell.

The mapping is enforced by pam_aad checking the groups claim in the token against the configured admin group. No /etc/sudoers.d/ files needed.


Debugging Entra ID Linux Logins

# Check aad-auth service status
systemctl status aad-auth

# View aad-auth logs
journalctl -u aad-auth -f

# Attempt a manual token validation (requires aad-auth debug mode)
aad-auth login --username vamshi@corp.com

# Check the local user cache
getent passwd vamshi_corp_com
# Returns if the user has logged in before

# Clear the local cache (forces re-authentication)
aad-auth clean-cache

# Verify Conditional Access isn't blocking (check Entra ID Sign-in logs)
# Azure Portal → Entra ID → Sign-in logs → filter by user + app "Microsoft Azure Linux Virtual Machine Sign-In"

The Entra ID Sign-in logs in the Azure Portal show every authentication attempt, the Conditional Access policies that evaluated, which ones passed/failed, and the exact failure reason. This is far more diagnostic than reading PAM logs.


Entra ID Connect: Bringing On-Prem Users to Entra ID

For organizations with existing on-prem AD who want to enable Entra ID Linux login:

On-prem AD users → Entra ID Connect sync → Entra ID
                                                │
                                    Linux VM login (aad-auth)

Entra ID Connect (officially Microsoft Entra Connect, formerly Azure AD Connect) is a Windows Server application that syncs users from on-prem AD to Entra ID, by default every 30 minutes. Users authenticate against Entra ID, which validates the credential using Password Hash Sync, Pass-Through Authentication, or Federation. The Linux VM doesn’t know or care: it sees an Entra ID token.

With Password Hash Sync: password hashes (not plaintext) are synced to Entra ID — users authenticate directly in the cloud.
With Pass-Through Authentication: Entra ID forwards authentication requests to an on-prem agent that validates against AD — no password hashes leave the datacenter.
With Federation (AD FS / Entra ID as a relying party): Entra ID delegates authentication to AD FS — the most complex, the most on-prem control.


⚠ Common Misconceptions

“Entra ID = Azure Active Directory = Active Directory.” Entra ID is the new name for Azure AD, but neither is Active Directory, and there are three services that are easy to confuse. Active Directory: on-prem, LDAP+Kerberos. Entra ID (formerly Azure AD): cloud, OIDC+OAuth2. Azure AD Domain Services: managed AD replica in Azure, LDAP+Kerberos, not Entra ID.

“You need Azure AD DS to join Linux to Azure.” Azure AD DS is the managed AD service. Entra ID Linux login (via aad-auth) is entirely separate and doesn’t require AD DS. You can authenticate Linux to Entra ID directly via OIDC.

“The Linux username matches the Entra ID username.” The UPN is sanitized (@ and . are replaced with _) to produce a valid Linux username. The canonical identity is the UPN or the Entra Object ID. Don’t hardcode the sanitized username in scripts.


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | Entra ID Linux login centralizes Linux VM access in the same IAM system as all other enterprise resources: one policy engine, one audit log
CISSP Domain 3: Security Architecture and Engineering | Eliminating per-VM local accounts removes a class of credential management risk: no SSH keys to rotate, no local accounts to audit
CISSP Domain 1: Security and Risk Management | Conditional Access Policies enforcing MFA on Linux logins reduce the risk of credential-based compromise of cloud VMs

Key Takeaways

  • Entra ID Linux login uses OIDC device code flow — no LDAP, no Kerberos, no local Linux accounts
  • aad-auth package installs pam_aad.so and handles the full authentication stack: token issuance, validation, user cache, UID mapping
  • VM access is controlled by Azure RBAC roles (Virtual Machine Administrator Login / Virtual Machine User Login) — not by sudoers files
  • Conditional Access Policies apply to Linux VM logins — MFA, device compliance, and location restrictions use the same engine as every other Entra ID app
  • Debugging starts in Entra ID Sign-in logs (Azure Portal), not in /var/log/auth.log

What’s Next

EP12 showed how Entra ID enables Linux logins in the cloud. EP13 is the series closer, Zero Trust identity: what it means to verify identity continuously, how SPIFFE and SPIRE handle workload (non-human) identity, and how the stack went from /etc/passwd in 1970 to a Zero Trust policy engine in 2026.


Identity Providers Explained: On-Prem, Cloud, SCIM, and Federation

Reading Time: 6 minutes

The Identity Stack, Episode 11


TL;DR

  • An Identity Provider (IdP) is the system that authenticates users and issues identity assertions (SAML assertions, OIDC tokens) to applications
  • On-prem IdPs: AD FS (Microsoft), Shibboleth (universities), Keycloak (open source), Ping Identity — they sit in front of AD and speak SAML/OIDC to cloud apps
  • Cloud IdPs: Okta, Entra ID (Azure AD), Google Workspace, Ping Identity Cloud — they are the directory and the authentication layer in one
  • Federation: IdPs can trust each other — a corporate IdP can delegate to a cloud IdP, or federate with a partner org’s IdP
  • SCIM (System for Cross-domain Identity Management) is provisioning, not authentication — it creates/updates/deactivates user accounts in target systems when the source directory changes
  • The key distinction: federation (authentication flow) vs directory sync (data copy) — they solve different problems and are often deployed together

The Big Picture: Where IdPs Sit

                        On-prem Directory
                        (Active Directory / OpenLDAP / FreeIPA)
                               │
                               │ LDAP / Kerberos
                               ▼
                         Identity Provider
                         ┌──────────────────────────────────┐
                         │  AD FS / Keycloak / Okta /       │
                         │  Entra ID Connect / Shibboleth   │
                         │                                  │
                         │  Speaks: SAML 2.0 + OIDC + OAuth2│
                         └────────────────┬─────────────────┘
                                          │ assertions / tokens
                      ┌───────────────────┼───────────────────┐
                      ▼                   ▼                   ▼
               Salesforce          GitHub Enterprise      AWS IAM
               (SAML SP)           (OIDC RP)              (OIDC)

EP10 covered the protocols. This episode covers the systems — what an IdP actually does, how the major ones differ, and how they connect to each other through federation and SCIM.


On-Premises Identity Providers

AD FS (Active Directory Federation Services)

AD FS is Microsoft’s on-prem federation server — a Windows Server role that sits in front of Active Directory and speaks SAML 2.0 and OIDC to external applications.

What it does:
  • Authenticates users against AD (Kerberos/LDAP behind the scenes)
  • Issues SAML assertions and OIDC tokens to external SPs
  • Handles claims transformation: maps AD attributes to what the SP expects

What it doesn’t do well:
  • It’s Windows Server only
  • Configuration is complex (XML, certificates, claim rule language)
  • No built-in MFA (requires Azure MFA or a third-party provider)
  • Microsoft now steers most use cases toward Entra ID instead

AD FS made sense when everything was on-prem. As workloads move to cloud, Entra ID Connect (a lighter sync agent) combined with Entra ID as the IdP replaces AD FS for most enterprises.

Keycloak

Keycloak is an open-source IdP that originated at Red Hat and is now a CNCF project. It is often paired with FreeIPA to add web-based OIDC/SAML SSO, and it’s widely deployed independently by organizations that want full control over their identity infrastructure.

# Run Keycloak in development mode (Docker)
docker run -p 8080:8080 \
  -e KEYCLOAK_ADMIN=admin \
  -e KEYCLOAK_ADMIN_PASSWORD=admin \
  quay.io/keycloak/keycloak:latest \
  start-dev

# Keycloak concepts:
# Realm     — an isolated namespace (like a tenant)
# Client    — an application that uses Keycloak for auth (SP/RP)
# User federation — connect Keycloak to an existing LDAP/AD directory
# Identity brokering — federate with external IdPs (Google, GitHub, another SAML IdP)

Keycloak reads users from AD/LDAP via its User Federation feature — it doesn’t replace the directory, it federates it. Users still live in AD; Keycloak issues SAML/OIDC tokens based on those users.

Shibboleth

Shibboleth is the dominant IdP in academia. Most universities run it. It’s SAML-native, designed for federation between institutions — a student can authenticate at their home university’s IdP and access resources at a partner institution.


Cloud Identity Providers

Okta

Okta is a cloud IdP + directory. It can:
  • Act as the primary user directory (storing users, credentials)
  • Connect to on-prem AD via the Okta Active Directory Agent (a lightweight sync service)
  • Federate with other IdPs (act as IdP or SP in a SAML/OIDC chain)
  • Enforce MFA, Adaptive Authentication, Device Trust

Okta’s Lifecycle Management handles provisioning: when a user is created/disabled in Okta (or synced from AD), Okta can automatically create/deactivate accounts in downstream SaaS apps — via SCIM or app-specific APIs.

Entra ID (Azure Active Directory)

Entra ID is Microsoft’s cloud IdP. It’s both a directory (stores users, groups) and an IdP (issues tokens). For organizations running on-prem AD, Entra ID Connect syncs users from AD to Entra ID.

Entra ID is OIDC and OAuth2 native — it speaks SAML for legacy apps but JWT/OIDC for everything modern. Its OIDC implementation follows the standard closely; its token validation happens via /.well-known/openid-configuration and the JWKS endpoint.

On-prem AD  →  Entra ID Connect (sync agent)  →  Entra ID (cloud)
                                                      │
                                              SAML / OIDC
                                                      │
                                            SaaS apps, Azure resources
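
The discovery-based validation mentioned above starts with three lookups: fetch the discovery document, read its jwks_uri, and pick the signing key whose kid matches the token header. The Python sketch below shows those steps without the network calls or the signature check itself; the helper names are made up for this example, and the v2.0 discovery URL is the one Entra ID publishes per tenant.

```python
import base64
import json


def discovery_url(tenant_id):
    """Per-tenant OIDC discovery document location (v2.0 endpoint)."""
    return f"https://login.microsoftonline.com/{tenant_id}/v2.0/.well-known/openid-configuration"


def jwks_uri_from_discovery(document):
    """The discovery document names the endpoint that serves the signing keys."""
    return document["jwks_uri"]


def jwt_header(token):
    """Decode the (unverified) JWT header to learn which key signed the token."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def select_signing_key(jwks, kid):
    """Find the JWK whose key ID matches the token header."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    raise KeyError(f"no signing key with kid {kid!r}")
```

The selected key is then handed to a JWT library to verify the signature; a missing kid usually means the key set was rotated and should be fetched again.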

Google Workspace

Google Workspace is Google’s combined directory + IdP. Google accounts are the users. Apps integrate via SAML or OIDC. Google’s OIDC implementation is one of the most widely deployed, so many OIDC libraries document and test against it.


Federation: IdPs Trusting Each Other

Federation is the mechanism that lets IdPs delegate to each other. Two patterns:

SAML Federation (IdP-to-IdP)

Common in academia and partner integrations:

User at University A → requests resource at University B
                              │
                              │ doesn't know user
                              ▼
                    University B SP redirects to...
                    Discovery Service: "which IdP are you from?"
                              │
                              ▼
                    University A IdP authenticates user
                              │
                    Sends SAML assertion to University B SP

University B’s SP trusts University A’s IdP because both are members of a SAML federation (e.g., InCommon in the US, eduGAIN globally). The federation metadata aggregates all members’ SAML metadata — certificates, endpoints — so members don’t have to manually configure each bilateral trust.

OIDC Identity Brokering

Keycloak, Okta, and Entra ID can all act as identity brokers — they sit between the application and the actual authenticating IdP:

App (OIDC RP) → Keycloak (broker IdP) → Google / GitHub / SAML IdP
                                               │ authenticate
                                               ▼
                                      Keycloak receives assertion
                                      Maps external claims to local claims
                                      Issues OIDC token to app

The app only knows Keycloak. Keycloak handles the upstream IdP complexity.
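
The claim-mapping step in the broker can be sketched in a few lines. This is an illustrative model only, not Keycloak's mapper implementation; the mapping table and the upstream profile fields are hypothetical.

```python
def map_claims(upstream: dict, mapping: dict) -> dict:
    """Copy upstream claims into local claim names, skipping absent ones."""
    return {
        local: upstream[external]
        for external, local in mapping.items()
        if external in upstream
    }


# Hypothetical mapping for a GitHub-style upstream profile.
UPSTREAM_TO_LOCAL = {
    "login": "preferred_username",
    "email": "email",
    "id": "upstream_id",
}

upstream_profile = {"login": "vamshi", "id": 4242, "avatar_url": "https://example.com/a.png"}
local_claims = map_claims(upstream_profile, UPSTREAM_TO_LOCAL)
print(local_claims)
```

Anything not listed in the mapping (here, avatar_url) never reaches the application, which is part of why the app only needs to understand the broker's claim names.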


SCIM: Provisioning ≠ Authentication

SCIM (RFC 7644) is a REST API standard for user lifecycle management — creating, updating, and deactivating user accounts in a target system when changes happen in the source directory.

Source (Okta / Entra ID)           Target (Slack / GitHub / Jira)
         │                                    │
         │  SCIM 2.0 (REST + JSON)            │
         ├─ POST /Users  ─────────────────────► create user
         ├─ PATCH /Users/id ──────────────────► update attributes (active=false deactivates)
         └─ DELETE /Users/id ─────────────────► delete account

SCIM is not SSO. A SCIM-provisioned user in Slack can log in to Slack — but the authentication still goes through the IdP (SAML/OIDC). SCIM ensures the account exists. The IdP proves the user’s identity.

Why both? Because SSO alone doesn’t create accounts in target systems — it just authenticates to them. If a user tries to log in to Slack for the first time via SSO, Slack needs an account to map them to. SCIM creates that account before the first login (Just-in-Time provisioning handles it at first login, but SCIM handles it in bulk and handles deprovisioning reliably).

Deprovisioning is where SCIM matters most. When an employee leaves, you disable them in Okta — SCIM deactivates their account in every connected app within minutes. Without SCIM, IT runs a manual checklist. Someone misses Jira. The ex-employee has access for three weeks.
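
A minimal sketch of what a deprovisioning call might carry, assuming the target app honors the standard active attribute on SCIM User resources. Deactivation is commonly expressed as a PATCH that sets active to false, which keeps the account record for audit or later reactivation. The base URL and user id below are hypothetical, and nothing is sent over the network.

```python
import json

# Message schema URN that RFC 7644 defines for PATCH request bodies.
PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_deactivate_request(base_url: str, user_id: str) -> dict:
    """Describe a SCIM PATCH that marks a user inactive (nothing is sent)."""
    body = {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    return {
        "method": "PATCH",
        "url": f"{base_url.rstrip('/')}/Users/{user_id}",
        "headers": {"Content-Type": "application/scim+json"},
        "body": json.dumps(body),
    }


# Hypothetical endpoint and id, for illustration only.
request = build_deactivate_request("https://app.example.com/scim/v2", "2819c223")
print(request["method"], request["url"])
print(request["body"])
```

A real connector would add a bearer token and retry logic; the point here is only the shape of the request the IdP sends to each connected app.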


Directory Sync vs Federation

These are commonly confused:

Directory sync — copy user data from source to target. Entra ID Connect copies users from on-prem AD to Entra ID. This is not authentication; it’s data replication. After sync, Entra ID has its own copy of the user record.

Federation — delegate authentication to an external IdP. The target system doesn’t store credentials; it redirects to the IdP for authentication and trusts the assertion that comes back.

You often need both:
– Sync: so the target system has the user record and can enforce policies (group membership, license assignment)
– Federation: so the user authenticates against the source of truth (your IdP) rather than maintaining a separate password in every system


⚠ Common Misconceptions

“SCIM is an authentication protocol.” SCIM is a provisioning protocol. It creates and manages accounts. Authentication is SAML/OIDC. Both solve different parts of the identity lifecycle problem.

“SSO means you only have one password.” SSO means you only authenticate once per session. The password still exists (at the IdP). SSO reduces the number of authentication events, not the number of credentials.

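“On-prem IdP + cloud sync is the same as a cloud IdP.” It depends on where passwords are checked. With AD + Entra ID Connect using pass-through authentication or AD FS federation, each sign-in is validated on-prem, so if the on-prem side goes down, cloud SSO breaks. With password hash sync, Entra ID validates the password itself and keeps working through an on-prem outage. A pure cloud IdP (Okta standalone, Entra ID without on-prem AD) authenticates entirely in the cloud.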


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | IdPs are the central control plane for federated identity; their architecture, trust relationships, and provisioning workflows define the enterprise IAM posture
CISSP Domain 1: Security and Risk Management | SCIM-based deprovisioning is an access control risk management practice; without it, terminated employee access persists across connected systems
CISSP Domain 3: Security Architecture and Engineering | On-prem vs cloud IdP, federation vs sync, and SCIM vs JIT provisioning are architectural decisions with long-term operational and security implications

Key Takeaways

  • An IdP authenticates users and issues assertions (SAML) or tokens (OIDC/OAuth2) — applications trust the IdP, not the user directly
  • On-prem: AD FS (Windows/legacy), Keycloak (open source, flexible), Shibboleth (academia)
  • Cloud: Okta (cloud-native, strong lifecycle management), Entra ID (Microsoft-integrated), Google Workspace
  • Federation = authentication delegation between IdPs; Directory sync = data replication; SCIM = account lifecycle (provisioning/deprovisioning)
  • SCIM deprovisioning is the critical control — it ensures ex-employees lose access automatically across all connected systems

What’s Next

EP11 covered the IdP landscape. EP12 gets specific: Entra ID and Linux — how you configure a Linux VM to accept SSH logins authenticated against Azure AD credentials, and how the aad-auth / pam_aad stack works end to end.

Next: Entra ID Linux Login: SSH Authentication with Azure AD Credentials


SAML vs OIDC vs OAuth2: Which Protocol Handles Which Identity Problem

Reading Time: 6 minutes

The Identity Stack, Episode 10


TL;DR

  • SAML 2.0 is a federation protocol for browser-based SSO — an IdP issues a signed XML assertion that a Service Provider trusts; designed for enterprise applications
  • OAuth2 is an authorization delegation protocol, not authentication — it lets an application act on your behalf without knowing your password; the access token says what, not who
  • OIDC (OpenID Connect) = OAuth2 + an identity layer — adds the id_token (a JWT containing who you are) on top of OAuth2’s access_token (what you can do)
  • SAML vs OIDC: SAML is XML, enterprise-native, stateful; OIDC is JSON/JWT, API-native, stateless — new applications almost always use OIDC
  • The id_token is a JWT — decode it at jwt.io and read every claim — it tells you exactly what the IdP asserts about the user
  • The browser SSO flow is three redirects: user → SP → IdP (authenticate) → SP (consume assertion)

The Problem: LDAP and Kerberos Don’t Cross the Internet

EP09 showed how authentication works inside a corporate network. LDAP and Kerberos both assume network proximity to the directory server — firewall-friendly ports don’t help when the authentication protocol requires a direct connection to the KDC or directory.

Internal network: works
  Browser → intranet app → LDAP/Kerberos → AD DC (all on 10.0.0.0/8)

Internet: breaks
  Browser → SaaS app (AWS) → LDAP/Kerberos → AD DC (on-prem behind firewall)
  ✗ KDC not reachable across NAT
  ✗ LDAP not exposed to internet (shouldn't be)
  ✗ Every SaaS app can't have its own LDAP connection to your DC

SAML was invented in 2002 to solve this. OIDC in 2014. Both let identity assertions travel over HTTPS — the one protocol that crosses every firewall.


SAML 2.0: Enterprise Browser SSO

SAML 2.0 has three actors: the User, the Identity Provider (IdP), and the Service Provider (SP).

1. User visits SP (e.g., Salesforce)
   SP: "I don't know this user — send them to the IdP"
   ↓  HTTP redirect with SAMLRequest (base64-encoded AuthnRequest)

2. User arrives at IdP (e.g., Okta, AD FS, Entra ID)
   IdP: "Authenticate me" → user enters credentials
   IdP: generates a signed SAML Assertion (XML)
   ↓  HTTP POST to SP's Assertion Consumer Service (ACS) URL

3. SP receives the SAMLResponse
   SP: verifies the signature using IdP's public key
   SP: extracts user attributes from the Assertion
   SP: creates a session — user is logged in

The SAML Assertion is an XML document signed by the IdP. It contains:

<saml:Assertion>
  <saml:Issuer>https://idp.corp.com</saml:Issuer>
  <saml:Subject>
    <saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">
      [email protected]
    </saml:NameID>
  </saml:Subject>
  <saml:Conditions
    NotBefore="2026-04-27T01:00:00Z"
    NotOnOrAfter="2026-04-27T01:05:00Z">  ← short-lived: replay protection
  </saml:Conditions>
  <saml:AttributeStatement>
    <saml:Attribute Name="email">
      <saml:AttributeValue>[email protected]</saml:AttributeValue>
    </saml:Attribute>
    <saml:Attribute Name="groups">
      <saml:AttributeValue>engineers</saml:AttributeValue>
      <saml:AttributeValue>sre-team</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>

The SP trusts the assertion because it’s signed with the IdP’s private key, and the SP has the IdP’s public certificate configured. No direct connection between SP and IdP needed during authentication — only the browser carries the assertion.
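
The Conditions window is one of the checks an SP runs after verifying the signature. A minimal sketch of just that time check, assuming UTC timestamps in the form shown in the assertion above; the 60-second clock-skew allowance is an assumption, not a SAML requirement, and signature and audience checks are out of scope here.

```python
from datetime import datetime, timedelta, timezone


def parse_saml_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-04-27T01:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def within_conditions(not_before: str, not_on_or_after: str, now: datetime,
                      skew: timedelta = timedelta(seconds=60)) -> bool:
    """True if now falls in [NotBefore, NotOnOrAfter), allowing clock skew."""
    start = parse_saml_time(not_before) - skew
    end = parse_saml_time(not_on_or_after) + skew
    return start <= now < end


# Two minutes into the five-minute window from the example assertion.
now = datetime(2026, 4, 27, 1, 2, tzinfo=timezone.utc)
print(within_conditions("2026-04-27T01:00:00Z", "2026-04-27T01:05:00Z", now))
```

An assertion replayed ten minutes later fails this check even though its signature is still valid, which is the replay protection the short window buys.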

SP-initiated vs IdP-initiated:
– SP-initiated: user visits the SP, gets redirected to IdP, authenticates, redirected back — the common flow
– IdP-initiated: user starts at the IdP (e.g., company portal), clicks an app, IdP sends assertion directly. Simpler, but there is no SP-generated AuthnRequest, so the response carries no InResponseTo value and the SP can’t verify the assertion was solicited (a security concern)


OAuth2: Authorization Delegation (Not Authentication)

This distinction is important and consistently confused: OAuth2 is for authorization, not authentication.

OAuth2 solves: “I want to let GitHub Actions post to my Slack without giving GitHub my Slack password.”

Resource Owner (you)  → grants permission to →  Client (GitHub Actions)
                                                        │
                                                        │ access_token
                                                        ▼
                                               Resource Server (Slack API)
                                               "this token can post messages"

The access_token answers “what can this client do?” not “who is this user?” A resource server receiving an access token knows the token is valid and what scopes it carries — it does not necessarily know which human authorized it.

The four OAuth2 flows in common use (PKCE is an extension of Authorization Code, not a separate grant):

Grant                       Use case
Authorization Code          Web apps (server-side); most secure, recommended
Authorization Code + PKCE   Native/SPA apps; Auth Code without a client secret
Client Credentials          Machine-to-machine (no user); service accounts
Device Code                 Devices without browsers (smart TVs, CLIs)

The Implicit grant (tokens in URL fragment) is deprecated. Don’t use it.
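
PKCE replaces the client secret with a one-time proof. A minimal sketch of the S256 method from RFC 7636: the client keeps a random code_verifier, sends its hashed code_challenge with the authorization request, and reveals the verifier only when exchanging the code. The function names are illustrative.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random URL-safe verifier (43 characters from 32 random bytes)."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) with padding stripped, per RFC 7636."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
# Sent on /authorize: code_challenge=<challenge>&code_challenge_method=S256
# Sent on /token:     code_verifier=<verifier>
print(len(verifier), challenge)
```

An attacker who intercepts the authorization code cannot redeem it without the verifier, which never leaves the client until the token request.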


OIDC: OAuth2 + Who You Are

OpenID Connect adds identity to OAuth2 by adding the id_token — a JWT that the IdP signs and that contains claims about the authenticated user.

Authorization Code flow with OIDC:

1. Client redirects user to IdP:
   GET /authorize?
     response_type=code
     &client_id=myapp
     &scope=openid email profile    ← "openid" scope triggers OIDC
     &redirect_uri=https://app.com/callback
     &state=random-nonce

2. IdP authenticates user, returns:
   GET /callback?code=AUTH_CODE&state=random-nonce

3. Client exchanges code for tokens:
   POST /token
   grant_type=authorization_code&code=AUTH_CODE...

4. IdP returns:
   {
     "access_token": "eyJ...",    ← what the user authorized
     "id_token": "eyJ...",        ← who the user is (JWT)
     "token_type": "Bearer",
     "expires_in": 3600
   }
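
Step 1 of the flow above can be sketched as a small URL builder. This is a sketch under assumptions: the endpoint URL is hypothetical, and a real client would store state and nonce in the user's session so it can compare them when the callback arrives.

```python
import secrets
from urllib.parse import urlencode


def build_authorize_url(authorize_endpoint: str, client_id: str,
                        redirect_uri: str, scopes: list) -> tuple:
    """Return (url, state, nonce) for an OIDC Authorization Code request."""
    state = secrets.token_urlsafe(16)   # checked on the callback (CSRF protection)
    nonce = secrets.token_urlsafe(16)   # checked inside the id_token (replay protection)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "scope": " ".join(scopes),      # must include "openid" to get an id_token
        "redirect_uri": redirect_uri,
        "state": state,
        "nonce": nonce,
    })
    return f"{authorize_endpoint}?{query}", state, nonce


# Hypothetical IdP endpoint, for illustration only.
url, state, nonce = build_authorize_url(
    "https://idp.corp.com/authorize", "myapp",
    "https://app.com/callback", ["openid", "email", "profile"],
)
print(url)
```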

The id_token decoded:

{
  "iss": "https://idp.corp.com",          ← issuer (the IdP)
  "sub": "user-guid-12345",               ← subject (stable user identifier)
  "aud": "myapp",                          ← audience (your client_id)
  "exp": 1745730000,                       ← expiry (Unix timestamp)
  "iat": 1745726400,                       ← issued at
  "email": "[email protected]",
  "name": "Vamshi Krishna",
  "groups": ["engineers", "sre-team"]     ← custom claims from IdP
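}

# Decode a JWT payload at the command line (no verification; for debugging only)
# JWT segments are unpadded base64url, so restore the padding before decoding
echo "eyJ..." | cut -d. -f2 | python3 -c 'import sys, base64, json; p = sys.stdin.read().strip(); print(json.dumps(json.loads(base64.urlsafe_b64decode(p + "=" * (-len(p) % 4))), indent=2))'

# Or: jwt.io (paste a non-production token, read every claim)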

sub is the stable user identifier. Email addresses change. Names change. The sub claim is the IdP’s internal identifier for the user — use it as the primary key when storing user data. Never store email as the primary key.
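
A minimal sketch of reading the payload and deriving a storage key from it. Two assumptions are worth stating: this decodes without verifying the signature, so it is for inspection only, and because sub is only guaranteed unique within one issuer, the sketch keys users on the (iss, sub) pair. The demo token is built locally and is not a real IdP token.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (debug only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def user_key(claims: dict) -> tuple:
    """Key user records on (iss, sub); email and name can change."""
    return (claims["iss"], claims["sub"])


def _b64url(data: dict) -> str:
    """Helper for the demo: encode a dict as an unpadded base64url segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Unsigned demo token: header.payload.<empty signature>
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"iss": "https://idp.corp.com", "sub": "user-guid-12345"}),
    "",
])
print(user_key(decode_jwt_payload(demo_token)))
```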


SAML vs OIDC: When to Use Which

                     SAML 2.0                            OIDC
Format               XML                                 JSON / JWT
Transport            HTTP POST (browser only)            HTTP redirect + JSON API
Year                 2005 (SAML 1.0: 2002)               2014
Enterprise adoption  Very high (AD FS, Okta, Entra ID)   Very high (newer apps)
API-friendly         No                                  Yes
Mobile apps          No                                  Yes
Complexity           High (XML, schemas, signatures)     Medium (JWT, JSON)
Single Logout        Specified (rarely works well)       Optional, inconsistent

Use SAML when: You’re integrating with an enterprise SaaS that only supports SAML (Salesforce classic, legacy HR systems), or your IdP team mandates it.

Use OIDC when: You’re building a new application, integrating with a modern IdP, or need API-based token validation. OIDC is the default for everything new.

Use OAuth2 (Client Credentials) when: Service-to-service authentication with no user — your CI/CD pipeline authenticating to an API, your microservice calling another microservice.


A Complete Browser SSO Flow (OIDC)

1. User visits https://app.corp.com (not logged in)
   App: no session → redirect to IdP

2. GET https://idp.corp.com/authorize?
        response_type=code
        &client_id=app-corp
        &scope=openid email
        &redirect_uri=https://app.corp.com/callback
        &state=abc123
        &nonce=xyz789

3. IdP: user is not authenticated → show login form
   User: enters [email protected] + password
   (or: IdP sees existing session cookie → skip login)

4. IdP: authentication success
   Redirect: GET https://app.corp.com/callback?code=AUTH_CODE&state=abc123

5. App (server-side): validate state=abc123 (CSRF protection)
   POST https://idp.corp.com/token
     grant_type=authorization_code
     &code=AUTH_CODE
     &client_id=app-corp
     &client_secret=SECRET
     &redirect_uri=https://app.corp.com/callback

6. IdP responds:
   { "id_token": "JWT...", "access_token": "JWT...", "expires_in": 3600 }

7. App: validate id_token signature (using IdP's JWKS endpoint)
   App: extract sub, email, groups from id_token
   App: create session for [email protected]
   App: redirect user to original destination

Step 7 is where most bugs live. The app must validate: signature (using the IdP’s public keys from the jwks_uri advertised in /.well-known/openid-configuration), iss (matches the expected IdP), aud (matches the client_id), exp (not expired), and nonce (matches what was sent in step 2). Skip any of these and you have an authentication bypass.
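
The claim checks from step 7 can be sketched as one function. This covers only the claims: it assumes the signature has already been verified with a JOSE library against the IdP's published keys, and the 30-second leeway for clock drift is an assumption. The function name is illustrative.

```python
import time
from typing import Optional


def validate_id_token_claims(claims: dict, expected_iss: str, client_id: str,
                             expected_nonce: str, now: Optional[float] = None,
                             leeway: int = 30) -> list:
    """Return a list of claim failures; an empty list means the claims pass."""
    now = time.time() if now is None else now
    errors = []
    if claims.get("iss") != expected_iss:
        errors.append("iss mismatch")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if client_id not in audiences:
        errors.append("aud does not include client_id")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp + leeway <= now:
        errors.append("token expired or exp missing")
    if claims.get("nonce") != expected_nonce:
        errors.append("nonce mismatch")
    return errors


claims = {
    "iss": "https://idp.corp.com", "sub": "user-guid-12345", "aud": "app-corp",
    "exp": 1745730000, "iat": 1745726400, "nonce": "xyz789",
}
print(validate_id_token_claims(claims, "https://idp.corp.com", "app-corp",
                               "xyz789", now=1745726500))
```

Returning every failure, rather than stopping at the first, makes rejected logins easier to debug without weakening the check: any non-empty list means the token is refused.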


⚠ Common Misconceptions

“OAuth2 is for login.” OAuth2 is for authorization delegation. It can be used as a login mechanism only when OIDC (the openid scope + id_token) is added on top. “Login with Google” uses OIDC, not bare OAuth2.

“JWTs are encrypted.” By default, JWTs are signed (JWS), not encrypted. The header and payload are base64url-encoded — anyone can decode them. Encryption (JWE) is a separate, less commonly used spec. Never put secrets in a JWT payload assuming it’s private.

“SAML Single Logout works reliably.” SAML SLO is specified but inconsistently implemented. Many SPs ignore SLO requests or don’t propagate them correctly. Don’t depend on SLO for security — session revocation requires additional mechanisms (short-lived tokens, token introspection, session registries).


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | SAML, OAuth2, and OIDC are the three protocols that enable federated identity and SSO; understanding which does what is foundational to modern IAM design
CISSP Domain 4: Communications and Network Security | JWT validation (signature, claims, expiry) is a network security control; failing to validate any claim is an authentication bypass vulnerability
CISSP Domain 3: Security Architecture and Engineering | The choice of SAML vs OIDC is an architectural decision that affects every application integration, mobile support, and API design

Key Takeaways

  • SAML 2.0: XML-based browser SSO — three redirects, signed assertion, enterprise legacy apps
  • OAuth2: authorization delegation — access tokens grant scopes, not identity
  • OIDC: OAuth2 + id_token — adds who the user is on top of what they can do
  • sub is the stable user identifier in OIDC — never use email as a primary key
  • JWT validation must check: signature, iss, aud, exp, nonce — missing any is a security bypass
  • New applications: OIDC. Legacy enterprise SaaS: SAML. Service-to-service: OAuth2 Client Credentials

What’s Next

EP10 covered the protocols. EP11 covers the systems that implement them — the identity providers: what Okta, Entra ID, Keycloak, and AD FS actually do, how they federate with each other, and how SCIM handles user provisioning separately from authentication.

Next: Identity Providers Explained: On-Prem, Cloud, SCIM, and Federation


How Active Directory Works: LDAP, Kerberos, and Group Policy Under the Hood

Reading Time: 6 minutes

The Identity Stack, Episode 9


TL;DR

  • Active Directory is not a product that happens to use LDAP — it is an LDAP directory with a Microsoft-extended schema, a built-in Kerberos KDC, and DNS tightly integrated
  • Replication uses USNs (Update Sequence Numbers) and GUIDs — the Knowledge Consistency Checker (KCC) automatically builds the replication topology
  • Sites and site links tell AD which DCs are physically close — AD prefers to authenticate users against a DC in the same site to minimize WAN latency
  • Group Policy Objects (GPOs) are stored as LDAP entries (in the CN=Policies container) and Sysvol files — LDAP tells clients which GPOs apply; Sysvol delivers the policy files
  • Linux joins AD via realm join (uses adcli + SSSD) or net ads join (Samba + winbind) — both register a machine account in AD and get a Kerberos keytab
  • The difference between Linux in AD and Linux in FreeIPA: AD is optimized for Windows; FreeIPA is optimized for Linux — both interoperate

The Big Picture: What AD Actually Is

Active Directory Domain: corp.com
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  LDAP directory          Kerberos KDC                      │
│  ─────────────           ──────────                        │
│  Schema: 1000+ classes   Realm: CORP.COM                   │
│  Objects: users, groups, Issues TGTs + service tickets     │
│  computers, GPOs, OUs    Uses LDAP as the account DB       │
│                                                            │
│  DNS                     Sysvol (DFS share)                │
│  ────                    ────────────────                  │
│  SRV records for KDC     GPO templates                     │
│  and LDAP discovery      Login scripts                     │
│                          Replicated via DFSR               │
│                                                            │
│  Replication engine: USN + GUID + KCC                      │
└────────────────────────────────────────────────────────────┘
          │ replicates to          │ replicates to
          ▼                        ▼
   DC: dc02.corp.com        DC: dc03.corp.com

EP08 showed FreeIPA as the Linux-native answer to enterprise identity. AD is the Microsoft answer — and because most enterprises run Windows clients, understanding AD is unavoidable for Linux infrastructure engineers. This episode goes behind the LDAP and Kerberos protocols to explain what makes AD specifically work.


The AD Schema: LDAP With 1000+ Object Classes

AD’s schema extends the base LDAP schema with Microsoft-specific classes and attributes. Every user object is a user class (which extends organizationalPerson which extends person which extends top) with additional attributes like:

sAMAccountName   ← the pre-Windows 2000 login name (vamshi)
userPrincipalName ← the modern UPN ([email protected])
objectGUID       ← a globally unique 128-bit identifier (never changes, even if DN changes)
objectSid        ← Windows Security Identifier (used for ACL enforcement on Windows)
whenCreated      ← creation timestamp
pwdLastSet       ← password change timestamp
userAccountControl ← bitmask: disabled, locked, password never expires, etc.
memberOf         ← back-link: groups this user belongs to

objectGUID is the authoritative identifier in AD — not the DN. When a user is renamed or moved to a different OU, the GUID stays the same. Applications that store a user’s DN will break on rename; applications that store the GUID won’t.
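
If you store the GUID, you need it in a readable form. ldapsearch prints binary attributes base64-encoded (objectGUID:: ...), and AD stores the first three GUID fields little-endian. A minimal sketch of the conversion; the function name is illustrative and the sample value is generated locally, not taken from a real directory.

```python
import base64
import uuid


def object_guid_to_str(b64_value: str) -> str:
    """Convert a base64 objectGUID from ldapsearch into canonical GUID text."""
    raw = base64.b64decode(b64_value)
    return str(uuid.UUID(bytes_le=raw))  # bytes_le handles AD's mixed-endian layout


# Build a sample value the way AD would store it, then convert it back.
sample = uuid.UUID("12345678-1234-5678-1234-567812345678")
ldap_value = base64.b64encode(sample.bytes_le).decode()
print(object_guid_to_str(ldap_value))
```

Using uuid.UUID(bytes=...) instead would silently produce a different, wrong GUID with the first three fields byte-swapped.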

userAccountControl is the bitmask that controls account state:

Flag                 Value   Meaning
ACCOUNTDISABLE       2       Account disabled
LOCKOUT              16      Account locked out
PASSWD_NOTREQD       32      Password not required
NORMAL_ACCOUNT       512     Normal user account (set on almost all accounts)
DONT_EXPIRE_PASSWD   65536   Password never expires

# Query AD from a Linux machine
ldapsearch -x -H ldap://dc.corp.com \
  -D "[email protected]" -w password \
  -b "dc=corp,dc=com" \
  "(sAMAccountName=vamshi)" \
  sAMAccountName userPrincipalName objectGUID memberOf userAccountControl

Replication: USN + GUID + KCC

AD replication is multi-master — every DC accepts writes. The replication engine uses:

USN (Update Sequence Number): a per-DC counter that increments on every local write. Each object records the USN at which it was created and last changed (uSNCreated, uSNChanged), and per-attribute USNs are kept in the object’s replication metadata. When DC-B pulls changes from DC-A, DC-B asks: “give me everything you’ve changed since the last USN I saw from you.”

GUID: each object has a globally unique identifier. If the same attribute is modified on two DCs before replication (a conflict), the conflict is resolved at the attribute level: the higher attribute version number wins; if the versions are equal, the later modification timestamp wins; if the timestamps are also equal, the value from the originating DC with the higher GUID wins.

KCC (Knowledge Consistency Checker): a component that runs on every DC and automatically constructs the replication topology. You don’t configure which DCs replicate to which. Within a site, the KCC builds a ring with enough extra connections that every DC is reachable from every other within a small number of hops; between sites, it builds a least-cost spanning tree from the site link costs. You configure Sites and site links; the KCC does the rest.
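
The USN pull and the conflict rule can be modeled in a few lines. This is a toy model, not AD's implementation: it assumes the tie-break order of version, then timestamp, then originating DC GUID, and it compares GUIDs as integers, which is an assumption about ordering. All class and function names are illustrative.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class AttributeChange:
    usn: int              # USN the source DC assigned to this write
    attribute: str
    value: str
    version: int          # bumped on every originating write to the attribute
    timestamp: int        # originating write time, seconds since the epoch
    originating_dc: UUID  # GUID of the DC where the write originated


def changes_since(log: list, high_watermark: int) -> list:
    """What a pulling DC asks for: everything above the last USN it saw."""
    return sorted((c for c in log if c.usn > high_watermark), key=lambda c: c.usn)


def resolve(a: AttributeChange, b: AttributeChange) -> AttributeChange:
    """Pick the surviving value: version, then timestamp, then DC GUID."""
    def stamp(change: AttributeChange) -> tuple:
        return (change.version, change.timestamp, change.originating_dc.int)
    return a if stamp(a) >= stamp(b) else b


DC_A = UUID("11111111-1111-1111-1111-111111111111")
DC_B = UUID("22222222-2222-2222-2222-222222222222")

log_a = [
    AttributeChange(101, "title", "Engineer", 2, 1745726400, DC_A),
    AttributeChange(102, "telephoneNumber", "1001", 1, 1745726410, DC_A),
]
pending = changes_since(log_a, high_watermark=101)  # DC-B already saw USN 101

# DC-B wrote the same attribute locally, with equal version and timestamp.
local = AttributeChange(57, "telephoneNumber", "2002", 1, 1745726410, DC_B)
print(resolve(pending[0], local).value)
```

Note that USNs are only meaningful per DC: DC-B's local USN 57 is never compared with DC-A's 102. The USN decides what to transfer; the stamp decides which value survives.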

# Check replication status from a Linux machine (requires rpcclient or adcli)
# Or on the DC: repadmin /showrepl (Windows tool)

# Simulate: query the highestCommittedUSN from a DC
ldapsearch -x -H ldap://dc.corp.com \
  -D "[email protected]" -w password \
  -b "" -s base highestCommittedUSN

Sites are AD’s concept of physical network topology. A site is a set of IP subnets with high-bandwidth connectivity between them. Site links represent the WAN connections between sites.

Site: Mumbai              Site: Hyderabad
┌────────────────┐        ┌────────────────┐
│ DC: dc-mum-01  │        │ DC: dc-hyd-01  │
│ DC: dc-mum-02  │        │ DC: dc-hyd-02  │
│ subnet: 10.1/16│        │ subnet: 10.2/16│
└───────┬────────┘        └────────┬───────┘
        │                          │
        └──── Site Link ───────────┘
              Cost: 100
              Replication interval: 15 min

When a user in Mumbai authenticates, the client locates a DC in the same site using DNS SRV records. The SRV records include the site name in the service name: _ldap._tcp.Mumbai._sites.dc._msdcs.corp.com. SSSD and Windows clients query site-local SRV records first.

If no DC is available in the local site, authentication falls back to a DC in another site across the WAN link. Configuring sites correctly prevents remote authentication failures from killing local operations.
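
The lookup order can be sketched as a list of SRV names to try. This only builds the names in the pattern shown above; it performs no DNS queries, and the function name is illustrative.

```python
from typing import Optional


def dc_locator_srv_names(domain: str, site: Optional[str] = None) -> list:
    """SRV names to query, site-local first, then any DC in the domain."""
    names = []
    if site:
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")  # domain-wide fallback
    return names


for name in dc_locator_srv_names("corp.com", site="Mumbai"):
    print(name)
```

To check what a client would actually receive, query each name with a DNS tool such as dig (record type SRV) from a machine in that site.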


Group Policy: LDAP + Sysvol

GPOs are stored in two places:

LDAP — the CN=Policies,CN=System,DC=corp,DC=com container holds GPO metadata objects. Each GPO has a GUID, a display name, and version numbers. The gPLink attribute on OUs and the domain root links GPOs to where they apply.

Sysvol — the actual policy templates and scripts live in \\corp.com\SYSVOL\corp.com\Policies\{GPO-GUID}\. Sysvol is a DFS-R (Distributed File System Replication) share replicated to every DC.

When a Windows client applies Group Policy:
1. LDAP query: what GPOs are linked to my OU chain?
2. Sysvol fetch: download the policy templates from the GPO’s Sysvol path
3. Apply: process Registry settings, Security settings, Scripts

Linux clients don’t process GPOs natively. SSSD can evaluate one narrow slice of Group Policy, the logon-rights settings, through its ad_gpo_access_control option; password policy and account lockout are enforced by the DC itself at authentication time. Full GPO processing on Linux requires Samba’s samba-gpupdate or third-party tools.
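
The two halves of a GPO meet in the gPLink value. A minimal sketch that parses a gPLink string and derives the matching Sysvol path, assuming the usual [LDAP://<GPO DN>;<options>] layout where option bit 1 means the link is disabled and bit 2 means it is enforced. The function names are illustrative, and the sample value is hand-written rather than read from a directory.

```python
import re

GPLINK_ENTRY = re.compile(r"\[LDAP://(?P<dn>[^;\]]+);(?P<options>\d+)\]", re.IGNORECASE)
GUID_IN_DN = re.compile(r"\{[0-9A-Fa-f-]{36}\}")


def parse_gplink(value: str) -> list:
    """Split a gPLink attribute value into one dict per linked GPO."""
    links = []
    for match in GPLINK_ENTRY.finditer(value):
        options = int(match.group("options"))
        guid = GUID_IN_DN.search(match.group("dn"))
        links.append({
            "dn": match.group("dn"),
            "guid": guid.group(0) if guid else None,
            "disabled": bool(options & 1),
            "enforced": bool(options & 2),
        })
    return links


def sysvol_path(domain: str, gpo_guid: str) -> str:
    """UNC path where the policy files for this GPO live."""
    return "\\".join(["", "", domain, "SYSVOL", domain, "Policies", gpo_guid])


sample = (
    "[LDAP://cn={31B2F340-016D-11D2-945F-00C04FB984F9},cn=policies,cn=system,DC=corp,DC=com;0]"
    "[LDAP://cn={AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE},cn=policies,cn=system,DC=corp,DC=com;2]"
)
for link in parse_gplink(sample):
    print(link["guid"], link["enforced"], sysvol_path("corp.com", link["guid"]))
```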


Joining Linux to AD

# Install required packages
dnf install -y realmd sssd adcli samba-common

# Discover the domain
realm discover corp.com
# corp.com
#   type: kerberos
#   realm-name: CORP.COM
#   domain-name: corp.com
#   configured: no
#   server-software: active-directory
#   client-software: sssd

# Join
realm join corp.com -U Administrator
# Prompts for Administrator password
# Creates machine account in AD
# Configures sssd.conf, krb5.conf, nsswitch.conf, pam.d automatically

# Verify
realm list
id [email protected]

What the join does:

  1. Creates a machine account HOSTNAME$ in CN=Computers,DC=corp,DC=com
  2. Sets a machine password (rotated automatically by SSSD)
  3. Retrieves a Kerberos keytab to /etc/krb5.keytab
  4. Configures SSSD with id_provider = ad, auth_provider = ad
  5. Updates /etc/nsswitch.conf to include sss
  6. Updates /etc/pam.d/ to include pam_sss

After joining, SSSD uses the machine’s Kerberos keytab to authenticate to the DC and query LDAP — no hardcoded service account credentials required.


LDAP Queries Against AD from Linux

# Find a user (run kinit first; -Y GSSAPI authenticates with your Kerberos ticket)
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(sAMAccountName=vamshi)" \
  sAMAccountName mail memberOf

# Find all members of a group
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(cn=engineers)" \
  member

# Find all AD-joined Linux machines
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(&(objectClass=computer)(operatingSystem=*Linux*))" \
  cn operatingSystem lastLogonTimestamp

# Find disabled accounts
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(userAccountControl:1.2.840.113556.1.4.803:=2)" \
  sAMAccountName

The last filter uses an LDAP extensible match with a Microsoft matching rule: 1.2.840.113556.1.4.803 is the OID for bitwise AND. userAccountControl:1.2.840.113556.1.4.803:=2 means “entries where userAccountControl AND 2 equals 2”, i.e., the ACCOUNTDISABLE bit is set. The extensible-match syntax is standard LDAP; the matching rule itself is a Microsoft AD extension.
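
The same bitwise test is easy to run client-side on a userAccountControl value you have already fetched. A minimal sketch using only the flags from the table earlier in this episode; the function names are illustrative.

```python
UAC_FLAGS = {
    "ACCOUNTDISABLE": 2,
    "LOCKOUT": 16,
    "PASSWD_NOTREQD": 32,
    "NORMAL_ACCOUNT": 512,
    "DONT_EXPIRE_PASSWD": 65536,
}


def decode_uac(value: int) -> list:
    """Names of the known flags whose bits are set in value."""
    return [name for name, bit in UAC_FLAGS.items() if value & bit]


def is_disabled(value: int) -> bool:
    """Same test as the LDAP filter: (value AND 2) equals 2."""
    return (value & UAC_FLAGS["ACCOUNTDISABLE"]) == UAC_FLAGS["ACCOUNTDISABLE"]


for uac in (512, 514, 66048):
    print(uac, decode_uac(uac), is_disabled(uac))
```

A plain equality filter such as (userAccountControl=2) would miss almost every disabled account, because NORMAL_ACCOUNT is usually set as well, giving 514 rather than 2.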


⚠ Common Misconceptions

“AD is just Microsoft’s LDAP.” AD is LDAP + Kerberos + DNS + DFS-R + GPO, all tightly integrated and with a schema that the Microsoft ecosystem depends on. You can query AD with standard ldapsearch. You cannot replace it with OpenLDAP without breaking every Windows client.

“Linux machines in AD get GPO.” Linux machines appear in AD and can be organized into OUs. Standard GPOs don’t apply to them. Samba’s samba-gpupdate can process a subset of AD policy for Linux — mostly Registry and Security settings mapped to Linux equivalents.

“realm leave removes the machine cleanly.” By default, realm leave removes local configuration but does not delete the machine account from AD. The stale computer object stays in CN=Computers until an AD admin deletes it. For a clean removal, run realm leave --remove -U Administrator, or delete the computer object separately (for example with adcli delete-computer).


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | AD is the dominant enterprise identity store; understanding its LDAP structure, Kerberos realm, and GPO model is essential for IAM in mixed environments
CISSP Domain 4: Communications and Network Security | AD replication traffic (RPC, LDAP, Kerberos) is a significant portion of enterprise WAN traffic; Sites and site links are a network security and performance design decision
CISSP Domain 3: Security Architecture and Engineering | AD forest/domain/OU hierarchy is an architectural decision with long-term security consequences; getting OU structure wrong constrains GPO delegation for years

Key Takeaways

  • AD is LDAP + Kerberos + DNS + GPO + DFS-R — not a product that “uses” these; they’re the implementation
  • Replication is multi-master via USN + GUID; the KCC builds the topology automatically from Sites configuration
  • objectGUID is the stable identifier — not the DN, which changes on rename/move
  • realm join is the correct way to join Linux to AD — it configures SSSD, Kerberos, PAM, and NSS correctly in one command
  • userAccountControl is the bitmask that controls account state — (userAccountControl:1.2.840.113556.1.4.803:=2) finds disabled accounts

What’s Next

EP09 covered AD — LDAP and Kerberos inside the corporate network. EP10 covers what happens when identity needs to work across the internet, where Kerberos doesn’t reach: SAML, OAuth2, and OIDC — the protocols that let identity leave the building.

Next: SAML vs OIDC vs OAuth2: Which Protocol Handles Which Identity Problem


FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack

Reading Time: 5 minutes

The Identity Stack, Episode 8


TL;DR

  • FreeIPA is 389-DS (LDAP) + MIT Kerberos + Dogtag PKI + Bind DNS + SSSD — one ipa-server-install command gets you an enterprise identity platform
  • Host-Based Access Control (HBAC) lets you define centrally: which users can SSH to which hosts — no more managing /etc/security/access.conf per machine
  • Sudo rules from the directory: define sudo policy centrally, have every machine pull it — no /etc/sudoers.d/ files scattered across the fleet
  • ipa CLI is the management interface — ipa user-add, ipa group-add, ipa hbacrule-add — everything that took five LDAP commands takes one ipa command
  • FreeIPA trusts with Active Directory let Linux machines authenticate AD users without joining the AD domain
  • The right choice for Linux-centric environments; AD is the right choice when Windows clients dominate

The Big Picture: What FreeIPA Integrates

┌─────────────────────────────────────────────────────────┐
│                    FreeIPA Server                        │
│                                                         │
│  389-DS (LDAP)    MIT Kerberos    Dogtag PKI            │
│  ─────────────    ───────────     ─────────             │
│  User/group       TGT + service   Machine certs         │
│  storage          ticket issuing  User certs             │
│                                   OCSP / CRL            │
│  Bind DNS         SSSD (client)   Apache (WebUI)        │
│  ──────────       ────────────    ──────────────        │
│  SRV records      Enrollment      Management UI         │
│  for KDC/LDAP     automation      JSON-RPC API          │
└─────────────────────────────────────────────────────────┘
              ▲                  ▲
              │ enrollment       │ SSH + sudo rules
   ┌──────────┴──────────┐  ┌───┴──────────────────┐
   │  Linux client        │  │  Linux client         │
   │  (ipa-client-install)│  │  (ipa-client-install) │
   └─────────────────────┘  └──────────────────────┘

EP06 and EP07 built OpenLDAP from components. FreeIPA gives you all of that plus Kerberos, PKI, DNS, and HBAC — opinionated, integrated, and managed through a single CLI and WebUI. This episode shows what you actually get from it.


Why FreeIPA Instead of Bare OpenLDAP

Running bare OpenLDAP requires you to:
– Configure schema for POSIX accounts, SSH keys, sudo rules, HBAC manually
– Set up MIT Kerberos separately and integrate it with LDAP
– Build your own PKI for machine certificates
– Maintain DNS SRV records for Kerberos discovery
– Write client enrollment scripts
– Build a management interface (or live in LDIF)

FreeIPA does all of this in one installer, with a consistent data model across all components. The trade-off is opacity — FreeIPA makes decisions for you (schema, replication topology, Kerberos realm name) that bare OpenLDAP leaves to you.


Installing FreeIPA Server

# RHEL / Rocky / AlmaLinux
dnf install -y freeipa-server freeipa-server-dns

# Run the installer (interactive)
ipa-server-install

# Or non-interactive:
ipa-server-install \
  --realm=CORP.COM \
  --domain=corp.com \
  --ds-password=DM_password \
  --admin-password=Admin_password \
  --setup-dns \
  --forwarder=8.8.8.8 \
  --unattended

# After install: get an admin Kerberos ticket
kinit admin

The installer creates:
– 389-DS instance with the FreeIPA schema
– MIT KDC with realm CORP.COM
– Dogtag CA and all certificate infrastructure
– Bind DNS with SRV records for the KDC and LDAP server
– Apache WebUI at https://ipa.corp.com/ipa/ui/
– SSSD configured on the server itself

Time: 5–10 minutes. What used to take a week of manual configuration.


The ipa CLI

Every management action goes through ipa. It talks to the IPA server’s JSON-RPC API over HTTPS and authenticates transparently with the Kerberos ticket from your kinit session.

# Users
ipa user-add vamshi \
  --first=Vamshi --last=Krishna \
  --email=vamshi@corp.com \
  --password

ipa user-show vamshi
ipa user-find --all              # list users, showing all attributes
ipa user-disable vamshi          # lock account without deleting
ipa user-mod vamshi --shell=/bin/zsh

# Groups
ipa group-add engineers --desc "Engineering team"
ipa group-add-member engineers --users=vamshi,alice

# Password policy
ipa pwpolicy-mod --minlength=12 --maxlife=90 --history=10

# SSH public keys: stored centrally, served to every enrolled host
ipa user-mod vamshi --sshpubkey="ssh-ed25519 AAAA..."
# sshd on enrolled hosts fetches this key through SSSD (sss_ssh_authorizedkeys), so no authorized_keys file is needed

Host-Based Access Control (HBAC)

HBAC is the feature that justifies FreeIPA for most Linux shops. It lets you define centrally: which users (or groups) can log in to which hosts (or host groups), using which services (SSH, sudo, FTP).

Without HBAC, access control is per-machine: /etc/security/access.conf or PAM pam_access rules, replicated across every server, managed inconsistently.

With HBAC: one rule, enforced everywhere.

# Create host groups
ipa hostgroup-add production-servers --desc "Production Linux hosts"
ipa hostgroup-add-member production-servers --hosts=web01.corp.com,db01.corp.com

# Create user groups
ipa group-add sre-team
ipa group-add-member sre-team --users=vamshi,alice

# Create an HBAC rule
ipa hbacrule-add allow-sre-to-prod \
  --desc "SRE team can SSH to production"
ipa hbacrule-add-user allow-sre-to-prod --groups=sre-team
ipa hbacrule-add-host allow-sre-to-prod --hostgroups=production-servers
ipa hbacrule-add-service allow-sre-to-prod --hbacsvcs=sshd

# Test the rule before applying
ipa hbactest \
  --user=vamshi \
  --host=web01.corp.com \
  --service=sshd
# Access granted: True
# Matched rules: allow-sre-to-prod

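SSSD on each enrolled host enforces the HBAC rules at login time by querying the IPA server. No per-machine configuration. Add a new server to the production-servers host group and the HBAC rules apply immediately. One caveat: a fresh FreeIPA install ships with an allow_all rule that lets every user reach every host, so restrictive rules only take effect once it is disabled (ipa hbacrule-disable allow_all).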
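
The matching logic behind a rule like allow-sre-to-prod can be sketched in a few lines of Python. This is a simplified model and not SSSD's actual implementation: HbacRule and matched_rules are hypothetical names, and real HBAC also supports "all" categories, nested groups, and individual users and hosts.

```python
from dataclasses import dataclass, field


@dataclass
class HbacRule:
    """Simplified stand-in for an IPA HBAC rule (hypothetical model)."""
    name: str
    user_groups: set = field(default_factory=set)
    host_groups: set = field(default_factory=set)
    services: set = field(default_factory=set)
    enabled: bool = True


def matched_rules(rules, user_groups, host_groups, service):
    """Return the names of enabled rules matching user, host, and service.

    HBAC is allow-only: access is granted if at least one rule matches
    and denied if none do.
    """
    return [
        r.name
        for r in rules
        if r.enabled
        and r.user_groups & set(user_groups)
        and r.host_groups & set(host_groups)
        and service in r.services
    ]


rules = [
    HbacRule(
        name="allow-sre-to-prod",
        user_groups={"sre-team"},
        host_groups={"production-servers"},
        services={"sshd"},
    )
]

# vamshi is in sre-team; web01.corp.com is in production-servers
granted = matched_rules(rules, ["sre-team"], ["production-servers"], "sshd")
print("Access granted:", bool(granted), "| Matched rules:", granted)

# Same user and host, but a service the rule does not list
denied = matched_rules(rules, ["sre-team"], ["production-servers"], "ftp")
print("Access granted:", bool(denied))
```

Because the model has no deny rules, tightening access always means removing or narrowing an allow rule, which is the same property ipa hbactest lets you check before a change goes live.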


Sudo Rules from the Directory

# Create a sudo rule
ipa sudorule-add allow-sre-sudo \
  --cmdcat=all \
  --desc "SRE team gets full sudo on production"
ipa sudorule-add-user allow-sre-sudo --groups=sre-team
ipa sudorule-add-host allow-sre-sudo --hostgroups=production-servers

# Or a scoped rule — only specific commands
ipa sudorule-add allow-service-restart
ipa sudocmdgroup-add service-commands
ipa sudocmd-add /usr/bin/systemctl
ipa sudocmdgroup-add-member service-commands --sudocmds="/usr/bin/systemctl"
ipa sudorule-add-allow-command allow-service-restart --sudocmdgroups=service-commands

On enrolled hosts, SSSD’s sssd_sudo responder pulls these rules and the sudo command evaluates them locally. No /etc/sudoers.d/ files. Central policy, local enforcement.


Enrolling a Client

# On the client machine
dnf install -y freeipa-client

ipa-client-install \
  --domain=corp.com \
  --server=ipa.corp.com \
  --realm=CORP.COM \
  --principal=admin \
  --password=Admin_password \
  --unattended

# What this does:
# 1. Registers the host in IPA as a machine principal
# 2. Retrieves a host Kerberos keytab (/etc/krb5.keytab)
# 3. Configures SSSD (sssd.conf, nsswitch.conf, pam.d)
# 4. Configures Kerberos (/etc/krb5.conf)
# 5. Optionally configures NTP and DNS

After enrollment: getent passwd vamshi returns the IPA user. SSH with an IPA password works. HBAC rules are enforced. Sudo rules from the directory apply. SSH public keys from the user’s IPA profile work without authorized_keys files.


FreeIPA Trust with Active Directory

In mixed environments (Linux servers + Windows clients), you can establish a trust between FreeIPA and AD without joining the Linux servers to the AD domain directly.

# On the IPA server (after installing ipa-server-trust-ad)
ipa-adtrust-install --netbios-name=CORP

# Establish the trust
ipa trust-add ad.corp.com \
  --admin=Administrator \
  --password \
  --type=ad

# AD users can now log in to IPA-enrolled Linux hosts
# They log in as username@ad.corp.com (or with the AD domain's NetBIOS name: NETBIOS\username)

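Under the hood: ipa-adtrust-install configures Samba on the IPA server so that AD sees FreeIPA as a trusted forest. The result is a cross-realm Kerberos trust: tickets issued by the AD KDC are accepted for services in the IPA realm, and SSSD maps each AD user to POSIX attributes (generated automatically from the SID via ID mapping, or read from AD when POSIX attributes are defined there).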


⚠ Common Misconceptions

“FreeIPA is just OpenLDAP with a UI.” FreeIPA uses 389-DS (not OpenLDAP), adds a full Kerberos KDC, a certificate authority, DNS, HBAC enforcement, and sudo management — all with a consistent schema designed for these use cases. It’s an integrated identity platform, not a wrapper.

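“HBAC rules replace firewall rules.” HBAC controls who can log in to a host. SSSD evaluates it during the PAM account phase, after authentication, and it has nothing to do with network access. When no HBAC rule allows the login, the SSH session is rejected only after the TCP connection and the authentication exchange have already happened. You still need firewall rules to block TCP access.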

“FreeIPA replicas are identical.” FreeIPA uses 389-DS Multi-Supplier replication. All replicas accept reads and writes. But the CA is separate — only the initial server (and explicitly designated CA replicas) run the CA. If the CA goes down, certificate operations stop; authentication does not.


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | FreeIPA is an enterprise IAM platform: HBAC, sudo policy, SSH key management, and certificate-based authentication are all IAM controls
CISSP Domain 3: Security Architecture and Engineering | FreeIPA’s integrated CA enables certificate-based authentication for machines and users, a stronger authentication factor than passwords
CISSP Domain 1: Security and Risk Management | Centralized HBAC and sudo policy reduce the attack surface for privilege escalation: no more inconsistent sudoers files that drift across the fleet

Key Takeaways

  • FreeIPA = 389-DS + MIT Kerberos + Dogtag PKI + Bind DNS — one installer, one management interface
  • HBAC rules define centrally who can SSH to which host groups — enforced by SSSD on every enrolled client, no per-machine config
  • Sudo rules from the directory replace scattered /etc/sudoers.d/ files — central policy, SSSD-enforced locally
  • ipa hbactest lets you verify access rules before a user hits a blocked login — use it before every policy change
  • For Linux-centric environments: FreeIPA. For Windows-dominant environments: AD. For mixed: FreeIPA trust with AD.

What’s Next

FreeIPA is the Linux answer to enterprise identity. EP09 covers the Microsoft answer — Active Directory — which extended LDAP and Kerberos into a complete enterprise platform with Group Policy, Sites, and a replication model built for global scale.

Next: How Active Directory Works: LDAP, Kerberos, and Group Policy Under the Hood


LDAP High Availability: Load Balancing and Production Architecture

Reading Time: 6 minutes

The Identity Stack, Episode 7


TL;DR

  • LDAP HA means multiple directory servers behind a load balancer — clients connect to a VIP, not to individual servers
  • Read/write split: all writes go to the provider, reads are distributed across consumers — the load balancer enforces this by routing on port or backend check
  • SSSD handles multi-server failover natively (ldap_uri accepts a comma-separated list) — for apps without built-in failover, HAProxy with health checks does the work
  • Connection pooling is critical at scale — nss_ldap and pam_ldap opened a new connection per login; SSSD maintains a pool; apps that use libldap directly must implement their own
  • cn=monitor is the built-in monitoring endpoint — exposes connection counts, operation rates, and backend stats readable via ldapsearch
  • 389-DS (the upstream of Red Hat Directory Server) is the production choice for 10M+ entries — purpose-built for large directories with a dedicated replication engine

The Big Picture: Production LDAP Topology

         Clients (SSSD, apps, VPN concentrators)
                      │
              ┌───────▼───────┐
              │   HAProxy VIP  │   ← single endpoint, port 389/636
              │  10.0.0.10     │
              └───────┬───────┘
                      │
          ┌───────────┼───────────┐
          ▼           ▼           ▼
   ldap1.corp.com  ldap2.corp.com  ldap3.corp.com
   (Provider)      (Consumer)      (Consumer)
   Reads + Writes  Reads only      Reads only
          │           ▲               ▲
          └───────────┴───────────────┘
               SyncRepl replication

EP06 built a two-node replicated directory. This episode covers what happens when the directory becomes infrastructure — when it needs to survive a node failure, handle thousands of connections, and be monitored like any other critical service.


HAProxy for LDAP

HAProxy is the standard choice for LDAP load balancing. Unlike HTTP, LDAP is a stateful protocol — once a client binds, subsequent operations on that connection share the authenticated session. The load balancer must use connection persistence, not per-request routing.

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 50000

defaults
    mode tcp                  # LDAP is TCP, not HTTP
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    option tcplog

# ── LDAP read/write split ─────────────────────────────────────────────

# Writes → provider only
frontend ldap-write
    bind *:389
    default_backend ldap-provider

backend ldap-provider
    balance first                   # always use first available (provider)
    option tcp-check
    tcp-check connect
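    server ldap1 ldap1.corp.com:389 check inter 5s rise 2 fall 3
    # backup only helps if ldap2 can accept writes (multi-provider);
    # a read-only consumer answers writes with a referral to the provider
    server ldap2 ldap2.corp.com:389 check inter 5s rise 2 fall 3 backup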

# Reads → all nodes round-robin
frontend ldap-read
    bind *:3389                     # internal read port
    default_backend ldap-consumers

backend ldap-consumers
    balance roundrobin
    option tcp-check
    tcp-check connect
    server ldap1 ldap1.corp.com:389 check inter 5s
    server ldap2 ldap2.corp.com:389 check inter 5s
    server ldap3 ldap3.corp.com:389 check inter 5s

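# LDAPS (TLS passthrough: clients negotiate TLS with the LDAP servers directly,
# so each server certificate must also cover the name clients connect to)
frontend ldaps
    bind *:636
    default_backend ldap-consumers-tls

backend ldap-consumers-tls
    balance roundrobin
    # No "ssl" keyword here: in TCP passthrough it would wrap the client's TLS
    # session in a second one. check-ssl encrypts only the health checks.
    server ldap1 ldap1.corp.com:636 check check-ssl inter 5s verify required ca-file /etc/ssl/certs/ca.pem
    server ldap2 ldap2.corp.com:636 check check-ssl inter 5s verify required ca-file /etc/ssl/certs/ca.pem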

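The health check (tcp-check connect) only verifies TCP connectivity. To confirm that slapd is actually answering LDAP requests, either switch to HAProxy's built-in option ldap-check, which sends an anonymous LDAPv3 bind and expects a successful response (anonymous binds must be allowed), or use a custom script that runs ldapsearch and checks the result code.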
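
What a protocol-level check does on the wire can be sketched in Python: send an anonymous LDAPv3 simple bind and accept the server only if the BindResponse carries resultCode 0. This is a minimal sketch under stated assumptions: bind_succeeded and check_server are hypothetical names, the parser only handles short-form BER lengths, and the server must permit anonymous binds.

```python
import socket

# BER-encoded LDAPv3 anonymous simple bind, message ID 1:
#   30 0c            SEQUENCE (LDAPMessage), 12 bytes
#     02 01 01       INTEGER messageID = 1
#     60 07          [APPLICATION 0] BindRequest, 7 bytes
#       02 01 03     INTEGER version = 3
#       04 00        OCTET STRING name = ""
#       80 00        [0] simple password = ""
ANONYMOUS_BIND = bytes.fromhex("300c020101600702010304008000")


def bind_succeeded(response: bytes) -> bool:
    """Return True if `response` is a BindResponse with resultCode 0.

    Minimal parser: assumes short-form BER lengths, which holds for the
    small responses servers send to an anonymous bind.
    """
    if len(response) < 10 or response[0] != 0x30:
        return False
    # After the 2-byte SEQUENCE header comes the messageID INTEGER.
    if response[2] != 0x02:
        return False
    pos = 2 + 2 + response[3]
    # 0x61 = [APPLICATION 1] BindResponse
    if len(response) < pos + 5 or response[pos] != 0x61:
        return False
    # First field of BindResponse: ENUMERATED resultCode (0x0a 0x01 <code>)
    return response[pos + 2:pos + 4] == b"\x0a\x01" and response[pos + 4] == 0


def check_server(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Connect, send the anonymous bind, and report whether it succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(ANONYMOUS_BIND)
            return bind_succeeded(sock.recv(1024))
    except OSError:
        return False


# A success response (resultCode 0, empty matchedDN and diagnosticMessage)
SUCCESS = bytes.fromhex("300c02010161070a010004000400")
# Same shape with resultCode 49 (invalidCredentials)
FAILURE = bytes.fromhex("300c02010161070a013104000400")
print(bind_succeeded(SUCCESS), bind_succeeded(FAILURE))
```

A check like this catches the case where the port is open but slapd no longer answers. It does not prove that searches against your suffix work; for that, a script that binds as a service account and reads a known entry is still the stronger test.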


SSSD Multi-Server Failover

SSSD has native failover — no load balancer required for SSSD-based clients:

# /etc/sssd/sssd.conf
[domain/corp.com]
ldap_uri = ldap://ldap1.corp.com, ldap://ldap2.corp.com, ldap://ldap3.corp.com
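# SSSD tries them in order and moves to the next one on failure
# Servers listed in ldap_backup_uri are used only while no primary is reachable;
# SSSD retries the primaries every failover_primary_timeout seconds (default: 31)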

# For AD, discovery via DNS SRV records is even better:
ad_server = _srv_
# SSSD queries _ldap._tcp.corp.com SRV records and gets all DCs automatically

SSSD monitors the connection health. If the current server becomes unreachable, it switches to the next in the list within seconds. Existing cached data keeps serving during the switchover. Clients using SSSD don’t need a load balancer for basic HA.
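
For applications that have no failover of their own, the same ordered-list behaviour can be approximated in a few lines. This is a rough sketch and not SSSD's algorithm: pick_server and tcp_probe are hypothetical helpers, and the probe only tests TCP reachability.

```python
import socket
from urllib.parse import urlparse


def tcp_probe(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the LDAP URI's host:port succeeds."""
    parsed = urlparse(uri)
    port = parsed.port or (636 if parsed.scheme == "ldaps" else 389)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(ldap_uri: str, probe=tcp_probe):
    """Return the first reachable server from a comma-separated list, or None."""
    for uri in (u.strip() for u in ldap_uri.split(",")):
        if uri and probe(uri):
            return uri
    return None


# Simulate ldap1 being down by injecting a fake probe instead of real sockets.
uris = "ldap://ldap1.corp.com, ldap://ldap2.corp.com, ldap://ldap3.corp.com"
down = {"ldap://ldap1.corp.com"}
chosen = pick_server(uris, probe=lambda uri: uri not in down)
print(chosen)
```

Passing the probe as a parameter keeps the selection logic testable without a network, and lets you swap in a real LDAP bind check where TCP reachability is not enough.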


Connection Pooling

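Every LDAP bind creates an authenticated session on the server, and every open connection holds a file descriptor plus any TLS state. slapd can only hold as many connections as its file-descriptor limit allows, and olcConnMaxPending / olcConnMaxPendingAuth (in OLC) cap how many requests may queue on a single connection before it is closed. Once those limits are hit, new work is rejected.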

The problem: applications that use libldap directly tend to open a new connection per operation. At 500 requests/second, that’s 500 new TCP connections, 500 binds, 500 TLS handshakes per second. If each of those connections stays open for even ten seconds before it is closed or times out, a directory that can handle 5000 concurrent connections starts refusing new ones.
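
The arithmetic behind that claim is Little's law: average concurrency equals arrival rate times the time each connection stays open. The sketch below uses the 500/s and 5000 figures from the text; the connection lifetimes are hypothetical values chosen for illustration.

```python
def concurrent_connections(new_per_second: float, seconds_open: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return new_per_second * seconds_open


RATE = 500            # new connections per second (from the text)
SERVER_LIMIT = 5000   # concurrent connections the server can hold (from the text)

# Hypothetical lifetimes: how long each connection stays open before closing.
for seconds_open in (0.05, 1, 10, 30):
    load = concurrent_connections(RATE, seconds_open)
    status = "OK" if load < SERVER_LIMIT else "at or over limit"
    print(f"{seconds_open:>5}s open -> {load:>7.0f} concurrent ({status})")

# Lifetime at which the limit is reached
print(SERVER_LIMIT / RATE, "seconds")
```

The takeaway is that request rate alone does not exhaust the server; slow closes, idle clients, and long timeouts do. Pooling attacks the problem from the other side by keeping the number of connections fixed regardless of request rate.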

The solutions:

SSSD — handles this automatically. SSSD maintains one or a small number of persistent connections per domain and multiplexes all PAM/NSS queries through them.

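Application-level pooling — keep a small set of long-lived, already-bound connections and reuse them. ldap3 offers this through its REUSABLE connection strategy; python-ldap has no built-in pool, so you build one around ReconnectLDAPObject; and for apps you can’t change, an LDAP proxy tier such as OpenLDAP’s lloadd (2.5 and later) can hold the backend connections for them.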

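Server-side limits in OpenLDAP — there is no single max-connections switch. olcConnMaxPending (default 100) and olcConnMaxPendingAuth (default 1000) cap the requests queued on one anonymous or authenticated connection, olcIdleTimeout closes connections that sit idle, and the file-descriptor limit of the slapd process bounds the total. Set these deliberately so you get a controlled failure mode instead of unbounded queuing.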
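
Application-level pooling itself is a small amount of code. The sketch below is library-agnostic and hypothetical: ConnectionPool is not part of any LDAP library, and fake_ldap_connect stands in for whatever "connect, start TLS, bind" looks like in your client. It does not handle broken connections, which a production pool must detect and replace.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that reuses connections created by `factory`."""

    def __init__(self, factory, size: int = 4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # connect + bind once, up front

    @contextmanager
    def connection(self, timeout: float = 5.0):
        """Borrow a connection; block up to `timeout` seconds if all are busy."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


# Stand-in for "open a TCP connection, negotiate TLS, bind".
opened = []


def fake_ldap_connect():
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_ldap_connect, size=2)
for _ in range(500):
    with pool.connection() as conn:
        pass  # run a search on `conn` here

print(len(opened))
```

Five hundred operations run over the two connections opened at startup. When every connection is busy, callers wait and eventually get queue.Empty, which is the controlled failure mode you want instead of a flood of new binds.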


Monitoring with cn=monitor

OpenLDAP exposes live operational statistics via the cn=monitor database — a virtual LDAP subtree that reflects the server’s current state. Enable it:

# enable-monitor.ldif
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/ldap
olcModuleLoad: back_monitor

dn: olcDatabase=monitor,cn=config
objectClass: olcDatabaseConfig
olcDatabase: monitor
olcAccess: to *
  by dn="cn=admin,dc=corp,dc=com" read
  by * none

Query it:

# Overall statistics
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=monitor" -s sub "(objectClass=*)" \
  monitorOpInitiated monitorOpCompleted

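# Connection counts (monitorCounter on cn=Total and cn=Current;
# each open connection also appears as its own entry)
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=Connections,cn=monitor" -s one \
  monitorCounter monitorConnectionNumber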

# Operations by type
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=Operations,cn=monitor" -s one \
  monitorOpInitiated monitorOpCompleted

Useful metrics to export to Prometheus (via prometheus-openldap-exporter or similar):
– monitorOpCompleted per operation type (bind, search, modify)
– monitorCounter on cn=Current,cn=Connections,cn=monitor: the current connection count (monitorConnectionNumber is only the ID of an individual connection)
– Backend-specific: olmMDBEntries, olmMDBPagesMax, olmMDBPagesUsed
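
If you collect these counters yourself instead of using an exporter, the work is parsing ldapsearch output and turning two samples into a rate. A minimal sketch: parse_ldif_counters and rate_per_second are hypothetical helpers, and the sample text only illustrates the shape of the output rather than being captured from a real server.

```python
def parse_ldif_counters(ldif_text: str, attribute: str) -> dict:
    """Map each entry's DN to the integer value of `attribute`.

    Handles the plain `attr: value` lines ldapsearch prints; base64 values
    (`attr:: ...`) and folded lines are not handled in this sketch.
    """
    counters = {}
    dn = None
    for line in ldif_text.splitlines():
        if line.startswith("dn: "):
            dn = line[4:]
        elif dn and line.startswith(attribute + ": "):
            counters[dn] = int(line.split(": ", 1)[1])
    return counters


def rate_per_second(before: int, after: int, seconds: float) -> float:
    """Turn two samples of a monotonically increasing counter into a rate."""
    return (after - before) / seconds


# Illustrative output shape, not captured from a real server.
SAMPLE = """\
dn: cn=Bind,cn=Operations,cn=Monitor
monitorOpInitiated: 1200
monitorOpCompleted: 1200

dn: cn=Search,cn=Operations,cn=Monitor
monitorOpInitiated: 98431
monitorOpCompleted: 98430
"""

completed = parse_ldif_counters(SAMPLE, "monitorOpCompleted")
print(completed)
print(rate_per_second(98430, 99630, 60), "searches/s")
```

The counters reset when slapd restarts, so a collector should treat a sample lower than the previous one as a restart rather than a negative rate.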


389-DS: LDAP at Scale

OpenLDAP is excellent for directories up to a few million entries. When you need:
– 10M+ entries
– High write throughput (more than a few hundred writes/second)
– Fine-grained replication filtering
– A dedicated web-based admin UI

…389-DS (Red Hat Directory Server, community edition) is the production answer. It’s what FreeIPA uses under the hood.

Key architectural differences from OpenLDAP:

Multi-supplier replication — 389-DS’s replication engine uses a dedicated changelog (kept in the backend database: Berkeley DB historically, LMDB in recent releases) and Change Sequence Numbers (CSNs) for conflict resolution. Multi-supplier (multi-master) replication is first-class, not a bolted-on feature.

Changelog — every change is written to a persistent changelog before being applied. This enables precise replication: a consumer can reconnect after a network partition and get exactly the changes it missed, rather than doing a full resync.

Plugin architecture — 389-DS functionality (replication, managed entries, DNA for automatic UID allocation, memberOf, password policy) is all implemented as plugins that can be enabled/disabled per directory instance.

# Install 389-DS
dnf install -y 389-ds-base

# Create a new instance
dscreate interactive
# — or use a template:
dscreate from-file /path/to/instance.inf

# Manage with dsctl
dsctl slapd-corp status
dsctl slapd-corp start
dsctl slapd-corp stop

# Admin with dsconf
dsconf slapd-corp backend suffix list
dsconf slapd-corp replication status --suffix "dc=corp,dc=com"

The dsconf replication status command gives a live view of replication lag across all suppliers and consumers — something OpenLDAP requires you to compute manually from contextCSN comparisons.


Global Catalog: Cross-Domain Search in AD

When your directory spans multiple AD domains in a forest, the Global Catalog solves a specific problem: a user in emea.corp.com needs to be found by an app that only knows corp.com.

Forest: corp.com
  ├── corp.com       → DC port 389    full directory: 500K entries
  ├── emea.corp.com  → DC port 389    full directory: 200K entries
  └── Global Catalog → GC port 3268  partial replica: 700K entries
                                       (not all attributes — just the most queried ones)

The GC replicates a subset of attributes from every domain in the forest. By default: cn, mail, sAMAccountName, userPrincipalName, memberOf, and about 150 others. Attributes marked with isMemberOfPartialAttributeSet in the schema are replicated to the GC.

If an application is configured to use port 3268 instead of 389, it’s using the GC — and it won’t see attributes not included in the partial attribute set. This surprises teams that add a custom attribute to AD and then wonder why their application can’t see it on 3268 but can on 389.


⚠ Production Gotchas

HAProxy TCP health checks don’t verify LDAP is responsive. A server can accept TCP connections but have slapd in a degraded state (database corruption, out-of-memory). Build a proper LDAP health check: a script that binds and searches a known entry and checks the result.

Replication lag under write load. SyncRepl consumers can fall behind under sustained write load. Monitor the contextCSN difference between provider and consumers. If consumers are more than a few seconds behind, investigate the provider’s write throughput and the consumer’s processing speed.

Directory size and the MDB mapsize. LMDB requires a pre-configured maximum database size (olcDbMaxSize). If the database grows beyond this, slapd starts failing writes. Set it to 2–4x your expected data size and monitor olmMDBPagesUsed / olmMDBPagesMax.
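
The mapsize rule of thumb and the usage ratio are easy to compute. In the sketch below, the 4096-byte page size, the 3x headroom, and the 80% alert threshold are assumptions for illustration, and the function names are hypothetical.

```python
PAGE_SIZE = 4096   # bytes; typical LMDB page size on Linux (assumption)
GIB = 1024 ** 3


def suggested_max_size(expected_data_bytes: int, headroom: float = 3.0) -> int:
    """Size olcDbMaxSize at a multiple (2-4x) of the expected data size."""
    return int(expected_data_bytes * headroom)


def usage_ratio(pages_used: int, pages_max: int) -> float:
    """Fraction of the map in use: olmMDBPagesUsed / olmMDBPagesMax."""
    return pages_used / pages_max


def needs_attention(ratio: float, threshold: float = 0.8) -> bool:
    """Flag the database once usage crosses the alert threshold."""
    return ratio >= threshold


max_size = suggested_max_size(2 * GIB)      # 2 GiB of data, 3x headroom
pages_max = max_size // PAGE_SIZE
pages_used = (2 * GIB) // PAGE_SIZE
ratio = usage_ratio(pages_used, pages_max)
print(max_size, f"{ratio:.0%}", needs_attention(ratio))
```

Alerting on the ratio rather than on absolute size means the alert still makes sense after you raise olcDbMaxSize.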


Key Takeaways

  • HAProxy in TCP mode provides LDAP load balancing — use balance first for write routing (provider only), balance roundrobin for reads
  • SSSD has native failover via ldap_uri — for SSSD clients, a load balancer adds HA but isn’t strictly required
  • cn=monitor is the built-in OpenLDAP monitoring endpoint — export its counters to Prometheus for operational visibility
  • 389-DS is the right choice for 10M+ entries, high write throughput, or multi-supplier replication as a first-class feature
  • Global Catalog (port 3268/3269) is a partial replica of all AD domains — useful for forest-wide searches, but missing non-replicated attributes

What’s Next

EP07 covers the infrastructure layer. EP08 zooms out to FreeIPA — what you get when LDAP, Kerberos, DNS, PKI, and HBAC are integrated into a single Linux-native identity stack, and why most Linux shops running their own directory should be running FreeIPA instead of bare OpenLDAP.

Next: FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack


OpenLDAP Setup and Replication: Running Your Own Directory

Reading Time: 5 minutes

The Identity Stack, Episode 6


TL;DR

  • OpenLDAP’s server process is slapd — the backend that stores data is MDB (LMDB), a memory-mapped B-tree that replaced the old Berkeley DB backend
  • Configuration lives in the directory itself: cn=config (OLC — Online Configuration) lets you modify slapd at runtime without restarting
  • SyncRepl is the replication protocol: a consumer subscribes to a provider and stays in sync via either polling (refreshOnly) or a persistent connection (refreshAndPersist)
  • Multi-Provider (formerly Multi-Master) lets multiple nodes accept writes — conflict resolution uses CSN (Change Sequence Number), last-writer-wins
  • The essential tools: slapd, ldapadd, ldapmodify, ldapsearch, slapcat, slaptest
  • Always build indexes on the attributes you search most — uid, cn, memberOf — or every search is a full scan

The Big Picture: slapd Architecture

ldapsearch / ldapadd / SSSD / any LDAP client
              │ TCP 389 / 636
              ▼
         ┌─────────────────────────────────┐
         │  slapd (OpenLDAP server)         │
         │                                 │
         │  Frontend (protocol layer)       │
         │    • parse BER requests          │
         │    • ACL enforcement             │
         │    • schema validation           │
         │                                 │
         │  Backend (storage layer)         │
         │    • MDB (LMDB) — default       │
         │    • memory-mapped file I/O      │
         │    • ACID transactions           │
         └────────────┬────────────────────┘
                      │
              /var/lib/ldap/
              data.mdb   (the directory data)
              lock.mdb   (LMDB lock file)

EP05 showed Kerberos in isolation. OpenLDAP is where you run the identity store that Kerberos references — and where SSSD looks up user and group attributes. This episode builds a working two-node replicated directory from scratch.


Installation

# Ubuntu / Debian
apt-get install -y slapd ldap-utils

# RHEL / Rocky / AlmaLinux
dnf install -y openldap-servers openldap-clients

# After install — Ubuntu runs a configuration wizard
# Re-run it at any time: dpkg-reconfigure slapd
# Or answer it once and then switch to OLC management

On RHEL-family systems, slapd is not configured after install — you work entirely through OLC from the start.


OLC: The Directory Configures Itself

The old way was slapd.conf — a static file that required a full restart on every change. OLC (Online Configuration) replaced it: slapd’s own configuration is stored as LDAP entries under cn=config. You modify configuration the same way you modify data — with ldapmodify. Changes take effect immediately.

cn=config                        ← root config entry
├── cn=schema,cn=config          ← schema definitions
│     ├── cn={0}core             ← core schema
│     ├── cn={1}cosine           ← RFC 1274 attributes
│     └── cn={2}inetorgperson    ← inetOrgPerson object class
├── olcDatabase={-1}frontend     ← default settings for all databases
├── olcDatabase={0}config        ← the config database itself
└── olcDatabase={1}mdb           ← your actual directory data
      ├── olcAccess              ← ACLs
      ├── olcSuffix              ← base DN (e.g., dc=corp,dc=com)
      └── olcDbIndex             ← search indexes

Everything under cn=config has attributes prefixed with olc (OpenLDAP Configuration). You query and modify it just like any other LDAP subtree — with one restriction: only the cn=config admin (usually gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth — the local root via SASL EXTERNAL) can write to it.


Bootstrapping a Directory

The quickest way to get a working directory is a set of LDIF files applied in order.

1. Load schemas

# Apply the schemas OpenLDAP ships with
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/inetorgperson.ldif
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/nis.ldif       # adds posixAccount, posixGroup

2. Configure the MDB database

# mdb-config.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=corp,dc=com
-
replace: olcRootDN
olcRootDN: cn=admin,dc=corp,dc=com
-
replace: olcRootPW
olcRootPW: {SSHA}hashed_password_here

Generate the hash: slappasswd -s yourpassword

ldapmodify -Y EXTERNAL -H ldapi:/// -f mdb-config.ldif
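
What an {SSHA} value contains can be shown in a few lines of Python: a salted SHA-1 digest with the salt appended, all base64-encoded. This is a sketch of the scheme for understanding and verification, with hypothetical function names; the 4-byte salt length is an assumption about slappasswd's output, and you should keep using slappasswd to generate real values.

```python
import base64
import hashlib
import os


def make_ssha(password: str, salt: bytes = None) -> str:
    """Build an {SSHA} value: base64(SHA1(password + salt) + salt)."""
    if salt is None:
        salt = os.urandom(4)
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return "{SSHA}" + base64.b64encode(digest + salt).decode("ascii")


def check_ssha(password: str, stored: str) -> bool:
    """Verify a password against an {SSHA} value."""
    raw = base64.b64decode(stored[len("{SSHA}"):])
    digest, salt = raw[:20], raw[20:]   # SHA-1 digests are 20 bytes
    return hashlib.sha1(password.encode("utf-8") + salt).digest() == digest


hashed = make_ssha("yourpassword")
print(hashed)
print(check_ssha("yourpassword", hashed), check_ssha("wrong", hashed))
```

Because the salt is random, two runs produce different strings for the same password; both verify correctly, since the salt travels inside the stored value.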

3. Add indexes

# indexes.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: uid eq,pres
olcDbIndex: cn eq,sub
olcDbIndex: sn eq,sub
olcDbIndex: mail eq
olcDbIndex: memberOf eq
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq

The last two (entryCSN, entryUUID) are required for SyncRepl replication to work efficiently.

4. Load initial data

# base.ldif
dn: dc=corp,dc=com
objectClass: top
objectClass: dcObject
objectClass: organization
o: Corp
dc: corp

dn: ou=people,dc=corp,dc=com
objectClass: organizationalUnit
ou: people

dn: ou=groups,dc=corp,dc=com
objectClass: organizationalUnit
ou: groups

dn: uid=vamshi,ou=people,dc=corp,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
cn: Vamshi Krishna
sn: Krishna
uid: vamshi
uidNumber: 1001
gidNumber: 1001
homeDirectory: /home/vamshi
loginShell: /bin/bash
mail: vamshi@corp.com
userPassword: {SSHA}hashed_password_here

ldapadd -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" \
  -w adminpassword \
  -f base.ldif

ACLs: Who Can Read What

OpenLDAP ACLs are evaluated top-to-bottom; first match wins.

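# acls.ldif — set via OLC
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcAccess
# Users can change their own passwords; the replication account must be able
# to read them, or consumers receive entries without userPassword
olcAccess: to attrs=userPassword
  by self write
  by dn="cn=repl-svc,dc=corp,dc=com" read
  by anonymous auth
  by * none
# Any authenticated user can read entries under ou=people
# (dn.base would match only the ou=people entry itself)
olcAccess: to dn.subtree="ou=people,dc=corp,dc=com"
  by self read
  by users read
  by * none
# Service and replication accounts can read everything else
olcAccess: to *
  by dn="cn=svc-ldap,ou=services,dc=corp,dc=com" read
  by dn="cn=repl-svc,dc=corp,dc=com" read
  by self read
  by * none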

A service account (cn=svc-ldap) that SSSD uses to search the directory needs read access to ou=people and ou=groups. Never give SSSD admin (write) access.
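
First-match-wins is easier to reason about with a small model. The sketch below is a toy evaluator and not slapd's implementation: evaluate is a hypothetical function, the ACL is a simplified two-clause version of the rules above, and None stands for an anonymous requester.

```python
def evaluate(acl, entry_dn, attr, requester):
    """Return the access level granted by the first matching ACL clause.

    `acl` is an ordered list of (target_matches, by_clauses) pairs; each
    by-clause is a (who_matches, level) pair. Evaluation stops at the first
    target that matches and, inside it, at the first `by` that matches.
    """
    for target_matches, by_clauses in acl:
        if target_matches(entry_dn, attr):
            for who_matches, level in by_clauses:
                if who_matches(requester, entry_dn):
                    return level
            return "none"   # implicit trailing "by * none"
    return "none"           # no clause matched at all


SVC = "cn=svc-ldap,ou=services,dc=corp,dc=com"

acl = [
    # to attrs=userPassword: self write, anonymous auth, everyone else none
    (lambda dn, attr: attr == "userPassword",
     [(lambda who, dn: who == dn, "write"),
      (lambda who, dn: who is None, "auth"),
      (lambda who, dn: True, "none")]),
    # to *: service account read, self read, everyone else none
    (lambda dn, attr: True,
     [(lambda who, dn: who == SVC, "read"),
      (lambda who, dn: who == dn, "read"),
      (lambda who, dn: True, "none")]),
]

vamshi = "uid=vamshi,ou=people,dc=corp,dc=com"
print(evaluate(acl, vamshi, "userPassword", vamshi))  # user on own password
print(evaluate(acl, vamshi, "userPassword", SVC))     # service account on a password
print(evaluate(acl, vamshi, "mail", SVC))             # service account on mail
print(evaluate(acl, vamshi, "mail", None))            # anonymous on mail
```

The service account is listed with read access in the catch-all clause, yet it gets nothing on userPassword, because the more specific clause comes first and ends evaluation. Reordering clauses changes the policy, which is why the order in olcAccess matters as much as the content.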


SyncRepl Replication

SyncRepl is a pull-based replication protocol built on the LDAP Sync operation (RFC 4533). A consumer connects to a provider and requests changes. The provider sends them. The consumer stays in sync.

On the Provider: Enable the syncprov overlay

# syncprov.ldif
# olcSpCheckpoint: checkpoint every 100 ops or 10 minutes
# olcSpSessionLog: keep last 100 changes for delta-sync
# (LDIF has no inline comments: text after a value becomes part of the value)
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpCheckpoint: 100 10
olcSpSessionLog: 100

ldapadd -Y EXTERNAL -H ldapi:/// -f syncprov.ldif

On the Consumer: Configure syncrepl

# consumer-config.ldif
# type:         refreshAndPersist keeps a persistent connection (refreshOnly polls)
# retry:        5 retries 5s apart, then every 60s forever
# interval:     used by refreshOnly only; sync every 5 minutes
# olcUpdateRef: referral that redirects writes to the provider
# Comments must stay outside the folded olcSyncrepl value.
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=001
  provider=ldap://ldap1.corp.com:389
  bindmethod=simple
  binddn="cn=repl-svc,dc=corp,dc=com"
  credentials=replication-password
  searchbase="dc=corp,dc=com"
  scope=sub
  schemachecking=on
  type=refreshAndPersist
  retry="5 5 60 +"
  interval=00:00:05:00
-
add: olcUpdateRef
olcUpdateRef: ldap://ldap1.corp.com

refreshAndPersist keeps a persistent connection open. Changes replicate within milliseconds. refreshOnly polls on an interval — simpler, but adds latency.
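
The retry string is a list of interval and count pairs, which is easy to misread. The sketch below expands a spec into the actual sequence of waits; retry_delays is a hypothetical helper written for illustration, not code from OpenLDAP.

```python
from itertools import islice


def retry_delays(spec: str):
    """Yield the wait (in seconds) before each reconnect attempt.

    `spec` is a list of "<interval> <count>" pairs; a count of "+" means
    "repeat this interval indefinitely".
    """
    parts = spec.split()
    if len(parts) % 2:
        raise ValueError("retry spec must be <interval> <count> pairs")
    for interval, count in zip(parts[0::2], parts[1::2]):
        if count == "+":
            while True:
                yield int(interval)
        for _ in range(int(count)):
            yield int(interval)


first_eight = list(islice(retry_delays("5 5 60 +"), 8))
print(first_eight)
```

With "5 5 60 +", a consumer that loses its provider retries quickly for the first 25 seconds and then backs off to once a minute, so a brief provider restart heals fast while a long outage does not generate a reconnect storm.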

Verify Replication

# On provider: check the contextCSN (the sync state token)
ldapsearch -x -H ldap://ldap1.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "dc=corp,dc=com" -s base contextCSN
# contextCSN: 20260427010000.000000Z#000000#000#000000

# On consumer: should match after sync
ldapsearch -x -H ldap://ldap2.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "dc=corp,dc=com" -s base contextCSN
# Same CSN = in sync
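
When the two values differ, the leading timestamp tells you how far behind the consumer is. A minimal sketch, assuming a single contextCSN value per server (a multi-provider setup has one per server ID, which should be compared pairwise); the consumer value below is hypothetical.

```python
from datetime import datetime, timezone


def csn_time(csn: str) -> datetime:
    """Extract the UTC timestamp from a CSN string."""
    stamp = csn.split("#", 1)[0]            # e.g. 20260427010000.000000Z
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def lag_seconds(provider_csn: str, consumer_csn: str) -> float:
    """Seconds by which the consumer's last applied change trails the provider's."""
    return (csn_time(provider_csn) - csn_time(consumer_csn)).total_seconds()


provider = "20260427010000.000000Z#000000#000#000000"
consumer = "20260427005957.500000Z#000000#000#000000"   # hypothetical value
print(lag_seconds(provider, consumer))
```

This measures the gap between the last change each side has seen, not wall-clock delay: on a directory with no recent writes, both values stay equal no matter how long ago the last change happened.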

Multi-Provider: Accepting Writes on Both Nodes

Standard SyncRepl has one provider and one or more consumers — only the provider accepts writes. Multi-Provider (formerly Multi-Master) lets every node accept writes.

# On each node — add mirrormode to the database config
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcMirrorMode
olcMirrorMode: TRUE

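With mirrormode enabled, a unique olcServerID set on each node, and each node configured as both provider and consumer of the other, writes on either node replicate to the other. (OpenLDAP 2.5 and later call this setting olcMultiProvider.) Conflict resolution is CSN-based (Change Sequence Number): each change carries a timestamp plus the originating server’s ID, so the nodes’ clocks must be kept in sync. Last write wins at the attribute level.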

Multi-Provider does not prevent split-brain conflicts — if two clients write the same attribute on two different nodes during a network partition, the higher CSN wins when the partition heals. For most directory use cases (user passwords, group memberships), this is acceptable. For others, it requires careful thought.
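
The conflict rule itself is a one-line comparison. The sketch below models it with hypothetical values: two nodes each accepted a write to the same attribute during a partition, and the change with the higher CSN survives once they exchange changes. It illustrates the rule and is not OpenLDAP's code.

```python
def newer(a: dict, b: dict) -> dict:
    """Last-writer-wins: return whichever change carries the higher CSN.

    CSN fields are fixed-width, so plain string comparison orders them by
    timestamp first, then change counter, then server ID.
    """
    return a if a["csn"] > b["csn"] else b


# Two writes to the same attribute during a partition (hypothetical values).
on_ldap1 = {"loginShell": "/bin/zsh",
            "csn": "20260427010000.000000Z#000000#001#000000"}
on_ldap2 = {"loginShell": "/bin/bash",
            "csn": "20260427010003.000000Z#000000#002#000000"}

# Both nodes apply the same rule, so they converge on the same value
# no matter which side each change arrives from.
print(newer(on_ldap1, on_ldap2)["loginShell"])
print(newer(on_ldap2, on_ldap1)["loginShell"])
```

The write that loses is discarded silently; neither client gets an error. That is the part that "requires careful thought" for data where an overwritten update matters.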


⚠ Production Gotchas

MDB data file grows monotonically. LMDB never shrinks the data file automatically. Deleted entries leave free space inside the file that gets reused, but the file on disk doesn’t shrink. Use slapcat to export and slapadd to reimport if you need to reclaim disk space.

slapcat is the only safe backup. slapcat reads the MDB database directly and exports LDIF — it does not go through slapd. Run it while slapd is running (LMDB is MVCC-safe for readers), but never copy the raw MDB files while slapd is running.

Schema changes on a replicated directory require coordination. Schema lives in cn=config, which SyncRepl does not replicate unless you have set up cn=config replication separately. If a consumer receives an entry that uses schema it does not have yet, the import fails. Load schemas manually on all nodes before adding entries that use them.


Key Takeaways

  • OpenLDAP uses LMDB (MDB backend) — a memory-mapped, ACID-compliant storage engine with no external dependency
  • OLC (cn=config) is the right way to configure slapd — changes apply without restarts
  • SyncRepl pulls changes from a provider to a consumer — refreshAndPersist for near-real-time, refreshOnly for poll-based
  • Always index uid, cn, entryCSN, and entryUUID — unindexed searches are full scans
  • Multi-Provider allows writes on all nodes with CSN-based last-write-wins conflict resolution

What’s Next

A single OpenLDAP server works. Two nodes with SyncRepl work better. EP07 goes further: how you put multiple LDAP servers behind a load balancer, how connection pooling works, what to monitor, and how 389-DS handles directories with tens of millions of entries.

Next: LDAP High Availability: Load Balancing and Production Architecture


How Kerberos Works: Tickets, KDC, and Why Enterprises Use It With LDAP

Reading Time: 7 minutes

The Identity Stack, Episode 5


TL;DR

  • Kerberos is a network authentication protocol — it proves identity without sending passwords over the network, using time-limited cryptographic tickets
  • Three actors: the client, the KDC (Key Distribution Center), and the service — the KDC issues tickets; clients use tickets to authenticate to services
  • The ticket flow: AS-REQ (get a TGT) → TGS-REQ (exchange TGT for a service ticket) → AP-REQ (present service ticket to the target service)
  • A TGT (Ticket-Granting Ticket) is a session credential — it lets you request service tickets without re-entering your password for the lifetime of the ticket (10 hours by default in Active Directory, 24 hours in MIT Kerberos)
  • LDAP + Kerberos together: LDAP stores identity (who you are), Kerberos authenticates it (proves you are who you say you are) — Active Directory is exactly this combination
  • kinit, klist, kdestroy are the hands-on tools — run them and read the ticket output

The Big Picture: Three Actors, Three Steps

         1. AS-REQ / AS-REP
Client ◄────────────────────► AS (Authentication Server)
  │                                     │
  │    (part of KDC)                    │
  │                                     ▼
  │         2. TGS-REQ / TGS-REP   TGS (Ticket-Granting Server)
  ├───────────────────────────────────►│
  │         (part of KDC)              │
  │                                    │
  │    3. AP-REQ / AP-REP              │
  └─────────────────────────────► Service (SSH, LDAP, NFS, HTTP...)

KDC = AS + TGS (usually the same process, same machine)

EP04 mentioned Kerberos tickets and clock skew requirements without explaining the protocol. This episode explains why Kerberos was invented, what a ticket actually is, and how the three-step flow works — so that when SSSD says “KDC unreachable” or kinit fails with “pre-authentication required,” you know exactly what’s happening.


The Problem Kerberos Was Built to Solve

MIT’s Project Athena started in 1983 — a campus-wide computing initiative giving students access to thousands of workstations. The problem: how do you authenticate a student at workstation 847 to a file server across campus without sending their password over the network?

In 1988, Steve Miller and Clifford Neuman published Kerberos version 4. The core insight: a trusted third party (the KDC) can issue cryptographic proof that a user has authenticated, and that proof can be presented to any service on the network without the service ever seeing the user’s password.

The password never leaves the client machine, not even during the initial authentication, where it is only used locally to derive a key. Every subsequent authentication, whether to a different service or to the same service again, uses a ticket. The KDC knows both the client and the service. The client and service only need to trust the KDC.


Keys, Tickets, and Sessions

Before the protocol, the primitives:

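Long-term keys: derived from passwords. When you set a password in Kerberos, it’s hashed into a key stored in the KDC database (on the user’s account object in the directory database on AD, in the principal database on MIT Kerberos, typically /var/lib/krb5kdc/principal or /var/kerberos/krb5kdc/principal depending on the distribution). The krbtgt account holds the KDC’s own key, the one used to encrypt TGTs. The client derives the same key from the password at authentication time. Neither side ever sends the raw password.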

Session keys — temporary symmetric keys created by the KDC for a specific session. They’re valid for the ticket’s lifetime. After the ticket expires, the session key is useless.

Tickets — encrypted blobs issued by the KDC. A ticket contains the session key, the client identity, the expiry time, and optional flags. It’s encrypted with the target service’s long-term key — only the service can decrypt it. The client carries the ticket but can’t read the contents.


The Three-Step Flow

Step 1: AS-REQ / AS-REP — Getting a TGT

Client                        KDC (AS component)
  │                                │
  │── AS-REQ ──────────────────────►
  │   {username, timestamp}         │
  │   (timestamp encrypted with     │
  │    client's long-term key)       │
  │                                 │
  │   KDC verifies: decrypts        │
  │   timestamp with stored key.    │
  │   If valid → issues TGT         │
  │                                 │
  ◄── AS-REP ──────────────────────│
      {session_key_enc_with_client, │
       TGT_enc_with_krbtgt_key}     │

The client decrypts the session key using its long-term key (derived from the password). The TGT is encrypted with the KDC’s own key (krbtgt) — the client can’t read it, but carries it.

This is the step that requires the password. After this, the TGT is what the client uses for everything else.

Step 2: TGS-REQ / TGS-REP — Getting a Service Ticket

Client                        KDC (TGS component)
  │                                │
  │── TGS-REQ ─────────────────────►
  │   {TGT, authenticator,         │
  │    target_service_name}        │
  │   (authenticator encrypted      │
  │    with TGT session key)        │
  │                                 │
  │   KDC: decrypts TGT,           │
  │   verifies authenticator,       │
  │   issues service ticket         │
  │                                 │
  ◄── TGS-REP ────────────────────│
      {service_session_key_enc,    │
       service_ticket_enc_with_    │
       service_long_term_key}      │

No password involved. The client proves its identity by presenting the TGT (which only the KDC can issue) and an authenticator (a timestamp encrypted with the TGT’s session key, proving the client holds the session key without revealing it).

Step 3: AP-REQ / AP-REP — Authenticating to the Service

Client                        Service (sshd, LDAP, NFS...)
  │                                │
  │── AP-REQ ──────────────────────►
  │   {service_ticket,             │
  │    authenticator_enc_with_      │
  │    service_session_key}        │
  │                                 │
  │   Service: decrypts ticket      │
  │   with its long-term key,       │
  │   verifies authenticator        │
  │                                 │
  ◄── AP-REP (optional) ───────────│
      {mutual authentication}       │

The service decrypts the ticket using its own key. It extracts the client identity and session key. It verifies the authenticator. No communication with the KDC is required: the service trusts the ticket because only the KDC (and the service itself) holds the key it was encrypted with.
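
One way to keep track of who can open what across the three steps is a toy model. The sketch below is not real cryptography: seal and unseal are hypothetical helpers that simply tag a payload with a key name, and the key names (krbtgt-key, sshd-key, vamshi-key) and payload strings are made up for illustration.

```shell
# Toy model only: "sealing" tags a payload with a key name. No real encryption.
seal() { printf '%s|%s\n' "$1" "$2"; }      # usage: seal KEY PAYLOAD

unseal() (                                   # usage: unseal KEY BLOB
    key=$1
    blob=$2
    case "$blob" in
        "$key|"*) printf '%s\n' "${blob#*|}" ;;   # right key: reveal the payload
        *) exit 1 ;;                               # wrong key: cannot read
    esac
)

# Step 1 (AS): the KDC seals the TGT with its own krbtgt key.
tgt=$(seal krbtgt-key "client=vamshi;session=S1")

# Step 2 (TGS): the KDC opens the TGT, then seals a ticket with the service's key.
unseal krbtgt-key "$tgt" >/dev/null
ticket=$(seal sshd-key "client=vamshi;session=S2")

# Step 3 (AP): the service opens the ticket on its own, without asking the KDC.
unseal sshd-key "$ticket"

# The client carries both blobs but holds neither key.
unseal vamshi-key "$ticket" || echo "client cannot read the ticket"
```

In the real protocol the sealing is symmetric encryption with the long-term and session keys described earlier, and every request also carries an authenticator.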


Why Clock Skew Matters

Every Kerberos authenticator contains a timestamp. The service rejects authenticators whose timestamp is more than 5 minutes away from its own clock (by default). This limits replay attacks: an attacker who captures an authenticator can only try to reuse it inside that short window.

This is why clock skew over 5 minutes breaks Kerberos authentication entirely. If your machine’s clock drifts 6 minutes from the KDC, every authenticator you generate is rejected as too old or too far in the future. No tickets. No AD logins. No SSSD authentication.

# Check time sync status
timedatectl status
chronyc tracking        # if using chrony
ntpq -p                 # if using ntpd

# If clock is off: force a sync
chronyc makestep        # immediate step correction (chrony)
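
The tolerance check itself is simple arithmetic. A minimal sketch, assuming both clocks are expressed as epoch seconds; skew_ok is a hypothetical helper, not part of any Kerberos tool, and treating a difference of exactly 300 seconds as acceptable is an assumption:

```shell
# Hypothetical helper: succeed when two epoch timestamps differ by at most max_skew seconds.
skew_ok() (
    client_time=$1
    kdc_time=$2
    max_skew=${3:-300}                 # assumed default: 5 minutes
    diff=$(( client_time - kdc_time ))
    if [ "$diff" -lt 0 ]; then diff=$(( 0 - diff )); fi
    [ "$diff" -le "$max_skew" ]
)

# A client whose clock runs 6 minutes ahead of the KDC.
now=$(date +%s)
if skew_ok "$(( now + 360 ))" "$now"; then
    echo "within tolerance"
else
    echo "clock skew too great"
fi
```

The check is symmetric: a clock that is 6 minutes behind fails in the same way as one that is 6 minutes ahead.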

Hands-On: kinit, klist, kdestroy

# Get a TGT (will prompt for password)
kinit vamshi@CORP.COM

# Show current tickets
klist
# Credentials cache: FILE:/tmp/krb5cc_1001
# Principal: vamshi@CORP.COM
#
# Valid starting     Expires            Service principal
# 04/27/26 01:00:00  04/27/26 11:00:00  krbtgt/CORP.COM@CORP.COM
#   renew until 05/04/26 01:00:00

# Show encryption types used (the -e flag)
klist -e
# 04/27/26 01:00:00  04/27/26 11:00:00  krbtgt/CORP.COM@CORP.COM
#         Etype: aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

# Get a service ticket for a specific service
kvno 'host/<server-fqdn>@CORP.COM'
# host/<server-fqdn>@CORP.COM: kvno = 3

# Show ticket flags alongside each ticket
klist -f
# Flags: F=forwardable, f=forwarded, P=proxiable, p=proxy, D=postdatable,
#        d=postdated, R=renewable, I=initial, i=invalid, H=hardware auth

# Destroy all tickets
kdestroy

The Valid starting and Expires fields are the ticket lifetime. A renewable ticket can be renewed with kinit -R while it is still valid, which extends it up to the renew until time; once the ticket has expired, you need to re-authenticate. The renew until date is when even renewal stops working.


/etc/krb5.conf

[libdefaults]
    default_realm = CORP.COM
    dns_lookup_realm = false
    # find KDCs via DNS SRV records
    dns_lookup_kdc = true
    ticket_lifetime = 10h
    renew_lifetime = 7d
    # tickets can be forwarded to remote hosts (needed for credential delegation over SSH)
    forwardable = true
    rdns = false

[realms]
    CORP.COM = {
        kdc = dc01.corp.com
        # failover KDC
        kdc = dc02.corp.com
        admin_server = dc01.corp.com
    }

[domain_realm]
    .corp.com = CORP.COM
    corp.com = CORP.COM

With dns_lookup_kdc = true, Kerberos finds KDCs by querying DNS SRV records (_kerberos._tcp.corp.com). AD sets these up automatically. On MIT Kerberos, you add them manually. DNS-based discovery is the recommended approach for AD environments — it picks up new DCs automatically.
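
To see which record name a client would look up for a given realm, the name can be built mechanically. A minimal sketch; kdc_srv_name is a hypothetical helper, and lowercasing the realm is cosmetic because DNS names are case-insensitive:

```shell
# Hypothetical helper: build the SRV record name queried for a realm's KDCs.
kdc_srv_name() (
    realm=$1
    proto=${2:-tcp}
    domain=$(printf '%s' "$realm" | tr '[:upper:]' '[:lower:]')
    printf '_kerberos._%s.%s\n' "$proto" "$domain"
)

kdc_srv_name CORP.COM        # _kerberos._tcp.corp.com
kdc_srv_name CORP.COM udp    # _kerberos._udp.corp.com
```

The result can be passed to a resolver tool to check what DNS actually returns, for example dig +short SRV "$(kdc_srv_name CORP.COM)".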


Kerberos + LDAP: Why Enterprises Run Both

LDAP and Kerberos solve different problems and are almost always deployed together:

LDAP answers:  "Who is vamshi? What groups is he in? What's his home directory?"
Kerberos answers: "Is this really vamshi? Prove it without sending a password."

Active Directory is exactly this combination — the directory is LDAP-based, the authentication is Kerberos. When a Linux machine joins an AD domain via realm join or adcli, it gets:
– LDAP access to the AD directory (for NSS: user and group lookups)
– A Kerberos principal registered in AD (for PAM: ticket-based authentication)
– A machine account (the machine’s identity in the directory)

When you SSH into an AD-joined Linux machine:
1. SSSD issues a Kerberos AS-REQ for the user’s TGT
2. SSSD uses the TGT to get a service ticket for the Linux machine’s own host principal, which confirms the TGT came from the real KDC
3. Authentication is verified via the service ticket, with no LDAP Bind using the password
4. SSSD does an LDAP Search to get POSIX attributes (UID, GID, home dir)

Password-based LDAP Bind is the fallback when Kerberos isn’t available. Kerberos is the default on AD-joined systems, and it’s more secure because the password is never sent to the directory server.


⚠ Common Misconceptions

“Kerberos sends your password to the KDC.” It doesn’t. The client derives a key from the password locally and uses that key to encrypt a timestamp (the pre-authentication data). The KDC verifies the timestamp using the stored key. The raw password never travels.

“Kerberos is an authorization protocol.” Kerberos authenticates — it proves who you are. Authorization (what you can do) is a separate decision, usually handled by ACLs on the service or directory group membership.

“Once you have a TGT, you’re authenticated to everything.” A TGT only proves your identity to the KDC. Each service requires a separate service ticket. The TGT is what lets you get those service tickets without re-entering your password.

“Kerberos requires AD.” MIT Kerberos 5 is a standalone implementation. FreeIPA (EP08) runs MIT Kerberos. Heimdal is another implementation. AD uses a Microsoft-extended version of Kerberos 5, but the core protocol is the same one specified in RFC 4120.


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | Kerberos is the de facto enterprise authentication protocol: SSO, delegation, and service account authentication all depend on it
CISSP Domain 4: Communications and Network Security | Kerberos prevents credential sniffing and replay attacks, two of the core network authentication threat categories
CISSP Domain 3: Security Architecture and Engineering | The KDC is a critical single point of trust: its availability, key management, and account (krbtgt) rotation are architectural security decisions

Key Takeaways

  • Kerberos is a ticket-based protocol — the password is used once to get a TGT; from then on, tickets prove identity without the password
  • The three-step flow: get a TGT from the AS, exchange it for a service ticket at the TGS, present the service ticket to the target service
  • Clock skew over 5 minutes breaks Kerberos — time synchronization is a hard dependency
  • LDAP stores identity; Kerberos authenticates it — Active Directory is exactly this combination, and so is FreeIPA
  • klist -e shows the encryption types in use — aes256-cts-hmac-sha1-96 is what you want to see; arcfour-hmac (RC4) is legacy and should be disabled

What’s Next

EP05 covered Kerberos as a protocol. EP06 goes hands-on: building a real LDAP directory with OpenLDAP, configuring replication, and understanding how the server-side components — slapd, the MDB backend, SyncRepl — fit together.

Next: OpenLDAP Setup and Replication: Running Your Own Directory

Get EP06 in your inbox when it publishes → linuxcent.com/subscribe

SSSD: The Caching Daemon That Powers Every Enterprise Linux Login

Reading Time: 7 minutes

The Identity Stack, Episode 4


TL;DR

  • SSSD (System Security Services Daemon) is the caching and brokering layer between Linux and directory services — it handles LDAP, Kerberos, and AD so PAM and NSS don’t have to
  • Architecture: three tiers — responders (answer PAM/NSS queries), providers (talk to AD/LDAP/Kerberos), and a shared cache (LDB database on disk)
  • Credential caching means offline logins work — a user who authenticated yesterday can log in today even if the domain controller is unreachable
  • Key config: sssd.conf — the [domain] section is where almost all tuning happens
  • Debugging toolkit: sssctl, sss_cache, id, getent, journalctl -u sssd
  • The most common failure modes are: SSSD not running, stale cache, misconfigured ldap_search_base, and clock skew breaking Kerberos

The Big Picture: SSSD as the Identity Broker

PAM (pam_sss)         NSS (sss module)
      │                      │
      └──────────┬───────────┘
                 ▼
          SSSD Responders
          ┌────────────────────────────────────┐
          │  PAM responder   NSS responder      │
          │  (auth, account, (passwd, group,    │
          │   session)        shadow lookups)   │
          └────────────┬───────────────────────┘
                       │  shared cache (LDB)
                       ▼
          SSSD Providers
          ┌────────────────────────────────────┐
          │  identity provider  auth provider   │
          │  (user/group attrs) (credentials)   │
          └────────────┬───────────────────────┘
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
       LDAP          Kerberos    Local files
    (AD / OpenLDAP)  (KDC / AD)

EP03 showed that SSSD sits between PAM and LDAP. This episode goes inside it — the architecture, the config, and how to tell exactly what it’s doing on any given login attempt.


Why SSSD Exists

The problem before SSSD: nss_ldap and pam_ldap made direct LDAP connections for every query. No caching, no connection pooling, no failover, no offline support. On a system that makes dozens of getpwuid() calls per second (every ls -l, every process spawn), this meant dozens of LDAP roundtrips per second hitting the domain controller.

SSSD solved this with a single daemon that:
– Maintains a persistent connection pool to the directory
– Caches identity and credential data in an LDB (LDAP-like) database on disk
– Handles failover across multiple directory servers
– Satisfies PAM and NSS queries from cache when the directory is unreachable

The credential cache is the key insight. When you authenticate successfully, SSSD stores a hash of your credentials locally. If the domain controller is unreachable on your next login — network outage, laptop offline, VPN not connected — SSSD can verify your credentials against the local cache. You log in. You never knew the DC was down.


SSSD Architecture

SSSD is a set of cooperating processes sharing a cache:

Monitor — the parent process. Starts and restarts all other SSSD processes. If a responder or provider crashes, the monitor restarts it.

Responders — answer queries from PAM and NSS. Each responder handles a specific interface:
– sssd_nss: answers getpwnam(), getpwuid(), getgrnam(), initgroups() calls
– sssd_pam: handles PAM authentication, account checks, and session management
– sssd_autofs, sssd_ssh, sssd_sudo: optional responders for specific services

Providers — the backend processes that talk to the actual directory:
– Each domain gets its own provider process (sssd_be[domain_name])
– The provider connects to LDAP/Kerberos/AD, fetches data, and writes it to the shared cache
– If the provider crashes or loses connectivity, responders fall back to serving from cache

Cache — LDB files in /var/lib/sss/db/. One database per configured domain, plus a cache for negative results (lookups that returned “not found”). The cache is an LDAP-like directory stored on disk — SSSD uses the same hierarchical structure for local storage as the remote directory uses.

# See the cache files
ls -la /var/lib/sss/db/
# cache_corp.com.ldb         ← user/group data for domain corp.com
# ccache_corp.com            ← Kerberos credential cache
# timestamps_corp.com.ldb   ← when entries were last refreshed

sssd.conf: The Config That Matters

/etc/sssd/sssd.conf has a [sssd] section (global) and one [domain/name] section per directory. The domain section is where almost all tuning happens.

[sssd]
services = nss, pam, sudo
domains = corp.com
config_file_version = 2

[pam]
# days cached credentials stay valid offline (0 = no limit)
offline_credentials_expiration = 1

[domain/corp.com]
# What type of directory this is
# id_provider: ad, ldap, ipa, or files
id_provider = ad
# auth_provider: ad, ldap, krb5, or none
auth_provider = ad
# access_provider controls who can log in
access_provider = ad

# The AD/LDAP server (can be a list for failover)
ad_domain = corp.com
ad_server = dc01.corp.com, dc02.corp.com

# Where to look for users and groups
ldap_search_base = dc=corp,dc=com

# Cache behavior
# enable offline login
cache_credentials = true
# how long before re-querying (seconds)
entry_cache_timeout = 5400

# What uid/gid range belongs to this domain (prevents UID conflicts)
# true: auto-map AD SIDs to UIDs (no uidNumber needed)
ldap_id_mapping = true
# OR for classical POSIX LDAP, use uidNumber/gidNumber from the directory:
# ldap_id_mapping = false

# Restrict logins to specific AD groups
# access_provider = simple
# simple_allow_groups = linux-admins, sre-team

# Home directory and shell defaults
override_homedir = /home/%u
default_shell = /bin/bash
fallback_homedir = /home/%u

# Enumerate all users (expensive on large dirs; disable unless needed)
enumerate = false

The two most commonly wrong settings:

ldap_search_base — if this doesn’t include the OU where your users live, SSSD won’t find them. On AD, the default searches the entire domain, which is usually correct. On OpenLDAP, you may need ou=people,dc=corp,dc=com.

ldap_id_mapping — on AD, users typically don’t have uidNumber attributes. Setting ldap_id_mapping = true tells SSSD to derive a UID from the user’s SID algorithmically. This produces consistent UIDs across machines. Setting it to false requires actual uidNumber attributes in the directory.


Credential Caching and Offline Logins

The cache is what separates SSSD from a simple proxy. When cache_credentials = true:

  1. On successful authentication, SSSD stores a hash of the credential in the LDB cache
  2. On the next authentication attempt, SSSD first tries the domain controller
  3. If the DC is unreachable, SSSD falls back to the local credential hash
  4. If the hash matches, login succeeds — even with no network

The credential hash is not the cleartext password — it’s a salted hash stored in /var/lib/sss/db/cache_corp.com.ldb. The security model is the same as /etc/shadow: someone with root access to the machine can access the hashes.

offline_credentials_expiration (set in the [pam] section of sssd.conf) controls how long cached credentials stay valid when the DC is unreachable, counted in days since the last successful online login. 0 means forever (not recommended for high-security environments). 1 means one day: after 24 hours offline, even cached credentials expire and the user must authenticate online.
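
Put together, the online check, the cached-hash fallback, and the expiry limit form a small decision tree. The sketch below is a hypothetical model of it, assuming cache_credentials = true; login_decision and its argument names are illustrative and are not SSSD options or commands, and the exact boundary at the expiry limit is an assumption.

```shell
# Hypothetical model of the login decision with cache_credentials = true.
login_decision() (
    dc_reachable=$1       # yes | no
    dc_accepts=$2         # yes | no, only consulted when the DC is reachable
    hash_matches=$3       # yes | no, typed password vs. cached hash
    hours_offline=$4      # hours since the last successful online login
    expiration_days=$5    # offline_credentials_expiration; 0 = no limit

    if [ "$dc_reachable" = yes ]; then
        if [ "$dc_accepts" = yes ]; then echo "allow (online)"; else echo "deny (online)"; fi
    elif [ "$expiration_days" -ne 0 ] && [ "$hours_offline" -ge "$(( expiration_days * 24 ))" ]; then
        echo "deny (cached credentials expired)"
    elif [ "$hash_matches" = yes ]; then
        echo "allow (offline cache)"
    else
        echo "deny (offline cache)"
    fi
)

login_decision no no yes 5 1     # DC down, last online login 5 hours ago, limit 1 day
login_decision no no yes 30 1    # DC down, last online login 30 hours ago, limit 1 day
```

The first call prints allow (offline cache) and the second prints deny (cached credentials expired), which is the trade-off the expiration setting controls.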


The Debugging Toolkit

# 1. Is SSSD running?
systemctl status sssd
pgrep -a sssd    # shows all SSSD processes (monitor + responders + providers)

# 2. Domain connectivity status
sssctl domain-status corp.com
# Domain: corp.com
# Active servers:
#   LDAP: dc01.corp.com
#   KDC: dc01.corp.com
# Discovered servers:
#   LDAP: dc01.corp.com, dc02.corp.com

# 3. Can SSSD find a specific user?
sssctl user-checks vamshi
# user: vamshi
# user name: vamshi@corp.com
# POSIX attributes: UID=1001, GID=1001, ...
# Authentication: success (uses actual PAM auth stack)

# 4. What does NSS see?
getent passwd vamshi          # full passwd entry
id vamshi                     # uid, gid, groups

# 5. Flush stale cache entries
sss_cache -u vamshi           # invalidate one user
sss_cache -G engineers        # invalidate one group
sss_cache -E                  # invalidate everything (nuclear option)

# 6. Live logs
journalctl -u sssd -f         # tail all SSSD logs
# Then attempt login in another terminal — watch the auth flow in real time

# 7. Increase log verbosity temporarily
sssctl config-check            # validate sssd.conf syntax
# Edit sssd.conf: add debug_level = 6 under [domain/corp.com]
systemctl restart sssd
tail -f /var/log/sssd/sssd_corp.com.log   # debug output: LDAP queries, cache hits/misses

The single most useful command is sssctl user-checks <username>. It runs the NSS lookup and a PAM account check through SSSD (add -a auth to test authentication as well) and prints what SSSD would do on a real login, without creating a session or touching the running system.


Breaking SSSD (and What Each Failure Looks Like)

SSSD not running:

ssh vamshi@server
# Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
# getent passwd vamshi → (empty)
# Fix: systemctl start sssd

Stale cache after AD password change:

# User changed password in AD but SSSD still has old credential hash
ssh vamshi@server  # password accepted (wrong!) — cache hit with old hash
# Fix: sss_cache -u vamshi, then attempt login again

Clock skew > 5 minutes (breaks Kerberos):

journalctl -u sssd | grep -i "clock skew\|KDC\|kinit"
# sssd_be[corp.com]: Kerberos authentication failed: Clock skew too great
# Fix: systemctl restart chronyd (or ntpd), verify time sync

ldap_search_base wrong:

getent passwd vamshi  # empty, but user exists in AD
sssctl user-checks vamshi  # "User not found"
# Check: ldap_search_base must include the OU containing users
# Test: ldapsearch -x -H ldap://dc -b "ou=engineers,dc=corp,dc=com" "(uid=vamshi)"

⚠ Common Misconceptions

“Restarting SSSD logs everyone out.” Restarting SSSD doesn’t affect existing authenticated sessions. Active shell sessions, running processes — all unaffected. Only new authentication attempts are disrupted during the restart window, which takes a few seconds.

“sss_cache -E fixes everything.” Flushing the entire cache forces SSSD to re-fetch all entries from the domain controller on the next lookup. On a system with many users or enumeration enabled, this can cause a brief spike in LDAP traffic and slow lookups. Use targeted flushes (-u username, -G group) when possible.

“debug_level should always be high.” SSSD at debug_level = 9 logs every LDAP packet. On a production system with active logins, this generates gigabytes of logs quickly. Set it temporarily for debugging, then remove it and restart.


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | SSSD is the runtime implementation of enterprise identity integration on Linux: understanding its caching model, failover behavior, and credential storage is foundational to IAM operations
CISSP Domain 3: Security Architecture and Engineering | The credential cache design (/var/lib/sss/db/) creates a local credential store with specific security properties: architects need to understand the offline login trade-off
CISSP Domain 7: Security Operations | SSSD is a critical security service: monitoring it, understanding its failure modes, and knowing how to recover it quickly are operational security skills

Key Takeaways

  • SSSD is a three-tier system: responders (serve PAM/NSS), providers (talk to AD/LDAP), and a shared LDB cache — each tier is independently restartable
  • Credential caching enables offline logins — the security trade-off is a local hash store in /var/lib/sss/db/
  • sssctl user-checks is the first tool to reach for when a login fails — it simulates the full auth flow and shows exactly where it breaks
  • ldap_id_mapping = true is the right choice for AD environments without POSIX attributes; false requires actual uidNumber/gidNumber in the directory
  • Clock skew over 5 minutes silently breaks Kerberos authentication — time sync is a hard dependency

What’s Next

EP04 showed SSSD’s role as the caching and brokering layer. What it referenced repeatedly — “Kerberos ticket”, “KDC”, “GSSAPI” — is the authentication protocol that sits underneath AD-joined Linux logins. SSSD uses Kerberos to authenticate. LDAP carries the identity data. EP05 explains how Kerberos works.

Next: How Kerberos Works: Tickets, KDC, and Why Enterprises Use It With LDAP

Get EP05 in your inbox when it publishes → linuxcent.com/subscribe

How LDAP Authentication Works on Linux: PAM, NSS, and the Login Stack

Reading Time: 8 minutes

The Identity Stack, Episode 3


TL;DR

  • LDAP is a directory protocol — it stores identity information and can verify a password via Bind, but authentication on Linux runs through PAM, not directly through LDAP
  • NSS (/etc/nsswitch.conf) answers “who is this user?” — it resolves UIDs, group memberships, and home directories by querying LDAP (or the local files, or SSSD)
  • PAM (/etc/pam.d/) answers “are they allowed in?” — it enforces authentication, account validity, session setup, and password policy
  • pam_ldap (the old way) opened a direct LDAP connection on every login — fragile, no caching, broken when the LDAP server was unreachable
  • pam_sss (the modern way) delegates to SSSD, which caches credentials and handles failover — SSSD is the layer between Linux and the directory
  • Tracing a single SSH login: sshd → PAM → pam_sss → SSSD → LDAP Bind + Search → session created

The Big Picture: One SSH Login, Four Layers

You type: ssh vamshi@server

  sshd
    │
    ▼
  PAM  (/etc/pam.d/sshd)          ← "Is this user allowed in?"
    │
    ├── pam_sss    (auth)          ← sends credentials to SSSD
    ├── pam_sss    (account)       ← checks account not expired/locked
    ├── pam_sss    (session)       ← logs the session open/close
    └── pam_mkhomedir (session)    ← creates /home/vamshi if it doesn't exist
    │
    ▼
  SSSD  (/etc/sssd/sssd.conf)     ← "Let me check the directory"
    │
    ├── NSS responder              ← answers getent, id, getpwnam
    └── LDAP/Kerberos provider     ← talks to the actual directory
    │
    ▼
  LDAP Server (AD / OpenLDAP)
    │
    ├── Bind: uid=vamshi + password (or Kerberos ticket)
    └── Search: posixAccount attrs for uid=vamshi
    │
    ▼
  Linux session created
  UID=1001, GID=1001, HOME=/home/vamshi, SHELL=/bin/bash

EP02 showed what the directory contains and what travels on the wire. What it left open is how Linux uses that to grant a login — and why LDAP is not, by itself, an authentication protocol.


Why LDAP Is Not an Authentication Protocol

This is the confusion that trips people most. LDAP can verify a password — the Bind operation does exactly that. But authentication on Linux means something broader: checking credentials, checking account validity, enforcing password policy, setting up a session, creating a home directory. LDAP handles one piece of that. PAM handles the rest.

More precisely: LDAP doesn’t know what a Linux session is. It doesn’t know about /etc/pam.d/. It doesn’t enforce login hours, account expiry, or concurrent session limits. It returns directory entries and verifies binds. The intelligence about what to do with those results lives in the Linux authentication stack.

When you run ssh vamshi@server, the OS doesn’t open an LDAP connection and ask “can this user log in?” It calls PAM. PAM consults its configuration, and PAM decides whether to call LDAP (directly or via SSSD), whether to check the shadow file, whether to enforce MFA. LDAP is one possible backend. It’s not the gatekeeper.


NSS: The Traffic Controller

Before PAM runs, Linux needs to know if the user exists at all. That’s NSS’s job.

/etc/nsswitch.conf is a routing table for name resolution. It tells the OS where to look when something asks “who is UID 1001?” or “what groups is vamshi in?”:

# /etc/nsswitch.conf

passwd:     files sss        # user lookups: check /etc/passwd first, then SSSD
group:      files sss        # group lookups: check /etc/group first, then SSSD
shadow:     files sss        # shadow password lookups
hosts:      files dns        # hostname lookups (not identity-related)
netgroup:   sss              # NIS netgroups from SSSD only
automount:  sss              # autofs maps from SSSD

Every call to getpwnam(), getpwuid(), getgrnam(), getgrgid() in any process — including sshd — goes through NSS. The entries in nsswitch.conf control which backends are tried in order.

With passwd: files sss, a lookup for user vamshi:
1. Checks /etc/passwd — not found (vamshi is a domain user, not in local files)
2. Queries SSSD — SSSD checks its cache, or queries LDAP, and returns the posixAccount attributes

Without the sss entry in passwd:, domain users don’t exist on the system — getent passwd vamshi returns nothing, id vamshi fails, SSH login never gets to PAM’s authentication step.
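
The lookup order can also be read straight out of the file. A minimal sketch, assuming the usual "database: source source" layout; nss_sources is a hypothetical helper, and the sample file contents are illustrative:

```shell
# Print the sources configured for one NSS database, in lookup order, one per line.
nss_sources() (
    database=$1
    file=${2:-/etc/nsswitch.conf}
    # Drop comments and [STATUS=action] items, then print the sources of the matching line.
    sed -e 's/#.*//' -e 's/\[[^]]*\]//g' "$file" |
        awk -v db="$database:" '$1 == db { for (i = 2; i <= NF; i++) print $i }'
)

# Demo against a sample file rather than the live configuration.
sample=$(mktemp)
printf 'passwd: files sss\ngroup:  files sss\nhosts:  files dns\n' > "$sample"
nss_sources passwd "$sample"
rm -f "$sample"
```

On a real system, calling nss_sources passwd with no second argument reads /etc/nsswitch.conf; if sss is missing from the output, domain users will not resolve. getent -s sss passwd vamshi queries the SSSD source alone, which helps separate an NSS routing problem from an SSSD problem.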

# Verify NSS is routing to SSSD correctly
getent passwd vamshi
# vamshi:*:1001:1001:Vamshi K:/home/vamshi:/bin/bash

# If this returns nothing, NSS isn't reaching SSSD
# Check: systemctl status sssd && grep passwd /etc/nsswitch.conf

# See what groups the user is in (NSS group lookup)
id vamshi
# uid=1001(vamshi) gid=1001(engineers) groups=1001(engineers),1002(ops)

PAM: The Real Gatekeeper

PAM (Pluggable Authentication Modules) is the framework that lets Linux swap authentication backends without recompiling anything. Every service that needs to authenticate users — sshd, sudo, login, su, gdm — has a PAM configuration file in /etc/pam.d/.

Each PAM config defines four stacks:

auth        ← verify credentials (password, key, MFA)
account     ← check if the account is valid (not expired, not locked, login hours)
password    ← password change policy
session     ← set up/tear down the session (home dir, limits, logging)

A typical /etc/pam.d/sshd on a system joined to AD via SSSD:

# /etc/pam.d/sshd

# auth stack — verify the user's credentials
auth    required      pam_sepermit.so
auth    substack      password-auth   # usually includes pam_sss.so

# account stack — check account validity
account required      pam_nologin.so
account include       password-auth

# password stack — handle password changes
password include      password-auth

# session stack — set up the session
session required      pam_selinux.so close
session required      pam_loginuid.so
session optional      pam_keyinit.so force revoke
session include       password-auth
session optional      pam_motd.so
session optional      pam_mkhomedir.so skel=/etc/skel/ umask=0077
session required      pam_selinux.so open

The include and substack directives pull in shared stacks from other files (like /etc/pam.d/password-auth). On a system with SSSD, password-auth contains:

auth    required      pam_env.so
auth    sufficient    pam_sss.so      # try SSSD first
auth    required      pam_deny.so     # if pam_sss fails, deny

account required      pam_unix.so
account sufficient    pam_localuser.so
account sufficient    pam_sss.so      # SSSD account check
account required      pam_permit.so

session optional      pam_sss.so      # SSSD session tracking

The sufficient flag means: if this module succeeds and no earlier required module has failed, stop checking this stack and consider it passed. required means: this must pass (but PAM continues checking other modules and reports failure at the end). requisite means: if this fails, stop immediately.


PAM Control Flags at a Glance

required   — must succeed; failure reported after remaining modules run
requisite  — must succeed; failure reported immediately, stack stops
sufficient — if success, stop stack (ignore remaining); failure continues
optional   — result ignored unless it's the only module in the stack

This matters for debugging. If pam_sss.so is sufficient and SSSD is down, the module fails, PAM moves on to the next module, pam_deny.so, and the login is denied. When SSSD is up and the password is correct, sufficient ends the stack before pam_deny.so is ever reached. The control flag is the policy decision.
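
To reason about a stack without touching a live system, the flag semantics can be modeled in a few lines. This is a simplified sketch of the documented Linux-PAM behavior for the four keyword flags only; pam_stack is a hypothetical helper, each argument is a flag:result pair, and the bracketed [value=action] syntax is not modeled.

```shell
# Hypothetical model: combine module results the way one PAM stack would.
# Arguments are flag:result pairs, for example sufficient:ok or required:fail.
pam_stack() (
    status=none                       # none | ok | fail
    for entry in "$@"; do
        flag=${entry%%:*}
        result=${entry#*:}
        case "$flag:$result" in
            requisite:fail) echo deny; exit 0 ;;          # stop immediately
            required:fail)  status=fail ;;                # remember, keep going
            required:ok|requisite:ok|optional:ok)
                if [ "$status" = none ]; then status=ok; fi ;;
            sufficient:ok)
                # Ends the stack only if no earlier required module failed.
                if [ "$status" != fail ]; then echo allow; exit 0; fi ;;
        esac                                              # other failures are ignored
    done
    if [ "$status" = ok ]; then echo allow; else echo deny; fi
)

# The auth stack above: pam_env (required), pam_sss (sufficient), pam_deny (required).
pam_stack required:ok sufficient:ok   required:fail    # SSSD up, password correct
pam_stack required:ok sufficient:fail required:fail    # SSSD down
```

The first call prints allow because the sufficient success ends the stack before pam_deny.so runs; the second prints deny because the stack falls through to it.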


The Old Way: pam_ldap

Before SSSD, Linux systems used pam_ldap and nss_ldap directly:

# Old /etc/pam.d/common-auth (Ubuntu pre-SSSD era)
auth    sufficient    pam_ldap.so    # direct LDAP connection per login
auth    required      pam_unix.so nullok_secure

# Old /etc/nsswitch.conf
passwd: files ldap    # nss_ldap for user lookups
group:  files ldap

pam_ldap opened a fresh LDAP connection on every login attempt. No caching. If the LDAP server was unreachable for 3 seconds, the login hung for at least 3 seconds, sometimes much longer. If the LDAP server was down, every new domain login failed, often only after a timeout. Users with active sessions were fine; new logins simply didn’t work.

nss_ldap had the same problem for NSS lookups: every getpwnam() call hit the LDAP server directly. On a busy system with many processes doing user lookups, this meant hundreds of LDAP queries per second, no connection reuse, and no way to survive a brief network blip.

The problems were structural:
– No credential caching — offline logins impossible
– No connection pooling — LDAP server saw one connection per login attempt
– No failover logic — one LDAP server down meant all logins down
– Slow timeouts that blocked login sessions

SSSD was built to fix all of this.


The Modern Way: pam_sss + SSSD

pam_sss doesn’t talk to LDAP directly. It’s a thin client that passes authentication requests to SSSD over a Unix domain socket. SSSD manages the LDAP connection, the credential cache, and the failover logic.

sshd  →  PAM (pam_sss)  →  SSSD (Unix socket)  →  LDAP server
                                   │
                                   └── credential cache
                                       (survives brief LDAP outages)

When pam_sss sends a credential to SSSD:
1. SSSD checks whether it can answer from its on-disk credential cache: if the server is unreachable, or cached_auth_timeout is set and the credential hash matches a recent successful auth, it can satisfy the request without hitting LDAP
2. If not cached (or cache expired), SSSD sends a Bind to the LDAP server
3. On success, SSSD caches the result and returns success to pam_sss
4. pam_sss returns PAM_SUCCESS, and the auth stack continues

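The credential cache is what enables offline logins. It is off by default and is enabled with cache_credentials = True in the domain section of sssd.conf. If the LDAP server is unreachable and the user has previously authenticated successfully while it was online, SSSD satisfies the auth from cache and the login succeeds. How long a cached credential stays usable is set by offline_credentials_expiration in the [pam] section, counted in days since the last successful online login; the default of 0 means no limit. The user never knows the LDAP server was down.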
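As a rough sketch of that offline decision, assuming the sssd.conf semantics where offline_credentials_expiration is counted in days and 0 means no limit: the function name and its parameters are hypothetical and only model the outcome, not SSSD's internals.

```python
def offline_login_allowed(
    cache_credentials: bool,
    hash_matches: bool,
    days_since_online_login: float,
    offline_credentials_expiration: int,
) -> bool:
    """Model SSSD's answer when the LDAP server is unreachable."""
    if not cache_credentials:
        return False  # caching disabled: nothing was ever stored
    if not hash_matches:
        return False  # wrong password, or no earlier online login
    if offline_credentials_expiration == 0:
        return True   # 0 means cached credentials never expire
    return days_since_online_login <= offline_credentials_expiration


# A user who logged in online 2 days ago, with a 7-day offline window:
print(offline_login_allowed(True, True, 2, 7))    # True
# Same user after 10 days offline:
print(offline_login_allowed(True, True, 10, 7))   # False
```

The two knobs are independent: cache_credentials decides whether offline login exists at all, and the expiration only bounds how long it keeps working.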


Tracing a Full SSH Login

Here’s every step of an SSH login for a domain user, in order:

1.  sshd accepts the TCP connection
2.  sshd calls PAM: pam_start("sshd", "vamshi", ...)

3.  PAM auth stack runs pam_sss:
      pam_sss sends credentials to SSSD via /var/lib/sss/pipes/pam

4.  SSSD auth provider:
      a. Check credential cache — miss (first login)
      b. Resolve user: NSS lookup for uid=vamshi
         → SSSD LDAP provider searches dc=corp,dc=com for (uid=vamshi)
         → Returns: uidNumber=1001, gidNumber=1001, homeDirectory=/home/vamshi
      c. Authenticate: LDAP Simple Bind as uid=vamshi,ou=engineers,dc=corp,dc=com
         → Server returns: success
      d. Cache the credential hash + POSIX attrs

5.  SSSD returns PAM_SUCCESS to pam_sss

6.  PAM account stack runs pam_sss:
      SSSD checks: account not expired, not locked, login permitted
      → pam_acct_mgmt() returns PAM_SUCCESS

7.  PAM session stack:
      pam_loginuid sets /proc/self/loginuid = 1001
      pam_mkhomedir creates /home/vamshi if missing
      pam_sss opens session (records in SSSD session tracking)

8.  sshd creates the shell, sets environment:
      USER=vamshi, HOME=/home/vamshi, SHELL=/bin/bash, LOGNAME=vamshi

9.  Shell prompt appears

Steps 4b and 4c are the only two LDAP operations in the entire login flow: one Search to resolve the user’s attributes, one Bind to verify the password. Everything else is PAM and SSSD.


Debugging the Stack

When a login fails, the failure could be in any layer. Work top-down:

# 1. Does NSS resolve the user at all?
getent passwd vamshi
# If empty: NSS isn't reaching SSSD, or SSSD isn't finding the user in LDAP

# 2. Is SSSD running and healthy?
systemctl status sssd
sssctl domain-status corp.com      # shows SSSD's view of domain connectivity

# 3. What does SSSD think about the user?
sssctl user-checks vamshi          # NSS lookup + PAM account check (use -a auth to test authentication)
id vamshi                          # forces NSS resolution and shows group memberships

# 4. What does SSSD's log say?
journalctl -u sssd -f              # tail SSSD logs live, then attempt login

# 5. Can you reach the LDAP server at all?
ldapsearch -x -H ldap://dc.corp.com \
  -D "cn=svc-ldap,ou=services,dc=corp,dc=com" \
  -w "password" \
  -b "dc=corp,dc=com" \
  "(uid=vamshi)" dn

# 6. Force a cache flush if entries are stale
sss_cache -u vamshi                # invalidate this user's cache entry
sss_cache -g engineers             # invalidate a group (-G with no argument invalidates all groups)

The sssctl user-checks command is the single most useful diagnostic: it resolves the user and exercises the PAM checks through SSSD without actually creating a session (the account check by default; -a auth tests authentication), and prints what SSSD returns at each step.
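The first check in the list above, NSS resolution, can also be scripted. A minimal sketch using Python's standard pwd module, which goes through the same getpwnam() lookup path that getent passwd uses; the helper name resolve_user is an assumption for illustration.

```python
import pwd
from typing import Dict, Optional


def resolve_user(name: str) -> Optional[Dict[str, object]]:
    """Look a user up through NSS, like `getent passwd NAME`."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # No source listed in nsswitch.conf (files, sss, ...) knows this user.
        return None
    return {
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }


for name in ("root", "vamshi"):
    info = resolve_user(name)
    if info is None:
        print(f"{name}: not resolvable via NSS; check nsswitch.conf and SSSD")
    else:
        print(f"{name}: {info}")
```

If this returns None for a domain user while ldapsearch finds the entry, the problem sits between NSS and SSSD rather than in LDAP itself.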


⚠ Common Misconceptions

“If ldapsearch works, SSH login should work.” Not necessarily. ldapsearch tests the LDAP layer. An SSH login requires NSS to resolve the user, PAM to authenticate, SSSD to be running and configured correctly, and pam_mkhomedir to create the home directory if it’s the first login. Any of these can fail independently.

“pam_ldap and pam_sss do the same thing.” They have the same job (authenticate via LDAP) but completely different architectures. pam_ldap is a direct-connect, no-cache module. pam_sss is a client of SSSD, which provides caching, connection pooling, failover, and offline support. On any modern system, you want pam_sss.

“nsswitch.conf order doesn’t matter much.” It matters exactly as much as the order suggests. passwd: files sss means local /etc/passwd is always checked first — if a domain username collides with a local user, the local account wins. This is the intended behavior (local accounts should always be reachable), but it means you’ll never override a local account with a directory entry.

“SSSD cache = security risk.” The cache stores a credential hash, not the cleartext password. An attacker with access to the SSSD cache database (/var/lib/sss/db/) would see hashed credentials, the same situation as /etc/shadow. The real concern is whether offline authentication is appropriate for your security posture; it can be turned off by leaving cache_credentials at False. Note that offline_credentials_expiration only limits how many days a cached credential stays valid, and setting it to 0 means no limit rather than disabling the cache.
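The nsswitch.conf ordering point can be sketched in a few lines. This is a toy model under stated assumptions: it ignores bracketed action items such as [NOTFOUND=return], and parse_nsswitch, winning_source, and the sample accounts are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Set


def parse_nsswitch(text: str) -> Dict[str, List[str]]:
    """Map each database (passwd, group, ...) to its ordered source list."""
    order: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        # Strip bracketed action items; this sketch does not model them.
        sources = re.sub(r"\[[^\]]*\]", " ", sources)
        order[database.strip()] = sources.split()
    return order


def winning_source(
    user: str, sources: List[str], accounts: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the first source, in nsswitch order, that knows the user."""
    for source in sources:
        if user in accounts.get(source, set()):
            return source
    return None


conf = "passwd: files sss\ngroup:  files sss\n"
order = parse_nsswitch(conf)["passwd"]
accounts = {"files": {"root", "backup"}, "sss": {"vamshi", "backup"}}

print(winning_source("backup", order, accounts))  # files (local account wins)
print(winning_source("vamshi", order, accounts))  # sss
```

Reversing the order to "sss files" would flip the first result, which is exactly the collision behavior the misconception overlooks.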


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | PAM is the enforcement layer for authentication policy on Linux; understanding its stack is foundational to any Linux IAM deployment
CISSP Domain 3: Security Architecture and Engineering | The separation between NSS (resolution) and PAM (authentication) is an architectural boundary; misunderstanding it leads to misconfigured systems where account checks are bypassed
CISSP Domain 4: Communications and Network Security | pam_ldap vs pam_sss affects whether credentials travel over a direct LDAP connection (one socket per login, no TLS guarantee) or through SSSD’s managed, pooled connection

Key Takeaways

  • LDAP alone is not an authentication protocol for Linux — authentication flows through PAM, and LDAP is one of PAM’s possible backends
  • NSS (/etc/nsswitch.conf) resolves user identity (who is UID 1001?); PAM enforces it (are they allowed in?)
  • pam_ldap talks to LDAP directly — no cache, no failover, login blocked when LDAP is unreachable
  • pam_sss delegates to SSSD — credential caching, connection pooling, offline login, and failover are all built in
  • A full SSH login touches LDAP exactly twice: one Search for POSIX attributes, one Bind to verify the password
  • When login fails, debug top-down: NSS resolution → SSSD status → LDAP reachability → PAM config

What’s Next

EP03 showed how authentication reaches LDAP — through PAM, through SSSD, through a Bind. What it assumed is that SSSD is healthy and the LDAP server is reachable. The moment either goes wrong, the behavior depends entirely on how SSSD is configured — its cache TTLs, its failover order, its offline credential policy.

EP04 goes inside SSSD: the architecture, the sssd.conf knobs that matter, how to read the logs, and how to break it intentionally and fix it.

Next: SSSD: The Caching Daemon That Powers Every Enterprise Linux Login
