How Active Directory Works: LDAP, Kerberos, and Group Policy Under the Hood

Reading Time: 6 minutes

The Identity Stack, Episode 9
EP08: FreeIPA → EP09 → EP10: SAML/OIDC → …

Focus Keyphrase: Active Directory LDAP
Search Intent: Informational
Meta Description: Active Directory is LDAP + Kerberos + DNS + Group Policy — all tightly integrated. Here’s how AD replication, Sites, GPO, and Linux domain join actually work. (160 chars)


TL;DR

  • Active Directory is not a product that happens to use LDAP — it is an LDAP directory with a Microsoft-extended schema, a built-in Kerberos KDC, and DNS tightly integrated
  • Replication uses USNs (Update Sequence Numbers) and GUIDs — the Knowledge Consistency Checker (KCC) automatically builds the replication topology
  • Sites and site links tell AD which DCs are physically close — AD prefers to authenticate users against a DC in the same site to minimize WAN latency
  • Group Policy Objects (GPOs) are stored as LDAP entries (in the CN=Policies container) and Sysvol files — LDAP tells clients which GPOs apply; Sysvol delivers the policy files
  • Linux joins AD via realm join (uses adcli + SSSD) or net ads join (Samba + winbind) — both register a machine account in AD and get a Kerberos keytab
  • The difference between Linux in AD and Linux in FreeIPA: AD is optimized for Windows; FreeIPA is optimized for Linux — both interoperate

The Big Picture: What AD Actually Is

Active Directory Domain: corp.com
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  LDAP directory          Kerberos KDC                      │
│  ─────────────           ──────────                        │
│  Schema: 1000+ classes   Realm: CORP.COM                   │
│  Objects: users, groups, Issues TGTs + service tickets     │
│  computers, GPOs, OUs    Uses LDAP as the account DB       │
│                                                            │
│  DNS                     Sysvol (DFS share)                │
│  ────                    ────────────────                  │
│  SRV records for KDC     GPO templates                     │
│  and LDAP discovery      Login scripts                     │
│                          Replicated via DFSR               │
│                                                            │
│  Replication engine: USN + GUID + KCC                      │
└────────────────────────────────────────────────────────────┘
          │ replicates to          │ replicates to
          ▼                        ▼
   DC: dc02.corp.com        DC: dc03.corp.com

EP08 showed FreeIPA as the Linux-native answer to enterprise identity. AD is the Microsoft answer — and because most enterprises run Windows clients, understanding AD is unavoidable for Linux infrastructure engineers. This episode goes behind the LDAP and Kerberos protocols to explain what makes AD specifically work.


The AD Schema: LDAP With 1000+ Object Classes

AD’s schema extends the base LDAP schema with Microsoft-specific classes and attributes. Every user object is a user class (which extends organizationalPerson which extends person which extends top) with additional attributes like:

sAMAccountName   ← the pre-Windows 2000 login name (vamshi)
userPrincipalName ← the modern UPN (vamshi@corp.com)
objectGUID       ← a globally unique 128-bit identifier (never changes, even if DN changes)
objectSid        ← Windows Security Identifier (used for ACL enforcement on Windows)
whenCreated      ← creation timestamp
pwdLastSet       ← password change timestamp
userAccountControl ← bitmask: disabled, locked, password never expires, etc.
memberOf         ← back-link: groups this user belongs to

objectGUID is the authoritative identifier in AD — not the DN. When a user is renamed or moved to a different OU, the GUID stays the same. Applications that store a user’s DN will break on rename; applications that store the GUID won’t.

userAccountControl is the bitmask that controls account state:

Flag          Value   Meaning
ACCOUNTDISABLE  2     Account disabled
LOCKOUT         16    Account locked out
PASSWD_NOTREQD  32    Password not required
NORMAL_ACCOUNT  512   Normal user account (set on almost all accounts)
DONT_EXPIRE_PASSWD 65536  Password never expires

# Query AD from a Linux machine
ldapsearch -x -H ldap://dc.corp.com \
  -D "vamshi@corp.com" -w password \
  -b "dc=corp,dc=com" \
  "(sAMAccountName=vamshi)" \
  sAMAccountName userPrincipalName objectGUID memberOf userAccountControl

Replication: USN + GUID + KCC

AD replication is multi-master — every DC accepts writes. The replication engine uses:

USN (Update Sequence Number) — a per-DC counter that increments on every local write. Each attribute in the directory stores the USN at which it was last modified (uSNChanged, uSNCreated). When DC-A replicates to DC-B, DC-B asks: “give me everything you’ve changed since the last USN I saw from you.”

GUID — each object has a globally unique identifier. If the same attribute is modified on two DCs before replication (a conflict), the conflict is resolved: last-writer-wins at the attribute level, based on the modification timestamp. If timestamps are equal, the attribute value from the DC with the lexicographically higher GUID wins.

KCC (Knowledge Consistency Checker) — a component that runs on every DC and automatically constructs the replication topology. You don’t configure which DCs replicate to which — the KCC builds a minimum spanning tree that ensures every DC is connected to every other within a set number of hops. You configure Sites and site links; the KCC does the rest.

# Check replication status from a Linux machine (requires rpcclient or adcli)
# Or on the DC: repadmin /showrepl (Windows tool)

# Simulate: query the highestCommittedUSN from a DC
ldapsearch -x -H ldap://dc.corp.com \
  -D "[email protected]" -w password \
  -b "" -s base highestCommittedUSN

Sites are AD’s concept of physical network topology. A site is a set of IP subnets with high-bandwidth connectivity between them. Site links represent the WAN connections between sites.

Site: Mumbai              Site: Hyderabad
┌────────────────┐        ┌────────────────┐
│ DC: dc-mum-01  │        │ DC: dc-hyd-01  │
│ DC: dc-mum-02  │        │ DC: dc-hyd-02  │
│ subnet: 10.1/16│        │ subnet: 10.2/16│
└───────┬────────┘        └────────┬───────┘
        │                          │
        └──── Site Link ───────────┘
              Cost: 100
              Replication interval: 15 min

When a user in Mumbai authenticates, the client locates a DC in the same site using DNS SRV records (the DC locator process). The SRV records include the site name in the service name: _ldap._tcp.Mumbai._sites.dc._msdcs.corp.com. SSSD and Windows clients query site-local SRV records first.

If no DC is available in the local site, authentication falls back to a DC in another site across the WAN link. Configuring sites correctly prevents remote authentication failures from killing local operations.
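
You can check those site-scoped records directly with dig; the site name Mumbai below matches the example topology above:

# Site-scoped SRV records (present only if the site is defined in AD)
dig +short SRV _ldap._tcp.Mumbai._sites.dc._msdcs.corp.com
dig +short SRV _kerberos._tcp.Mumbai._sites.dc._msdcs.corp.com

# Forest-wide records (what clients fall back to)
dig +short SRV _ldap._tcp.dc._msdcs.corp.com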


Group Policy: LDAP + Sysvol

GPOs are stored in two places:

LDAP — the CN=Policies,CN=System,DC=corp,DC=com container holds GPO metadata objects. Each GPO has a GUID, a display name, and version numbers. The gPLink attribute on OUs and the domain root links GPOs to where they apply.

Sysvol — the actual policy templates and scripts live in \\corp.com\SYSVOL\corp.com\Policies\{GPO-GUID}\. Sysvol is a DFS-R (Distributed File System Replication) share replicated to every DC.

When a Windows client applies Group Policy:
1. LDAP query: what GPOs are linked to my OU chain?
2. Sysvol fetch: download the policy templates from the GPO’s Sysvol path
3. Apply: process Registry settings, Security settings, Scripts

Linux clients don’t process GPOs natively. The adcli and sssd tools interpret a small subset of AD policy (password policy, account lockout) via LDAP. Full GPO processing on Linux requires Samba’s samba-gpupdate or third-party tools.
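
You can see the LDAP half of this from Linux. A small sketch, assuming a GSSAPI bind after kinit; {GPO-GUID} is a placeholder for a GUID returned by the first query:

# Which GPOs are linked at the domain root?
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" -s base gPLink gPOptions

# Resolve a GPO GUID from gPLink to its display name and version
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "cn=Policies,cn=System,dc=corp,dc=com" \
  "(cn={GPO-GUID})" displayName versionNumber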


Joining Linux to AD

# Install required packages
dnf install -y realmd sssd adcli samba-common

# Discover the domain
realm discover corp.com
# corp.com
#   type: kerberos
#   realm-name: CORP.COM
#   domain-name: corp.com
#   configured: no
#   server-software: active-directory
#   client-software: sssd

# Join
realm join corp.com -U Administrator
# Prompts for Administrator password
# Creates machine account in AD
# Configures sssd.conf, krb5.conf, nsswitch.conf, pam.d automatically

# Verify
realm list
id [email protected]

What the join does:

  1. Creates a machine account HOSTNAME$ in CN=Computers,DC=corp,DC=com
  2. Sets a machine password (rotated automatically by SSSD)
  3. Retrieves a Kerberos keytab to /etc/krb5.keytab
  4. Configures SSSD with id_provider = ad, auth_provider = ad
  5. Updates /etc/nsswitch.conf to include sss
  6. Updates /etc/pam.d/ to include pam_sss

After joining, SSSD uses the machine’s Kerberos keytab to authenticate to the DC and query LDAP — no hardcoded service account credentials required.
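
You can inspect those machine credentials directly. A quick check, assuming the default keytab path; HOSTNAME$ below stands in for your machine's actual account name:

# List the principals stored in the machine keytab
klist -k /etc/krb5.keytab

# Authenticate as the machine account using the keytab, no password involved
kinit -k 'HOSTNAME$@CORP.COM'
klist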


LDAP Queries Against AD from Linux

# Find a user (after kinit or with -w password)
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(sAMAccountName=vamshi)" \
  sAMAccountName mail memberOf

# Find all members of a group
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(cn=engineers)" \
  member

# Find all AD-joined Linux machines
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(&(objectClass=computer)(operatingSystem=*Linux*))" \
  cn operatingSystem lastLogonTimestamp

# Find disabled accounts
ldapsearch -Y GSSAPI -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(userAccountControl:1.2.840.113556.1.4.803:=2)" \
  sAMAccountName

The last filter uses an LDAP extensible match (1.2.840.113556.1.4.803 is the OID for bitwise AND). userAccountControl:1.2.840.113556.1.4.803:=2 means “entries where userAccountControl AND 2 equals 2” — i.e., the ACCOUNTDISABLE bit is set. This is a Microsoft AD extension not in standard LDAP.
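
The same OID works for any bit in the userAccountControl table above, for example flagging accounts whose passwords never expire, or excluding disabled accounts from a query:

# Accounts with "password never expires" (bit 65536)
ldapsearch -Y GSSAPI -H ldap://dc.corp.com -b "dc=corp,dc=com" \
  "(userAccountControl:1.2.840.113556.1.4.803:=65536)" sAMAccountName

# Enabled accounts only: negate the ACCOUNTDISABLE bit
ldapsearch -Y GSSAPI -H ldap://dc.corp.com -b "dc=corp,dc=com" \
  "(&(objectClass=user)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))" sAMAccountName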


⚠ Common Misconceptions

“AD is just Microsoft’s LDAP.” AD is LDAP + Kerberos + DNS + DFS-R + GPO, all tightly integrated and with a schema that the Microsoft ecosystem depends on. You can query AD with standard ldapsearch. You cannot replace it with OpenLDAP without breaking every Windows client.

“Linux machines in AD get GPO.” Linux machines appear in AD and can be organized into OUs. Standard GPOs don’t apply to them. Samba’s samba-gpupdate can process a subset of AD policy for Linux — mostly Registry and Security settings mapped to Linux equivalents.

“realm leave removes the machine cleanly.” realm leave removes local configuration but does not delete the machine account from AD. The stale computer object stays in CN=Computers until an AD admin deletes it. Always run realm leave && adcli delete-computer -U Administrator for a clean removal.


Framework Alignment

Domain → Relevance
CISSP Domain 5: Identity and Access Management → AD is the dominant enterprise identity store — understanding its LDAP structure, Kerberos realm, and GPO model is essential for IAM in mixed environments
CISSP Domain 4: Communications and Network Security → AD replication traffic (RPC, LDAP, Kerberos) is a significant portion of enterprise WAN traffic — Sites and site links are a network security and performance design decision
CISSP Domain 3: Security Architecture and Engineering → AD forest/domain/OU hierarchy is an architectural decision with long-term security consequences — getting OU structure wrong constrains GPO delegation for years

Key Takeaways

  • AD is LDAP + Kerberos + DNS + GPO + DFS-R — not a product that “uses” these; they’re the implementation
  • Replication is multi-master via USN + GUID; the KCC builds the topology automatically from Sites configuration
  • objectGUID is the stable identifier — not the DN, which changes on rename/move
  • realm join is the correct way to join Linux to AD — it configures SSSD, Kerberos, PAM, and NSS correctly in one command
  • userAccountControl is the bitmask that controls account state — (userAccountControl:1.2.840.113556.1.4.803:=2) finds disabled accounts

What’s Next

EP09 covered AD — LDAP and Kerberos inside the corporate network. EP10 covers what happens when identity needs to work across the internet, where Kerberos doesn’t reach: SAML, OAuth2, and OIDC — the protocols that let identity leave the building.

Next: SAML vs OIDC vs OAuth2: Which Protocol Handles Which Identity Problem

Get EP10 in your inbox when it publishes → linuxcent.com/subscribe

FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack

Reading Time: 5 minutes

The Identity Stack, Episode 8
EP07: LDAP HA → EP08 → EP09: Active Directory → …

Focus Keyphrase: FreeIPA setup
Search Intent: Investigational
Meta Description: FreeIPA integrates 389-DS, MIT Kerberos, Dogtag PKI, and SSSD into one Linux identity stack. Here’s what it gives you and how to use it effectively. (153 chars)


TL;DR

  • FreeIPA is 389-DS (LDAP) + MIT Kerberos + Dogtag PKI + Bind DNS + SSSD — one ipa-server-install command gets you an enterprise identity platform
  • Host-Based Access Control (HBAC) lets you define centrally: which users can SSH to which hosts — no more managing /etc/security/access.conf per machine
  • Sudo rules from the directory: define sudo policy centrally, have every machine pull it — no /etc/sudoers.d/ files scattered across the fleet
  • ipa CLI is the management interface — ipa user-add, ipa group-add, ipa hbacrule-add — everything that took five LDAP commands takes one ipa command
  • FreeIPA trusts with Active Directory let Linux machines authenticate AD users without joining the AD domain
  • The right choice for Linux-centric environments; AD is the right choice when Windows clients dominate

The Big Picture: What FreeIPA Integrates

┌─────────────────────────────────────────────────────────┐
│                    FreeIPA Server                        │
│                                                         │
│  389-DS (LDAP)    MIT Kerberos    Dogtag PKI            │
│  ─────────────    ───────────     ─────────             │
│  User/group       TGT + service   Machine certs         │
│  storage          ticket issuing  User certs             │
│                                   OCSP / CRL            │
│  Bind DNS         SSSD (client)   Apache (WebUI)        │
│  ──────────       ────────────    ──────────────        │
│  SRV records      Enrollment      Management UI         │
│  for KDC/LDAP     automation      REST API              │
└─────────────────────────────────────────────────────────┘
              ▲                  ▲
              │ enrollment       │ SSH + sudo rules
   ┌──────────┴──────────┐  ┌───┴──────────────────┐
   │  Linux client        │  │  Linux client         │
   │  (ipa-client-install)│  │  (ipa-client-install) │
   └─────────────────────┘  └──────────────────────┘

EP06 and EP07 built OpenLDAP from components. FreeIPA gives you all of that plus Kerberos, PKI, DNS, and HBAC — opinionated, integrated, and managed through a single CLI and WebUI. This episode shows what you actually get from it.


Why FreeIPA Instead of Bare OpenLDAP

Running bare OpenLDAP requires you to:
– Configure schema for POSIX accounts, SSH keys, sudo rules, HBAC manually
– Set up MIT Kerberos separately and integrate it with LDAP
– Build your own PKI for machine certificates
– Maintain DNS SRV records for Kerberos discovery
– Write client enrollment scripts
– Build a management interface (or live in LDIF)

FreeIPA does all of this in one installer, with a consistent data model across all components. The trade-off is opacity — FreeIPA makes decisions for you (schema, replication topology, Kerberos realm name) that bare OpenLDAP leaves to you.


Installing FreeIPA Server

# RHEL / Rocky / AlmaLinux ship FreeIPA as "ipa-server"; Fedora packages it as freeipa-server
dnf install -y ipa-server ipa-server-dns

# Run the installer (interactive)
ipa-server-install

# Or non-interactive:
ipa-server-install \
  --realm=CORP.COM \
  --domain=corp.com \
  --ds-password=DM_password \
  --admin-password=Admin_password \
  --setup-dns \
  --forwarder=8.8.8.8 \
  --unattended

# After install: get an admin Kerberos ticket
kinit admin

The installer creates:
– 389-DS instance with the FreeIPA schema
– MIT KDC with realm CORP.COM
– Dogtag CA and all certificate infrastructure
– Bind DNS with SRV records for the KDC and LDAP server
– Apache WebUI at https://ipa.corp.com/ipa/ui/
– SSSD configured on the server itself

Time: 5–10 minutes for what used to take a week of manual configuration.


The ipa CLI

Every management action goes through ipa. It talks to the IPA server’s REST API and handles Kerberos authentication transparently (it uses your kinit session).

# Users
ipa user-add vamshi \
  --first=Vamshi --last=Krishna \
  --email=vamshi@corp.com \
  --password

ipa user-show vamshi
ipa user-find --all              # search all users
ipa user-disable vamshi          # lock account without deleting
ipa user-mod vamshi --shell=/bin/zsh

# Groups
ipa group-add engineers --desc "Engineering team"
ipa group-add-member engineers --users=vamshi,alice

# Password policy
ipa pwpolicy-mod --minlength=12 --maxlife=90 --history=10

# SSH public keys — stored centrally, pushed to every host
ipa user-mod vamshi --sshpubkey="ssh-ed25519 AAAA..."
# SSSD on enrolled hosts will use this key for SSH login — no authorized_keys file needed
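
On the client side this works because ipa-client-install points sshd at SSSD for key lookups (unless you tell it not to). Roughly the following sshd_config wiring; exact paths may differ slightly by distribution:

# /etc/ssh/sshd_config (set up by ipa-client-install on enrolled hosts)
AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysCommandUser nobody

# Test the lookup manually on an enrolled host
sss_ssh_authorizedkeys vamshi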

Host-Based Access Control (HBAC)

HBAC is the feature that justifies FreeIPA for most Linux shops. It lets you define centrally: which users (or groups) can log in to which hosts (or host groups), using which services (SSH, sudo, FTP).

Without HBAC, access control is per-machine: /etc/security/access.conf or PAM pam_access rules, replicated across every server, managed inconsistently.

With HBAC: one rule, enforced everywhere.

# Create host groups
ipa hostgroup-add production-servers --desc "Production Linux hosts"
ipa hostgroup-add-member production-servers --hosts=web01.corp.com,db01.corp.com

# Create user groups
ipa group-add sre-team
ipa group-add-member sre-team --users=vamshi,alice

# Create an HBAC rule
ipa hbacrule-add allow-sre-to-prod \
  --desc "SRE team can SSH to production"
ipa hbacrule-add-user allow-sre-to-prod --groups=sre-team
ipa hbacrule-add-host allow-sre-to-prod --hostgroups=production-servers
ipa hbacrule-add-service allow-sre-to-prod --hbacsvcs=sshd

# Test the rule before applying
ipa hbactest \
  --user=vamshi \
  --host=web01.corp.com \
  --service=sshd
# Access granted: True
# Matched rules: allow-sre-to-prod

SSSD on each enrolled host enforces the HBAC rules at login time by querying the IPA server. No per-machine configuration. Add a new server to the production-servers host group and the HBAC rules apply immediately.
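
One caveat: a fresh FreeIPA install ships with a default allow_all HBAC rule, so specific rules like the one above have no visible effect until you disable it:

# HBAC rules only become meaningful once the default rule is off
ipa hbacrule-disable allow_all
ipa hbactest --user=vamshi --host=web01.corp.com --service=sshd   # re-test after disabling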


Sudo Rules from the Directory

# Create a sudo rule
ipa sudorule-add allow-sre-sudo \
  --cmdcat=all \
  --desc "SRE team gets full sudo on production"
ipa sudorule-add-user allow-sre-sudo --groups=sre-team
ipa sudorule-add-host allow-sre-sudo --hostgroups=production-servers

# Or a scoped rule — only specific commands
ipa sudorule-add allow-service-restart
ipa sudocmdgroup-add service-commands
ipa sudocmd-add /usr/bin/systemctl
ipa sudocmdgroup-add-member service-commands --sudocmds="/usr/bin/systemctl"
ipa sudorule-add-allow-command allow-service-restart --sudocmdgroups=service-commands

On enrolled hosts, SSSD’s sssd_sudo responder pulls these rules and the sudo command evaluates them locally. No /etc/sudoers.d/ files. Central policy, local enforcement.
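
To confirm the plumbing on an enrolled host, check that sudo is actually routed through SSSD and then list the effective rules. A quick sketch; recent ipa-client-install versions configure the nsswitch line automatically, older ones need it added by hand:

# nsswitch.conf must route sudoers lookups through SSSD
grep sudoers /etc/nsswitch.conf
# sudoers: files sss

# List the sudo rules the directory grants the current user
sudo -l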


Enrolling a Client

# On the client machine
dnf install -y freeipa-client

ipa-client-install \
  --domain=corp.com \
  --server=ipa.corp.com \
  --realm=CORP.COM \
  --principal=admin \
  --password=Admin_password \
  --unattended

# What this does:
# 1. Registers the host in IPA as a machine principal
# 2. Retrieves a host Kerberos keytab (/etc/krb5.keytab)
# 3. Configures SSSD (sssd.conf, nsswitch.conf, pam.d)
# 4. Configures Kerberos (/etc/krb5.conf)
# 5. Optionally configures NTP and DNS

After enrollment: getent passwd vamshi returns the IPA user. SSH with an IPA password works. HBAC rules are enforced. Sudo rules from the directory apply. SSH public keys from the user’s IPA profile work without authorized_keys files.
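
A quick post-enrollment check, using the example user from earlier:

getent passwd vamshi              # resolved from IPA via SSSD
kinit vamshi && klist             # Kerberos against the IPA KDC
sss_ssh_authorizedkeys vamshi     # SSH keys served from the user's IPA profile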


FreeIPA Trust with Active Directory

In mixed environments (Linux servers + Windows clients), you can establish a trust between FreeIPA and AD without joining the Linux servers to the AD domain directly.

# On the IPA server (after installing ipa-server-trust-ad)
ipa-adtrust-install --netbios-name=CORP

# Establish the trust
ipa trust-add ad.corp.com \
  --admin=Administrator \
  --password \
  --type=ad

# AD users can now log in to IPA-enrolled Linux hosts
# They appear as: username@ad.corp.com (or NETBIOS-DOMAIN\username)

Under the hood: FreeIPA runs Samba components that let it appear to AD as a trusted forest. AD users authenticate against their own AD KDC; the cross-realm trust lets IPA-enrolled hosts accept those tickets, and SSSD maps the users to POSIX attributes stored in IPA (or generated automatically via ID mapping).
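
To give AD users POSIX group memberships (and therefore HBAC and sudo coverage) on the IPA side, the usual pattern is an external group nested inside a POSIX group. A sketch, assuming the trusted domain ad.corp.com; whether the member name is written as DOMAIN\group or group@domain depends on your setup:

# The external group holds AD identities; the POSIX group gives them a GID on Linux
ipa group-add --external ad-admins-external --desc "AD admins (external members)"
ipa group-add-member ad-admins-external --external "Domain Admins@ad.corp.com"
ipa group-add ad-admins --desc "AD admins (POSIX)"
ipa group-add-member ad-admins --groups=ad-admins-external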


⚠ Common Misconceptions

“FreeIPA is just OpenLDAP with a UI.” FreeIPA uses 389-DS (not OpenLDAP), adds a full Kerberos KDC, a certificate authority, DNS, HBAC enforcement, and sudo management — all with a consistent schema designed for these use cases. It’s an integrated identity platform, not a wrapper.

“HBAC rules replace firewall rules.” HBAC controls who can log in to a host at the authentication layer — not network access. A denied HBAC rule means the SSH session is rejected after the TCP connection has already been established. You still need firewall rules to block network access.

“FreeIPA replicas are identical.” FreeIPA uses 389-DS Multi-Supplier replication. All replicas accept reads and writes. But the CA is separate — only the initial server (and explicitly designated CA replicas) run the CA. If the CA goes down, certificate operations stop; authentication does not.


Framework Alignment

Domain → Relevance
CISSP Domain 5: Identity and Access Management → FreeIPA is an enterprise IAM platform — HBAC, sudo policy, SSH key management, and certificate-based authentication are all IAM controls
CISSP Domain 3: Security Architecture and Engineering → FreeIPA’s integrated CA enables certificate-based authentication for machines and users — a stronger authentication factor than passwords
CISSP Domain 1: Security and Risk Management → Centralized HBAC and sudo policy reduces the attack surface of privilege escalation — no more inconsistent sudoers files that drift across the fleet

Key Takeaways

  • FreeIPA = 389-DS + MIT Kerberos + Dogtag PKI + Bind DNS — one installer, one management interface
  • HBAC rules define centrally who can SSH to which host groups — enforced by SSSD on every enrolled client, no per-machine config
  • Sudo rules from the directory replace scattered /etc/sudoers.d/ files — central policy, SSSD-enforced locally
  • ipa hbactest lets you verify access rules before a user hits a blocked login — use it before every policy change
  • For Linux-centric environments: FreeIPA. For Windows-dominant environments: AD. For mixed: FreeIPA trust with AD.

What’s Next

FreeIPA is the Linux answer to enterprise identity. EP09 covers the Microsoft answer — Active Directory — which extended LDAP and Kerberos into a complete enterprise platform with Group Policy, Sites, and a replication model built for global scale.

Next: How Active Directory Works: LDAP, Kerberos, and Group Policy Under the Hood

Get EP09 in your inbox when it publishes → linuxcent.com/subscribe

LDAP High Availability: Load Balancing and Production Architecture

Reading Time: 6 minutes

The Identity Stack, Episode 7
EP06: OpenLDAP → EP07 → EP08: FreeIPA → …

Focus Keyphrase: LDAP high availability
Search Intent: Informational
Meta Description: Design LDAP high availability for production: HAProxy load balancing, read/write split, connection pooling, monitoring with cn=monitor, and 389-DS at scale. (157 chars)


TL;DR

  • LDAP HA means multiple directory servers behind a load balancer — clients connect to a VIP, not to individual servers
  • Read/write split: all writes go to the provider, reads are distributed across consumers — the load balancer enforces this by routing on port or backend check
  • SSSD handles multi-server failover natively (ldap_uri accepts a comma-separated list) — for apps without built-in failover, HAProxy with health checks does the work
  • Connection pooling is critical at scale — nss_ldap and pam_ldap opened a new connection per login; SSSD maintains a pool; apps that use libldap directly must implement their own
  • cn=monitor is the built-in monitoring endpoint — exposes connection counts, operation rates, and backend stats readable via ldapsearch
  • 389-DS (Red Hat Directory Server) is the production choice for >1M entries — purpose-built for large directories with a dedicated replication engine

The Big Picture: Production LDAP Topology

         Clients (SSSD, apps, VPN concentrators)
                      │
              ┌───────▼───────┐
              │   HAProxy VIP  │   ← single endpoint, port 389/636
              │  10.0.0.10     │
              └───────┬───────┘
                      │
          ┌───────────┼───────────┐
          ▼           ▼           ▼
   ldap1.corp.com  ldap2.corp.com  ldap3.corp.com
   (Provider)      (Consumer)      (Consumer)
   Reads + Writes  Reads only      Reads only
          │           ▲               ▲
          └───────────┴───────────────┘
               SyncRepl replication

EP06 built a two-node replicated directory. This episode covers what happens when the directory becomes infrastructure — when it needs to survive a node failure, handle thousands of connections, and be monitored like any other critical service.


HAProxy for LDAP

HAProxy is the standard choice for LDAP load balancing. Unlike HTTP, LDAP is a stateful protocol — once a client binds, subsequent operations on that connection share the authenticated session. The load balancer must use connection persistence, not per-request routing.

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 50000

defaults
    mode tcp                  # LDAP is TCP, not HTTP
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    option tcplog

# ── LDAP read/write split ─────────────────────────────────────────────

# Writes → provider only
frontend ldap-write
    bind *:389
    default_backend ldap-provider

backend ldap-provider
    balance first                   # always use first available (provider)
    option tcp-check
    tcp-check connect
    server ldap1 ldap1.corp.com:389 check inter 5s rise 2 fall 3
    server ldap2 ldap2.corp.com:389 check inter 5s rise 2 fall 3 backup

# Reads → all nodes round-robin
frontend ldap-read
    bind *:3389                     # internal read port
    default_backend ldap-consumers

backend ldap-consumers
    balance roundrobin
    option tcp-check
    tcp-check connect
    server ldap1 ldap1.corp.com:389 check inter 5s
    server ldap2 ldap2.corp.com:389 check inter 5s
    server ldap3 ldap3.corp.com:389 check inter 5s

# LDAPS (TLS)
frontend ldaps
    bind *:636
    default_backend ldap-consumers-tls

backend ldap-consumers-tls
    balance roundrobin
    server ldap1 ldap1.corp.com:636 check inter 5s ssl verify required ca-file /etc/ssl/certs/ca.pem
    server ldap2 ldap2.corp.com:636 check inter 5s ssl verify required ca-file /etc/ssl/certs/ca.pem

The health check (tcp-check connect) just verifies TCP connectivity. For a more precise check — verifying that slapd is actually responding to LDAP requests — use a custom script that runs ldapsearch and checks the result code.
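
A minimal health-check sketch along those lines: it binds anonymously and reads the root DSE (allowed by default; adjust if you disable anonymous binds), exiting non-zero when slapd does not answer. How you wire it in (HAProxy's external-check, keepalived, a sidecar monitor) is up to you:

#!/bin/bash
# ldap-health.sh <host> <port> : exit 0 only if slapd answers a real LDAP search
HOST="${1:-localhost}"
PORT="${2:-389}"
ldapsearch -x -H "ldap://${HOST}:${PORT}" \
  -b "" -s base "(objectClass=*)" namingContexts >/dev/null 2>&1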


SSSD Multi-Server Failover

SSSD has native failover — no load balancer required for SSSD-based clients:

# /etc/sssd/sssd.conf
[domain/corp.com]
ldap_uri = ldap://ldap1.corp.com, ldap://ldap2.corp.com, ldap://ldap3.corp.com
# SSSD tries them in order; switches to next on failure
# Retries the primary servers periodically (roughly every 30 seconds by default) and switches back when one recovers

# For AD, discovery via DNS SRV records is even better:
ad_server = _srv_
# SSSD queries _ldap._tcp.corp.com SRV records and gets all DCs automatically

SSSD monitors the connection health. If the current server becomes unreachable, it switches to the next in the list within seconds. Existing cached data keeps serving during the switchover. Clients using SSSD don’t need a load balancer for basic HA.


Connection Pooling

Every LDAP bind creates an authenticated session on the server. A server with connection limits (olcConnMaxPending, olcConnMaxPendingAuth in OLC) will reject new connections when those limits are hit.

The problem: applications that use libldap directly tend to open a new connection per operation. At 500 requests/second, that’s 500 new TCP connections, 500 binds, 500 TLS handshakes per second — a directory that can handle 5000 concurrent connections starts refusing new ones.

The solutions:

SSSD — handles this automatically. SSSD maintains one or a small number of persistent connections per domain and multiplexes all PAM/NSS queries through them.

Application-level pooling — frameworks like python-ldap with connection pooling, ldap3 with connection strategies, or dedicated middleware like 389-DS's Directory Proxy Server.

Server-side limits in OpenLDAP — slapd's connection capacity is bounded by its file descriptor limit (ulimit -n / LimitNOFILE on the service unit), and olcConnMaxPending / olcConnMaxPendingAuth cap how many operations a single connection can queue. Set these deliberately so you get a controlled failure mode instead of unbounded queuing.


Monitoring with cn=monitor

OpenLDAP exposes live operational statistics via the cn=monitor database — a virtual LDAP subtree that reflects the server’s current state. Enable it:

# enable-monitor.ldif
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/ldap
olcModuleLoad: back_monitor

dn: olcDatabase=monitor,cn=config
objectClass: olcDatabaseConfig
olcDatabase: monitor
olcAccess: to *
  by dn="cn=admin,dc=corp,dc=com" read
  by * none

Query it:

# Overall statistics
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=monitor" -s sub "(objectClass=*)" \
  monitorOpInitiated monitorOpCompleted

# Connection counts
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=Connections,cn=monitor" -s one \
  monitorConnectionNumber

# Operations by type
ldapsearch -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "cn=Operations,cn=monitor" -s one \
  monitorOpInitiated monitorOpCompleted

Useful metrics to export to Prometheus (via prometheus-openldap-exporter or similar):
monitorOpCompleted per operation type (bind, search, modify)
monitorConnectionNumber — current connection count
– Backend-specific: olmMDBEntries, olmMDBPagesMax, olmMDBPagesUsed


389-DS: LDAP at Scale

OpenLDAP is excellent for directories up to a few million entries. When you need:
– 10M+ entries
– High write throughput (more than a few hundred writes/second)
– Fine-grained replication filtering
– A dedicated web-based admin UI

…389-DS (Red Hat Directory Server, community edition) is the production answer. It’s what FreeIPA uses under the hood.

Key architectural differences from OpenLDAP:

Multi-supplier replication — 389-DS’s replication engine uses a dedicated changelog (stored in LMDB) and Change Sequence Numbers (CSNs) for conflict resolution. Multi-supplier (multi-master) replication is first-class, not a bolted-on feature.

Changelog — every change is written to a persistent changelog before being applied. This enables precise replication: a consumer can reconnect after a network partition and get exactly the changes it missed, rather than doing a full resync.

Plugin architecture — 389-DS functionality (replication, managed entries, DNA for automatic UID allocation, memberOf, password policy) is all implemented as plugins that can be enabled/disabled per directory instance.

# Install 389-DS
dnf install -y 389-ds-base

# Create a new instance
dscreate interactive
# — or use a template:
dscreate from-file /path/to/instance.inf

# Manage with dsctl
dsctl slapd-corp status
dsctl slapd-corp start
dsctl slapd-corp stop

# Admin with dsconf
dsconf slapd-corp backend suffix list
dsconf slapd-corp replication status --suffix "dc=corp,dc=com"

The dsconf replication status command gives a live view of replication lag across all suppliers and consumers — something OpenLDAP requires you to compute manually from contextCSN comparisons.
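
The OpenLDAP equivalent is a manual loop. A rough sketch using the admin bind from earlier; identical contextCSN values mean the nodes are in sync:

for h in ldap1.corp.com ldap2.corp.com ldap3.corp.com; do
  echo -n "$h: "
  ldapsearch -x -H "ldap://$h" \
    -D "cn=admin,dc=corp,dc=com" -w password \
    -b "dc=corp,dc=com" -s base contextCSN 2>/dev/null \
    | awk '/^contextCSN:/ {print $2}'
done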


Global Catalog: Cross-Domain Search in AD

When your directory spans multiple AD domains in a forest, the Global Catalog solves a specific problem: a user in emea.corp.com needs to be found by an app that only knows corp.com.

Forest: corp.com
  ├── corp.com       → DC port 389    full directory: 500K entries
  ├── emea.corp.com  → DC port 389    full directory: 200K entries
  └── Global Catalog → GC port 3268  partial replica: 700K entries
                                       (not all attributes — just the most queried ones)

The GC replicates a subset of attributes from every domain in the forest. By default: cn, mail, sAMAccountName, userPrincipalName, memberOf, and about 150 others. Attributes marked with isMemberOfPartialAttributeSet in the schema are replicated to the GC.

If an application is configured to use port 3268 instead of 389, it’s using the GC — and it won’t see attributes not included in the partial attribute set. This surprises teams that add a custom attribute to AD and then wonder why their application can’t see it on 3268 but can on 389.
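
The difference is easy to demonstrate: the same search against 3268 covers the whole forest but only returns partial-attribute-set attributes, while 389 covers one domain in full. A sketch using a simple bind with the running example account:

# Forest-wide, partial attribute set (Global Catalog)
ldapsearch -x -H ldap://dc.corp.com:3268 \
  -D "vamshi@corp.com" -w password \
  -b "dc=corp,dc=com" "(sAMAccountName=vamshi)" sAMAccountName userPrincipalName

# Single domain, full attribute set
ldapsearch -x -H ldap://dc.corp.com:389 \
  -D "vamshi@corp.com" -w password \
  -b "dc=corp,dc=com" "(sAMAccountName=vamshi)" sAMAccountName userPrincipalName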


⚠ Production Gotchas

HAProxy TCP health checks don’t verify LDAP is responsive. A server can accept TCP connections but have slapd in a degraded state (database corruption, out-of-memory). Build a proper LDAP health check: a script that binds and searches a known entry and checks the result.

Replication lag under write load. SyncRepl consumers can fall behind under sustained write load. Monitor the contextCSN difference between provider and consumers. If consumers are more than a few seconds behind, investigate the provider’s write throughput and the consumer’s processing speed.

Directory size and the MDB mapsize. LMDB requires a pre-configured maximum database size (olcDbMaxSize). If the database grows beyond this, slapd starts failing writes. Set it to 2–4x your expected data size and monitor olmMDBPagesUsed / olmMDBPagesMax.


Key Takeaways

  • HAProxy in TCP mode provides LDAP load balancing — use balance first for write routing (provider only), balance roundrobin for reads
  • SSSD has native failover via ldap_uri — for SSSD clients, a load balancer adds HA but isn’t strictly required
  • cn=monitor is the built-in OpenLDAP monitoring endpoint — export its counters to Prometheus for operational visibility
  • 389-DS is the right choice for >1M entries, high write throughput, or multi-supplier replication as a first-class feature
  • Global Catalog (port 3268/3269) is a partial replica of all AD domains — useful for forest-wide searches, but missing non-replicated attributes

What’s Next

EP07 covers the infrastructure layer. EP08 zooms out to FreeIPA — what you get when LDAP, Kerberos, DNS, PKI, and HBAC are integrated into a single Linux-native identity stack, and why most Linux shops running their own directory should be running FreeIPA instead of bare OpenLDAP.

Next: FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack

Get EP08 in your inbox when it publishes → linuxcent.com/subscribe

OpenLDAP Setup and Replication: Running Your Own Directory

Reading Time: 5 minutes

The Identity Stack, Episode 6
EP01 → … → EP05: Kerberos → EP06 → EP07: LDAP HA → …

Focus Keyphrase: OpenLDAP setup
Search Intent: Navigational
Meta Description: Set up OpenLDAP with the MDB backend, configure it via cn=config (OLC), and wire up SyncRepl replication — a complete walkthrough for running your own directory. (162 chars)


TL;DR

  • OpenLDAP’s server process is slapd — the backend that stores data is MDB (LMDB), a memory-mapped B-tree that replaced the old Berkeley DB backend
  • Configuration lives in the directory itself: cn=config (OLC — Online Configuration) lets you modify slapd at runtime without restarting
  • SyncRepl is the replication protocol: a consumer subscribes to a provider and stays in sync via either polling (refreshOnly) or a persistent connection (refreshAndPersist)
  • Multi-Provider (formerly Multi-Master) lets multiple nodes accept writes — conflict resolution uses CSN (Change Sequence Number), last-writer-wins
  • The essential tools: slapd, ldapadd, ldapmodify, ldapsearch, slapcat, slaptest
  • Always build indexes on the attributes you search most — uid, cn, memberOf — or every search is a full scan

The Big Picture: slapd Architecture

ldapsearch / ldapadd / SSSD / any LDAP client
              │ TCP 389 / 636
              ▼
         ┌─────────────────────────────────┐
         │  slapd (OpenLDAP server)         │
         │                                 │
         │  Frontend (protocol layer)       │
         │    • parse BER requests          │
         │    • ACL enforcement             │
         │    • schema validation           │
         │                                 │
         │  Backend (storage layer)         │
         │    • MDB (LMDB) — default       │
         │    • memory-mapped file I/O      │
         │    • ACID transactions           │
         └────────────┬────────────────────┘
                      │
              /var/lib/ldap/
              data.mdb   (the directory data)
              lock.mdb   (LMDB lock file)

EP05 showed Kerberos in isolation. OpenLDAP is where you run the identity store that Kerberos references — and where SSSD looks up user and group attributes. This episode builds a working two-node replicated directory from scratch.


Installation

# Ubuntu / Debian
apt-get install -y slapd ldap-utils

# RHEL / Rocky / AlmaLinux
dnf install -y openldap-servers openldap-clients

# After install — Ubuntu runs a configuration wizard
# Re-run the wizard later if needed: dpkg-reconfigure slapd
# Or answer it now and then switch to OLC management

On RHEL-family systems, slapd is not configured after install — you work entirely through OLC from the start.


OLC: The Directory Configures Itself

The old way was slapd.conf — a static file that required a full restart on every change. OLC (Online Configuration) replaced it: slapd's own configuration is stored as LDAP entries under cn=config. You modify configuration the same way you modify data — with ldapmodify. Changes take effect immediately.

cn=config                        ← root config entry
├── cn=schema,cn=config          ← schema definitions
│     ├── cn={0}core             ← core schema
│     ├── cn={1}cosine           ← RFC 1274 attributes
│     └── cn={2}inetorgperson    ← inetOrgPerson object class
├── olcDatabase={-1}frontend     ← default settings for all databases
├── olcDatabase={0}config        ← the config database itself
└── olcDatabase={1}mdb           ← your actual directory data
      ├── olcAccess              ← ACLs
      ├── olcSuffix              ← base DN (e.g., dc=corp,dc=com)
      └── olcDbIndex             ← search indexes

Everything under cn=config has attributes prefixed with olc (OpenLDAP Configuration). You query and modify it just like any other LDAP subtree — with one restriction: only the cn=config admin (usually gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth — the local root via SASL EXTERNAL) can write to it.
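
In practice that means running your queries as root over the ldapi:// socket with SASL EXTERNAL:

# List every entry under cn=config (DNs only)
sudo ldapsearch -Y EXTERNAL -H ldapi:/// -b "cn=config" "(objectClass=*)" dn

# Show the settings of your data database
sudo ldapsearch -Y EXTERNAL -H ldapi:/// -b "olcDatabase={1}mdb,cn=config" -s base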


Bootstrapping a Directory

The quickest way to get a working directory is a set of LDIF files applied in order.

1. Load schemas

# Apply the schemas OpenLDAP ships with
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/inetorgperson.ldif
ldapadd -Y EXTERNAL -H ldapi:/// \
  -f /etc/ldap/schema/nis.ldif       # adds posixAccount, posixGroup

2. Configure the MDB database

# mdb-config.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=corp,dc=com
-
replace: olcRootDN
olcRootDN: cn=admin,dc=corp,dc=com
-
replace: olcRootPW
olcRootPW: {SSHA}hashed_password_here

Generate the hash: slappasswd -s yourpassword

ldapmodify -Y EXTERNAL -H ldapi:/// -f mdb-config.ldif

3. Add indexes

# indexes.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: uid eq,pres
olcDbIndex: cn eq,sub
olcDbIndex: sn eq,sub
olcDbIndex: mail eq
olcDbIndex: memberOf eq
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq

The last two (entryCSN, entryUUID) are required for SyncRepl replication to work efficiently.
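
Apply the file the same way as the database config. One caveat, stated as an assumption about your setup: depending on the OpenLDAP version, indexes added while data already exists may not cover the existing entries until you rebuild them offline with slapindex (run it with slapd stopped, and fix file ownership afterwards; the service user name varies by distribution):

ldapmodify -Y EXTERNAL -H ldapi:/// -f indexes.ldif

# Only needed if searches on the new attributes are still reported as unindexed
systemctl stop slapd
slapindex -n 1
chown -R ldap:ldap /var/lib/ldap      # the user is "openldap" on Debian/Ubuntu
systemctl start slapd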

4. Load initial data

# base.ldif
dn: dc=corp,dc=com
objectClass: top
objectClass: dcObject
objectClass: organization
o: Corp
dc: corp

dn: ou=people,dc=corp,dc=com
objectClass: organizationalUnit
ou: people

dn: ou=groups,dc=corp,dc=com
objectClass: organizationalUnit
ou: groups

dn: uid=vamshi,ou=people,dc=corp,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
cn: Vamshi Krishna
sn: Krishna
uid: vamshi
uidNumber: 1001
gidNumber: 1001
homeDirectory: /home/vamshi
loginShell: /bin/bash
mail: vamshi@corp.com
userPassword: {SSHA}hashed_password_here

ldapadd -x -H ldap://localhost \
  -D "cn=admin,dc=corp,dc=com" \
  -w adminpassword \
  -f base.ldif

ACLs: Who Can Read What

OpenLDAP ACLs are evaluated top-to-bottom; first match wins.

# acls.ldif — set via OLC
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcAccess
# Users can change their own passwords
olcAccess: to attrs=userPassword
  by self write
  by anonymous auth
  by * none
# Users can read their own entry
olcAccess: to dn.base="ou=people,dc=corp,dc=com"
  by self read
  by users read
  by * none
# Service accounts can read everything (for SSSD)
olcAccess: to *
  by dn="cn=svc-ldap,ou=services,dc=corp,dc=com" read
  by self read
  by * none

A service account (cn=svc-ldap) that SSSD uses to search the directory needs read access to ou=people and ou=groups. Never give SSSD admin (write) access.
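
Verify the ACLs behave as intended by binding as the service account (assuming you have created the cn=svc-ldap entry under ou=services):

# Confirm the bind works and see which identity slapd thinks you are
ldapwhoami -x -H ldap://localhost \
  -D "cn=svc-ldap,ou=services,dc=corp,dc=com" -W

# Reading a user entry should succeed; a write attempt should be rejected
ldapsearch -x -H ldap://localhost \
  -D "cn=svc-ldap,ou=services,dc=corp,dc=com" -W \
  -b "ou=people,dc=corp,dc=com" "(uid=vamshi)" uid uidNumber gidNumber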


SyncRepl Replication

SyncRepl is a pull-based replication protocol built on the LDAP Sync operation (RFC 4533). A consumer connects to a provider and requests changes. The provider sends them. The consumer stays in sync.

On the Provider: Enable the syncprov overlay

# syncprov.ldif
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpCheckpoint: 100 10     # checkpoint every 100 ops or 10 minutes
olcSpSessionLog: 100        # keep last 100 changes for delta-sync

ldapadd -Y EXTERNAL -H ldapi:/// -f syncprov.ldif

On the Consumer: Configure syncrepl

# consumer-config.ldif
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=001
  provider=ldap://ldap1.corp.com:389
  bindmethod=simple
  binddn="cn=repl-svc,dc=corp,dc=com"
  credentials=replication-password
  searchbase="dc=corp,dc=com"
  scope=sub
  schemachecking=on
  type=refreshAndPersist    # persistent connection (vs refreshOnly = polling)
  retry="5 5 60 +"          # retry: 5 times every 5s, then every 60s forever
  interval=00:00:05:00      # (for refreshOnly) sync every 5 minutes
-
add: olcUpdateRef
olcUpdateRef: ldap://ldap1.corp.com   # redirect writes to provider
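
Apply it on the consumer the same way as the other OLC changes:

ldapmodify -Y EXTERNAL -H ldapi:/// -f consumer-config.ldif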

refreshAndPersist keeps a persistent connection open. Changes replicate within milliseconds. refreshOnly polls on an interval — simpler, but adds latency.

Verify Replication

# On provider: check the contextCSN (the sync state token)
ldapsearch -x -H ldap://ldap1.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "dc=corp,dc=com" -s base contextCSN
# contextCSN: 20260427010000.000000Z#000000#000#000000

# On consumer: should match after sync
ldapsearch -x -H ldap://ldap2.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w password \
  -b "dc=corp,dc=com" -s base contextCSN
# Same CSN = in sync

Multi-Provider: Accepting Writes on Both Nodes

Standard SyncRepl has one provider and one or more consumers — only the provider accepts writes. Multi-Provider (formerly Multi-Master) lets every node accept writes.

# On each node — add mirrormode to the database config
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcMirrorMode
olcMirrorMode: TRUE

With mirrormode enabled and each node configured as both provider and consumer of the other, writes on either node replicate to the other. Conflict resolution is CSN-based (Change Sequence Number) — a monotonically increasing timestamp. Last write wins at the attribute level.

Multi-Provider does not prevent split-brain conflicts — if two clients write the same attribute on two different nodes during a network partition, the higher CSN wins when the partition heals. For most directory use cases (user passwords, group memberships), this is acceptable. For others, it requires careful thought.


⚠ Production Gotchas

MDB data file grows monotonically. LMDB never shrinks the data file automatically. Deleted entries leave free space inside the file that gets reused, but the file on disk doesn’t shrink. Use slapcat to export and slapadd to reimport if you need to reclaim disk space.

slapcat is the only safe backup. slapcat reads the MDB database directly and exports LDIF — it does not go through slapd. Run it while slapd is running (LMDB is MVCC-safe for readers), but never copy the raw MDB files while slapd is running.
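
A minimal backup along those lines; database numbers follow the cn=config order shown earlier (0 = config, 1 = your mdb database):

# Data and config exports, safe while slapd is running
slapcat -n 1 -l /var/backups/corp-data.ldif
slapcat -n 0 -l /var/backups/corp-config.ldif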

Schema changes on a replicated directory require coordination. In this setup, SyncRepl replicates dc=corp,dc=com but not cn=config, so new schema definitions do not propagate on their own. If a consumer receives a replicated entry that uses an object class it doesn't know, it will reject it. Load new schemas manually on every node before adding entries that use them.


Key Takeaways

  • OpenLDAP uses LMDB (MDB backend) — a memory-mapped, ACID-compliant storage engine with no external dependency
  • OLC (cn=config) is the right way to configure slapd — changes apply without restarts
  • SyncRepl pulls changes from a provider to a consumer — refreshAndPersist for near-real-time, refreshOnly for poll-based
  • Always index uid, cn, entryCSN, and entryUUID — unindexed searches are full scans
  • Multi-Provider allows writes on all nodes with CSN-based last-write-wins conflict resolution

What’s Next

A single OpenLDAP server works. Two nodes with SyncRepl work better. EP07 goes further: how you put multiple LDAP servers behind a load balancer, how connection pooling works, what to monitor, and how 389-DS handles directories with tens of millions of entries.

Next: LDAP High Availability: Load Balancing and Production Architecture

Get EP07 in your inbox when it publishes → linuxcent.com/subscribe

How Kerberos Works: Tickets, KDC, and Why Enterprises Use It With LDAP

Reading Time: 7 minutes

The Identity Stack, Episode 5
EP01 → EP02 → EP03 → EP04: SSSD → EP05 → EP06: OpenLDAP → …

Focus Keyphrase: how Kerberos works
Search Intent: Informational
Meta Description: How Kerberos works: the KDC, ticket-granting tickets, and the three-step flow that lets enterprises authenticate without sending passwords on the wire. (157 chars)


TL;DR

  • Kerberos is a network authentication protocol — it proves identity without sending passwords over the network, using time-limited cryptographic tickets
  • Three actors: the client, the KDC (Key Distribution Center), and the service — the KDC issues tickets; clients use tickets to authenticate to services
  • The ticket flow: AS-REQ (get a TGT) → TGS-REQ (exchange TGT for a service ticket) → AP-REQ (present service ticket to the target service)
  • A TGT (Ticket-Granting Ticket) is a session credential — it lets you request service tickets without re-entering your password for the lifetime of the ticket (default 10 hours)
  • LDAP + Kerberos together: LDAP stores identity (who you are), Kerberos authenticates it (proves you are who you say you are) — Active Directory is exactly this combination
  • kinit, klist, kdestroy are the hands-on tools — run them and read the ticket output

The Big Picture: Three Actors, Three Steps

         1. AS-REQ / AS-REP
Client ◄────────────────────► AS (Authentication Server)
  │                                     │
  │    (part of KDC)                    │
  │                                     ▼
  │         2. TGS-REQ / TGS-REP   TGS (Ticket-Granting Server)
  ├───────────────────────────────────►│
  │         (part of KDC)              │
  │                                    │
  │    3. AP-REQ / AP-REP              │
  └─────────────────────────────► Service (SSH, LDAP, NFS, HTTP...)

KDC = AS + TGS (usually the same process, same machine)

EP04 mentioned Kerberos tickets and clock skew requirements without explaining the protocol. This episode explains why Kerberos was invented, what a ticket actually is, and how the three-step flow works — so that when SSSD says “KDC unreachable” or kinit fails with “pre-authentication required,” you know exactly what’s happening.


The Problem Kerberos Was Built to Solve

MIT’s Project Athena started in 1983 — a campus-wide computing initiative giving students access to thousands of workstations. The problem: how do you authenticate a student at workstation 847 to a file server across campus without sending their password over the network?

In 1988, Steve Miller and Clifford Neuman published Kerberos version 4. The core insight: a trusted third party (the KDC) can issue cryptographic proof that a user has authenticated, and that proof can be presented to any service on the network without the service ever seeing the user’s password.

The password never leaves the client machine after the initial authentication. Every subsequent authentication — to a different service, to the same service again — uses a ticket. The KDC knows both the client and the service. The client and service only need to trust the KDC.


Keys, Tickets, and Sessions

Before the protocol, the primitives:

Long-term keys — derived from passwords. When you set a password in Kerberos, it’s hashed into a key stored in the KDC database (in ntds.dit on AD, in /var/lib/krb5kdc/principal on MIT Kerberos). The client derives the same key from the password at authentication time. Neither side ever sends the raw password.

Session keys — temporary symmetric keys created by the KDC for a specific session. They’re valid for the ticket’s lifetime. After the ticket expires, the session key is useless.

Tickets — encrypted blobs issued by the KDC. A ticket contains the session key, the client identity, the expiry time, and optional flags. It’s encrypted with the target service’s long-term key — only the service can decrypt it. The client carries the ticket but can’t read the contents.


The Three-Step Flow

Step 1: AS-REQ / AS-REP — Getting a TGT

Client                        KDC (AS component)
  │                                │
  │── AS-REQ ──────────────────────►
  │   {username, timestamp}         │
  │   (timestamp encrypted with     │
  │    client's long-term key)       │
  │                                 │
  │   KDC verifies: decrypts        │
  │   timestamp with stored key.    │
  │   If valid → issues TGT         │
  │                                 │
  ◄── AS-REP ──────────────────────│
      {session_key_enc_with_client, │
       TGT_enc_with_krbtgt_key}     │

The client decrypts the session key using its long-term key (derived from the password). The TGT is encrypted with the KDC’s own key (krbtgt) — the client can’t read it, but carries it.

This is the step that requires the password. After this, the TGT is what the client uses for everything else.

Step 2: TGS-REQ / TGS-REP — Getting a Service Ticket

Client                        KDC (TGS component)
  │                                │
  │── TGS-REQ ─────────────────────►
  │   {TGT, authenticator,         │
  │    target_service_name}        │
  │   (authenticator encrypted      │
  │    with TGT session key)        │
  │                                 │
  │   KDC: decrypts TGT,           │
  │   verifies authenticator,       │
  │   issues service ticket         │
  │                                 │
  ◄── TGS-REP ────────────────────│
      {service_session_key_enc,    │
       service_ticket_enc_with_    │
       service_long_term_key}      │

No password involved. The client proves its identity by presenting the TGT (which only the KDC can issue) and an authenticator (a timestamp encrypted with the TGT’s session key, proving the client holds the session key without revealing it).

Step 3: AP-REQ / AP-REP — Authenticating to the Service

Client                        Service (sshd, LDAP, NFS...)
  │                                │
  │── AP-REQ ──────────────────────►
  │   {service_ticket,             │
  │    authenticator_enc_with_      │
  │    service_session_key}        │
  │                                 │
  │   Service: decrypts ticket      │
  │   with its long-term key,       │
  │   verifies authenticator        │
  │                                 │
  ◄── AP-REP (optional) ───────────│
      {mutual authentication}       │

The service decrypts the ticket using its own key. It extracts the client identity and session key. It verifies the authenticator. No communication with the KDC required — the service trusts what the KDC signed.


Why Clock Skew Matters

Every Kerberos authenticator contains a timestamp. The service rejects authenticators older than 5 minutes (by default) — this prevents replay attacks where an attacker captures an authenticator and replays it later.

This is why clock skew over 5 minutes breaks Kerberos authentication entirely. If your machine’s clock drifts 6 minutes from the KDC, every authenticator you generate is rejected as too old or too far in the future. No tickets. No AD logins. No SSSD authentication.

# Check time sync status
timedatectl status
chronyc tracking        # if using chrony
ntpq -p                 # if using ntpd

# If clock is off: force a sync
chronyc makestep        # immediate step correction (chrony)

Hands-On: kinit, klist, kdestroy

# Get a TGT (will prompt for password)
kinit vamshi@CORP.COM

# Show current tickets
klist
# Credentials cache: FILE:/tmp/krb5cc_1001
# Principal: vamshi@CORP.COM
#
# Valid starting     Expires            Service principal
# 04/27/26 01:00:00  04/27/26 11:00:00  krbtgt/CORP.COM@CORP.COM
#   renew until 05/04/26 01:00:00

# Show encryption types used (the -e flag)
klist -e
# 04/27/26 01:00:00  04/27/26 11:00:00  krbtgt/CORP.COM@CORP.COM
#         Etype: aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

# Get a service ticket for a specific service
kvno host/web01.corp.com@CORP.COM
# host/web01.corp.com@CORP.COM: kvno = 3

# Show the flags on each ticket
klist -f
# Flags: F=forwardable, f=forwarded, P=proxiable, p=proxy, D=postdateable,
#        d=postdated, R=renewable, I=initial, i=invalid, H=hardware auth

# Destroy all tickets
kdestroy

The Valid starting and Expires fields are the ticket lifetime. After expiry, you need to re-authenticate (or renew the ticket if it’s within the renew until window). The renew until date is when even renewal stops working.
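
Renewal is a single command while the ticket is still inside its renewable window:

# Renew the existing TGT (works until the "renew until" timestamp)
kinit -R
klist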


/etc/krb5.conf

[libdefaults]
    default_realm = CORP.COM
    dns_lookup_realm = false
    dns_lookup_kdc = true         # find KDCs via DNS SRV records
    ticket_lifetime = 10h
    renew_lifetime = 7d
    forwardable = true            # tickets can be forwarded to remote hosts (needed for GSSAPI credential delegation over SSH)
    rdns = false

[realms]
    CORP.COM = {
        kdc = dc01.corp.com
        kdc = dc02.corp.com       # failover KDC
        admin_server = dc01.corp.com
    }

[domain_realm]
    .corp.com = CORP.COM
    corp.com = CORP.COM

With dns_lookup_kdc = true, Kerberos finds KDCs by querying DNS SRV records (_kerberos._tcp.corp.com). AD sets these up automatically. On MIT Kerberos, you add them manually. DNS-based discovery is the recommended approach for AD environments — it picks up new DCs automatically.
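
You can see what the library would discover by querying the same records yourself:

dig +short SRV _kerberos._tcp.corp.com
dig +short SRV _kerberos._udp.corp.com
dig +short SRV _ldap._tcp.corp.com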


Kerberos + LDAP: Why Enterprises Run Both

LDAP and Kerberos solve different problems and are almost always deployed together:

LDAP answers:  "Who is vamshi? What groups is he in? What's his home directory?"
Kerberos answers: "Is this really vamshi? Prove it without sending a password."

Active Directory is exactly this combination — the directory is LDAP-based, the authentication is Kerberos. When a Linux machine joins an AD domain via realm join or adcli, it gets:
– LDAP access to the AD directory (for NSS: user and group lookups)
– A Kerberos principal registered in AD (for PAM: ticket-based authentication)
– A machine account (the machine’s identity in the directory)

When you SSH into an AD-joined Linux machine:
1. SSSD issues a Kerberos AS-REQ for the user’s TGT
2. SSSD uses the TGT to get a service ticket for the machine's own host principal (host/hostname), validating that the KDC is genuine
3. Authentication is verified via the service ticket — no LDAP Bind with a password
4. SSSD does an LDAP Search to get POSIX attributes (UID, GID, home dir)

Password-based LDAP Bind is the fallback when Kerberos isn’t available. Kerberos is the default on AD-joined systems — and it’s more secure because the password never leaves the client.
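The machine side of the join is also visible on disk: the keytab holds the host’s Kerberos keys. A sketch of what that typically looks like (the hostname web01 and the exact principal list are assumptions; adcli and Samba register slightly different sets):

# List the machine principals created at join time
klist -k /etc/krb5.keytab
# Keytab name: FILE:/etc/krb5.keytab
# KVNO Principal
# ---- --------------------------------------------------
#    2 host/web01.corp.com@CORP.COM
#    2 WEB01$@CORP.COM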


⚠ Common Misconceptions

“Kerberos sends your password to the KDC.” It doesn’t. The client derives a key from the password locally and uses that key to encrypt a timestamp (the pre-authentication data). The KDC verifies the timestamp using the stored key. The raw password never travels.

“Kerberos is an authorization protocol.” Kerberos authenticates — it proves who you are. Authorization (what you can do) is a separate decision, usually handled by ACLs on the service or directory group membership.

“Once you have a TGT, you’re authenticated to everything.” A TGT only proves your identity to the KDC. Each service requires a separate service ticket. The TGT is what lets you get those service tickets without re-entering your password.

“Kerberos requires AD.” MIT Kerberos 5 is a standalone implementation. FreeIPA (EP08) runs MIT Kerberos. Heimdal is another implementation. AD uses a Microsoft-extended version of Kerberos 5, but the core protocol is the same RFC.


Framework Alignment

CISSP Domain 5 (Identity and Access Management): Kerberos is the de facto enterprise authentication protocol — SSO, delegation, and service account authentication all depend on it
CISSP Domain 4 (Communications and Network Security): Kerberos prevents credential sniffing and replay attacks — two of the core network authentication threat categories
CISSP Domain 3 (Security Architecture and Engineering): The KDC is a critical single point of trust — its availability, key management, and krbtgt account rotation are architectural security decisions

Key Takeaways

  • Kerberos is a ticket-based protocol — the password is used once to get a TGT; from then on, tickets prove identity without the password
  • The three-step flow: get a TGT from the AS, exchange it for a service ticket at the TGS, present the service ticket to the target service
  • Clock skew over 5 minutes breaks Kerberos — time synchronization is a hard dependency
  • LDAP stores identity; Kerberos authenticates it — Active Directory is exactly this combination, and so is FreeIPA
  • klist -e shows the encryption types in use — aes256-cts-hmac-sha1-96 is what you want to see; arcfour-hmac (RC4) is legacy and should be disabled

What’s Next

EP05 covered Kerberos as a protocol. EP06 goes hands-on: building a real LDAP directory with OpenLDAP, configuring replication, and understanding how the server-side components — slapd, the MDB backend, SyncRepl — fit together.

Next: OpenLDAP Setup and Replication: Running Your Own Directory

Get EP06 in your inbox when it publishes → linuxcent.com/subscribe

SSSD: The Caching Daemon That Powers Every Enterprise Linux Login

Reading Time: 7 minutes

The Identity Stack, Episode 4
EP01: What Is LDAP · EP02: LDAP Internals · EP03: LDAP Auth on Linux · EP04 · EP05: Kerberos → …

Focus Keyphrase: SSSD Linux
Search Intent: Informational
Meta Description: SSSD powers every enterprise Linux login. Here’s the architecture, the sssd.conf knobs that matter, and how to debug it when it breaks. (137 chars)


TL;DR

  • SSSD (System Security Services Daemon) is the caching and brokering layer between Linux and directory services — it handles LDAP, Kerberos, and AD so PAM and NSS don’t have to
  • Architecture: three tiers — responders (answer PAM/NSS queries), providers (talk to AD/LDAP/Kerberos), and a shared cache (LDB database on disk)
  • Credential caching means offline logins work — a user who authenticated yesterday can log in today even if the domain controller is unreachable
  • Key config: sssd.conf — the [domain] section is where almost all tuning happens
  • Debugging toolkit: sssctl, sss_cache, id, getent, journalctl -u sssd
  • The most common failure modes are: SSSD not running, stale cache, misconfigured ldap_search_base, and clock skew breaking Kerberos

The Big Picture: SSSD as the Identity Broker

PAM (pam_sss)         NSS (sss module)
      │                      │
      └──────────┬───────────┘
                 ▼
          SSSD Responders
          ┌────────────────────────────────────┐
          │  PAM responder   NSS responder      │
          │  (auth, account, (passwd, group,    │
          │   session)        shadow lookups)   │
          └────────────┬───────────────────────┘
                       │  shared cache (LDB)
                       ▼
          SSSD Providers
          ┌────────────────────────────────────┐
          │  identity provider  auth provider   │
          │  (user/group attrs) (credentials)   │
          └────────────┬───────────────────────┘
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
       LDAP          Kerberos    Local files
    (AD / OpenLDAP)  (KDC / AD)

EP03 showed that SSSD sits between PAM and LDAP. This episode goes inside it — the architecture, the config, and how to tell exactly what it’s doing on any given login attempt.


Why SSSD Exists

The problem before SSSD: nss_ldap and pam_ldap made direct LDAP connections for every query. No caching, no connection pooling, no failover, no offline support. On a system that makes dozens of getpwuid() calls per second (every ls -l, every process spawn), this meant dozens of LDAP roundtrips per second hitting the domain controller.

SSSD solved this with a single daemon that:
– Maintains a persistent connection pool to the directory
– Caches identity and credential data in an LDB (LDAP-like) database on disk
– Handles failover across multiple directory servers
– Satisfies PAM and NSS queries from cache when the directory is unreachable

The credential cache is the key insight. When you authenticate successfully, SSSD stores a hash of your credentials locally. If the domain controller is unreachable on your next login — network outage, laptop offline, VPN not connected — SSSD can verify your credentials against the local cache. You log in. You never knew the DC was down.


SSSD Architecture

SSSD is a set of cooperating processes sharing a cache:

Monitor — the parent process. Starts and restarts all other SSSD processes. If a responder or provider crashes, the monitor restarts it.

Responders — answer queries from PAM and NSS. Each responder handles a specific interface:
sssd_nss — answers getpwnam(), getpwuid(), getgrnam(), initgroups() calls
sssd_pam — handles PAM authentication, account checks, and session management
sssd_autofs, sssd_ssh, sssd_sudo — optional responders for specific services

Providers — the backend processes that talk to the actual directory:
– Each domain gets its own provider process (sssd_be[domain_name])
– The provider connects to LDAP/Kerberos/AD, fetches data, and writes it to the shared cache
– If the provider crashes or loses connectivity, responders fall back to serving from cache

Cache — LDB files in /var/lib/sss/db/. One database per configured domain, plus a cache for negative results (lookups that returned “not found”). The cache is an LDAP-like directory stored on disk — SSSD uses the same hierarchical structure for local storage as the remote directory uses.

# See the cache files
ls -la /var/lib/sss/db/
# cache_corp.com.ldb         ← user/group data for domain corp.com
# ccache_corp.com            ← Kerberos credential cache
# timestamps_corp.com.ldb   ← when entries were last refreshed
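If you need to look inside the cache itself, ldbsearch (from the ldb-tools package) can read the LDB files. Treat this as a read-only debugging aid; the internal layout is an SSSD implementation detail and varies between versions:

# Show cached objects matching a user (attribute names vary by SSSD version)
ldbsearch -H /var/lib/sss/db/cache_corp.com.ldb '(name=*vamshi*)' dn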

sssd.conf: The Config That Matters

/etc/sssd/sssd.conf has a [sssd] section (global) and one [domain/name] section per directory. The domain section is where almost all tuning happens.

[sssd]
services = nss, pam, sudo
domains = corp.com
config_file_version = 2

[domain/corp.com]
# What type of directory this is
id_provider = ad               # or: ldap, ipa, files
auth_provider = ad             # or: ldap, krb5, none
access_provider = ad           # controls who can log in

# The AD/LDAP server (can be a list for failover)
ad_domain = corp.com
ad_server = dc01.corp.com, dc02.corp.com

# Where to look for users and groups
ldap_search_base = dc=corp,dc=com

# Cache behavior
cache_credentials = true       # enable offline login
entry_cache_timeout = 5400     # how long before re-querying (seconds)
# offline_credentials_expiration (days cached credentials stay valid offline)
# is a [pam]-section option, not a domain option (see the offline login section below)

# What uid/gid range belongs to this domain (prevents UID conflicts)
ldap_id_mapping = true         # auto-map AD SIDs to UIDs (no uidNumber needed)
# OR for classical POSIX LDAP:
# ldap_id_mapping = false      # use uidNumber/gidNumber from directory

# Restrict logins to specific AD groups
# access_provider = simple
# simple_allow_groups = linux-admins, sre-team

# Home directory and shell defaults
override_homedir = /home/%u
default_shell = /bin/bash
fallback_homedir = /home/%u

# Enumerate all users (expensive on large dirs — disable unless needed)
enumerate = false

The two most commonly wrong settings:

ldap_search_base — if this doesn’t include the OU where your users live, SSSD won’t find them. On AD, the default searches the entire domain, which is usually correct. On OpenLDAP, you may need ou=people,dc=corp,dc=com.

ldap_id_mapping — on AD, users typically don’t have uidNumber attributes. Setting ldap_id_mapping = true tells SSSD to derive a UID from the user’s SID algorithmically. This produces consistent UIDs across machines. Setting it to false requires actual uidNumber attributes in the directory.


Credential Caching and Offline Logins

The cache is what separates SSSD from a simple proxy. When cache_credentials = true:

  1. On successful authentication, SSSD stores a hash of the credential in the LDB cache
  2. On the next authentication attempt, SSSD first tries the domain controller
  3. If the DC is unreachable, SSSD falls back to the local credential hash
  4. If the hash matches, login succeeds — even with no network

The credential hash is not the cleartext password — it’s a salted hash stored in /var/lib/sss/db/cache_corp.com.ldb. The security model is the same as /etc/shadow: someone with root access to the machine can access the hashes.

offline_credentials_expiration (a [pam]-section option in sssd.conf) controls how long cached credentials stay valid when the DC is unreachable. 0 means forever (not recommended for high-security environments). 1 means one day — after 24 hours offline, even cached credentials expire and the user must authenticate online.
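A minimal sketch of the offline knobs, placed in the [pam] responder section (option names per sssd.conf(5); the values are illustrative):

[pam]
offline_credentials_expiration = 1    # days a cached credential stays usable offline
offline_failed_login_attempts = 3     # lock offline auth after 3 consecutive failures
offline_failed_login_delay = 5        # minutes before offline attempts are allowed again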


The Debugging Toolkit

# 1. Is SSSD running?
systemctl status sssd
pgrep -a sssd    # shows all SSSD processes (monitor + responders + providers)

# 2. Domain connectivity status
sssctl domain-status corp.com
# Domain: corp.com
# Active servers:
#   LDAP: dc01.corp.com
#   KDC: dc01.corp.com
# Discovered servers:
#   LDAP: dc01.corp.com, dc02.corp.com

# 3. Can SSSD find a specific user?
sssctl user-checks vamshi
# user: vamshi
# user name: vamshi@corp.com
# POSIX attributes: UID=1001, GID=1001, ...
# Authentication: success (uses actual PAM auth stack)

# 4. What does NSS see?
getent passwd vamshi          # full passwd entry
id vamshi                     # uid, gid, groups

# 5. Flush stale cache entries
sss_cache -u vamshi           # invalidate one user
sss_cache -G engineers        # invalidate one group
sss_cache -E                  # invalidate everything (nuclear option)

# 6. Live logs
journalctl -u sssd -f         # tail all SSSD logs
# Then attempt login in another terminal — watch the auth flow in real time

# 7. Increase log verbosity temporarily
sssctl config-check            # validate sssd.conf syntax
# Edit sssd.conf: add debug_level = 6 under [domain/corp.com]
systemctl restart sssd
journalctl -u sssd -f          # now shows LDAP queries, cache hits/misses

The single most useful command is sssctl user-checks <username>. It runs the full NSS + PAM auth stack internally and prints what SSSD would do on a real login — without creating a session or touching the running system.


Breaking SSSD (and What Each Failure Looks Like)

SSSD not running:

ssh vamshi@server
# Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
# getent passwd vamshi → (empty)
# Fix: systemctl start sssd

Stale cache after AD password change:

# User changed password in AD but SSSD still has old credential hash
ssh vamshi@server  # password accepted (wrong!) — cache hit with old hash
# Fix: sss_cache -u vamshi, then attempt login again

Clock skew > 5 minutes (breaks Kerberos):

journalctl -u sssd | grep -i "clock skew\|KDC\|kinit"
# sssd_be[corp.com]: Kerberos authentication failed: Clock skew too great
# Fix: systemctl restart chronyd (or ntpd), verify time sync

ldap_search_base wrong:

getent passwd vamshi  # empty, but user exists in AD
sssctl user-checks vamshi  # "User not found"
# Check: ldap_search_base must include the OU containing users
# Test: ldapsearch -x -H ldap://dc -b "ou=engineers,dc=corp,dc=com" "(uid=vamshi)"

⚠ Common Misconceptions

“Restarting SSSD logs everyone out.” Restarting SSSD doesn’t affect existing authenticated sessions. Active shell sessions, running processes — all unaffected. Only new authentication attempts are disrupted during the restart window, which takes a few seconds.

“sss_cache -E fixes everything.” Flushing the entire cache forces SSSD to re-fetch all entries from the domain controller on the next lookup. On a system with many users or enumeration enabled, this can cause a brief spike in LDAP traffic and slow lookups. Use targeted flushes (-u username, -G group) when possible.

“debug_level should always be high.” SSSD at debug_level = 9 logs every LDAP packet. On a production system with active logins, this generates gigabytes of logs quickly. Set it temporarily for debugging, then remove it and restart.
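On recent SSSD releases you can also bump verbosity on the running daemon without editing sssd.conf or restarting; older releases ship the equivalent sss_debuglevel tool. A sketch:

# Raise verbosity temporarily, watch the logs, then drop it back down
sssctl debug-level 6
journalctl -u sssd -f
sssctl debug-level 2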


Framework Alignment

CISSP Domain 5 (Identity and Access Management): SSSD is the runtime implementation of enterprise identity integration on Linux — understanding its caching model, failover behavior, and credential storage is foundational to IAM operations
CISSP Domain 3 (Security Architecture and Engineering): The credential cache design (/var/lib/sss/db/) creates a local credential store with specific security properties — architects need to understand the offline login trade-off
CISSP Domain 7 (Security Operations): SSSD is a critical security service — monitoring it, understanding its failure modes, and knowing how to recover it quickly are operational security skills

Key Takeaways

  • SSSD is a three-tier system: responders (serve PAM/NSS), providers (talk to AD/LDAP), and a shared LDB cache — each tier is independently restartable
  • Credential caching enables offline logins — the security trade-off is a local hash store in /var/lib/sss/db/
  • sssctl user-checks is the first tool to reach for when a login fails — it simulates the full auth flow and shows exactly where it breaks
  • ldap_id_mapping = true is the right choice for AD environments without POSIX attributes; false requires actual uidNumber/gidNumber in the directory
  • Clock skew over 5 minutes silently breaks Kerberos authentication — time sync is a hard dependency

What’s Next

EP04 showed SSSD’s role as the caching and brokering layer. What it referenced repeatedly — “Kerberos ticket”, “KDC”, “GSSAPI” — is the authentication protocol that sits underneath AD-joined Linux logins. SSSD uses Kerberos to authenticate. LDAP carries the identity data. EP05 explains how Kerberos works.

Next: How Kerberos Works: Tickets, KDC, and Why Enterprises Use It With LDAP

Get EP05 in your inbox when it publishes → linuxcent.com/subscribe

How LDAP Authentication Works on Linux: PAM, NSS, and the Login Stack

Reading Time: 9 minutes

The Identity Stack, Episode 3
EP01: What Is LDAP · EP02: LDAP Internals · EP03 · EP04: SSSD → …

Focus Keyphrase: LDAP authentication Linux
Search Intent: Informational
Meta Description: Trace a Linux SSH login through PAM, NSS, and LDAP step by step — and understand why LDAP alone is not an authentication protocol. (144 chars)


TL;DR

  • LDAP is a directory protocol — it stores identity information and can verify a password via Bind, but authentication on Linux runs through PAM, not directly through LDAP
  • NSS (/etc/nsswitch.conf) answers “who is this user?” — it resolves UIDs, group memberships, and home directories by querying LDAP (or the local files, or SSSD)
  • PAM (/etc/pam.d/) answers “are they allowed in?” — it enforces authentication, account validity, session setup, and password policy
  • pam_ldap (the old way) opened a direct LDAP connection on every login — fragile, no caching, broken when the LDAP server was unreachable
  • pam_sss (the modern way) delegates to SSSD, which caches credentials and handles failover — SSSD is the layer between Linux and the directory
  • Tracing a single SSH login: sshd → PAM → pam_sss → SSSD → LDAP Bind + Search → session created

The Big Picture: One SSH Login, Four Layers

You type: ssh vamshi@web01.corp.com

  sshd
    │
    ▼
  PAM  (/etc/pam.d/sshd)          ← "Is this user allowed in?"
    │
    ├── pam_sss    (auth)          ← sends credentials to SSSD
    ├── pam_sss    (account)       ← checks account not expired/locked
    ├── pam_sss    (session)       ← logs the session open/close
    └── pam_mkhomedir (session)    ← creates /home/vamshi if it doesn't exist
    │
    ▼
  SSSD  (/etc/sssd/sssd.conf)     ← "Let me check the directory"
    │
    ├── NSS responder              ← answers getent, id, getpwnam
    └── LDAP/Kerberos provider     ← talks to the actual directory
    │
    ▼
  LDAP Server (AD / OpenLDAP)
    │
    ├── Bind: uid=vamshi + password (or Kerberos ticket)
    └── Search: posixAccount attrs for uid=vamshi
    │
    ▼
  Linux session created
  UID=1001, GID=1001, HOME=/home/vamshi, SHELL=/bin/bash

EP02 showed what the directory contains and what travels on the wire. What it left open is how Linux uses that to grant a login — and why LDAP is not, by itself, an authentication protocol.


Why LDAP Is Not an Authentication Protocol

This is the confusion that trips people up the most. LDAP can verify a password — the Bind operation does exactly that. But authentication on Linux means something broader: checking credentials, checking account validity, enforcing password policy, setting up a session, creating a home directory. LDAP handles one piece of that. PAM handles the rest.

More precisely: LDAP doesn’t know what a Linux session is. It doesn’t know about /etc/pam.d/. It doesn’t enforce login hours, account expiry, or concurrent session limits. It returns directory entries and verifies binds. The intelligence about what to do with those results lives in the Linux authentication stack.

When you run ssh vamshi@server, the OS doesn’t open an LDAP connection and ask “can this user log in?” It calls PAM. PAM consults its configuration, and PAM decides whether to call LDAP (directly or via SSSD), whether to check the shadow file, whether to enforce MFA. LDAP is one possible backend. It’s not the gatekeeper.


NSS: The Traffic Controller

Before PAM runs, Linux needs to know if the user exists at all. That’s NSS’s job.

/etc/nsswitch.conf is a routing table for name resolution. It tells the OS where to look when something asks “who is UID 1001?” or “what groups is vamshi in?”:

# /etc/nsswitch.conf

passwd:     files sss        ← user lookups: check /etc/passwd first, then SSSD
group:      files sss        ← group lookups: check /etc/group first, then SSSD
shadow:     files sss        ← shadow password lookups
hosts:      files dns        ← hostname lookups (not identity-related)
netgroup:   sss              ← NIS netgroups from SSSD only
automount:  sss              ← autofs maps from SSSD

Every call to getpwnam(), getpwuid(), getgrnam(), getgrgid() in any process — including sshd — goes through NSS. The entries in nsswitch.conf control which backends are tried in order.

With passwd: files sss, a lookup for user vamshi:
1. Checks /etc/passwd — not found (vamshi is a domain user, not in local files)
2. Queries SSSD — SSSD checks its cache, or queries LDAP, and returns the posixAccount attributes

Without the sss entry in passwd:, domain users don’t exist on the system — getent passwd vamshi returns nothing, id vamshi fails, SSH login never gets to PAM’s authentication step.

# Verify NSS is routing to SSSD correctly
getent passwd vamshi
# vamshi:*:1001:1001:Vamshi K:/home/vamshi:/bin/bash

# If this returns nothing, NSS isn't reaching SSSD
# Check: systemctl status sssd && grep passwd /etc/nsswitch.conf

# See what groups the user is in (NSS group lookup)
id vamshi
# uid=1001(vamshi) gid=1001(engineers) groups=1001(engineers),1002(ops)

PAM: The Real Gatekeeper

PAM (Pluggable Authentication Modules) is the framework that lets Linux swap authentication backends without recompiling anything. Every service that needs to authenticate users — sshd, sudo, login, su, gdm — has a PAM configuration file in /etc/pam.d/.

Each PAM config defines four stacks:

auth        ← verify credentials (password, key, MFA)
account     ← check if the account is valid (not expired, not locked, login hours)
password    ← password change policy
session     ← set up/tear down the session (home dir, limits, logging)

A typical /etc/pam.d/sshd on a system joined to AD via SSSD:

# /etc/pam.d/sshd

# auth stack — verify the user's credentials
auth    required      pam_sepermit.so
auth    substack      password-auth   ← usually includes pam_sss.so

# account stack — check account validity
account required      pam_nologin.so
account include       password-auth

# password stack — handle password changes
password include      password-auth

# session stack — set up the session
session required      pam_selinux.so close
session required      pam_loginuid.so
session optional      pam_keyinit.so force revoke
session include       password-auth
session optional      pam_motd.so
session optional      pam_mkhomedir.so skel=/etc/skel/ umask=0077
session required      pam_selinux.so open

The include and substack directives pull in shared stacks from other files (like /etc/pam.d/password-auth). On a system with SSSD, password-auth contains:

auth    required      pam_env.so
auth    sufficient    pam_sss.so      ← try SSSD first
auth    required      pam_deny.so     ← if pam_sss fails, deny

account required      pam_unix.so
account sufficient    pam_localuser.so
account sufficient    pam_sss.so      ← SSSD account check
account required      pam_permit.so

session optional      pam_sss.so      ← SSSD session tracking

The sufficient flag means: if this module succeeds, stop checking this stack and consider it passed. required means: this must pass (but continue checking other modules and report failure at the end). requisite means: if this fails, stop immediately.


PAM Control Flags at a Glance

required   — must succeed; failure reported after remaining modules run
requisite  — must succeed; failure reported immediately, stack stops
sufficient — if success, stop stack (ignore remaining); failure continues
optional   — result ignored unless it's the only module in the stack

This matters for debugging. If pam_sss.so is sufficient and SSSD is down, PAM falls through to pam_deny.so — login denied. If it were optional, the login would proceed to the next module. The control flag is the policy decision.
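If you want to exercise a PAM stack without opening an SSH session, the pamtester utility (packaged separately on most distributions) runs a single PAM conversation against a named service. A sketch:

# Run the sshd auth stack for one user (prompts for the password, prints the PAM result)
pamtester sshd vamshi authenticate

# Run the account stack (expiry/lock checks) without authenticating
pamtester sshd vamshi acct_mgmt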


The Old Way: pam_ldap

Before SSSD, Linux systems used pam_ldap and nss_ldap directly:

# Old /etc/pam.d/common-auth (Ubuntu pre-SSSD era)
auth    sufficient    pam_ldap.so    ← direct LDAP connection per login
auth    required      pam_unix.so nullok_secure

# Old /etc/nsswitch.conf
passwd: files ldap    ← nss_ldap for user lookups
group:  files ldap

pam_ldap opened a fresh LDAP connection on every login attempt. No caching. If the LDAP server was unreachable for 3 seconds, the login hung for 3 seconds — sometimes much longer. If the LDAP server was down, all domain logins failed immediately. Previously logged-in users with active sessions were fine; new logins simply didn’t work.

nss_ldap had the same problem for NSS lookups: every getpwnam() call hit the LDAP server directly. On a busy system with many processes doing user lookups, this meant hundreds of LDAP queries per second, no connection reuse, and no way to survive a brief network blip.

The problems were structural:
– No credential caching — offline logins impossible
– No connection pooling — LDAP server saw one connection per login attempt
– No failover logic — one LDAP server down meant all logins down
– Slow timeouts that blocked login sessions

SSSD was built to fix all of this.


The Modern Way: pam_sss + SSSD

pam_sss doesn’t talk to LDAP directly. It’s a thin client that passes authentication requests to SSSD over a Unix domain socket. SSSD manages the LDAP connection, the credential cache, and the failover logic.

sshd  →  PAM (pam_sss)  →  SSSD (Unix socket)  →  LDAP server
                                   │
                                   └── credential cache
                                       (survives brief LDAP outages)

When pam_sss sends a credential to SSSD:
1. SSSD checks its in-memory cache — if the credential hash matches a recent successful auth, it can satisfy the request without hitting LDAP
2. If not cached (or cache expired), SSSD sends a Bind to the LDAP server
3. On success, SSSD caches the result and returns success to pam_sss
4. pam_sss returns PAM_SUCCESS, and the auth stack continues

The credential cache is what enables offline logins. If the LDAP server is unreachable and the user has authenticated successfully within the offline credential window (caching is enabled with cache_credentials = true in sssd.conf; the window itself is controlled by offline_credentials_expiration, covered in EP04), SSSD satisfies the auth from cache and the login succeeds. The user never knows the LDAP server was down.


Tracing a Full SSH Login

Here’s every step of an SSH login for a domain user, in order:

1.  sshd accepts the TCP connection
2.  sshd calls PAM: pam_start("sshd", "vamshi", ...)

3.  PAM auth stack runs pam_sss:
      pam_sss sends credentials to SSSD via /var/lib/sss/pipes/pam

4.  SSSD auth provider:
      a. Check credential cache — miss (first login)
      b. Resolve user: NSS lookup for uid=vamshi
         → SSSD LDAP provider searches dc=corp,dc=com for (uid=vamshi)
         → Returns: uidNumber=1001, gidNumber=1001, homeDirectory=/home/vamshi
      c. Authenticate: LDAP Simple Bind as uid=vamshi,ou=engineers,dc=corp,dc=com
         → Server returns: success
      d. Cache the credential hash + POSIX attrs

5.  SSSD returns PAM_SUCCESS to pam_sss

6.  PAM account stack runs pam_sss:
      SSSD checks: account not expired, not locked, login permitted
      → PAM_ACCT_MGMT success

7.  PAM session stack:
      pam_loginuid sets /proc/self/loginuid = 1001
      pam_mkhomedir creates /home/vamshi if missing
      pam_sss opens session (records in SSSD session tracking)

8.  sshd creates the shell, sets environment:
      USER=vamshi, HOME=/home/vamshi, SHELL=/bin/bash, LOGNAME=vamshi

9.  Shell prompt appears

Steps 4b and 4c are the only two LDAP operations in the entire login flow: one Search to resolve the user’s attributes, one Bind to verify the password. Everything else is PAM and SSSD.
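Both operations are easy to reproduce by hand with the OpenLDAP client tools, which is a useful sanity check when a login fails. A sketch assuming the same DN layout as above (the unauthenticated Search assumes the directory allows anonymous reads; add -D/-w otherwise):

# Step 4b equivalent: Search for the user's POSIX attributes
ldapsearch -x -H ldap://dc.corp.com -b "dc=corp,dc=com" \
  "(uid=vamshi)" uidNumber gidNumber homeDirectory loginShell

# Step 4c equivalent: Simple Bind as the user's DN (ldapwhoami binds and does nothing else)
ldapwhoami -x -H ldap://dc.corp.com \
  -D "uid=vamshi,ou=engineers,dc=corp,dc=com" -W
# dn:uid=vamshi,ou=engineers,dc=corp,dc=com   ← Bind succeeded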


Debugging the Stack

When a login fails, the failure could be in any layer. Work top-down:

# 1. Does NSS resolve the user at all?
getent passwd vamshi
# If empty: NSS isn't reaching SSSD, or SSSD isn't finding the user in LDAP

# 2. Is SSSD running and healthy?
systemctl status sssd
sssctl domain-status corp.com      # shows SSSD's view of domain connectivity

# 3. What does SSSD think about the user?
sssctl user-checks vamshi          # runs auth + account checks internally
id vamshi                          # forces NSS resolution and shows group memberships

# 4. What does SSSD's log say?
journalctl -u sssd -f              # tail SSSD logs live, then attempt login

# 5. Can you reach the LDAP server at all?
ldapsearch -x -H ldap://dc.corp.com \
  -D "cn=svc-ldap,ou=services,dc=corp,dc=com" \
  -w "password" \
  -b "dc=corp,dc=com" \
  "(uid=vamshi)" dn

# 6. Force a cache flush if entries are stale
sss_cache -u vamshi                # invalidate this user's cache entry
sss_cache -G engineers             # invalidate a group

The sssctl user-checks command is the single most useful diagnostic — it simulates the full PAM auth + account check flow without actually creating a session, and prints exactly what SSSD would do on a real login attempt.


⚠ Common Misconceptions

“If ldapsearch works, SSH login should work.” Not necessarily. ldapsearch tests the LDAP layer. An SSH login requires NSS to resolve the user, PAM to authenticate, SSSD to be running and configured correctly, and pam_mkhomedir to create the home directory if it’s the first login. Any of these can fail independently.

“pam_ldap and pam_sss do the same thing.” They have the same job (authenticate via LDAP) but completely different architectures. pam_ldap is a direct-connect, no-cache module. pam_sss is a client of SSSD, which provides caching, connection pooling, failover, and offline support. On any modern system, you want pam_sss.

“nsswitch.conf order doesn’t matter much.” It matters exactly as much as the order suggests. passwd: files sss means local /etc/passwd is always checked first — if a domain username collides with a local user, the local account wins. This is the intended behavior (local accounts should always be reachable), but it means you’ll never override a local account with a directory entry.

“SSSD cache = security risk.” The cache stores a credential hash, not the cleartext password. An attacker with access to the SSSD cache database (/var/lib/sss/db/) would see hashed credentials — the same situation as /etc/shadow. The real concern is whether offline authentication is appropriate for your security posture; it can be disabled entirely by setting cache_credentials = false.


Framework Alignment

CISSP Domain 5 (Identity and Access Management): PAM is the enforcement layer for authentication policy on Linux — understanding its stack is foundational to any Linux IAM deployment
CISSP Domain 3 (Security Architecture and Engineering): The separation between NSS (resolution) and PAM (authentication) is an architectural boundary — misunderstanding it leads to misconfigured systems where account checks are bypassed
CISSP Domain 4 (Communications and Network Security): pam_ldap vs pam_sss affects whether credentials travel over a direct LDAP connection (one socket per login, no TLS guarantee) or through SSSD’s managed, pooled connection

Key Takeaways

  • LDAP alone is not an authentication protocol for Linux — authentication flows through PAM, and LDAP is one of PAM’s possible backends
  • NSS (/etc/nsswitch.conf) resolves user identity (who is UID 1001?); PAM enforces it (are they allowed in?)
  • pam_ldap talks to LDAP directly — no cache, no failover, login blocked when LDAP is unreachable
  • pam_sss delegates to SSSD — credential caching, connection pooling, offline login, and failover are all built in
  • A full SSH login touches LDAP exactly twice: one Search for POSIX attributes, one Bind to verify the password
  • When login fails, debug top-down: NSS resolution → SSSD status → LDAP reachability → PAM config

What’s Next

EP03 showed how authentication reaches LDAP — through PAM, through SSSD, through a Bind. What it assumed is that SSSD is healthy and the LDAP server is reachable. The moment either goes wrong, the behavior depends entirely on how SSSD is configured — its cache TTLs, its failover order, its offline credential policy.

EP04 goes inside SSSD: the architecture, the sssd.conf knobs that matter, how to read the logs, and how to break it intentionally and fix it.

Next: SSSD: The Caching Daemon That Powers Every Enterprise Linux Login

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

One Blueprint, Six Clouds — Multi-Provider OS Image Builds

Reading Time: 6 minutes

OS Hardening as Code, Episode 3
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening

Focus Keyphrase: multi-cloud OS hardening
Search Intent: Informational
Meta Description: Maintain one OS hardening baseline across AWS, GCP, and Azure without separate scripts that drift. One HardeningBlueprint YAML, six providers, zero duplication. (155 chars)


TL;DR

  • Multi-cloud OS hardening with separate scripts per provider means three scripts that drift within weeks
  • A HardeningBlueprint YAML separates compliance intent (portable) from provider details (handled by Stratum’s provider layer)
  • The same blueprint builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox with a single --provider flag change
  • Provider-specific differences — disk names, cloud-init ordering, metadata endpoint IPs — are abstracted away from the blueprint author
  • One YAML file becomes the single source of truth for OS security posture across your entire fleet, regardless of cloud
  • Drift detection works fleet-wide: rescan any instance against the original blueprint grade on any provider

The Problem: Three Clouds, Three Scripts, Three Ways to Drift

AWS hardening script          GCP hardening script          Azure hardening script
├── /dev/xvd* disk refs       ├── /dev/sda* disk refs       ├── /dev/sda* disk refs
├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS
├── cloud-init order A        ├── cloud-init order B        ├── cloud-init order C
└── Updated: Jan 2025         └── Updated: Aug 2024         └── Updated: Mar 2024
                                         │
                                         └─ 5 months behind
                                            on CIS updates

Multi-cloud OS hardening starts as a copy-paste of the AWS script. Within a month, the clouds diverge.

EP02 showed that a HardeningBlueprint YAML eliminates the skip-at-2am problem by making hardening a build artifact. What it assumed — quietly — is that you’re building for one provider. The moment you expand to a second cloud, the provider-specific details in the blueprint become a problem: disk names differ, cloud-init fires in a different order, and AWS-specific assumptions break silently on GCP.


We expanded from AWS to GCP six months ago. The EC2 hardening script had been working reliably for over a year. The GCP engineer took the AWS script, made some quick changes, and started building images.

The first GCP images had a subtle problem: the /tmp and /home separate partition entries in /etc/fstab referenced /dev/xvdb — an AWS disk naming convention. GCP uses /dev/sdb. The fstab entries were silently ignored. The mounts existed but weren’t restricted. The CIS controls for separate filesystem partitions were listed as passing in the scan output because the Ansible task had “run successfully” — it just hadn’t done what we thought.

It took a pentest three months later to catch it. The finding: six production GCP instances with /tmp not mounted with noexec, nosuid, nodev — despite our “CIS L1 hardened” label.

The root cause wasn’t the engineer. It was a hardening approach that required cloud-specific knowledge embedded in the script rather than in a provider abstraction layer.


How Stratum Separates Compliance Intent from Provider Details

Multi-cloud OS hardening works when the compliance intent and the provider details are kept strictly separate.

HardeningBlueprint YAML
(compliance intent — portable)
         │
         ▼
  Stratum Provider Layer
  ┌─────────────────────────────────────────────┐
  │  AWS         │  GCP         │  Azure        │
  │  /dev/xvd*   │  /dev/sda*   │  /dev/sda*    │
  │  IMDS v2     │  GCP IMDS    │  Azure IMDS   │
  │  cloud-init  │  cloud-init  │  waagent       │
  │  order A     │  order B     │  order C       │
  └─────────────────────────────────────────────┘
         │
         ▼
  Ansible-Lockdown + Provider-Aware Configuration
         │
         ▼
  OpenSCAP Scan
         │
         ▼
  Golden Image (AMI / GCP Image / Azure Image)

The blueprint author declares what should be true about the OS. Stratum’s provider layer handles how that’s achieved on each cloud.

The disk naming, cloud-init sequencing, metadata endpoint configuration, and provider-specific package repositories are all abstracted into the provider layer. They never appear in the blueprint file.


The Same Blueprint Across Six Providers

# Build the same baseline on three clouds
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
stratum build --blueprint ubuntu22-cis-l1.yaml --provider gcp
stratum build --blueprint ubuntu22-cis-l1.yaml --provider azure

# The other three supported providers
stratum build --blueprint ubuntu22-cis-l1.yaml --provider digitalocean
stratum build --blueprint ubuntu22-cis-l1.yaml --provider linode
stratum build --blueprint ubuntu22-cis-l1.yaml --provider proxmox

The blueprint file is identical across all six. The output — AMI, GCP machine image, Azure managed image — is equivalent in terms of security posture. The same 144 CIS L1 controls apply. The same OpenSCAP scan runs. The same grade lands in the image metadata.

If you change the blueprint — add a control, update the Ansible role version, add a custom audit logging configuration — you rebuild all providers from the same source and all images come out consistent.


What the Provider Layer Handles

The provider layer is where the cloud-specific knowledge lives, so the blueprint author doesn’t have to carry it:

Disk naming:

Provider       OS disk      Ephemeral               Data
AWS            /dev/xvda    /dev/xvdb               /dev/xvdc+
GCP            /dev/sda     -                       /dev/sdb+
Azure          /dev/sda     /dev/sdb (temp disk)    /dev/sdc+
DigitalOcean   /dev/vda     -                       /dev/vdb+

The CIS controls for separate /tmp and /home partitions reference disk paths that differ across these providers. The provider layer translates the blueprint’s filesystem.tmp declaration into the correct fstab entries for the target cloud.

Cloud-init ordering:

Different providers initialize services in different orders. On AWS, the network is available before cloud-init runs most tasks. On GCP, some network configuration happens after cloud-init starts. On Azure, the waagent handles some configuration that cloud-init handles elsewhere.

The provider layer sequences the hardening steps to run in the correct order for each provider — specifically, it waits for network availability before applying network-level hardening, and ensures the package manager is configured before running Ansible roles that require package installation.

Metadata endpoint configuration:

CIS controls include restrictions on access to the instance metadata service (IMDSv2 enforcement on AWS, equivalent controls on GCP/Azure). The provider layer applies the correct restriction for each cloud — the blueprint just declares compliance: benchmark: cis-l1.


Building for All Providers Simultaneously

For fleet standardization, you can build all providers in a single operation:

# Build for all providers in parallel
stratum build \
  --blueprint ubuntu22-cis-l1.yaml \
  --provider aws,gcp,azure

# Output:
# [aws]   Launching build instance in ap-south-1...
# [gcp]   Launching build instance in asia-south1...
# [azure] Launching build instance in southindia...
# ...
# [aws]   Grade: A (98/100) — ami-0a7f3c9e82d1b4c05
# [gcp]   Grade: A (98/100) — projects/my-project/global/images/ubuntu22-cis-l1-20260419
# [azure] Grade: A (98/100) — /subscriptions/.../images/ubuntu22-cis-l1-20260419

All three builds run in parallel. All three images carry identical compliance grades. The image names embed the date and grade for easy identification.


Blueprint Versioning and Drift Detection

Version-controlling the blueprint file solves a problem that multi-cloud environments hit consistently: knowing what your OS security posture was six months ago.

# Check the current state of a fleet instance against the blueprint
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Compare against original build grade
# Output:
# Instance: i-0abc123 (aws, ap-south-1)
# Original grade (build): A (98/100) — 2026-01-15
# Current grade (scan):   B (89/100) — 2026-04-19
# 
# Drifted controls (9):
#   3.3.2  — TCP SYN cookies: FAIL (sysctl net.ipv4.tcp_syncookies=0)
#   5.3.2  — sudo log_input: FAIL (removed from /etc/sudoers.d/)
#   ...

Drift detection compares the current instance state against the blueprint that built it. Controls that passed at build time and now fail indicate configuration drift — something changed after the image was deployed. This is how you find the three instances that a sysadmin “temporarily” modified and never reverted.
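A hedged sketch of running this fleet-wide on a schedule. It assumes stratum scan exits non-zero when drifted controls are found (confirm your version’s exit-code behavior before wiring this into CI) and that fleet-instances.txt lists one instance ID per line:

#!/usr/bin/env bash
# Nightly drift sweep: rescan every instance against the blueprint that built it
set -u
while read -r instance; do
  if ! stratum scan --instance "$instance" --blueprint ubuntu22-cis-l1.yaml; then
    echo "DRIFT DETECTED: $instance" >&2
  fi
done < fleet-instances.txt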


Production Gotchas

Provider-specific CIS controls exist. CIS AWS Foundations Benchmark and CIS GCP Benchmark include cloud-specific controls (VPC flow logs, CloudTrail, etc.) that are separate from the OS-level CIS controls. The blueprint handles OS-level controls. Cloud-level controls (IAM, logging, network configuration) belong in your cloud security posture management tooling.

Build costs vary by provider. On AWS, the build instance is a t3.medium for 15–20 minutes (~$0.02). On GCP and Azure, equivalent pricing applies. For multi-provider builds, run them in regions close to your primary workloads to minimize image transfer time.

Proxmox builds require a local Stratum agent. Unlike cloud providers, Proxmox doesn’t have an API that Stratum can reach from outside. The Proxmox provider requires the Stratum agent running on the Proxmox host. The build process and blueprint format are identical; only the network topology differs.

GCP image sharing across projects requires explicit IAM. GCP machine images aren’t automatically available to other projects in the organization. After building, run stratum image share --provider gcp --image ubuntu22-cis-l1-20260419 --projects …, or configure sharing at the organization level.


Key Takeaways

  • Multi-cloud OS hardening with separate scripts per provider creates inevitable drift; a provider-abstracted blueprint eliminates it
  • The same HardeningBlueprint YAML builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox — the compliance intent is in the file, the provider details are in Stratum’s provider layer
  • Parallel multi-provider builds produce images with identical compliance grades on the same schedule
  • Drift detection works fleet-wide: any instance on any provider can be rescanned against the blueprint that built it
  • Blueprint version control is the single source of truth for OS security posture history — what was true on any given date, across any provider

What’s Next

One blueprint, six clouds, identical compliance grades. EP03 showed that the multi-cloud drift problem disappears when provider details are abstracted away from the blueprint.

What neither EP02 nor EP03 answered is the auditor’s question: how do you know the image is actually compliant? “We ran CIS L1” is not an answer. “Grade A, 98/100 controls, SARIF export attached” is.

EP04 covers automated OpenSCAP compliance: the post-build scan in detail — how the A-F grade is calculated, what controls block an A grade, how SARIF exports work, and how drift detection catches what changed after deployment.

Next: automated OpenSCAP compliance — CIS benchmark grading before deployment

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

LDAP Internals: The Directory Tree, Schema, and What Travels on the Wire

Reading Time: 12 minutes

The Identity Stack, Episode 2
EP01: What Is LDAP · EP02 · EP03: LDAP Authentication on Linux → …

Focus Keyphrase: LDAP internals
Search Intent: Informational
Meta Description: Understand LDAP internals: the Directory Information Tree, DN syntax, object classes, schema, and the BER bytes that travel when you run ldapsearch. (150 chars)


TL;DR

  • The Directory Information Tree (DIT) is the hierarchical database LDAP stores — every entry lives at a unique path described by its Distinguished Name (DN)
  • Object classes define what attributes an entry is allowed or required to have — posixAccount adds UID, GID, and home directory; inetOrgPerson adds email and display name
  • Schema is the rulebook: which attribute types exist across the entire directory, what syntax each follows, and which object classes require or permit them
  • An LDAP Search sends four things: a base DN, a scope (base/one/sub), a filter like (uid=vamshi), and a list of attributes to return — the server traverses the tree and returns LDIF
  • Every LDAP message on the wire is BER-encoded (Basic Encoding Rules, a subset of ASN.1) — a compact binary format, not text
  • ldapsearch output is LDIF (LDAP Data Interchange Format) — the human-readable representation of what the BER payload carried

The Big Picture: From ldapsearch to Directory Entry

ldapsearch -x -H ldap://dc.corp.com -b "dc=corp,dc=com" "(uid=vamshi)" cn mail uidNumber
     │
     │  TCP port 389 (or 636 for LDAPS)
     │  BER-encoded SearchRequest
     ▼
┌─────────────────────────────────────────────────┐
│  LDAP Server (AD / OpenLDAP / 389-DS / FreeIPA)  │
│                                                   │
│  Directory Information Tree                       │
│                                                   │
│  dc=corp,dc=com                    ← search base  │
│    └── ou=engineers                ← scope: sub   │
│          ├── uid=alice                            │
│          └── uid=vamshi  ← filter match           │
│                cn: vamshi                         │
│                mail: vamshi@corp.com                     │
│                uidNumber: 1001                    │
└─────────────────────────────────────────────────┘
     │
     │  BER-encoded SearchResultEntry
     ▼
# LDIF output on your terminal
dn: uid=vamshi,ou=engineers,dc=corp,dc=com
cn: vamshi
mail: vamshi@corp.com
uidNumber: 1001

LDAP internals are the mechanics between the command you type and the directory entry you get back. EP01 explained why LDAP was invented. This episode explains what it actually does when you run it.


The Directory Information Tree

EP01 introduced the DIT as a concept inherited from X.500. Here’s what it actually looks like inside a directory.

Every LDAP directory has a root — the base DN — from which all entries descend. For a company called Corp with a domain corp.com, the base is typically dc=corp,dc=com. Below that, the tree branches into organizational units, and below those, individual entries for people, groups, services, and anything else the directory administrator decided to model.

dc=corp,dc=com                          ← domain root (base DN)
│
├── ou=people                           ← organizational unit: people
│     ├── uid=alice                     ← user entry
│     ├── uid=vamshi
│     └── uid=bob
│
├── ou=groups                           ← organizational unit: groups
│     ├── cn=engineers
│     └── cn=ops
│
├── ou=services                         ← organizational unit: service accounts
│     ├── cn=jenkins
│     └── cn=gitlab-runner
│
└── ou=hosts                            ← organizational unit: machines
      ├── cn=web01.corp.com
      └── cn=db01.corp.com

This hierarchy is not a file system and not a relational database. It is specifically optimized for reads — the query “give me everything about this user” is the operation the protocol is built around. Writes are infrequent. Reads are constant.

Every entry in the tree has exactly one parent. There are no cross-links between branches, no foreign keys. The tree is the structure. An entry’s position in the tree is what defines it.
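You can walk the tree one level at a time with the one scope. A sketch against the example base DN (add -D/-w bind credentials if anonymous reads are disabled):

# List the immediate children of the base DN: the organizational units
ldapsearch -x -H ldap://dc.corp.com -b "dc=corp,dc=com" -s one dn
# dn: ou=people,dc=corp,dc=com
# dn: ou=groups,dc=corp,dc=com
# dn: ou=services,dc=corp,dc=com
# dn: ou=hosts,dc=corp,dc=com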


Distinguished Names: Reading the Path

The Distinguished Name (DN) is how you address any entry in the directory. It reads right-to-left, from the leaf to the root, with each component separated by a comma.

uid=vamshi,ou=engineers,dc=corp,dc=com

Reading right-to-left:
  dc=corp,dc=com       ← domain: corp.com
  ou=engineers         ← organizational unit: engineers
  uid=vamshi           ← this specific entry: user "vamshi"

Each component of a DN — uid=vamshi, ou=engineers, dc=corp — is a Relative Distinguished Name (RDN). The RDN is the attribute-value pair that uniquely identifies the entry within its parent container. Two users in the same ou=engineers cannot both have uid=vamshi — that would create two entries with identical DNs, which the directory won’t allow.

Common RDN attribute types and what they mean:

Attribute Stands for Typical use
dc Domain Component Domain name segments (dc=corp,dc=com = corp.com)
ou Organizational Unit Container for grouping entries
cn Common Name Groups, service accounts, human-readable name
uid User ID Linux username — the standard RDN for user entries
o Organization Top-level org containers (less common in modern setups)

When your Linux system calls getent passwd vamshi, SSSD translates that into an LDAP Search for an entry where uid=vamshi somewhere under the configured base DN. The full DN comes back with the result, but what your system actually uses are the attributes inside it.


Object Classes and Schema

Every entry in the directory has an objectClass attribute — usually with several values. Object classes define what attributes the entry is allowed or required to have.

# A typical user entry's object classes
dn: uid=vamshi,ou=engineers,dc=corp,dc=com
objectClass: top
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount

Each object class contributes a set of attributes — some required (MUST), some optional (MAY):

objectClass: posixAccount
  MUST: cn, uid, uidNumber, gidNumber, homeDirectory
  MAY:  userPassword, loginShell, gecos, description

objectClass: inetOrgPerson
  MUST: sn (surname), cn
  MAY:  mail, telephoneNumber, displayName, jpegPhoto, ...

objectClass: shadowAccount
  MUST: uid
  MAY:  shadowLastChange, shadowMin, shadowMax, shadowWarning, ...

When Linux authenticates a user via LDAP, it needs the posixAccount attributes: uidNumber (the numeric UID), gidNumber, homeDirectory, and loginShell. Without posixAccount, the user entry exists in the directory but can’t be used for Linux logins — getent passwd will return nothing.
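A quick way to check whether an entry is login-capable is to filter on the class and ask only for the POSIX attributes. A sketch (add bind credentials if your directory requires them):

# Returns nothing if the entry lacks posixAccount, which is the usual cause of
# "the user exists in LDAP but getent passwd is empty"
ldapsearch -x -H ldap://dc.corp.com -b "dc=corp,dc=com" \
  "(&(uid=vamshi)(objectClass=posixAccount))" \
  uidNumber gidNumber homeDirectory loginShell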

Groups in LDAP use their own object class:

objectClass: groupOfNames
  MUST: cn, member
  MAY:  description, owner, ...

# A group entry looks like this:
dn: cn=engineers,ou=groups,dc=corp,dc=com
objectClass: groupOfNames
cn: engineers
member: uid=vamshi,ou=engineers,dc=corp,dc=com
member: uid=alice,ou=engineers,dc=corp,dc=com

groupOfNames stores members as full DNs — which is why the SSSD group search filter is (member=uid=vamshi,ou=...) rather than (member=vamshi). The directory stores the exact path to each member entry. posixGroup is the alternative, which stores memberUid as a bare username string instead of a DN. Active Directory’s group class follows the groupOfNames model (its member attribute holds full DNs); pure POSIX environments often use posixGroup.

Object classes are grouped into three kinds:

Structural — defines what the entry fundamentally is. Every entry must have exactly one structural class. In the example above, inetOrgPerson (via its person and organizationalPerson parents) is the structural class.

Auxiliary — adds additional attributes to an existing entry. posixAccount and shadowAccount are auxiliary: you stack them on top of a structural class, which is exactly what the example entry does.

Abstract — base classes that other classes inherit from. top is the root abstract class at the head of every inheritance chain, which is why objectClass: top appears on virtually every entry (as in the example above).

Schema: The Directory’s Type System

Schema is the global rulebook for the entire directory. It defines:

  • Attribute type definitions — what each attribute is named, what syntax it uses (a string? an integer? a binary blob?), whether it’s case-sensitive, whether multiple values are allowed
  • Object class definitions — which attributes each class requires or permits
  • Matching rules — how equality comparisons work for each attribute type

The schema is stored in the directory itself, under a special entry at cn=schema,cn=config (OpenLDAP) or cn=Schema,cn=Configuration (Active Directory). You can query it:

# View the schema for the posixAccount object class
ldapsearch -x -H ldap://your-dc \
  -b "cn=schema,cn=config" \
  "(objectClass=olcObjectClasses)" \
  olcObjectClasses | grep -A 10 "posixAccount"

# Output:
# olcObjectClasses: ( 1.3.6.1.1.1.2.0
#   NAME 'posixAccount'
#   DESC 'Abstraction of an account with POSIX attributes'
#   SUP top
#   AUXILIARY
#   MUST ( cn $ uid $ uidNumber $ gidNumber $ homeDirectory )
#   MAY ( userPassword $ loginShell $ gecos $ description ) )

That OID (1.3.6.1.1.1.2.0) is the globally unique identifier for the posixAccount object class. Every object class and attribute type in every LDAP directory on the planet has a unique OID assigned by an authority. This is how schema interoperability works across different directory implementations — OpenLDAP, Active Directory, and 389-DS can all understand each other’s posixAccount entries because they share the same OID.


LDAP Operations: What Actually Runs

LDAP defines a small set of operations; the eight below are the ones you will actually encounter (the protocol also defines Unbind and an Extended operation). Day-to-day authentication uses two: Bind and Search.

LDAP Operation Set
──────────────────
Bind        ← authenticate (prove identity)
Search      ← query the directory
Add         ← create a new entry
Modify      ← change attributes on an existing entry
Delete      ← remove an entry
ModifyDN    ← rename or move an entry
Compare     ← test if an attribute has a specific value
Abandon     ← cancel an outstanding operation

Bind: Proving Who You Are

Before any authenticated operation, the client sends a Bind request. There are two types:

Simple Bind — the client sends its DN and password in the clear (or over TLS). This is what -x in ldapsearch means: simple authentication.

# Simple bind as a service account
ldapsearch -x \
  -D "cn=svc-ldap-reader,ou=services,dc=corp,dc=com" \
  -w "service-account-password" \
  -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(uid=vamshi)"

SASL Bind — the client uses an authentication mechanism registered with SASL (Simple Authentication and Security Layer). Kerberos (via the GSSAPI mechanism) is the most common. EP05 covers Kerberos in detail.

# SASL bind using Kerberos (after kinit)
ldapsearch -Y GSSAPI \
  -H ldap://dc.corp.com \
  -b "dc=corp,dc=com" \
  "(uid=vamshi)"

An anonymous Bind (no DN, no password) is also valid for directories configured to allow anonymous reads. Many public LDAP directories (and some internal ones, misconfigured) allow this.
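To check what a given directory allows anonymously, query the rootDSE (almost always permitted) and then attempt an actual entry read, which a hardened directory should refuse. A sketch:

# rootDSE read: anonymous, base scope, empty base DN
ldapsearch -x -H ldap://dc.corp.com -b "" -s base namingContexts

# Anonymous subtree read: should fail if anonymous access is locked down
ldapsearch -x -H ldap://dc.corp.com -b "dc=corp,dc=com" "(uid=*)" dn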

Search: The Core Operation

A Search request carries these parameters (plus size and time limits and a typesOnly flag, which you can see in the BER dump later in this episode):

baseObject   — where in the DIT to start (e.g., "dc=corp,dc=com")
scope        — how deep to look
               base    = only the base entry itself
               one     = one level below base (immediate children)
               sub     = entire subtree below base (most common)
derefAliases — how to handle alias entries (usually derefAlways)
filter       — what to match (e.g., "(uid=vamshi)")
attributes   — which attributes to return (empty = return all)

When SSSD authenticates a user login, it runs exactly two Search operations:

Search 1 — find the user's entry
  base:       dc=corp,dc=com
  scope:      sub
  filter:     (uid=vamshi)
  attributes: dn, uid, uidNumber, gidNumber, homeDirectory, loginShell

Search 2 — find the user's group memberships
  base:       dc=corp,dc=com
  scope:      sub
  filter:     (member=uid=vamshi,ou=engineers,dc=corp,dc=com)
  attributes: dn, cn, gidNumber

The first search locates the user entry and retrieves the POSIX attributes. The second finds all group entries that contain the user’s DN as a member. These two queries are the complete basis for a Linux login over LDAP.
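Search 2 is easy to reproduce by hand, which helps when group memberships look wrong on the Linux side. A sketch using the read-only service account shown earlier in this episode:

# Find every group whose member attribute contains the user's full DN
ldapsearch -x -H ldap://dc.corp.com \
  -D "cn=svc-ldap-reader,ou=services,dc=corp,dc=com" -W \
  -b "dc=corp,dc=com" \
  "(member=uid=vamshi,ou=engineers,dc=corp,dc=com)" cn gidNumber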

Search Filters

LDAP filters follow a prefix (Polish notation) syntax. Every filter is wrapped in parentheses:

# Simple equality
(uid=vamshi)

# Presence — entry has this attribute at all
(mail=*)

# Substring match
(cn=vam*)

# Comparison
(uidNumber>=1000)

# Logical AND — both conditions must match
(&(objectClass=posixAccount)(uid=vamshi))

# Logical OR — either condition matches
(|(uid=vamshi)(mail=vamshi@corp.com))

# Logical NOT
(!(uid=guest))

# Combined — posixAccount entries with UID >= 1000 and no disabled flag
(&(objectClass=posixAccount)(uidNumber>=1000)(!(pwdAccountLockedTime=*)))

The & and | operators take any number of operands. Filter syntax looks strange the first time but is unambiguous and compact — which matters when you’re encoding it into BER for the wire.


What Actually Travels on the Wire

Every LDAP message is encoded in BER (Basic Encoding Rules), a binary encoding of ASN.1. LDAP is not a text protocol.

When you run ldapsearch, the tool constructs a BER-encoded SearchRequest message and sends it over TCP. The server responds with one or more SearchResultEntry messages (one per matching entry), followed by a SearchResultDone. All of these are BER.

BER uses a type-length-value (TLV) encoding:

Tag byte(s)    — what type of data this is
Length byte(s) — how many bytes of data follow
Value byte(s)  — the actual data

A minimal LDAP SearchRequest for ldapsearch -x -b "dc=corp,dc=com" "(uid=vamshi)" uid looks like this on the wire:

30 3a          ← SEQUENCE (LDAPMessage)
  02 01 01     ← INTEGER 1 (messageID = 1)
  63 35        ← [APPLICATION 3] SearchRequest
    04 0e       ← OCTET STRING: baseObject
      64 63 3d  ← "dc=corp,dc=com" (14 bytes)
      63 6f 72
      70 2c 64
      63 3d 63
      6f 6d
    0a 01 02   ← ENUMERATED: scope = wholeSubtree (2)
    0a 01 03   ← ENUMERATED: derefAliases = derefAlways (3)
    02 01 00   ← INTEGER: sizeLimit = 0 (unlimited)
    02 01 00   ← INTEGER: timeLimit = 0 (unlimited)
    01 01 00   ← BOOLEAN: typesOnly = false
    a3 0d      ← [3] equalityMatch filter
      04 03 75 69 64   ← attributeDesc: "uid"
      04 06 76 61 6d   ← assertionValue: "vamshi"
             73 68 69
    30 05      ← SEQUENCE: AttributeDescriptionList
      04 03 75 69 64   ← "uid"

You don’t need to read BER by hand in practice. But knowing it’s binary — not HTTP, not JSON, not plain text — explains some things:

  • Why tcpdump port 389 shows binary output you can’t read directly
  • Why LDAP on port 389 looks different in Wireshark than HTTP traffic
  • Why ldapsearch output (LDIF) is a transformation of the wire data, not the wire data itself

To see the wire protocol in action:

# Run ldapsearch with debug output (level 1 = protocol tracing)
ldapsearch -d 1 -x \
  -H ldap://ldap.forumsys.com \
  -b "dc=example,dc=com" \
  -D "cn=read-only-admin,dc=example,dc=com" \
  -w readonly \
  "(uid=tesla)" cn

# You'll see output like:
# ldap_connect_to_host: TCP ldap.forumsys.com:389
# ldap_new_connection 1 1 0
# ldap_connect_to_host: Trying ldap.forumsys.com:389
# ldap_pvt_connect: fd: 5 tm: -1 async: 0
# TLS: can't connect.
# ldap_open_defconn: successful
# ber_scanf fmt ({it) ber:     ← BER decoding of the response
# ber_scanf fmt ({) ber:
# ber_scanf fmt (W) ber:
# ...

The ber_scanf lines are the BER decoder working through the server’s response. Each line represents one TLV element being read off the wire.
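
If you prefer a decoded view over raw debug output, capture the traffic and let a protocol dissector do the BER work. A sketch, assuming tcpdump and tshark are installed (only plaintext port 389 sessions decode; anything after a TLS handshake is opaque):

# Capture LDAP traffic, run an ldapsearch in another terminal, then stop with Ctrl-C
sudo tcpdump -i any -w /tmp/ldap.pcap port 389

# Walk through the decoded SearchRequest / SearchResultEntry messages
tshark -r /tmp/ldap.pcap -Y ldap -V | less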


Reading ldapsearch Output: Every Field

ldapsearch output is LDIF (LDAP Data Interchange Format), defined in RFC 2849. It’s the standard text serialization of LDAP entries.

ldapsearch -x \
  -H ldap://ldap.forumsys.com \
  -b "dc=example,dc=com" \
  -D "cn=read-only-admin,dc=example,dc=com" \
  -w readonly \
  "(uid=tesla)" \
  cn mail uid uidNumber objectClass

Output, annotated:

# extended LDIF
#
# LDAPv3                              ← protocol version confirmed
# base <dc=example,dc=com> with scope subtree
# filter: (uid=tesla)                 ← your search filter echoed back
# requesting: cn mail uid uidNumber objectClass
#

# tesla, example.com                  ← comment: CN, base DN
dn: uid=tesla,dc=example,dc=com      ← Distinguished Name — full path in the tree

objectClass: inetOrgPerson           ← structural class: person with org attrs
objectClass: organizationalPerson    ← superclass: adds title, ou, postal attrs
objectClass: person                  ← superclass: adds sn (surname) and cn
objectClass: top                     ← root of the class hierarchy; every entry has it
cn: Tesla                            ← common name (MUST, inherited from person)
mail: [email protected]        ← email (from inetOrgPerson MAY)
uid: tesla                           ← userid (from inetOrgPerson MAY)

# search result
search: 2                            ← messageID of the SearchResultDone
result: 0 Success                    ← 0 = no error; 32 = no such object; 49 = invalid credentials

# numResponses: 2                    ← 1 result entry + 1 SearchResultDone
# numEntries: 1

The result: line is the one to watch when debugging. LDAP result codes:

Code  Meaning                  What it tells you
0     Success                  Query ran, results returned (or no results found — check numEntries)
32    No Such Object           Base DN doesn’t exist in this directory
49    Invalid Credentials      Bind failed — wrong DN, wrong password, or account locked
50    Insufficient Access      Your bind DN doesn’t have read permission on these entries
53    Unwilling to Perform     Server refused the operation (e.g., password policy, anonymous bind disabled)
65    Object Class Violation   Add/Modify would violate schema (missing MUST attribute, unrecognized object class)
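
You can provoke the two most common failures on purpose against the public test directory used earlier and watch the result: line change:

# Expect result: 32 No Such Object (this base DN doesn't exist on that server)
ldapsearch -x -H ldap://ldap.forumsys.com \
  -b "dc=doesnotexist,dc=com" "(uid=tesla)"

# Expect result: 49 Invalid Credentials (deliberately wrong bind password)
ldapsearch -x -H ldap://ldap.forumsys.com \
  -b "dc=example,dc=com" \
  -D "cn=read-only-admin,dc=example,dc=com" -w wrong-password \
  "(uid=tesla)"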

Ports: 389, 636, and 3268

Port 389   — LDAP (plaintext, or StartTLS in-session upgrade)
Port 636   — LDAPS (LDAP wrapped in TLS from the start)
Port 3268  — Active Directory Global Catalog (plain)
Port 3269  — Active Directory Global Catalog over TLS

Port 389 vs 636: Both carry the same BER-encoded LDAP protocol. The difference is when TLS starts. On 636 (LDAPS), the TLS handshake happens before the first LDAP message. On 389 with StartTLS, the client sends a plaintext ExtendedRequest with OID 1.3.6.1.4.1.1466.20037 to initiate the TLS upgrade, then both sides continue over TLS. In production, use one or the other — never unencrypted port 389. Your credentials transit the wire on every Bind.
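
On the client side the difference is one flag versus one URI scheme. A sketch (hostname illustrative; both forms require the server's CA certificate to be trusted, e.g. via TLS_CACERT in /etc/openldap/ldap.conf or /etc/ldap/ldap.conf):

# LDAPS: TLS from the first byte, port 636
ldapsearch -x -H ldaps://dc.corp.com -b "dc=corp,dc=com" "(uid=vamshi)" cn

# StartTLS: connect on 389, then upgrade; -ZZ aborts if the upgrade fails
ldapsearch -x -ZZ -H ldap://dc.corp.com -b "dc=corp,dc=com" "(uid=vamshi)" cn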

Ports 3268/3269 — Active Directory Global Catalog: AD organizes domains into forests. Each domain controller holds the full LDAP tree for its own domain. The Global Catalog is a read-only, partial replica of every domain in the forest — just the most-queried attributes from every object. When an application needs to find a user across domains in the same forest (not just in one domain), it queries the Global Catalog on 3268/3269 instead of a domain-specific DC on 389/636.

Forest: corp.com
  ├── Domain: corp.com       → DC at port 389/636   (full copy of corp.com)
  ├── Domain: emea.corp.com  → DC at port 389/636   (full copy of emea.corp.com)
  └── Global Catalog        → GC at port 3268/3269  (partial copy of ALL domains)

If your SSSD or application is configured to use port 3268 instead of 389, it’s talking to the Global Catalog — useful for forest-wide user lookups, but missing some less-common attributes that aren’t replicated to the GC.
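
The difference is visible with two otherwise identical queries. A sketch (DC hostname, credentials, and user are illustrative; AD accepts a UPN as the simple-bind identity):

# Port 389: one domain, full attribute set
ldapsearch -x -H ldap://dc01.corp.com:389 \
  -D "[email protected]" -w "password" \
  -b "dc=corp,dc=com" "(sAMAccountName=vamshi)"

# Port 3268: Global Catalog, every domain in the forest, partial attribute set
ldapsearch -x -H ldap://dc01.corp.com:3268 \
  -D "[email protected]" -w "password" \
  -b "dc=corp,dc=com" "(sAMAccountName=vamshi)"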


Try It: ldapsearch Against Your Own Directory

If your Linux machine is joined to AD or connected to an LDAP directory, you can run these right now:

# 1. Confirm your SSSD knows where the LDAP server is
grep -E "ldap_uri|ad_domain|krb5_server" /etc/sssd/sssd.conf

# 2. Look up your own user entry
ldapsearch -x \
  -H "$(grep ldap_uri /etc/sssd/sssd.conf | awk -F= '{print $2}' | tr -d ' ')" \
  -b "dc=$(hostname -d | sed 's/\./,dc=/g')" \
  "(uid=$(whoami))" \
  dn objectClass uid uidNumber gidNumber homeDirectory loginShell

# 3. Find the groups you're in
ldapsearch -x \
  -H ldap://your-dc \
  -b "dc=corp,dc=com" \
  "(member=$(ldapsearch -x ... "(uid=$(whoami))" dn | grep ^dn | cut -d' ' -f2-))" \
  cn gidNumber

# 4. Check what object classes your entry has
ldapsearch -x \
  -H ldap://your-dc \
  -b "dc=corp,dc=com" \
  "(uid=$(whoami))" \
  objectClass

On a machine joined to Active Directory, the ldap_uri in sssd.conf is your domain controller’s address. On FreeIPA or OpenLDAP, it’s the directory server. The same ldapsearch commands work against all of them — because they all speak LDAP v3.


⚠ Common Misconceptions

“The DN is like a file path.” The analogy holds for reading it, but the DIT is not a file system. Entries don’t inherit permissions from parent containers the way files inherit from directories. Access control in LDAP is defined by ACLs on the server — not by position in the tree.

“LDAP is case-sensitive.” It depends on the attribute. Most string attributes (like cn and mail) use case-insensitive matching by default — (cn=Vamshi) and (cn=vamshi) return the same results. But some attributes (like userPassword and most binary types) are case-sensitive. The schema’s matching rules define this per-attribute.

“You need the full DN to search for a user.” No. The Search operation with a sub scope searches the entire subtree below the base DN. You search with a filter like (uid=vamshi) without knowing the full DN. The DN comes back in the result.

“LDAP accounts and Linux accounts are the same thing.” An LDAP user entry becomes a Linux account only if the entry has a posixAccount object class with the required POSIX attributes (uidNumber, gidNumber, homeDirectory). An LDAP entry without posixAccount can exist in the directory but getent passwd will not return it.

“The objectClass attribute can be changed freely.” Structural object classes cannot be changed after an entry is created — you’d have to delete and recreate the entry. Auxiliary classes can be added or removed. This is why correctly choosing the structural class at entry creation time matters.
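
Adding an auxiliary class in place looks like this. A sketch against a generic LDAP server (DNs and values illustrative; the MUST attributes have to arrive in the same change, and AD manages POSIX attributes differently):

# Graft posixAccount onto an existing person entry
ldapmodify -x -H ldap://ldap.corp.com \
  -D "cn=admin,dc=corp,dc=com" -w "admin-password" <<'EOF'
dn: cn=vamshi,ou=engineers,dc=corp,dc=com
changetype: modify
add: objectClass
objectClass: posixAccount
-
add: uid
uid: vamshi
-
add: uidNumber
uidNumber: 1001
-
add: gidNumber
gidNumber: 1001
-
add: homeDirectory
homeDirectory: /home/vamshi
EOF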


Framework Alignment

CISSP Domain 5: Identity and Access Management
  DIT structure, DN addressing, object classes, and schema are the data model underpinning every enterprise identity store — understanding them is foundational to managing directory-based IAM
CISSP Domain 4: Communications and Network Security
  BER on port 389 is unencrypted; LDAPS (port 636) or StartTLS is required for production — wire-level understanding informs the transport security decision
CISSP Domain 3: Security Architecture and Engineering
  Schema design and DIT hierarchy are architectural decisions with security consequences: overly permissive schemas enable privilege escalation; flat DITs make access delegation harder

Key Takeaways

  • The DIT is a hierarchical database — every entry has a unique DN that describes its path from leaf to root
  • Object classes define the schema rules for each entry: what attributes are required (MUST) vs optional (MAY), and what the entry fundamentally is
  • For a user to be usable for Linux logins, the directory entry needs the posixAccount object class with uidNumber, gidNumber, and homeDirectory populated
  • An LDAP login is two operations: a Bind (authenticate), then a Search (retrieve POSIX attributes and group memberships)
  • Everything on the wire is BER-encoded binary — ldapsearch output is LDIF, a human-readable transformation of what the wire actually carries
  • LDAP result code 0 means success; 49 means bad credentials; 32 means the base DN doesn’t exist — these are the three you’ll debug most often


Run ldapsearch against your own directory and look at the object classes on your entry. Does it have posixAccount? Does it have shadowAccount? What attributes is your SSSD actually reading on every login — and what does it do when the LDAP server is unreachable? 👇


What’s Next

EP02 showed what’s inside the directory: the tree structure, the schema, the operations, and the wire protocol. What it left open is how Linux actually uses this information to grant a login.

LDAP is not, by itself, an authentication protocol. The Bind operation can verify a password — but that’s a tiny piece of what happens when you SSH into a machine joined to Active Directory. The full login flow runs through PAM, NSS, and SSSD before LDAP ever gets queried. EP03 traces that path.

Next: LDAP Authentication on Linux: PAM, NSS, and the Login Stack

Get EP03 in your inbox when it publishes → linuxcent.com/subscribe

Hardening Blueprint as Code — Declare Your OS Baseline in YAML

Reading Time: 6 minutes

OS Hardening as Code, Episode 2
Cloud AMI Security Risks · Linux Hardening as Code

Focus Keyphrase: Linux hardening as code
Search Intent: Informational
Meta Description: Stop relying on hardening runbooks that get skipped at 2am. Declare your Linux OS baseline as a YAML blueprint — and build images where skipping a step is structurally impossible. (155 chars)


TL;DR

  • A hardening runbook is a list of steps someone runs. A HardeningBlueprint YAML is a build artifact — if it wasn’t applied, the image doesn’t exist
  • Linux hardening as code means declaring your entire OS security baseline in a single YAML file and building it reproducibly across any provider
  • stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws either produces a hardened image or fails — there is no partial state
  • The blueprint includes: target OS/provider, compliance benchmark, Ansible roles, and per-control overrides with documented reasons
  • One blueprint file = one source of truth for your hardening posture, version-controlled and reviewable like any other infrastructure code
  • Post-build OpenSCAP scan runs automatically — the image only snapshots if it passes

The Problem: A Runbook That Gets Skipped Once Is a Runbook That Gets Skipped

Hardening runbook
       │
       ▼
  Human executes
  steps manually
       │
       ├─── 47 deployments: followed correctly
       │
       └─── 1 deployment at 2am: step 12 skipped
                    │
                    ▼
           Instance in production
           without audit logging,
           SSH password auth enabled,
           unnecessary services running

Linux hardening as code eliminates the human decision point. If the blueprint wasn’t applied, the image doesn’t exist.

EP01 showed that default cloud AMIs arrive pre-broken — unnecessary services, no audit logging, weak kernel parameters, SSH configured for convenience not security. The obvious response is a hardening script. But a script run by a human is still a process step. It can be skipped. It can be done halfway. It can drift across different engineers who each interpret “run the hardening script” slightly differently.


A production deployment last year. The platform team had a solid CIS L1 hardening runbook — 68 steps, well-documented, followed consistently. Then a critical incident at 2am required three new instances to be deployed on short notice. The engineer on call ran the provisioning script and, under pressure, skipped the hardening step with the intention of running it the next morning.

They didn’t. The three instances stayed in production unhardened for six weeks before an automated scan caught them. Audit logging wasn’t configured. SSH was accepting password authentication. Two unnecessary services were running that weren’t in the approved software list.

Nothing was breached. But the finding went into the next compliance report as a gap, the team spent a week remediating, and the post-mortem conclusion was “we need better runbook discipline.”

That’s the wrong conclusion. The runbook isn’t the problem. The problem is that hardening was a process step instead of a build constraint.


What Linux Hardening as Code Actually Means

Linux hardening as code is the same principle as infrastructure as code applied to OS security posture: the desired state is declared in a file, the file is the source of truth, and the execution is deterministic and repeatable.

HardeningBlueprint YAML
         │
         ▼
  stratum build
         │
  ┌──────┴──────────────────┐
  │  Provider Layer          │
  │  (cloud-init, disk       │
  │   names, metadata        │
  │   endpoint per provider) │
  └──────┬──────────────────┘
         │
  ┌──────┴──────────────────┐
  │  Ansible-Lockdown        │
  │  (CIS L1/L2, STIG —      │
  │   the hardening steps)   │
  └──────┬──────────────────┘
         │
  ┌──────┴──────────────────┐
  │  OpenSCAP Scanner        │
  │  (post-build verify)     │
  └──────┬──────────────────┘
         │
         ▼
  Golden Image (AMI/GCP image/Azure image)
  + Compliance grade in image metadata

The YAML file is what you write. Stratum handles the rest.


The HardeningBlueprint YAML

The blueprint is the complete, auditable declaration of your OS security posture:

# ubuntu22-cis-l1.yaml
name: ubuntu22-cis-l1
description: Ubuntu 22.04 CIS Level 1 baseline for production workloads
version: "1.0"

target:
  os: ubuntu
  version: "22.04"
  provider: aws
  region: ap-south-1
  instance_type: t3.medium

compliance:
  benchmark: cis-l1
  controls: all

hardening:
  - ansible-lockdown/UBUNTU22-CIS
  - role: custom-audit-logging
    vars:
      audit_log_retention_days: 90
      audit_max_log_file: 100

filesystem:
  tmp:
    type: tmpfs
    options: [nodev, nosuid, noexec]
  home:
    options: [nodev]

controls:
  - id: 1.1.2
    override: compliant
    reason: "tmpfs /tmp implemented via systemd unit — equivalent control"
  - id: 5.2.4
    override: compliant
    reason: "SSH timeout managed by session manager policy, not sshd_config"

Each section is explicit:

target — which OS, which version, which provider. This is the only provider-specific section. The compliance intent below it is portable.

compliance — which benchmark and which controls to apply. controls: all means every CIS L1 control. You can also specify controls: [1.x, 2.x] to scope to specific sections.

hardening — which Ansible roles to run. ansible-lockdown/UBUNTU22-CIS is the community CIS hardening role. You can add custom roles alongside it.

controls — documented exceptions. Not suppressions — overrides with a recorded reason. This is the difference between “we turned off this control” and “this control is satisfied by an equivalent implementation, documented here.”


Building the Image

# Validate the blueprint before building
stratum blueprint validate ubuntu22-cis-l1.yaml

# Build — this will take 15-20 minutes
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws

# Output:
# [15:42:01] Launching build instance...
# [15:42:45] Running ansible-lockdown/UBUNTU22-CIS (144 tasks)...
# [15:51:33] Running custom-audit-logging role...
# [15:52:11] Running post-build OpenSCAP scan (benchmark: cis-l1)...
# [15:54:08] Grade: A (98/100 controls passing)
# [15:54:09] 2 controls overridden (documented in blueprint)
# [15:54:10] Creating AMI snapshot: ami-0a7f3c9e82d1b4c05
# [15:54:47] Done. AMI tagged with compliance grade: cis-l1-A-98

If the post-build scan comes back below a configurable threshold, the build fails — no AMI is created. The instance is terminated. The image does not exist.

That is the structural guarantee. You cannot skip a build step at 2am because at 2am you’re calling stratum build, not running steps manually.
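
One way to make that guarantee operational is to put the two commands behind a script your pipeline calls. A minimal sketch, assuming the stratum CLI is on the PATH and signals failure through its exit code (blueprint and provider values are the ones used in this article):

#!/usr/bin/env bash
# build-golden-image.sh: validate, then build; any failure aborts the pipeline
set -euo pipefail

BLUEPRINT="ubuntu22-cis-l1.yaml"
PROVIDER="aws"

# Catch schema and syntax errors before a build instance is ever launched
stratum blueprint validate "$BLUEPRINT"

# Build; a failing OpenSCAP grade means no image and a non-zero exit
stratum build --blueprint "$BLUEPRINT" --provider "$PROVIDER"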


The Control Override Mechanism

The override mechanism is what separates this from checkbox compliance.

Every security benchmark has controls that conflict with how production environments actually work. CIS L1 recommends /tmp on a separate partition. Many cloud instances use tmpfs with equivalent nodev, nosuid, noexec mount options. The intent of the control is satisfied. The literal implementation differs.

Without an override mechanism, you have two bad options: fail the scan (noisy, meaningless), or configure the scanner to ignore the control (undocumented, invisible to auditors).

The blueprint’s controls section gives you a third option: record the override, document the reason, and let the scanner count it as compliant. The SARIF output and the compliance grade both reflect the documented state.

controls:
  - id: 1.1.2
    override: compliant
    reason: "tmpfs /tmp implemented via systemd unit — equivalent control"

This appears in the build log, in the SARIF export, and in the image metadata. An auditor reading the output sees: control 1.1.2 — compliant, documented exception, reason recorded. Not: control 1.1.2 — ignored.


What the Blueprint Gives You That a Script Doesn’t

                                 Hardening script              HardeningBlueprint YAML
Version-controlled               Possible but not enforced     Always — it’s a file
Auditable exceptions             Typically not                 Built-in override mechanism
Post-build verification          Manual or none                Automatic OpenSCAP scan
Image exists only if hardened    No                            Yes — build fails if scan fails
Multi-cloud portability          Requires separate scripts     Provider flag, same YAML
Drift detection                  Not possible                  Rescan instance against original grade
Skippable at 2am                 Yes                           No — you’d have to change the build process

The last row is the one that matters. A script is skippable because there’s a human in the loop. A blueprint is a build artifact — you can’t deploy the image without the blueprint having been applied, because the image is what the blueprint produces.


Validating a Blueprint Before Building

# Syntax and schema validation
stratum blueprint validate ubuntu22-cis-l1.yaml

# Dry-run — show what Ansible tasks will run, what controls will be checked
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws --dry-run

# Show all available controls for a benchmark
stratum blueprint controls --benchmark cis-l1 --os ubuntu --version 22.04

# Show what a specific control checks
stratum blueprint controls --id 1.1.2 --benchmark cis-l1

The dry-run output shows every Ansible task that will run, every OpenSCAP check that will fire, and flags any controls that might conflict with the provider environment before you’ve launched a build instance.


Production Gotchas

Build time is 15–25 minutes. Ansible-Lockdown applies 144+ tasks for CIS L1. Build this into your pipeline timing — don’t expect golden images in 3 minutes.

Cloud-init ordering matters. On AWS, certain hardening steps (sysctl tuning, PAM configuration) interact with cloud-init. The Stratum provider layer handles sequencing — but if you add custom hardening roles, test the cloud-init interaction explicitly.

Some CIS controls conflict with managed service requirements. AWS Systems Manager Session Manager requires specific SSH configuration. RDS requires specific networking settings. Use the controls override section to document these — don’t suppress them silently.

Kernel parameter hardening requires a reboot. Controls in the 3.x (network parameters) and 1.5.x (kernel modules) sections apply sysctl changes that take effect on reboot. The Stratum build process reboots the instance before the OpenSCAP scan — don’t skip the reboot if you’re building manually.
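
If you are rebuilding by hand and want to confirm the reboot actually landed the 3.x parameters, compare the running kernel against the persisted config. A sketch (the keys shown are typical CIS 3.x examples, not the full list):

# Running values: these should already reflect the hardened settings after the reboot
sysctl net.ipv4.conf.all.rp_filter \
       net.ipv4.conf.all.log_martians \
       net.ipv4.tcp_syncookies

# Persisted values: what will be applied on the next boot
grep -rE "rp_filter|log_martians|tcp_syncookies" /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null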


Key Takeaways

  • Linux hardening as code means the blueprint YAML is the build artifact — the image either exists and is hardened, or it doesn’t exist
  • The controls override mechanism is the difference between undocumented suppressions and auditable, reasoned exceptions
  • Post-build OpenSCAP scan runs automatically — a failing grade blocks image creation
  • One blueprint file is portable across providers (EP03 covers this): the compliance intent stays in the YAML, the cloud-specific details go in the provider layer
  • Version-controlling the blueprint gives you a complete history of what your OS security posture was at any point in time — the same way Terraform state tracks infrastructure

What’s Next

One blueprint, one provider. EP02 showed that the skip-at-2am problem is solved when hardening is a build artifact rather than a process step.

What it didn’t address: what happens when you expand to a second cloud. GCP uses different disk names. Azure cloud-init fires in a different order. The AWS metadata endpoint IP is different from every other provider. If you maintain separate hardening scripts per cloud, they drift within a month.

EP03 covers multi-cloud OS hardening: the same blueprint, six providers, no drift.

Next: multi-cloud OS hardening — one blueprint for AWS, GCP, and Azure

Get EP03 in your inbox when it publishes → linuxcent.com/subscribe

What Is LDAP — and Why It Was Invented to Replace Something Worse

Reading Time: 9 minutes

The Identity Stack, Episode 1
EP01 → EP02: LDAP Internals → EP03 → …

Focus Keyphrase: what is LDAP
Search Intent: Informational
Meta Description: LDAP solved 1980s authentication chaos and still powers enterprise logins today. Learn what it replaced, how it works, and why it’s still in your stack. (155 chars)


TL;DR

  • LDAP (Lightweight Directory Access Protocol) is a protocol for reading and writing directory information — most commonly, who is allowed to do what
  • It was built in 1993 as a “lightweight” alternative to X.500/DAP, which ran over the full OSI stack and was impossible to deploy on anything but mainframe hardware
  • Before LDAP, every server had its own /etc/passwd — 50 machines meant 50 separate user databases, managed manually
  • NIS (Network Information Service) was the first attempt to centralize this — it worked, then became a cleartext-credentials security liability
  • LDAP v3 (RFC 2251, 1997) is the version still in production today — 27 years of backwards compatibility
  • Everything you use today — Active Directory, Okta, Entra ID — is built on top of, or speaks, LDAP

The Big Picture: 50 Years of “Who Are You?”

1969–1980s   /etc/passwd — per-machine, no network auth
     │        50 servers = 50 user databases, managed manually
     │
     ▼
1984         Sun NIS / Yellow Pages — first centralized directory
     │        broadcast-based, no encryption, flat namespace
     │        Revolutionary for its era. A liability by the 1990s.
     │
     ▼
1988         X.500 / DAP — enterprise-grade directory services
     │        OSI protocol stack. Powerful. Impossible to deploy.
     │        Mainframe-class infrastructure required just to run it.
     │
     ▼
1993         RFC 1487 — LDAP v1
     │        Tim Howes, University of Michigan.
     │        Lightweight. TCP/IP. Actually deployable.
     │
     ▼
1997         RFC 2251 — LDAP v3
     │        SASL authentication. TLS. Controls. Referrals.
     │        The version still in production today.
     │
     ▼
2000s–now    Active Directory, OpenLDAP, 389-DS, FreeIPA
             Okta, Entra ID, Google Workspace
             LDAP DNA in every identity system on the planet.

What is LDAP? It’s the protocol that solved one of the most boring and consequential problems in computing: how do you know who someone is, across machines, at scale, without sending their password in cleartext?


The World Before LDAP

Before you understand why LDAP was invented, you need to feel the problem it solved.

Every Unix machine in the 1970s and 1980s managed its own users. When you created an account on a server, your username, UID, and hashed password went into /etc/passwd on that machine. Another machine had no idea you existed. If you needed access to ten servers, an administrator created ten separate accounts — manually, one by one. When you changed your password, each account had to be updated separately.

For a university with 200 machines and 10,000 students, this was chaos. For a company with offices in three cities, it was a full-time job for multiple sysadmins.

Machine A           Machine B           Machine C
/etc/passwd         /etc/passwd         /etc/passwd
vamshi:x:1001       (vamshi unknown)    vamshi:x:1004
alice:x:1002        alice:x:1001        alice:x:1003
bob:x:1003          bob:x:1002          (bob unknown)

Same people, different UIDs, different machines, no central truth.
File permissions become meaningless when UID 1001 means
different users on different hosts.

For every new hire, an admin logged in to every machine and ran useradd. When someone left, you hoped whoever ran the offboarding remembered all the machines. Most organizations didn’t know their own attack surface because there was no single place to look.


Sun NIS: The First Attempt at Centralization

Sun Microsystems released NIS (Network Information Service) in 1984, originally called Yellow Pages — a name they had to drop after a trademark dispute with British Telecom. The idea was elegant: one server holds the authoritative /etc/passwd (and /etc/group, /etc/hosts, and a dozen other maps), and client machines query it instead of reading local files.

For the first time, you could create an account once and have it work across your entire network. For a generation of Unix administrators, NIS was liberating.

       NIS Master Server
       /var/yp/passwd.byname
              │
    ┌─────────┼──────────┐
    ▼         ▼          ▼
 Client A   Client B   Client C
 (query NIS — no local /etc/passwd needed)

NIS worked well — until it didn’t. The failure modes were structural:

No encryption. NIS responses were cleartext UDP. An attacker on the same network segment could capture the full password database with a packet sniffer. In 1984, “the network” meant a trusted corporate LAN. By the mid-1990s, it meant ethernet segments that included lab workstations, and the assumptions no longer held.

Broadcast-based discovery. NIS clients found servers by broadcasting on the local network. This worked on a single flat ethernet. It failed completely across routers, across buildings, and across WAN links. Multi-site organizations ended up running separate NIS domains with no connection between them — which partially defeated the purpose.

Flat namespace. NIS had no organizational hierarchy. One domain. Everything flat. You couldn’t have engineering and finance as separate administrative units. You couldn’t delegate user management to a department. One person — usually one overworked sysadmin — managed the whole thing.

UIDs had to match across all machines. If alice was UID 1002 on one server but UID 1001 on another, NFS file ownership became wrong. NIS enforced consistency, but onboarding a new machine into an existing network required manually auditing UID conflicts across the entire directory. Get one wrong and files end up owned by the wrong person.

NIS worked for thousands of installations from 1984 to the mid-1990s. It also ended careers when it failed. What the industry needed was a hierarchical, structured, encrypted, scalable directory service.


X.500 and DAP: The Right Idea, Wrong Protocol

The OSI (Open Systems Interconnection) standards body had an answer: X.500 directory services. X.500 was comprehensive, hierarchical, globally federated. The ITU-T published the standard in 1988, and it looked like exactly what enterprises needed.

X.500 Directory Information Tree (DIT)
              c=US                   ← country
                │
         o=University                ← organization
                │
         ┌──────┴──────┐
     ou=CS           ou=Physics      ← organizational units
         │
     cn=Tim Howes                    ← common name (person)
     telephoneNumber: +1-734-...
     mail: [email protected]

This data model — the hierarchy, the object classes, the distinguished names — is exactly what LDAP inherited. The DIT, the cn=, ou=, dc= notation in every LDAP query you’ve ever read: all of it came from X.500.

The problem was DAP: the Directory Access Protocol that X.500 used to communicate.

DAP ran over the full OSI protocol stack. Not TCP/IP — OSI. Seven layers, all of which required specialized software that in 1988 only mainframe and minicomputer vendors had implemented. A university department wanting to run X.500 needed hardware and software licenses that cost as much as a small car. The vast majority of workstations couldn’t speak OSI at all.

The data model was sound. The transport was impractical.

X.500 / DAP (1988)              LDAP v1 (1993)
──────────────────              ──────────────
Full OSI stack (7 layers)  →    TCP/IP only
Mainframe-class hardware   →    Any Unix box with a TCP stack
$50,000+ deployment cost   →    Free (reference implementation)
Vendor-specific OSI impl.  →    Standard socket API
Zero internet adoption     →    Universities deployed immediately

The Invention: LDAP at the University of Michigan

Tim Howes was at the University of Michigan in the early 1990s. The university was running X.500 for its directory — faculty, staff, student contact information, credentials. The data model was good. The protocol was the problem.

His insight, working with colleagues Wengyik Yeong and Steve Kille: strip X.500 down to what actually needs to function over a TCP/IP connection. Keep the hierarchical data model. Throw away the OSI transport. The result was the Lightweight Directory Access Protocol.

RFC 1487, published July 1993, described LDAP v1. It preserved the X.500 directory information model — the hierarchy, the object classes, the distinguished name format — and mapped it onto a protocol that could run over a simple TCP socket on port 389.

No specialized hardware. No OSI. If you had a Unix machine and TCP/IP, you could run LDAP. By 1993, that meant virtually every workstation and server in every university and most enterprises.

The University of Michigan deployed it immediately. Within two years, organizations across the internet were running the reference implementation.

LDAP v2 (RFC 1777, 1995) cleaned up the protocol. LDAP v3 (RFC 2251, 1997) is the version in production today — adding SASL authentication (which enables Kerberos integration), TLS support, referrals for federated directories, and extensible controls for server-side operations. The RFC that standardized the internet’s primary identity protocol is 27 years old and still running.


What LDAP Actually Is

LDAP is a client-server protocol for reading and writing a directory — a structured, hierarchical database optimized for reads.

Every entry in the directory has a Distinguished Name (DN) that describes its position in the hierarchy, and a set of attributes defined by its object classes. A person entry looks like this:

dn: cn=vamshi,ou=engineers,dc=linuxcent,dc=com

objectClass: inetOrgPerson
objectClass: posixAccount
cn: vamshi
uid: vamshi
uidNumber: 1001
gidNumber: 1001
homeDirectory: /home/vamshi
loginShell: /bin/bash
mail: [email protected]

The DN reads right-to-left: domain linuxcent.com (dc=linuxcent,dc=com) → organizational unit engineers → common name vamshi. Every entry in the directory has a unique path through the tree — there’s no ambiguity about which vamshi you mean.

LDAP defines eight core operations: Bind (authenticate), Search, Add, Modify, Delete, ModifyDN (rename), Compare, and Abandon. Most of what a Linux authentication system does with LDAP reduces to two: Bind (prove you are who you say you are) and Search (tell me everything you know about this user).

When your Linux machine authenticates an SSH login against LDAP:

1. User types password
2. PAM calls pam_sss (or pam_ldap on older systems)
3. SSSD issues a Bind to the LDAP server: "I am cn=vamshi, and here is my credential"
4. LDAP server verifies the bind → success or failure
5. SSSD issues a Search: "give me the posixAccount attributes for uid=vamshi"
6. LDAP returns uidNumber, gidNumber, homeDirectory, loginShell
7. PAM creates the session with those attributes

The entire login flow is two LDAP operations: one Bind, one Search.
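
You can replay that flow by hand with the OpenLDAP client tools. A sketch with the server and DNs as placeholders: ldapwhoami performs the Bind and nothing else; ldapsearch performs the Search.

# Steps 3-4 (the Bind): prove the credential is valid; -W prompts for the password
ldapwhoami -x \
  -H ldap://ldap.linuxcent.com \
  -D "cn=vamshi,ou=engineers,dc=linuxcent,dc=com" -W
# prints the DN you bound as on success

# Steps 5-6 (the Search): fetch the POSIX attributes the session needs
ldapsearch -x \
  -H ldap://ldap.linuxcent.com \
  -b "dc=linuxcent,dc=com" \
  "(uid=vamshi)" \
  uidNumber gidNumber homeDirectory loginShell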


Try It Right Now

You don’t need to set up an LDAP server to run your first query. There’s a public test LDAP directory at ldap.forumsys.com:

# Query a public LDAP server — no setup required
ldapsearch -x \
  -H ldap://ldap.forumsys.com \
  -b "dc=example,dc=com" \
  -D "cn=read-only-admin,dc=example,dc=com" \
  -w readonly \
  "(objectClass=inetOrgPerson)" \
  cn mail uid

# What you get back (abbreviated):
# dn: uid=tesla,dc=example,dc=com
# cn: Tesla
# mail: [email protected]
# uid: tesla
#
# dn: uid=einstein,dc=example,dc=com
# cn: Albert Einstein
# mail: [email protected]
# uid: einstein

Decode what you just ran:

  • -x — simple authentication (username/password bind, not Kerberos/SASL)
  • -H ldap://ldap.forumsys.com — the LDAP server URI, port 389
  • -b "dc=example,dc=com" — the base DN, the top of the subtree to search
  • -D "cn=read-only-admin,dc=example,dc=com" — the bind DN (who you’re authenticating as)
  • -w readonly — the bind password
  • "(objectClass=inetOrgPerson)" — the search filter: return entries that are people
  • cn mail uid — the attributes to return (default returns all)

That’s a live LDAP query returning real directory entries from a server running RFC 2251 — the same protocol Tim Howes designed in 1993.

On your own Linux system, if you’re joined to AD or LDAP, you can query it the same way with your domain credentials.


Why It Never Went Away

LDAP v3 was finalized in 1997. In 2024, it’s still the protocol every enterprise directory speaks. Why?

Because it became the lingua franca of enterprise identity before any replacement existed. Every application that needs to authenticate users — VPN concentrators, mail servers, network switches, web applications, HR systems — implemented LDAP support. Every directory service Microsoft, Red Hat, Sun, and Novell shipped stored data in an LDAP-accessible tree.

When Microsoft built Active Directory in 1999, they built it on top of LDAP + Kerberos. When your Linux machine joins an AD domain, it speaks LDAP to enumerate users and groups, and Kerberos to verify credentials. When Okta or Entra ID syncs with your on-premises directory, it uses LDAP Sync (or a modern protocol that maps directly to LDAP semantics).

The protocol is old. The ecosystem built on top of it is so deep that replacing LDAP would mean simultaneously replacing every enterprise application that depends on it. Nobody has done that. Nobody has had to.

What happened instead is the stack got taller. LDAP at the bottom, Kerberos for network authentication, SSSD as the local caching daemon, PAM as the Linux integration layer, SAML and OIDC at the top for web-based federation. The directory is still LDAP. The interfaces above it evolved.

That full stack — from the directory at the bottom to Zero Trust at the top — is what this series covers.


⚠ Common Misconceptions

“LDAP is an authentication protocol.” LDAP is a directory protocol. It stores identity information and can verify credentials (via Bind). Authentication in modern stacks is typically Kerberos or OIDC — LDAP provides the directory backing it.

“LDAP is obsolete.” LDAP is the storage layer for Active Directory, OpenLDAP, 389-DS, FreeIPA, and every enterprise IdP’s on-premises sync. It is ubiquitous. What’s changed is the interface layer above it.

“You need Active Directory to run LDAP.” Active Directory uses LDAP. OpenLDAP, 389-DS, FreeIPA, and Apache Directory Server are all standalone LDAP implementations. You can run a directory without Microsoft.

“LDAP and LDAPS are different protocols.” LDAP is the protocol. LDAPS is LDAP over TLS on port 636. StartTLS is LDAP on port 389 with an in-session upgrade to TLS. Same protocol, different transport security.


Framework Alignment

CISSP Domain 5: Identity and Access Management
  LDAP is the foundational directory protocol for centralized identity stores — the base layer of every enterprise IAM stack
CISSP Domain 4: Communications and Network Security
  Port 389 (LDAP), 636 (LDAPS), 3268/3269 (AD Global Catalog) — transport security decisions affect every directory deployment
CISSP Domain 3: Security Architecture and Engineering
  DIT hierarchy, schema design, replication topology — directory structure is an architectural security decision
NIST SP 800-63B
  LDAP as a credential service provider (CSP) backing enterprise authenticators

Key Takeaways

  • LDAP was invented to solve a real, painful problem: the authentication chaos that NIS couldn’t fix and X.500/DAP was too expensive to deploy
  • It inherited the right thing from X.500 (the hierarchical data model) and replaced the right thing (the impractical OSI transport with TCP/IP)
  • NIS was the predecessor that worked until it didn’t — its failure modes (no encryption, flat namespace, broadcast discovery) are exactly what LDAP was designed to fix
  • LDAP v3 (RFC 2251, 1997) is still the production standard — 27 years later
  • Active Directory, OpenLDAP, FreeIPA, Okta, Entra ID — every enterprise identity system either runs LDAP or speaks it
  • The full authentication stack is deeper than LDAP: the next 12 episodes peel it apart layer by layer

What’s Next

EP01 stayed at the design level — the problem, the predecessor failures, the invention, the data model.

EP02 goes inside the wire. The DIT structure, DN syntax, object classes, schema, and the BER-encoded bytes that actually travel from the server to your authentication daemon. Run ldapsearch against your own directory and read every line of what comes back.

Next: LDAP Internals: The Directory Tree, Schema, and What Travels on the Wire

Get EP02 in your inbox when it publishes → linuxcent.com/subscribe