The Identity Stack, Episode 7
EP06: OpenLDAP → EP07 → EP08: FreeIPA → …
Focus Keyphrase: LDAP high availability
Search Intent: Informational
Meta Description: Design LDAP high availability for production: HAProxy load balancing, read/write split, connection pooling, monitoring with cn=monitor, and 389-DS at scale. (157 chars)
TL;DR
- LDAP HA means multiple directory servers behind a load balancer — clients connect to a VIP, not to individual servers
- Read/write split: all writes go to the provider, reads are distributed across consumers — the load balancer enforces this by routing on port or backend check
- SSSD handles multi-server failover natively (ldap_uri accepts a comma-separated list) — for apps without built-in failover, HAProxy with health checks does the work
- Connection pooling is critical at scale — nss_ldap and pam_ldap opened a new connection per login; SSSD maintains a pool; apps that use libldap directly must implement their own
- cn=monitor is the built-in monitoring endpoint — exposes connection counts, operation rates, and backend stats readable via ldapsearch
- 389-DS (Red Hat Directory Server) is the production choice for >1M entries — purpose-built for large directories with a dedicated replication engine
The Big Picture: Production LDAP Topology
Clients (SSSD, apps, VPN concentrators)
│
┌───────▼───────┐
│ HAProxy VIP │ ← single endpoint, port 389/636
│ 10.0.0.10 │
└───────┬───────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
ldap1.corp.com ldap2.corp.com ldap3.corp.com
(Provider) (Consumer) (Consumer)
Reads + Writes Reads only Reads only
│ ▲ ▲
└───────────┴───────────────┘
SyncRepl replication
EP06 built a two-node replicated directory. This episode covers what happens when the directory becomes infrastructure — when it needs to survive a node failure, handle thousands of connections, and be monitored like any other critical service.
HAProxy for LDAP
HAProxy is the standard choice for LDAP load balancing. Unlike HTTP, LDAP is a stateful protocol — once a client binds, subsequent operations on that connection share the authenticated session. The load balancer must use connection persistence, not per-request routing.
# /etc/haproxy/haproxy.cfg
global
log /dev/log local0
maxconn 50000
defaults
mode tcp # LDAP is TCP, not HTTP
timeout connect 5s
timeout client 30s
timeout server 30s
option tcplog
# ── LDAP read/write split ─────────────────────────────────────────────
# Writes → provider only
frontend ldap-write
bind *:389
default_backend ldap-provider
backend ldap-provider
balance first # always use first available (provider)
option tcp-check
tcp-check connect
server ldap1 ldap1.corp.com:389 check inter 5s rise 2 fall 3
server ldap2 ldap2.corp.com:389 check inter 5s rise 2 fall 3 backup
# Reads → all nodes round-robin
frontend ldap-read
bind *:3389 # internal read port
default_backend ldap-consumers
backend ldap-consumers
balance roundrobin
option tcp-check
tcp-check connect
server ldap1 ldap1.corp.com:389 check inter 5s
server ldap2 ldap2.corp.com:389 check inter 5s
server ldap3 ldap3.corp.com:389 check inter 5s
# LDAPS (TLS)
frontend ldaps
bind *:636
default_backend ldap-consumers-tls
backend ldap-consumers-tls
balance roundrobin
server ldap1 ldap1.corp.com:636 check inter 5s ssl verify required ca-file /etc/ssl/certs/ca.pem
server ldap2 ldap2.corp.com:636 check inter 5s ssl verify required ca-file /etc/ssl/certs/ca.pem
The health check (tcp-check connect) just verifies TCP connectivity. For a more precise check — verifying that slapd is actually responding to LDAP requests — use a custom script that runs ldapsearch and checks the result code.
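Here's one way to wire that in, sketched under assumptions: the script path is arbitrary, and HAProxy's external-check mechanism must be enabled (the external-check keyword in the global section, then option external-check and external-check command /usr/local/bin/ldap-check.sh in the backend):

```shell
#!/bin/sh
# ldap-check.sh (path is an assumption) — LDAP-aware health check for
# HAProxy external-check. HAProxy invokes the script with:
#   $1=VIP addr  $2=VIP port  $3=server addr  $4=server port
ldap_alive() {
    # Anonymous base-scope read of the root DSE: succeeds only if slapd
    # is actually answering LDAP requests, not merely accepting TCP
    ldapsearch -x -H "ldap://$1:$2" -b "" -s base \
        -o nettimeout=3 "(objectClass=*)" namingContexts >/dev/null 2>&1
}

if [ "$#" -ge 4 ]; then
    ldap_alive "$3" "$4"
fi
```

HAProxy marks the server down whenever the script exits non-zero, so a slapd that accepts TCP but can't answer a root-DSE search drops out of rotation.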
SSSD Multi-Server Failover
SSSD has native failover — no load balancer required for SSSD-based clients:
# /etc/sssd/sssd.conf
[domain/corp.com]
ldap_uri = ldap://ldap1.corp.com, ldap://ldap2.corp.com, ldap://ldap3.corp.com
# SSSD tries them in order; switches to next on failure
# Retries the primary periodically and fails back when it recovers
# (see failover_primary_timeout in sssd.conf(5))
# For AD, discovery via DNS SRV records is even better:
ad_server = _srv_
# SSSD queries _ldap._tcp.corp.com SRV records and gets all DCs automatically
SSSD monitors connection health itself: if the current server becomes unreachable, it switches to the next in the list within seconds, and cached data keeps answering lookups during the switchover. Clients using SSSD don't need a load balancer for basic HA.
Connection Pooling
Every LDAP bind creates an authenticated session on the server, and slapd bounds what a connection may queue: olcConnMaxPending (anonymous, default 100) and olcConnMaxPendingAuth (authenticated, default 1000) in OLC. Connections that exceed those limits are dropped, and the total connection count is capped by the process's file-descriptor limit.
The problem: applications that use libldap directly tend to open a new connection per operation. At 500 requests/second, that’s 500 new TCP connections, 500 binds, 500 TLS handshakes per second — a directory that can handle 5000 concurrent connections starts refusing new ones.
The solutions:
SSSD — handles this automatically. SSSD maintains one or a small number of persistent connections per domain and multiplexes all PAM/NSS queries through them.
Application-level pooling — libraries like ldap3 with its pooled connection strategies (e.g. REUSABLE), pooling wrappers around python-ldap, or dedicated middleware like Red Hat's Directory Proxy Server.
Server-side limits — OpenLDAP has no single max-connections directive. The hard ceiling is the slapd file-descriptor limit, while olcConnMaxPending and olcConnMaxPendingAuth bound per-connection queuing; keep them at deliberate values so overload produces a controlled failure mode instead of unbounded queuing.
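A sketch of setting those limits explicitly in OLC (the values shown are the defaults; apply with ldapmodify -Y EXTERNAL -H ldapi:///):

```ldif
# conn-limits.ldif — pending-operation limits on the global config entry
# (values shown are the defaults; raise or lower deliberately)
dn: cn=config
changetype: modify
replace: olcConnMaxPending
olcConnMaxPending: 100
-
replace: olcConnMaxPendingAuth
olcConnMaxPendingAuth: 1000
```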
Monitoring with cn=monitor
OpenLDAP exposes live operational statistics via the cn=monitor database — a virtual LDAP subtree that reflects the server’s current state. Enable it:
# enable-monitor.ldif
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/ldap
olcModuleLoad: back_monitor

dn: olcDatabase=monitor,cn=config
objectClass: olcDatabaseConfig
olcDatabase: monitor
olcAccess: to *
  by dn="cn=admin,dc=corp,dc=com" read
  by * none
Query it:
# Overall statistics
ldapsearch -x -H ldap://localhost \
-D "cn=admin,dc=corp,dc=com" -w password \
-b "cn=monitor" -s sub "(objectClass=*)" \
monitorOpInitiated monitorOpCompleted
# Connection counts
ldapsearch -x -H ldap://localhost \
-D "cn=admin,dc=corp,dc=com" -w password \
-b "cn=Connections,cn=monitor" -s one \
monitorConnectionNumber
# Operations by type
ldapsearch -x -H ldap://localhost \
-D "cn=admin,dc=corp,dc=com" -w password \
-b "cn=Operations,cn=monitor" -s one \
monitorOpInitiated monitorOpCompleted
Useful metrics to export to Prometheus (via prometheus-openldap-exporter or similar):
- monitorOpCompleted per operation type (bind, search, modify)
- monitorConnectionNumber — current connection count
- Backend-specific: olmMDBEntries, olmMDBPagesMax, olmMDBPagesUsed
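If you don't want a full exporter yet, a small collection function works. This is a sketch with placeholder credentials, and the cn=Current entry name follows recent OpenLDAP monitor layouts, which vary slightly between versions:

```shell
# Sketch: pull the live connection count out of cn=monitor for alerting.
# Bind DN and password are placeholders for your environment.
current_connections() {  # $1 = LDAP URI, $2 = admin password
    ldapsearch -x -LLL -H "$1" \
        -D "cn=admin,dc=corp,dc=com" -w "$2" \
        -b "cn=Current,cn=Connections,cn=monitor" -s base monitorCounter \
        | awk '/^monitorCounter:/ {print $2}'
}

# current_connections ldap://localhost secret
```

The same pattern works for any single counter under cn=monitor: base-scope search on the counter entry, then extract monitorCounter.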
389-DS: LDAP at Scale
OpenLDAP is excellent for directories up to a few million entries. When you need:
- 10M+ entries
- High write throughput (more than a few hundred writes/second)
- Fine-grained replication filtering
- A dedicated web-based admin UI
…389-DS (Red Hat Directory Server, community edition) is the production answer. It’s what FreeIPA uses under the hood.
Key architectural differences from OpenLDAP:
Multi-supplier replication — 389-DS's replication engine uses a dedicated persistent changelog and Change Sequence Numbers (CSNs) for conflict resolution. Multi-supplier (multi-master) replication is first-class, not a bolted-on feature.
Changelog — every change is written to a persistent changelog before being applied. This enables precise replication: a consumer can reconnect after a network partition and get exactly the changes it missed, rather than doing a full resync.
Plugin architecture — 389-DS functionality (replication, managed entries, DNA for automatic UID allocation, memberOf, password policy) is all implemented as plugins that can be enabled/disabled per directory instance.
# Install 389-DS
dnf install -y 389-ds-base
# Create a new instance
dscreate interactive
# — or use a template:
dscreate from-file /path/to/instance.inf
# Manage with dsctl
dsctl slapd-corp status
dsctl slapd-corp start
dsctl slapd-corp stop
# Admin with dsconf
dsconf slapd-corp backend suffix list
dsconf slapd-corp replication status --suffix "dc=corp,dc=com"
The dsconf replication status command gives a live view of replication lag across all suppliers and consumers — something OpenLDAP requires you to compute manually from contextCSN comparisons.
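The manual OpenLDAP comparison can be sketched as a pair of shell functions. Suffix and hostnames follow this series' examples; with a single supplier there is one contextCSN value, while multi-supplier setups need a per-serverID comparison:

```shell
# Sketch: compare contextCSN between provider and consumer to detect lag.
# A contextCSN begins with a UTC timestamp, so equal values mean in-sync.
get_csn() {  # $1 = host
    ldapsearch -x -LLL -H "ldap://$1" -b "dc=corp,dc=com" -s base contextCSN \
        | awk '/^contextCSN:/ {print $2; exit}'
}

check_lag() {  # $1 = provider, $2 = consumer
    p=$(get_csn "$1"); c=$(get_csn "$2")
    if [ "$p" = "$c" ]; then
        echo "in sync"
    else
        echo "lag: provider=$p consumer=$c"
    fi
}

# check_lag ldap1.corp.com ldap2.corp.com
```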
Global Catalog: Cross-Domain Search in AD
When your directory spans multiple AD domains in a forest, the Global Catalog solves a specific problem: a user in emea.corp.com needs to be found by an app that only knows corp.com.
Forest: corp.com
├── corp.com → DC port 389 full directory: 500K entries
├── emea.corp.com → DC port 389 full directory: 200K entries
└── Global Catalog → GC port 3268 partial replica: 700K entries
(not all attributes — just the most queried ones)
The GC replicates a subset of attributes from every domain in the forest. By default: cn, mail, sAMAccountName, userPrincipalName, memberOf, and about 150 others. Attributes marked with isMemberOfPartialAttributeSet in the schema are replicated to the GC.
If an application is configured to use port 3268 instead of 389, it’s using the GC — and it won’t see attributes not included in the partial attribute set. This surprises teams that add a custom attribute to AD and then wonder why their application can’t see it on 3268 but can on 389.
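The behavior is easy to demonstrate with two searches that differ only in port. In this sketch the bind identity, the user, and customAttr are all placeholders:

```shell
# Sketch: the same user lookup against the full replica (389) and the
# Global Catalog (3268). customAttr stands in for an attribute that is
# NOT in the partial attribute set; bind DN and password are placeholders.
search_user() {  # $1 = port
    ldapsearch -x -LLL -H "ldap://dc1.corp.com:$1" \
        -D "svc-app@corp.com" -w password \
        -b "dc=corp,dc=com" "(sAMAccountName=jdoe)" cn mail customAttr
}

# search_user 389    # customAttr present (full replica)
# search_user 3268   # customAttr silently absent (partial attribute set)
```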
⚠ Production Gotchas
HAProxy TCP health checks don’t verify LDAP is responsive. A server can accept TCP connections but have slapd in a degraded state (database corruption, out-of-memory). Build a proper LDAP health check: a script that binds and searches a known entry and checks the result.
Replication lag under write load. SyncRepl consumers can fall behind under sustained write load. Monitor the contextCSN difference between provider and consumers. If consumers are more than a few seconds behind, investigate the provider's write throughput and the consumer's processing speed.
Directory size and the MDB mapsize. LMDB requires a pre-configured maximum database size (olcDbMaxSize). If the database grows beyond this, slapd starts failing writes. Set it to 2–4x your expected data size and monitor olmMDBPagesUsed / olmMDBPagesMax.
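Raising the cap is a one-attribute change. In this sketch the database DN ({1}mdb) and the 10 GiB value are illustrative; apply with ldapmodify against cn=config:

```ldif
# mapsize.ldif — raise the LMDB map size (value illustrative: 10 GiB)
dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcDbMaxSize
olcDbMaxSize: 10737418240
```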
Key Takeaways
- HAProxy in TCP mode provides LDAP load balancing — use balance first for write routing (provider only), balance roundrobin for reads
- SSSD has native failover via ldap_uri — for SSSD clients, a load balancer adds HA but isn't strictly required
- cn=monitor is the built-in OpenLDAP monitoring endpoint — export its counters to Prometheus for operational visibility
- 389-DS is the right choice for >1M entries, high write throughput, or multi-supplier replication as a first-class feature
- Global Catalog (port 3268/3269) is a partial replica of all AD domains — useful for forest-wide searches, but missing non-replicated attributes
What’s Next
This episode covered the infrastructure layer. EP08 zooms out to FreeIPA — what you get when LDAP, Kerberos, DNS, PKI, and HBAC are integrated into a single Linux-native identity stack, and why most Linux shops running their own directory should be running FreeIPA instead of bare OpenLDAP.
Next: FreeIPA: LDAP + Kerberos + PKI in a Single Linux Identity Stack
Get EP08 in your inbox when it publishes → linuxcent.com/subscribe