Zero Trust Archives

Cybersecurity Architecture Principles: Beyond the Castle-and-Moat

July 7, 2026 by Vamshi Krishna Santhapuri

Reading Time: 6 minutes

Zero to Hero: Cybersecurity Architecture Masterclass, Module 1


Introduction

Modern cybersecurity architecture principles trace back to a single admission: in 2014, Google published the first “BeyondCorp” paper, a high-profile confession from a tech giant that the corporate network (the “internal” network everyone trusted by default) was no longer safe. For decades, security was built on the Castle-and-Moat model: a hardened perimeter (the firewall) protecting a soft, trusted interior.

If you were inside the moat, you were trusted. If you were outside, you were a threat.

The rise of cloud, mobile, and sophisticated lateral-movement attacks has rendered this model obsolete. If an attacker compromises a single developer’s laptop or a single vulnerable Jenkins server, they are “inside the castle.” In a legacy architecture, the game is over.

Module 1 of the Masterclass establishes the core cybersecurity architecture principles required to move beyond the perimeter. We redefine the CIA Triad for the cloud era and establish the foundational shift to Zero Trust.

TL;DR

The CIA Triad is no longer enough: Modern architecture requires the Extended CIA Triad, adding Authenticity and Non-Repudiation to Confidentiality, Integrity, and Availability.
Defense-in-Depth is about redundant layers: A single failure (e.g., a leaked IAM key) should not lead to a total breach.
Zero Trust rejects implicit trust: No network location is trusted. Every request is verified explicitly based on identity, device posture, and context.
Security is a Product Requirement: Architectural security must be integrated into the SDLC (Software Development Lifecycle) from the “Definition” phase, not bolted on at “Deployment.”

The Big Picture: From Castle-and-Moat to Zero Trust

The fundamental shift in architecture is the transition from Network-Centric Trust to Identity-Centric Trust.

┌─────────────────────────────────────────────────────────────────────────────┐
│                   THE ARCHITECTURAL SHIFT: PERIMETER TO IDENTITY            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  LEGACY: CASTLE-AND-MOAT                  MODERN: ZERO TRUST ARCHITECTURE   │
│  (Implicit Trust)                         (Explicit Verification)           │
│                                                                             │
│  [ External ]                             [ External ]                      │
│       │                                        │                            │
│  ┌────▼────┐                              ┌────▼────────┐                   │
│  │ FIREWALL│ (The Moat)                   │ IDENTITY    │                   │
│  └────┬────┘                              │ PROVIDER    │                   │
│       │                                   └────┬────────┘                   │
│  ┌────▼──────────────┐                         │                            │
│  │ TRUSTED INTERIOR  │                    ┌────▼────────┐                   │
│  │ (soft center)     │                    │ POLICY      │                   │
│  │ [App] [DB] [Log]  │                    │ ENGINE      │                   │
│  └───────────────────┘                    └────┬────────┘                   │
│                                                │ (Always Verify)            │
│       FAILURE MODE:                       ┌────▼────────┐                   │
│       Compromised VPN =                   │ RESOURCE    │                   │
│       Full Access                         │ [App] [DB]  │                   │
│                                           └─────────────┘                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

1. The Extended CIA Triad Deep-Dive

Every security decision you make as an architect eventually maps back to the CIA Triad. But for modern systems, the “Classic CIA” (Confidentiality, Integrity, Availability) is missing the two pillars that handle identity and accountability.

Confidentiality (Protecting the Data)

At Rest: AES-256 encryption for S3 buckets or RDS instances.
In Transit: TLS 1.3 for every internal and external API call.
In Execution: Using Trusted Execution Environments (TEEs) or eBPF-based visibility to ensure memory isn’t being scraped.

Integrity (Trusting the Data)

Hashing: Using SHA-256/512 to verify that the container image you pulled is the exact one you built.
Digital Signatures: Signing your CI/CD artifacts so the production cluster only runs code signed by your build system.
FIM (File Integrity Monitoring): Detecting when a binary in /usr/bin is modified on a live node.
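
The hashing control above can be sketched in a few lines. This is a minimal illustration, assuming the expected digest was recorded at build time and is delivered through a channel the attacker cannot modify; the function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_expected_digest(data: bytes, expected_hex: str) -> bool:
    """Compare the computed digest to the recorded one in constant time."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


# Hypothetical artifact bytes; in practice this would be an image layer or binary.
built_artifact = b"example artifact contents"
recorded_digest = sha256_hex(built_artifact)  # recorded by the build system

print(matches_expected_digest(built_artifact, recorded_digest))
print(matches_expected_digest(built_artifact + b"tampered", recorded_digest))
```

A digest only proves the bytes are unchanged; proving who produced them is the job of the signature control in the same list.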

Availability (Ensuring Access)

Resilience: Multi-AZ deployments and automated failover.
Protection: AWS Shield or Cloudflare to absorb L3/L4 and L7 DDoS attacks.
Immutable Backups: Protecting data from ransomware using WORM (Write Once, Read Many) storage.

Authenticity & Non-Repudiation (The “Extended” Pillars)

Authenticity: Proving the caller is who they say they are (MFA, Client Certificates).
Non-Repudiation: Ensuring an action cannot be denied later. This is where Secure Audit Logs (CloudTrail, Kubernetes Audit) become architectural requirements, not just compliance checkboxes.
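
One way to see why audit logs are an architectural concern is a hash-chained log, where each entry commits to the one before it. The sketch below is an assumption-laden illustration (the field names and `GENESIS` value are invented here), and it only provides tamper evidence. Real non-repudiation also needs signatures and storage the actor cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {"actor": entry["actor"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any earlier entry changes its hash, which no longer matches the `prev` value stored in the entry after it.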

2. Core Architecture Principles: Defense-in-Depth

Defense-in-Depth is often misunderstood as “buying more tools.” In architecture, it means Functional Redundancy of Controls.

Think of it as a series of checks where no single check is the “God Gate.”

Policy Layer: SCPs (Service Control Policies) that deny all actions in unused AWS regions.
Perimeter Layer: WAF rules blocking SQL injection at the edge.
Identity Layer: MFA required for every console and CLI session.
Network Layer: Security Groups and Micro-segmentation (Cilium/Istio).
Endpoint Layer: EDR (CrowdStrike/Tetragon) monitoring for anomalous process execution.
Data Layer: Encryption with KMS keys that the application role must explicitly be granted access to.

Practitioner Depth: A classic failure is relying on a VPN for access control. If the VPN is breached, the “Depth” is revealed to be zero. A true Defense-in-Depth architecture assumes the VPN is breached and relies on the subsequent layers (Identity and Data encryption) to stop the attacker.

3. Dismantling the Castle-and-Moat (Zero Trust)

Figure: The architectural shift from perimeter to identity. Left: castle-and-moat, where one firewall decision grants access to the whole trusted interior. Right: zero trust, where every request is verified against identity, policy, and context before reaching an isolated resource.

Zero Trust is the architectural implementation of the principle: “Never Trust, Always Verify.”

The Three Pillars of ZTA (condensed from the seven tenets of NIST SP 800-207)

Continuous Verification: You don’t just verify at login. You verify every single request.
Limit Blast Radius (Micro-segmentation): If a web server is compromised, it should have no network path to the database except on the specific port required for the application.
Automate Context-Aware Response: If a user logs in from a new country and immediately tries to delete an S3 bucket, the architecture should automatically step up to MFA or revoke the session.
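
The context-aware response in the last pillar can be expressed as a small decision function. This is a sketch under stated assumptions: the action names, the 15-minute MFA threshold, and the `RequestContext` fields are all illustrative, not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    user: str
    country: str
    known_countries: frozenset
    action: str
    mfa_age_seconds: int


DESTRUCTIVE_ACTIONS = {"s3:DeleteBucket", "iam:DeleteRole"}  # illustrative set
MAX_MFA_AGE_SECONDS = 15 * 60  # assumed freshness threshold


def decide(ctx: RequestContext) -> str:
    """Return 'allow', 'step_up_mfa', or 'revoke_session'."""
    new_location = ctx.country not in ctx.known_countries
    destructive = ctx.action in DESTRUCTIVE_ACTIONS
    if new_location and destructive:
        return "revoke_session"
    if new_location or (destructive and ctx.mfa_age_seconds > MAX_MFA_AGE_SECONDS):
        return "step_up_mfa"
    return "allow"
```

The point of the design is that the response is chosen by policy at request time, with no human in the loop.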

Zero Trust for IAM: We covered this extensively in IAM Episode 12. In architecture, this means moving the “Trust Boundary” from the edge of the VPC to the edge of the individual service or container.

4. Integration with the Software Lifecycle (SDLC)

Security architecture that exists only on a whiteboard is a liability. It must be integrated into the product management and development workflow.

The “Shift Left” Myth

Many teams talk about “shifting left” (moving security earlier in the cycle) but only implement it as a “pre-commit hook” or a “CI scan.”

True Shift Left is Architectural:
– Module 2 of this series covers Threat Modeling. This happens during the Design phase, before code exists.
– Module 3 covers Hardening. This happens during the Infrastructure-as-Code phase.

If you are catching architectural flaws during a “Penetration Test” (Shift Right), you have already failed Module 1.

Quick Check: Is Your Architecture “Leaky”?

Run these three checks on your environment to see if you are still relying on implicit trust:

# 1. Check for wide-open S3 buckets (Network-level trust check)
aws s3api get-public-access-block --bucket <your-bucket>
# Success: BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, and RestrictPublicBuckets should all be true.

# 2. Check if your nodes can reach the IMDSv1 endpoint (Metadata spoofing check)
# Run this from INSIDE a pod:
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Success: Should return a 401 (no session token) or time out (hop limit of 1) if IMDSv2 is enforced (Module 3).

# 3. Check for "God Roles" in your K8s cluster
kubectl get clusterrolebindings -o json | jq '.items[] | select(.roleRef.name=="cluster-admin")'
# Success: Only your cluster management tool (e.g., ArgoCD) should be listed.

Production Gotchas

Latency vs. Security: Deep Packet Inspection (DPI) in a WAF or a Service Mesh (Istio) adds latency. You must architect for this by using Fast-Path hooks like XDP (covered in eBPF Episode 07) where possible.
The “Admin” Trap: Many breaches don’t involve a complex exploit; they happen because an administrator turned off MFA to “debug” a problem and never turned it back on. Architecture must enforce Non-Bypassable Controls.
Audit Logs are a DDoS Vector: If you log every packet at the kernel level without sampling, you will crash your logging pipeline before the attacker even finishes their scan.

Framework Alignment

Framework	Control / ID	Architectural Mapping
NIST CSF 2.0	GV.PO-01	Establish cybersecurity policy integrated with organizational SDLC.
NIST SP 800-207	Zero Trust	No implicit trust; identity-based access; continuous verification.
ISO 27001:2022	5.15	Access control must be based on business and security requirements.
SOC 2	CC6.1	Logical access controls must restrict access to authorized users/processes.

Key Takeaways

The Perimeter is a myth: Assume the attacker is already in your network.
Extended CIA: Authenticity and Non-Repudiation are the modern requirements for identity-based architecture.
Defense-in-Depth: Functional redundancy means no single control failure leads to a total breach.
Zero Trust: Move the trust boundary to the resource level, not the network level.

What’s Next

Foundational models are the “Why.” Module 2 covers the “How”—specifically, how to systematically identify threats using the STRIDE framework and calculate risk using DREAD.

Threat modeling is the single most important skill for a Security Architect. It’s how you stop vulnerabilities before they are even typed into an IDE.

Next: Module 2: Proactive Design — Threat Modeling with STRIDE


Zero Trust Identity: SPIFFE, SPIRE, mTLS, and Continuous Verification

May 10, 2026 · May 9, 2026 by Vamshi Krishna Santhapuri

Reading Time: 7 minutes

The Identity Stack, Episode 13

TL;DR

Zero Trust means “never trust, always verify” — identity is verified continuously, not just at login time; network location provides no implicit trust
Human identity (users) and workload identity (services, pods, jobs) are separate problems — LDAP/Kerberos/OIDC solve the human side; SPIFFE/SPIRE solve the workload side
SPIFFE (Secure Production Identity Framework For Everyone) defines a standard for workload identity — a SPIFFE ID is a URI like spiffe://corp.com/ns/prod/sa/payments-svc
SPIRE (SPIFFE Runtime Environment) issues short-lived X.509 SVIDs (SPIFFE Verifiable Identity Documents) to workloads: certificates that rotate automatically, with a one-hour lifetime by default
mTLS (mutual TLS) is how workloads prove identity to each other — both sides present certificates; no passwords, no API keys
The evolution: /etc/passwd (1970) → NIS → LDAP → Kerberos → SAML → OIDC → SPIFFE/SPIRE — the problem has always been the same; the trust boundary keeps moving outward

The Big Picture: From /etc/passwd to Zero Trust

1970s  /etc/passwd              ← trust: the local machine
       One machine, one user list

1984   NIS / Yellow Pages       ← trust: the local network
       Centralized, but cleartext, flat

1988   Kerberos                 ← trust: the KDC
       Tickets instead of passwords, network-wide

1993   LDAP                     ← trust: the directory server
       Hierarchical, scalable, encrypted (eventually)

2002   SAML                     ← trust: the IdP assertion
       Identity crosses the internet

2014   OIDC / OAuth2            ← trust: the JWT signature
       API-native, mobile-native, developer-native

2017   SPIFFE / SPIRE           ← trust: the workload certificate
       Automated identity for services, not humans

2026   Zero Trust               ← trust: nothing, verify everything
       Continuous verification, short-lived credentials,
       device posture, behavioral signals

EP01 of this series started with the chaos of per-machine /etc/passwd. This episode — EP13 — closes the loop: from that chaos to a model where identity is verified continuously, credentials expire in hours not years, and the network provides no implicit trust.

The Assumption That Zero Trust Rejects

Traditional security assumed: if you’re on the internal network, you’re trusted. A VPN user was treated as equivalent to someone at a desk in the office. A service running on the same Kubernetes node as another service was implicitly trusted.

That assumption broke in practice:

Compromised VPN credentials gave attackers full internal access
Lateral movement after initial compromise was easy — once inside, everything trusted you
Perimeter-based security had no visibility into east-west traffic (service-to-service)

Zero Trust inverts the model: the network provides no trust. Every access request is verified — user or service, internal or external, first request or hundredth. Trust is dynamic, contextual, and short-lived.

Human Zero Trust: Continuous Verification

For human users, Zero Trust extends OIDC and Conditional Access:

Short-lived tokens. Access tokens typically expire in about an hour (a common IdP default; the OIDC specification does not mandate a lifetime). Refresh tokens are revocable. A user who is terminated can have their refresh tokens revoked in Entra ID; the next time their app tries to use the refresh token, it fails. The maximum blast radius of a stolen token is bounded by its lifetime.

Device posture. The device the user authenticates from is part of the identity assertion. Conditional Access can require: device is managed (Intune-enrolled), device is compliant (no malware, full-disk encryption enabled, OS patched). A valid user credential from an unmanaged device is denied.

Behavioral signals. Entra ID Identity Protection and similar systems analyze login patterns — unusual location, impossible travel (login from Mumbai, then New York 5 minutes later), unfamiliar device. High-risk sign-ins trigger step-up authentication or are blocked automatically.
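
The impossible-travel signal reduces to distance divided by elapsed time. The sketch below is a simplified illustration: the speed ceiling is an assumed constant, and production systems such as Entra ID Identity Protection weigh many more signals (IP reputation, VPN egress points, device history) than this check does.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 1000.0  # assumed ceiling, roughly airliner speed


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev: tuple, curr: tuple) -> bool:
    """prev and curr are (lat, lon, unix_seconds) for two consecutive logins."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        return distance > 0
    return distance / hours > MAX_PLAUSIBLE_SPEED_KMH


mumbai = (19.08, 72.88, 0)
new_york = (40.71, -74.01, 5 * 60)  # five minutes later
print(is_impossible_travel(mumbai, new_york))
```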

Privileged Access Management (PAM). For privileged operations (production shell access, AD admin), Zero Trust adds time-bounded just-in-time access:

Request:  "I need admin access to db01.corp.com for 2 hours to investigate an incident"
Approval: Manager approves via Slack/email/ticketing system
Grant:    Temporary role assignment or password checkout from the PAM vault
Access:   User SSHes with a one-time or time-limited credential
Expire:   Credential automatically revoked after 2 hours
Audit:    Full session recording available for review

CyberArk, BeyondTrust, and HashiCorp Vault implement this model. Vault’s SSH Secrets Engine issues short-lived SSH certificates:

# Request a signed SSH certificate (valid 30 minutes)
vault ssh \
  -role=prod-admin \
  -mode=ca \
  -mount-point=ssh-client-signer \
  <user>@db01.corp.com

# Vault issues a certificate signed by the server's trusted CA
# sshd on db01 trusts that CA — no authorized_keys needed
# Certificate expires in 30 minutes — no cleanup required

Workload Identity: The Non-Human Problem

Services don’t have passwords they can type. A microservice calling another microservice needs to prove its identity — but you can’t give a Kubernetes pod a static API key (it’ll be in a config file, in a git repo, or in a crash dump within 6 months).

Workload identity solves this with short-lived, automatically rotated certificates — the service’s identity is its certificate, issued by a trusted CA, expiring in minutes to hours.

Traditional:                     Zero Trust:
  payments-svc → orders-svc        payments-svc → orders-svc
  Authentication: API key           Authentication: mTLS (X.509 cert)
  "Bearer sk_live_abc123"           cert: spiffe://corp.com/ns/prod/sa/payments-svc
  Rotation: manual (rarely done)    Rotation: automatic, every hour
  Revocation: change the key        Revocation: cert expires; new cert issued
  Audit: "API key was used"         Audit: "spiffe://payments-svc → spiffe://orders-svc"

SPIFFE: The Standard

SPIFFE (Secure Production Identity Framework For Everyone) defines what a workload identity looks like. The core concept is the SPIFFE ID — a URI in the format:

spiffe://<trust-domain>/<workload-path>

Examples:
  spiffe://corp.com/ns/prod/sa/payments-svc
  spiffe://corp.com/region/us-east/service/auth-api
  spiffe://corp.com/k8s/cluster-prod/namespace/payments/pod/payments-svc-abc123

The trust domain (corp.com) is the organizational boundary. The path is the workload identifier — typically encoding namespace, service account, or cluster information.

A SPIFFE ID is embedded in an SVID (SPIFFE Verifiable Identity Document) — either an X.509 certificate (X.509-SVID) or a JWT (JWT-SVID). The X.509-SVID is the standard form: the SPIFFE ID appears in the certificate’s Subject Alternative Name (SAN) field.

X.509 Certificate (SVID):
  Subject: CN=payments-svc
  SAN: URI=spiffe://corp.com/ns/prod/sa/payments-svc
  Validity: 1 hour
  Issuer: SPIRE Intermediate CA
  Verified against: corp.com trust bundle

Any service that has the corp.com trust bundle (the CA certificate chain) can verify that a certificate with spiffe://corp.com/... in the SAN was issued by the authorized CA for that trust domain.
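
Before trusting a SPIFFE ID taken from a certificate, a service usually parses it and checks the trust domain. The helper below is a minimal sketch that enforces only a subset of the SPIFFE ID rules (scheme, no userinfo or port, no query or fragment); the function names are hypothetical, and a real service would normally rely on a SPIFFE library.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple:
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be spiffe")
    if not parts.netloc or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("trust domain must be a bare name without userinfo or port")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    return parts.netloc, parts.path


def in_trust_domain(spiffe_id: str, trust_domain: str) -> bool:
    """True only for a well-formed SPIFFE ID in the expected trust domain."""
    try:
        domain, _ = parse_spiffe_id(spiffe_id)
    except ValueError:
        return False
    return domain == trust_domain


print(parse_spiffe_id("spiffe://corp.com/ns/prod/sa/payments-svc"))
```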

SPIRE: The Runtime

SPIRE (SPIFFE Runtime Environment) is the reference implementation that issues SVIDs to workloads.

SPIRE Server
  └── Node attestation: verifies the identity of the node/VM
      (AWS instance identity document, GCP service account, k8s node SA)
         │
         │ issues X.509 SVIDs (short-lived, auto-rotated)
         ▼
SPIRE Agent (runs on every node)
  └── Workload attestation: verifies the identity of the calling process
      (Kubernetes SA, Unix UID/GID, Docker container labels)
         │
         │ SPIFFE Workload API (Unix socket)
         ▼
Workload (your service)
  → gets its own certificate
  → gets the trust bundle (CA certs of trusted domains)
  → uses cert for mTLS with other services
The workload fetches its identity via the Workload API socket, so no long-lived secrets sit in environment variables or mounted files. The SPIRE Agent pushes new certificates before the old ones expire. Rotation is transparent to the workload.

# On a node with SPIRE Agent running:
# Fetch the SVID for the current workload
spire-agent api fetch x509 \
  -socketPath /run/spire/sockets/agent.sock

# Output shows:
# SPIFFE ID: spiffe://corp.com/ns/prod/sa/payments-svc
# Certificate: (PEM)
# Trust bundle: (PEM of issuing CA chain)
# Expires: 2026-04-27T02:00:00Z (1 hour from now)

mTLS: Both Sides Show ID

Mutual TLS (mTLS) is what makes SPIFFE useful operationally. In standard TLS, only the server presents a certificate — the client just verifies it. In mTLS, both sides present certificates. Both sides verify the other’s certificate against the trust bundle.

payments-svc → orders-svc connection:

TLS handshake:
  payments-svc presents: spiffe://corp.com/ns/prod/sa/payments-svc cert
  orders-svc presents:   spiffe://corp.com/ns/prod/sa/orders-svc cert

  Both verify:
    • cert signed by trusted CA (the corp.com SPIRE CA)
    • cert not expired
    • SPIFFE ID in SAN matches what's expected

  After handshake: encrypted channel, both sides verified
  Authorization: orders-svc checks its policy:
    "is spiffe://corp.com/ns/prod/sa/payments-svc allowed to call /api/orders?"

Service meshes (Istio, Linkerd, Consul Connect) implement mTLS transparently — the application doesn’t handle certificates; the sidecar proxy does. In Istio’s case, Citadel (now istiod) acts as the SPIFFE-compatible CA, issuing certificates to envoy sidecars. The application code doesn’t change.
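
For a service that runs without a mesh, the server side of mTLS can be sketched with Python's standard `ssl` module. This is an illustration only: the file paths are left as optional parameters because certificate delivery (SPIRE Workload API, files, or something else) is an assumption outside this snippet, and the helper names are hypothetical.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, trust_bundle=None):
    """Server-side TLS context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # requiring a client cert makes the TLS mutual
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if trust_bundle:
        ctx.load_verify_locations(cafile=trust_bundle)
    return ctx


def spiffe_id_from_peer_cert(peer_cert: dict):
    """Return the first spiffe:// URI SAN from SSLSocket.getpeercert() output."""
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None
```

After the handshake, the value returned by `spiffe_id_from_peer_cert` is what gets passed to the authorization check; the handshake alone proves identity, not permission.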

Open Policy Agent: Authorization After Identity

Zero Trust separates identity from authorization. Once you know who the caller is (SPIFFE ID, OIDC token, user cert), a policy engine decides what they can do.

OPA (Open Policy Agent) is a widely used engine for this:

# opa-policy.rego
package authz

# rego.v1 syntax (the "if" keyword) works on OPA 0.59+ and is the default in OPA 1.x
import rego.v1

# Deny by default; only payments-svc may read (GET) orders
default allow := false

allow if {
  input.caller == "spiffe://corp.com/ns/prod/sa/payments-svc"
  input.method == "GET"
  startswith(input.path, "/api/orders")
}

The service checks OPA on each request: “caller=X wants to do Y to Z — allowed?” OPA evaluates the policy and returns a decision. The policy is version-controlled, tested, and deployed independently of the service.
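
The per-request check can be sketched as a call to OPA's Data API. The snippet below builds the request and parses the response without sending anything; the URL assumes an OPA instance on its default port on localhost with the `authz` package loaded, and the helper names are hypothetical.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181/v1/data/authz/allow"  # assumed local OPA address


def build_opa_request(caller: str, method: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a Data API query with the attributes under 'input'."""
    body = json.dumps({"input": {"caller": caller, "method": method, "path": path}})
    return urllib.request.Request(
        OPA_URL,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(response_body: bytes) -> bool:
    """OPA omits 'result' when the rule is undefined, so a missing key means deny."""
    return json.loads(response_body).get("result") is True
```

In a running service the request would be sent with `urllib.request.urlopen(req)` and the response bytes passed to `is_allowed`; treating anything other than an explicit `true` as a denial keeps the check fail-closed.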

⚠ Common Misconceptions

“Zero Trust means no trust.” Zero Trust means trust is earned dynamically through verification, not granted by network location. A verified user with a valid, compliant device and MFA is trusted — for the scope and duration of the verified session. The “zero” refers to implicit trust, not trust itself.

“SPIFFE replaces OIDC.” SPIFFE is for workload (service) identity. OIDC is for human (user) identity. They complement each other — a service has a SPIFFE identity; a user has an OIDC identity; the authorization layer accepts both.

“mTLS is complex to implement.” With a service mesh (Istio, Linkerd), mTLS is transparent — the sidecar handles it. Without a service mesh, the application needs to use the SPIFFE Workload API. The complexity is real but manageable, especially compared to the alternative of static API keys.

Framework Alignment

Domain	Relevance
CISSP Domain 5: Identity and Access Management	Zero Trust extends IAM to workloads (SPIFFE) and continuous verification (short-lived tokens, device posture) — it’s the current frontier of identity architecture
CISSP Domain 3: Security Architecture and Engineering	The separation of identity (SPIFFE ID), authentication (mTLS), and authorization (OPA) is a clean architectural decomposition that scales to complex multi-service environments
CISSP Domain 4: Communications and Network Security	mTLS encrypts and authenticates every service-to-service connection — it eliminates the assumption that east-west traffic on the internal network is safe
CISSP Domain 1: Security and Risk Management	Zero Trust is a risk management posture — it accepts that perimeter breach is inevitable and limits blast radius through continuous verification and least-privilege

Key Takeaways

Zero Trust rejects network-based implicit trust — every request is verified regardless of source
Human identity: short-lived OIDC tokens, device posture checks, Conditional Access, JIT privileged access (Vault, CyberArk)
Workload identity: SPIFFE IDs in X.509 certificates, issued by SPIRE, rotated automatically every hour — no static API keys
mTLS lets services verify each other’s identity at the TLS layer — service meshes (Istio, Linkerd) implement it transparently
OPA handles authorization after identity is established — who you are ≠ what you can do
The series arc: /etc/passwd → NIS → LDAP → Kerberos → SAML → OIDC → SPIFFE/SPIRE — the problem has always been “how do you know who someone is, at scale, without trusting the network?” The answer keeps getting better.

What does identity look like at your organization — still static API keys and shared service accounts, or moving toward SPIFFE and short-lived credentials? 👇

The Identity Stack: From LDAP to Zero Trust — 13 episodes complete.

Start from EP01: What Is LDAP →

Zero Trust Access in the Cloud: How the Evaluation Loop Actually Works

July 6, 2026 · April 20, 2026 by Vamshi Krishna Santhapuri

Reading Time: 10 minutes


TL;DR

Zero Trust: trust nothing implicitly, verify everything explicitly, minimize blast radius by assuming you will be breached
Network location is not identity — VPN is authentication for the tunnel, not authorization for the resource
JIT privilege elevation removes standing admin access: engineers request elevation for a specific purpose, scoped to a specific duration
Device posture is an access signal — a compromised endpoint with valid credentials is still a threat; Conditional Access gates on device compliance
Continuous session validation re-evaluates signals throughout the session: if a device falls out of compliance, its sessions are revoked within minutes, not at expiry
The highest-ROI early moves: eliminate machine static credentials, enforce MFA on all human access, federate to a single IdP

The Big Picture

  ZERO TRUST IAM — EVERY REQUEST EVALUATED INDEPENDENTLY

  API call arrives
         │
         ▼
  Identity verified? ──── No ────► DENY
         │
        Yes
         │
         ▼
  Device compliant? ───── No ────► DENY (or step-up MFA)
         │
        Yes
         │
         ▼
  Policy allows this  ─── No ────► DENY
  action on this ARN?
         │
        Yes
         │
         ▼
  Conditions met? ─────── No ────► DENY
  (time, IP, MFA age,              (e.g., outside business hours,
   risk score, session)             impossible travel detected)
         │
        Yes
         │
         ▼
       ALLOW ──────────────────────► LOG every decision (allow and deny)
         │
         └── Continuous re-evaluation:
             device state changes → revoke
             anomaly detected → revoke or step-up
             credential age → require re-auth
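
The evaluation order in the diagram can be sketched as a short chain of checks that logs every outcome. This is a toy model: each check is reduced to a boolean the caller supplies, and the class and field names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    identity_verified: bool
    device_compliant: bool
    policy_allows: bool
    conditions_met: bool


@dataclass
class Evaluator:
    decision_log: list = field(default_factory=list)

    def evaluate(self, req: AccessRequest) -> str:
        """Run the checks in order; the first failure denies the request."""
        checks = [
            ("identity", req.identity_verified),
            ("device", req.device_compliant),
            ("policy", req.policy_allows),
            ("conditions", req.conditions_met),
        ]
        for name, passed in checks:
            if not passed:
                return self._record("DENY", name)
        return self._record("ALLOW", None)

    def _record(self, decision: str, failed_check) -> str:
        # Both allows and denies are logged, matching the diagram.
        self.decision_log.append({"decision": decision, "failed_check": failed_check})
        return decision
```

Because nothing is cached between calls, a request that was allowed a minute ago is denied as soon as one of its inputs changes.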

Introduction

The perimeter model of network security made a bet: inside the network is trusted, outside is not. Lock down the perimeter tightly enough and you’re safe. VPN in, and you’re one of us.

I grew up professionally in that model. Firewalls, DMZs, trusted zones. The idea had intuitive appeal — you build walls, you control what crosses them. For a while it worked reasonably well.

Then I watched it fail, repeatedly, in ways that were predictable in hindsight. An engineer’s laptop gets compromised at a coffee shop. They VPN in. Now the attacker is “inside.” A contractor account gets phished. They have valid Active Directory credentials. They’re inside. A cloud service gets misconfigured and exposes a management interface. There’s no perimeter for that to be inside of.

The perimeter model failed not because the walls weren’t strong enough, but because the premise was wrong. There is no inside. There is no perimeter that reliably separates trusted from untrusted. In a world of remote work, cloud services, contractor access, and API integrations, the attack surface doesn’t respect network boundaries.

Zero Trust is the architecture built on a different premise: trust nothing implicitly. Verify everything explicitly. Minimize blast radius by assuming you will be breached.

This isn’t a product you buy. It’s a set of principles applied to how you design, build, and operate your IAM. This episode is how those principles translate to concrete practices — building on everything we’ve covered in this series.

The Three Principles

Verify Explicitly

Every request must carry verifiable identity and context. Network location is not identity.

Old model: request from 10.0.0.0/8 → trusted, proceed
Zero Trust: request from 10.0.0.0/8 → still must present verifiable identity
                                       still must pass authorization check
                                       still must pass context evaluation
                                       then proceed (or deny)

In cloud IAM terms: every API call carries identity claims (IAM role ARN, federated identity, managed identity), and those claims are verified against policy on every single request. There is no concept of “once authenticated, trusted until logout.” Cloud provider APIs already work this way natively: every call is authenticated and authorized independently. The challenge is extending the same model to internal services, internal APIs, and human access patterns.

Implementation in practice:
– mTLS for service-to-service communication — both sides present certificates; identity is the certificate, not the network path
– Bearer tokens on every internal API call — no session cookies, no “we’re on the same VPC so it’s fine”
– Short-lived credentials everywhere: a stolen credential should expire on its own within minutes, not “after the session times out in 8 hours”
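
The expiry mechanics behind short-lived credentials can be shown with a toy signed token. This is a sketch, not a recommendation to build your own token format: real deployments would use JWTs or X.509 SVIDs, and the pipe-delimited layout and function names here are assumptions made for illustration.

```python
import hashlib
import hmac
import time


def issue_token(secret: bytes, subject: str, ttl_seconds: int, now=None) -> str:
    """Return 'subject|expiry|signature' with an HMAC-SHA256 signature."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{subject}|{expires}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_token(secret: bytes, token: str, now=None):
    """Return the subject if the signature is valid and the token is unexpired, else None."""
    try:
        subject, expires, sig = token.rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:
        return None
    expected = hmac.new(secret, f"{subject}|{expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    current = now if now is not None else time.time()
    return subject if current < expires_at else None
```

The verifier needs no revocation list for the common case: once the expiry passes, a stolen token stops working without anyone noticing the theft.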

Use Least Privilege — Just-in-Time, Just-Enough

No standing access to sensitive resources. Access granted when needed, for the minimum scope, for the minimum duration.

Old model: alice is in the DBA group → permanent access to all databases
Zero Trust: alice requests access to production DB →
            verified: alice's device is enrolled in MDM and compliant
            verified: alice has an open change ticket for this task
            verified: current time is within business hours
            granted: connection to this specific database, from alice's specific IP
                     for 2 hours, then revoked automatically

This is JIT access. It reduces the window where a compromised credential can cause damage. It requires a change in how engineers think about access: access is not a property you have, it’s something you request when you need it. The operational friction is a feature, not a bug. Justifying each elevated access request is what keeps the access model honest.
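
A JIT grant is essentially an access record with a built-in end time. The sketch below assumes two preconditions from the example above (a compliant device and a change ticket) and a two-hour default; the class, function, and ticket format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JitGrant:
    user: str
    resource: str
    ticket: str
    granted_at: float
    duration_seconds: int

    def is_active(self, now: float) -> bool:
        """The grant is valid only inside its time window."""
        return self.granted_at <= now < self.granted_at + self.duration_seconds


def request_access(user, resource, ticket, device_compliant, now, duration_seconds=2 * 3600):
    """Return a JitGrant only if the preconditions hold; otherwise None."""
    if not device_compliant or not ticket:
        return None
    return JitGrant(user, resource, ticket, now, duration_seconds)
```

Nothing has to run at the two-hour mark to remove access; every use of the grant calls `is_active`, so expiry is the default outcome.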

Assume Breach

Design systems as if the attacker is already inside. This drives different decisions:

Micro-segmentation: one role per service, minimum permissions per role. If one service is compromised, it can’t pivot to everything else.
Log everything: every authorization decision, allow or deny. When you’re investigating an incident, you need to know what happened, not just that something happened.
Automate response: anomalous API call pattern → trigger automated credential revocation or session termination. Don’t wait for a human to notice.

Building Zero Trust IAM — Block by Block

Block 1: Strong Identity Foundation

You can’t verify explicitly without strong authentication. The starting point:

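# AWS: require MFA for all IAM operations; enforce via SCP across the org
# Caution: this denies every principal without MFA, including long-term access keys
# and workload roles (EC2, Lambda), so exempt those roles before rollout.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyWithoutMFA",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "BoolIfExists": {
          "aws:MultiFactorAuthPresent": "false"
        },
        "StringNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::*:role/aws-service-role/*",
            "arn:aws:iam::*:role/OrganizationAccountAccessRole"
          ]
        }
      }
    }
  ]
}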

# GCP: enforce OS Login for VM SSH (ties SSH access to Google identity, not SSH keys)
gcloud compute project-info add-metadata \
  --metadata enable-oslogin=TRUE

# This means: SSH to a VM requires your Google identity to have roles/compute.osLogin
# or roles/compute.osAdminLogin. No more managing ~/.authorized_keys files on instances.

For human access: hardware FIDO2 keys (YubiKey, Google Titan) rather than TOTP where possible. TOTP codes can be phished in real-time adversary-in-the-middle attacks. Hardware keys cannot — the cryptographic challenge-response is bound to the origin URL.
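
To make the origin binding concrete, here is a simplified sketch of one check a WebAuthn relying party performs. The origin `https://login.company.com` and the helper names are assumptions for illustration. A real verification also validates the authenticator's signature and the relying-party ID hash, and should go through a maintained WebAuthn library.

```python
import json

EXPECTED_ORIGIN = "https://login.company.com"  # hypothetical relying-party origin


def client_data_ok(client_data_json: bytes, expected_challenge: str) -> bool:
    """Check the fields of clientDataJSON that bind a response to this site."""
    data = json.loads(client_data_json)
    return (
        data.get("type") == "webauthn.get"
        and data.get("challenge") == expected_challenge
        and data.get("origin") == EXPECTED_ORIGIN
    )


def browser_client_data(origin: str, challenge: str) -> bytes:
    # The browser, not the user, records the origin of the page that asked.
    return json.dumps(
        {"type": "webauthn.get", "challenge": challenge, "origin": origin}
    ).encode()


legit = browser_client_data("https://login.company.com", "c2FtcGxl")
phish = browser_client_data("https://login.c0mpany.com", "c2FtcGxl")
print(client_data_ok(legit, "c2FtcGxl"))  # True
print(client_data_ok(phish, "c2FtcGxl"))  # False
```

A TOTP code has no such field: the six digits are the same whichever site the user types them into, which is what makes relaying them possible.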

Block 2: Device Posture as an Access Signal

In a Zero Trust model, the identity of the user is necessary but not sufficient. The state of the device matters too — a compromised endpoint with valid credentials is still a threat.

# Azure Conditional Access: block access from non-compliant devices
# (configured in the Entra ID Conditional Access portal)
conditions:
  clientAppTypes: [browser, mobileAppsAndDesktopClients]
  devices:
    deviceFilter:
      mode: exclude
      rule: "device.isCompliant -eq True and device.trustType -eq 'AzureAD'"
grantControls:
  builtInControls: [compliantDevice]

# AWS Verified Access: identity + device posture for application access, no VPN
aws ec2 create-verified-access-instance \
  --description "Zero Trust app access"

# Create the identity trust provider (Okta OIDC)
# (the OIDC options also take AuthorizationEndpoint, TokenEndpoint, UserInfoEndpoint)
aws ec2 create-verified-access-trust-provider \
  --trust-provider-type user \
  --user-trust-provider-type oidc \
  --policy-reference-name okta \
  --oidc-options Issuer=https://company.okta.com,ClientId=...,ClientSecret=...,Scope=openid

# Create the device trust provider (Jamf, CrowdStrike, or JumpCloud)
aws ec2 create-verified-access-trust-provider \
  --trust-provider-type device \
  --device-trust-provider-type jamf \
  --policy-reference-name jamf \
  --device-options TenantId=JAMF_TENANT_ID

# Then attach each trust provider to the instance with
# aws ec2 attach-verified-access-trust-provider

AWS Verified Access allows users to reach internal applications by verifying both their identity (via OIDC) and their device health (via MDM) — without a VPN. The access gateway evaluates both signals on every connection, not just at login.

Block 3: Just-in-Time Privilege Elevation

No standing elevated access. Engineers are eligible for elevated roles; they activate them when needed.

# Azure PIM: engineer activates an eligible privileged role
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/roleManagement/directory/roleAssignmentScheduleRequests" \
  --body '{
    "action": "selfActivate",
    "principalId": "USER_OBJECT_ID",
    "roleDefinitionId": "ROLE_DEF_ID",
    "directoryScopeId": "/",
    "justification": "Investigating security alert in tenant — incident ticket INC-2026-0411",
    "scheduleInfo": {
      "startDateTime": "2026-04-11T09:00:00Z",
      "expiration": {"type": "AfterDuration", "duration": "PT4H"}
    }
  }'
# Access activates, lasts 4 hours, then automatically removed

# AWS: temporary account assignment via Identity Center
# (typically triggered by ITSM workflow integration, not manual CLI)
aws sso-admin create-account-assignment \
  --instance-arn "arn:aws:sso:::instance/ssoins-xxx" \
  --target-id ACCOUNT_ID \
  --target-type AWS_ACCOUNT \
  --permission-set-arn "arn:aws:sso:::permissionSet/ssoins-xxx/ps-yyy" \
  --principal-type USER \
  --principal-id USER_ID

# Schedule deletion (using EventBridge + Lambda in a real deployment)
aws sso-admin delete-account-assignment \
  --instance-arn "arn:aws:sso:::instance/ssoins-xxx" \
  --target-id ACCOUNT_ID \
  --target-type AWS_ACCOUNT \
  --permission-set-arn "arn:aws:sso:::permissionSet/ssoins-xxx/ps-yyy" \
  --principal-type USER \
  --principal-id USER_ID

The operational change this requires: engineers stop thinking of access as something they hold permanently and start thinking of it as something they request for a specific purpose.

This feels like friction until you’re investigating an incident and you have a precise record of who activated what elevated access and why.

Block 4: Continuous Session Validation

Traditional auth: verify once at login, trust the session until timeout.
Zero Trust auth: re-evaluate access signals continuously throughout the session.

Session starts: identity verified + device compliant + IP in expected range
                → access granted

15 minutes later: impossible travel detected (IP changes to different country)
                  → step-up authentication required, or session terminated

Later: device compliance state changes (EDR detects malware)
       → all active sessions for this device revoked immediately

This requires integration between your identity platform and your device management / EDR tooling. Entra ID Conditional Access with Continuous Access Evaluation (CAE) implements this natively. When certain events occur — device compliance change, IP anomaly, token revocation — access tokens are invalidated within minutes rather than waiting for natural expiry.

# GCP: bind IAM access to an Access Context Manager access level
# Access level enforces device compliance: if the device falls out of compliance,
# the access level is no longer satisfied and requests fail immediately
# (access-level conditions are not supported everywhere; check the IAM Conditions
# attribute reference for the resource type you are protecting)
gcloud projects add-iam-policy-binding my-project \
  --member="user:USER_EMAIL" \
  --role="roles/bigquery.admin" \
  --condition="expression='accessPolicies/POLICY_NUM/accessLevels/corporate_compliant_device' in request.auth.access_levels,title=Compliant device required"
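
The session timeline above can be sketched as a function that runs on every request instead of once at login. The `Signals` fields and the three outcomes are illustrative assumptions, not the API of any identity platform.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    identity_verified: bool
    device_compliant: bool
    country: str


def evaluate(session_country: str, current: Signals) -> str:
    """Return 'allow', 'step_up', or 'revoke' for one request in a live session."""
    if not current.identity_verified or not current.device_compliant:
        return "revoke"
    if current.country != session_country:
        # Location changed mid-session: ask for stronger proof before continuing.
        return "step_up"
    return "allow"


print(evaluate("DE", Signals(True, True, "DE")))   # allow
print(evaluate("DE", Signals(True, True, "BR")))   # step_up
print(evaluate("DE", Signals(True, False, "DE")))  # revoke
```

The design choice worth noticing is that the device signal outranks everything else: a non-compliant device ends the session even when identity and location still look normal.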

Block 5: Micro-Segmented Permissions

Every service has its own identity. Every identity has only what it needs. Compromise of one service cannot propagate to others.

# Terraform: IAM as code — each service gets a dedicated, scoped role
resource "aws_iam_role" "order_processor" {
  name                 = "svc-order-processor"
  permissions_boundary = aws_iam_policy.service_boundary.arn

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "order_processor" {
  name   = "order-processor-policy"
  role   = aws_iam_role.order_processor.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"]
        Resource = aws_sqs_queue.orders.arn
      },
      {
        Effect   = "Allow"
        Action   = ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:UpdateItem"]
        Resource = aws_dynamodb_table.orders.arn
      }
    ]
  })
}

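# Open Policy Agent: enforce IAM standards at the policy level
# Run this in CI/CD and fail the build if any policy statement has wildcard actions
# (Rego v0 syntax; on OPA 1.0+ write the rules as `deny contains msg if { ... }`)
package iam.policy

# Action may be a single string or a list; normalize it to a list
actions(stmt) = a {
  is_array(stmt.Action)
  a := stmt.Action
}

actions(stmt) = [stmt.Action] {
  is_string(stmt.Action)
}

deny[msg] {
  stmt := input.Statement[i]
  stmt.Effect == "Allow"
  action := actions(stmt)[_]
  action == "*"
  msg := sprintf("Statement %d has wildcard Action: not allowed", [i])
}

deny[msg] {
  stmt := input.Statement[i]
  stmt.Effect == "Allow"
  stmt.Resource == "*"
  action := actions(stmt)[_]
  # IAM delete actions look like "s3:DeleteObject", so match on ":Delete"
  contains(action, ":Delete")
  msg := sprintf("Statement %d allows Delete on all resources: requires specific ARN", [i])
}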

Block 6: Universal Audit Trail

Zero Trust without logging is just obscurity. Every authorization decision — allow and deny — must be logged, retained, and queryable.

# AWS: verify CloudTrail is comprehensive
aws cloudtrail get-trail-status --name management-trail
# Must have: IsLogging=true

aws cloudtrail describe-trails --trail-name-list management-trail
# Must have: IsMultiRegionTrail=true, IncludeGlobalServiceEvents=true

# Verify no management events are excluded
aws cloudtrail get-event-selectors --trail-name management-trail \
  | jq '.EventSelectors[] | {ReadWrite: .ReadWriteType, Mgmt: .IncludeManagementEvents}'
# ReadWriteType should be "All"; IncludeManagementEvents should be true

# GCP: ensure Data Access audit logs are enabled for IAM
gcloud projects get-iam-policy my-project --format=json | jq '.auditConfigs'
# Should see auditLogConfigs for cloudresourcemanager.googleapis.com and iam.googleapis.com
# with both DATA_READ and DATA_WRITE enabled

# Azure: route Entra ID logs to Log Analytics for long-term retention and querying
az monitor diagnostic-settings create \
  --name entra-audit-to-la \
  --resource "/tenants/TENANT_ID/providers/microsoft.aad/domains/company.com" \
  --logs '[{"category":"AuditLogs","enabled":true},{"category":"SignInLogs","enabled":true}]' \
  --workspace /subscriptions/SUB_ID/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/security-logs

Framework Alignment

Zero Trust IAM isn’t a framework itself — it’s a design philosophy. But it maps cleanly onto the controls that compliance frameworks are pushing organizations toward:

Framework	Reference	What It Covers Here
CISSP	Domain 5 — IAM	Zero Trust reframes IAM as continuous, context-aware verification rather than perimeter-based trust
CISSP	Domain 1 — Security & Risk Management	Assume breach as a risk management posture; blast radius minimization through least privilege
CISSP	Domain 7 — Security Operations	Continuous monitoring, anomaly detection, and automated response are operational requirements of Zero Trust
ISO 27001:2022	5.15 Access control	Zero Trust access policy: verify explicitly, least privilege, assume breach
ISO 27001:2022	8.16 Monitoring activities	Continuous session validation and universal audit trail — all authorization decisions logged
ISO 27001:2022	8.20 Networks security	Micro-segmentation and mTLS replace implicit network trust with verified identity at every hop
ISO 27001:2022	5.23 Information security for cloud services	Zero Trust architecture applied to cloud IAM across AWS, GCP, and Azure
SOC 2	CC6.1	Zero Trust logical access controls — JIT, device posture, context-aware authorization
SOC 2	CC6.7	Continuous session validation and transmission controls across all system components
SOC 2	CC7.1	Threat detection through universal audit trails and anomaly-triggered automated response
SOC 2	CC7.2	Incident response — automated revocation and session termination on anomaly detection

Zero Trust Maturity — Where to Start

In practice, most organizations think about Zero Trust as a destination — a large, multi-year program. The reality is it’s a direction. Any movement in that direction reduces risk.

Level	Where You Are	What to Build Next
1 — Initial	Some MFA; static credentials for machines; no centralized IdP	Eliminate machine static keys → workload identity
2 — Managed	Centralized IdP; SSO for most systems; some MFA enforcement	Close SSO gaps; enforce MFA everywhere; federate to cloud
3 — Defined	Least privilege being enforced; audit tooling in use; JIT for some privileged access	Expand JIT; policy-as-code in CI/CD; quarterly access reviews
4 — Contextual	Device posture in access decisions; conditional access policies	Continuous session evaluation; automated anomaly response
5 — Optimizing	Policy-as-code everywhere; automated right-sizing; anomaly-triggered revocation	Refine and maintain — Zero Trust is never “done”

The jump from Level 1 to Level 3 delivers the most security value per unit of effort. Start there. Don’t defer least privilege enforcement while you build a sophisticated device posture integration.

The Practical Sequence

If you’re building Zero Trust IAM from where most organizations are, this is the order that maximizes early security value:

Inventory all identities — human and machine. You cannot secure what you can’t see. Build a complete picture before changing anything.
Eliminate static credentials for machines — replace access keys and SA key files with workload identity. This is the highest-ROI change in most environments.
Enforce MFA for all human access — especially cloud consoles, IdP admin, and VPN. Hardware keys for privileged accounts.
Federate human identity — single IdP, SSO to cloud and major applications. Centralize the revocation path.
Right-size IAM permissions — use last-accessed data and IAM Recommender to find and remove unused permissions. This is a continuous discipline, not a one-time clean-up.
JIT for privileged access — Azure PIM, AWS Identity Center assignment automation, or equivalent for all elevated roles. No standing admin.
IAM as code — all IAM changes via Terraform/Pulumi/CDK, reviewed in pull requests, validated by Access Analyzer or OPA in CI/CD, applied through automation.
Continuous monitoring — alerts on IAM mutations, anomalous API call patterns, new cross-account trust relationships, new public resource exposures.
Add context signals — Conditional Access policies incorporating device posture. Access Context Manager in GCP. AWS Verified Access for application access.
Automated response — anomaly detected → automatic credential suspension or session termination. Close the window between detection and containment.

Core Curriculum Complete

These twelve episodes covered Cloud IAM from the question “what even is IAM?” to Zero Trust architecture:

Episode	Topic	The Core Lesson
EP01	What is IAM?	Access management is deny-by-default; every grant is an explicit decision
EP02	AuthN vs AuthZ	Two separate gates; passing one doesn’t open the other
EP03	Roles, Policies, Permissions	Structure prevents drift; wildcards accumulate into exposure
EP04	AWS IAM Deep Dive	Trust policies and permission policies are both required; the evaluation chain has six layers
EP05	GCP IAM Deep Dive	Hierarchy inheritance is a feature that needs careful handling; service account keys are an antipattern
EP06	Azure RBAC and Entra ID	Two separate authorization planes; managed identities are the right model for workloads
EP07	Workload Identity	Static credentials for machines are solvable at the root; OIDC token exchange replaces them
EP08	IAM Attack Paths	The attack chain runs through IAM; `iam:PassRole` and its equivalents are privilege escalation primitives
EP09	Least Privilege Auditing	5% utilization is the average; the 95% excess is attack surface — and it’s measurable
EP10	Federation, OIDC, SAML	The IdP is the trust anchor; everything downstream is bounded by its security
EP11	Kubernetes RBAC	Two separate IAM layers; both must be secured; `cluster-admin` is the first thing to audit
EP12	Zero Trust IAM	Trust nothing implicitly; verify everything explicitly; minimize blast radius through least privilege at every layer

IAM is not a feature you configure. It’s a practice you maintain. The organizations that operate with genuinely low cloud IAM risk don’t have fewer identities — they have better visibility into what those identities can do, and why, and what happened when something went wrong.

That’s the foundation this series has been building toward — but the curriculum doesn’t stop here. AWS, GCP, and Azure ship new services constantly, and every new service ships new IAM permissions. Scoping a policy correctly on day one is a different skill than auditing it after the fact — that’s where this series goes next.

The full series is at linuxcent.com/cloud-iam-series. Subscribe to get new Cloud IAM episodes as new cloud permissions land — plus the eBPF series running in parallel, covering what’s actually running in kernel space when Cilium, Falco, and Tetragon do their work.


SAML vs OIDC: Which Federation Protocol Belongs in Your Cloud?

May 10, 2026 / April 20, 2026 by Vamshi Krishna Santhapuri

Reading Time: 10 minutes

TL;DR

Federation means downstream systems trust the IdP’s signed assertion — they never see credentials and don’t manage them independently
SAML is XML-based, browser-oriented, the enterprise standard; OIDC is JWT-based, API-native, the modern protocol for workload identity and consumer SSO
In OIDC trust policies, the sub condition is the security boundary — omitting it means any GitHub Actions workflow in any repository can assume your role
Validate all JWT claims: signature, iss, aud, exp, sub — libraries do this, but need correct configuration (especially aud)
The IdP is the trust anchor: compromise the IdP and every downstream system is compromised. Treat IdP admin access with the same controls as your most sensitive system.
JIT provisioning and Conditional Access extend federation from “who are you” to “are you in an appropriate context right now”

The Big Picture

  FEDERATION: HOW TRUST FLOWS FROM IdP TO DOWNSTREAM SYSTEMS

  Identity Provider  (Okta / Entra ID / Google / AD FS)
  ┌──────────────────────────────────────────────────────────────────┐
  │  User or workload authenticates → IdP issues signed assertion   │
  │                                                                  │
  │  ┌──────────────────────────┐  ┌───────────────────────────┐   │
  │  │  SAML Assertion (XML)    │  │  OIDC ID Token (JWT)       │   │
  │  │  RSA-signed, 5–10 min    │  │  RS256-signed, ~1 hr      │   │
  │  │  Audience: SP entity ID  │  │  aud: client ID           │   │
  │  │  Subject: user identity  │  │  sub: specific workload   │   │
  │  └───────────┬──────────────┘  └──────────┬────────────────┘   │
  └─────────────────────────────────────────────────────────────────┘
                 │  human SSO                  │  workload identity
                 ▼                             ▼
  ┌─────────────────────────┐  ┌───────────────────────────────────┐
  │ SP validates signature  │  │ AWS STS / GCP STS validates       │
  │ + audience + timestamp  │  │ signature + iss + aud + sub       │
  │ → console session       │  │ → AssumeRoleWithWebIdentity       │
  └─────────────────────────┘  └───────────────────────────────────┘

  Security bound: IdP security bounds every system that trusts it
  Disable in Okta → access revoked everywhere that trusts Okta

Introduction

Before federation existed, every system had its own user database. Your Jira account. Your AWS account. Your Salesforce account. Your internal wiki. Each one had its own password, its own MFA, its own offboarding process. When an engineer joined, someone had to create accounts in every system. When they left, you hoped whoever processed the offboarding remembered to deactivate all of them.

I’ve done that audit — the one where you’re trying to figure out if a former employee still has access to anything. You go system by system, cross-reference against HR records, find accounts that exist in places you’ve forgotten the company even uses. In one environment I found an ex-engineer’s account still active in a vendor portal six months after they left, because that system was set up by someone who had since also left the company, and nobody had documented it.

Federation solves this structurally. One identity provider. One place to authenticate. One place to revoke. Every downstream system trusts the IdP’s assertion rather than managing credentials independently. Disable someone in Okta and they lose access everywhere that trusts Okta — immediately, without a checklist.

This episode covers how federation actually works at the protocol level, because understanding the mechanism is what lets you design it securely. A federation setup whose trust policy accepts assertions from any OIDC issuer is worse than no federation: it gives a false sense of security.

The Federation Model

Identity Provider (IdP)          Service Provider (SP) / Relying Party
  (Okta, Google, AD FS, Entra ID)       (AWS, Salesforce, GitHub, your app)
         │                                          │
         │  1. User authenticates to IdP             │
         │     (password + MFA)                      │
         │                                          │
         │  2. IdP generates a signed assertion      │
         │     (SAML response or OIDC ID Token)      │
         │ ──────────────────────────────────────── ▶│
         │                                          │
         │  3. SP validates the signature            │
         │     (using IdP's public certificate       │
         │      or JWKS endpoint)                    │
         │  4. SP maps identity to local permissions │
         │  5. SP grants access                      │

The SP never sees the user’s password. It never has one. It trusts the IdP’s cryptographic signature — if the assertion is signed with the IdP’s private key, and the SP trusts that key, the identity is accepted.

This trust chain has one critical property: the security of every SP is bounded by the security of the IdP. Compromise the IdP, and every system that trusts it is compromised. This is why IdP security deserves the same attention as the most sensitive system it gates access to.

SAML 2.0 — The Enterprise Standard

SAML (Security Assertion Markup Language) is XML-based, verbose, and battle-tested. SAML 2.0 was published as an OASIS standard in 2005, and it’s the protocol behind most enterprise SSO deployments. When your company says “use your corporate login for this vendor app,” SAML is usually the mechanism.

1. User visits AWS console (the Service Provider)
2. AWS checks: no active session → redirect to IdP
   → https://company.okta.com/saml?SAMLRequest=...
3. Okta authenticates the user (password, MFA)
4. Okta generates a SAML Assertion — a signed XML document containing:
   - Who the user is (Subject, typically email)
   - Their attributes (group memberships, custom attributes)
   - When the assertion was issued and when it expires (valid 5-10 minutes typically)
   - Which SP this is for (Audience restriction)
   - Okta's digital signature (RSA-SHA256 or similar)
5. Browser POSTs the assertion to AWS's ACS (Assertion Consumer Service) URL
6. AWS validates the signature against Okta's public cert (retrieved from Okta's metadata URL)
7. AWS reads the SAML attribute for the IAM role
8. AWS calls sts:AssumeRoleWithSAML → issues temporary credentials
9. User gets a console session — no AWS credentials were ever stored anywhere

What a SAML Assertion Actually Looks Like

<saml:Assertion>
  <saml:Issuer>https://okta.company.com</saml:Issuer>

  <saml:Subject>
    <saml:NameID>USER_EMAIL</saml:NameID>
  </saml:Subject>

  <saml:AttributeStatement>
    <!-- This attribute tells AWS which IAM role to assume -->
    <saml:Attribute Name="https://aws.amazon.com/SAML/Attributes/Role">
      <saml:AttributeValue>
        arn:aws:iam::123456789012:role/EngineerRole,arn:aws:iam::123456789012:saml-provider/OktaProvider
      </saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>

  <!-- Critical: time bounds on this assertion -->
  <saml:Conditions NotBefore="2026-04-11T09:00:00Z" NotOnOrAfter="2026-04-11T09:05:00Z">
    <saml:AudienceRestriction>
      <!-- Critical: this assertion is ONLY valid for AWS -->
      <saml:Audience>https://signin.aws.amazon.com/saml</saml:Audience>
    </saml:AudienceRestriction>
  </saml:Conditions>

  <ds:Signature>... RSA-SHA256 signature over the above ...</ds:Signature>
</saml:Assertion>

The Audience restriction and the NotOnOrAfter timestamp are two of the most security-critical fields. The audience ensures this assertion can’t be reused for a different SP. The timestamp ensures it can’t be replayed after expiry.

Setting Up SAML Federation with AWS

# Register Okta as a SAML provider in AWS IAM
aws iam create-saml-provider \
  --saml-metadata-document file://okta-metadata.xml \
  --name OktaProvider

# Create the IAM role that federated users will assume
aws iam create-role \
  --role-name EngineerRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:saml-provider/OktaProvider"
      },
      "Action": "sts:AssumeRoleWithSAML",
      "Condition": {
        "StringEquals": {
          "SAML:aud": "https://signin.aws.amazon.com/saml"
        }
      }
    }]
  }'

# In Okta: configure the AWS Account Federation app (the IAM Identity Center app is a separate integration)
# Attribute mapping: https://aws.amazon.com/SAML/Attributes/Role
# Value: arn:aws:iam::123456789012:role/EngineerRole,arn:aws:iam::123456789012:saml-provider/OktaProvider

# Set maximum session duration (8 hours is reasonable for human access)
aws iam update-role \
  --role-name EngineerRole \
  --max-session-duration 28800

SAML Attack Surface

Attack	What It Does	Why It Works	Prevention
XML Signature Wrapping (XSW)	Attacker inserts a malicious assertion, wraps it around the legitimate signed one; some SPs validate the wrong element	SAML’s XML structure is complex; naive signature validation checks the signed element, not the element the SP reads	Use a vetted SAML library — never hand-roll parsing
Assertion replay	Steal a valid assertion (e.g., via network intercept) and replay it before `NotOnOrAfter`	If the SP doesn’t track used assertion IDs, the same assertion can be used multiple times	Short expiry; SP tracks seen assertion IDs
Audience bypass	SP doesn’t verify the `Audience` field	An assertion issued for SP A can be used at SP B	Always validate `Audience` matches your SP entity ID

XML Signature Wrapping is the most interesting attack historically: security researchers used it to show that SAML implementations in widely deployed frameworks and services could be bypassed before vendors patched their libraries. The lesson: SAML is complex enough that rolling your own parser is asking for a vulnerability.
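
The replay and audience rows of the table can be sketched as checks the SP runs after a vetted SAML library has verified the signature and extracted the fields. The function name, the in-memory `_seen_ids` store, and the assertion ID `_a1b2` are assumptions for illustration; a real SP would keep seen IDs in a shared store and expire them.

```python
from datetime import datetime, timezone

EXPECTED_AUDIENCE = "https://signin.aws.amazon.com/saml"
_seen_ids = {}  # assertion ID -> expiry time (hypothetical in-memory replay cache)


def accept_assertion(assertion_id, audience, not_before, not_on_or_after, now):
    """Apply audience, time-window, and replay checks to already-parsed fields."""
    if audience != EXPECTED_AUDIENCE:
        return False  # issued for a different SP
    if not (not_before <= now < not_on_or_after):
        return False  # outside the validity window
    if assertion_id in _seen_ids:
        return False  # same assertion presented twice
    _seen_ids[assertion_id] = not_on_or_after
    return True


nb = datetime(2026, 4, 11, 9, 0, tzinfo=timezone.utc)
noa = datetime(2026, 4, 11, 9, 5, tzinfo=timezone.utc)
now = datetime(2026, 4, 11, 9, 2, tzinfo=timezone.utc)

print(accept_assertion("_a1b2", EXPECTED_AUDIENCE, nb, noa, now))  # True
print(accept_assertion("_a1b2", EXPECTED_AUDIENCE, nb, noa, now))  # False (replay)
```

None of this replaces signature validation. It only covers what the library hands back once the XML has been verified.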

OpenID Connect (OIDC) — The Modern Protocol

OIDC is JSON-based, REST-native, and designed for the web and API-first world. Built on top of OAuth 2.0, it’s the protocol behind “Sign in with Google,” GitHub’s OIDC tokens for Actions, and workload identity federation across cloud providers.

Token Anatomy

An OIDC ID Token is a JWT, made of three base64url-encoded parts separated by dots:

Header.Payload.Signature

Header:
{
  "alg": "RS256",           ← signing algorithm
  "kid": "key-id-123"       ← which key signed this (for JWKS rotation)
}

Payload (the claims):
{
  "iss": "https://accounts.google.com",         ← who issued this token
  "sub": "108378629573454321234",               ← stable user identifier (not email)
  "aud": "my-app-client-id",                   ← who this token is for
  "exp": 1749600000,                           ← expires at (Unix timestamp)
  "iat": 1749596400,                           ← issued at
  "email": "USER_EMAIL",
  "email_verified": true,
  "hd": "company.com"                          ← hosted domain (Google Workspace)
}

Signature: RSA-SHA256(base64url(header) + "." + base64url(payload), idp_private_key)

The relying party (your application, or AWS STS) validates the signature using the IdP’s public keys, published at the JWKS URL that the IdP advertises as jwks_uri in its discovery document (/.well-known/openid-configuration). Signature verification proves the token was issued by the expected IdP and hasn’t been tampered with since.
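
To see the three-part structure directly, the sketch below builds a token with a dummy signature and splits it back apart using only the Python standard library. The helper names are assumptions, and nothing here verifies a signature: it is for inspection only, never for making an access decision.

```python
import base64
import json


def b64url_decode(part: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def unverified_parts(token: str):
    """Split a JWT and decode header and payload WITHOUT checking the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Build a structurally valid token with a dummy signature, for illustration only
header = {"alg": "RS256", "kid": "key-id-123"}
payload = {"iss": "https://accounts.google.com", "aud": "my-app-client-id", "exp": 1749600000}
token = ".".join([
    b64url_encode(json.dumps(header).encode()),
    b64url_encode(json.dumps(payload).encode()),
    b64url_encode(b"not-a-real-signature"),
])

decoded_header, decoded_payload = unverified_parts(token)
print(decoded_header["kid"])   # key-id-123
print(decoded_payload["aud"])  # my-app-client-id
```

The `kid` value in the header is what a verifier uses to pick the matching public key from the IdP's key set.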

The Full OIDC Token Exchange (GitHub Actions → AWS)

# GitHub Actions automatically provides an OIDC token in the runner environment
# The token contains: iss=https://token.actions.githubusercontent.com, plus sub, repository, ref, sha, run_id, etc.

# Step 1: Fetch the OIDC token from GitHub's token service
TOKEN=$(curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" | jq -r '.value')

# Step 2: Present to AWS STS for exchange
aws sts assume-role-with-web-identity \
  --role-arn arn:aws:iam::123456789012:role/GitHubActionsRole \
  --role-session-name github-deploy \
  --web-identity-token "${TOKEN}"

# STS performs these validations:
# 1. Fetch GitHub's JWKS: https://token.actions.githubusercontent.com/.well-known/jwks
# 2. Verify signature is valid
# 3. Verify iss = "https://token.actions.githubusercontent.com" (matches the registered OIDC provider)
# 4. Verify aud = "sts.amazonaws.com"
# 5. Verify sub matches the trust policy condition
# 6. Verify exp is in the future

The trust policy condition on the IAM role is what prevents any GitHub repository from assuming this role:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
        "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
      }
    }
  }]
}

The sub condition is the security boundary. repo:my-org/my-repo:ref:refs/heads/main means: only runs triggered from the main branch of my-org/my-repo can assume this role. A pull request from a fork, a run from a different repo, or a run from a different branch — all get a different sub claim and the assumption fails.

I’ve reviewed trust policies that omit the sub condition and just check aud. That means any GitHub Actions workflow, in any repository, owned by anyone, can assume that role. This is not a theoretical risk: anyone can create a public GitHub repository and run GitHub Actions workflows in it.
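
A small model of that StringEquals condition shows the difference. This is a sketch of the comparison logic only, not AWS's policy evaluation engine; `may_assume` and the sample claim sets are hypothetical.

```python
TRUSTED_AUD = "sts.amazonaws.com"
TRUSTED_SUB = "repo:my-org/my-repo:ref:refs/heads/main"


def may_assume(claims: dict, check_sub: bool = True) -> bool:
    """Model a trust policy that pins aud and, optionally, sub."""
    if claims.get("aud") != TRUSTED_AUD:
        return False
    if check_sub and claims.get("sub") != TRUSTED_SUB:
        return False
    return True


main_run = {"aud": TRUSTED_AUD, "sub": "repo:my-org/my-repo:ref:refs/heads/main"}
pr_run = {"aud": TRUSTED_AUD, "sub": "repo:my-org/my-repo:pull_request"}
stranger = {"aud": TRUSTED_AUD, "sub": "repo:someone-else/anything:ref:refs/heads/main"}

print(may_assume(main_run))                   # True
print(may_assume(pr_run))                     # False
print(may_assume(stranger))                   # False
print(may_assume(stranger, check_sub=False))  # True: the aud-only policy lets anyone in
```

The last line is the misconfiguration in one call: every workflow can request a token with the same audience, so aud alone distinguishes nobody.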

OIDC Validation Checklist

Every application that validates OIDC tokens must check all of these:

✓ Signature valid (using IdP's JWKS endpoint — not a hardcoded key)
✓ iss matches the expected IdP URL
✓ aud matches your application's client ID (not just "any audience")
✓ exp is in the future
✓ nbf (not before), if present, is in the past
✓ iat is recent (within your clock skew tolerance)
✓ For workload identity: sub is pinned to the specific workload

Skipping aud validation is the most common mistake. A token issued for application A with aud: app-a-client-id should not be accepted by application B. Without audience validation, any application in your system that can obtain a token from the IdP can reuse it at any other application. Libraries like python-jose and jsonwebtoken can validate aud, but defaults differ: some reject a token that carries an aud claim unless you supply the expected audience, while others skip the check entirely until an audience is configured. Either way, pass the expected audience value explicitly.
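
The claim checks in the list can be sketched as a plain function over an already-verified payload. It assumes signature verification has happened elsewhere (in a JWT library), and the parameter names, the 60-second skew, and the one-hour `max_age` are illustrative choices rather than values from any specification.

```python
def validate_claims(claims, *, issuer, audience, now, skew=60, max_age=3600, subject=None):
    """Return a list of problems; an empty list means the claims pass."""
    problems = []
    if claims.get("iss") != issuer:
        problems.append("iss mismatch")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if audience not in audiences:
        problems.append("aud mismatch")
    if claims.get("exp", 0) <= now - skew:
        problems.append("expired")
    if "nbf" in claims and claims["nbf"] > now + skew:
        problems.append("not yet valid")
    iat = claims.get("iat")
    if iat is None or iat > now + skew or now - iat > max_age:
        problems.append("iat out of range")
    if subject is not None and claims.get("sub") != subject:
        problems.append("sub mismatch")
    return problems


claims = {
    "iss": "https://accounts.google.com",
    "sub": "108378629573454321234",
    "aud": "my-app-client-id",
    "exp": 1749600000,
    "iat": 1749596400,
}
ok = validate_claims(claims, issuer="https://accounts.google.com",
                     audience="my-app-client-id", now=1749598000)
wrong_app = validate_claims(claims, issuer="https://accounts.google.com",
                            audience="app-b-client-id", now=1749598000)
print(ok)         # []
print(wrong_app)  # ['aud mismatch']
```

Returning every problem rather than failing on the first one makes the audit log more useful when a token is rejected.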

Enterprise Federation Patterns

Multi-Account AWS with IAM Identity Center + Okta

The pattern I deploy in every multi-account AWS environment:

Okta (IdP)
  └── IAM Identity Center
        ├── Account: prod     → Permission Sets: ReadOnly, DevOps
        ├── Account: staging  → Permission Sets: Developer  
        ├── Account: shared   → Permission Sets: NetworkAdmin, SecurityAudit
        └── Account: sandbox  → Permission Sets: Admin (sandbox only)

# Engineers access accounts through Identity Center portal
aws configure sso
# Prompts: SSO start URL, region, account, role

aws sso login --profile prod-readonly

# List available accounts and roles (useful for tooling and scripts)
aws sso list-accounts --access-token "${TOKEN}"
aws sso list-account-roles --access-token "${TOKEN}" --account-id "${ACCOUNT_ID}"

# Get temporary credentials for a specific account/role
aws sso get-role-credentials \
  --account-id "${ACCOUNT_ID}" \
  --role-name ReadOnly \
  --access-token "${TOKEN}"

When an engineer is offboarded from Okta, they can no longer sign in to any AWS account. No individual IAM user deletion across 20 accounts. No access key hunting. One action in Okta revokes access everywhere; temporary credentials that were already issued stay valid only until their session expires.

Just-in-Time (JIT) Provisioning

Rather than creating user accounts in every downstream system ahead of time, JIT provisioning creates accounts on first login:

User authenticates to IdP
SAML/OIDC assertion includes group memberships and attributes
SP receives assertion, checks if a user account exists for this sub
If not: create the account with attributes from the assertion
Grant access based on group claims
On subsequent logins: update the account’s attributes if claims changed

The security property: when a user is disabled in the IdP, their account in downstream systems becomes inaccessible even if the account object still exists. There’s nothing to log in with. JIT accounts don’t outlive the IdP identity in any useful way: they become inactive shells that carry little risk, provided the SP has issued no local credentials (passwords, API tokens) for them.
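
The provisioning steps above can be sketched as a single login handler. The in-memory `accounts` dict stands in for the SP's user table, and the assertion fields and sample values are assumptions for illustration.

```python
accounts = {}  # sub -> account record; stands in for the SP's user table


def on_login(assertion: dict) -> dict:
    """Create the account on first login, then refresh attributes on every login."""
    sub = assertion["sub"]
    account = accounts.get(sub)
    if account is None:
        account = {"sub": sub, "created_via": "jit"}
        accounts[sub] = account
    # Refresh attributes each time so IdP changes propagate to the SP
    account["email"] = assertion.get("email")
    account["groups"] = sorted(assertion.get("groups", []))
    return account


first = on_login({"sub": "u-123", "email": "a@company.com", "groups": ["engineering"]})
print(first["groups"])   # ['engineering']

second = on_login({"sub": "u-123", "email": "a@company.com", "groups": ["engineering", "oncall"]})
print(second["groups"])  # ['engineering', 'oncall']
print(len(accounts))     # 1
```

Keying on `sub` rather than email matters: an email address can be reassigned, while the subject identifier is meant to stay stable for one identity.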

The IdP Is the Trust Anchor — Protect It Accordingly

The entire security of a federated system is bounded by the security of the IdP. If an attacker can log into Okta as an admin, they can issue valid SAML assertions for any user, for any role, to any SP that trusts Okta. Every downstream system is compromised simultaneously.

This is not theoretical. In the 2023 Caesars and MGM Resorts attacks, initial access was reportedly achieved by social engineering IT help desks into resetting credentials and MFA for identity accounts, not through technical exploitation of cloud infrastructure. Once identity infrastructure is compromised, everything downstream follows.

What this means practically:

MFA for all IdP admin accounts — hardware FIDO2 keys, not TOTP. TOTP codes can be phished in real-time. Hardware keys cannot.
PIM / JIT access for IdP configuration changes — no standing admin access
Separate monitoring and alerting for IdP admin activity
Audit who can modify SAML/OIDC configurations and attribute mappings in the IdP — these are the levers for privilege escalation
Narrow audience restrictions — configure which SPs can receive assertions; don’t create a wildcard IdP configuration that serves all SPs

Conditional Access — Adding Context to Federation

Modern IdPs support Conditional Access policies that restrict when assertions are issued:

// Entra ID Conditional Access: require MFA + compliant device for AWS access
{
  "conditions": {
    "applications": {
      "includeApplications": ["AWS-Application-ID-in-Entra"]
    },
    "users": {
      "includeGroups": ["all-employees"]
    },
    "locations": {
      "includeLocations": ["All"],
      "excludeLocations": ["NamedLocation-CorporateNetwork"]
    }
  },
  "grantControls": {
    "operator": "AND",
    "builtInControls": ["mfa", "compliantDevice"]
  }
}

This policy: when an employee accesses AWS from outside the corporate network, they must use MFA on a device that MDM has verified as compliant. Sign-ins from the excluded named location fall outside this policy's scope, so its requirements are not enforced there; any other policy that targets the same application and users still applies.

Conditional Access is how you move beyond “authenticated to IdP” as the only gate. Device health, network location, risk score — these become inputs to the access decision.
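
To make the evaluation logic concrete, here is a small Python sketch of how a single policy of this shape could be evaluated against a sign-in. It is a simplified model, not Entra ID's actual engine: the `SignIn` class, the `POLICY` dict layout, and the three result strings are hypothetical, and it assumes an excluded location takes the sign-in out of the policy's scope entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignIn:
    app_id: str
    user_groups: frozenset
    location: str
    mfa_done: bool
    device_compliant: bool


# Hypothetical in-memory form of the policy shown above.
POLICY = {
    "include_apps": {"AWS-Application-ID-in-Entra"},
    "include_groups": {"all-employees"},
    "exclude_locations": {"NamedLocation-CorporateNetwork"},
    "required_controls": {"mfa", "compliantDevice"},  # operator: AND
}


def policy_applies(policy: dict, sign_in: SignIn) -> bool:
    """A policy is in scope only if every condition matches."""
    return (
        sign_in.app_id in policy["include_apps"]
        and bool(sign_in.user_groups & policy["include_groups"])
        and sign_in.location not in policy["exclude_locations"]
    )


def evaluate(policy: dict, sign_in: SignIn) -> str:
    """Return 'not_applicable', 'grant', or 'block' for one policy."""
    if not policy_applies(policy, sign_in):
        return "not_applicable"
    satisfied = set()
    if sign_in.mfa_done:
        satisfied.add("mfa")
    if sign_in.device_compliant:
        satisfied.add("compliantDevice")
    # AND operator: every required control must be satisfied.
    return "grant" if policy["required_controls"] <= satisfied else "block"
```

In this model, a sign-in from the excluded location returns `not_applicable` rather than `grant`: the policy simply says nothing about it, which is why a location exclusion should be paired with other policies that still cover those sign-ins.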

Framework Alignment

Framework	Reference	What It Covers Here
CISSP	Domain 5 — Identity and Access Management	Federation is the mechanism for extending identity trust across organizational boundaries
CISSP	Domain 3 — Security Architecture	Trust relationships must be explicitly designed; overly broad federation trust is an architectural failure
ISO 27001:2022	5.19 Information security in supplier relationships	Federation with third-party IdPs and SPs establishes a cross-organizational trust boundary that must be governed
ISO 27001:2022	8.5 Secure authentication	SAML and OIDC provide secure authentication for federated access, provided the SP validates assertions and tokens correctly
ISO 27001:2022	5.17 Authentication information	Credential lifecycle in federated systems — no passwords distributed to SPs; IdP manages authentication
SOC 2	CC6.1	Logical access security: federated identity is the access control mechanism for human access to cloud environments
SOC 2	CC6.6	Logical access from outside system boundaries — federation with external IdPs and partner organizations

Key Takeaways

Federation means downstream systems trust the IdP’s signed assertion — they never see credentials and don’t need to manage them independently
SAML is XML-based, browser-oriented, widely supported for enterprise SSO; OIDC is JWT-based, API-friendly, the protocol for modern workload identity and consumer SSO
In OIDC federation, the sub condition in the role's trust policy is what restricts which workload can assume the role; omitting it lets any workload from the same issuer assume it, which is a critical misconfiguration
Validate every JWT: verify the signature, then check the iss, aud, exp, and sub claims. Libraries do this, but only when configured with the expected values
The IdP is the trust anchor — its security posture bounds the security of every system that trusts it. Treat IdP admin access with the same controls as your most sensitive systems.
JIT provisioning keeps downstream accounts in sync with the IdP, and Conditional Access extends federation from “who are you” to “are you in an appropriate context right now”

What’s Next

EP11 brings this into Kubernetes — RBAC, service account tokens, and how the Kubernetes authorization layer interacts with cloud IAM. Two separate systems, both requiring security. A gap in either becomes a gap in both.

Next: Kubernetes RBAC and AWS IAM

Get EP11 in your inbox when it publishes → linuxcent.com/subscribe
