SAML vs OIDC vs OAuth2: Which Protocol Handles Which Identity Problem

Reading Time: 6 minutes

The Identity Stack, Episode 10



TL;DR

  • SAML 2.0 is a federation protocol for browser-based SSO — an IdP issues a signed XML assertion that a Service Provider trusts; designed for enterprise applications
  • OAuth2 is an authorization delegation protocol, not authentication — it lets an application act on your behalf without knowing your password; the access token says what, not who
  • OIDC (OpenID Connect) = OAuth2 + an identity layer — adds the id_token (a JWT containing who you are) on top of OAuth2’s access_token (what you can do)
  • SAML vs OIDC: SAML is XML, enterprise-native, stateful; OIDC is JSON/JWT, API-native, stateless — new applications almost always use OIDC
  • The id_token is a JWT — decode it at jwt.io and read every claim — it tells you exactly what the IdP asserts about the user
  • The browser SSO flow is three redirects: user → SP → IdP (authenticate) → SP (consume assertion)

The Problem: LDAP and Kerberos Don’t Cross the Internet

EP09 showed how authentication works inside a corporate network. LDAP and Kerberos both assume network proximity to the directory server — firewall-friendly ports don’t help when the authentication protocol requires a direct connection to the KDC or directory.

Internal network: works
  Browser → intranet app → LDAP/Kerberos → AD DC (all on 10.0.0.0/8)

Internet: breaks
  Browser → SaaS app (AWS) → LDAP/Kerberos → AD DC (on-prem behind firewall)
  ✗ KDC not reachable across NAT
  ✗ LDAP not exposed to internet (shouldn't be)
  ✗ Every SaaS app can't have its own LDAP connection to your DC

SAML 1.0 was standardized in 2002 to solve this (SAML 2.0 followed in 2005); OpenID Connect 1.0 arrived in 2014. Both let identity assertions travel over HTTPS, the one protocol that crosses almost every firewall.


SAML 2.0: Enterprise Browser SSO

SAML 2.0 has three actors: the User, the Identity Provider (IdP), and the Service Provider (SP).

1. User visits SP (e.g., Salesforce)
   SP: "I don't know this user — send them to the IdP"
   ↓  HTTP redirect with SAMLRequest (deflated, base64-encoded AuthnRequest)

2. User arrives at IdP (e.g., Okta, AD FS, Entra ID)
   IdP: "Authenticate me" → user enters credentials
   IdP: generates a signed SAML Assertion (XML)
   ↓  HTTP POST to SP's Assertion Consumer Service (ACS) URL

3. SP receives the SAMLResponse
   SP: verifies the signature using IdP's public key
   SP: extracts user attributes from the Assertion
   SP: creates a session — user is logged in
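
The SAMLRequest in step 1 can be sketched in a few lines of Python. For the HTTP-Redirect binding the AuthnRequest XML is compressed with raw DEFLATE, base64-encoded, and then URL-encoded into the query string. This is a minimal sketch using only the standard library: the function names, the entity ID, and the URLs are hypothetical, the request is unsigned, and a real SP would normally leave this to a SAML library.

```python
import base64
import urllib.parse
import uuid
import zlib
from datetime import datetime, timezone


def build_authn_request(sp_entity_id: str, acs_url: str) -> tuple[str, str]:
    """Return (request_id, xml) for a bare-bones, unsigned AuthnRequest."""
    request_id = "_" + uuid.uuid4().hex  # XML IDs must not start with a digit
    issue_instant = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    xml = (
        '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" '
        f'ID="{request_id}" Version="2.0" IssueInstant="{issue_instant}" '
        f'AssertionConsumerServiceURL="{acs_url}">'
        f"<saml:Issuer>{sp_entity_id}</saml:Issuer>"
        "</samlp:AuthnRequest>"
    )
    return request_id, xml


def encode_redirect_binding(xml: str) -> str:
    """Raw DEFLATE (no zlib header), then standard base64."""
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    return base64.b64encode(deflated).decode("ascii")


def decode_redirect_binding(saml_request: str) -> str:
    """Inverse of encode_redirect_binding; handy when debugging a redirect."""
    return zlib.decompress(base64.b64decode(saml_request), wbits=-15).decode("utf-8")


def redirect_url(idp_sso_url: str, saml_request: str, relay_state: str) -> str:
    """urlencode applies the final URL-encoding step."""
    query = urllib.parse.urlencode(
        {"SAMLRequest": saml_request, "RelayState": relay_state}
    )
    return f"{idp_sso_url}?{query}"
```

The decode helper is the useful part day to day: copy the SAMLRequest value out of a browser redirect, URL-decode it, and pass it through decode_redirect_binding to read what the SP actually asked for.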

The SAML Assertion is an XML document signed by the IdP. It contains:

<saml:Assertion>
  <saml:Issuer>https://idp.corp.com</saml:Issuer>
  <saml:Subject>
    <saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">
      [email protected]
    </saml:NameID>
  </saml:Subject>
  <saml:Conditions
    NotBefore="2026-04-27T01:00:00Z"
    NotOnOrAfter="2026-04-27T01:05:00Z">  ← short-lived: replay protection
  </saml:Conditions>
  <saml:AttributeStatement>
    <saml:Attribute Name="email">
      <saml:AttributeValue>[email protected]</saml:AttributeValue>
    </saml:Attribute>
    <saml:Attribute Name="groups">
      <saml:AttributeValue>engineers</saml:AttributeValue>
      <saml:AttributeValue>sre-team</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>

The SP trusts the assertion because it’s signed with the IdP’s private key, and the SP has the IdP’s public certificate configured. No direct connection between SP and IdP needed during authentication — only the browser carries the assertion.

SP-initiated vs IdP-initiated:
– SP-initiated: user visits the SP, gets redirected to IdP, authenticates, redirected back — the common flow
– IdP-initiated: user starts at the IdP (e.g., company portal), clicks an app, and the IdP sends an unsolicited assertion to the SP. Simpler, but there is no SP-generated AuthnRequest ID for the response to reference (no InResponseTo value), so the SP can’t verify the response was expected (a security concern)
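
What the SP-initiated flow buys you is request/response correlation. A rough sketch of that bookkeeping, assuming the SP keeps issued AuthnRequest IDs in memory (the class and method names here are made up for illustration; a real deployment would use a shared store and a SAML library):

```python
import secrets
import time


class PendingRequests:
    """Tracks AuthnRequest IDs the SP has issued and not yet seen answered."""

    def __init__(self, ttl_seconds: int = 300):
        self.ttl_seconds = ttl_seconds
        self._issued = {}  # request_id -> expiry (epoch seconds)

    def issue(self, now=None) -> str:
        """Create a fresh request ID and remember it until it expires."""
        now = time.time() if now is None else now
        request_id = "_" + secrets.token_hex(16)
        self._issued[request_id] = now + self.ttl_seconds
        return request_id

    def consume(self, in_response_to, now=None) -> bool:
        """True only if the response answers a live request; IDs are single-use."""
        now = time.time() if now is None else now
        if in_response_to is None:
            # IdP-initiated response: there is nothing to correlate against
            return False
        expiry = self._issued.pop(in_response_to, None)
        return expiry is not None and now <= expiry
```

An SP that also accepts IdP-initiated logins has to skip this check for those responses, which is exactly the gap described above.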


OAuth2: Authorization Delegation (Not Authentication)

This distinction is important and consistently confused: OAuth2 is for authorization, not authentication.

OAuth2 solves: “I want to let GitHub Actions post to my Slack without giving GitHub my Slack password.”

Resource Owner (you)  → grants permission to →  Client (GitHub Actions)
                                                        │
                                                        │ access_token
                                                        ▼
                                               Resource Server (Slack API)
                                               "this token can post messages"

The access_token answers “what can this client do?” not “who is this user?” A resource server receiving an access token knows the token is valid and what scopes it carries — it does not necessarily know which human authorized it.

The OAuth2 grant types you will actually use:

Grant | Use case
Authorization Code | Web apps (server-side); most secure, recommended
Authorization Code + PKCE | Native/SPA apps; Authorization Code without a client secret (PKCE is an extension to the code grant, not a separate grant)
Client Credentials | Machine-to-machine (no user); service accounts
Device Code | Devices without browsers (smart TVs, CLIs)

The Implicit grant (tokens in URL fragment) is deprecated. Don’t use it.
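
PKCE is what makes "Auth Code without client secret" safe: the client proves at the token endpoint that it is the same party that started the flow. A small sketch of the S256 method, using only the standard library (the helper names are my own, and server_accepts only models the comparison an authorization server performs):

```python
import base64
import hashlib
import secrets


def b64url(data: bytes) -> str:
    """base64url without padding, as PKCE and JWTs use it."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_code_verifier() -> str:
    """High-entropy secret the client keeps until the token request (43 chars)."""
    return b64url(secrets.token_bytes(32))


def code_challenge_s256(verifier: str) -> str:
    """Value sent on /authorize along with code_challenge_method=S256."""
    return b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def server_accepts(stored_challenge: str, presented_verifier: str) -> bool:
    """Model of the server-side check when the code is exchanged for tokens."""
    return secrets.compare_digest(
        stored_challenge, code_challenge_s256(presented_verifier)
    )
```

The client sends the challenge in the authorization request and the verifier in the token request. An attacker who intercepts only the authorization code has neither the verifier nor a way to derive it from the challenge.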


OIDC: OAuth2 + Who You Are

OpenID Connect adds identity to OAuth2 by adding the id_token — a JWT that the IdP signs and that contains claims about the authenticated user.

Authorization Code flow with OIDC:

1. Client redirects user to IdP:
   GET /authorize?
     response_type=code
     &client_id=myapp
     &scope=openid email profile    ← "openid" scope triggers OIDC
     &redirect_uri=https://app.com/callback
     &state=random-nonce

2. IdP authenticates user, returns:
   GET /callback?code=AUTH_CODE&state=random-nonce

3. Client exchanges code for tokens:
   POST /token
   grant_type=authorization_code&code=AUTH_CODE...

4. IdP returns:
   {
     "access_token": "eyJ...",    ← what the user authorized
     "id_token": "eyJ...",        ← who the user is (JWT)
     "token_type": "Bearer",
     "expires_in": 3600
   }
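
Steps 1 and 2 of this flow are mostly URL handling. A hedged sketch of how a client might build the authorization URL and check the callback, assuming hypothetical endpoint and client values; it also generates a nonce, which the complete flow later in this episode sends alongside state:

```python
import hmac
import secrets
import urllib.parse


def build_authorize_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Return (url, state, nonce); the caller stores state and nonce in the session."""
    state = secrets.token_urlsafe(32)
    nonce = secrets.token_urlsafe(32)
    query = urllib.parse.urlencode({
        "response_type": "code",
        "client_id": client_id,
        "scope": " ".join(scopes),  # must include "openid" for OIDC
        "redirect_uri": redirect_uri,
        "state": state,
        "nonce": nonce,
    })
    return f"{authorize_endpoint}?{query}", state, nonce


def extract_code(callback_url: str, expected_state: str) -> str:
    """Validate state from the callback, then return the authorization code."""
    params = urllib.parse.parse_qs(urllib.parse.urlsplit(callback_url).query)
    returned_state = params.get("state", [""])[0]
    # Compare as bytes so non-ASCII input cannot raise inside compare_digest
    if not hmac.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: possible CSRF, discard this response")
    if "code" not in params:
        raise ValueError("no authorization code in callback")
    return params["code"][0]
```

The state value only protects anything if it is unpredictable and bound to the user's browser session, which is why it is generated per request here rather than hardcoded.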

The id_token decoded:

{
  "iss": "https://idp.corp.com",          ← issuer (the IdP)
  "sub": "user-guid-12345",               ← subject (stable user identifier)
  "aud": "myapp",                          ← audience (your client_id)
  "exp": 1745730000,                       ← expiry (Unix timestamp)
  "iat": 1745726400,                       ← issued at
  "email": "[email protected]",
  "name": "Vamshi Krishna",
  "groups": ["engineers", "sre-team"]     ← custom claims from IdP
}
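
# Decode any JWT at the command line (no verification; for debugging only)
# JWT segments are base64url without padding, so map the alphabet back and re-pad before decoding
echo "eyJ..." | cut -d. -f2 | tr '_-' '/+' | awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | python3 -m json.tool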

# Or: jwt.io — paste the token, read every claim

sub is the stable user identifier. Email addresses change. Names change. The sub claim is the IdP’s internal identifier for the user — use it as the primary key when storing user data. Never store email as the primary key.
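
In code, that advice comes down to what you use as the lookup key. A minimal in-memory sketch (the class names are invented for illustration) that keys users on the issuer and sub together, since sub is only guaranteed unique within one issuer, and treats email as a mutable attribute:

```python
from dataclasses import dataclass


@dataclass
class User:
    issuer: str
    subject: str
    email: str
    name: str


class UserStore:
    """Toy user table keyed on (iss, sub) rather than on email."""

    def __init__(self):
        self._by_identity = {}

    def upsert_from_claims(self, claims: dict) -> User:
        key = (claims["iss"], claims["sub"])
        user = self._by_identity.get(key)
        if user is None:
            user = User(
                claims["iss"], claims["sub"],
                claims.get("email", ""), claims.get("name", ""),
            )
            self._by_identity[key] = user
        else:
            # Email and name are attributes that can change between logins
            user.email = claims.get("email", user.email)
            user.name = claims.get("name", user.name)
        return user
```

If the same person's email changes, the second login updates the existing record instead of creating a duplicate. The same sub arriving from a different issuer is treated as a different user.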


SAML vs OIDC: When to Use Which

Aspect | SAML 2.0 | OIDC
Format | XML | JSON / JWT
Transport | HTTP POST (browser only) | HTTP redirect + JSON API
Age | 2005 (SAML 1.0: 2002) | 2014
Enterprise adoption | Very high (AD FS, Okta, Entra ID) | Very high (newer apps)
API-friendly | No | Yes
Mobile apps | No | Yes
Complexity | High (XML, schemas, signatures) | Medium (JWT, JSON)
Single Logout | Specified (rarely works well) | Optional, inconsistent

Use SAML when: You’re integrating with an enterprise SaaS that only supports SAML (Salesforce classic, legacy HR systems), or your IdP team mandates it.

Use OIDC when: You’re building a new application, integrating with a modern IdP, or need API-based token validation. OIDC is the default for everything new.

Use OAuth2 (Client Credentials) when: Service-to-service authentication with no user — your CI/CD pipeline authenticating to an API, your microservice calling another microservice.


A Complete Browser SSO Flow (OIDC)

1. User visits https://app.corp.com (not logged in)
   App: no session → redirect to IdP

2. GET https://idp.corp.com/authorize?
        response_type=code
        &client_id=app-corp
        &scope=openid email
        &redirect_uri=https://app.corp.com/callback
        &state=abc123
        &nonce=xyz789

3. IdP: user is not authenticated → show login form
   User: enters [email protected] + password
   (or: IdP sees existing session cookie → skip login)

4. IdP: authentication success
   Redirect: GET https://app.corp.com/callback?code=AUTH_CODE&state=abc123

5. App (server-side): validate state=abc123 (CSRF protection)
   POST https://idp.corp.com/token
     grant_type=authorization_code
     &code=AUTH_CODE
     &client_id=app-corp
     &client_secret=SECRET
     &redirect_uri=https://app.corp.com/callback

6. IdP responds:
   { "id_token": "JWT...", "access_token": "JWT...", "expires_in": 3600 }

7. App: validate id_token signature (using IdP's JWKS endpoint)
   App: extract sub, email, groups from id_token
   App: create session for [email protected]
   App: redirect user to original destination

Step 7 is where most bugs live. The app must validate: signature (using the IdP’s public keys from the jwks_uri advertised in /.well-known/openid-configuration, often /.well-known/jwks.json), iss (matches the expected IdP), aud (matches the client_id), exp (not expired), and nonce (matches what was sent in step 2). Skip any of these and you have an authentication bypass.
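
The claim checks in step 7 are simple enough to sketch. The function below assumes the signature has ALREADY been verified by a JOSE library against the IdP's published keys; it does not verify signatures itself. The function name, the exception type, and the 60-second leeway are illustrative choices, not a standard API:

```python
import hmac
import time


class ClaimsError(Exception):
    """Raised when an id_token claim fails validation."""


def validate_id_token_claims(claims, *, issuer, client_id, nonce, now=None, leeway=60):
    """Check iss, aud, exp and nonce on claims from a signature-verified id_token."""
    now = time.time() if now is None else now

    if claims.get("iss") != issuer:
        raise ClaimsError("iss does not match the expected IdP")

    # aud may be a single string or an array of strings
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        audiences = []
    if client_id not in audiences:
        raise ClaimsError("aud does not contain this client_id")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway:
        raise ClaimsError("token is expired or has no exp")

    token_nonce = str(claims.get("nonce", ""))
    if not hmac.compare_digest(token_nonce.encode(), nonce.encode()):
        raise ClaimsError("nonce does not match the authorization request")

    return claims
```

Every branch fails closed: a missing claim is treated the same as a wrong one. That is the property to preserve whichever library ends up doing this for you.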


⚠ Common Misconceptions

“OAuth2 is for login.” OAuth2 is for authorization delegation. It can be used as a login mechanism only when OIDC (the openid scope + id_token) is added on top. “Login with Google” uses OIDC, not bare OAuth2.

“JWTs are encrypted.” By default, JWTs are signed (JWS), not encrypted. The header and payload are base64url-encoded — anyone can decode them. Encryption (JWE) is a separate, less commonly used spec. Never put secrets in a JWT payload assuming it’s private.

“SAML Single Logout works reliably.” SAML SLO is specified but inconsistently implemented. Many SPs ignore SLO requests or don’t propagate them correctly. Don’t depend on SLO for security — session revocation requires additional mechanisms (short-lived tokens, token introspection, session registries).


Framework Alignment

Domain | Relevance
CISSP Domain 5: Identity and Access Management | SAML, OAuth2, and OIDC are the three protocols that enable federated identity and SSO; understanding which does what is foundational to modern IAM design
CISSP Domain 4: Communications and Network Security | JWT validation (signature, claims, expiry) is a network security control; failing to validate any claim is an authentication bypass vulnerability
CISSP Domain 3: Security Architecture and Engineering | The choice of SAML vs OIDC is an architectural decision that affects every application integration, mobile support, and API design

Key Takeaways

  • SAML 2.0: XML-based browser SSO — three redirects, signed assertion, enterprise legacy apps
  • OAuth2: authorization delegation — access tokens grant scopes, not identity
  • OIDC: OAuth2 + id_token — adds who the user is on top of what they can do
  • sub is the stable user identifier in OIDC — never use email as a primary key
  • JWT validation must check: signature, iss, aud, exp, nonce — missing any is a security bypass
  • New applications: OIDC. Legacy enterprise SaaS: SAML. Service-to-service: OAuth2 Client Credentials

What’s Next

EP10 covered the protocols. EP11 covers the systems that implement them — the identity providers: what Okta, Entra ID, Keycloak, and AD FS actually do, how they federate with each other, and how SCIM handles user provisioning separately from authentication.

Next: Identity Providers Explained: On-Prem, Cloud, SCIM, and Federation


SAML vs OIDC: Which Federation Protocol Belongs in Your Cloud?

Reading Time: 10 minutes





TL;DR

  • Federation means downstream systems trust the IdP’s signed assertion — they never see credentials and don’t manage them independently
  • SAML is XML-based, browser-oriented, the enterprise standard; OIDC is JWT-based, API-native, the modern protocol for workload identity and consumer SSO
  • In OIDC trust policies, the sub condition is the security boundary — omitting it means any GitHub Actions workflow in any repository can assume your role
  • Validate all JWT claims: signature, iss, aud, exp, sub — libraries do this, but need correct configuration (especially aud)
  • The IdP is the trust anchor: compromise the IdP and every downstream system is compromised. Treat IdP admin access with the same controls as your most sensitive system.
  • JIT provisioning and Conditional Access extend federation from “who are you” to “are you in an appropriate context right now”

The Big Picture

  FEDERATION: HOW TRUST FLOWS FROM IdP TO DOWNSTREAM SYSTEMS

  Identity Provider  (Okta / Entra ID / Google / AD FS)
  ┌──────────────────────────────────────────────────────────────────┐
  │  User or workload authenticates → IdP issues signed assertion   │
  │                                                                  │
  │  ┌──────────────────────────┐  ┌───────────────────────────┐   │
  │  │  SAML Assertion (XML)    │  │  OIDC ID Token (JWT)       │   │
  │  │  RSA-signed, 5–10 min    │  │  RS256-signed, ~1 hr      │   │
  │  │  Audience: SP entity ID  │  │  aud: client ID           │   │
  │  │  Subject: user identity  │  │  sub: specific workload   │   │
  │  └───────────┬──────────────┘  └──────────┬────────────────┘   │
  └─────────────────────────────────────────────────────────────────┘
                 │  human SSO                  │  workload identity
                 ▼                             ▼
  ┌─────────────────────────┐  ┌───────────────────────────────────┐
  │ SP validates signature  │  │ AWS STS / GCP STS validates       │
  │ + audience + timestamp  │  │ signature + iss + aud + sub       │
  │ → console session       │  │ → AssumeRoleWithWebIdentity       │
  └─────────────────────────┘  └───────────────────────────────────┘

  Security bound: IdP security bounds every system that trusts it
  Disable in Okta → access revoked everywhere that trusts Okta

Introduction

Before federation existed, every system had its own user database. Your Jira account. Your AWS account. Your Salesforce account. Your internal wiki. Each one had its own password, its own MFA, its own offboarding process. When an engineer joined, someone had to create accounts in every system. When they left, you hoped whoever processed the offboarding remembered to deactivate all of them.

I’ve done that audit — the one where you’re trying to figure out if a former employee still has access to anything. You go system by system, cross-reference against HR records, find accounts that exist in places you’ve forgotten the company even uses. In one environment I found an ex-engineer’s account still active in a vendor portal six months after they left, because that system was set up by someone who had since also left the company, and nobody had documented it.

Federation solves this structurally. One identity provider. One place to authenticate. One place to revoke. Every downstream system trusts the IdP’s assertion rather than managing credentials independently. Disable someone in Okta and they lose access everywhere that trusts Okta — immediately, without a checklist.

This episode is how federation actually works at the protocol level, because understanding the mechanism is what lets you design it securely. A federation setup with a trust policy that accepts assertions from any OIDC issuer is worse than no federation — it’s a false sense of security.


The Federation Model

Identity Provider (IdP)          Service Provider (SP) / Relying Party
  (Okta, Google, AD FS, Entra ID)       (AWS, Salesforce, GitHub, your app)
         │                                          │
         │  1. User authenticates to IdP             │
         │     (password + MFA)                      │
         │                                          │
         │  2. IdP generates a signed assertion      │
         │     (SAML response or OIDC ID Token)      │
         │ ──────────────────────────────────────── ▶│
         │                                          │
         │  3. SP validates the signature            │
         │     (using IdP's public certificate       │
         │      or JWKS endpoint)                    │
         │  4. SP maps identity to local permissions │
         │  5. SP grants access                      │

The SP never sees the user’s password. It never has one. It trusts the IdP’s cryptographic signature — if the assertion is signed with the IdP’s private key, and the SP trusts that key, the identity is accepted.

This trust chain has one critical property: the security of every SP is bounded by the security of the IdP. Compromise the IdP, and every system that trusts it is compromised. This is why IdP security deserves the same attention as the most sensitive system it gates access to.


SAML 2.0 — The Enterprise Standard

SAML (Security Assertion Markup Language) is XML-based, verbose, and battle-tested. Published in 2005, it’s the protocol behind most enterprise SSO deployments. When your company says “use your corporate login for this vendor app,” SAML is usually the mechanism.

How a SAML Login Flows

1. User visits AWS console (the Service Provider)
2. AWS checks: no active session → redirect to IdP
   → https://company.okta.com/saml?SAMLRequest=...
3. Okta authenticates the user (password, MFA)
4. Okta generates a SAML Assertion — a signed XML document containing:
   - Who the user is (Subject, typically email)
   - Their attributes (group memberships, custom attributes)
   - When the assertion was issued and when it expires (valid 5-10 minutes typically)
   - Which SP this is for (Audience restriction)
   - Okta's digital signature (RSA-SHA256 or similar)
5. Browser POSTs the assertion to AWS's ACS (Assertion Consumer Service) URL
6. AWS validates the signature against the Okta certificate in the metadata document registered with the IAM SAML provider
7. AWS reads the SAML attribute for the IAM role
8. AWS calls sts:AssumeRoleWithSAML → issues temporary credentials
9. User gets a console session — no AWS credentials were ever stored anywhere

What a SAML Assertion Actually Looks Like

<saml:Assertion>
  <saml:Issuer>https://okta.company.com</saml:Issuer>

  <saml:Subject>
    <saml:NameID>[email protected]</saml:NameID>
  </saml:Subject>

  <saml:AttributeStatement>
    <!-- This attribute tells AWS which IAM role to assume -->
    <saml:Attribute Name="https://aws.amazon.com/SAML/Attributes/Role">
      <saml:AttributeValue>
        arn:aws:iam::123456789012:role/EngineerRole,arn:aws:iam::123456789012:saml-provider/OktaProvider
      </saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>

  <!-- Critical: time bounds on this assertion -->
  <saml:Conditions NotBefore="2026-04-11T09:00:00Z" NotOnOrAfter="2026-04-11T09:05:00Z">
    <saml:AudienceRestriction>
      <!-- Critical: this assertion is ONLY valid for AWS -->
      <saml:Audience>https://signin.aws.amazon.com/saml</saml:Audience>
    </saml:AudienceRestriction>
  </saml:Conditions>

  <ds:Signature>... RSA-SHA256 signature over the above ...</ds:Signature>
</saml:Assertion>

The Audience restriction and the NotOnOrAfter timestamp are two of the most security-critical fields. The audience ensures this assertion can’t be reused for a different SP. The timestamp ensures it can’t be replayed after expiry.
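
Those two checks can be sketched with the standard library. This is illustration only: it assumes the signature was already verified by a SAML library, it parses with ElementTree (which is not hardened against malicious XML), it expects whole-second UTC timestamps, and the function name and 30-second skew are my own choices.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

SAMPLE = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Conditions NotBefore="2026-04-11T09:00:00Z" NotOnOrAfter="2026-04-11T09:05:00Z">
    <saml:AudienceRestriction>
      <saml:Audience>https://signin.aws.amazon.com/saml</saml:Audience>
    </saml:AudienceRestriction>
  </saml:Conditions>
</saml:Assertion>"""


def parse_instant(value: str) -> datetime:
    """Parse a whole-second UTC timestamp such as 2026-04-11T09:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def conditions_ok(assertion_xml, expected_audience, now, skew=timedelta(seconds=30)):
    """True if now falls inside the validity window and the audience matches."""
    root = ET.fromstring(assertion_xml)
    conditions = root.find("saml:Conditions", NS)
    if conditions is None:
        return False
    not_before = conditions.get("NotBefore")
    not_on_or_after = conditions.get("NotOnOrAfter")
    if not_before is None or not_on_or_after is None:
        return False  # stricter than the spec, which makes both optional
    if now + skew < parse_instant(not_before):
        return False  # not valid yet
    if now - skew >= parse_instant(not_on_or_after):
        return False  # expired
    audiences = [
        a.text.strip()
        for a in conditions.findall("saml:AudienceRestriction/saml:Audience", NS)
        if a.text
    ]
    return expected_audience in audiences
```

Calling conditions_ok(SAMPLE, "https://signin.aws.amazon.com/saml", now) with a time inside the five-minute window passes; the same assertion presented to a different audience, or after 09:05 plus skew, does not.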

Setting Up SAML Federation with AWS

# Register Okta as a SAML provider in AWS IAM
aws iam create-saml-provider \
  --saml-metadata-document file://okta-metadata.xml \
  --name OktaProvider

# Create the IAM role that federated users will assume
aws iam create-role \
  --role-name EngineerRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:saml-provider/OktaProvider"
      },
      "Action": "sts:AssumeRoleWithSAML",
      "Condition": {
        "StringEquals": {
          "SAML:aud": "https://signin.aws.amazon.com/saml"
        }
      }
    }]
  }'

# In Okta: configure the AWS Account Federation app (the IAM Identity Center app uses permission sets, not this role attribute)
# Attribute mapping: https://aws.amazon.com/SAML/Attributes/Role
# Value: arn:aws:iam::123456789012:role/EngineerRole,arn:aws:iam::123456789012:saml-provider/OktaProvider

# Set maximum session duration (8 hours is reasonable for human access)
aws iam update-role \
  --role-name EngineerRole \
  --max-session-duration 28800

SAML Attack Surface

Attack | What It Does | Why It Works | Prevention
XML Signature Wrapping (XSW) | Attacker inserts a malicious assertion, wraps it around the legitimate signed one; some SPs validate the wrong element | SAML’s XML structure is complex; naive signature validation checks the signed element, not the element the SP reads | Use a vetted SAML library; never hand-roll parsing
Assertion replay | Steal a valid assertion (e.g., via network intercept) and replay it before NotOnOrAfter | If the SP doesn’t track used assertion IDs, the same assertion can be used multiple times | Short expiry; SP tracks seen assertion IDs
Audience bypass | SP doesn’t verify the Audience field | An assertion issued for SP A can be used at SP B | Always validate Audience matches your SP entity ID
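
The replay prevention in the table ("SP tracks seen assertion IDs") needs very little state, because an ID only has to be remembered until its assertion expires. A sketch with an in-memory dictionary; the class name is hypothetical, and a multi-node SP would need a shared cache instead:

```python
import time


class AssertionReplayCache:
    """Remembers assertion IDs until their NotOnOrAfter has passed."""

    def __init__(self):
        self._seen = {}  # assertion_id -> NotOnOrAfter as epoch seconds

    def accept_once(self, assertion_id: str, not_on_or_after: float, now=None) -> bool:
        now = time.time() if now is None else now
        # Forget IDs whose assertions can no longer be replayed anyway
        self._seen = {aid: exp for aid, exp in self._seen.items() if exp > now}
        if now >= not_on_or_after:
            return False  # already expired
        if assertion_id in self._seen:
            return False  # replay
        self._seen[assertion_id] = not_on_or_after
        return True
```

Short assertion lifetimes keep this cache small, which is one more reason the 5 to 10 minute validity window matters.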

XML Signature Wrapping is the most interesting attack historically — it was how security researchers demonstrated SAML implementations in AWS, Google, and others could be bypassed before vendors patched their libraries. The lesson: SAML is complex enough that rolling your own parser is asking for a vulnerability.


OpenID Connect (OIDC) — The Modern Protocol

OIDC is JSON-based, REST-native, and designed for the web and API-first world. Built on top of OAuth 2.0, it’s the protocol behind “Sign in with Google,” GitHub’s OIDC tokens for Actions, and workload identity federation across cloud providers.

Token Anatomy

An OIDC ID Token is a JWT, made of three base64url-encoded parts separated by dots:

Header.Payload.Signature

Header:
{
  "alg": "RS256",           ← signing algorithm
  "kid": "key-id-123"       ← which key signed this (for JWKS rotation)
}

Payload (the claims):
{
  "iss": "https://accounts.google.com",         ← who issued this token
  "sub": "108378629573454321234",               ← stable user identifier (not email)
  "aud": "my-app-client-id",                   ← who this token is for
  "exp": 1749600000,                           ← expires at (Unix timestamp)
  "iat": 1749596400,                           ← issued at
  "email": "[email protected]",
  "email_verified": true,
  "hd": "company.com"                          ← hosted domain (Google Workspace)
}

Signature: RSA-SHA256(base64url(header) + "." + base64url(payload), idp_private_key)

The relying party (your application, or AWS STS) validates the signature using the IdP’s public keys, available at the JWKS endpoint (the jwks_uri listed in the IdP’s /.well-known/openid-configuration document; the exact path varies by IdP). The signature verification proves the token was issued by the expected IdP and hasn’t been tampered with since.
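
The kid in the header is how the relying party picks the right key out of the JWKS during rotation. A sketch of just that selection step, with the JWKS passed in as an already-fetched dict; the function name is made up, and the actual RSA verification is left to a JOSE library:

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def select_jwk(token: str, jwks: dict, allowed_algs=("RS256",)) -> dict:
    """Return the JWKS entry whose kid matches the token header."""
    header = json.loads(b64url_decode(token.split(".")[0]))
    # Pin the algorithm on the verifier side; never trust the header's alg blindly
    if header.get("alg") not in allowed_algs:
        raise ValueError("unexpected alg in token header")
    for key in jwks.get("keys", []):
        if key.get("kid") == header.get("kid") and key.get("use", "sig") == "sig":
            return key
    raise LookupError("no JWKS key matches kid; refetch the JWKS once, then reject")
```

An unknown kid usually means the IdP rotated keys since the JWKS was cached, so one refetch before rejecting is a reasonable policy. An unexpected alg is never a reason to retry.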

The Full OIDC Token Exchange (GitHub Actions → AWS)

# GitHub Actions automatically provides an OIDC token in the runner environment
# The token contains: iss=https://token.actions.githubusercontent.com, sub, repository, ref, sha, run_id, etc.

# Step 1: Fetch the OIDC token from GitHub's token service
TOKEN=$(curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" | jq -r '.value')

# Step 2: Present to AWS STS for exchange
aws sts assume-role-with-web-identity \
  --role-arn arn:aws:iam::123456789012:role/GitHubActionsRole \
  --role-session-name github-deploy \
  --web-identity-token "${TOKEN}"

# STS performs these validations:
# 1. Fetch GitHub's JWKS: https://token.actions.githubusercontent.com/.well-known/jwks
# 2. Verify signature is valid
# 3. Verify iss = "https://token.actions.githubusercontent.com" (matches OIDC provider)
# 4. Verify aud = "sts.amazonaws.com"
# 5. Verify sub matches the trust policy condition
# 6. Verify exp is in the future

The trust policy condition on the IAM role is what prevents any GitHub repository from assuming this role:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
        "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
      }
    }
  }]
}

The sub condition is the security boundary. repo:my-org/my-repo:ref:refs/heads/main means: only runs triggered from the main branch of my-org/my-repo can assume this role. A pull request from a fork, a run from a different repo, or a run from a different branch — all get a different sub claim and the assumption fails.

I’ve reviewed trust policies that omit the sub condition and just check aud. That means any GitHub Actions workflow, in any repository, owned by anyone, can assume that role. This is not a theoretical risk: anyone can create a public GitHub repository and run Actions workflows in it.
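
A toy model makes the difference concrete. The function below mimics only the exact-match logic of a StringEquals block; it is not how STS is implemented, and the claim keys are shortened (no provider prefix) for readability:

```python
def string_equals_allows(conditions: dict, claims: dict) -> bool:
    """Simplified StringEquals: every listed key must match the claim exactly."""
    return all(claims.get(key) == expected for key, expected in conditions.items())


# Trust policy conditions, reduced to the claims they test
PINNED = {
    "aud": "sts.amazonaws.com",
    "sub": "repo:my-org/my-repo:ref:refs/heads/main",
}
AUD_ONLY = {"aud": "sts.amazonaws.com"}

# Claims from two workflow runs: yours, and a stranger's public repository
own_main = {"aud": "sts.amazonaws.com", "sub": "repo:my-org/my-repo:ref:refs/heads/main"}
stranger = {"aud": "sts.amazonaws.com", "sub": "repo:someone-else/any-repo:ref:refs/heads/main"}
```

With PINNED, only own_main is allowed. With AUD_ONLY, stranger is allowed too, because nothing in the policy ever looks at sub and every workflow can request a token with the sts.amazonaws.com audience.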

OIDC Validation Checklist

Every application that validates OIDC tokens must check all of these:

✓ Signature valid (using IdP's JWKS endpoint — not a hardcoded key)
✓ iss matches the expected IdP URL
✓ aud matches your application's client ID (not just "any audience")
✓ exp is in the future
✓ nbf (not before), if present, is in the past
✓ iat is recent (within your clock skew tolerance)
✓ For workload identity: sub is pinned to the specific workload

Skipping aud validation is the most common mistake. A token issued for application A with aud: app-a-client-id should not be accepted by application B. Without audience validation, any application in your system that can obtain a token from the IdP can reuse it at any other application. Libraries like python-jose and jsonwebtoken can validate aud, but only against an expected audience value that you supply; depending on the library, leaving it unset either skips the check or rejects every token that carries an aud claim.


Enterprise Federation Patterns

Multi-Account AWS with IAM Identity Center + Okta

The pattern I deploy in every multi-account AWS environment:

Okta (IdP)
  └── IAM Identity Center
        ├── Account: prod     → Permission Sets: ReadOnly, DevOps
        ├── Account: staging  → Permission Sets: Developer  
        ├── Account: shared   → Permission Sets: NetworkAdmin, SecurityAudit
        └── Account: sandbox  → Permission Sets: Admin (sandbox only)

# Engineers access accounts through the Identity Center portal
aws configure sso
# Prompts: SSO start URL, region, account, role

aws sso login --profile prod-readonly

# List available accounts and roles (useful for tooling and scripts)
# TOKEN is the SSO access token that `aws sso login` caches under ~/.aws/sso/cache/
aws sso list-accounts --access-token "${TOKEN}"
aws sso list-account-roles --access-token "${TOKEN}" --account-id "${ACCOUNT_ID}"

# Get temporary credentials for a specific account/role
aws sso get-role-credentials \
  --account-id "${ACCOUNT_ID}" \
  --role-name ReadOnly \
  --access-token "${TOKEN}"

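When an engineer is offboarded from Okta, they can no longer sign in to any AWS account. Temporary credentials that were already issued stay valid until they expire, so keep session durations short. No individual IAM user deletion across 20 accounts. No access key hunting. One action in Okta cuts off every new session.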

Just-in-Time (JIT) Provisioning

Rather than creating user accounts in every downstream system ahead of time, JIT provisioning creates accounts on first login:

  1. User authenticates to IdP
  2. SAML/OIDC assertion includes group memberships and attributes
  3. SP receives assertion, checks if a user account exists for this sub
  4. If not: create the account with attributes from the assertion
  5. Grant access based on group claims
  6. On subsequent logins: update the account’s attributes if claims changed
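
Steps 3 to 6 can be sketched as a single function on the SP side. This assumes the assertion or token has already been validated, keeps accounts in a plain dict, and uses a made-up group-to-role mapping; none of the names come from a real SP:

```python
# Hypothetical mapping from IdP group claims to local roles
GROUP_TO_ROLE = {"engineers": "developer", "sre-team": "operator"}


def jit_login(accounts: dict, claims: dict) -> tuple[dict, bool]:
    """Create the account on first login, refresh it on later ones.

    Returns (account, created). Roles are recomputed from the group claims
    on every login, so removing a group in the IdP removes the role here.
    """
    key = (claims["iss"], claims["sub"])
    created = key not in accounts
    account = accounts.setdefault(key, {})
    account["email"] = claims.get("email")
    account["roles"] = sorted(
        {GROUP_TO_ROLE[g] for g in claims.get("groups", []) if g in GROUP_TO_ROLE}
    )
    return account, created
```

Recomputing roles on each login rather than merging them is the design choice that matters: access follows the IdP's current view of the user, not an accumulation of everything they were ever granted.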

The security property: when a user is disabled in the IdP, their account in downstream systems becomes inaccessible even if the account object still exists, because there is nothing to log in with. The JIT account outlives the IdP deletion only as an inactive shell; it carries little risk as long as the SP issued it no local password or long-lived API token.


The IdP Is the Trust Anchor — Protect It Accordingly

The entire security of a federated system is bounded by the security of the IdP. If an attacker can log into Okta as an admin, they can issue valid SAML assertions for any user, for any role, to any SP that trusts Okta. Every downstream system is compromised simultaneously.

This is not theoretical. In the 2023 Caesars and MGM Resorts attacks, initial access was reportedly achieved through social engineering of IT help desk and support staff who could reset identity credentials, not through technical exploitation of cloud infrastructure. Once identity infrastructure is compromised, everything downstream follows.

What this means practically:

  • MFA for all IdP admin accounts — hardware FIDO2 keys, not TOTP. TOTP codes can be phished in real-time. Hardware keys cannot.
  • PIM / JIT access for IdP configuration changes — no standing admin access
  • Separate monitoring and alerting for IdP admin activity
  • Audit who can modify SAML/OIDC configurations and attribute mappings in the IdP — these are the levers for privilege escalation
  • Narrow audience restrictions — configure which SPs can receive assertions; don’t create a wildcard IdP configuration that serves all SPs

Conditional Access — Adding Context to Federation

Modern IdPs support Conditional Access policies that restrict when assertions are issued:

// Entra ID Conditional Access: require MFA + compliant device for AWS access
{
  "conditions": {
    "applications": {
      "includeApplications": ["AWS-Application-ID-in-Entra"]
    },
    "users": {
      "includeGroups": ["all-employees"]
    },
    "locations": {
      "excludeLocations": ["NamedLocation-CorporateNetwork"]
    }
  },
  "grantControls": {
    "operator": "AND",
    "builtInControls": ["mfa", "compliantDevice"]
  }
}

This policy: when an employee accesses AWS from outside the corporate network, they must use MFA on a device that MDM has verified as compliant. From inside the corporate network, the named-location exclusion takes the sign-in out of this policy’s scope entirely, so any requirements for on-network access have to come from a separate policy.
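
A simplified model of how such a policy evaluates, assuming that an excluded location takes the request out of the policy's scope. The request fields (application, groups, location, satisfied_controls) are invented for this sketch and are not part of any Entra ID API:

```python
POLICY = {
    "conditions": {
        "applications": {"includeApplications": ["AWS-Application-ID-in-Entra"]},
        "users": {"includeGroups": ["all-employees"]},
        "locations": {"excludeLocations": ["NamedLocation-CorporateNetwork"]},
    },
    "grantControls": {"operator": "AND", "builtInControls": ["mfa", "compliantDevice"]},
}


def evaluate(policy: dict, request: dict) -> str:
    """Return 'not_applicable', 'grant', or 'block' for one sign-in request."""
    cond = policy["conditions"]
    in_scope = (
        request["application"] in cond["applications"]["includeApplications"]
        and bool(set(request["groups"]) & set(cond["users"]["includeGroups"]))
        and request["location"] not in cond["locations"].get("excludeLocations", [])
    )
    if not in_scope:
        return "not_applicable"  # this policy imposes nothing on the request

    required = set(policy["grantControls"]["builtInControls"])
    satisfied = set(request["satisfied_controls"])
    if policy["grantControls"]["operator"] == "AND":
        ok = required <= satisfied  # every control must be met
    else:
        ok = bool(required & satisfied)  # any one control is enough
    return "grant" if ok else "block"
```

A request from NamedLocation-CorporateNetwork returns not_applicable regardless of which controls were satisfied, which is worth keeping in mind when deciding how far to trust the corporate network.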

Conditional Access is how you move beyond “authenticated to IdP” as the only gate. Device health, network location, risk score — these become inputs to the access decision.


Framework Alignment

Framework | Reference | What It Covers Here
CISSP | Domain 5: Identity and Access Management | Federation is the mechanism for extending identity trust across organizational boundaries
CISSP | Domain 3: Security Architecture | Trust relationships must be explicitly designed; overly broad federation trust is an architectural failure
ISO 27001:2022 | 5.19 Information security in supplier relationships | Federation with third-party IdPs and SPs establishes a cross-organizational trust boundary that must be governed
ISO 27001:2022 | 8.5 Secure authentication | SAML and OIDC are the secure authentication protocols for federated access; token validation requirements
ISO 27001:2022 | 5.17 Authentication information | Credential lifecycle in federated systems; no passwords distributed to SPs; IdP manages authentication
SOC 2 | CC6.1 | Federated identity is the access control mechanism for human access to cloud environments
SOC 2 | CC6.6 | Logical access from outside system boundaries; federation with external IdPs and partner organizations

Key Takeaways

  • Federation means downstream systems trust the IdP’s signed assertion — they never see credentials and don’t need to manage them independently
  • SAML is XML-based, browser-oriented, widely supported for enterprise SSO; OIDC is JWT-based, API-friendly, the protocol for modern workload identity and consumer SSO
  • In OIDC, the sub condition in trust policies is what prevents any workload from assuming any role — omitting it is a critical misconfiguration
  • Validate all JWT claims: signature, iss, aud, exp, sub — libraries do this, but they need correct configuration
  • The IdP is the trust anchor — its security posture bounds the security of every system that trusts it. Treat IdP admin access with the same controls as your most sensitive systems.
  • JIT provisioning and Conditional Access extend federation from “who are you” to “are you in an appropriate context right now”

What’s Next

EP11 brings this into Kubernetes — RBAC, service account tokens, and how the Kubernetes authorization layer interacts with cloud IAM. Two separate systems, both requiring security. A gap in either becomes a gap in both.

Next: Kubernetes RBAC and AWS IAM

Get EP11 in your inbox when it publishes → linuxcent.com/subscribe

OIDC Workload Identity: Eliminate Cloud Access Keys Entirely

Reading Time: 12 minutes

Meta Description: Replace static cloud credentials with OIDC workload identity — eliminate key rotation entirely for Lambda, GKE, and EKS workloads in production.




TL;DR

  • Workload identity federation replaces static cloud access keys with short-lived tokens tied to runtime identity — no key to rotate, no secret to leak
  • The OIDC token exchange pattern is consistent across AWS (IRSA / Pod Identity), GCP (Workload Identity), and Azure (AKS Workload Identity) — learn one, translate the others
  • AWS EKS: use Pod Identity for new clusters; IRSA is the pattern for existing ones — both eliminate static keys
  • GCP GKE: --workload-pool at cluster level + roles/iam.workloadIdentityUser binding on the GCP service account
  • Azure AKS: federated credential on a managed identity + azure.workload.identity/use: "true" pod label
  • Cross-cloud federation works: an AWS IAM role can call GCP APIs without a GCP key file on the AWS side
  • Enforce IMDSv2 everywhere; pin OIDC trust conditions to specific service account names; give each workload its own identity

The Big Picture

  WORKLOAD IDENTITY FEDERATION — BEFORE AND AFTER

  ── STATIC CREDENTIALS (the broken model) ────────────────────────────────

  IAM user created → access key generated
         ↓
  Key distributed to pods / CI / servers → stored in Secrets, env vars, .env
         ↓
  Valid indefinitely — never expires on its own
         ↓
  Rotation is manual, painful, deferred ("there's a ticket for that")
         ↓
  Key proliferates across environments — you lose track of every copy
         ↓
  Leaked key → unlimited blast radius until someone notices and revokes it

  ── WORKLOAD IDENTITY FEDERATION (the current model) ─────────────────────

  No key created. No key distributed. No key to rotate.

  Workload starts → requests signed JWT from its native IdP
         │           (EKS OIDC issuer, GitHub Actions, GKE metadata server)
         ↓
  JWT carries workload claims: namespace, service account, repo, instance ID
         ↓
  Cloud STS / token endpoint validates JWT signature + trust conditions
         ↓
  Short-lived credential issued  (AWS STS: 1h default, 15m–12h  |  GCP/Azure: ~1h)
         ↓
  Credential expires automatically — nothing to clean up
         ↓
  Token stolen → usable only until it expires (typically ~1 hour), audience-bound

Workload identity federation is the architectural answer to static credential sprawl. The workload’s proof of identity is its runtime environment — the cluster it runs in, the repository it belongs to, the service account it uses. The cloud provider never issues a persistent secret. This episode covers how that exchange works across all three clouds and Kubernetes.


Introduction

Workload identity federation eliminates static cloud credentials by replacing them with short-lived tokens that the runtime environment generates and the cloud provider validates against a registered trust relationship. No key to distribute, no rotation schedule to maintain, no proliferation to track.

A while back I was reviewing a Kubernetes cluster that had been running in production for about two years. The team had done good work — solid app code, reasonable cluster configuration. But when I started looking at how pods were authenticating to AWS, I found what I find in roughly 60% of environments I look at.

Twelve service accounts. Twelve access key pairs. Keys created 6 to 24 months ago. Stored as Kubernetes Secrets. Mounted into pods as environment variables. Never rotated because “the app would need to be restarted” and nobody owned the rotation schedule. Two of the keys belonged to AWS IAM users who no longer worked at the company — the users had been deactivated, but the access keys were still valid because in AWS, access keys live independently of console login status.

When I asked who was responsible for rotating these, the answer I got was: “There’s a ticket for that.”

There’s always a ticket for that.

The engineering problem here isn’t that the team was careless. It’s that static credentials are fundamentally unmanageable at scale. Workload identity removes the problem at its root.


Why Static Credentials Are the Wrong Model for Machines

Before getting into solutions, let me be precise about why this is a security problem, not just an operational inconvenience.

Static credentials have four fundamental failure modes:

They don’t expire. An AWS access key created in 2022 is valid in 2026 unless someone explicitly rotates it. GitGuardian’s 2024 data puts the average time from secret creation to detection at 328 days. That’s almost a year of exposure window before anyone even knows.

They lose origin context. When an API call arrives at AWS with an access key, the authorization system can tell you what key was used — not whether it was used by your Lambda function, by a developer debugging something, or by an attacker using a stolen copy. Static credentials are context-blind.

They proliferate invisibly. One key, distributed to a team, copied into three environments, cached on developer laptops, stored in a CI/CD pipeline, pasted into a config file in a test environment that got committed. By the time you need to rotate it, you don’t know all the places it lives.

Rotation is operationally painful. Creating a new key, updating every place the old key lives, removing the old key — while ensuring nothing breaks during the transition — is a coordination exercise that organizations consistently defer. Every month the rotation doesn’t happen is another month of accumulated risk.

Workload identity solves all four by replacing persistent credentials with short-lived tokens that are generated from the runtime environment and verified by the cloud provider against a registered trust relationship.


The OIDC Exchange — What’s Actually Happening

All three major cloud providers have converged on the same underlying mechanism: OIDC token exchange.

Workload (pod, GitHub Actions runner, EC2 instance, on-prem server)
    │
    │  1. Request a signed JWT from the native identity provider
    │     (EKS OIDC server, GitHub's token.actions.githubusercontent.com,
    │      GKE metadata server, Azure IMDS)
    ▼
Native IdP issues a JWT. It contains claims about the workload:
    - What repository triggered this CI run
    - What Kubernetes namespace and service account this pod uses
    - What EC2 instance ID this request came from
    │
    │  2. Workload presents the JWT to the cloud STS / federation endpoint
    ▼
Cloud IAM evaluates:
    - Is the JWT signature valid? (verified against the IdP's public keys)
    - Does the issuer match a registered trust relationship?
    - Do the claims match the conditions in the trust policy?
    │
    │  3. If all checks pass: short-lived cloud credentials issued
    │     (AWS: temporary STS credentials, default 1 hour, configurable 15 minutes to 12 hours)
    │     (GCP: OAuth2 access token, expiry ~1 hour)
    │     (Azure: access token, expiry ~1 hour)
    ▼
Workload calls cloud API with short-lived credentials.
Credentials expire. Nothing to clean up. Nothing to rotate.

No static secret is stored anywhere. The workload’s identity is its runtime environment — the cluster it runs in, the repository it belongs to, the service account it uses. If someone steals the short-lived token, it expires in an hour. If someone tries to use a token for a different resource than it was issued for, the audience claim doesn’t match and it’s rejected.
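The checks in step 2 can be sketched in a few lines of Python. This is a minimal illustration, not any cloud provider's implementation: it assumes the JWT signature has already been verified against the IdP's public keys (which needs a JWKS fetch and a crypto library, not shown), and the evaluate_trust function, the trust dictionary layout, and the example issuer URL are all hypothetical.

```python
import time
from typing import Optional, Tuple


def evaluate_trust(claims: dict, trust: dict, now: Optional[float] = None) -> Tuple[bool, str]:
    """Check issuer, audience, subject and expiry of an already signature-verified JWT payload."""
    now = time.time() if now is None else now
    if claims.get("iss") != trust["issuer"]:
        return False, "issuer is not a registered trust relationship"
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # "aud" may be a string or a list
    if trust["audience"] not in audiences:
        return False, "audience mismatch"
    if claims.get("sub") != trust["subject"]:
        return False, "subject does not match the trust condition"
    if claims.get("exp", 0) <= now:
        return False, "token expired"
    return True, "ok"


trust = {
    "issuer": "https://oidc.example.test/id/CLUSTER",  # hypothetical issuer URL
    "audience": "sts.amazonaws.com",
    "subject": "system:serviceaccount:production:app-backend",
}
claims = {
    "iss": "https://oidc.example.test/id/CLUSTER",
    "aud": ["sts.amazonaws.com"],
    "sub": "system:serviceaccount:production:app-backend",
    "exp": 2_000,
}

# Valid token presented to the endpoint it was issued for
print(evaluate_trust(claims, trust, now=1_000))
# Same identity, but the token was minted for a different audience
print(evaluate_trust({**claims, "aud": "api://AzureADTokenExchange"}, trust, now=1_000))
```

The order of checks is a design choice in this sketch; what matters is that every one of them has to pass before any credential is issued.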


AWS: IRSA and Pod Identity for EKS

IRSA — The Original Pattern

IRSA (IAM Roles for Service Accounts) federates a Kubernetes service account identity with an AWS IAM role. Each pod’s service account is the proof of identity; AWS issues temporary credentials in exchange for the OIDC JWT.

# Step 1: get the OIDC issuer URL for your EKS cluster
OIDC_ISSUER=$(aws eks describe-cluster \
  --name my-cluster \
  --query "cluster.identity.oidc.issuer" \
  --output text)

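# Step 2: register this OIDC issuer with IAM
# Fingerprint the certificate served by the issuer host (host only, no /id/... path), as SHA-1 hex.
# AWS guidance is to thumbprint the top intermediate CA in the chain; check the current docs
# before relying on this value.
OIDC_HOST=$(echo "${OIDC_ISSUER#https://}" | cut -d/ -f1)
THUMBPRINT=$(openssl s_client -servername "${OIDC_HOST}" -connect "${OIDC_HOST}:443" \
  </dev/null 2>/dev/null \
  | openssl x509 -fingerprint -sha1 -noout | cut -d= -f2 | tr -d ':')

aws iam create-open-id-connect-provider \
  --url "${OIDC_ISSUER}" \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list "${THUMBPRINT}"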

# Step 3: create an IAM role with a trust policy scoped to a specific service account
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
OIDC_ID="${OIDC_ISSUER#https://}"

cat > irsa-trust.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_ID}"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "${OIDC_ID}:sub": "system:serviceaccount:production:app-backend",
        "${OIDC_ID}:aud": "sts.amazonaws.com"
      }
    }
  }]
}
EOF

aws iam create-role \
  --role-name app-backend-s3-role \
  --assume-role-policy-document file://irsa-trust.json

aws iam put-role-policy \
  --role-name app-backend-s3-role \
  --policy-name AppBackendPolicy \
  --policy-document file://app-backend-policy.json

# Step 4: annotate the Kubernetes service account with the role ARN
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-backend
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-backend-s3-role

The Amazon EKS Pod Identity Webhook (the mutating webhook behind IRSA, despite sharing a name with the newer EKS Pod Identity feature described below) injects two environment variables into any pod using this service account: AWS_WEB_IDENTITY_TOKEN_FILE, which points to a projected token, and AWS_ROLE_ARN. The AWS SDK reads these automatically. The application doesn’t know any of this is happening. It just calls S3 and it works, using credentials that were never stored anywhere and expire automatically.
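When debugging, it helps to look at what the projected token actually claims. The following Python sketch decodes the payload of the JWT referenced by AWS_WEB_IDENTITY_TOKEN_FILE. It is for inspection only and deliberately does not verify the signature; the helper names and the demo token are made up for illustration.

```python
import base64
import json
import os


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def read_projected_token(env=os.environ) -> dict:
    """Read the token file the webhook points to and return its claims."""
    with open(env["AWS_WEB_IDENTITY_TOKEN_FILE"], encoding="utf-8") as fh:
        return decode_jwt_payload(fh.read().strip())


def _b64url(data: dict) -> str:
    """Helper for the demo below: encode a dict as an unpadded base64url segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Demo with a fabricated, unsigned token so the sketch runs outside a cluster
demo = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "system:serviceaccount:production:app-backend",
             "aud": ["sts.amazonaws.com"]}),
    "sig",
])
print(decode_jwt_payload(demo)["sub"])
```

Inside a pod, calling read_projected_token() and checking the sub and aud claims is a quick way to confirm the pod is presenting the identity the trust policy expects.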

The trust policy’s sub condition is the security boundary. system:serviceaccount:production:app-backend means: only pods in the production namespace using the app-backend service account can assume this role. A pod in a different namespace, even with the same service account name, gets a different sub claim and the assumption fails.
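A small Python sketch shows why the exact match matters. It models StringEquals as plain equality and approximates StringLike with fnmatch, which is an assumption for illustration only: fnmatch also interprets bracket expressions, which IAM does not, and this is not how AWS evaluates conditions internally.

```python
from fnmatch import fnmatchcase


def string_equals(claim: str, expected: str) -> bool:
    """Exact, case-sensitive comparison, like an IAM StringEquals condition."""
    return claim == expected


def string_like(claim: str, pattern: str) -> bool:
    """Rough stand-in for IAM StringLike (* and ? wildcards), using fnmatch as an approximation."""
    return fnmatchcase(claim, pattern)


pinned = "system:serviceaccount:production:app-backend"
wildcard = "system:serviceaccount:production:*"

# Same service account name, different namespace: the sub claim differs, so the match fails
print(string_equals("system:serviceaccount:staging:app-backend", pinned))
# The intended workload matches the pinned condition
print(string_equals("system:serviceaccount:production:app-backend", pinned))
# A wildcard condition lets any service account in the namespace through
print(string_like("system:serviceaccount:production:debug-shell", wildcard))
```

The last case is the one to watch for in reviews: a pattern scoped to the namespace quietly admits every workload that ever lands there.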

EKS Pod Identity — The Simpler Modern Approach

AWS released Pod Identity as a simpler alternative to IRSA. No OIDC provider setup, no manual trust policy with OIDC conditions:

# Enable the Pod Identity agent addon on the cluster
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name eks-pod-identity-agent

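# Create the association. This replaces the OIDC provider setup, but the role's trust policy must
# allow the pods.eks.amazonaws.com service principal to call sts:AssumeRole and sts:TagSession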
aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace production \
  --service-account app-backend \
  --role-arn arn:aws:iam::123456789012:role/app-backend-s3-role

Same result, less ceremony. For new clusters, Pod Identity is the path I’d recommend. IRSA remains important to understand for the many existing clusters already using it.
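One detail the association command does not show is the role's trust policy. With Pod Identity, the role trusts the pods.eks.amazonaws.com service principal rather than an OIDC provider. A minimal Python sketch that generates such a policy document follows; the function name is hypothetical, and you may want to add conditions to scope it further.

```python
import json


def pod_identity_trust_policy() -> dict:
    """Trust policy allowing the EKS Pod Identity service to assume the role and tag the session."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


# Print the document so it can be saved and passed to the role as its trust policy
print(json.dumps(pod_identity_trust_policy(), indent=2))
```

Because the same document works for every cluster, a role set up this way can be reused across clusters without registering a new OIDC provider each time.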

IAM Roles Anywhere — For On-Premises Workloads

Not everything runs in Kubernetes. For on-premises servers and workloads outside AWS, IAM Roles Anywhere issues temporary credentials to servers that present an X.509 certificate signed by a trusted CA:

# Register your internal CA as a trust anchor (x509CertificateData takes the PEM text itself)
aws rolesanywhere create-trust-anchor \
  --name "OnPremCA" \
  --enabled \
  --source "sourceData={x509CertificateData=$(cat ca-cert.pem)},sourceType=CERTIFICATE_BUNDLE"

# Create a profile listing the roles that sessions may assume
# (the trust anchor is supplied per request by the signing helper, not on the profile)
aws rolesanywhere create-profile \
  --name "OnPremServers" \
  --enabled \
  --role-arns "arn:aws:iam::123456789012:role/OnPremAppRole"

# On the on-prem server — exchange the certificate for AWS credentials
aws_signing_helper credential-process \
  --certificate /etc/pki/server.crt \
  --private-key /etc/pki/server.key \
  --trust-anchor-arn "${TRUST_ANCHOR_ARN}" \
  --profile-arn "${PROFILE_ARN}" \
  --role-arn "arn:aws:iam::123456789012:role/OnPremAppRole"

The server’s certificate (managed by your internal PKI or AWS Private CA, formerly ACM Private CA) is the proof of identity. No access key is distributed to the server, just a certificate that your CA signed. To revoke it, import your CA’s certificate revocation list (CRL) into IAM Roles Anywhere.


GCP: Workload Identity for GKE

For GKE clusters, Workload Identity is enabled at the cluster level and creates a bridge between Kubernetes service accounts and GCP service accounts:

# Enable Workload Identity on the cluster
gcloud container clusters update my-cluster \
  --workload-pool=my-project.svc.id.goog

# Enable on the node pool (required for the metadata server to work)
gcloud container node-pools update default-pool \
  --cluster=my-cluster \
  --workload-metadata=GKE_METADATA

# Create the GCP service account for the workload
gcloud iam service-accounts create app-backend \
  --project=my-project

SA_EMAIL="app-backend@my-project.iam.gserviceaccount.com"

# Grant the GCP SA the permissions it needs
gcloud storage buckets add-iam-policy-binding gs://app-data \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/storage.objectViewer"

# Create the trust relationship: K8s SA → GCP SA
gcloud iam service-accounts add-iam-policy-binding "${SA_EMAIL}" \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:my-project.svc.id.goog[production/app-backend]"

# Annotate the Kubernetes service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-backend
  namespace: production
  annotations:
    iam.gke.io/gcp-service-account: app-backend@my-project.iam.gserviceaccount.com

When the pod makes a GCP API call using ADC (Application Default Credentials), the GKE metadata server intercepts the credential request. It validates the pod’s Kubernetes identity, checks the IAM binding, and returns a short-lived GCP access token. The GCP service account key file never exists. There’s nothing to protect, nothing to rotate, nothing to leak.
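For a sense of what ADC does under the hood, here is a minimal Python sketch of the credential request a client makes to the metadata server. The URL and the Metadata-Flavor header are the standard GCP metadata conventions; the function names are hypothetical, and fetch_access_token only works from inside a GKE pod or other GCP compute environment.

```python
import json
import urllib.request

METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request() -> urllib.request.Request:
    """Build the metadata-server request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def fetch_access_token(timeout: float = 2.0) -> dict:
    """Call the metadata server and return the parsed JSON response.

    The response is expected to contain access_token, expires_in and token_type.
    """
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        return json.load(resp)


# Building the request is safe anywhere; sending it requires running on GCP
req = build_token_request()
print(req.get_method(), req.full_url)
```

In practice you would let the client library do this for you. The sketch is mainly useful for confirming from a debug shell that the pod is getting a token for the service account you expect.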


Azure: Workload Identity for AKS

Azure’s workload identity for Kubernetes replaced the older AAD Pod Identity approach — which required a DaemonSet, had known TOCTOU vulnerabilities, and was operationally fragile. The current implementation uses the OIDC pattern:

# Enable OIDC issuer and workload identity on the AKS cluster
az aks update \
  --name my-aks \
  --resource-group rg-prod \
  --enable-oidc-issuer \
  --enable-workload-identity

# Get the OIDC issuer URL for this cluster
OIDC_ISSUER=$(az aks show \
  --name my-aks --resource-group rg-prod \
  --query "oidcIssuerProfile.issuerUrl" -o tsv)

# Create a user-assigned managed identity for the workload
az identity create --name app-backend-identity --resource-group rg-identities
CLIENT_ID=$(az identity show --name app-backend-identity -g rg-identities --query clientId -o tsv)
PRINCIPAL_ID=$(az identity show --name app-backend-identity -g rg-identities --query principalId -o tsv)

# Grant the identity the access it needs
az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/SUB_ID/resourceGroups/rg-prod/providers/Microsoft.Storage/storageAccounts/appstore

# Federate: trust the K8s service account from this cluster
az identity federated-credential create \
  --name aks-app-backend-binding \
  --identity-name app-backend-identity \
  --resource-group rg-identities \
  --issuer "${OIDC_ISSUER}" \
  --subject "system:serviceaccount:production:app-backend" \
  --audience "api://AzureADTokenExchange"

# Annotate the Kubernetes service account and label the pod
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-backend
  namespace: production
  annotations:
    azure.workload.identity/client-id: "CLIENT_ID_HERE"
---
apiVersion: v1
kind: Pod
metadata:
  name: app-backend
  labels:
    azure.workload.identity/use: "true"   # triggers token injection
spec:
  serviceAccountName: app-backend
  containers:
  - name: app
    image: my-app:latest
    # Azure SDK DefaultAzureCredential picks up the injected token automatically
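When a pod fails to authenticate, the first thing worth checking is whether the workload identity webhook actually injected its settings. A minimal Python sketch of that check follows, assuming the webhook sets the AZURE_CLIENT_ID, AZURE_TENANT_ID and AZURE_FEDERATED_TOKEN_FILE environment variables; the function name and the sample values are hypothetical.

```python
import os

# Environment variables the workload identity webhook is expected to inject
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")


def missing_workload_identity_vars(env=None) -> list:
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example: a pod that is missing the azure.workload.identity/use label gets no injection
unlabelled_pod_env = {"PATH": "/usr/bin"}
labelled_pod_env = {
    "AZURE_CLIENT_ID": "00000000-0000-0000-0000-000000000000",   # placeholder value
    "AZURE_TENANT_ID": "11111111-1111-1111-1111-111111111111",   # placeholder value
    "AZURE_FEDERATED_TOKEN_FILE": "/var/run/secrets/azure/tokens/azure-identity-token",
}

print(missing_workload_identity_vars(unlabelled_pod_env))
print(missing_workload_identity_vars(labelled_pod_env))
```

An empty result means the injection happened; anything else usually points to a missing pod label or service account annotation rather than a permissions problem.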

Cross-Cloud Federation — When AWS Talks to GCP

The same federation pattern works cross-cloud, although here GCP trusts a signed AWS request rather than an OIDC JWT. An AWS Lambda or EC2 instance can call GCP APIs without any GCP service account key on the AWS side:

# GCP side: create a workload identity pool that trusts AWS
gcloud iam workload-identity-pools create "aws-workloads" --location=global

gcloud iam workload-identity-pools providers create-aws "aws-provider" \
  --location=global \
  --workload-identity-pool="aws-workloads" \
  --account-id="AWS_ACCOUNT_ID"

# Bind the specific AWS role to the GCP service account
gcloud iam service-accounts add-iam-policy-binding GCP_SA_EMAIL \
  --role=roles/iam.workloadIdentityUser \
  --member="principalSet://iam.googleapis.com/projects/GCP_PROJ_NUM/locations/global/workloadIdentityPools/aws-workloads/attribute.aws_role/arn:aws:sts::AWS_ACCOUNT:assumed-role/MyAWSRole"

The AWS workload signs a GetCallerIdentity request with its STS-issued credentials and sends it to GCP’s token exchange endpoint. GCP has AWS validate that signed request, checks the attribute mapping (only MyAWSRole from that AWS account), and issues a short-lived GCP access token. No GCP service account key is ever distributed to the AWS side.


The Threat Model — What Workload Identity Doesn’t Solve

Workload identity dramatically reduces the attack surface, but it doesn’t eliminate it:

Threat | What Still Applies | Mitigation
Token theft from the container filesystem | The projected token is readable by anyone with container filesystem access | Short TTL (default 1h); tokens are audience-bound, so a token minted for one audience is rejected by services expecting another
SSRF to metadata service | An SSRF vulnerability can fetch credentials from the metadata endpoint | Enforce IMDSv2 on AWS; use metadata server restrictions on GKE/AKS
Overpermissioned service account | Workload identity doesn’t enforce least privilege; the SA can still be over-granted | One SA per workload; review permissions against actual usage
Trust policy too broad | OIDC trust policy allows any service account in a namespace | Always pin to specific SA name in the sub condition

The SSRF-to-metadata-service path deserves particular attention. IMDSv2 requires a PUT request to obtain a session token before any metadata request is served. That blocks most SSRF scenarios, because a simple SSRF can typically only issue GET requests and cannot set custom headers. Enforce it:

# Enforce IMDSv2 at instance launch
aws ec2 run-instances \
  --metadata-options HttpTokens=required,HttpPutResponseHopLimit=1

# Enforce org-wide via SCP — no instance can launch without IMDSv2
{
  "Effect": "Deny",
  "Action": "ec2:RunInstances",
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "StringNotEquals": {
      "ec2:MetadataHttpTokens": "required"
    }
  }
}
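The two-step IMDSv2 exchange is easy to see in code. The Python sketch below builds the PUT request for a session token and the follow-up GET that carries it. The endpoint and header names are the standard IMDSv2 ones; the function names are hypothetical, and read_metadata only works on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def build_imds_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT request asking IMDS for a session token with the given TTL."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET request for metadata that presents the session token in a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def read_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps against the real endpoint (EC2 only)."""
    with urllib.request.urlopen(build_imds_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()


# Inspect the requests without sending them
step1 = build_imds_token_request()
step2 = build_metadata_request("iam/security-credentials/", "example-token")
print(step1.get_method(), step1.full_url)
print(step2.get_method(), step2.full_url)
```

An attacker who can only make a vulnerable server issue plain GET requests never gets past step 1, which is the whole point of the design.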

⚠ Production Gotchas

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 1 — Trust policy scoped to namespace, not service account ║
║                                                                      ║
║  A condition like "sub": "system:serviceaccount:production:*"        ║
║  grants any pod in the production namespace the ability to assume    ║
║  the role. A compromised or new workload in that namespace gets      ║
║  access automatically.                                               ║
║                                                                      ║
║  Fix: always pin the sub condition to the exact service account      ║
║  name. "system:serviceaccount:production:app-backend" — not a glob.  ║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 2 — Shared service accounts across workloads             ║
║                                                                      ║
║  Reusing one service account for multiple workloads saves setup      ║
║  time and creates a lateral movement path. A compromised workload    ║
║  that shares a service account with a payment processor has payment  ║
║  processor permissions.                                              ║
║                                                                      ║
║  Fix: one service account per workload. The overhead is low.         ║
║  The blast radius reduction is significant.                          ║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 3 — IMDSv1 still reachable after enabling IMDSv2        ║
║                                                                      ║
║  Enabling IMDSv2 on new instances doesn't affect existing ones.      ║
║  The SCP approach enforces it at the org level going forward, but    ║
║  existing instances need explicit remediation.                       ║
║                                                                      ║
║  Fix: audit existing instances for IMDSv1 exposure.                 ║
║  aws ec2 describe-instances --query                                  ║
║    "Reservations[].Instances[?MetadataOptions.HttpTokens!='required']║
║    .[InstanceId,Tags]"                                               ║
╚══════════════════════════════════════════════════════════════════════╝
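If you would rather post-process the JSON than write a JMESPath query, the audit from Gotcha 3 can be sketched in Python. This assumes you have saved the output of aws ec2 describe-instances as JSON; the function name and the sample instance IDs are hypothetical.

```python
def instances_allowing_imdsv1(describe_output: dict) -> list:
    """Return IDs of instances whose metadata endpoint is on and does not require tokens."""
    flagged = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            endpoint_on = options.get("HttpEndpoint", "enabled") == "enabled"
            if endpoint_on and options.get("HttpTokens") != "required":
                flagged.append(instance["InstanceId"])
    return flagged


# Sample data shaped like describe-instances output (IDs are made up)
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"}},
            {"InstanceId": "i-0bbb", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"}},
        ]},
        {"Instances": [
            {"InstanceId": "i-0ccc", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "disabled"}},
        ]},
    ]
}

print(instances_allowing_imdsv1(sample))
```

Skipping instances with the metadata endpoint disabled is a choice made in this sketch; drop that check if you want every instance that is not explicitly set to required.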

Quick Reference

┌────────────────────────────────┬───────────────────────────────────────────────────────┐
│ Term                           │ What it means                                         │
├────────────────────────────────┼───────────────────────────────────────────────────────┤
│ Workload identity federation   │ OIDC-based exchange: runtime JWT → short-lived token  │
│ IRSA                           │ IAM Roles for Service Accounts — EKS + OIDC pattern   │
│ EKS Pod Identity               │ Newer, simpler IRSA replacement — no OIDC setup       │
│ GKE Workload Identity          │ K8s SA → GCP SA via workload pool + IAM binding       │
│ AKS Workload Identity          │ K8s SA → managed identity via federated credential    │
│ IAM Roles Anywhere             │ AWS temp credentials for on-prem via X.509 cert       │
│ IMDSv2                         │ Token-gated AWS metadata service — blocks SSRF        │
│ OIDC sub claim                 │ Workload's unique identity string — use for pinning   │
│ Projected service account token│ K8s-injected JWT — the OIDC token pods present to AWS │
└────────────────────────────────┴───────────────────────────────────────────────────────┘

Key commands:
┌────────────────────────────────────────────────────────────────────────────────────────┐
│  # AWS — list OIDC providers registered in this account                               │
│  aws iam list-open-id-connect-providers                                               │
│                                                                                        │
│  # AWS — list Pod Identity associations for a cluster                                 │
│  aws eks list-pod-identity-associations --cluster-name my-cluster                     │
│                                                                                        │
│  # AWS — verify what credentials a pod is actually using                              │
│  aws sts get-caller-identity   # run from inside the pod                              │
│                                                                                        │
│  # AWS — audit instances missing IMDSv2                                               │
│  aws ec2 describe-instances \                                                          │
│    --query "Reservations[].Instances[?MetadataOptions.HttpTokens!='required']          │
│    .[InstanceId]" --output text                                                        │
│                                                                                        │
│  # GCP — verify workload identity binding on a GCP service account                   │
│  gcloud iam service-accounts get-iam-policy SA_EMAIL                                  │
│                                                                                        │
│  # GCP — list workload identity pools                                                 │
│  gcloud iam workload-identity-pools list --location=global                            │
│                                                                                        │
│  # Azure — list federated credentials on a managed identity                           │
│  az identity federated-credential list \                                               │
│    --identity-name app-backend-identity --resource-group rg-identities                │
└────────────────────────────────────────────────────────────────────────────────────────┘

Framework Alignment

Framework | Reference | What It Covers Here
CISSP | Domain 5: Identity and Access Management | Non-human identities dominate cloud environments; workload identity federation is the modern machine authentication pattern
CISSP | Domain 1: Security & Risk Management | Static credential sprawl is a measurable, eliminable risk; workload identity removes it at the root
ISO 27001:2022 | 5.17 Authentication information | Managing machine credentials: workload identity replaces long-lived secrets with short-lived, environment-bound tokens
ISO 27001:2022 | 8.5 Secure authentication | OIDC token exchange is the secure authentication mechanism for machine identities
ISO 27001:2022 | 5.18 Access rights | Service account provisioning and deprovisioning: workload identity ties access to the runtime environment, not a stored secret
SOC 2 | CC6.1 | Workload identity federation is the preferred technical control for machine-to-cloud authentication in CC6.1
SOC 2 | CC6.7 | Short-lived, audience-bound tokens restrict credential reuse across systems; addresses transmission and access controls

Key Takeaways

  • Static credentials for machine identities are the problem, not the solution — workload identity federation eliminates them at the root
  • The OIDC token exchange pattern is consistent across AWS (IRSA/Pod Identity), GCP (Workload Identity), and Azure (AKS Workload Identity) — learn one, the others are a translation
  • AWS EKS: use Pod Identity for new clusters; IRSA remains the pattern for existing ones — both eliminate static keys
  • GCP GKE: Workload Identity enabled at cluster level, SA annotation at the K8s service account level
  • Azure AKS: federated credential on the managed identity, azure.workload.identity/use: "true" label on pods
  • Cross-cloud federation works — an AWS IAM role can call GCP APIs without a GCP key file
  • Enforce IMDSv2 everywhere; pin OIDC trust conditions to specific service account names; apply least privilege to the underlying cloud identity

What’s Next

You’ve eliminated the static credential problem. The next question is: what happens when the IAM configuration itself is the vulnerability? AWS IAM privilege escalation goes into the attack paths — how iam:PassRole, iam:CreateAccessKey, and misconfigured trust policies turn IAM misconfigurations into full account compromise. If you’re designing or auditing cloud access control, you need to know these paths before an attacker finds them.

Next: AWS IAM Privilege Escalation: How iam:PassRole Leads to Full Compromise

Get EP08 in your inbox when it publishes → linuxcent.com/subscribe

Authentication vs Authorization: AWS AccessDenied Explained

Reading Time: 10 minutes

Meta Description: Understand the difference between authentication vs authorization — and debug AWS AccessDenied errors by knowing whether to fix the credential or the policy.




TL;DR

  • Authentication asks “are you who you claim to be?” Authorization asks “are you allowed to do this?” These are two separate gates with two separate failure modes
  • AWS AccessDenied is an authorization failure — the identity authenticated fine; fix the policy, not the credentials
  • Prefer short-lived credentials (STS temporary tokens, Managed Identities) over long-lived access keys — the difference is the blast radius window
  • MFA strengthens authentication; it does nothing for authorization — a hijacked session with broad permissions is just as dangerous with or without MFA on the original login
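  • In standard HTTP semantics, 401 = authentication failure and 403 = authorization failure, so the code usually tells you which gate to debug. AWS returns 403 for several authentication errors as well (for example InvalidClientTokenId), so read the error code too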
  • Both layers must enforce least privilege independently — application-layer authorization is not a substitute for tight cloud IAM

The Big Picture

Every API call in the cloud passes through two gates before it executes. Most engineers know the first one. The second is where most security failures live.

  THE TWO GATES — every cloud API call passes through both, in order

  ┌──────────────────────────────────────────────────────────────────┐
  │  GATE 1 — AUTHENTICATION                                         │
  │  "Are you who you claim to be?"                                  │
  │                                                                  │
  │  IAM user     →  Access Key + Secret (long-lived, rotatable)    │
  │  IAM role     →  Temporary STS token (expires automatically)    │
  │  Human        →  Password + MFA via console or IdP              │
  │  Service      →  Instance profile / Managed Identity / OIDC     │
  │                                                                  │
  │  Passes → move to Gate 2                                        │
  │  Fails  → stopped here, HTTP 401                                │
  └──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  GATE 2 — AUTHORIZATION                                          │
  │  "Are you allowed to do what you're trying to do?"               │
  │                                                                  │
  │  Evaluated against: identity-based policies · SCPs              │
  │                     resource-based policies · conditions         │
  │                     permissions boundaries · session policies    │
  │                                                                  │
  │  Default answer: DENY (explicit Allow required every time)      │
  │                                                                  │
  │  Passes → request executes                                      │
  │  Fails  → AccessDenied / HTTP 403                               │
  └──────────────────────────────────────────────────────────────────┘

  MFA hardens Gate 1. It has zero effect on Gate 2.
  A hijacked session with a valid token clears Gate 1 automatically.
  Gate 2 is your last line of defense — and the one that's most often misconfigured.

Introduction

The authentication vs authorization distinction is the most commonly confused boundary in cloud security — and the source of most misdirected debugging when an AWS AccessDenied error appears. These are two separate gates, two separate failure modes, and two entirely different fixes.

Early in my career I wrote an API endpoint I was proud of. Token validation. Rejection of unauthenticated requests. I called it “secured” in the code review.

A senior engineer asked one question: “What happens if I take a valid token from a regular user and call your /admin/delete-user endpoint?”

I ran the test. It worked. Any employee — with a perfectly valid, properly issued token — could delete any user account in the system.

The authentication was correct. The authorization didn’t exist.

That gap between proving who you are and proving you’re allowed to do this is where a surprising number of security incidents live. Not just in application code — in cloud IAM too.

I’ve reviewed AWS environments where MFA was enforced on every human account, access keys were rotated quarterly, and yet a Lambda function had s3:* on * because whoever wrote the deployment script reached for AmazonS3FullAccess and moved on.

Gate 1 was solid. Gate 2 was wide open.

This episode draws the boundary cleanly — what each gate is, how each cloud implements it, and the specific failure modes that happen when the two get conflated.


How Authentication Works in Cloud IAM

Authentication answers: are you who you claim to be?

The three factor types

Authentication has not fundamentally changed in decades. What has changed is how cloud platforms implement it.

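┌────────────────────┬────────────┬────────────────────────────────────────────┐
│ Factor             │ Type       │ Cloud Examples                             │
├────────────────────┼────────────┼────────────────────────────────────────────┤
│ Something you know │ Knowledge  │ Password, access key secret, PIN           │
│ Something you have │ Possession │ TOTP app, FIDO2 hardware key, smart card   │
│ Something you are  │ Inherence  │ Biometrics — less common in cloud contexts │
└────────────────────┴────────────┴────────────────────────────────────────────┘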

MFA requires two distinct factors. A password plus a username is not MFA — both are knowledge factors. A password plus a TOTP code is MFA. Worth stating clearly because I’ve seen internal documentation describe “username and password” as two-factor authentication.
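The "two distinct factors" rule is easy to state as code. This is a minimal sketch, not a real library: the CREDENTIAL_FACTORS table and the is_mfa helper are names invented for illustration, and the username is filed under knowledge only to mirror the framing above.

```python
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "something you know"
    POSSESSION = "something you have"
    INHERENCE = "something you are"


# Hypothetical mapping from credential kinds to factor categories.
# The username is treated as knowledge here, following the text above.
CREDENTIAL_FACTORS = {
    "username": Factor.KNOWLEDGE,
    "password": Factor.KNOWLEDGE,
    "pin": Factor.KNOWLEDGE,
    "totp_code": Factor.POSSESSION,
    "fido2_key": Factor.POSSESSION,
    "fingerprint": Factor.INHERENCE,
}


def is_mfa(credentials):
    """Return True only if the credentials span two or more distinct factor categories."""
    factors = {CREDENTIAL_FACTORS.get(c) for c in credentials}
    factors.discard(None)  # unknown credential kinds contribute nothing
    return len(factors) >= 2


print(is_mfa(["username", "password"]))   # False
print(is_mfa(["password", "totp_code"]))  # True
```

Counting categories rather than credentials is the whole point: two secrets of the same kind still collapse into one factor.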

SMS codes count as MFA, but they’re the weakest form. SIM-swapping attacks — convincing a carrier to port your number — have been used to defeat SMS MFA on high-value accounts. If TOTP or FIDO2 hardware keys are available, use them.

How AWS authenticates

AWS has two fundamentally different identity classes:

Human identities authenticate via console (password + optional MFA) or CLI/API (Access Key ID + Secret Access Key). The access key is a long-lived credential with no default expiry. Every .env file with an access key, every git commit that included one, every CI/CD log that printed one — that credential is live until someone explicitly rotates or deletes it.

Machine identities — EC2, Lambda, ECS tasks — authenticate via temporary credentials issued by STS:

# Assume a role — get temporary credentials that expire
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/DevRole \
  --role-session-name alice-session \
  --duration-seconds 3600
# Returns: AccessKeyId + SecretAccessKey + SessionToken
# All three expire together. Nothing to rotate.

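# From inside an EC2 instance — credentials arrive automatically via IMDS
# (IMDSv2 shown: fetch a session token first, then present it on each request)
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/MyAppRole
# Returns: AccessKeyId, SecretAccessKey, Token, Expiration
# AWS refreshes these before expiry. The application never sees a rotation event.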

The IMDS model is the right one. The application never manages a credential — it appears, it’s used, it expires. If it leaks, it’s usable for hours at most, not years.
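To see the "hours at most" window concretely, the Expiration field in the IMDS response can be turned into a countdown. A rough sketch, assuming the payload carries Expiration as an ISO 8601 UTC timestamp; the seconds_until_expiry helper and the sample values are made up for illustration and are not real credentials.

```python
import json
from datetime import datetime, timezone


def seconds_until_expiry(imds_json, now=None):
    """Return the seconds remaining before the temporary credentials expire."""
    doc = json.loads(imds_json)
    # Assumes a UTC timestamp such as 2026-01-15T20:00:00Z
    expires = datetime.strptime(doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds()


# Illustrative payload with placeholder values
sample = json.dumps({
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-token",
    "Expiration": "2026-01-15T20:00:00Z",
})
now = datetime(2026, 1, 15, 14, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_expiry(sample, now)
print(f"{remaining / 3600:.1f} hours left")  # 6.0 hours left
```

A negative result means the credential has already expired, which is exactly the property a long-lived access key never has.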

Why Long-Lived Credentials Keep Appearing

How GCP authenticates

GCP cleanly separates human and machine authentication.

Humans authenticate via Google Account or Workspace (OAuth2). The gcloud CLI handles the flow:

gcloud auth login                        # browser-based OAuth2 for humans
gcloud auth application-default login    # sets up Application Default Credentials for local dev

Machine identities use service accounts, ideally attached to the resource rather than using downloaded key files. Key files are GCP’s equivalent of long-lived AWS access keys — same problems, same risks.

# From inside a GCE VM — ADC uses the attached service account, no key file needed
gcloud auth print-access-token           # on a VM, gcloud picks up the attached service account
# Use it: curl -H "Authorization: Bearer $(gcloud auth print-access-token)" ...

How Azure authenticates

Azure’s identity plane is Entra ID (formerly Azure Active Directory). Humans authenticate via Entra ID using OAuth2/OIDC. Machine identities use Managed Identities — Azure handles the entire credential lifecycle, so there is no secret to store or rotate.

az login                                  # browser-based OAuth2

# Service principal for automation
az login --service-principal \
  -u APP_ID -p CERT_OR_SECRET \
  --tenant TENANT_ID

# From inside an Azure VM — get a token via IMDS, no credentials needed
curl -H 'Metadata: true' \
  'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/'

The credential failure modes that repeat everywhere

In practice, the same patterns appear across all three clouds in every audit:

Leaked credentials — access keys in git commits, .env files, Docker image layers, CI/CD logs. GitHub’s secret scanning finds thousands of these monthly on public repos alone.

Long-lived credentials — an access key from 2019 is still valid in 2026 unless someone explicitly rotated it. I’ve audited accounts where 30% of access keys had never been rotated, some five years old.

Shared credentials — one key used by three services. When you revoke it, three things break. When it leaks, you can’t tell which service was the source.

Credential sprawl — service account keys downloaded for “one quick test” and never deleted. I once found seventeen key files for a single GCP service account, created by different engineers over two years. None rotated. Five belonged to accounts that no longer existed.
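The long-lived and sprawl patterns are the easiest ones to catch mechanically. Here is a small sketch that flags active keys past an age threshold. The stale_keys function, the 90-day default, and the inventory are assumptions for illustration; the dictionaries are shaped like the AccessKeyMetadata entries that aws iam list-access-keys returns, with CreateDate already parsed.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(key_metadata, max_age_days=90, now=None):
    """Return (key id, age in days) for every active key older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(days=max_age_days)
    findings = []
    for key in key_metadata:
        age = now - key["CreateDate"]
        if key["Status"] == "Active" and age > limit:
            findings.append((key["AccessKeyId"], age.days))
    # Oldest first, so the worst offenders lead the report
    return sorted(findings, key=lambda item: item[1], reverse=True)


# Hypothetical inventory with placeholder key IDs
inventory = [
    {"AccessKeyId": "AKIAOLDEXAMPLE", "Status": "Active",
     "CreateDate": datetime(2019, 3, 1, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIANEWEXAMPLE", "Status": "Active",
     "CreateDate": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAOFFEXAMPLE", "Status": "Inactive",
     "CreateDate": datetime(2018, 6, 1, tzinfo=timezone.utc)},
]
report = stale_keys(inventory, now=datetime(2026, 2, 1, tzinfo=timezone.utc))
for key_id, age_days in report:
    print(f"{key_id}: {age_days} days old")
```

Inactive keys are skipped here on purpose; in a real audit they would probably go on a separate "delete these" list.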

The direction of travel in all three clouds is credential-less: workload identity federation, managed identities, instance profiles. We’ll cover this specifically in OIDC Workload Identity: Eliminate Cloud Access Keys Entirely.


How Authorization Evaluates Every API Call

Authorization happens after authentication. The system knows who you are — now it decides what you can do. This decision is expressed through IAM roles and policies, the building blocks that state what each identity is allowed to do on which resources.

What the evaluation looks like

Every API call triggers an authorization check. You don’t notice when it succeeds. You notice when it fails:

REQUEST:
  Action:    s3:DeleteObject
  Resource:  arn:aws:s3:::prod-backups/2024-01-15.tar.gz
  Principal: arn:aws:iam::123456789012:role/DevEngineerRole
  Context:   { source_ip: "10.0.1.5", mfa: false, time: "14:32 UTC" }

EVALUATION:
  1. Explicit Deny anywhere? → none found
  2. Explicit Allow in any policy? → not granted
  3. Default → DENY

RESULT: AccessDenied

The engineer authenticated successfully. Valid credentials, valid session. But DevEngineerRole has no policy granting s3:DeleteObject on that bucket. Gate 1 passed. Gate 2 denied. They are evaluated independently.
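The three-step evaluation above fits in a few lines. This is a deliberately simplified sketch, not the AWS engine: it treats policies as a flat list of single-action statements, matches wildcards with fnmatchcase (real IAM matching is case-insensitive for actions and has more rules), and the read-only policy for DevEngineerRole is invented for the example.

```python
from fnmatch import fnmatchcase


def evaluate(statements, action, resource):
    """Return "Allow" or "Deny" for one request against a flat list of statements."""
    matching = [
        s for s in statements
        if fnmatchcase(action, s["Action"]) and fnmatchcase(resource, s["Resource"])
    ]
    if any(s["Effect"] == "Deny" for s in matching):
        return "Deny"   # 1. an explicit Deny always wins
    if any(s["Effect"] == "Allow" for s in matching):
        return "Allow"  # 2. otherwise an explicit Allow is required
    return "Deny"       # 3. nothing matched: default Deny


# Hypothetical policy for DevEngineerRole: read-only access to the bucket
dev_engineer_role = [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::prod-backups/*"},
]
request = ("s3:DeleteObject", "arn:aws:s3:::prod-backups/2024-01-15.tar.gz")
print(evaluate(dev_engineer_role, *request))  # Deny
```

Nothing in evaluate looks at credentials. By the time it runs, Gate 1 is already behind you.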

Policy evaluation chains by cloud

AWS — evaluated in layers, explicit Deny wins at any layer:

1. Explicit Deny in any SCP?           → DENY (cannot be overridden anywhere)
2. No SCP Allow?                       → DENY
3. Explicit Deny in identity or resource policy? → DENY
4. Resource-based policy Allow?        → can ALLOW (same account)
5. Permissions boundary — no Allow?    → DENY
6. Session policy — no Allow?          → DENY
7. Identity-based policy Allow?        → ALLOW
Default (nothing granted):             → DENY

The default is always Deny. Every successful authorization is an explicit "Effect": "Allow" somewhere in the chain. This is the opposite of traditional Unix — in the cloud, if you didn’t explicitly grant it, it doesn’t exist.
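One way to read the AWS chain: SCPs, permissions boundaries, and session policies are ceilings, and the identity policy is the grant. The sketch below models that intersection for a same-account request. It is an approximation under loud assumptions: each layer is reduced to a plain set of action names, resource-based policies are left out entirely, and the authorize function is hypothetical.

```python
def authorize(action, scp, boundary, session, identity, explicit_denies=frozenset()):
    """Simplified same-account model: every attached layer must allow the action."""
    if action in explicit_denies:
        return "DENY (explicit deny)"
    # None means the layer is not attached, so it imposes no ceiling
    ceilings = {"SCP": scp, "permissions boundary": boundary, "session policy": session}
    for name, allowed in ceilings.items():
        if allowed is not None and action not in allowed:
            return f"DENY (no Allow in {name})"
    if action in identity:
        return "ALLOW"
    return "DENY (default)"


# Hypothetical role: the identity policy is broad, the boundary is narrow
identity_policy = {"s3:GetObject", "s3:DeleteObject"}
boundary = {"s3:GetObject"}

print(authorize("s3:GetObject", None, boundary, None, identity_policy))
# ALLOW
print(authorize("s3:DeleteObject", None, boundary, None, identity_policy))
# DENY (no Allow in permissions boundary)
```

The second call is the useful one: the identity policy grants the delete, and the boundary still stops it.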

GCP — additive; allow bindings are inherited down the resource hierarchy, so a grant at any ancestor applies:

Permission granted if ANY binding grants it at:
  resource level → project level → folder level → organization level

IAM Deny Policies can override all grants (newer feature).
No binding at any level? → Denied.
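The GCP model can be sketched as a walk from the resource up to the organization. This is a toy version under stated assumptions: real bindings attach roles rather than individual permissions, and the gcp_is_allowed function, the parents map, and the tuple format are all invented for illustration.

```python
def gcp_is_allowed(member, permission, resource, parents, allow_bindings, deny_rules=()):
    """Walk from the resource up to the organization, collecting grants along the way."""
    node = resource
    granted = False
    while node is not None:
        if (node, member, permission) in deny_rules:
            return False  # a deny policy overrides any grant
        if (node, member, permission) in allow_bindings:
            granted = True
        node = parents.get(node)
    return granted


# Hypothetical hierarchy: bucket -> project -> folder -> org
parents = {"bucket": "project", "project": "folder", "folder": "org", "org": None}
allow = {("folder", "alice", "storage.objects.get")}

print(gcp_is_allowed("alice", "storage.objects.get", "bucket", parents, allow))  # True
print(gcp_is_allowed("bob", "storage.objects.get", "bucket", parents, allow))    # False
```

A binding set two levels up still applies at the bucket, which is why broad grants at folder or organization level deserve extra scrutiny.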

Azure RBAC:

1. Explicit Deny Assignment?           → DENY (even Owner can't override)
2. Role Assignment with Allow?         → ALLOW
Default:                               → DENY

Why Confusing Authentication and Authorization Breaks Security

The token-as-authorization antipattern

An application checks for a valid JWT and if found, proceeds. The JWT proves the user authenticated with the IdP. However, it says nothing about what they’re allowed to do.

from flask import Flask, request

app = Flask(__name__)
# verify_token, has_permission, and delete_user_from_db are the application's own helpers.
# The two handlers are before/after versions of the same route; a real app keeps only one.

# This is authentication only — anyone with a valid token gets through
@app.route("/admin/delete-user", methods=["POST"])
def delete_user():
    token = request.headers.get("Authorization")
    if verify_token(token):           # asks: is this token real and unexpired?
        delete_user_from_db(...)      # executes for any valid token holder
        return "OK"
    return "Unauthorized", 401

# This separates the two correctly
@app.route("/admin/delete-user", methods=["POST"])
def delete_user():
    token = request.headers.get("Authorization")
    principal = verify_token(token)                    # Gate 1: authentication
    if not principal:                                  # assumes a falsy result on failure
        return "Unauthorized", 401
    if not has_permission(principal, "users:delete"):  # Gate 2: authorization
        return "Forbidden", 403
    delete_user_from_db(...)
    return "OK"
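What has_permission does is up to the application. One minimal way to fill it in, assuming verify_token returns a dictionary with a roles list: the ROLE_PERMISSIONS table and the principal shape are hypothetical, and a real system would load them from its own policy store.

```python
# Hypothetical role-to-permission table
ROLE_PERMISSIONS = {
    "employee": {"users:read"},
    "admin": {"users:read", "users:delete"},
}


def has_permission(principal, permission):
    """Gate 2: return True only if one of the principal's roles grants the permission."""
    if not principal:
        return False
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in principal.get("roles", [])
    )


regular_user = {"sub": "alice", "roles": ["employee"]}
administrator = {"sub": "carol", "roles": ["employee", "admin"]}

print(has_permission(regular_user, "users:delete"))   # False
print(has_permission(administrator, "users:delete"))  # True
```

Both principals hold equally valid tokens. Only the role table tells them apart, and that is the check the first handler skipped.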

The short-expiry principle

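┌─────────────────────┬─────────────┬───────────────────────────┬─────────────────────────────┐
│ Credential type     │ Provider    │ Typical lifetime          │ Risk                        │
├─────────────────────┼─────────────┼───────────────────────────┼─────────────────────────────┤
│ Access Key + Secret │ AWS         │ Permanent (until deleted) │ Years of exposure if leaked │
│ STS Temporary Token │ AWS         │ 15 min – 12 hours         │ Hours at most               │
│ OAuth2 Access Token │ GCP / Azure │ ~1 hour                   │ Short window                │
│ IMDS Token (VM)     │ All three   │ Minutes to hours          │ Auto-refreshed by platform  │
└─────────────────────┴─────────────┴───────────────────────────┴─────────────────────────────┘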

A credential that expires in an hour has a one-hour exposure window if stolen. A credential that never expires has an unlimited window. This is the operational argument for managed identities and instance profiles, beyond just convenience.
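The arithmetic behind the exposure window is simple enough to write down. A rough sketch, assuming the window closes at whichever comes first: natural expiry or manual revocation. The exposure_window function and its parameters are made up for illustration.

```python
from datetime import timedelta


def exposure_window(lifetime, age_at_leak, time_to_revoke=None):
    """How long a leaked credential stays usable.

    lifetime: total validity, or None for a credential that never expires.
    age_at_leak: how old the credential was when it leaked.
    time_to_revoke: time until someone revokes it, or None if nobody does.
    """
    remaining = None if lifetime is None else max(lifetime - age_at_leak, timedelta(0))
    candidates = [d for d in (remaining, time_to_revoke) if d is not None]
    return min(candidates) if candidates else None  # None means unbounded


# A one-hour STS token that leaks 20 minutes after issue
print(exposure_window(timedelta(hours=1), timedelta(minutes=20)))  # 0:40:00

# An access key with no expiry that nobody notices leaking
print(exposure_window(None, timedelta(days=400)))  # None
```

With a short-lived credential the expiry bounds the window even if detection never happens. With a permanent key, detection is the only bound there is.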

# AWS — configure max session duration at role level
aws iam update-role \
  --role-name MyRole \
  --max-session-duration 3600   # 1 hour max

# GCP — access tokens expire in ~1 hour automatically
gcloud auth print-access-token
# For Application Default Credentials: gcloud auth application-default print-access-token

# Azure — token lifetime configurable in Entra ID token policies
az account get-access-token --resource https://management.azure.com/

⚠ Production Gotchas

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 1 — "We have MFA, so permissions can be broad"          ║
║                                                                      ║
║  MFA protects Gate 1 only. If a session is hijacked after login    ║
║  (via malware, SSRF, or a stolen session cookie), the attacker has  ║
║  a valid, MFA-authenticated token. Gate 1 is already cleared.       ║
║  Broad permissions in Gate 2 are the full attack surface.           ║
║                                                                      ║
║  Fix: treat Gate 2 (IAM policy) as your primary blast-radius        ║
║  control. MFA buys time. Least privilege limits damage.             ║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 2 — Debugging AccessDenied by rotating credentials      ║
║                                                                      ║
║  AWS AccessDenied is an authorization failure. The identity         ║
║  authenticated successfully — there's no Allow in the policy.       ║
║  Rotating the access key does nothing.                              ║
║                                                                      ║
║  Fix: check the policy chain. Use simulate-principal-policy to      ║
║  confirm where the Allow is missing before touching credentials.    ║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 3 — Application-layer authZ with broad cloud IAM        ║
║                                                                      ║
║  "The app controls access" is not a substitute for scoped cloud     ║
║  IAM. An SSRF vulnerability, exposed debug endpoint, or            ║
║  compromised dependency bypasses the application layer entirely.    ║
║  The cloud identity's permissions become the attacker's surface.    ║
║                                                                      ║
║  Fix: both layers enforce least privilege independently.            ║
╚══════════════════════════════════════════════════════════════════════╝
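Gotcha 2 comes down to reading the error code before touching anything. A small triage sketch: the two sets list error codes AWS services commonly return, but they are illustrative and far from exhaustive, and the which_gate helper is hypothetical.

```python
# Error codes commonly seen for each gate. Illustrative, not exhaustive.
AUTHN_ERRORS = {"InvalidClientTokenId", "SignatureDoesNotMatch", "ExpiredToken"}
AUTHZ_ERRORS = {"AccessDenied", "AccessDeniedException", "UnauthorizedOperation"}


def which_gate(error_code):
    """Map an AWS error code to the gate that produced it."""
    if error_code in AUTHN_ERRORS:
        return "Gate 1: fix the credential"
    if error_code in AUTHZ_ERRORS:
        return "Gate 2: fix the policy, the credential is fine"
    return "Unknown: read the full error message"


print(which_gate("ExpiredToken"))  # Gate 1: fix the credential
print(which_gate("AccessDenied"))  # Gate 2: fix the policy, the credential is fine
```

If the code lands in the second set, rotating keys is wasted effort. Go straight to the policy chain.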

Authentication vs Authorization Audit Checklist

Split your IAM review along the authN/authZ boundary — they’re different problems with different fixes.

Authentication — Gate 1:
– Are there long-lived access keys that could be replaced with STS/Managed Identity?
– Is MFA enforced for all human identities with console or API access?
– Are service account key files present where workload identity is available?
– Are credentials stored in a secrets manager — not in code, .env files, or repos?
– When did each long-lived credential last rotate?

Authorization — Gate 2:
– Does every policy follow least privilege — only the permissions the workload actually uses?
– Are there wildcards (s3:*, "Resource": "*") that could be narrowed?
– Are write, delete, and IAM-modification actions scoped to specific resources?
– Are SCPs or permissions boundaries capping maximum permissions at org or account level?
– When were each role’s permissions last reviewed against actual usage (Access Analyzer)?
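The wildcard item on the Gate 2 list is easy to automate as a first pass. A minimal sketch that flags any "*" in the Action or Resource of an Allow statement. The find_wildcards function and the sample policy are assumptions for illustration; it ignores NotAction, NotResource, and conditions, so treat hits as prompts for review rather than findings.

```python
import json


def find_wildcards(policy_json):
    """Return (statement index, field, value) for each wildcard in an Allow statement."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            findings.extend((index, field, v) for v in values if "*" in v)
    return findings


# Hypothetical policy: one wide-open statement, one scoped statement
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::my-bucket/reports/*"},
    ],
})
for finding in find_wildcards(sample_policy):
    print(finding)
```

The scoped prefix in the second statement is flagged too. That is intentional: the checklist asks whether a wildcard could be narrowed, and only a human can answer that.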


Quick Reference

┌────────────────────────────┬──────────────────────────────────────────────────┐
│ Term                       │ What it means                                    │
├────────────────────────────┼──────────────────────────────────────────────────┤
│ Authentication (AuthN)     │ Verifying identity — are you who you claim?      │
│ Authorization (AuthZ)      │ Verifying permission — are you allowed to act?   │
│ MFA                        │ Two distinct factors; strengthens Gate 1 only    │
│ STS (AWS)                  │ Security Token Service — issues temp credentials │
│ Access Key                 │ Long-lived AWS credential; avoid for services    │
│ Instance profile (AWS)     │ Container attaching a role to EC2                │
│ Managed Identity (Azure)   │ Credential-less identity for Azure services      │
│ Service Account (GCP)      │ Machine identity; prefer attached over key file  │
│ HTTP 401                   │ Authentication failure — prove who you are       │
│ HTTP 403 / AccessDenied    │ Authorization failure — fix the policy           │
└────────────────────────────┴──────────────────────────────────────────────────┘

Commands to know:
┌──────────────────────────────────────────────────────────────────────────────┐
│  # AWS — assume a role and get temporary credentials                        │
│  aws sts assume-role --role-arn arn:aws:iam::ACCOUNT:role/ROLE \            │
│    --role-session-name my-session --duration-seconds 3600                   │
│                                                                              │
│  # AWS — simulate a policy to debug AccessDenied before touching anything   │
│  aws iam simulate-principal-policy \                                         │
│    --policy-source-arn arn:aws:iam::ACCOUNT:role/MyRole \                   │
│    --action-names s3:GetObject \                                             │
│    --resource-arns arn:aws:s3:::my-bucket/*                                 │
│                                                                              │
│  # AWS — check what credentials your session is using                       │
│  aws sts get-caller-identity                                                 │
│                                                                              │
│  # GCP — print the current access token (expires in ~1 hour)                │
│  gcloud auth print-access-token                                              │
│                                                                              │
│  # GCP — print an access token using ADC                                    │
│  gcloud auth application-default print-access-token                         │
│                                                                              │
│  # Azure — get current token for ARM                                         │
│  az account get-access-token --resource https://management.azure.com/       │
│                                                                              │
│  # Azure — check who you're logged in as                                     │
│  az account show                                                             │
└──────────────────────────────────────────────────────────────────────────────┘

Framework Alignment

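┌────────────────┬─────────────────────────────────┬────────────────────────────────────────────────┐
│ Framework      │ Reference                       │ What It Covers Here                            │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ CISSP          │ Domain 5 — Identity and Access  │ AuthN and AuthZ are the two core mechanisms;   │
│                │ Management                      │ this episode defines the boundary              │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ CISSP          │ Domain 1 — Security & Risk      │ Conflating the two creates systematic,         │
│                │ Management                      │ measurable risk with different attack surfaces │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ ISO 27001:2022 │ 5.17 Authentication information │ Managing credentials and authentication        │
│                │                                 │ mechanisms across the identity lifecycle       │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ ISO 27001:2022 │ 8.5 Secure authentication       │ Technical controls — MFA, session management,  │
│                │                                 │ credential policies                            │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ ISO 27001:2022 │ 5.15 Access control             │ Policy requirements that depend on cleanly     │
│                │                                 │ separating identity from permission            │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ SOC 2          │ CC6.1                           │ Logical access controls — this episode defines │
│                │                                 │ the two-gate model CC6.1 is built on           │
├────────────────┼─────────────────────────────────┼────────────────────────────────────────────────┤
│ SOC 2          │ CC6.7                           │ Access restrictions enforced at the            │
│                │                                 │ authorization layer, not just authentication   │
└────────────────┴─────────────────────────────────┴────────────────────────────────────────────────┘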

Key Takeaways

  • Authentication proves identity; authorization proves permission — two gates, two separate failure modes, two separate fixes
  • AWS AccessDenied is a Gate 2 failure — the credential is valid, the policy is missing; fix the policy
  • Short-lived credentials (STS, Managed Identities, instance profiles) reduce the blast radius of a credential compromise from years to hours
  • MFA hardens Gate 1 — it has no effect on what an authenticated identity can do
  • HTTP 401 = Gate 1 failed; HTTP 403 = Gate 2 failed — the status code tells you where to look
  • Application-layer authorization and cloud IAM authorization are independent — both must enforce least privilege

What’s Next

You now know what the two gates are and where failures in each originate. IAM Roles vs Policies: How Cloud Authorization Actually Works goes into the mechanics of Gate 2 — the permissions, policies, and roles that implement authorization in practice, and the structural patterns that keep them from turning into an unmanageable sprawl.

Next: IAM Roles vs Policies: How Cloud Authorization Actually Works

Get the IAM roles vs policies breakdown in your inbox when it publishes → linuxcent.com/subscribe