Compliance Grading — Automated OpenSCAP with A-F Scores Before Deployment

Reading Time: 6 minutes

OS Hardening as Code, Episode 4
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening · Automated OpenSCAP Compliance


TL;DR

  • “We use CIS L1” means nothing without a verified grade — automated OpenSCAP compliance provides one before any instance is deployed
  • Stratum runs OpenSCAP after every build and attaches the grade to the image metadata: cis-l1-A-95
  • Grades are A through F based on percentage of controls passing, with explicit accounting for documented overrides
  • SARIF output is machine-readable — importable directly into GitHub Advanced Security, Jira, or any SIEM
  • Drift detection: rescan any running instance against the original blueprint and see exactly which controls changed since the image was built
  • An image that scores below your minimum grade threshold doesn’t get snapshotted — it doesn’t exist

The Problem: A Grade That’s Never Been Verified Is Not a Grade

Security audit request:
"Provide CIS L1 compliance evidence for all production instances"

Team response:
  Instance A: "CIS L1 hardened" — OpenSCAP last run: 4 months ago
  Instance B: "CIS L1 hardened" — OpenSCAP last run: never
  Instance C: "CIS L1 hardened" — OpenSCAP version: 1.2 (current: 1.3.8)
  Instance D: "CIS L1 hardened" — manual scan output: "87% passing"
  Instance E: "CIS L1 hardened" — manual scan output: "91% passing"

"Which profile was used for D and E? Are they comparable?"
"Were they scanned before or after a recent kernel update?"
"Why is C running an old OpenSCAP version?"

Automated OpenSCAP compliance means the grade is generated the same way, on every image, every time, before the image is ever deployed.

EP03 showed that the same HardeningBlueprint YAML builds consistent OS images across six cloud providers. What it left open is the question every auditor eventually asks: how do you know the Ansible hardening actually did what you think it did? Running Ansible-Lockdown successfully means the tasks ran. It does not mean every CIS control is satisfied — some controls can’t be applied by Ansible alone, some require manual verification, and some interact with the environment in unexpected ways.


A compliance team requested CIS L2 evidence for a SOC 2 Type II audit. The security team had been running OpenSCAP scans — but manually, on-demand, using slightly different profiles across teams, with no standard for how to store or compare results.

The audit found four problems:
1. Two instances had been scanned with CIS L1, not L2, despite being labeled “CIS L2”
2. Three instances hadn’t been scanned in over six months
3. The scan outputs from different teams were in different formats (HTML vs XML vs text)
4. Two instances showed “91% passing” and “89% passing” — with no documentation of whether those were acceptable thresholds or what the failing controls were

The finding took two weeks of engineering time to resolve. It wasn’t a security failure — it was a documentation and process failure — and it still appeared in the audit report as a gap.

The root cause: compliance scanning was a manual step that produced inconsistent output in an inconsistent format.


How Automated OpenSCAP Compliance Works

Every Stratum build ends with an automated OpenSCAP scan:

stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
      │
      ├─ Provisions build instance
      │
      ├─ Runs Ansible-Lockdown (144 tasks)
      │
      ├─ Runs post-build OpenSCAP scan
      │    ├── Profile: CIS Ubuntu 22.04 L1 (from blueprint)
      │    ├── OpenSCAP version: pinned in blueprint (default: latest)
      │    └── 100 controls checked
      │
      ├─ Calculates grade
      │    ├── Passing:   93 controls
      │    ├── Failing:   5 controls
      │    ├── Overrides: 2 (documented in blueprint)
      │    └── Grade: A (95/100 effective)
      │
      ├─ Writes to image metadata:
      │    compliance_grade=cis-l1-A-95
      │    compliance_scan_date=2026-04-19
      │    compliance_blueprint=ubuntu22-cis-l1@1.0
      │
      └─ Snapshots AMI (or fails if grade < min_grade)

The grade is written into the AMI (or GCP/Azure image) metadata at creation time. It travels with the image. Any instance launched from this AMI carries the provenance of what was applied and what grade was achieved.
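
Because the grade lives in the image metadata, downstream tooling can read it without re-running a scan. A minimal AWS CLI example, assuming the grade is written as the compliance_grade tag shown above:

# Read the compliance grade straight from the AMI tags
aws ec2 describe-images \
  --image-ids ami-0a7f3c9e82d1b4c05 \
  --query 'Images[0].Tags[?Key==`compliance_grade`].Value' \
  --output text
# -> e.g. cis-l1-A-95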


The A-F Grade Calculation

The grade is not a simple percentage. It accounts for documented overrides and applies a threshold-based letter scale:

Total CIS controls:    100
Passing:               93
Failing:               5 (genuine failures)
Overrides (compliant): 2 (documented in blueprint, counted as passing)

Effective passing:     95 / 100
Grade:                 A

Grade thresholds (configurable per blueprint):

Grade   Default threshold    Meaning
A       ≥ 95% effective      Production-ready, minimal exceptions
B       85–94%               Acceptable with documented exceptions
C       70–84%               Below standard — deploy with caution
D       55–69%               Significant gaps — do not deploy to production
F       < 55%                Hardening failed — image not snapshotted

The thresholds are configurable in the blueprint:

compliance:
  benchmark: cis-l1
  controls: all
  min_grade: B          # Build fails if grade < B
  grade_thresholds:
    A: 95
    B: 85
    C: 70
    D: 55

If the build produces a grade below min_grade, the instance is terminated and no image is created. The failure is logged with the full list of controls that blocked the grade.
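
The arithmetic is small enough to sanity-check by hand. Here is a throwaway shell sketch of the same calculation (an illustration of the logic, not Stratum's implementation), using the default thresholds from the table above:

#!/usr/bin/env bash
# Effective score = passing + documented overrides; letter comes from the thresholds.
passing=93; failing=5; overrides=2; total=100

effective=$(( passing + overrides ))
pct=$(( effective * 100 / total ))

if   [ "$pct" -ge 95 ]; then grade=A
elif [ "$pct" -ge 85 ]; then grade=B
elif [ "$pct" -ge 70 ]; then grade=C
elif [ "$pct" -ge 55 ]; then grade=D
else grade=F
fi

echo "cis-l1-${grade}-${effective}"   # -> cis-l1-A-95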


Reading the Scan Output

# Show the last build's scan results
stratum scan --show-last --blueprint ubuntu22-cis-l1.yaml

# Output:
# Build: ubuntu22-cis-l1 @ 2026-04-19T15:42:01Z
# Provider: aws (ap-south-1)
# Grade: A (95/100 effective controls)
#
# Passing controls: 93
# Failing controls: 5
# ──────────────────────────────────────────────
# FAIL  1.1.7   Ensure separate partition for /var/log/audit
#       Reason: tmpfs used — separate block device not configured
#       Remediation: Add /var/log/audit to separate EBS volume
#
# FAIL  1.6.1.3 Ensure AppArmor is enabled in bootloader config
#       Reason: GRUB_CMDLINE_LINUX missing apparmor=1 security=apparmor
#       Remediation: Update /etc/default/grub, run update-grub, reboot
#
# FAIL  3.1.1   Ensure IPv6 is disabled if not needed
#       Reason: net.ipv6.conf.all.disable_ipv6=0
#       Remediation: Set in /etc/sysctl.d/60-kernel-hardening.conf
# ...
#
# Overrides (compliant): 2
# ──────────────────────────────────────────────
# OVERRIDE  1.1.2   tmpfs /tmp via systemd unit — equivalent control
# OVERRIDE  5.2.4   SSH timeout managed by session manager policy

The failing controls tell you exactly what to fix and how to fix it. This is the difference between “87% passing” as a number and “87% passing” as an actionable gap list.


SARIF Export

Every scan produces a SARIF (Static Analysis Results Interchange Format) file:

# Export scan results to SARIF
stratum scan \
  --instance i-0abc123 \
  --benchmark cis-l1 \
  --output sarif \
  --out-file scan-results/i-0abc123-cis-l1.sarif

SARIF is the standard format for security scan results. It’s directly importable into:

  • GitHub Advanced Security — upload via the github/codeql-action/upload-sarif action; results appear in the Security tab
  • Jira — import as security findings, linked to the image or instance ID
  • Splunk / SIEM — structured JSON, parseable as events
  • AWS Security Hub — converted to ASFF findings and imported via the BatchImportFindings API

For audit purposes, the SARIF file is the evidence artifact. It contains the full scan profile, every control result, the OpenSCAP version, the scan timestamp, and the machine it was run against.
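
Because the format is plain JSON, quick triage doesn't need any of those integrations. A jq one-liner will do; the field names follow the SARIF 2.1.0 schema, and exactly which level/kind values the scanner emits may vary, so treat this as a starting point:

# List the rule IDs of failed checks from the SARIF export
jq -r '.runs[].results[]
       | select(.level == "error" or .kind == "fail")
       | .ruleId' \
   scan-results/i-0abc123-cis-l1.sarif | sort -u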

# Upload to GitHub Advanced Security
stratum scan \
  --instance i-0abc123 \
  --benchmark cis-l1 \
  --output sarif \
  --github-upload \
  --github-ref $GITHUB_REF \
  --github-sha $GITHUB_SHA
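
If you'd rather keep the upload inside your CI workflow than hand GitHub credentials to the build tooling, the same file can go through GitHub's standard SARIF upload action. A minimal workflow fragment, reusing the file path from the export example above:

# .github/workflows/compliance.yml (fragment)
- name: Upload OpenSCAP results to code scanning
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: scan-results/i-0abc123-cis-l1.sarif
    category: openscap-cis-l1   # keeps these findings separate from other scanners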

Drift Detection

The grade at build time is the baseline. Any instance can be rescanned against the blueprint that built it:

# Rescan a running instance
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Output:
# Instance: i-0abc123 (launched from ami-0a7f3c9e82d1b4c05)
# Original grade (build):  A (95/100) — 2026-01-15
# Current grade (rescan):  B (88/100) — 2026-04-19
#
# Drifted controls (7):
#   3.3.2  TCP SYN cookies: FAIL — net.ipv4.tcp_syncookies=0
#           Last passing: 2026-01-15 (build)
#           Current value: 0 (expected: 1)
#
#   5.3.2  sudo log_input: FAIL — rule removed from /etc/sudoers.d/
#           Last passing: 2026-01-15 (build)
#           Current value: [rule absent] (expected: Defaults log_input)

Drift detection is how you find the instances that were “temporarily” modified and never reverted. The scan compares the current state against the baseline — not against a generic CIS profile, but against the specific blueprint version that built the image.
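
Rescanning one instance on demand is useful; the real value is sweeping the fleet on a schedule. A cron-able sketch follows. The blueprint tag filter and the output parsing are assumptions about your tagging scheme and about the report format shown above, so adapt both:

#!/usr/bin/env bash
# Nightly drift sweep: rescan every running instance built from the blueprint
# and flag anything whose current grade has fallen to C or worse.
set -euo pipefail

BLUEPRINT=ubuntu22-cis-l1.yaml

for instance in $(aws ec2 describe-instances \
    --filters "Name=tag:blueprint,Values=ubuntu22-cis-l1" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text); do

  report=$(stratum scan --instance "$instance" --blueprint "$BLUEPRINT")

  if echo "$report" | grep -Eq 'Current grade.*: +[CDF]'; then
    echo "DRIFT: $instance has dropped below B" >&2
    # hand off to your alerting/ticketing integration here
  fi
done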


Scanning Without a Build: Assessing Existing Instances

For instances not built with Stratum, you can run a standalone scan:

# Assess an existing instance against CIS L1
stratum scan --instance i-0legacy123 --benchmark cis-l1

# No blueprint comparison — just the raw CIS grade
# Output:
# Grade: C (72/100)
# 28 controls failing
# ...

This is useful for assessing the state of instances built before Stratum was in use, or for comparing a manual hardening approach against the benchmark.
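
For more than a handful of legacy instances, the same command loops cleanly and leaves an evidence trail. The instance IDs below are placeholders; every flag is one already shown in this post:

# Assess a batch of pre-Stratum instances and keep the SARIF reports
for instance in i-0legacy123 i-0legacy456 i-0legacy789; do
  stratum scan \
    --instance "$instance" \
    --benchmark cis-l1 \
    --output sarif \
    --out-file "scan-results/${instance}-cis-l1.sarif"
done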


What Controls Typically Block an A Grade

For Ubuntu 22.04 CIS L1 builds in most cloud environments, these are the controls that most commonly prevent an A grade:

1.1.7   /var/log/audit separate partition
        Why it often fails: cloud images don’t have separate volumes at build time
        Fix: Add EBS volume, configure at launch

1.6.1   AppArmor bootloader config
        Why it often fails: GRUB parameters not set correctly
        Fix: Update /etc/default/grub, run update-grub

3.1.1   Disable IPv6
        Why it often fails: cloud networking sometimes requires IPv6
        Fix: Override with documented reason if intentional

5.2.21  SSH MaxStartups
        Why it often fails: default sshd_config not updated
        Fix: Add MaxStartups 10:30:60 to sshd_config

6.1.10  World-writable files
        Why it often fails: some package installations leave world-writable files
        Fix: Post-install cleanup in Ansible role

The first two (separate audit partition, AppArmor bootloader) are the most common A→B blockers and often require architecture decisions about how volumes are provisioned at launch versus build time.
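
For the AppArmor blocker specifically, the fix is the one the scan output suggests. A manual sketch for a single host follows; in practice this belongs in a custom Ansible role alongside ansible-lockdown, and the sed assumes the parameters aren't already present:

# Add the AppArmor kernel parameters to GRUB and regenerate the config
sudo sed -i \
  's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 apparmor=1 security=apparmor"/' \
  /etc/default/grub
sudo update-grub
sudo reboot   # parameters only take effect on the next boot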


Key Takeaways

  • Automated OpenSCAP compliance means every image has a verified, reproducible grade generated by the same scanner with the same profile, before it’s ever deployed
  • The A-F grade accounts for documented overrides from the blueprint — the failing controls in the output are genuine gaps, not known exceptions
  • SARIF export makes scan results importable into GitHub Advanced Security, Jira, SIEM, and audit tooling
  • Drift detection catches configuration changes that happen after the image is deployed — the grade at build time is the baseline
  • Images that score below min_grade don’t get snapshotted — the failed build tells you exactly which controls to fix

What’s Next

Automated OpenSCAP compliance gives every image a verified grade before deployment. What EP04 left open is what happens after the grade is known — specifically, what prevents an engineer from deploying a C-grade image to production “just this once.”

The Pipeline API is the answer. EP05 covers the CI/CD compliance gate: POST /api/pipeline/scan fails the build if the image grade is below threshold. The unhardened image never reaches production — not because engineers are disciplined, but because the pipeline won’t let it through.

Next: CI/CD compliance gate — block unhardened images before they reach production

Get EP05 in your inbox when it publishes → linuxcent.com/subscribe

One Blueprint, Six Clouds — Multi-Provider OS Image Builds

Reading Time: 6 minutes

OS Hardening as Code, Episode 3
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening


TL;DR

  • Multi-cloud OS hardening with separate scripts per provider means three scripts that drift within weeks
  • A HardeningBlueprint YAML separates compliance intent (portable) from provider details (handled by Stratum’s provider layer)
  • The same blueprint builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox with a single --provider flag change
  • Provider-specific differences — disk names, cloud-init ordering, metadata endpoint IPs — are abstracted away from the blueprint author
  • One YAML file becomes the single source of truth for OS security posture across your entire fleet, regardless of cloud
  • Drift detection works fleet-wide: rescan any instance against the original blueprint grade on any provider

The Problem: Three Clouds, Three Scripts, Three Ways to Drift

AWS hardening script          GCP hardening script          Azure hardening script
├── /dev/xvd* disk refs       ├── /dev/sda* disk refs       ├── /dev/sda* disk refs
├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS
├── cloud-init order A        ├── cloud-init order B        ├── cloud-init order C
└── Updated: Jan 2025         └── Updated: Aug 2024         └── Updated: Mar 2024
                                         │
                                         └─ 5 months behind
                                            on CIS updates

Multi-cloud OS hardening starts as a copy-paste of the AWS script. Within a month, the clouds diverge.

EP02 showed that a HardeningBlueprint YAML eliminates the skip-at-2am problem by making hardening a build artifact. What it assumed — quietly — is that you’re building for one provider. The moment you expand to a second cloud, the provider-specific details in the blueprint become a problem: disk names differ, cloud-init fires in a different order, and AWS-specific assumptions break silently on GCP.


We expanded from AWS to GCP six months ago. The EC2 hardening script had been working reliably for over a year. The GCP engineer took the AWS script, made some quick changes, and started building images.

The first GCP images had a subtle problem: the /tmp and /home separate partition entries in /etc/fstab referenced /dev/xvdb — an AWS disk naming convention. GCP uses /dev/sdb. The fstab entries were silently ignored, so those directories sat on the root filesystem without the restrictive mount options. The CIS controls for separate filesystem partitions were listed as passing in the scan output because the Ansible task had “run successfully” — it just hadn’t done what we thought.

It took a pentest three months later to catch it. The finding: six production GCP instances with /tmp not mounted with noexec, nosuid, nodev — despite our “CIS L1 hardened” label.

The root cause wasn’t the engineer. It was a hardening approach that required cloud-specific knowledge embedded in the script rather than in a provider abstraction layer.


How Stratum Separates Compliance Intent from Provider Details

Multi-cloud OS hardening works when the compliance intent and the provider details are kept strictly separate.

HardeningBlueprint YAML
(compliance intent — portable)
         │
         ▼
  Stratum Provider Layer
  ┌─────────────────────────────────────────────┐
  │  AWS         │  GCP         │  Azure        │
  │  /dev/xvd*   │  /dev/sda*   │  /dev/sda*    │
  │  IMDS v2     │  GCP IMDS    │  Azure IMDS   │
  │  cloud-init  │  cloud-init  │  waagent      │
  │  order A     │  order B     │  order C      │
  └─────────────────────────────────────────────┘
         │
         ▼
  Ansible-Lockdown + Provider-Aware Configuration
         │
         ▼
  OpenSCAP Scan
         │
         ▼
  Golden Image (AMI / GCP Image / Azure Image)

The blueprint author declares what should be true about the OS. Stratum’s provider layer handles how that’s achieved on each cloud.

The disk naming, cloud-init sequencing, metadata endpoint configuration, and provider-specific package repositories are all abstracted into the provider layer. They never appear in the blueprint file.


The Same Blueprint Across Six Providers

# Build the same baseline on three clouds
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
stratum build --blueprint ubuntu22-cis-l1.yaml --provider gcp
stratum build --blueprint ubuntu22-cis-l1.yaml --provider azure

# The other three supported providers
stratum build --blueprint ubuntu22-cis-l1.yaml --provider digitalocean
stratum build --blueprint ubuntu22-cis-l1.yaml --provider linode
stratum build --blueprint ubuntu22-cis-l1.yaml --provider proxmox

The blueprint file is identical across all six. The output — AMI, GCP machine image, Azure managed image — is equivalent in terms of security posture. The same CIS L1 hardening tasks run. The same OpenSCAP scan runs. The same grade lands in the image metadata.

If you change the blueprint — add a control, update the Ansible role version, add a custom audit logging configuration — you rebuild all providers from the same source and all images come out consistent.


What the Provider Layer Handles

The provider layer is where the cloud-specific knowledge lives, so the blueprint author doesn’t have to carry it:

Disk naming:

Provider       OS disk      Ephemeral               Data
AWS            /dev/xvda    /dev/xvdb               /dev/xvdc+
GCP            /dev/sda     -                       /dev/sdb+
Azure          /dev/sda     /dev/sdb (temp disk)    /dev/sdc+
DigitalOcean   /dev/vda     -                       /dev/vdb+

The CIS controls for separate /tmp and /home partitions reference disk paths that differ across these providers. The provider layer translates the blueprint’s filesystem.tmp declaration into the correct fstab entries for the target cloud.

Cloud-init ordering:

Different providers initialize services in different orders. On AWS, the network is available before cloud-init runs most tasks. On GCP, some network configuration happens after cloud-init starts. On Azure, the waagent handles some configuration that cloud-init handles elsewhere.

The provider layer sequences the hardening steps to run in the correct order for each provider — specifically, it waits for network availability before applying network-level hardening, and ensures the package manager is configured before running Ansible roles that require package installation.

Metadata endpoint configuration:

CIS controls include restrictions on access to the instance metadata service (IMDSv2 enforcement on AWS, equivalent controls on GCP/Azure). The provider layer applies the correct restriction for each cloud — the blueprint just declares compliance: benchmark: cis-l1.


Building for All Providers Simultaneously

For fleet standardization, you can build all providers in a single operation:

# Build for all providers in parallel
stratum build \
  --blueprint ubuntu22-cis-l1.yaml \
  --provider aws,gcp,azure

# Output:
# [aws]   Launching build instance in ap-south-1...
# [gcp]   Launching build instance in asia-south1...
# [azure] Launching build instance in southindia...
# ...
# [aws]   Grade: A (98/100) — ami-0a7f3c9e82d1b4c05
# [gcp]   Grade: A (98/100) — projects/my-project/global/images/ubuntu22-cis-l1-20260419
# [azure] Grade: A (98/100) — /subscriptions/.../images/ubuntu22-cis-l1-20260419

All three builds run in parallel. All three images carry identical compliance grades. The image names embed the build date, and the grade travels in the image metadata.


Blueprint Versioning and Drift Detection

Version-controlling the blueprint file solves a problem that multi-cloud environments hit consistently: knowing what your OS security posture was six months ago.

# Check the current state of a fleet instance against the blueprint
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Compare against original build grade
# Output:
# Instance: i-0abc123 (aws, ap-south-1)
# Original grade (build): A (98/100) — 2026-01-15
# Current grade (scan):   B (89/100) — 2026-04-19
# 
# Drifted controls (9):
#   3.3.2  — TCP SYN cookies: FAIL (sysctl net.ipv4.tcp_syncookies=0)
#   5.3.2  — sudo log_input: FAIL (removed from /etc/sudoers.d/)
#   ...

Drift detection compares the current instance state against the blueprint that built it. Controls that passed at build time and now fail indicate configuration drift — something changed after the image was deployed. This is how you find the three instances that a sysadmin “temporarily” modified and never reverted.


Production Gotchas

Provider-specific CIS controls exist. CIS AWS Foundations Benchmark and CIS GCP Benchmark include cloud-specific controls (VPC flow logs, CloudTrail, etc.) that are separate from the OS-level CIS controls. The blueprint handles OS-level controls. Cloud-level controls (IAM, logging, network configuration) belong in your cloud security posture management tooling.

Build costs vary by provider. On AWS, the build instance is a t3.medium for 15–20 minutes (~$0.02). On GCP and Azure, equivalent pricing applies. For multi-provider builds, run them in regions close to your primary workloads to minimize image transfer time.

Proxmox builds require a local Stratum agent. Unlike cloud providers, Proxmox doesn’t have an API that Stratum can reach from outside. The Proxmox provider requires the Stratum agent running on the Proxmox host. The build process and blueprint format are identical; only the network topology differs.

GCP image sharing across projects requires explicit IAM. GCP machine images aren’t automatically available to other projects in the organization. After building, run stratum image share --provider gcp --image ubuntu22-cis-l1-20260419 --projects <project-ids>, or configure sharing at the organization level.
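
The equivalent manual step, if you aren't using stratum image share, is a one-off IAM binding on the image with gcloud; the consuming project's service account below is a placeholder:

# Grant another project's service account permission to launch from the image
gcloud compute images add-iam-policy-binding ubuntu22-cis-l1-20260419 \
  --project my-project \
  --member "serviceAccount:builder@consumer-project.iam.gserviceaccount.com" \
  --role roles/compute.imageUser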


Key Takeaways

  • Multi-cloud OS hardening with separate scripts per provider creates inevitable drift; a provider-abstracted blueprint eliminates it
  • The same HardeningBlueprint YAML builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox — the compliance intent is in the file, the provider details are in Stratum’s provider layer
  • Parallel multi-provider builds produce images with identical compliance grades on the same schedule
  • Drift detection works fleet-wide: any instance on any provider can be rescanned against the blueprint that built it
  • Blueprint version control is the single source of truth for OS security posture history — what was true on any given date, across any provider

What’s Next

One blueprint, six clouds, identical compliance grades. EP03 showed that the multi-cloud drift problem disappears when provider details are abstracted away from the blueprint.

What neither EP02 nor EP03 answered is the auditor’s question: how do you know the image is actually compliant? “We ran CIS L1” is not an answer. “Grade A, 98/100 controls, SARIF export attached” is.

EP04 covers automated OpenSCAP compliance: the post-build scan in detail — how the A-F grade is calculated, what controls block an A grade, how SARIF exports work, and how drift detection catches what changed after deployment.

Next: automated OpenSCAP compliance — CIS benchmark grading before deployment

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

Hardening Blueprint as Code — Declare Your OS Baseline in YAML

Reading Time: 6 minutes

OS Hardening as Code, Episode 2
Cloud AMI Security Risks · Linux Hardening as Code


TL;DR

  • A hardening runbook is a list of steps someone runs. A HardeningBlueprint YAML is a build artifact — if it wasn’t applied, the image doesn’t exist
  • Linux hardening as code means declaring your entire OS security baseline in a single YAML file and building it reproducibly across any provider
  • stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws either produces a hardened image or fails — there is no partial state
  • The blueprint includes: target OS/provider, compliance benchmark, Ansible roles, and per-control overrides with documented reasons
  • One blueprint file = one source of truth for your hardening posture, version-controlled and reviewable like any other infrastructure code
  • Post-build OpenSCAP scan runs automatically — the image only snapshots if it passes

The Problem: A Runbook That Gets Skipped Once Is a Runbook That Gets Skipped

Hardening runbook
       │
       ▼
  Human executes
  steps manually
       │
       ├─── 47 deployments: followed correctly
       │
       └─── 1 deployment at 2am: step 12 skipped
                    │
                    ▼
           Instance in production
           without audit logging,
           SSH password auth enabled,
           unnecessary services running

Linux hardening as code eliminates the human decision point. If the blueprint wasn’t applied, the image doesn’t exist.

EP01 showed that default cloud AMIs arrive pre-broken — unnecessary services, no audit logging, weak kernel parameters, SSH configured for convenience not security. The obvious response is a hardening script. But a script run by a human is still a process step. It can be skipped. It can be done halfway. It can drift across different engineers who each interpret “run the hardening script” slightly differently.


A production deployment last year. The platform team had a solid CIS L1 hardening runbook — 68 steps, well-documented, followed consistently. Then a critical incident at 2am required three new instances to be deployed on short notice. The engineer on call ran the provisioning script and, under pressure, skipped the hardening step with the intention of running it the next morning.

They didn’t. The three instances stayed in production unhardened for six weeks before an automated scan caught them. Audit logging wasn’t configured. SSH was accepting password authentication. Two unnecessary services were running that weren’t in the approved software list.

Nothing was breached. But the finding went into the next compliance report as a gap, the team spent a week remediating, and the post-mortem conclusion was “we need better runbook discipline.”

That’s the wrong conclusion. The runbook isn’t the problem. The problem is that hardening was a process step instead of a build constraint.


What Linux Hardening as Code Actually Means

Linux hardening as code is the same principle as infrastructure as code applied to OS security posture: the desired state is declared in a file, the file is the source of truth, and the execution is deterministic and repeatable.

HardeningBlueprint YAML
         │
         ▼
  stratum build
         │
  ┌──────┴──────────────────┐
  │  Provider Layer          │
  │  (cloud-init, disk       │
  │   names, metadata        │
  │   endpoint per provider) │
  └──────┬──────────────────┘
         │
  ┌──────┴──────────────────┐
  │  Ansible-Lockdown        │
  │  (CIS L1/L2, STIG —      │
  │   the hardening steps)   │
  └──────┬──────────────────┘
         │
  ┌──────┴──────────────────┐
  │  OpenSCAP Scanner        │
  │  (post-build verify)     │
  └──────┬──────────────────┘
         │
         ▼
  Golden Image (AMI/GCP image/Azure image)
  + Compliance grade in image metadata

The YAML file is what you write. Stratum handles the rest.


The HardeningBlueprint YAML

The blueprint is the complete, auditable declaration of your OS security posture:

# ubuntu22-cis-l1.yaml
name: ubuntu22-cis-l1
description: Ubuntu 22.04 CIS Level 1 baseline for production workloads
version: "1.0"

target:
  os: ubuntu
  version: "22.04"
  provider: aws
  region: ap-south-1
  instance_type: t3.medium

compliance:
  benchmark: cis-l1
  controls: all

hardening:
  - ansible-lockdown/UBUNTU22-CIS
  - role: custom-audit-logging
    vars:
      audit_log_retention_days: 90
      audit_max_log_file: 100

filesystem:
  tmp:
    type: tmpfs
    options: [nodev, nosuid, noexec]
  home:
    options: [nodev]

controls:
  - id: 1.1.2
    override: compliant
    reason: "tmpfs /tmp implemented via systemd unit — equivalent control"
  - id: 5.2.4
    override: compliant
    reason: "SSH timeout managed by session manager policy, not sshd_config"

Each section is explicit:

target — which OS, which version, which provider. This is the only provider-specific section. The compliance intent below it is portable.

compliance — which benchmark and which controls to apply. controls: all means every CIS L1 control. You can also specify controls: [1.x, 2.x] to scope to specific sections.

hardening — which Ansible roles to run. ansible-lockdown/UBUNTU22-CIS is the community CIS hardening role. You can add custom roles alongside it.

filesystem — mount options for sensitive filesystems like /tmp and /home. The blueprint declares the required options; the provider layer translates them into the correct entries for the target cloud.

controls — documented exceptions. Not suppressions — overrides with a recorded reason. This is the difference between “we turned off this control” and “this control is satisfied by an equivalent implementation, documented here.”


Building the Image

# Validate the blueprint before building
stratum blueprint validate ubuntu22-cis-l1.yaml

# Build — this will take 15-20 minutes
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws

# Output:
# [15:42:01] Launching build instance...
# [15:42:45] Running ansible-lockdown/UBUNTU22-CIS (144 tasks)...
# [15:51:33] Running custom-audit-logging role...
# [15:52:11] Running post-build OpenSCAP scan (benchmark: cis-l1)...
# [15:54:08] Grade: A (98/100 controls passing)
# [15:54:09] 2 controls overridden (documented in blueprint)
# [15:54:10] Creating AMI snapshot: ami-0a7f3c9e82d1b4c05
# [15:54:47] Done. AMI tagged with compliance grade: cis-l1-A-98

If the post-build scan comes back below a configurable threshold, the build fails — no AMI is created. The instance is terminated. The image does not exist.

That is the structural guarantee. You cannot skip a build step at 2am because at 2am you’re calling stratum build, not running steps manually.


The Control Override Mechanism

The override mechanism is what separates this from checkbox compliance.

Every security benchmark has controls that conflict with how production environments actually work. CIS L1 recommends /tmp on a separate partition. Many cloud instances use tmpfs with equivalent nodev, nosuid, noexec mount options. The intent of the control is satisfied. The literal implementation differs.

Without an override mechanism, you have two bad options: fail the scan (noisy, meaningless), or configure the scanner to ignore the control (undocumented, invisible to auditors).

The blueprint’s controls section gives you a third option: record the override, document the reason, and let the scanner count it as compliant. The SARIF output and the compliance grade both reflect the documented state.

controls:
  - id: 1.1.2
    override: compliant
    reason: "tmpfs /tmp implemented via systemd unit — equivalent control"

This appears in the build log, in the SARIF export, and in the image metadata. An auditor reading the output sees: control 1.1.2 — compliant, documented exception, reason recorded. Not: control 1.1.2 — ignored.


What the Blueprint Gives You That a Script Doesn’t

                                Hardening script             HardeningBlueprint YAML
Version-controlled              Possible but not enforced    Always — it’s a file
Auditable exceptions            Typically not                Built-in override mechanism
Post-build verification         Manual or none               Automatic OpenSCAP scan
Image exists only if hardened   No                           Yes — build fails if scan fails
Multi-cloud portability         Requires separate scripts    Provider flag, same YAML
Drift detection                 Not possible                 Rescan instance against original grade
Skippable at 2am                Yes                          No — you’d have to change the build process
The last row is the one that matters. A script is skippable because there’s a human in the loop. A blueprint is a build artifact — you can’t deploy the image without the blueprint having been applied, because the image is what the blueprint produces.


Validating a Blueprint Before Building

# Syntax and schema validation
stratum blueprint validate ubuntu22-cis-l1.yaml

# Dry-run — show what Ansible tasks will run, what controls will be checked
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws --dry-run

# Show all available controls for a benchmark
stratum blueprint controls --benchmark cis-l1 --os ubuntu --version 22.04

# Show what a specific control checks
stratum blueprint controls --id 1.1.2 --benchmark cis-l1

The dry-run output shows every Ansible task that will run, every OpenSCAP check that will fire, and flags any controls that might conflict with the provider environment before you’ve launched a build instance.
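
Since the blueprint lives in version control, the validate and dry-run commands slot naturally into code review. A CI fragment follows; the blueprints/ directory layout and runner setup are assumptions about your repository, while the command itself is the one above:

# .github/workflows/blueprints.yml (fragment) — run on every pull request
- name: Validate hardening blueprints
  run: |
    for bp in blueprints/*.yaml; do
      stratum blueprint validate "$bp"
    done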


Production Gotchas

Build time is 15–25 minutes. Ansible-Lockdown applies 144+ tasks for CIS L1. Build this into your pipeline timing — don’t expect golden images in 3 minutes.

Cloud-init ordering matters. On AWS, certain hardening steps (sysctl tuning, PAM configuration) interact with cloud-init. The Stratum provider layer handles sequencing — but if you add custom hardening roles, test the cloud-init interaction explicitly.

Some CIS controls conflict with managed service requirements. AWS Systems Manager Session Manager requires specific SSH configuration. RDS requires specific networking settings. Use the controls override section to document these — don’t suppress them silently.

Kernel parameter hardening requires a reboot. Controls in the 3.x (network parameters) and 1.5.x (kernel modules) sections apply sysctl changes that take effect on reboot. The Stratum build process reboots the instance before the OpenSCAP scan — don’t skip the reboot if you’re building manually.


Key Takeaways

  • Linux hardening as code means the blueprint YAML is the build artifact — the image either exists and is hardened, or it doesn’t exist
  • The controls override mechanism is the difference between undocumented suppressions and auditable, reasoned exceptions
  • Post-build OpenSCAP scan runs automatically — a failing grade blocks image creation
  • One blueprint file is portable across providers (EP03 covers this): the compliance intent stays in the YAML, the cloud-specific details go in the provider layer
  • Version-controlling the blueprint gives you a complete history of what your OS security posture was at any point in time — the same way Terraform state tracks infrastructure

What’s Next

One blueprint, one provider. EP02 showed that the skip-at-2am problem is solved when hardening is a build artifact rather than a process step.

What it didn’t address: what happens when you expand to a second cloud. GCP uses different disk names. Azure cloud-init fires in a different order. Each provider’s metadata service expects different request semantics (IMDSv2 session tokens on AWS, provider-specific headers elsewhere). If you maintain separate hardening scripts per cloud, they drift within a month.

EP03 covers multi-cloud OS hardening: the same blueprint, six providers, no drift.

Next: multi-cloud OS hardening — one blueprint for AWS, GCP, and Azure

Get EP03 in your inbox when it publishes → linuxcent.com/subscribe