One Blueprint, Six Clouds — Multi-Provider OS Image Builds

Reading Time: 6 minutes

OS Hardening as Code, Episode 3
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening


TL;DR

  • Multi-cloud OS hardening with separate scripts per provider means three scripts that drift within weeks
  • A HardeningBlueprint YAML separates compliance intent (portable) from provider details (handled by Stratum’s provider layer)
  • The same blueprint builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox with a single --provider flag change
  • Provider-specific differences — disk names, cloud-init ordering, metadata service request semantics — are abstracted away from the blueprint author
  • One YAML file becomes the single source of truth for OS security posture across your entire fleet, regardless of cloud
  • Drift detection works fleet-wide: rescan any instance against the original blueprint grade on any provider

The Problem: Three Clouds, Three Scripts, Three Ways to Drift

AWS hardening script          GCP hardening script          Azure hardening script
├── /dev/xvd* disk refs       ├── /dev/sda* disk refs       ├── /dev/sda* disk refs
├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS      ├── 169.254.169.254 IMDS
├── cloud-init order A        ├── cloud-init order B        ├── cloud-init order C
└── Updated: Jan 2025         └── Updated: Aug 2024         └── Updated: Mar 2024
                                         │
                                         └─ 5 months behind
                                            on CIS updates

Multi-cloud OS hardening starts as a copy-paste of the AWS script. Within a month, the clouds diverge.

EP02 showed that a HardeningBlueprint YAML eliminates the skip-at-2am problem by making hardening a build artifact. What it assumed — quietly — is that you’re building for one provider. The moment you expand to a second cloud, the provider-specific details in the blueprint become a problem: disk names differ, cloud-init fires in a different order, and AWS-specific assumptions break silently on GCP.


We expanded from AWS to GCP six months ago. The EC2 hardening script had been working reliably for over a year. The GCP engineer took the AWS script, made some quick changes, and started building images.

The first GCP images had a subtle problem: the /tmp and /home separate partition entries in /etc/fstab referenced /dev/xvdb — an AWS disk naming convention. GCP uses /dev/sdb. The fstab entries were silently ignored. The mounts existed but weren’t restricted. The CIS controls for separate filesystem partitions were listed as passing in the scan output because the Ansible task had “run successfully” — it just hadn’t done what we thought.

It took a pentest three months later to catch it. The finding: six production GCP instances with /tmp not mounted with noexec, nosuid, nodev — despite our “CIS L1 hardened” label.

The root cause wasn’t the engineer. It was a hardening approach that required cloud-specific knowledge embedded in the script rather than in a provider abstraction layer.


How Stratum Separates Compliance Intent from Provider Details

Multi-cloud OS hardening works when the compliance intent and the provider details are kept strictly separate.

HardeningBlueprint YAML
(compliance intent — portable)
         │
         ▼
  Stratum Provider Layer
  ┌─────────────────────────────────────────────┐
  │  AWS         │  GCP         │  Azure        │
  │  /dev/xvd*   │  /dev/sda*   │  /dev/sda*    │
  │  IMDS v2     │  GCP IMDS    │  Azure IMDS   │
  │  cloud-init  │  cloud-init  │  waagent      │
  │  order A     │  order B     │  order C      │
  └─────────────────────────────────────────────┘
         │
         ▼
  Ansible-Lockdown + Provider-Aware Configuration
         │
         ▼
  OpenSCAP Scan
         │
         ▼
  Golden Image (AMI / GCP Image / Azure Image)

The blueprint author declares what should be true about the OS. Stratum’s provider layer handles how that’s achieved on each cloud.

The disk naming, cloud-init sequencing, metadata endpoint configuration, and provider-specific package repositories are all abstracted into the provider layer. They never appear in the blueprint file.


The Same Blueprint Across Six Providers

# Build the same baseline on three clouds
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws
stratum build --blueprint ubuntu22-cis-l1.yaml --provider gcp
stratum build --blueprint ubuntu22-cis-l1.yaml --provider azure

# The other three supported providers
stratum build --blueprint ubuntu22-cis-l1.yaml --provider digitalocean
stratum build --blueprint ubuntu22-cis-l1.yaml --provider linode
stratum build --blueprint ubuntu22-cis-l1.yaml --provider proxmox

The blueprint file is identical across all six. The output — AMI, GCP machine image, Azure managed image — is equivalent in terms of security posture. The same 144 CIS L1 controls apply. The same OpenSCAP scan runs. The same grade lands in the image metadata.

If you change the blueprint — add a control, update the Ansible role version, add a custom audit logging configuration — you rebuild all providers from the same source and all images come out consistent.


What the Provider Layer Handles

The provider layer is where the cloud-specific knowledge lives, so the blueprint author doesn’t have to carry it:

Disk naming:

Provider       OS disk      Ephemeral              Data
AWS            /dev/xvda    /dev/xvdb              /dev/xvdc+
GCP            /dev/sda     —                      /dev/sdb+
Azure          /dev/sda     /dev/sdb (temp disk)   /dev/sdc+
DigitalOcean   /dev/vda     —                      /dev/vdb+

The CIS controls for separate /tmp and /home partitions reference disk paths that differ across these providers. The provider layer translates the blueprint’s filesystem.tmp declaration into the correct fstab entries for the target cloud.
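That translation can be pictured as a lookup table keyed by provider. A minimal sketch of the idea — the `DATA_DISK` mapping mirrors the table above, but the function name and structure are hypothetical, not Stratum's actual internals:

```python
# Illustrative sketch: how a provider layer might translate a portable
# filesystem declaration into provider-correct fstab entries.
# The mapping mirrors the disk-naming table above; names are hypothetical.

DATA_DISK = {
    "aws": "/dev/xvdb",
    "gcp": "/dev/sdb",
    "azure": "/dev/sdc",        # /dev/sdb is the Azure temp disk
    "digitalocean": "/dev/vdb",
}

def fstab_entry(provider: str, mount_point: str, options: list[str]) -> str:
    """Render one fstab line using the target cloud's disk naming."""
    device = DATA_DISK[provider]
    return f"{device} {mount_point} ext4 {','.join(options)} 0 2"

# The same blueprint declaration yields a different, correct entry per cloud:
print(fstab_entry("aws", "/tmp", ["nodev", "nosuid", "noexec"]))
# /dev/xvdb /tmp ext4 nodev,nosuid,noexec 0 2
print(fstab_entry("gcp", "/tmp", ["nodev", "nosuid", "noexec"]))
# /dev/sdb /tmp ext4 nodev,nosuid,noexec 0 2
```

This is exactly the entry that silently broke in the pentest story above: the AWS script hard-coded `/dev/xvdb`, so on GCP the fstab line pointed at a device that didn't exist.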

Cloud-init ordering:

Different providers initialize services in different orders. On AWS, the network is available before cloud-init runs most tasks. On GCP, some network configuration happens after cloud-init starts. On Azure, the Azure Linux agent (waagent) handles configuration that cloud-init handles on other providers.

The provider layer sequences the hardening steps to run in the correct order for each provider — specifically, it waits for network availability before applying network-level hardening, and ensures the package manager is configured before running Ansible roles that require package installation.
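The "wait for network availability" step reduces to a polling gate. A generic sketch of that pattern (the function name and the simulated check are illustrative, not Stratum code):

```python
import time

def wait_for(check, attempts: int = 30, delay: float = 1.0) -> int:
    """Poll `check` until it returns True; return the attempt count.

    Raises TimeoutError if the condition never becomes true, so a
    hardening step gated on it fails loudly instead of running early.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        time.sleep(delay)
    raise TimeoutError("condition not met within the attempt budget")

# Simulate a network that comes up on the third poll:
state = {"polls": 0}
def network_up() -> bool:
    state["polls"] += 1
    return state["polls"] >= 3

assert wait_for(network_up, delay=0) == 3
# Only after this returns would network-level hardening be applied.
```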

Metadata endpoint configuration:

CIS controls include restrictions on access to the instance metadata service (IMDSv2 enforcement on AWS, equivalent controls on GCP/Azure). The provider layer applies the correct restriction for each cloud — the blueprint just declares compliance: benchmark: cis-l1.
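Note that the metadata endpoint IP (169.254.169.254) is actually the same on AWS, GCP, and Azure; what differs is the request semantics each service enforces, which is what the provider layer has to know. A small sketch of that mapping (the dictionary and function are illustrative; the header names themselves are the providers' real requirements):

```python
# The metadata endpoint IP is shared across these clouds; the required
# request semantics differ and must be handled per provider.

METADATA_REQUIREMENTS = {
    # AWS IMDSv2: a session token, obtained via a PUT request first
    "aws":   {"X-aws-ec2-metadata-token": "<session token>"},
    # GCP: requests must carry this header or they are rejected
    "gcp":   {"Metadata-Flavor": "Google"},
    # Azure IMDS: requests must carry this header
    "azure": {"Metadata": "true"},
}

def metadata_headers(provider: str) -> dict[str, str]:
    """Headers a request to 169.254.169.254 must send on each cloud."""
    return METADATA_REQUIREMENTS[provider]

print(metadata_headers("gcp"))   # {'Metadata-Flavor': 'Google'}
```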


Building for All Providers Simultaneously

For fleet standardization, you can build all providers in a single operation:

# Build for all providers in parallel
stratum build \
  --blueprint ubuntu22-cis-l1.yaml \
  --provider aws,gcp,azure

# Output:
# [aws]   Launching build instance in ap-south-1...
# [gcp]   Launching build instance in asia-south1...
# [azure] Launching build instance in southindia...
# ...
# [aws]   Grade: A (98/100) — ami-0a7f3c9e82d1b4c05
# [gcp]   Grade: A (98/100) — projects/my-project/global/images/ubuntu22-cis-l1-20260419
# [azure] Grade: A (98/100) — /subscriptions/.../images/ubuntu22-cis-l1-20260419

All three builds run in parallel. All three images carry identical compliance grades. The image names embed the date and grade for easy identification.
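Conceptually the parallel build is a fan-out over providers from one blueprint. A sketch of that shape — `build` here is a stand-in stub, not the real tool; an actual implementation would shell out to the `stratum` CLI per provider:

```python
from concurrent.futures import ThreadPoolExecutor

def build(provider: str, blueprint: str) -> dict:
    """Stand-in for a per-provider build. A real version would invoke
    the build tooling for `provider` and return its image ID and grade."""
    return {"provider": provider, "blueprint": blueprint, "grade": "A"}

providers = ["aws", "gcp", "azure"]
with ThreadPoolExecutor(max_workers=len(providers)) as pool:
    results = list(pool.map(lambda p: build(p, "ubuntu22-cis-l1.yaml"), providers))

# All builds ran from the same source, so the grades must agree:
assert {r["grade"] for r in results} == {"A"}
```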


Blueprint Versioning and Drift Detection

Version-controlling the blueprint file solves a problem that multi-cloud environments hit consistently: knowing what your OS security posture was six months ago.

# Check the current state of a fleet instance against the blueprint
stratum scan --instance i-0abc123 --blueprint ubuntu22-cis-l1.yaml

# Compare against original build grade
# Output:
# Instance: i-0abc123 (aws, ap-south-1)
# Original grade (build): A (98/100) — 2026-01-15
# Current grade (scan):   B (89/100) — 2026-04-19
# 
# Drifted controls (9):
#   3.3.2  — TCP SYN cookies: FAIL (sysctl net.ipv4.tcp_syncookies=0)
#   5.3.2  — sudo log_input: FAIL (removed from /etc/sudoers.d/)
#   ...

Drift detection compares the current instance state against the blueprint that built it. Controls that passed at build time and now fail indicate configuration drift — something changed after the image was deployed. This is how you find the three instances that a sysadmin “temporarily” modified and never reverted.
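At its core the drift comparison is a set difference over control results: anything that passed at build time and fails now is drift. A minimal sketch (data shapes are illustrative, not Stratum's internal format):

```python
def drifted_controls(build_scan: dict[str, str],
                     current_scan: dict[str, str]) -> list[str]:
    """Controls that passed at build time but fail now: configuration drift."""
    return sorted(
        cid for cid, status in build_scan.items()
        if status == "pass" and current_scan.get(cid) == "fail"
    )

# Control IDs echo the example output above; statuses are illustrative.
build_scan   = {"3.3.2": "pass", "5.3.2": "pass", "1.1.2": "pass"}
current_scan = {"3.3.2": "fail", "5.3.2": "fail", "1.1.2": "pass"}

assert drifted_controls(build_scan, current_scan) == ["3.3.2", "5.3.2"]
```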


Production Gotchas

Provider-specific CIS controls exist. CIS AWS Foundations Benchmark and CIS GCP Benchmark include cloud-specific controls (VPC flow logs, CloudTrail, etc.) that are separate from the OS-level CIS controls. The blueprint handles OS-level controls. Cloud-level controls (IAM, logging, network configuration) belong in your cloud security posture management tooling.

Build costs vary by provider. On AWS, the build instance is a t3.medium for 15–20 minutes (~$0.02). On GCP and Azure, equivalent pricing applies. For multi-provider builds, run them in regions close to your primary workloads to minimize image transfer time.

Proxmox builds require a local Stratum agent. Unlike cloud providers, Proxmox doesn’t have an API that Stratum can reach from outside. The Proxmox provider requires the Stratum agent running on the Proxmox host. The build process and blueprint format are identical; only the network topology differs.

GCP image sharing across projects requires explicit IAM. GCP machine images aren’t automatically available to other projects in the organization. After building, run stratum image share --provider gcp --image ubuntu22-cis-l1-20260419 --projects …, or configure sharing at the organization level.


Key Takeaways

  • Multi-cloud OS hardening with separate scripts per provider creates inevitable drift; a provider-abstracted blueprint eliminates it
  • The same HardeningBlueprint YAML builds on AWS, GCP, Azure, DigitalOcean, Linode, and Proxmox — the compliance intent is in the file, the provider details are in Stratum’s provider layer
  • Parallel multi-provider builds produce images with identical compliance grades on the same schedule
  • Drift detection works fleet-wide: any instance on any provider can be rescanned against the blueprint that built it
  • Blueprint version control is the single source of truth for OS security posture history — what was true on any given date, across any provider

What’s Next

One blueprint, six clouds, identical compliance grades. EP03 showed that the multi-cloud drift problem disappears when provider details are abstracted away from the blueprint.

What neither EP02 nor EP03 answered is the auditor’s question: how do you know the image is actually compliant? “We ran CIS L1” is not an answer. “Grade A, 98/100 controls, SARIF export attached” is.

EP04 covers automated OpenSCAP compliance: the post-build scan in detail — how the A-F grade is calculated, what controls block an A grade, how SARIF exports work, and how drift detection catches what changed after deployment.

Next: automated OpenSCAP compliance — CIS benchmark grading before deployment

Get EP04 in your inbox when it publishes → linuxcent.com/subscribe

Hardening Blueprint as Code — Declare Your OS Baseline in YAML

Reading Time: 6 minutes

OS Hardening as Code, Episode 2
Cloud AMI Security Risks · Linux Hardening as Code


TL;DR

  • A hardening runbook is a list of steps someone runs. A HardeningBlueprint YAML is a build artifact — if it wasn’t applied, the image doesn’t exist
  • Linux hardening as code means declaring your entire OS security baseline in a single YAML file and building it reproducibly across any provider
  • stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws either produces a hardened image or fails — there is no partial state
  • The blueprint includes: target OS/provider, compliance benchmark, Ansible roles, and per-control overrides with documented reasons
  • One blueprint file = one source of truth for your hardening posture, version-controlled and reviewable like any other infrastructure code
  • Post-build OpenSCAP scan runs automatically — the image only snapshots if it passes

The Problem: A Runbook That Gets Skipped Once Is a Runbook That Gets Skipped

Hardening runbook
       │
       ▼
  Human executes
  steps manually
       │
       ├─── 47 deployments: followed correctly
       │
       └─── 1 deployment at 2am: step 12 skipped
                    │
                    ▼
           Instance in production
           without audit logging,
           SSH password auth enabled,
           unnecessary services running

Linux hardening as code eliminates the human decision point. If the blueprint wasn’t applied, the image doesn’t exist.

EP01 showed that default cloud AMIs arrive pre-broken — unnecessary services, no audit logging, weak kernel parameters, SSH configured for convenience, not security. The obvious response is a hardening script. But a script run by a human is still a process step. It can be skipped. It can be done halfway. It can drift across different engineers who each interpret “run the hardening script” slightly differently.


A production deployment last year. The platform team had a solid CIS L1 hardening runbook — 68 steps, well-documented, followed consistently. Then a critical incident at 2am required three new instances to be deployed on short notice. The engineer on call ran the provisioning script and, under pressure, skipped the hardening step with the intention of running it the next morning.

They didn’t. The three instances stayed in production unhardened for six weeks before an automated scan caught them. Audit logging wasn’t configured. SSH was accepting password authentication. Two unnecessary services were running that weren’t in the approved software list.

Nothing was breached. But the finding went into the next compliance report as a gap, the team spent a week remediating, and the post-mortem conclusion was “we need better runbook discipline.”

That’s the wrong conclusion. The runbook isn’t the problem. The problem is that hardening was a process step instead of a build constraint.


What Linux Hardening as Code Actually Means

Linux hardening as code is the same principle as infrastructure as code applied to OS security posture: the desired state is declared in a file, the file is the source of truth, and the execution is deterministic and repeatable.

HardeningBlueprint YAML
         │
         ▼
  stratum build
         │
  ┌──────┴──────────────────┐
  │  Provider Layer          │
  │  (cloud-init, disk       │
  │   names, metadata        │
  │   endpoint per provider) │
  └──────┬──────────────────┘
         │
  ┌──────┴──────────────────┐
  │  Ansible-Lockdown        │
  │  (CIS L1/L2, STIG —      │
  │   the hardening steps)   │
  └──────┬──────────────────┘
         │
  ┌──────┴──────────────────┐
  │  OpenSCAP Scanner        │
  │  (post-build verify)     │
  └──────┬──────────────────┘
         │
         ▼
  Golden Image (AMI/GCP image/Azure image)
  + Compliance grade in image metadata

The YAML file is what you write. Stratum handles the rest.


The HardeningBlueprint YAML

The blueprint is the complete, auditable declaration of your OS security posture:

# ubuntu22-cis-l1.yaml
name: ubuntu22-cis-l1
description: Ubuntu 22.04 CIS Level 1 baseline for production workloads
version: "1.0"

target:
  os: ubuntu
  version: "22.04"
  provider: aws
  region: ap-south-1
  instance_type: t3.medium

compliance:
  benchmark: cis-l1
  controls: all

hardening:
  - ansible-lockdown/UBUNTU22-CIS
  - role: custom-audit-logging
    vars:
      audit_log_retention_days: 90
      audit_max_log_file: 100

filesystem:
  tmp:
    type: tmpfs
    options: [nodev, nosuid, noexec]
  home:
    options: [nodev]

controls:
  - id: 1.1.2
    override: compliant
    reason: "tmpfs /tmp implemented via systemd unit — equivalent control"
  - id: 5.2.4
    override: compliant
    reason: "SSH timeout managed by session manager policy, not sshd_config"

Each section is explicit:

target — which OS, which version, which provider. This is the only provider-specific section. The compliance intent below it is portable.

compliance — which benchmark and which controls to apply. controls: all means every CIS L1 control. You can also specify controls: [1.x, 2.x] to scope to specific sections.

hardening — which Ansible roles to run. ansible-lockdown/UBUNTU22-CIS is the community CIS hardening role. You can add custom roles alongside it.

controls — documented exceptions. Not suppressions — overrides with a recorded reason. This is the difference between “we turned off this control” and “this control is satisfied by an equivalent implementation, documented here.”
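The scoping notation in the compliance section (`all` versus `[1.x, 2.x]`) reads naturally as a prefix match on control IDs. A hypothetical sketch of that interpretation — the function is illustrative, not Stratum's actual matcher:

```python
def in_scope(control_id: str, scopes) -> bool:
    """Match a CIS control ID against scope patterns like '1.x', or 'all'.

    Assumes '1.x' means "every control in section 1", implemented as a
    prefix match after stripping the trailing 'x' wildcard.
    """
    if scopes == "all":
        return True
    return any(control_id.startswith(s.rstrip("x")) for s in scopes)

assert in_scope("1.1.2", ["1.x", "2.x"])       # section 1 control, in scope
assert not in_scope("5.2.4", ["1.x", "2.x"])   # section 5 control, out of scope
assert in_scope("5.2.4", "all")                # 'all' matches everything
```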


Building the Image

# Validate the blueprint before building
stratum blueprint validate ubuntu22-cis-l1.yaml

# Build — this will take 15-20 minutes
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws

# Output:
# [15:42:01] Launching build instance...
# [15:42:45] Running ansible-lockdown/UBUNTU22-CIS (144 tasks)...
# [15:51:33] Running custom-audit-logging role...
# [15:52:11] Running post-build OpenSCAP scan (benchmark: cis-l1)...
# [15:54:08] Grade: A (98/100 controls passing)
# [15:54:09] 2 controls overridden (documented in blueprint)
# [15:54:10] Creating AMI snapshot: ami-0a7f3c9e82d1b4c05
# [15:54:47] Done. AMI tagged with compliance grade: cis-l1-A-98

If the post-build scan comes back below a configurable threshold, the build fails — no AMI is created. The instance is terminated. The image does not exist.

That is the structural guarantee. You cannot skip a build step at 2am because at 2am you’re calling stratum build, not running steps manually.
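The guarantee is a simple gate: the snapshot step only runs if the scan clears the threshold. A sketch of that control flow (names and the threshold value are illustrative; the doc only says the threshold is configurable):

```python
class BuildFailed(Exception):
    """Raised when the post-build scan misses the configured threshold."""

def gate_image(scan_score: int, threshold: int, snapshot) -> str:
    """Snapshot the image only if the scan meets the threshold.

    On failure no snapshot is taken: the image never exists, which is
    the structural guarantee described above.
    """
    if scan_score < threshold:
        raise BuildFailed(
            f"scan scored {scan_score} < threshold {threshold}; no image created"
        )
    return snapshot()

# Passing scan: the snapshot runs and an image ID comes back.
ami = gate_image(98, threshold=95, snapshot=lambda: "ami-0a7f3c9e82d1b4c05")
assert ami == "ami-0a7f3c9e82d1b4c05"

# Failing scan: the snapshot callable is never invoked.
try:
    gate_image(89, threshold=95, snapshot=lambda: "ami-never-created")
except BuildFailed:
    pass
else:
    raise AssertionError("expected BuildFailed")
```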


The Control Override Mechanism

The override mechanism is what separates this from checkbox compliance.

Every security benchmark has controls that conflict with how production environments actually work. CIS L1 recommends /tmp on a separate partition. Many cloud instances use tmpfs with equivalent nodev, nosuid, noexec mount options. The intent of the control is satisfied. The literal implementation differs.

Without an override mechanism, you have two bad options: fail the scan (noisy, meaningless), or configure the scanner to ignore the control (undocumented, invisible to auditors).

The blueprint’s controls section gives you a third option: record the override, document the reason, and let the scanner count it as compliant. The SARIF output and the compliance grade both reflect the documented state.

controls:
  - id: 1.1.2
    override: compliant
    reason: "tmpfs /tmp implemented via systemd unit — equivalent control"

This appears in the build log, in the SARIF export, and in the image metadata. An auditor reading the output sees: control 1.1.2 — compliant, documented exception, reason recorded. Not: control 1.1.2 — ignored.
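The merge of raw scan results with documented overrides can be sketched in a few lines. This is an illustration of the idea, not Stratum's implementation; the key property is that the override replaces the status while keeping the reason attached and visible:

```python
def apply_overrides(scan: dict[str, str],
                    overrides: dict[str, str]) -> dict[str, str]:
    """Replace raw scan statuses with documented overrides.

    The recorded reason travels with the status, so the exception stays
    visible to auditors rather than being silently suppressed.
    """
    merged = dict(scan)
    for control_id, reason in overrides.items():
        merged[control_id] = f"compliant (override: {reason})"
    return merged

# Raw scan plus the two overrides from the blueprint above:
scan = {"1.1.2": "fail", "5.2.4": "fail", "3.3.2": "pass"}
overrides = {
    "1.1.2": "tmpfs /tmp via systemd unit, equivalent control",
    "5.2.4": "SSH timeout managed by session manager policy",
}
merged = apply_overrides(scan, overrides)

assert merged["1.1.2"].startswith("compliant (override:")
assert merged["3.3.2"] == "pass"   # untouched controls keep their raw status
```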


What the Blueprint Gives You That a Script Doesn’t

                               Hardening script             HardeningBlueprint YAML
Version-controlled             Possible but not enforced    Always — it’s a file
Auditable exceptions           Typically not                Built-in override mechanism
Post-build verification        Manual or none               Automatic OpenSCAP scan
Image exists only if hardened  No                           Yes — build fails if scan fails
Multi-cloud portability        Requires separate scripts    Provider flag, same YAML
Drift detection                Not possible                 Rescan instance against original grade
Skippable at 2am               Yes                          No — you’d have to change the build process

The last row is the one that matters. A script is skippable because there’s a human in the loop. A blueprint is a build artifact — you can’t deploy the image without the blueprint having been applied, because the image is what the blueprint produces.


Validating a Blueprint Before Building

# Syntax and schema validation
stratum blueprint validate ubuntu22-cis-l1.yaml

# Dry-run — show what Ansible tasks will run, what controls will be checked
stratum build --blueprint ubuntu22-cis-l1.yaml --provider aws --dry-run

# Show all available controls for a benchmark
stratum blueprint controls --benchmark cis-l1 --os ubuntu --version 22.04

# Show what a specific control checks
stratum blueprint controls --id 1.1.2 --benchmark cis-l1

The dry-run output shows every Ansible task that will run, every OpenSCAP check that will fire, and flags any controls that might conflict with the provider environment before you’ve launched a build instance.


Production Gotchas

Build time is 15–25 minutes. Ansible-Lockdown applies 144+ tasks for CIS L1. Build this into your pipeline timing — don’t expect golden images in 3 minutes.

Cloud-init ordering matters. On AWS, certain hardening steps (sysctl tuning, PAM configuration) interact with cloud-init. The Stratum provider layer handles sequencing — but if you add custom hardening roles, test the cloud-init interaction explicitly.

Some CIS controls conflict with managed service requirements. AWS Systems Manager Session Manager requires specific SSH configuration. RDS requires specific networking settings. Use the controls override section to document these — don’t suppress them silently.

Kernel parameter hardening requires a reboot. Controls in the 3.x (network parameters) and 1.5.x (kernel modules) sections apply sysctl changes that take effect on reboot. The Stratum build process reboots the instance before the OpenSCAP scan — don’t skip the reboot if you’re building manually.


Key Takeaways

  • Linux hardening as code means the blueprint YAML is the build artifact — the image either exists and is hardened, or it doesn’t exist
  • The controls override mechanism is the difference between undocumented suppressions and auditable, reasoned exceptions
  • Post-build OpenSCAP scan runs automatically — a failing grade blocks image creation
  • One blueprint file is portable across providers (EP03 covers this): the compliance intent stays in the YAML, the cloud-specific details go in the provider layer
  • Version-controlling the blueprint gives you a complete history of what your OS security posture was at any point in time — the same way Terraform state tracks infrastructure

What’s Next

One blueprint, one provider. EP02 showed that the skip-at-2am problem is solved when hardening is a build artifact rather than a process step.

What it didn’t address: what happens when you expand to a second cloud. GCP uses different disk names. Azure cloud-init fires in a different order. Each provider’s metadata service demands different request semantics. If you maintain separate hardening scripts per cloud, they drift within a month.

EP03 covers multi-cloud OS hardening: the same blueprint, six providers, no drift.

Next: multi-cloud OS hardening — one blueprint for AWS, GCP, and Azure

Get EP03 in your inbox when it publishes → linuxcent.com/subscribe