OS Hardening as Code, Episode 5
Cloud AMI Security Risks · Linux Hardening as Code · Multi-Cloud OS Hardening · Automated OpenSCAP Compliance · CI/CD Compliance Gate
Focus Keyphrase: CI/CD compliance gate
Search Intent: Investigational
Meta Description: A compliance grade no one checks before deploying is decoration. The Pipeline API makes grade a build constraint — unhardened images never reach production. (158 chars)
TL;DR
- A CI/CD compliance gate turns an OS hardening grade from a report into a build constraint — unhardened images fail the pipeline before they can be deployed
- POST /api/pipeline/scan returns pass/fail against a minimum grade threshold — integrates into any CI/CD system that can make an HTTP request
- Failed gate output tells engineers exactly which controls failed and what to fix — not just “blocked”
- The gate works on both build-time grades (new images) and runtime grades (existing instances)
- GitHub Actions, GitLab CI, Jenkins, and Tekton integrations are one curl command
- The structural guarantee: an image that doesn’t pass the gate doesn’t exist in the deployment pipeline
The Problem: A Grade No One Checks Is Decoration
Pipeline without compliance gate:
Build → Test → Security scan (results to dashboard) → Deploy
What actually happens:
Build → Test → Security scan → "C grade, but we need to ship" → Deploy anyway
│
└─ Dashboard shows C grade
Nobody is paged
Deployment succeeds
A CI/CD compliance gate means the pipeline can’t continue if the grade is below threshold.
EP04 showed that automated OpenSCAP compliance gives every image a verified, reproducible grade before deployment. What it assumed is that someone checks the grade before deploying. They don’t — not under deadline pressure, not when the image has been “working fine for months,” not at 2am.
The same problem that made hardening runbooks skippable applies to compliance grades: if checking the grade is a discretionary step, it will be skipped.
A new microservice was deployed from an unhardened base image. The team had built it quickly during a sprint, used a community AMI as the base, and planned to harden it “in the next sprint.”
Three weeks later, a penetration test found it. SSH password authentication enabled. Three unnecessary services running — one of them with a known CVE. The finding: the instance had full inbound access from the VPC and was reachable from a compromised adjacent instance.
The deployment had gone through the normal CI/CD pipeline. Unit tests passed. Integration tests passed. A vulnerability scan ran. The scan produced a report that went to a dashboard. Nobody had a gate set up to fail the build if the image was unhardened.
The hardening work from the “next sprint” plan would have taken four hours. The pentest remediation took a week, plus the time to investigate what had been exposed during the three weeks the instance was running.
The CI/CD pipeline had every check except the one that would have caught the base image problem before the first deployment.
The Pipeline API
The Pipeline API is a single HTTP endpoint that takes an image or instance ID, checks it against a minimum grade, and returns pass or fail:
# Fail the pipeline if the image grade is below B
curl -sf -X POST https://stratum.yourdomain.com/api/pipeline/scan \
-H "Authorization: Bearer ${STRATUM_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"image_id": "ami-0a7f3c9e82d1b4c05",
"min_grade": "B"
}'
# Pass response (grade A):
# HTTP 200
# {
# "result": "pass",
# "image_id": "ami-0a7f3c9e82d1b4c05",
# "grade": "A",
# "score": 96,
# "controls_passing": 96,
# "controls_total": 100,
# "scanned_at": "2026-04-19T15:54:10Z"
# }
# Fail response (grade C):
# HTTP 422
# {
# "result": "fail",
# "image_id": "ami-0c9d5e3f81a2b6e07",
# "grade": "C",
# "score": 72,
# "min_grade_required": "B",
# "failing_controls": [
# { "id": "1.1.7", "title": "Separate partition for /var/log/audit", "severity": "high" },
# { "id": "3.3.2", "title": "TCP SYN cookies enabled", "severity": "medium" },
# ...
# ]
# }
A non-200 response fails the pipeline. Because of the -f flag, curl itself exits non-zero on an HTTP error status, so a 422 fail response fails the shell step directly; the explicit || exit 1 in the integrations below makes the failure message clearer and guards the step if -f is ever dropped.
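One practical wrinkle: curl -f exits non-zero on a 422, but it also discards the response body, so the failing-controls list never reaches the CI log. A minimal sketch of a gate step that keeps the body around (the status and body are stubbed here so the example is self-contained; in a real pipeline they come from the Pipeline API call shown in the comments):

```shell
# In a real pipeline step the status and body come from the API:
#   status=$(curl -s -o body.json -w '%{http_code}' -X POST "${STRATUM_URL}/api/pipeline/scan" ...)
#   body=$(cat body.json)
# Stubbed values for illustration:
status=422
body='{"result":"fail","grade":"C","score":72,"min_grade_required":"B"}'

if [ "$status" = "200" ]; then
  result=pass
else
  result=fail
  echo "Compliance gate failed (HTTP ${status}):"
  echo "$body"   # failing controls and fixes land in the CI log
fi
echo "gate: ${result}"
# a real step would end with:  [ "$result" = "pass" ] || exit 1
```

The point of the two-step capture is that the 422 body is exactly the actionable output described later in this post; dropping it leaves engineers with nothing but "blocked".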
GitHub Actions Integration
# .github/workflows/deploy.yml
jobs:
build-image:
runs-on: ubuntu-latest
outputs:
ami_id: ${{ steps.build.outputs.ami_id }}
steps:
- name: Build hardened AMI
id: build
run: |
AMI_ID=$(stratum build \
--blueprint ubuntu22-cis-l1.yaml \
--provider aws \
--output json | jq -r '.image_id')
echo "ami_id=${AMI_ID}" >> $GITHUB_OUTPUT
compliance-gate:
runs-on: ubuntu-latest
needs: build-image
steps:
- name: Stratum compliance gate
run: |
curl -sf -X POST ${{ vars.STRATUM_URL }}/api/pipeline/scan \
-H "Authorization: Bearer ${{ secrets.STRATUM_TOKEN }}" \
-H "Content-Type: application/json" \
-d "{\"image_id\": \"${{ needs.build-image.outputs.ami_id }}\", \"min_grade\": \"B\"}" \
|| { echo "Compliance gate failed — image does not meet minimum grade B"; exit 1; }
deploy:
runs-on: ubuntu-latest
needs: [build-image, compliance-gate]
steps:
- name: Deploy to staging
run: |
# ImageId can't be set on the ASG directly: publish a new launch template
# version with the AMI (template ID is a placeholder), then point the ASG at it
aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version '$Latest' \
  --launch-template-data "{\"ImageId\": \"${{ needs.build-image.outputs.ami_id }}\"}"
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-template 'LaunchTemplateId=lt-0123456789abcdef0,Version=$Latest'
The deploy job only runs if compliance-gate passes. The AMI doesn’t reach the autoscaling group if it doesn’t meet the grade threshold.
GitLab CI Integration
# .gitlab-ci.yml
stages:
- build
- compliance
- deploy
build-image:
stage: build
script:
- |
AMI_ID=$(stratum build \
--blueprint ubuntu22-cis-l1.yaml \
--provider aws \
--output json | jq -r '.image_id')
echo "AMI_ID=${AMI_ID}" >> build.env
artifacts:
reports:
dotenv: build.env
compliance-gate:
stage: compliance
needs: [build-image]
script:
- |
curl -sf -X POST ${STRATUM_URL}/api/pipeline/scan \
-H "Authorization: Bearer ${STRATUM_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"image_id\": \"${AMI_ID}\", \"min_grade\": \"B\"}"
deploy:
stage: deploy
needs: [build-image, compliance-gate]
script:
- ./deploy.sh ${AMI_ID}
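Jenkins Integration
Jenkins is the same single call wrapped in a stage. A declarative Jenkinsfile sketch (the stratum-token credential ID and the STRATUM_URL/AMI_ID environment variables are illustrative, not from the Stratum docs):

```groovy
// Jenkinsfile sketch — assumes a secret-text credential named 'stratum-token'
// and STRATUM_URL/AMI_ID set by earlier stages or the job environment.
pipeline {
  agent any
  stages {
    stage('Compliance gate') {
      steps {
        withCredentials([string(credentialsId: 'stratum-token', variable: 'STRATUM_TOKEN')]) {
          // curl -f exits non-zero on a 422, which fails this stage
          sh '''
            curl -sf -X POST "${STRATUM_URL}/api/pipeline/scan" \
              -H "Authorization: Bearer ${STRATUM_TOKEN}" \
              -H "Content-Type: application/json" \
              -d "{\\"image_id\\": \\"${AMI_ID}\\", \\"min_grade\\": \\"B\\"}"
          '''
        }
      }
    }
    stage('Deploy') {
      // only runs if the gate stage succeeded
      steps {
        sh './deploy.sh "${AMI_ID}"'
      }
    }
  }
}
```

As with the YAML examples, the deploy stage is ordered after the gate, so a failing gate stops the run before anything ships.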
What the Failed Gate Tells You
The value of the CI/CD compliance gate is not just that it blocks bad images — it’s that the failure output tells engineers what to fix.
A gate failure in CI shows:
Compliance gate failed.
Image: ami-0c9d5e3f81a2b6e07
Grade: C (72/100)
Required: B (85/100)
Gap: 13 controls failing
Failing controls:
HIGH 1.1.7 Separate partition for /var/log/audit
Fix: Provision /var/log/audit on a separate EBS volume
MEDIUM 1.6.1.3 AppArmor enabled in bootloader
Fix: Update GRUB_CMDLINE_LINUX, run update-grub, reboot
MEDIUM 3.3.2 TCP SYN cookies
Fix: echo "net.ipv4.tcp_syncookies = 1" >> /etc/sysctl.d/60-cis.conf, then sysctl --system
LOW 5.2.21 SSH MaxStartups
Fix: Add "MaxStartups 10:30:60" to /etc/ssh/sshd_config
...
View full scan report: https://stratum.yourdomain.com/scans/ami-0c9d5e3f81a2b6e07
This is not a wall — it’s a list of exactly what to fix. The engineer running the pipeline sees the gap, fixes the blueprint or the Ansible role, rebuilds, and the gate passes. The gap is closed before any instance is deployed.
Runtime Gate: Checking Existing Instances
The Pipeline API also works against running instances, not just images:
# Gate on a running instance's current compliance state
curl -sf -X POST https://stratum.yourdomain.com/api/pipeline/scan \
-H "Authorization: Bearer ${STRATUM_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"instance_id": "i-0abc123",
"min_grade": "B",
"scan_type": "runtime"
}'
This is useful in deployment pipelines that don’t build custom AMIs — they launch instances and configure them after launch. The runtime gate runs after configuration is complete and before the instance is registered with the load balancer.
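Sketched as a post-configuration step (configure_instance.sh, the instance ID, and the target group ARN are placeholders for your own provisioning and load balancer setup):

```shell
INSTANCE_ID="i-0abc123"

# Hypothetical post-launch hardening/configuration step
./configure_instance.sh "${INSTANCE_ID}"

# Runtime gate: block registration if the configured instance is below B
curl -sf -X POST "${STRATUM_URL}/api/pipeline/scan" \
  -H "Authorization: Bearer ${STRATUM_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"instance_id\": \"${INSTANCE_ID}\", \"min_grade\": \"B\", \"scan_type\": \"runtime\"}" \
  || exit 1

# Only a gated instance reaches the load balancer
aws elbv2 register-targets \
  --target-group-arn "${TARGET_GROUP_ARN}" \
  --targets "Id=${INSTANCE_ID}"
```

The ordering is the whole pattern: an instance that fails the runtime gate never takes traffic.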
It also integrates into scheduled compliance jobs — scan your fleet on a schedule and alert when any instance drifts below grade threshold.
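A scheduled fleet check can be the same call in a loop. A cron-driven sketch, with the instance list as a stand-in for a real inventory query:

```shell
#!/bin/sh
# Nightly drift check (run from cron or a scheduled CI job):
# flag any instance whose runtime grade has dropped below B.
FLEET="i-0abc123 i-0def456 i-0aa11bb2"   # stand-in for your inventory source

drifted=""
for id in $FLEET; do
  curl -sf -X POST "${STRATUM_URL}/api/pipeline/scan" \
    -H "Authorization: Bearer ${STRATUM_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"instance_id\": \"${id}\", \"min_grade\": \"B\", \"scan_type\": \"runtime\"}" \
    > /dev/null || drifted="${drifted} ${id}"
done

if [ -n "$drifted" ]; then
  echo "Instances below grade B:${drifted}"
  # hand off to your alerting of choice, e.g. a Slack webhook
fi
```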
Grade Thresholds by Environment
Not all environments need the same threshold. A common pattern:
# Environment-specific minimum grades
environments:
production: A # 95%+ passing — no exceptions
staging: B # 85%+ passing — minor gaps acceptable
development: C # 70%+ passing — experimental OK
# Production deploy gate
curl -sf -X POST .../api/pipeline/scan \
-d '{"image_id": "ami-...", "min_grade": "A"}'
# Staging deploy gate
curl -sf -X POST .../api/pipeline/scan \
-d '{"image_id": "ami-...", "min_grade": "B"}'
This lets development move fast with a lower bar while enforcing the highest standard at the production gate.
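In a shared pipeline script, the mapping above can live in one place. A small helper (the function name and the fail-safe default are mine, not Stratum's):

```shell
# Map a deploy target to its minimum grade; unknown environments
# fall back to the strictest bar rather than silently passing.
min_grade_for() {
  case "$1" in
    production)  echo "A" ;;
    staging)     echo "B" ;;
    development) echo "C" ;;
    *)           echo "A" ;;
  esac
}

MIN_GRADE=$(min_grade_for "${DEPLOY_ENV:-production}")
echo "gating at minimum grade: ${MIN_GRADE}"
# then: curl -sf ... -d "{\"image_id\": \"${AMI_ID}\", \"min_grade\": \"${MIN_GRADE}\"}"
```

Defaulting the unknown case to A is deliberate: a typo in an environment name should tighten the gate, never loosen it.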
Production Gotchas
Gate latency on first scan: If the image hasn’t been scanned yet, the Pipeline API triggers a scan on demand. This takes 2–3 minutes. For build pipelines that want instant gate results, use stratum build --blueprint ... --scan-on-build to ensure the scan runs during the build step and the result is cached for the gate call.
Token rotation: The STRATUM_TOKEN used for API authentication should be rotated on the same schedule as other service credentials. Use environment-specific tokens so a compromised staging token doesn’t bypass a production gate.
Webhook notifications on gate failure: The Pipeline API can send a webhook to Slack, PagerDuty, or any endpoint when a gate fails. Configure this for production pipelines so failures are visible beyond the CI log.
# In the Stratum config
notifications:
pipeline_failures:
- type: slack
webhook: ${SLACK_WEBHOOK}
channel: "#platform-security"
- type: webhook
url: ${PAGERDUTY_WEBHOOK}
min_grade: D # only page on D/F, not B/C failures
Key Takeaways
- A CI/CD compliance gate turns a compliance grade from a dashboard metric into a pipeline constraint — the image doesn’t deploy if it doesn’t pass
- POST /api/pipeline/scan is a single HTTP call that any CI/CD system can make — no agent, no plugin, no SDK required
- Failed gate output is actionable: every failing control includes the specific fix, not just the control ID
- Runtime gates check instances after configuration, not just at image build time
- Environment-specific thresholds let development move faster while enforcing the highest standard at production
What’s Next
The CI/CD compliance gate closes the final gap: even if an unhardened image gets built, it can’t deploy. EP05 is the bookmark episode — this is the point where OS hardening becomes structurally enforced rather than procedurally expected.
EP06 is the series closer. For five episodes, you’ve been using Stratum as a user. What does it look like to run it yourself — extend it with a custom control, add a provider, deploy the platform in your own infrastructure?
Stratum is open-core (Apache 2.0). EP06 is the architecture reveal, the GitHub release, and the extension guide for everything the series taught.
Next: Stratum — open-source OS hardening platform for multi-cloud infrastructure
Get EP06 in your inbox when it publishes → linuxcent.com/subscribe