~2,800 words · Reading time: 12 min · Series: OS Image Security, Post 1 of 6
When you launch an EC2 instance from an AWS Marketplace AMI, or spin up a VM from a cloud-provider base image on GCP or Azure, you’re trusting a decision someone else made months ago about what your server should contain. That decision was made for the widest possible audience — not for your workload, your threat model, or your compliance requirements.
This post tears open what’s actually inside a default cloud image, compares it against what a production-hardened image should contain, and explains why the calculus changes depending on whether you’re deploying to AWS, an on-prem KVM host, or a Nutanix AHV cluster.
What a cloud provider is actually optimising for
AWS, Canonical, Red Hat, and every other publisher shipping to cloud marketplaces are solving a distribution problem, not a security problem. Their images need to:
- Boot successfully on any instance type in any region
- Work for the first-time user running their first workload
- Support every possible use case — web servers, databases, ML training jobs, bastion hosts, everything
That constraint produces images that are, by design, permissive. Permissive gets out of the way. Permissive doesn’t break anything on day one. Permissive is also the opposite of what you want on a production server.
Let’s look at what “permissive” actually means in concrete terms.
Dissecting a default AWS AMI
Take Amazon Linux 2023 (AL2023), one of the more intentionally stripped-down cloud images available. Even with Amazon’s effort to reduce its footprint compared to AL2, a fresh AL2023 instance ships with more than most workloads need.
Services running at boot that most workloads don’t need
chronyd.service # Fine — you need NTP
systemd-resolved.service # Fine
dbus-broker.service # Fine
amazon-ssm-agent.service # Arguably fine if you use SSM
NetworkManager.service # Debatable — most cloud workloads don't need NM
On a RHEL 8/9 or Ubuntu 22.04 Marketplace image, the list is longer. You’ll find avahi-daemon (mDNS/DNS-SD service discovery — on a server), bluetooth.service in some configurations, cups on some RHEL variants, and on Ubuntu, snapd running and occupying memory along with its associated mount units.
Every running service is an attack surface. Every socket it opens is a listening endpoint you didn’t ask for.
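A quick way to quantify that surface is to diff what is actually running against a short allowlist. The sketch below does the comparison with comm on sorted lists; the baseline contents and file names are illustrative, and on a real host you would generate observed.txt from systemctl as shown in the comment.

```shell
# Compare running services against an approved baseline.
# On a real host, generate observed.txt with:
#   systemctl list-units --type=service --state=running --no-legend \
#     | awk '{print $1}' | sort > observed.txt
# The baseline and observed lists below are illustrative.
extra_services() {
  # $1 = approved baseline (sorted), $2 = observed services (sorted)
  comm -13 "$1" "$2"   # lines present only in the observed list
}

sort > baseline.txt <<'EOF'
chronyd.service
dbus-broker.service
sshd.service
EOF

sort > observed.txt <<'EOF'
avahi-daemon.service
chronyd.service
cups.service
dbus-broker.service
sshd.service
EOF

extra_services baseline.txt observed.txt   # prints the two unexpected services
```

Anything the function prints is a candidate to disable at image build time rather than instance by instance.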
SSH configuration out of the box
The default sshd_config on most Marketplace images is not hardened. You’ll typically find:
PermitRootLogin prohibit-password # Better than 'yes', but not 'no'
PasswordAuthentication no # Usually disabled by cloud-init — good
X11Forwarding yes # On a headless server. Why?
AllowAgentForwarding yes # Unnecessary for most workloads
PrintLastLog yes # Minor, but generates audit noise
MaxAuthTries 6 # CIS recommends 4 or fewer
ClientAliveInterval 0 # No idle timeout
CIS Benchmark Level 1 for RHEL 9 has 40+ SSH-specific controls. A default image satisfies perhaps a third of them.
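You can spot-check that gap without a full scanner. The snippet below greps a config file for expected keyword/value pairs; the sample file stands in for /etc/ssh/sshd_config, and on a live host you would check the effective configuration from sshd -T instead, since compiled-in defaults never appear in the file.

```shell
# Spot-check sshd_config against a few CIS-style expectations.
# sample_sshd_config mimics a default Marketplace image; point the checks
# at /etc/ssh/sshd_config (or `sshd -T` output) on a real host.
check_opt() {
  # $1 = file, $2 = keyword, $3 = required value
  grep -Eiq "^[[:space:]]*$2[[:space:]]+$3([[:space:]]|$)" "$1"
}

cat > sample_sshd_config <<'EOF'
PermitRootLogin prohibit-password
X11Forwarding yes
MaxAuthTries 6
EOF

check_opt sample_sshd_config PermitRootLogin no || echo "FAIL PermitRootLogin"
check_opt sample_sshd_config X11Forwarding  no || echo "FAIL X11Forwarding"
check_opt sample_sshd_config MaxAuthTries   4  || echo "FAIL MaxAuthTries"
```

All three checks fail against the sample, which is exactly the point: a default image fails simple, mechanical tests.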
Kernel parameters that aren’t tuned
# Not set, or not set correctly, on most default images:
net.ipv4.conf.all.send_redirects = 1 # Should be 0
net.ipv4.conf.default.accept_redirects = 1 # Should be 0
net.ipv4.ip_forward = 0 # Correct if not a router, but often left unset
kernel.randomize_va_space = 2 # Usually correct — verify anyway
fs.suid_dumpable = 0 # Often not set
kernel.dmesg_restrict = 1 # Rarely set
These live in /etc/sysctl.d/ and need to be explicitly applied. In a default AMI, they are not.
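Whether they were applied is easy to verify, because every sysctl key is mirrored as a file under /proc/sys. The helper below compares a drop-in file against live values; the optional root argument and the fabricated tree in the demo exist only to make the sketch runnable without a hardened host.

```shell
# Verify a sysctl drop-in against the running kernel (/proc/sys).
# The second argument overrides the /proc/sys root purely for demonstration.
verify_sysctl() {
  conf=$1; root=${2:-/proc/sys}
  # Strip comments and blank lines, then compare each key = value pair
  sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$conf" | while IFS='=' read -r key want; do
    key=$(echo "$key" | tr -d '[:space:]'); want=$(echo "$want" | tr -d '[:space:]')
    have=$(cat "$root/$(echo "$key" | tr . /)" 2>/dev/null)
    [ "$have" = "$want" ] || echo "MISMATCH $key want=$want have=${have:-unreadable}"
  done
}

# Demo against a fabricated tree standing in for /proc/sys:
mkdir -p fake/net/ipv4/conf/all
echo 1 > fake/net/ipv4/conf/all/send_redirects
printf 'net.ipv4.conf.all.send_redirects = 0\n' > demo.conf
verify_sysctl demo.conf fake   # reports the mismatch
# On a real host: verify_sysctl /etc/sysctl.d/99-hardening.conf
```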
No audit daemon configured
auditd is installed on most RHEL-family images. It is not configured. The default audit.rules file is essentially empty — the daemon runs but captures almost nothing. On Ubuntu, auditd isn’t even installed by default.
CIS Benchmark Level 2 for RHEL 9 specifies 30+ auditd rules covering file access, privilege escalation, user management changes, network configuration changes, and more. None of them are present in a default AMI.
Package surface
Run rpm -qa | wc -l or dpkg -l | grep -c ^ii on a fresh instance. AL2023 comes in around 350 packages. Ubuntu 22.04 Server minimal sits around 500. RHEL 9 from Marketplace — depending on the variant — lands between 400 and 600.
How many of those packages does your application actually need? For a Python web service: Python, your runtime dependencies, and a handful of system libraries. The rest is exposure.
The on-prem story is different — and often worse
Cloud images at least get regular updates from their publishers. On-prem KVM and Nutanix environments tell a different story.
The KVM / QCOW2 situation
Most teams running KVM get their base images one of three ways:
1. Download a cloud image (a cloud-init-enabled QCOW2) from the distro vendor and use it directly
2. Convert an existing VMware VMDK or OVA and hope for the best
3. Run a manual Kickstart/Preseed install once, then treat the result as the “golden image” forever
Option 1 gives you the same problems as the cloud image analysis above, plus you’re now responsible for handling cloud-init in an environment that might not have a metadata service — so you either ship a seed ISO with every VM, or you rip out cloud-init and manage first-boot differently.
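For the seed-ISO route, cloud-init's NoCloud datasource reads a volume labelled cidata containing two files with fixed names. A minimal sketch follows; the hostname and SSH key are placeholders, and genisoimage is interchangeable with mkisofs or xorrisofs.

```shell
# Build a NoCloud seed ISO for cloud-init on plain KVM (no metadata service).
# The volume label must be exactly "cidata"; the file names are fixed.
cat > meta-data <<'EOF'
instance-id: vm-web-01
local-hostname: vm-web-01
EOF

cat > user-data <<'EOF'
#cloud-config
ssh_authorized_keys:
  - ssh-ed25519 AAAA...placeholder ops@example
EOF

if command -v genisoimage >/dev/null 2>&1; then
  genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data
else
  echo "install genisoimage (or use mkisofs/xorrisofs) to produce seed.iso"
fi
# Attach seed.iso to the VM as a CD-ROM; NoCloud picks it up at first boot.
```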
Option 3 is the most common and the most dangerous. That “golden image” was created by someone who’s possibly no longer at the company, contains packages pinned to versions from 18 months ago, and has sshd configured however was convenient at the time. Worse, it gets cloned hundreds of times and none of those clones are ever individually updated at the image level.
The Nutanix AHV specifics
Nutanix AHV images have additional considerations that cloud images don’t deal with:
- AHV uses a custom paravirtualised SCSI controller (virtio-scsi or the Nutanix variant). Images imported from VMware need pvscsi drivers removed and virtio_scsi added to the initramfs before the disk will be detected at boot.
- The Nutanix guest tools agent (ngt) is separate from the kernel and needs to be installed inside the image for snapshot quiescence, VSS integration, and in-guest metrics.
- cloud-init works on AHV but requires the ConfigDrive datasource — not the EC2 datasource that most cloud QCOW2 images default to. An unconfigured datasource means cloud-init times out at boot, costing 3–5 minutes on every first start.
- NUMA topology on large AHV nodes affects memory allocation in ways that need kernel tuning (vm.zone_reclaim_mode, kernel.numa_balancing) — parameters no generic cloud image sets.
The result is that most Nutanix environments end up with a patchwork: partially converted images, manually applied guest tools, and hardening that was done once per environment rather than once per image.
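The datasource timeout, at least, is fixable at build time by pinning cloud-init's datasource list inside the image. A sketch, written to a local file here for portability; the real target path is /etc/cloud/cloud.cfg.d/.

```shell
# Pin cloud-init to ConfigDrive inside an AHV image so first boot doesn't
# spend minutes probing for an EC2 metadata service that isn't there.
# Written locally for the demo; install to /etc/cloud/cloud.cfg.d/ in the image.
cat > 99-ahv-datasource.cfg <<'EOF'
datasource_list: [ ConfigDrive, None ]
EOF
cat 99-ahv-datasource.cfg
```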
What a hardened image actually looks like
A properly built hardened image isn’t just “a default image with some hardening applied at the end.” The hardening is architectural — decisions made at build time that change the fundamental shape of what’s inside the image.
Package set — minimal by design
Start from a minimal install group — @minimal-environment on RHEL/Rocky, --variant=minbase on Debian derivatives. Then add only what the image class requires. For a web server image: your runtime, a process supervisor, and nothing else. No man-db, no X11-common, no avahi.
Every package you don’t install is a CVE that can never affect you.
Filesystem hardening
Separate mount points with restrictive options prevent a class of privilege escalation attacks that depend on executing binaries from world-writable locations:
/tmp nodev,nosuid,noexec
/var nodev,nosuid
/var/tmp nodev,nosuid,noexec
/home nodev,nosuid
/dev/shm nodev,nosuid,noexec
These are not applied by any default cloud image.
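Verifying them on a running instance is a one-liner per mount point. The helper below checks an options string; on a live host you would feed it findmnt -no OPTIONS /tmp, while the demo uses sample strings so it runs anywhere.

```shell
# Check that a mount's option string carries every required option.
has_opts() {
  # $1 = comma-separated option string, remaining args = required options
  opts=$1; shift
  for o in "$@"; do
    case ",$opts," in
      *",$o,"*) ;;                       # present
      *) echo "missing: $o"; return 1 ;;
    esac
  done
}

# Live usage: has_opts "$(findmnt -no OPTIONS /tmp)" nodev nosuid noexec
has_opts "rw,nosuid,nodev,noexec,relatime" nodev nosuid noexec && echo "/tmp OK"
has_opts "rw,relatime" nodev nosuid noexec || echo "mount unprotected"
```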
Kernel parameters — baked in at build time
# /etc/sysctl.d/99-hardening.conf
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.log_martians = 1
net.ipv6.conf.all.accept_redirects = 0
kernel.randomize_va_space = 2
fs.suid_dumpable = 0
kernel.dmesg_restrict = 1
kernel.kptr_restrict = 2
net.core.bpf_jit_harden = 2
Applied at image build time. Present on every instance, every time, before your application code runs.
SSH locked down
Protocol 2
PermitRootLogin no
MaxAuthTries 4
LoginGraceTime 60
X11Forwarding no
AllowAgentForwarding no
AllowTcpForwarding no
PermitUserEnvironment no
Ciphers [email protected],[email protected],aes256-ctr
MACs [email protected],[email protected]
KexAlgorithms curve25519-sha256,diffie-hellman-group16-sha512
ClientAliveInterval 300
ClientAliveCountMax 3
Banner /etc/issue.net
This is approximately CIS Level 1 SSH hardening. It lives in the image — not in a post-deploy playbook.
auditd rules embedded
# Privilege escalation
-a always,exit -F arch=b64 -S execve -C uid!=euid -F euid=0 -k setuid
# Sudo usage
-w /etc/sudoers -p wa -k sudoers
# User and group management
-w /etc/passwd -p wa -k identity
-w /etc/group -p wa -k identity
# Kernel module loading
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
The full CIS L2 auditd ruleset runs to ~60 rules. They’re all committed to the image. Every instance generates audit logs from minute one of its existence.
Services disabled at build time
systemctl disable avahi-daemon
systemctl disable cups
systemctl disable postfix
systemctl disable bluetooth
systemctl disable rpcbind
systemctl mask debug-shell.service
The service list varies by distro. The principle is the same: if it’s not required by the image’s purpose, it doesn’t run.
The platform dimension: why you can’t use one image everywhere
This is where the complexity gets real. A CIS-hardened RHEL 9 image built for AWS doesn’t directly work on KVM, and it doesn’t directly work on Nutanix either. The security controls are the same — the platform-specific layer underneath them is not.
Here’s what needs to differ per target platform:
| Concern | AWS (AMI) | KVM (QCOW2) | Nutanix AHV |
|---|---|---|---|
| Disk format | Raw / VMDK → AMI | QCOW2 | QCOW2 / VMDK |
| Boot mechanism | GRUB2 + PVGRUB2 or UEFI | GRUB2 | GRUB2 + UEFI |
| Network driver | ENA (ena kernel module) | virtio-net | virtio-net |
| Storage driver | NVMe or xen-blkfront | virtio-blk / virtio-scsi | virtio-scsi |
| cloud-init datasource | Ec2 | NoCloud / ConfigDrive | ConfigDrive |
| Guest agent | AWS SSM / CloudWatch | qemu-guest-agent | Nutanix Guest Tools |
| Metadata service | 169.254.169.254 | None (seed ISO) or local | Nutanix AOS |
A single pipeline needs to produce platform-specific artefacts from a single hardened source. The hardening doesn’t change. The drivers, datasources, and agents do.
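At the artefact level, the per-platform delta is mostly a disk-format conversion plus the right agent and datasource baked in earlier. A sketch of the conversion step, using a fabricated 1 MiB file in place of a real built image so the commands are runnable anywhere qemu-img is installed:

```shell
# One hardened source disk, multiple per-platform artefacts.
# hardened.raw stands in for your actual built image; a tiny file is
# fabricated here so the conversion commands work as a demo.
truncate -s 1M hardened.raw

if command -v qemu-img >/dev/null 2>&1; then
  qemu-img convert -O qcow2 hardened.raw hardened-kvm.qcow2   # KVM / AHV import
  qemu-img convert -O vmdk  hardened.raw hardened.vmdk        # VMDK targets
  qemu-img info hardened-kvm.qcow2
else
  echo "qemu-img not found; install qemu-utils (deb) or qemu-img (rpm)"
fi
# For AWS, upload the raw image and register it as an AMI via VM Import/Export,
# or let the Packer amazon-ebs builder produce the AMI directly.
```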
Where this sits relative to CIS and NIST
The controls described above aren’t arbitrary. They map directly to published frameworks.
CIS Benchmark Level 1 covers controls with low operational impact and high security return — SSH configuration, kernel parameters, filesystem mount options, service reduction. Almost everything in the “what a hardened image looks like” section above is CIS Level 1.
CIS Benchmark Level 2 adds auditd configuration, PAM controls, additional filesystem protections, and more aggressive service disablement. It trades some operational flexibility for a significantly smaller attack surface.
NIST SP 800-53 CM-6 (Configuration Settings) directly requires that systems be configured to the most restrictive settings consistent with operational requirements. Baking hardening into the image is a stronger implementation of CM-6 than applying it post-deploy — because it’s guaranteed, auditable at build time, and consistent across every instance regardless of how it was launched.
NIST SP 800-53 SI-2 (Flaw Remediation) maps to your image patching cadence. An image rebuilt monthly against the latest package repositories satisfies SI-2 more completely than runtime patching alone, because it also eliminates packages you don’t need — packages that would need patching if they were present.
The full CIS and NIST control mapping will be covered in depth later in this series.
The build-time vs runtime hardening distinction
This is the most important concept in the entire post.
Hardening applied at runtime — via Ansible, Chef, cloud-init user-data, or a shell script — is conditional. It runs if the automation runs. It applies if nothing fails. It’s consistent only if every deployment goes through exactly the same path.
Hardening embedded in the image is unconditional. It cannot be skipped. It doesn’t depend on connectivity to an Ansible control node. It doesn’t require cloud-init to succeed. It cannot be accidentally omitted by a new team member who doesn’t know the runbook.
This distinction matters most at incident response time. When you’re investigating a compromised instance, the first question you want to answer confidently is: was this instance ever in a known-good state?
- If your hardening is in the image: yes, from boot.
- If your hardening is applied post-deploy: it depends on whether everything went right on that specific instance’s first boot.
What comes next
The practical question this raises: how do you build these images in a repeatable, multi-platform way, with CIS scanning integrated into the build pipeline?
Packer covers most of the builder layer. OpenSCAP provides the scanning. Kickstart, cloud-init, and Nutanix AHV-specific tooling fill the gaps. But the orchestration between these — producing a consistent hardened image for three different target platforms from a single source of truth — is where most teams hit friction.
The next post in this series covers the platform-specific differences between AWS, KVM, and Nutanix in depth: what actually needs to change per target when your security baseline is shared.
Next in the series: Cloud vs KVM vs Nutanix — why one image doesn’t fit all →
Questions or corrections? Open an issue or reach me on LinkedIn. If this was useful, the series index has the full roadmap.