Enterprise Awakening: RBAC, CRDs, Cloud Providers, and Helm Goes Mainstream (2016–2018)

Reading Time: 6 minutes


Introduction

By the end of 2016, engineers were running Kubernetes in production. Not as an experiment — in production, handling real traffic. And that’s where the real gaps became visible.

The 2016–2018 period is the era when Kubernetes grew up. RBAC went stable. CRDs replaced the fragile ThirdPartyResource hack. The major cloud providers launched managed services. Helm became the standard for packaging. And the security posture, which had been an afterthought in the Borg-derived model, started getting serious attention.


Kubernetes 1.6 — The RBAC Milestone (March 2017)

Kubernetes 1.6 is the release that made enterprise Kubernetes possible. The headline feature: RBAC (Role-Based Access Control) promoted to beta and, in kubeadm-provisioned clusters, enabled by default.

Before RBAC, Kubernetes had attribute-based access control (ABAC) — a flat policy file on the API server that required a restart to change. It worked, but it was operationally painful and offered no granularity at the namespace level.

RBAC introduced four objects:
Role: A set of permissions scoped to a namespace
ClusterRole: A set of permissions cluster-wide or reusable across namespaces
RoleBinding: Assigns a Role to a user/group/service account in a namespace
ClusterRoleBinding: Assigns a ClusterRole cluster-wide

# Example: read-only access to pods in the dev namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
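
Once applied, the binding can be checked from the command line. kubectl can evaluate access as the bound subject (impersonating alice requires that your own account holds the impersonate permission):

# should print "yes" for alice, "no" for an unbound user
kubectl auth can-i list pods --namespace dev --as alice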

Also in 1.6:
etcd v3 as default: Better performance, watch semantics, and transaction support
Storage Classes and dynamic provisioning stable: Request storage through a PersistentVolumeClaim and the underlying volume (EBS, GCE PD, NFS) is provisioned automatically
Audit logging (alpha): API server logs every request — who did what, to which resource, at what time
Scale: Tested to 5,000 nodes per cluster


Kubernetes 1.7 — Custom Resource Definitions (June 2017)

The most significant architectural decision in Kubernetes history after the initial design: ThirdPartyResources (TPRs) were replaced with CustomResourceDefinitions (CRDs).

TPRs were a fragile mechanism introduced in 1.2 that let users define custom API types, but they had serious limitations: no schema validation, no versioning, data-loss bugs, and poor upgrade behavior.

CRDs are what make the Kubernetes API extension model work. They let you define new resource types that the API server stores and serves, with optional schema validation via OpenAPI v3 schemas, version conversion, and admission webhook integration.

# modern apiextensions.k8s.io/v1 form (GA in 1.16); 1.7 shipped this as apiextensions.k8s.io/v1beta1
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.stable.example.com
spec:
  group: stable.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: string
              version:
                type: string
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
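
Once a CRD like this is registered, instances of the new type are created and queried like any built-in resource. A hypothetical instance matching the schema above:

apiVersion: stable.example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  size: "10Gi"
  version: "13"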

CRDs enabled the entire Operator ecosystem that would define the next phase of Kubernetes. Without stable, schema-validated custom resources, you can’t build reliable controllers on top of them.

Also in 1.7:
Node Authorization mode: Kubelets can now only access secrets and pods bound to their own node — a critical lateral movement restriction
Secrets encryption at rest (alpha): Finally, secrets stored in etcd could be encrypted with AES-CBC or AES-GCM (see the example below this list)
Network Policy promoted to stable: CNI plugins implementing NetworkPolicy could now enforce pod-level ingress/egress rules
API aggregation layer: Extend the Kubernetes API with custom API servers — the foundation for metrics-server and other API extensions

The node authorization mode deserves more attention than it typically gets. Before 1.7, a compromised kubelet could read all secrets in the cluster. Node authorization restricted the kubelet to only the secrets it needed for pods scheduled on that node. This single change dramatically reduced the blast radius of a node compromise.
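
The encryption-at-rest alpha required an explicit configuration file passed to the API server. A minimal sketch in today's GA format (the 1.7 alpha used kind: EncryptionConfig behind the --experimental-encryption-provider-config flag):

# referenced by the API server via --encryption-provider-config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}   # fallback so data written before encryption stays readable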


Kubernetes 1.8 — RBAC Goes Stable (September 2017)

RBAC graduated to stable in 1.8. This was the point of no return for enterprise adoption. Security teams could now enforce least-privilege on Kubernetes API access with a documented, stable API.

Key additions:
Workloads API (apps/v1beta2): Deployments, ReplicaSets, DaemonSets, and StatefulSets all moved under a unified API group, signaling they were heading toward stable
CronJobs promoted to beta (batch/v1beta1): The scheduled-job primitive introduced as ScheduledJobs in 1.4 got a production-track API

The admission webhook framework — which would become the foundation for policy enforcement tools like OPA/Gatekeeper — was also being refined in this period.


The Cloud Provider Moment (2017–2018)

October 2017: Docker Surrenders

At DockerCon Europe in October 2017, Docker Inc. announced that Docker Enterprise Edition would ship with Kubernetes support alongside Docker Swarm. This was, effectively, Docker Inc. conceding the orchestration market to Kubernetes. Swarm remained available, but the message was clear: Kubernetes was the production standard.

October 2017: Microsoft Previews AKS

In late October 2017, Microsoft announced the public preview of Azure Kubernetes Service (AKS). The managed Kubernetes race was on.

November 2017: Amazon Announces EKS

At AWS re:Invent 2017, Amazon announced its managed offering, initially named Amazon Elastic Container Service for Kubernetes and later shortened to Amazon EKS. The three major cloud providers — Google (GKE, running since 2014), Microsoft (AKS), and Amazon (EKS) — were all committed to managed Kubernetes.

For enterprise buyers, this was the signal they needed. Kubernetes was no longer a bet on an experimental technology — it was the supported, managed offering from every major cloud provider.


Kubernetes 1.9 — Workloads API Stable (December 2017)

The Workloads API (apps/v1) went stable in 1.9. This matters because it locked in the API contract for Deployments, ReplicaSets, DaemonSets, and StatefulSets. Infrastructure built on these APIs would not break on upgrades.

# apps/v1 Deployment — the stable form that operators rely on
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

Also in 1.9:
Windows container support moved to beta — actual Windows Server 2016 nodes in a cluster
CoreDNS available as an alternative to kube-dns: A more extensible, plugin-based DNS server that would replace kube-dns as the default in 1.11


Kubernetes 1.10 — Storage, Auth, and Scale (March 2018)

1.10 continued the enterprise hardening:
CSI (Container Storage Interface) beta: A standardized interface between Kubernetes and storage providers. Before CSI, storage drivers were compiled into the kubelet binary. CSI moved them out-of-tree, allowing storage vendors to ship their own drivers without waiting for a Kubernetes release
External credential providers (alpha): Authenticate against external systems (cloud IAM, HashiCorp Vault) for kubeconfig credentials
Node problem detector stable: Detect and report node-level problems (kernel deadlocks, corrupted file systems) as Kubernetes events and node conditions

The CSI transition was one of the most important infrastructure decisions of this period. It decoupled storage driver development from the Kubernetes release cycle — a necessary step for cloud providers to ship storage integrations rapidly and independently.
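
To users, a CSI driver surfaces as a StorageClass. A sketch of dynamic provisioning through one (the provisioner name shown is the AWS EBS CSI driver's; substitute your vendor's):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # out-of-tree driver, shipped by the vendor
parameters:
  type: gp3
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi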


The Istio Announcement and Service Mesh Wars (May 2017)

Google, IBM, and Lyft announced Istio in May 2017 — a service mesh that layered mTLS, traffic management, and observability on top of existing Kubernetes deployments without changing application code. (Envoy, the data plane, came from Lyft.)

Istio’s architecture: sidecar proxies (Envoy) injected into every pod, managed by a control plane. Every service-to-service call passes through the sidecar, enabling:
– Mutual TLS between services (zero-trust networking at the service layer)
– Fine-grained traffic control (canary releases, circuit breaking, retries)
– Distributed tracing and metrics

Linkerd (from Buoyant) had been working on the same problem since 2016. The two projects would compete for the “service mesh standard” throughout 2017–2019.

The service mesh conversation was fundamentally a security architecture conversation: how do you enforce mutual authentication and encryption between services in a Kubernetes cluster without requiring application developers to implement it?
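
In today's Istio API, mesh-wide mutual TLS enforcement is a single resource; the 2017-era configuration looked quite different, so treat this as the modern shape of the answer:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT   # reject plaintext service-to-service traffic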


CoreOS Acquisition and the Operator Pattern (2018)

In January 2018, Red Hat acquired CoreOS for $250 million. CoreOS had contributed two things that would permanently shape Kubernetes:

1. The Operator Pattern (introduced by CoreOS engineers Brandon Philips and Josh Wood in 2016): An Operator is a custom controller that uses CRDs to manage the lifecycle of complex, stateful applications. The etcd Operator (CoreOS’s own) was the first — it automated etcd cluster creation, scaling, backup, and failure recovery. The pattern generalized: a Prometheus Operator, a PostgreSQL Operator, a Kafka Operator.

The Operator pattern is the answer to the question “how do you encode operational knowledge into software?” A human operator knows how to deploy, scale, backup, and recover a database. An Operator codifies that knowledge into a controller loop.

# Operator pattern: watch CRD → reconcile → manage application
CRD (EtcdCluster) → Operator Controller watches → creates/updates Pods, Services, Snapshots
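
The etcd Operator's custom resource shows how compact the user-facing contract was. Declaring this object (the format from the CoreOS etcd-operator project) caused the Operator to create and manage the member pods:

apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3            # the Operator adds or removes members to hold this count
  version: "3.2.13"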

2. etcd: The distributed key-value store that backs the Kubernetes control plane. CoreOS built and maintained etcd. Red Hat acquiring CoreOS meant that the company maintaining Kubernetes’s most critical dependency (after the kernel) was now inside the Red Hat/IBM orbit.


Helm 2 and the Charts Ecosystem

By 2017–2018, Helm had become the de facto package manager for Kubernetes. The public Helm chart repository hosted hundreds of charts — databases (PostgreSQL, MySQL, Redis), monitoring (Prometheus, Grafana), ingress controllers (nginx), CI/CD tools (Jenkins, GitLab Runner).
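
The day-to-day Helm 2 workflow looked roughly like this (chart names come from the then-official stable repository; the release name is an example):

helm init                                     # installs Tiller into the cluster
helm search postgresql                        # find a chart
helm install --name my-db stable/postgresql   # create a release
helm upgrade my-db stable/postgresql --set image.tag=10.4
helm rollback my-db 1                         # revert to revision 1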

Helm 2 introduced Tiller — a server-side component that managed release state in the cluster. Tiller became the most criticized security decision in the Kubernetes ecosystem: Tiller ran with cluster-admin privileges by default, meaning any user who could reach Tiller’s gRPC endpoint could do anything in the cluster.

Security teams hated Tiller. The Helm team addressed it in Helm 3 (2019) by removing Tiller entirely and storing release state as Kubernetes Secrets instead.


Key Takeaways

  • RBAC going stable in 1.8 was the single most important security event in early Kubernetes history — it gave enterprises the access control model they needed for production
  • CRDs replacing TPRs in 1.7 enabled the entire Operator ecosystem that would define the next phase of Kubernetes
  • Docker Inc.’s October 2017 announcement that it would support Kubernetes in Docker EE effectively ended the container orchestration wars
  • The three major cloud providers (GKE, AKS, EKS) all standardizing on managed Kubernetes drove enterprise adoption faster than any feature announcement could
  • The Operator pattern — Kubernetes controllers that encode operational knowledge — emerged from CoreOS and became the standard model for managing complex stateful applications
  • Helm filled a real gap but Tiller’s cluster-admin model was a security debt the community had to repay in Helm 3

What’s Next

← EP02: The Container Wars | EP04: The Operator Era →

Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com

Cloud AMI Security Risks: What’s Wrong with Defaults and How Custom OS Images Fix Them

Reading Time: 8 minutes

Series: OS Image Security, Post 1 of 6

When you launch an EC2 instance from an AWS Marketplace AMI, or spin up a VM from a cloud-provider base image on GCP or Azure, you’re trusting a decision someone else made months ago about what your server should contain. That decision was made for the widest possible audience — not for your workload, your threat model, or your compliance requirements.

This post tears open what’s actually inside a default cloud image, compares it against what a production-hardened image should contain, and explains why the calculus changes depending on whether you’re deploying to AWS, an on-prem KVM host, or a Nutanix AHV cluster.


What a cloud provider is actually optimising for

AWS, Canonical, Red Hat, and every other publisher shipping to cloud marketplaces are solving a distribution problem, not a security problem. Their images need to:

  • Boot successfully on any instance type in any region
  • Work for the first-time user running their first workload
  • Support every possible use case — web servers, databases, ML training jobs, bastion hosts, everything

That constraint produces images that are, by design, permissive. Permissive gets out of the way. Permissive doesn’t break anything on day one. Permissive is also the opposite of what you want on a production server.

Let’s look at what “permissive” actually means in concrete terms.


Dissecting a default AWS AMI

Take Amazon Linux 2023 (AL2023), one of the more intentionally stripped-down cloud images available. Even with Amazon’s effort to reduce its footprint compared to AL2, a fresh AL2023 instance ships with more than most workloads need.

Services running at boot that most workloads don’t need

chronyd.service            # Fine — you need NTP
systemd-resolved.service   # Fine
dbus-broker.service        # Fine
amazon-ssm-agent.service   # Arguably fine if you use SSM
NetworkManager.service     # Debatable — most cloud workloads don't need NM

On a RHEL 8/9 or Ubuntu 22.04 Marketplace image, the list is longer. You’ll find avahi-daemon (mDNS/DNS-SD service discovery — on a server), bluetooth.service in some configurations, cups on some RHEL variants, and on Ubuntu, snapd running and occupying memory along with its associated mount units.

Every running service is an attack surface. Every socket it opens is a listening endpoint you didn’t ask for.
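
Enumerating that surface on a fresh instance takes two commands, both standard systemd and iproute2 tooling:

# every service currently running
systemctl list-units --type=service --state=running
# every listening TCP/UDP socket, with the owning process
ss -tulpn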

SSH configuration out of the box

The default sshd_config on most Marketplace images is not hardened. You’ll typically find:

PermitRootLogin prohibit-password   # Better than 'yes', but not 'no'
PasswordAuthentication no           # Usually disabled by cloud-init — good
X11Forwarding yes                   # On a headless server. Why?
AllowAgentForwarding yes            # Unnecessary for most workloads
PrintLastLog yes                    # Minor, but generates audit noise
MaxAuthTries 6                      # CIS recommends 4 or fewer
ClientAliveInterval 0               # No idle timeout

CIS Benchmark Level 1 for RHEL 9 has 40+ SSH-specific controls. A default image satisfies perhaps a third of them.

Kernel parameters that aren’t tuned

# Not set, or not set correctly, on most default images:
net.ipv4.conf.all.send_redirects = 1        # Should be 0
net.ipv4.conf.default.accept_redirects = 1  # Should be 0
net.ipv4.ip_forward = 0                     # Correct if not a router, but often left unset
kernel.randomize_va_space = 2               # Usually correct — verify anyway
fs.suid_dumpable = 0                        # Often not set
kernel.dmesg_restrict = 1                   # Rarely set

These live in /etc/sysctl.d/ and need to be explicitly applied. In a default AMI, they are not.
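
Verifying and applying them during a build is quick; a sketch:

# spot-check current values (sysctl accepts multiple keys)
sysctl net.ipv4.conf.all.send_redirects fs.suid_dumpable kernel.dmesg_restrict
# re-apply everything under /etc/sysctl.d/ without a reboot
sudo sysctl --system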

No audit daemon configured

auditd is installed on most RHEL-family images. It is not configured. The default audit.rules file is essentially empty — the daemon runs but captures almost nothing. On Ubuntu, auditd isn’t even installed by default.

CIS Benchmark Level 2 for RHEL 9 specifies 30+ auditd rules covering file access, privilege escalation, user management changes, network configuration changes, and more. None of them are present in a default AMI.

Package surface

Run rpm -qa | wc -l or dpkg -l | grep -c ^ii on a fresh instance. AL2023 comes in around 350 packages. Ubuntu 22.04 Server minimal sits around 500. RHEL 9 from Marketplace — depending on the variant — lands between 400 and 600.

How many of those packages does your application actually need? For a Python web service: Python, your runtime dependencies, and a handful of system libraries. The rest is exposure.


The on-prem story is different — and often worse

Cloud images at least get regular updates from their publishers. On-prem KVM and Nutanix environments tell a different story.

The KVM / QCOW2 situation

Most teams running KVM get their base images one of three ways:

  1. Download a cloud image (cloud-init enabled QCOW2) from the distro vendor and use it directly
  2. Convert an existing VMware VMDK or OVA and hope for the best
  3. Run a manual Kickstart/Preseed install once, then treat the result as the “golden image” forever

Option 1 gives you the same problems as the cloud image analysis above, plus you’re now responsible for handling cloud-init in an environment that might not have a metadata service — so you either ship a seed ISO with every VM, or you rip out cloud-init and manage first-boot differently.

Option 3 is the most common and the most dangerous. That “golden image” was created by someone who’s possibly no longer at the company, contains packages pinned to versions from 18 months ago, and has sshd configured however happened to be convenient at the time. Worse, it gets cloned hundreds of times, and none of those clones is ever individually updated at the image level.

The Nutanix AHV specifics

Nutanix AHV images have additional considerations that cloud images don’t deal with:

  • AHV uses a custom paravirtualised SCSI controller (virtio-scsi or the Nutanix variant). Images imported from VMware need pvscsi drivers removed and virtio_scsi added to the initramfs before the disk will be detected at boot.
  • The Nutanix guest tools agent (ngt) is separate from the kernel and needs to be installed inside the image for snapshot quiescence, VSS integration, and in-guest metrics.
  • cloud-init works on AHV but requires the ConfigDrive datasource — not the EC2 datasource that most cloud QCOW2 images default to. An unconfigured datasource means cloud-init times out at boot, costing 3–5 minutes on every first start.
  • NUMA topology on large AHV nodes affects memory allocation in ways that need kernel tuning (vm.zone_reclaim_mode, kernel.numa_balancing) — parameters no generic cloud image sets.

The result is that most Nutanix environments end up with a patchwork: partially converted images, manually applied guest tools, and hardening that was done once per environment rather than once per image.
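
Two of those fixes can be baked into the image before it ever boots on AHV. A sketch using libguestfs tooling (the image filename and config paths are examples):

# add virtio_scsi to the initramfs so AHV detects the disk at first boot
virt-customize -a rocky9-ahv.qcow2 \
  --run-command 'echo "add_drivers+=\" virtio_scsi \"" > /etc/dracut.conf.d/virtio.conf && dracut -f --regenerate-all'

# pin cloud-init to ConfigDrive so it stops probing for an EC2 metadata service
virt-customize -a rocky9-ahv.qcow2 \
  --run-command 'echo "datasource_list: [ ConfigDrive, None ]" > /etc/cloud/cloud.cfg.d/99-ahv-datasource.cfg'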


What a hardened image actually looks like

A properly built hardened image isn’t just “a default image with some hardening applied at the end.” The hardening is architectural — decisions made at build time that change the fundamental shape of what’s inside the image.

Package set — minimal by design

Start from a minimal install group — @minimal-environment on RHEL/Rocky, --variant=minbase on Debian derivatives. Then add only what the image class requires. For a web server image: your runtime, a process supervisor, and nothing else. No man-db, no X11-common, no avahi.

Every package you don’t install is a CVE that can never affect you.

Filesystem hardening

Separate mount points with restrictive options prevent a class of privilege escalation attacks that depend on executing binaries from world-writable locations:

/tmp      nodev,nosuid,noexec
/var      nodev,nosuid
/var/tmp  nodev,nosuid,noexec
/home     nodev,nosuid
/dev/shm  nodev,nosuid,noexec

These are not applied by any default cloud image.

Kernel parameters — baked in at build time

# /etc/sysctl.d/99-hardening.conf

net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.log_martians = 1
net.ipv6.conf.all.accept_redirects = 0
kernel.randomize_va_space = 2
fs.suid_dumpable = 0
kernel.dmesg_restrict = 1
kernel.kptr_restrict = 2
net.core.bpf_jit_harden = 2

Applied at image build time. Present on every instance, every time, before your application code runs.

SSH locked down

Protocol 2
PermitRootLogin no
MaxAuthTries 4
LoginGraceTime 60
X11Forwarding no
AllowAgentForwarding no
AllowTcpForwarding no
PermitUserEnvironment no
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes256-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
KexAlgorithms curve25519-sha256,diffie-hellman-group16-sha512
ClientAliveInterval 300
ClientAliveCountMax 3
Banner /etc/issue.net

This is approximately CIS Level 1 SSH hardening. It lives in the image — not in a post-deploy playbook.

auditd rules embedded

# Privilege escalation
-a always,exit -F arch=b64 -S execve -C uid!=euid -F euid=0 -k setuid

# Sudo usage
-w /etc/sudoers -p wa -k sudoers

# User and group management
-w /etc/passwd -p wa -k identity
-w /etc/group  -p wa -k identity

# Kernel module loading
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules

The full CIS L2 auditd ruleset runs to ~60 rules. They’re all committed to the image. Every instance generates audit logs from minute one of its existence.

Services disabled at build time

systemctl disable avahi-daemon
systemctl disable cups
systemctl disable postfix
systemctl disable bluetooth
systemctl disable rpcbind
systemctl mask debug-shell.service

The service list varies by distro. The principle is the same: if it’s not required by the image’s purpose, it doesn’t run.


The platform dimension: why you can’t use one image everywhere

This is where the complexity gets real. A CIS-hardened RHEL 9 image built for AWS doesn’t directly work on KVM, and it doesn’t directly work on Nutanix either. The security controls are the same — the platform-specific layer underneath them is not.

Here’s what needs to differ per target platform:

Concern                 AWS (AMI)                  KVM (QCOW2)                Nutanix AHV
Disk format             Raw / VMDK → AMI           QCOW2                      QCOW2 / VMDK
Boot mechanism          GRUB2 + PVGRUB2 or UEFI    GRUB2                      GRUB2 + UEFI
Network driver          ENA (ena kernel module)    virtio-net                 virtio-net
Storage driver          NVMe or xen-blkfront       virtio-blk / virtio-scsi   virtio-scsi
cloud-init datasource   Ec2                        NoCloud / ConfigDrive      ConfigDrive
Guest agent             AWS SSM / CloudWatch       qemu-guest-agent           Nutanix Guest Tools
Metadata service        169.254.169.254            None (seed ISO) or local   Nutanix AOS

A single pipeline needs to produce platform-specific artefacts from a single hardened source. The hardening doesn’t change. The drivers, datasources, and agents do.


Where this sits relative to CIS and NIST

The controls described above aren’t arbitrary. They map directly to published frameworks.

CIS Benchmark Level 1 covers controls with low operational impact and high security return — SSH configuration, kernel parameters, filesystem mount options, service reduction. Almost everything in the “what a hardened image looks like” section above is CIS Level 1.

CIS Benchmark Level 2 adds auditd configuration, PAM controls, additional filesystem protections, and more aggressive service disablement. It trades some operational flexibility for a significantly smaller attack surface.

NIST SP 800-53 CM-6 (Configuration Settings) directly requires that systems be configured to the most restrictive settings consistent with operational requirements. Baking hardening into the image is a stronger implementation of CM-6 than applying it post-deploy — because it’s guaranteed, auditable at build time, and consistent across every instance regardless of how it was launched.

NIST SP 800-53 SI-2 (Flaw Remediation) maps to your image patching cadence. An image rebuilt monthly against the latest package repositories satisfies SI-2 more completely than runtime patching alone, because it also eliminates packages you don’t need — packages that would need patching if they were present.

The full CIS and NIST control mapping will be covered in depth later in this series.


The build-time vs runtime hardening distinction

This is the most important concept in the entire post.

Hardening applied at runtime — via Ansible, Chef, cloud-init user-data, or a shell script — is conditional. It runs if the automation runs. It applies if nothing fails. It’s consistent only if every deployment goes through exactly the same path.

Hardening embedded in the image is unconditional. It cannot be skipped. It doesn’t depend on connectivity to an Ansible control node. It doesn’t require cloud-init to succeed. It cannot be accidentally omitted by a new team member who doesn’t know the runbook.

This distinction matters most at incident response time. When you’re investigating a compromised instance, the first question you want to answer confidently is: was this instance ever in a known-good state?

  • If your hardening is in the image: yes, from boot.
  • If your hardening is applied post-deploy: it depends on whether everything went right on that specific instance’s first boot.

What comes next

The practical question this raises: how do you build these images in a repeatable, multi-platform way, with CIS scanning integrated into the build pipeline?

Packer covers most of the builder layer. OpenSCAP provides the scanning. Kickstart, cloud-init, and Nutanix AHV-specific tooling fill the gaps. But the orchestration between these — producing a consistent hardened image for three different target platforms from a single source of truth — is where most teams hit friction.
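
The scanning step itself is a single command once a candidate image is booted in the build environment (content path and profile id are the scap-security-guide defaults on RHEL-family systems and vary by distro):

# score the image against the CIS profile and emit an HTML report
sudo oscap xccdf eval \
  --profile xccdf_org.ssgproject.content_profile_cis \
  --report /tmp/cis-report.html \
  /usr/share/xml/scap/ssg/content/ssg-rl9-ds.xml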

The next post in this series covers the platform-specific differences between AWS, KVM, and Nutanix in depth: what actually needs to change per target when your security baseline is shared.

Next in the series: Cloud vs KVM vs Nutanix — why one image doesn’t fit all →


Questions or corrections? Open an issue or reach me on LinkedIn. If this was useful, the series index has the full roadmap.

The Container Wars: Kubernetes 1.0, CNCF, and the Fight for Orchestration (2014–2016)

Reading Time: 6 minutes


Introduction

Three orchestration systems entered the arena in 2015. Only one would still matter three years later.

Docker had created the container revolution. Now everyone needed to run containers at scale, and three camps formed around three very different philosophies. Understanding why Kubernetes won — and how close it came to not winning — explains most of the design choices that still shape Kubernetes today.


The State of Container Orchestration in 2014

When Kubernetes made its public debut at DockerCon 2014, it entered a space that didn’t yet have a name. “Container orchestration” wasn’t a category. It was a problem people had started to feel but not yet articulate.

Three approaches emerged nearly simultaneously:

Docker Swarm (announced December 2014): Docker’s answer to orchestration, built on the premise that the tool you use to run containers should also be the tool you use to cluster them. Swarm used the same Docker CLI and Docker API — zero new concepts for developers already using Docker.

Apache Mesos (Mesosphere Marathon): Mesos predated Docker. It was a distributed systems kernel originally developed at Berkeley, used in production at Twitter, Airbnb, and Apple. Marathon was the framework for running long-running services on top of Mesos. Mesos could run Docker containers, Hadoop jobs, and Spark workloads on the same cluster. Serious infrastructure engineers took it seriously.

Kubernetes: The newcomer with Google’s name behind it, but no track record outside Google, and early versions that required significant operational expertise to run.


Kubernetes v1.0: July 21, 2015

The 1.0 release was announced on stage at OSCON in Portland on July 21, 2015. The timing was deliberate — it coincided with the announcement of the Cloud Native Computing Foundation.

What shipped in 1.0:

  • Pods: The core scheduling unit — one or more containers sharing a network namespace and storage
  • Replication Controllers: Keep N copies of a pod running (later replaced by ReplicaSets and Deployments)
  • Services: A stable virtual IP and DNS name in front of a set of pods
  • Namespaces: Soft multi-tenancy boundaries within a cluster
  • Labels and Selectors: The flexible grouping mechanism that makes everything composable (see the example after this list)
  • Persistent Volumes (basic): Pods could mount persistent storage
  • kubectl: The command-line interface
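
A minimal illustration of that grouping mechanism: a Service selects its backing pods purely by label, never by pod name (names here are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # any pod labeled app=web becomes a backend
  ports:
  - port: 80
    targetPort: 8080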

What was not in 1.0:
– No RBAC (Role-Based Access Control)
– No network policy
– No autoscaling
– No Ingress resources
– No StatefulSets
– No DaemonSets (added in 1.1)
– Secrets were stored in plaintext in etcd

The security posture of a fresh Kubernetes 1.0 cluster was essentially: “trust everything inside the cluster.” That was the inherited assumption from Borg.


The CNCF Formation

Alongside the 1.0 release, Google donated Kubernetes to the newly formed Cloud Native Computing Foundation — a Linux Foundation project. This was a critical strategic move.

By donating Kubernetes to a neutral foundation, Google:
1. Removed the perception of a single vendor controlling the project
2. Created a governance model that made enterprise adoption politically safe
3. Invited competitors (Red Hat, CoreOS, Docker, Microsoft) to contribute without ceding control to them

The CNCF’s initial Technical Oversight Committee included engineers from Google, Red Hat, Twitter, Cisco, and others. This governance model would later become the template for every CNCF project that followed.


v1.1 — v1.5: Building the Foundation (Late 2015–2016)

Kubernetes 1.1 (November 2015)

  • Horizontal Pod Autoscaler (HPA): Automatically scale pod count based on CPU utilization
  • HTTP load balancing: Ingress API added as alpha — pods could now be exposed via HTTP routing rules
  • Job objects: Run a task to completion, not just keep it running
  • Performance: Roughly 30% improvement in API throughput and a significantly higher pod-scheduling rate

Kubernetes 1.2 (March 2016)

  • Deployments promoted to beta: Rolling updates, rollback, pause/resume — the deployment primitive that engineers actually use for application deployments
  • ConfigMaps: Decouple configuration from container images (no more baking config into images)
  • DaemonSets promoted to beta: Run exactly one pod per node — the pattern for node agents (log shippers, monitoring agents, network plugins)
  • Scale: Tested to 1,000 nodes and 30,000 pods per cluster

Kubernetes 1.3 (July 2016)

  • StatefulSets (then called PetSets, alpha): Ordered, persistent-identity pods — the first serious attempt to run databases and stateful applications
  • Cross-cluster federation (alpha): Run workloads across multiple clusters
  • PodDisruptionBudgets (alpha): Control how many pods can be unavailable during voluntary disruptions — critical for safe rolling updates
  • rkt integration (Rktnetes): The first experiment in running a runtime other than Docker under the kubelet, laying the groundwork for the Container Runtime Interface

Kubernetes 1.4 (September 2016)

  • kubeadm: A tool to bootstrap a Kubernetes cluster in two commands. Before kubeadm, setting up a cluster required following Kelsey Hightower’s “Kubernetes the Hard Way” — valuable for learning, painful for production
  • ScheduledJobs (CronJobs): Run a job on a schedule
  • PodPresets: Inject common configuration into pods at admission time
  • Init Containers beta: Containers that run to completion before the main application containers start — the clean solution for initialization sequencing

Kubernetes 1.5 (December 2016)

  • StatefulSets promoted to beta
  • PodDisruptionBudgets to beta
  • Windows Server container support (alpha): First step toward a non-Linux node
  • CRI (Container Runtime Interface) alpha: The abstraction layer that would eventually allow Kubernetes to run containerd, CRI-O, and others instead of depending on Docker
  • OpenAPI spec: Machine-readable API documentation, enabling client code generation

Helm: The Missing Package Manager (February 2016)

Kubernetes gave you primitives. It did not give you a way to install applications composed of those primitives. In February 2016, Deis (later acquired by Microsoft) released Helm — a package manager for Kubernetes.

Helm introduced two concepts that stuck:
Charts: A collection of Kubernetes manifests bundled with templating and default values
Releases: An installed instance of a chart, with its own lifecycle (install, upgrade, rollback, delete)

Helm’s immediate adoption signaled something important: the community was already thinking in terms of applications, not just raw primitives. Infrastructure engineers needed a layer of abstraction above YAML.


The Battle Lines Harden

By mid-2016, the three-way contest was becoming clearer:

Docker Swarm’s advantage: Zero friction for existing Docker users. docker swarm init + docker stack deploy. No new CLI, no new API, no new mental model. For small teams running straightforward applications, it was compelling.

Mesos’s advantage: Proven at massive scale before Kubernetes existed. Twitter ran Mesos in production. It could run heterogeneous workloads (Docker containers, Hadoop, Spark) on the same cluster. Enterprise data teams already had Mesos expertise.

Kubernetes’s advantage: The Google name, rapidly growing community, and a design that was clearly winning the feature race. But operational complexity was real — running Kubernetes well in 2016 required significant investment.


The Turning Point Nobody Talks About

The real moment that decided the container wars wasn’t a feature announcement. It was cloud provider behavior.

Google Kubernetes Engine (GKE) — then called Google Container Engine — had been running since 2014. It was the first managed Kubernetes service, and it worked. In 2016, both Microsoft and Amazon were working on managed Kubernetes offerings. Neither chose Docker Swarm. Neither chose Mesos.

When cloud providers converge on a technology, the market follows. By the time Amazon announced EKS and Microsoft announced AKS in late 2017, the decision was already made.


The Security Debt Accumulates

Running through the 1.0–1.5 feature list reveals a security architecture that was being designed in flight:

  • etcd stored secrets as base64-encoded strings — not encrypted. Kubernetes 1.7 (2017) would add encryption at rest, but it required explicit configuration
  • The API server was unauthenticated by default in early versions — you needed to configure authentication
  • Network traffic between pods was unrestricted — all pods could reach all other pods on all ports, across all namespaces. NetworkPolicy existed as alpha in 1.3 but required a CNI plugin that supported it
  • The kubelet’s API was open — in early Kubernetes, the kubelet’s HTTP API was accessible without authentication from within the cluster

These weren’t oversights — they were reasonable defaults for an internal cluster managed by a single team. They became liabilities as Kubernetes moved into multi-tenant enterprise environments.
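
The eventual fix for the default-allow network model, expressed in today’s API, is a per-namespace default-deny policy. It only has teeth when the CNI plugin enforces NetworkPolicy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}            # empty selector matches every pod in the namespace
  policyTypes: ["Ingress"]   # no ingress rules listed, so all ingress is denied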


KubeCon: A Community Forms

The first KubeCon conference ran November 9-11, 2015, in San Francisco — a small gathering of a few hundred engineers. By November 2016, KubeCon North America in Seattle drew thousands. The growth was not marketing-driven; it was practitioners solving real problems and sharing what they learned.

This community dynamic was qualitatively different from the Docker Swarm and Mesos ecosystems. Kubernetes had a contributor culture — pull requests, SIG (Special Interest Group) meetings, public design docs. The project was being built in the open, and engineers could see it happening.


Key Takeaways

  • Kubernetes 1.0 shipped in July 2015 with the basics functional but security model immature — no RBAC, no network policy, secrets stored in plaintext
  • The CNCF governance model was the strategic move that made enterprise adoption politically safe — no single vendor controls the project
  • Helm filled the missing application packaging layer that raw Kubernetes couldn’t provide
  • The container wars were decided not by technical superiority alone, but by cloud provider alignment — when Google, Microsoft, and Amazon all built managed Kubernetes, the market followed
  • v1.1–v1.5 established the core workload primitives: Deployments, StatefulSets, DaemonSets, Jobs, ConfigMaps, HPA — most of these remain the daily vocabulary of Kubernetes operations

What’s Next

← EP01: The Borg Legacy | EP03: Enterprise Awakening →

Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com

The Borg Legacy: How Google Built the Blueprint for Kubernetes (2003–2014)

Reading Time: 5 minutes


Introduction

Every piece of infrastructure has a lineage. Kubernetes didn’t appear from nowhere in 2014. It is, in almost every meaningful sense, Google’s Borg system rebuilt for the world — with a decade of hard lessons baked in.

To understand Kubernetes, you have to understand what came before it. And what came before it ran (and still runs) more compute than most organizations will ever touch.


Google’s Scale Problem (2003)

By the early 2000s, Google was running hundreds of thousands of jobs across tens of thousands of machines. Web indexing, ads, Gmail, Maps — all of these needed compute, and none of them could afford to waste it.

In 2006, Google engineers Paul Menage and Rohit Seth began work on a kernel feature called cgroups (control groups) — a mechanism to limit, prioritize, account for, and isolate the resource usage of process groups. The Linux kernel merged cgroups in 2.6.24 (2008). This was the primitive that would later make containers possible.

In the same era, Google built Borg — an internal cluster management system that could run hundreds of thousands of jobs, from many thousands of different applications, across many clusters, with each cluster having up to tens of thousands of machines. Borg was never open-sourced. It ran (and still runs) Google’s entire production workload.


What Borg Got Right

Borg introduced concepts that engineers didn’t yet have names for. They became the vocabulary of modern infrastructure:

Workload types:
Borg separated workloads into two classes: long-running services (high-priority, latency-sensitive) and batch jobs (best-effort, preemptible). Kubernetes would later call these Deployments and Jobs.

Declarative specification:
Borg jobs were described in a configuration language (BCL, a dialect of GCL). You declared what you wanted; Borg figured out how to achieve it. Sound familiar?

Resource limits and requests:
Borg tasks had both a request (what you need) and a limit (what you can use). Kubernetes adopted this model directly — resources.requests and resources.limits in pod specs trace directly back to Borg.

Health checking and rescheduling:
Borg monitored task health and automatically rescheduled failed tasks. The kubelet’s liveness and readiness probes are descendants of this.

Cell (cluster) topology:
Borg organized machines into “cells” — what Kubernetes calls clusters. The Borgmaster (control plane) managed the cell.


Omega: The Sequel That Didn’t Ship

Around 2011, Google started building Omega — a more flexible scheduler designed to address Borg’s limitations. Borg had a monolithic scheduler; Omega introduced a shared-state, optimistic-concurrency model where multiple schedulers could operate concurrently without stepping on each other.

A 2013 paper from Google (“Omega: flexible, scalable schedulers for large compute clusters”) made these ideas public. Omega itself stayed internal, but many of its scheduling concepts influenced Kubernetes’ extensible scheduler design.


The Docker Moment (March 2013)

On March 15, 2013, Solomon Hykes stood at PyCon and demonstrated Docker with a five-minute talk titled “The future of Linux Containers.” The demo ran a container. That was it. The room understood immediately.

Docker solved the packaging and distribution problem. Linux had had containers (via LXC and cgroups/namespaces) for years, but running one required deep kernel knowledge. Docker wrapped all of that in a UX that a developer could actually use.

Google’s engineers watched. They recognized the pattern: Docker was doing for containers what the smartphone did for mobile computing — making an existing capability accessible to everyone.

The Google engineers building the next generation of infrastructure realized: once containers become ubiquitous, someone will need to orchestrate them at scale. And they had already built that system internally, twice.


The Decision to Open-Source (Fall 2013)

In late 2013, a small group of Google engineers — Brendan Burns, Joe Beda, Craig McLuckie, Ville Aikas, Tim Hockin, Dawn Chen, Brian Grant, and Daniel Smith — began a new project internally codenamed “Project Seven” (a reference to the Borg drone Seven of Nine).

The core insight: Google’s competitive advantage in infrastructure came from what ran on the cluster management system, not the system itself. Open-sourcing a Kubernetes-like system would benefit Google by standardizing the ecosystem around patterns Google already understood better than anyone.

The initial design decisions were deliberate:

  • Go as the implementation language: Fast compilation, good concurrency primitives, easy deployment as static binaries
  • REST API as the primary interface: Everything in Kubernetes is an API resource. This is not accidental — it makes the system composable and automatable from day one
  • Labels and selectors over hierarchical naming: Borg used a hierarchical job/task naming scheme; Kubernetes chose a flat namespace with label-based grouping, which proved far more flexible
  • Reconciliation loops everywhere: Every Kubernetes controller is a loop that watches actual state and drives it toward desired state. This is the controller pattern, and it is the heart of Kubernetes extensibility

First Commit: June 6, 2014

The first public commit landed on GitHub on June 6, 2014: 250 files, 47,501 lines of Go, Bash, and Markdown.

Three days later, on June 10, 2014, Eric Brewer (VP of Infrastructure at Google) announced Kubernetes publicly at DockerCon 2014. The announcement framed it explicitly as bringing Google’s infrastructure learnings to the community.

By July 10, 2014, Microsoft, Red Hat, IBM, and Docker had joined the contributor community.


What Kubernetes Deliberately Left Out of Borg

The designers made intentional decisions about what not to carry forward:

No proprietary language: Borg’s BCL/GCL was Google-internal. Kubernetes used plain JSON (later YAML) manifests — standard formats any tool could read and write.

No magic autoscaling by default: Borg aggressively reclaimed resources. Kubernetes launched without this, adding HPA (Horizontal Pod Autoscaler) later, allowing operators to control the behavior.

No built-in service discovery tied to the scheduler: Borg had tight coupling between scheduling and name resolution. Kubernetes separated these: Services (kube-proxy, DNS) are distinct from the scheduler, allowing them to evolve independently.


The Borg Paper (2015)

In April 2015, Google published “Large-scale cluster management at Google with Borg” — the first public detailed description of the system. Reading it alongside the Kubernetes documentation reveals how directly the design decisions transferred.

Key numbers from the paper:
– Borg ran hundreds of thousands of jobs from thousands of applications
– Typical cell: 10,000 machines
– Utilization improvements from bin-packing: significant enough to justify the entire engineering investment

The paper is required reading for anyone who wants to understand why Kubernetes is designed the way it is — not as a series of arbitrary choices but as a deliberately evolved system.


The Lineage That Matters for Security

From a security architecture perspective, the Borg lineage matters because the isolation model was designed for a trusted-internal environment, not a multi-tenant hostile-external one. This created a debt that Kubernetes has spent years paying down:

  • Namespaces are a soft boundary, not a hard isolation primitive — just as Borg’s cells were
  • The default-allow network model reflects Borg’s assumption of a trusted internal network
  • No built-in admission control at launch — Borg trusted its job submitters

Understanding this history explains why features like NetworkPolicy, PodSecurity, RBAC, and OPA/Gatekeeper were retrofitted over years rather than built-in from day one. The system was designed by and for Google’s internal trust model. The security hardening came as it entered the wild.


Key Takeaways

  • Kubernetes is Google’s Borg system rebuilt for the world, carrying 10+ years of cluster management experience
  • Core Kubernetes primitives — resource requests/limits, declarative specs, health-based rescheduling, label-based grouping — map directly to Borg concepts
  • The decision to open-source was strategic, not altruistic: Google wanted to standardize the ecosystem on patterns it already mastered
  • The security gaps in early Kubernetes (no default network isolation, permissive RBAC, no pod-level security controls) trace directly to Borg’s trusted-internal-network assumptions
  • Docker’s accessibility breakthrough created the demand; Google’s Borg experience supplied the architecture

What’s Next

EP02: The Container Wars → — Kubernetes 1.0, the CNCF formation, and the three-way fight between Docker Swarm, Apache Mesos, and Kubernetes for control of the container orchestration market.


Series: Kubernetes: From Borg to Platform Engineering | linuxcent.com

EKS 1.33 Upgrade Blocker: Fixing Dead Nodes & NetworkManager on Rocky Linux

Reading Time: 5 minutes

The EKS 1.33+ NetworkManager Trap: A Complete systemd-networkd Migration Guide for Rocky & Alma Linux

TL;DR:

  • The Blocker: Upgrading to EKS 1.33+ is breaking worker nodes, especially on free community distributions like Rocky Linux and AlmaLinux. Boot times are spiking past 6 minutes, and nodes are failing to get IPs.
  • The Root Cause: AWS is deprecating NetworkManager in favor of systemd-networkd. However, ripping out NetworkManager can leave stale VPC IPs in /etc/resolv.conf. Combined with the systemd-resolved stub listener (127.0.0.53) and a few configuration missteps, it causes a total internal DNS collapse where CoreDNS pods crash and burn.
  • The Subtext: AWS is pushing this modern networking standard hard. Subtly, this acts as a major drawback for Rocky/Alma AMIs, silently steering frustrated engineers toward Amazon Linux 2023 (AL2023) as the “easy” way out.
  • The “Super Hack”: Automate the clean removal of NetworkManager, bypass the DNS stub listener by symlinking /etc/resolv.conf directly to the systemd uplink, and enforce strict state validation during the AMI build.

If you’ve been in the DevOps and SRE space long enough, you know that vendor upgrades rarely go exactly as planned. But lately, if you are running enterprise Linux distributions like Rocky Linux or AlmaLinux on AWS EKS, you might have noticed the ground silently shifting beneath your feet.

With the push to EKS 1.33+, AWS is mandating a shift toward modern, cloud-native networking standards. Specifically, they are phasing out the legacy NetworkManager in favor of systemd-networkd.

While this makes sense on paper, the transition for community distributions has been incredibly painful. AWS support couldn’t resolve our issues, and my SRE team had practically given up, officially halting our EKS upgrade process. It’s hard not to notice that this massive, undocumented friction in Rocky Linux and AlmaLinux conveniently positions AWS’s own Amazon Linux 2023 (AL2023) as the path of least resistance.

I’m hoping the incredible maintainers at free distributions like Rocky Linux and AlmaLinux take note of this architectural shift. But until the official AMIs catch up, we have to fix it ourselves. Here is the exact breakdown of the cascading failure that brought our clusters to their knees, and the “super hack” script we used to fix it.

The Investigation: A Cascading SRE Failure

When our EKS 1.33+ worker nodes started booting with 6+ minute latencies or outright failing to join the cluster, I pulled apart our Rocky Linux AMIs to monitor the network startup sequence. What I found was a classic cascading failure of services, stale data, and human error.

Step 1: The Race Condition

Initially, the problem was a violent tug-of-war. NetworkManager was not correctly disabled by default, and cloud-init was still trying to invoke it. This conflicted directly with systemd-networkd, paralyzing the network stack during boot. To fix this, we initially disabled the NetworkManager service and removed it from cloud-init.

Step 2: The Stale Data Landmine

Here is where the trap snapped shut. Because NetworkManager was historically the primary service responsible for dynamically generating and updating /etc/resolv.conf, completely disabling it stopped that file from being updated.

When we baked the new AMI via Packer, /etc/resolv.conf was orphaned and preserved the old configuration—specifically, a stale .2 VPC IP address from the temporary subnet where the AMI build ran.

Step 3: The Human Element

We’ve all been there: during a stressful outage, wires get crossed. While troubleshooting the dead nodes, one of our SREs mistakenly stopped the systemd-resolved service entirely, thinking it was conflicting with something else.

Step 4: Total DNS Collapse

When the new AMI booted up and joined the EKS node group, the environment was a disaster zone:

  1. NetworkManager was dead (intentional).
  2. systemd-resolved was stopped (accidental).
  3. /etc/resolv.conf contained a dead, stale IP address from a completely different subnet.

When kubelet started, it dutifully read the host’s broken /etc/resolv.conf and passed it up to CoreDNS. CoreDNS attempted to route traffic to the stale IP, failed, and started crash-looping. Internal DNS resolution (pod.namespace.svc.cluster.local) totally collapsed. The cluster was dead in the water.

[Flowchart: The perfect storm — how stale data and disabled services led to a total CoreDNS collapse.]

Linux Internals: How systemd Manages DNS (And Why CoreDNS Breaks)

To understand how to permanently fix this, we need to look at how systemd actually handles DNS under the hood. When using systemd-networkd, resolv.conf management is handled through a strict partnership with systemd-resolved.

[Diagram: How systemd collects network data, and the critical symlink choice that dictates EKS DNS health.]

Here is how the flow works: systemd-networkd collects network and DNS information (from DHCP, Router Advertisements, or static configs) and pushes it to systemd-resolved via D-Bus. To manage your DNS resolution effectively, you must configure the /etc/resolv.conf symbolic link to match your desired mode of operation. You have three choices:

1. The “Recommended” Local DNS Stub (The EKS Killer)

By default, systemd recommends using systemd-resolved as a local DNS cache and manager, providing features like DNS-over-TLS and mDNS.

  • The Symlink: ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
  • Contents: Points to 127.0.0.53 as the only nameserver.
  • The Problem: This is a disaster for Kubernetes. If Kubelet passes 127.0.0.53 to CoreDNS, CoreDNS queries its own loopback interface inside the pod network namespace, blackholing all cluster DNS.

2. Direct Uplink DNS (The “Super Hack” Solution)

This mode bypasses the local stub entirely. The system lists the actual upstream DNS servers (e.g., your AWS VPC nameservers) discovered by systemd-networkd directly in the file.

  • The Symlink: ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
  • Contents: Lists all actual VPC DNS servers currently known to systemd-resolved.
  • The Benefit: CoreDNS gets the real AWS VPC nameservers, allowing it to route external queries correctly while managing internal cluster resolution perfectly.

3. Static Configuration (Manual)

If you want to manage DNS manually without systemd modifying the file, you break the symlink and create a regular file (rm /etc/resolv.conf). While systemd-networkd still receives DNS info from DHCP, it won’t touch this file. (Not ideal for dynamic cloud environments).


The Solution: A Surgical systemd Cutover

Knowing the internals, the path forward is clear. We needed to not only remove the legacy stack but explicitly rewire the DNS resolution to the Direct Uplink to prevent the stale data trap and bypass the notorious 127.0.0.53 stub listener.

Here is the exact state we achieved (scripted below the list):

  1. Lock down cloud-init so it stops triggering legacy network services.
  2. Completely mask NetworkManager to ensure it never wakes up.
  3. Ensure systemd-resolved is enabled and running, but with the DNSStubListener explicitly disabled (DNSStubListener=no) so nothing is served on 127.0.0.53.
  4. Destroy the stale /etc/resolv.conf and create a symlink to the Direct Uplink (ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf).
  5. Reconfigure and restart systemd-networkd.
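
A condensed sketch of those five steps, roughly what our Packer provisioner runs (the open-sourced script below adds validation and distro checks; treat this as the shape, not the final word):

set -euo pipefail

# 1. force cloud-init to render network config through systemd-networkd
cat > /etc/cloud/cloud.cfg.d/99-networkd.cfg <<'EOF'
system_info:
  network:
    renderers: ['networkd']
EOF

# 2. remove and mask NetworkManager so it can never wake up again
dnf -y remove NetworkManager || true
systemctl mask NetworkManager.service 2>/dev/null || true

# 3. keep systemd-resolved running, but kill the 127.0.0.53 stub listener
mkdir -p /etc/systemd/resolved.conf.d
printf '[Resolve]\nDNSStubListener=no\n' > /etc/systemd/resolved.conf.d/99-no-stub.conf
systemctl enable systemd-networkd systemd-resolved

# 4. destroy the stale resolv.conf and link the direct uplink file
rm -f /etc/resolv.conf
ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

# 5. restart the stack so the symlink is populated with real VPC nameservers
systemctl restart systemd-networkd systemd-resolved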

Pro-Tip for Debugging: To ensure systemd-networkd is successfully pushing DNS info to the resolver, verify your .network files in /etc/systemd/network/. Ensure UseDNS=yes (which is the default) is set in the [DHCPv4] section. You can always run resolvectl status to see exactly which DNS servers are currently assigned to each interface over D-Bus!
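
For reference, a minimal .network unit that hands DHCP-discovered DNS to systemd-resolved (the match pattern is an example; adjust it to your interface naming):

# /etc/systemd/network/10-dhcp.network
[Match]
Name=en*

[Network]
DHCP=yes

[DHCPv4]
UseDNS=yes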

The Automation: Production AMI Prep Script

Manual hacks are great for debugging, but SRE is about repeatable automation. We’ve open-sourced the eks-production-ami-prep.sh script to handle this cutover automatically during your Packer or Image Builder pipeline. It standardizes the cutover, wipes out the stale data, and includes a strict validation suite.


The Results

By actively taking control of the systemd stack and ensuring /etc/resolv.conf was dynamically linked rather than statically abandoned, we completely unblocked our EKS 1.33+ upgrade.

More impressively, our system bootup time dropped from a crippling 6+ minutes down to under 2 minutes. We shouldn’t have to abandon fantastic, free enterprise distributions just because a cloud provider shifts their networking paradigm. If your team is struggling with AWS EKS upgrades on Rocky Linux or AlmaLinux, integrate this automation into your pipeline and get your clusters back in the fast lane.

Supercharge Your Nginx Security: A Practical Guide to Enabling TLS 1.3 on Rocky Linux 9

Reading Time: 4 minutes

Alright, let’s get straight to it. You’re running a modern web stack on Linux. You’ve been diligent, you’ve secured your URL endpoints, and you’re serving traffic over HTTPS using TLS 1.2. That’s a solid baseline. But in the world of infrastructure, standing still is moving backward. TLS 1.3 has been the standard for a while now, and it’s not just an incremental update; it’s a significant leap forward in both security and performance.

The good news? If you’re on a current platform like Rocky Linux 9.6, you’re already 90% of the way there. The underlying components are in place. This guide is the final 10%—a no-nonsense, command-line focused walkthrough to get you from TLS 1.2 to the faster, more secure TLS 1.3, complete with the validation steps and pro-tips to make it production-ready.

Prerequisites Check: Verify Your Nginx and OpenSSL Versions

Before we touch any configuration files, let’s confirm your environment is ready. Enabling TLS 1.3 depends on two critical pieces of software: your web server (Nginx) and the underlying cryptography library (OpenSSL).

  • Nginx: You need version 1.13.0 or newer.
  • OpenSSL: You need version 1.1.1 or newer.

Rocky Linux 9.6 and its siblings in the RHEL 9 family ship with versions far newer than these minimums. Let’s verify it. SSH into your server and run this command:

nginx -V

The output will be verbose, but you’re looking for two lines. You’ll see something like this (your versions may differ slightly):

nginx version: nginx/1.26.x
built with OpenSSL 3.2.x ...

With Nginx and OpenSSL versions well above the minimum, we’re cleared for takeoff.

The Upgrade: Configuring Nginx for TLS 1.3

This is where the rubber meets the road. The process involves a single, targeted change to your Nginx configuration.

Step 1: Locate Your Nginx Server Block

Your SSL configuration is defined within a server block in your Nginx files. If you have a simple setup, this might be in /etc/nginx/nginx.conf. However, the best practice is to have separate configuration files for each site in /etc/nginx/conf.d/.

Find the relevant file for the site you want to upgrade. It will contain the listen 443 ssl; directive and your ssl_certificate paths.
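
If you are not sure which file carries the directive, a quick recursive search narrows it down (assuming the standard /etc/nginx layout):

sudo grep -R "ssl_protocols" /etc/nginx/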

Step 2: Modify the ssl_protocols Directive

Inside your server block, find the line that begins with ssl_protocols. To enable TLS 1.3 while maintaining compatibility for clients that haven’t caught up, modify this line to include TLSv1.3. The best practice is to support both 1.2 and 1.3.

# BEFORE
# ssl_protocols TLSv1.2;

# AFTER: Add TLSv1.3
ssl_protocols TLSv1.2 TLSv1.3;

It is critical that this directive is inside every server block where you want TLS 1.3 enabled. Settings are not always inherited from a global http block as you might expect.

Validation and Deployment: Trust, but Verify

A configuration change isn’t complete until it’s verified. This two-step process ensures you don’t break your site and that the change actually worked.

Step 1: Test and Reload Nginx

Never apply a new configuration blind. First, run the built-in Nginx test to check for syntax errors:

sudo nginx -t

If all is well, you’ll see a success message. Now, gracefully reload Nginx to apply the changes without dropping connections:

sudo systemctl reload nginx

Step 2: Verify TLS 1.3 is Active

Your server is reloaded, but how do you know TLS 1.3 is active? You must verify it with an external tool.

  • Quick Command-Line Check: For a fast check from your terminal, use curl:
    curl -I -v --tlsv1.3 --tls-max 1.3 https://your-domain.com

    Look for output confirming a successful connection using TLSv1.3 (a sample line is shown after this list).

  • The Gold Standard: The most comprehensive way to verify your setup is with the Qualys SSL Labs SSL Server Test. Navigate to their website, enter your domain name, and run a scan. In the “Configuration” section of the report, you will see a heading for “Protocols.” If your setup was successful, you will see a definitive “Yes” next to TLS 1.3.
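
For the command-line check, a successful TLS 1.3 handshake shows up in curl's verbose output as a line similar to this (the exact cipher suite name may differ):

* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384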

Advanced Hardening: Pro-Tips for Production

You’ve enabled a modern protocol. Now, let’s enforce its use and add other layers of security that a production environment demands.

Pro-Tip 1: Implement HSTS (HTTP Strict Transport Security)

HSTS is a header your server sends to tell browsers that they should only communicate with your site using HTTPS. This prevents downgrade attacks. Add this header to your Nginx server block:

add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
  • max-age=63072000: Tells the browser to cache this rule for two years.
  • includeSubDomains: Applies the rule to all subdomains. Use with caution.
  • preload: Allows you to submit your site to a list built into browsers, ensuring they never connect via HTTP.

Pro-Tip 2: Enable OCSP Stapling

Online Certificate Status Protocol (OCSP) Stapling improves performance and privacy by allowing your server to fetch the revocation status of its own certificate and “staple” it to the TLS handshake. This saves the client from having to make a separate request to the Certificate Authority.

Enable it by adding these lines to your server block:

# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem; # Use your fullchain certificate
resolver 8.8.8.8 1.1.1.1 valid=300s; # Use public resolvers

Pro-Tip 3: Modernize Your Cipher Suites

While TLS 1.3 has its own small set of mandatory, highly secure cipher suites, you can still define the ciphers for TLS 1.2. The ssl_prefer_server_ciphers directive should be set to off for TLS 1.3, which is the default in modern Nginx versions, allowing the client’s more modern cipher preferences to be honored. However, you should still define a strong cipher list for TLS 1.2.
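
For the TLS 1.2 side, a commonly recommended starting point is a cipher list modeled on Mozilla's "intermediate" profile. Treat this as a reference, not gospel, and verify it against current guidance before adopting it:

ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;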

Here is a modern configuration snippet combining these tips:

server {
    listen 443 ssl;
    http2 on;   # nginx 1.25.1+ syntax; on older builds use: listen 443 ssl http2;
    server_name your-domain.com;

    # SSL Config
    ssl_certificate /path/to/fullchain.pem;
    ssl_certificate_key /path/to/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers off;

    # HSTS Header
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

    # OCSP Stapling
    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /path/to/fullchain.pem;

    # ... other configurations ...
}

TL;DR

  • Enable TLS 1.3 by adding it to the ssl_protocols directive in your Nginx server block: ssl_protocols TLSv1.2 TLSv1.3;. Rocky Linux 9.6 ships with the required Nginx and OpenSSL versions.
  • Always validate your configuration before and after applying it. Use sudo nginx -t to check syntax, and then use an external tool like the Qualys SSL Labs test to confirm TLS 1.3 is active on your live domain.
  • Go beyond the basic setup by implementing advanced hardening. Add the Strict-Transport-Security (HSTS) header and enable OCSP Stapling to build a truly robust and secure configuration.

Conclusion

Upgrading to TLS 1.3 on a modern stack like Nginx on Rocky Linux 9 is refreshingly simple. The core task is a one-line change. However, as a senior engineer, your job doesn’t end there. The real “super hack” is in the full workflow: making the change, rigorously validating it from an external perspective, and then hardening the configuration with production-grade features like HSTS and OCSP Stapling. By following these steps, you’ve done more than just flip a switch; you’ve demonstrably improved your site’s security posture and performance, confirming your stack is compliant with the latest standards.

Implementing ILM with Write Aliases (Logstash + Elasticsearch)

Reading Time: 3 minutes

In this blog post, I demonstrate how to create a new Elasticsearch index that can roll over automatically using aliases.

We will be implementing ILM (Index Lifecycle Management) in Elasticsearch with Logstash, using write aliases.

Optimize Elasticsearch indexing with a clean, reliable setup: use Index Lifecycle Management (ILM) with a dedicated write alias, let Elasticsearch handle rollovers, and keep Logstash writing to the alias instead of hardcoded index names. This approach improves stability, reduces manual ops, and scales cleanly as log volume grows.


What you’ll set up

  • Write to a single write alias.
  • Apply ILM via an index template with a rollover alias.
  • Bootstrap the first index with the alias marked as is_write_index:true.
  • Point Logstash at ilm_rollover_alias (not a date-based index).

Prerequisites

  • Elasticsearch with ILM enabled.
  • Logstash connected to Elasticsearch.
  • An ILM policy (example: es_policy01; a minimal sketch follows this list).
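
If you don't have a policy yet, a minimal one could look like this. This is a sketch: the es_policy01 name matches the example above, but the rollover and retention thresholds are assumptions, so tune them for your log volume:

PUT _ilm/policy/es_policy01
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}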

1) Create index template with rollover alias

Define a template that applies the ILM policy and the alias all indices will use.

PUT _index_template/test-vks
{
  "index_patterns": ["vks-nginx-*"],
  "priority": 691,
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "es_policy01",
          "rollover_alias": "vks-nginx-write-alias"
        },
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    },
    "mappings": {
      "dynamic": "runtime"
    }
  }
}

Notes:

  • Only set index.lifecycle.rollover_alias here; do not declare the alias body in the template.
  • Tune shards/replicas for your cluster and retention goals.

2) Bootstrap the first index

Create the first managed index and bind the write alias to it.

PUT /<vks-nginx-error-{now/d}-000001>
{
  "aliases": {
    "vks-nginx-write-alias": {
      "is_write_index": true
    }
  }
}

Notes:

  • The -000001 suffix is required for rollover sequencing.
  • is_write_index:true tells Elasticsearch where new writes should go.
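
If you bootstrap the index with curl instead of Kibana Dev Tools, the date-math index name must be URI-encoded. A rough equivalent of the call above (host and port are assumptions):

curl -X PUT "http://localhost:9200/%3Cvks-nginx-error-%7Bnow%2Fd%7D-000001%3E" \
  -H 'Content-Type: application/json' \
  -d '{"aliases": {"vks-nginx-write-alias": {"is_write_index": true}}}'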

3) Configure Logstash to use the write alias

Point Logstash to the rollover alias and avoid hardcoding an index name.

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    manage_template => false
    template_name   => "test-vks"
    # index => "vks-nginx-error-%{+YYYY.MM.dd}"   # keep commented when using ILM
    ilm_rollover_alias => "vks-nginx-write-alias"
  }
}

Notes:

  • manage_template => false prevents Logstash from overwriting your Elasticsearch template.
  • Restart Logstash after changes.

How rollover works

  • When ILM conditions are met, Elasticsearch creates the next index (...-000002), moves the write alias to it, and keeps previous indices searchable.
  • Reads via the alias cover all indices it targets; writes always land on the active write index.

Common issues and quick fixes

  • rollover_alias missing: Ensure index.lifecycle.rollover_alias is set in the template and matches the alias used in bootstrap and Logstash.
  • Docs landing in the wrong index: Remove index in Logstash; use only ilm_rollover_alias.
  • Alias conflicts on rollover: Don’t embed the alias body in the template—bind it during the bootstrap call only.
[Diagram: Complete flow of implementing ILM with write aliases (Logstash + Elasticsearch)]

Quick checklist

  • ILM policy exists (e.g., es_policy01).
  • Template includes index.lifecycle.name and index.lifecycle.rollover_alias.
  • First index created with -000001 and is_write_index:true.
  • Logstash writes to the alias (no concrete index).
  • Logstash restarted and ILM verified.

Verify your setup (optional)

Run these in Kibana Dev Tools or via curl:

GET _ilm/policy/es_policy01
GET _index_template/test-vks
GET vks-nginx-write-alias/_alias
POST /vks-nginx-write-alias/_rollover   # non-prod/manual test

Install java on Linux centos

Reading Time: 3 minutes

In this tutorial we will quickly set up Java on CentOS Linux.

We will use the yum command to download and install OpenJDK 1.8:

[vamshi@node01 ~]$ sudo yum install java-1.8.0-openjdk.x86_64

Java OpenJDK 1.8 is now installed, and we can check the version using java -version:

[vamshi@node01 ~]$ java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

 

We use the alternatives command on CentOS, which lists any other versions of Java installed on the machine and lets us set the default Java version system-wide.

[vamshi@node01 ~]$ alternatives --list | grep java
java auto /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java
jre_openjdk auto /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre
jre_1.8.0 auto /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre
jre_1.7.0 auto /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.261-2.6.22.2.el7_8.x86_64/jre
[vamshi@node01 ~]$ sudo alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java)
 + 2           java-1.7.0-openjdk.x86_64 (/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.261-2.6.22.2.el7_8.x86_64/jre/bin/java)

Enter to keep the current selection[+], or type selection number: 1

This sets OpenJDK 1.8 as the default version of Java.

Setting the JAVA_HOME path
To make JAVA_HOME available on the system we need to export the variable, for the obvious reason that other programs and users rely on it, for example Maven or a servlet container.

There are two levels at which we can set the visibility of the JAVA_HOME environment variable.
1. Set up JAVA_HOME for a single user profile
Add the changes to ~/.bash_profile:

# JAVA_HOME should point at the JRE root, not its bin directory
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre

PATH=$PATH:$JAVA_HOME/bin

export PATH

Now we need to apply the changes by reloading .bash_profile: either log out and log back in, or source the file as follows:

[vamshi@node01 ~]$ source .bash_profile

Verifying the changes:

[vamshi@node01 ~]$ echo $PATH
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/vamshi/.local/bin:/home/vamshi/bin:/home/vamshi/.local/bin:/home/vamshi/bin:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin

2. Set up JAVA_HOME in the system-wide profile, available to all users.

[vamshi@node01 ~]$ sudo sh -c "echo -e 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre' > /etc/profile.d/java.sh"

This echo command writes the JAVA_HOME path into a new file, java.sh, under the system's profile.d directory, which is read at login system-wide.

Ensure the changes were written to /etc/profile.d/java.sh:

[vamshi@node01 ~]$ cat /etc/profile.d/java.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre

Now source the file to apply the changes to your current session. Note that running it via sudo sh -c 'source ...' only affects that subshell, so source it in your own shell (or simply log out and back in):

[vamshi@node01 ~]$ source /etc/profile.d/java.sh

Verify with the env command:

[vamshi@node01 ~]$ env  | grep JAVA_HOME
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre

How do I download and install Java on CentOS?

Install Java On CentOS

  1. Install OpenJDK 11. Update the package repository to ensure you download the latest software: sudo yum update. …
  2. Install OpenJRE 11. Java Runtime Environment 11 (Open JRE 11) is a subset of OpenJDK. …
  3. Install Oracle Java 11. …
  4. Install JDK 8. …
  5. Install JRE 8. …
  6. Install Oracle Java 12.

Is Java installed on CentOS?

OpenJDK, the open-source implementation of the Java Platform, is the default Java development and runtime in CentOS 7. The installation is simple and straightforward.

How do I install Java on Linux?

  • Java for Linux Platforms
  • Change to the directory in which you want to install. Type: cd directory_path_name. …
  • Move the .tar.gz archive binary to the current directory.
  • Unpack the tarball and install Java: tar zxvf jre-8u73-linux-i586.tar.gz. The Java files are installed in a directory called jre1. …
  • Delete the .tar.

How do I install latest version of Java on CentOS?

To install OpenJDK 8 JRE using yum, run this command: sudo yum install java-1.8.0-openjdk.

Where is java path on CentOS?

They usually reside in /usr/lib/jvm . You can list them via ll /usr/lib/jvm . The value you need to enter in the field JAVA_HOME in jenkins is /usr/lib/jvm/java-1.8.

How do I know if java is installed on CentOS 7?

  • To check the Java version on Linux Ubuntu/Debian/CentOS:
  • Open a terminal window.
  • Run the following command: java -version.
  • The output should display the version of the Java package installed on your system. In the example below, OpenJDK version 11 is installed.

Where is java path set in Linux?

Steps

  • Change to your home directory: cd $HOME.
  • Open the .bashrc file.
  • Add the following line to the file. Replace the JDK directory with the name of your java installation directory: export PATH=/usr/java/<JDK Directory>/bin:$PATH.
  • Save the file and exit. Use the source command to force Linux to reload the .

How do I install java 14 on Linux?

Installing OpenJDK 14

  • Step 1: Update APT. …
  • Step 2: Download and Install JDK Kit. …
  • Step 3: Check Installed JDK Framework. …
  • Step 4: Update Path to JDK (Optional) …
  • Step 6: Set Up Environment Variable. …
  • Step 7: Open Environment File. …
  • Step 8: Save Your Changes.

How do I know where java is installed on Linux?

This depends a bit on your package system … if the java command works, you can type readlink -f $(which java) to find the location of the java command. On the OpenSUSE system I’m on now it returns /usr/lib64/jvm/java-1.6.0-openjdk-1.6.0/jre/bin/java (but this is not a system which uses apt-get).

How do I install java 11 on Linux?

Installing the 64-Bit JDK 11 on Linux Platforms

  1. Download the required file: For Linux x64 systems: jdk-11.interim. …
  2. Change the directory to the location where you want to install the JDK, then move the .tar. …
  3. Unpack the tarball and install the downloaded JDK: $ tar zxvf jdk-11. …
  4. Delete the .tar.

Signals in Linux; trap command – practical example

Reading Time: 4 minutes

The SIGNALS in Linux

Signals are the kernel's response to certain actions generated by the user, by a program or application, or by I/O devices.
The Linux trap command gives us a good way to understand SIGNALS and take advantage of them.
The trap command can be used to respond to certain conditions and invoke various actions when a shell receives a signal.
Below are the various signals in Linux.

[vamshi@linuxcent ~]$ trap -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX

Let's take a look at some important SIGNALS and their categorization:

Job control signals: These signals are used to control queued and waiting processes.
(18) SIGCONT, (19) SIGSTOP, (20) SIGTSTP

Termination signals: These signals are used to interrupt or terminate a running process.
(2) SIGINT, (3) SIGQUIT, (6) SIGABRT, (9) SIGKILL, (15) SIGTERM

Async I/O signals: These signals are generated when data is available on an input/output device, or when the kernel wishes to notify applications about resource availability.
(23) SIGURG, (29) SIGIO, (29) SIGPOLL

Timer signals: These signals are generated when an application sets timers or alarms.
(14) SIGALRM, (27) SIGPROF, (26) SIGVTALRM

Error reporting signals: These signals occur when a running process or application code ends up in an exception or a fault.
(1) SIGHUP, (4) SIGILL, (5) SIGTRAP, (7) SIGBUS, (8) SIGFPE, (13) SIGPIPE, (11) SIGSEGV, (24) SIGXCPU

Trap command Syntax:

trap [-lp] [[ARG] SIGNAL ...]

ARG is a command to be interpreted and executed when the shell receives the signal(s) SIGNAL.

If no arguments are supplied, trap prints the list of commands associated with each signal.
To unset a trap, a - is used followed by the SIGNAL, which we will demonstrate in the following section.

How to set a trap on linux through the command line?

[vamshi@linuxcent ~]$ trap 'echo -e "You Pressed Ctrl-C"' SIGINT

Now you have successfully set up a trap.

Whenever you press Ctrl-C on your keyboard, the message “You Pressed Ctrl-C” gets printed.

[vamshi@linuxcent ~]$ ^CYou Pressed Ctrl-C
[vamshi@linuxcent ~]$ ^CYou Pressed Ctrl-C
[vamshi@linuxcent ~]$ ^CYou Pressed Ctrl-C

Now type the trap command and you can see the currently set trap details.

[vamshi@node01 ~]$ trap
trap -- 'echo -e "You Pressed Ctrl-C"' SIGINT
trap -- '' SIGTSTP
trap -- '' SIGTTIN
trap -- '' SIGTTOU

To unset the trap, all you need to do is run the following command:

[vamshi@node01 ~]$ trap - SIGINT

This is evident from the output below:

[vamshi@node01 ~]$ trap
trap -- '' SIGTSTP
trap -- '' SIGTTIN
trap -- '' SIGTTOU
[vamshi@node01 ~]$ ^C
[vamshi@node01 ~]$ ^C
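
Beyond the interactive demo, the most common practical use of trap is cleaning up temporary files however a script ends. A minimal sketch (the file path is illustrative):

#!/bin/bash
# Create a temp file and guarantee its removal on any exit.
TMPFILE=$(mktemp /tmp/demo.XXXXXX)
trap 'rm -f "$TMPFILE"' EXIT     # cleanup runs whenever the script exits
trap 'exit 130' INT TERM         # route signal deaths through the EXIT trap

echo "working with $TMPFILE"
sleep 30    # press Ctrl-C here and the temp file is still removed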

What is trap command in Linux?

A built-in bash command that executes a command when the shell receives a signal is called `trap`. When an event occurs, bash delivers the notification via a signal. Many signals are available in bash; the most common is SIGINT (Signal Interrupt).

What is trap command in bash?

If you’ve written any amount of bash code, you’ve likely come across the trap command. Trap allows you to catch signals and execute code when they occur. Signals are asynchronous notifications that are sent to your script when certain events occur.

How do you Ctrl-C trap?

To trap Ctrl-C in a shell script, we will need to use the trap shell builtin command. When a user sends a Ctrl-C interrupt signal, the signal SIGINT (Signal number 2) is sent.

What is trap shell?

In the fish shell, trap is a wrapper around the fish event delivery framework. It exists for backwards compatibility with POSIX shells; for other uses, it is recommended to define an event handler. The following parameters are available: ARG is the command to be executed on signal delivery.

What signals Cannot be caught?

There are two signals which cannot be intercepted and handled: SIGKILL and SIGSTOP.

How does shell trap work?

When you set a trap, the shell registers your command against a signal. When that signal is delivered, the shell runs the trapped command instead of the default action and then resumes the script (unless the handler exits). With no trap set, the signal's default action applies; for SIGINT that means terminating the process.

How do I wait in Linux?

Approach:

  1. Creating a simple process.
  2. Using a special variable($!) to find the PID(process ID) for that particular process.
  3. Print the process ID.
  4. Using wait command with process ID as an argument to wait until the process finishes.
  5. After the process is finished printing process ID with its exit status.

How use stty command in Linux?

  1. stty --all: This option prints all current settings in human-readable form. …
  2. stty -g: This option will print all current settings in a stty-readable form. …
  3. stty -F : This option will open and use the specified DEVICE instead of stdin. …
  4. stty --help : This option will display this help and exit.

Can I trap Sigkill?

You can’t catch SIGKILL (and SIGSTOP ), so enabling your custom handler for SIGKILL is moot. You can catch all other signals, so perhaps try to make a design around those. By default pkill will send SIGTERM , not SIGKILL , which obviously can be caught.

What signal is Ctrl D?

Ctrl + D is not a signal, it’s EOF (End-Of-File). It closes the stdin pipe. If read(STDIN) returns 0, it means stdin closed, which means Ctrl + D was hit (assuming there is a keyboard at the other end of the pipe).

How to Shutdown or Reboot a remote Linux Host from commandline

Reading Time: 6 minutes

The shutdown process on a Linux system is an intelligent chain of steps wherein the system ensures dependent processes have terminated successfully.

TL;DR:

Difference between Halt and Poweroff in Linux?
What is a Cold Shutdown and a Warm Shutdown?
Linux system Halt: the halt process instructs the hardware to stop the functioning of the CPU. It can be referred to as a Warm Shutdown.
Linux system Poweroff/Shutdown: the poweroff function sends an ACPI (Advanced Configuration and Power Interface) signal to power down the system. It can be referred to as a Cold Shutdown.

As you may be aware, the Linux runtime environment is a combination of processes running in user space and kernel space; all major system activities and resources are initiated, governed, and terminated from kernel space.
Kernel space is where the resource-related processes run, following well-defined behavior, while user space is where processes depend on user actions; most user-space programs rely on kernel space and make context switches to obtain CPU scheduling and the like.
So, in the shutdown sequence on a Linux machine, the user-space processes are terminated first in a systematic fashion, through scripts triggered by the core systemd processes, which ensures a clean exit and termination of all processes.

The Linux system provides us quite a few commands to enforce fast shutdown or a graceful shutdown of the operating system, each having their own consequences.

Firstly, init, or systemd, the PID 1 process, controls the system's runlevel and determines which processes are launched and running in that runlevel.

init is a powerful command which switches to whatever runlevel it is told.
init 0 proceeds to power off the machine:

$ sudo init 0

Here the init 6 proceeds to Reboot the machine

$ sudo init 6

These commands are very quick, as they trigger the kernel-space shutdown invocation directly, most often resulting in unclean termination of processes and in filesystem recovery and journal replay at the next boot.

The following commands shut down the machine within seconds of being issued, but follow the kill sequence and a clean exit of the processes.

$ sudo shutdown
$ sudo poweroff
$ sudo systemctl poweroff

These commands:

  • Print a wall message to all users.
  • Kill all processes and unmount the volumes, or switch them to read-only mode, while the system power-off is in progress.
  • Put the system into a complete power-off mode, cutting the power supply to the machine.

$ sudo halt
$ sudo systemctl halt

These print a “System halted” message and put the machine into halt mode.
If --force or -f is specified twice, the operation is executed immediately, without terminating any processes or unmounting any file systems, risking data loss.

Halted servers can only be brought back online through a physical power-on or a remote power management console such as IPMI or ILOM.

The reboot and systemctl kexec commands restart the operating system, which amounts to one power cycle: a shutdown followed by a startup.

$ sudo reboot

$ sudo systemctl kexec

$ sudo systemctl reboot

If --force or -f is specified twice, the operation is executed immediately, without terminating any processes or unmounting any file systems, risking data loss.

 

It is important to understand that these commands are all symlinks to systemctl, which ensures a proper shutdown sequence:

[vamshi@linuxcent cp-command]$ ls -l /usr/sbin/halt
lrwxrwxrwx. 1 root root 16 Jan 13 14:41 /usr/sbin/halt -> ../bin/systemctl
[vamshi@linuxcent cp-command]$ ls -l /usr/sbin/reboot
lrwxrwxrwx. 1 root root 16 Jan 13 14:41 /usr/sbin/reboot -> ../bin/systemctl
[vamshi@linuxcent cp-command]$ ls -l /usr/sbin/poweroff
lrwxrwxrwx. 1 root root 16 Jan 13 14:41 /usr/sbin/poweroff -> ../bin/systemctl

As the output shows, all the commands are symlinks to the systemctl binary, which performs the actual shutdown or reboot.

The best practice is to power off the system in a way that broadcasts a notification message to all actively connected users, whether on pseudo-terminals (PTS) or TTY terminals, demonstrated as follows:

$ sudo systemctl poweroff

# this writes an entry into the journal, the wtmp and broadcasts the shutdown message to all the users connected through PTS and TTY terminals

What is the difference between systemctl poweroff and systemctl halt ?

When a Linux system is put into a Halt state, it stops all the applications and ensures they have exited safely; filesystems and volumes are unmounted, and the machine is taken into a halted state in which the power connection is still active. It can only be brought back online with a power reset.
The Halt process instructs the hardware to stop the functioning of the CPU.
It is commonly referred to as a Warm Shutdown.

[Screenshot: the systemctl halt command in Linux]

The Poweroff function sends an ACPI (Advanced Configuration and Power Interface) signal to power down the system.
When a Linux system is put into a Poweroff state, it goes completely offline following a systematic, clean termination of processes, and power input to the external peripherals is cut off; the subsequent startup is a cold start.
It is commonly referred to as a Cold Shutdown.

If you found the article worth your time, please share your input in the comments section, along with your experiences with shutdown and reboot issues.

Can I reboot Linux remotely?

To shut down a remote Linux server, you must pass the -t option to the ssh command to force pseudo-terminal allocation (sudo needs a terminal for its password prompt). The shutdown command accepts the -h option, i.e. Linux is powered off/halted at the specified time.
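
For example (the host name is illustrative):

$ ssh -t admin@remote-host 'sudo shutdown -r now'    # reboot immediately
$ ssh -t admin@remote-host 'sudo shutdown -h +5'     # power off in 5 minutes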

Can you reboot a server remotely?

Open command prompt, and type “shutdown /m \\RemoteServerName /r /c “Comments”“. … Another command to restart or shutdown the Server remotely is Shutdown /i. Type Shutdown /i on the command prompt and it will open another dialogue box.

What is the Linux command to reboot?

To reboot Linux using the command line:

  1. To reboot the Linux system from a terminal session, sign in or “su”/”sudo” to the “root” account.
  2. Then type “ sudo reboot ” to reboot the box.
  3. Wait for some time and the Linux server will reboot itself.

How do I reboot from remote desktop?

Procedure. Use the Restart Desktop command. Select Options > Restart Desktop from the menu bar. Right-click the remote desktop icon and select Restart Desktop.

What does sudo reboot do?

sudo is short for “Super-user Do”. It has no effect on the command itself (this being reboot ), it merely causes it to run as the super-user rather than as you. It is used to do things that you might not otherwise have permission to do, but doesn’t change what gets done.

How do I remotely turn on a Linux server?

Enter the BIOS of your server machine and enable the wake on lan/wake on network feature. …
Boot your Ubuntu and run “sudo ethtool -s eth0 wol g” assuming eth0 is your network card. …
run also “sudo ifconfig” and annotate the MAC address of the network card as it is required later to wake the PC.

How do I restart a terminal server remotely?

From the remote computer’s Start menu, select Run, and run a command line with optional switches to shut down the computer:
To shut down, enter: shutdown.
To reboot, enter: shutdown -r.
To log off, enter: shutdown -l

How do I send Ctrl Alt Del to remote desktop?

Press the “CTRL,” “ALT” and “END” keys at the same time while you are viewing the Remote Desktop window. This command executes the traditional CTRL+ALT+DEL command on the remote computer instead of on your local computer.

How do I remotely restart a server by IP address?

Type “shutdown -m \\[IP Address] -r -f” (without quotes) at the command prompt, where “[IP Address]” is the IP of the computer you want to restart. For example, if the computer you want to restart is located at 192.168.0.34, type “shutdown -m \\192.168.0.34 -r -f”.

How do I reboot from command prompt?

  1. From an open command prompt window:
  2. type shutdown, followed by the option you wish to execute.
  3. To shut down your computer, type shutdown /s.
  4. To restart your computer, type shutdown /r.
  5. To log off your computer type shutdown /l.
  6. For a complete list of options type shutdown /?
  7. After typing your chosen option, press Enter.

How does Linux reboot work?

The reboot command is used to restart a computer without turning the power off and then back on. If reboot is used when the system is not in runlevel 0 or 6 (i.e., the system is operating normally), then it invokes the shutdown command with its -r (i.e., reboot) option.

Git config setup on linux; Unable to pull or clone from git; fatal: unable to access git; Peer’s Certificate has expired

Reading Time: 3 minutes

Facing an issue with pulling the repository while dealing with an expired SSL certificate.

[vamshi@workstation ~]$ git pull https://gitlab.linuxcent.com/linuxcent/pipeline-101.git
fatal: unable to access 'https://gitlab.linuxcent.com/linuxcent/pipeline-101.git/': Peer's Certificate has expired.
[vamshi@workstation ~]$

SSL error while cloning git URL

If you have faced the error, then we can work around it by ignoring SSL certificate check and continue working with the git repo.

[vamshi@workstation ~]$ git clone https://gitlab.linuxcent.com/linuxcent/pipeline-101.git
Cloning into 'pipeline-101'...
fatal: unable to access 'https://gitlab.linuxcent.com/linuxcent/pipeline-101.git/': Peer's Certificate has expired.

Git won’t allow a clone, pull, or push to the GitLab site because its certificate is not valid: it is unsigned by a valid CA. In most cases the corporate GitLab repo lives on our internal network and is not publicly exposed.
We therefore trust the GitLab server; we have a bunch of our code on it, after all. Why not, I say?
We have to disable SSL certificate verification:

Set the variable GIT_SSL_NO_VERIFY=1 (or GIT_SSL_NO_VERIFY=true) and re-run your previous command.

[vamshi@workstation ~]$ GIT_SSL_NO_VERIFY=1 git clone https://gitlab.linuxcent.com/linuxcent/pipeline-101.git
Cloning into 'pipeline-101'...
Username for 'https://gitlab.linuxcent.com': vamshi
Password for 'https://vamshi@gitlab.linuxcent.com': 
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
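
Alternatively, you can scope the override to a single command or to one repository with git's http.sslVerify setting, which avoids a global environment change (sketched below; prefer fixing the certificate where possible):

# per-invocation config override
git -c http.sslVerify=false clone https://gitlab.linuxcent.com/linuxcent/pipeline-101.git

# persist for an already-cloned repository only
git config http.sslVerify false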

To make this permanent, add an entry to the system-wide profile as below; the change then applies to every user.

[vamshi@workstation ~]$ sudo bash -c "echo -e export GIT_SSL_NO_VERIFY=1 > /etc/profile.d/gitconfig.sh "
[vamshi@workstation ~]$ cat /etc/profile.d/gitconfig.sh
export GIT_SSL_NO_VERIFY=1

A practical use case for this environment variable is container image builds: set GIT_SSL_NO_VERIFY in the Dockerfile and build the image.

[vamshi@workstation ~]$ cat Dockerfile
FROM jetty:latest
-- CONTENT TRUNCATED --
ENV GIT_SSL_NO_VERIFY=1
-- CONTENT TRUNCATED --

In a future session we can also set up a container build agent with Jenkins Pipeline code, using a similar configuration to fetch a git repo.

Why is git clone not working?

If you have a problem cloning a repository, or using it once it has been created, check the following: Make sure that the path in the git clone call is correct. … If you have an authorization error, have an administrator check the ACLs in Administration > Repositories > <repoName> > Access.

How do I fix fatal unable to access?

How to resolve “git pull,fatal: unable to access ‘https://github.com… \’: Empty reply from server”

  1. If you have configured your proxy for a VPN, you need to login to your VPN to use the proxy.
  2. To use it outside the VPN, use the unset command: git config --global --unset http.proxy.

How do I bypass SSL certificate in git?

Prepend GIT_SSL_NO_VERIFY=true before every git command run to skip SSL verification. This is particularly useful if you haven’t checked out the repository yet. Run git config http.sslVerify false to disable SSL verification if you’re working with a checked out repository already.

How do I open a cloned git repository?

Clone Your Github Repository

  • Open Git Bash. If Git is not already installed, it is super simple. …
  • Go to the current directory where you want the cloned directory to be added. …
  • Go to the page of the repository that you want to clone.
  • Click on “Clone or download” and copy the URL.

Can not clone from GitHub?

If you’re unable to clone a repository, check that:
You can connect using HTTPS. For more information, see “HTTPS cloning errors.”
You have permission to access the repository you want to clone. For more information, see “Error: Repository not found.”
The default branch you want to clone still exists.

Do I need git for GitLab?

To install GitLab on a Linux server, you first need Git software. We explain how to install Git on a server in our Git tutorial. Next, you should download the GitLab omnibus package from the official GitLab website.

How do I clone a project from GitHub?

Cloning a repository

  • In the File menu, click Clone Repository.
  • Click the tab that corresponds to the location of the repository you want to clone. …
  • Choose the repository you want to clone from the list.
  • Click Choose… and navigate to a local path where you want to clone the repository.
  • Click Clone.

How do I push code to GitHub?

Using Command line to PUSH to GitHub

  • Creating a new repository. …
  • Open your Git Bash. …
  • Create your local project in your desktop directed towards a current working directory. …
  • Initialize the git repository. …
  • Add the file to the new local repository. …
  • Commit the files staged in your local repository by writing a commit message.