From Zero to HA Hero: Building On-Prem Kubernetes Clusters with Ansible

After eight years as a Customer Success Engineer at Naverisk, living almost entirely in Windows, I felt something needed to shift.
I loved solving problems, helping clients, and managing systems, but I needed something harder.
Something messier.
Something new.

This realization led me to the world of Kubernetes.
Transitioning from a Windows-centric role to building and managing Kubernetes clusters was no small feat.
I had to acquaint myself with Linux systems, containerization, orchestration, and a plethora of tools and concepts that were previously outside my comfort zone.
It was a steep learning curve, but one that reignited my passion for technology and innovation.

In this article, I’ll share my journey from the familiar grounds of Windows to the dynamic landscape of Kubernetes.
I’ll delve into the challenges I faced, the lessons I learned, and how I leveraged tools like Ansible and Rancher to build robust, on-premises Kubernetes clusters.
Whether you’re considering a similar transition or simply curious about the process, I hope my experiences provide valuable insights and inspiration.

New Skills, New Life: Learning Linux While Everything Else Was On Fire

As I prepared to pivot from Windows-heavy systems work into more Linux-native territory, I enrolled in the Linux Foundation’s System Administrator course.
I knew I needed a solid foundation, no pun intended, but I didn’t quite expect the mental gymnastics that came with it.

The course was… a lot. Don’t get me wrong, it was well-structured, but the authors were blunt: “You will not pass the certification exam with course content alone.” That warning turned out to be dead serious. The course gave you the map, but no vehicle. You had to build your own and fill in wide knowledge gaps along the way. Which meant diving into man pages, exploring obscure config files, and breaking (then fixing) many things in my home lab.

And here’s the kicker: I wasn’t doing this in a vacuum.

At the same time, my wife was pregnant. We had just moved into a new house that needed renovation, top to bottom. I was still working full-time in a demanding support role.

And then boom! The baby arrived.

Between diapers, baby cries, SSH sessions, broken configs, tailing logs and service restarts, I kept grinding.
Sometimes I studied Linux at 3AM while keeping watch on the kid and playing with my wife’s hair (she sleeps well when I do that).
Sometimes I was debugging a service, a script or an install while the sound of my baby boy crying in his mom’s arms echoed from the next room.
It was chaotic and painful.
It was exhausting.
But it was also kind of exhilarating.
This wasn’t just a career upgrade; it was a life transformation.
I wanted more for myself, but even more for my new family. I knew upgrading myself would upgrade them too.

New Role, New Reality: Trial by Fire

Eventually, my efforts paid off and I landed the job I now hold. The interview went great, the team seemed solid, and everything felt like the natural next step in my career.
And it was, just with one small twist: it was go-time from day one.

We were a small team, and there wasn’t much room for hand-holding. I had to dive deep and fast.
No slow ramp-up, no months-long onboarding. Just “Here’s what we need—go build it.”

There was a six-month probation period during which I wasn’t allowed full admin privileges, but it ended after one month when, hungry for information and short on teammate support (small team, remember), I demanded the keys to the kingdom. I was given said keys, and got to work.

And that’s when it hit me. I now have goals. Official goals! Every. Quarter.

“We need production-grade, on-prem Kubernetes clusters. Use Ansible. Build or get a new Ansible platform. Your pick!”

Cue minor internal crisis.

The team had trialed Ansible Tower. It was $30k/year. I saved that in my second month by switching to Ansible AWX. But AWX had to be built properly, on Kubernetes, and all the guides pointed to K3s. So I had to learn K8s.

From scratch!

Sure, I’d messed with Kubernetes before, but only in my home lab. Now I was in an enterprise environment.

I had a folder of copy-pasted kubectl commands and some vague memory of tinkering with services and pods.
But little did I know… I had been playing with K3s the whole time. That was my entire “experience.”
I didn’t even fully grasp the difference between a control plane and a worker node, let alone know what “HA cluster with VIP failover” meant in the real world.

And Ansible? I only had a simple playbook that ran yum update on a few VMs. That was about it.
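
For the record, it was roughly this and nothing more (a minimal sketch, not the actual file):

    - name: Patch a few VMs
      hosts: all
      become: true
      tasks:
        - name: Update all packages with yum
          ansible.builtin.yum:
            name: '*'
            state: latest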

But the mission was non-negotiable:
✅ Build clusters.
✅ Make them reproducible.
✅ Make them rock-solid.
✅ Oh—and do it on bare-metal, on-prem infrastructure.

It was one of those “sink or swim” moments. Spoiler: I decided I wasn’t going to drown.

So I fucking swam. Not even Jesus could catch me.

Learning By Doing (and Googling… a Lot)

Before anything else, I had to bootstrap the infrastructure. We weren’t using Terraform or Spacelift back then. No IaC, no GitOps.
Just me and Ansible, deploying VMs on bare metal.
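
To give a flavor of what that looked like, here is a stripped-down sketch. It assumes a libvirt/KVM hypervisor and virt-install purely for illustration; the real playbooks carried far more logic, and every name and path below is a placeholder.

    - name: Carve VMs out of a bare-metal hypervisor (illustrative sketch)
      hosts: hypervisors          # hypothetical group of bare-metal KVM hosts
      become: true
      vars:
        vm_name: k8s-node-1                                # placeholder
        vm_memory_mb: 8192
        vm_vcpus: 4
        base_image: /var/lib/libvirt/images/base.qcow2     # placeholder
      tasks:
        - name: Clone a base disk for the new VM
          ansible.builtin.copy:
            src: "{{ base_image }}"
            dest: "/var/lib/libvirt/images/{{ vm_name }}.qcow2"
            remote_src: true

        - name: Define and boot the VM with virt-install
          ansible.builtin.command: >
            virt-install --name {{ vm_name }}
            --memory {{ vm_memory_mb }} --vcpus {{ vm_vcpus }}
            --disk path=/var/lib/libvirt/images/{{ vm_name }}.qcow2
            --import --os-variant generic
            --network bridge=br0 --noautoconsole
          args:
            creates: "/etc/libvirt/qemu/{{ vm_name }}.xml"

Multiply that across all the nodes and you have the raw material everything else is built on.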

That’s where the real learning began. I went down the Kubernetes rabbit hole:

  • What’s the deal with Docker vs Containerd?
  • How do systemd slices and cgroups actually control resource limits?
  • What are CNIs, and why does Flannel sometimes just… stop working?
  • How do you bootstrap a control plane, generate join tokens, and manage cert expiry? (Sketch after this list.)
  • What are kubeconfigs, and why do they feel like secret scrolls?
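
Those last two questions eventually condensed into a handful of tasks. Here is a minimal, kubeadm-flavored sketch wrapped in Ansible; treat it as an illustration rather than the production playbook, and note that the VIP address and group names are placeholders:

    - name: Bootstrap the first control plane (kubeadm-flavored sketch)
      hosts: control_plane[0]     # first control-plane node only
      become: true
      vars:
        api_vip: 10.0.0.100       # hypothetical VIP fronting the API server
      tasks:
        - name: Initialize the control plane behind the VIP
          ansible.builtin.command: >
            kubeadm init
            --control-plane-endpoint {{ api_vip }}:6443
            --upload-certs
          args:
            creates: /etc/kubernetes/admin.conf

        - name: Generate a join command for the other nodes
          ansible.builtin.command: kubeadm token create --print-join-command
          register: join_cmd
          changed_when: false

        - name: Keep an eye on certificate expiry
          ansible.builtin.command: kubeadm certs check-expiration
          register: cert_report
          changed_when: false

The registered join_cmd.stdout gets fed to the plays that join the remaining nodes, and the kubeconfig at /etc/kubernetes/admin.conf turns out to be the “secret scroll” itself.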

At one point I had so many tabs open on certificate chains and API server arguments, I felt like I was studying for bar exams, not building infra.

But slowly, it clicked. I built scripts, then turned those scripts into Ansible playbooks.
Then into a workflow.

Eventually, it was one click.

The “Fun” Bit: HA Clusters with Virtual IPs

The final boss was an HA setup: 3 synchronized control plane nodes and several workers, with the API load-balanced behind a single Virtual IP.

We used:

  • HAProxy + Keepalived on two separate nodes, all built and configured dynamically by Ansible.
  • VIP failover between them in case one dropped, again driven by the same dynamic, logic-based configuration (a stripped-down sketch follows below).
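
Stripped of the surrounding logic, the load balancer pair boiled down to something like this. Interface names, addresses, and priorities are placeholders, and handlers, health checks, and service restarts are left out:

    - name: Load balancer pair for the K8s API (simplified sketch)
      hosts: loadbalancers        # hypothetical group with the two LB nodes
      become: true
      vars:
        api_vip: 10.0.0.100       # hypothetical virtual IP
      tasks:
        - name: Install HAProxy and Keepalived
          ansible.builtin.package:
            name: [haproxy, keepalived]
            state: present

        - name: HAProxy - TCP-balance the API servers
          ansible.builtin.blockinfile:
            path: /etc/haproxy/haproxy.cfg
            marker: "# {mark} K8S API"
            block: |
              frontend k8s_api
                  bind *:6443
                  mode tcp
                  default_backend k8s_control_planes

              backend k8s_control_planes
                  mode tcp
                  balance roundrobin
                  option tcp-check
                  server cp1 10.0.0.11:6443 check
                  server cp2 10.0.0.12:6443 check
                  server cp3 10.0.0.13:6443 check

        - name: Keepalived - float the VIP between the two nodes
          ansible.builtin.copy:
            dest: /etc/keepalived/keepalived.conf
            content: |
              vrrp_instance K8S_API_VIP {
                  state BACKUP            # both start as BACKUP; priority decides
                  interface eth0          # placeholder interface name
                  virtual_router_id 51
                  priority {{ 150 if inventory_hostname == groups['loadbalancers'][0] else 100 }}
                  advert_int 1
                  virtual_ipaddress {
                      {{ api_vip }}
                  }
              }

The usual next step is a vrrp_script health check on HAProxy itself, so the VIP also moves when the proxy dies rather than only when the whole node does.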

It worked.

It booted, joined, and scaled. Everything that had to talk to the cluster hit the VIP, which routed to healthy control planes. And it was all automated.

I could go from bare VM to HA K8s cluster with a single command. 10 fucking VMs all provisioned, configured and tied together. One click.
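
That “one click” rested on nothing fancier than an inventory and a single entry-point playbook. Boiled down, with illustrative group names and addresses:

    all:
      children:
        loadbalancers:
          hosts:
            lb1: { ansible_host: 10.0.0.5 }
            lb2: { ansible_host: 10.0.0.6 }
        control_plane:
          hosts:
            cp1: { ansible_host: 10.0.0.11 }
            cp2: { ansible_host: 10.0.0.12 }
            cp3: { ansible_host: 10.0.0.13 }
        workers:
          hosts:
            w1: { ansible_host: 10.0.0.21 }
            w2: { ansible_host: 10.0.0.22 }
            # ...and so on up to w5

One ansible-playbook run against that inventory (later, one AWX workflow launch) and the whole stack came up.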

Four years later, my clusters are still running with no issues. We migrated all workloads from Windows workers running on individual servers to K8s pods running on one cluster. Money was saved, efficiency was served, and everything clicked together nicely.

I still maintain that stuff.

Rancher + AWX = Final Form

We weren’t done yet.

Once the clusters came online, they had to be managed. So I automated Rancher cluster registration using their API. No UI clicking. Rancher became our control tower.
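
The registration itself is just a couple of API calls. Here is a condensed sketch using Ansible’s uri module against Rancher’s v3 API; the URL, names, and token handling are placeholders, and the real version needs waits, retries, and idempotency checks:

    - name: Register a freshly built cluster with Rancher (condensed sketch)
      hosts: localhost
      gather_facts: false
      vars:
        rancher_url: https://rancher.example.internal     # placeholder
        rancher_token: "{{ vault_rancher_api_token }}"    # e.g. from Ansible Vault
        cluster_name: onprem-prod-01                      # placeholder
      tasks:
        - name: Create the cluster object in Rancher
          ansible.builtin.uri:
            url: "{{ rancher_url }}/v3/clusters"
            method: POST
            headers:
              Authorization: "Bearer {{ rancher_token }}"
            body_format: json
            body:
              type: cluster
              name: "{{ cluster_name }}"
            status_code: 201
            return_content: true
          register: rancher_cluster

        - name: Ask Rancher for a registration token and manifest
          ansible.builtin.uri:
            url: "{{ rancher_url }}/v3/clusterregistrationtokens"
            method: POST
            headers:
              Authorization: "Bearer {{ rancher_token }}"
            body_format: json
            body:
              type: clusterRegistrationToken
              clusterId: "{{ rancher_cluster.json.id }}"
            status_code: 201
            return_content: true
          register: reg_token

        - name: Apply the registration manifest on the new cluster
          ansible.builtin.command: kubectl apply -f {{ reg_token.json.manifestUrl }}
          environment:
            KUBECONFIG: /etc/kubernetes/admin.conf        # placeholder path
          delegate_to: "{{ groups['control_plane'][0] }}"

The second call hands back a manifest URL; applying it on the new cluster is what makes it appear in Rancher, no UI clicking required.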

At this point we were finally in a good place. Git-based updates, Rancher governance, AWX automation. And beefy servers that meant we didn’t even bother with autoscaling.

Each control plane and worker had its own blade. We had power. We had control.

Now we’re planning to move to EKS. Easy peasy!
