In the beginning...

Jun 15, 2024 · Nick Meyer · 13 min read

…there was an old, unused tower.

The beginning

Of course, that depends on how you define “beginning.” There are a lot of places you could call a beginning: the first time I used Linux way back in 1998 (sorry Dad… but not really); when I built my first computer from scratch in college; when I started working in the data center. Of course, these are all foundational pieces, but they’re not really the beginning of the homelab.

I built myself a new computer from the ground up in 2018, for the first time since 2011. At the time the old one was “decommissioned,” it was outfitted with an AMD FX-4100, 16GB of RAM, a multi-terabyte hard drive, and a not terrible video card (which, in retrospect, was likely seriously CPU-bound). That old machine ended up in the basement, powered down and unused; there was no network connection down there, as the cable into the house ran straight up to the consumer-grade WiFi router on the main level.

Adventures in networking

In 2019, my friend and coworker Cory introduced me to Mikrotik networking equipment; at the time, I was fairly well disenchanted with consumer-grade WiFi, with most of the “good” options not playing nicely with our CenturyLink fiber and its unfortunate attachment to the ’90s-era PPP protocol. He was kind enough to lend me (a loan that later became a gift) a Mikrotik RB2011, and I repurposed one of the old routers as a bridge and access point. The real enabler, though, was that the router lived in the basement and sat between the ONT feed and the WiFi access point. Now I had a place to plug in!

So what does any self-respecting Linux geek do? He fires up that old machine, plugs it into the network, and throws Ubuntu on it. And thus, the “homelab” was born. I didn’t do much with it at that point beyond playing around with virtual machines and some for-fun programming.

Before long, I determined that the RB2011 didn’t let me take full advantage of the lovely gigabit fiber connection, so I upgraded to an RB4011, which still serves as our gateway to the internet today.

Kuber-whatnow?

Around this time, I had started using containers heavily at work. The promise of containerized apps was great, and I eventually stopped running VMs on the FX box and just ran Docker on it instead; the primary tenant was an installation of Plex that I used to play my ripped and transcoded DVDs and Blu-rays (yes, I ripped my own discs, and continued to do so until Apple started streaming in 4K. The DMCA is in direct violation of personal use rights. But I digress again…).

And then our primary deployment platform at work changed from an in-house PaaS to Kubernetes. Now, at the time, Kubernetes seemed hopelessly complicated. All I wanted to do was deploy my applications! (I later learned that most of that complexity was due to the use of the Istio service mesh and ingress in what was at that time a very early version, but I digress yet again…) As tends to happen with me, after I got over my initial frustrations, I became interested and started to learn. The problem was, while you can run Kubernetes on one node, you really lose out on most of the benefits. And so I subdivided my one tower node into virtual machines. Even then, rebooting that one host for an OS upgrade brought down the whole cluster; that wasn’t a big deal while I was still running a single control-plane node anyway, but I was ready to dive deeper.

Deploy the life Raft

So it was that I converted my three virtual machines from a one control-plane/two workers configuration to a combined three control-plane/worker configuration. And then I did an OS upgrade on the hypervisor and rebooted it.

And that was the end of that cluster. At the time I didn’t understand what had happened, but I later discovered that the cluster was never able to regain quorum: Raft, the distributed consensus protocol at the heart of etcd (the distributed database backing Kubernetes), requires a majority of members online and in agreement, and after I rebooted the hypervisor host, my three nodes never managed to form that majority again.
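
If you want to see why that mattered, the quorum arithmetic is simple enough to sketch. This is a minimal illustration of the majority rule in Python, not anything specific to my cluster or to any particular etcd version:

```python
# Minimal sketch of Raft-style quorum arithmetic (illustrative only).

def quorum(members: int) -> int:
    """A Raft cluster can only make progress with a strict majority of members."""
    return members // 2 + 1

def has_quorum(members: int, healthy: int) -> bool:
    return healthy >= quorum(members)

if __name__ == "__main__":
    # Three control-plane nodes: losing one is survivable; losing two, or all
    # three at once (as when the single hypervisor hosting them reboots), is not.
    for healthy in (3, 2, 1, 0):
        print(f"3 members, {healthy} healthy -> quorum held: {has_quorum(3, healthy)}")
```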

I realized that if I was really going to dig into this, I needed more physical nodes. But computers are expensive, and while we weren’t hurting for money at the time, neither did I have money to just toss at more computers, especially since I had just made the decision to buy a dedicated NAS.

Would anybody like some Raspberry Pi?

My father-in-law had gifted me a first-generation Raspberry Pi a couple of years prior. I had had some fun with it, and by the time I started looking at these things, it was running the DHCP and DNS services for my home so that I could have more control than the router allowed me.

The Raspberry Pi 4 had been released in 2019 with memory capacities up to 4GB, enough to run the Kubernetes control plane. The problem was, where was I going to plug four of these things in? The answer came to me in the form of a Google advertisement in early March 2020: a cluster case called the PicoCluster, which had an integrated power supply and switch, allowing five Raspberry Pis to be connected in a cluster.

I ordered this on March 9, 2020. My last in-person day of work was March 12. It arrived on March 14. State-mandated COVID lockdown started on March 19.

Lockdown

Leading off here: COVID was awful, and I hope we never encounter a situation like it again. That said, it really gave me some time to experiment with Kubernetes in a multi-control-plane configuration. I had three of the Raspberry Pi nodes acting as control planes, with two acting solely as workers (I did allow scheduling on the control-plane nodes). Before too long, I added the FX box as another worker node and began experimenting with heterogeneous clusters. It was all great. And then I wanted to do more.

Step one came in mid-2021, when I decided that the NAS and its spinning disks were just too slow for some of my storage needs, so I began investigating distributed storage. Longhorn came to my attention, impressing me with its simplicity of use and the fact that, while not yet GA, it supported ARM64. So I made what I consider to be my first cluster upgrade, adding 512GB USB SSDs to each Raspberry Pi and configuring them as distributed disks in Longhorn.

As it turns out, around this time all of my Kubernetes experimentation paid off: I accepted a role at a Kubernetes-focused startup and was able to directly apply the skills I had learned to my work there. This was also the era of quickly-rising salaries for software engineers, which allowed me a bit more disposable income toward a hobby that fed directly back into my work.

I’ve got a fever, and the only prescription is more cowbe – err, power

I had ended up purchasing a PS5 on launch day. When I started at Replicated and we turned our former guest bedroom into my dedicated office, I decided that I was no longer doing much PC gaming, and the tower I built in 2018 joined its older brother downstairs in the cluster. This necessitated buying a second UPS, as I had run out of battery-backed plugs on the first. Once the box was up, I rearranged my control plane so that the towers were two of the nodes, with one Raspberry Pi making the third.

Unfortunately, the power imbalance became evident quickly. The Raspberry Pi just could not keep up as a control plane any longer, so I started looking into my options. I was still cost-conscious, but I also didn’t want (nor did I have space for) a big, power-hungry tower. My research led me toward a mini PC based on a 5W Celeron N5100. Despite a similar power budget to the Raspberry Pi, this box outperformed the Pi by 3-4x and could be outfitted with 16GB of RAM and a 1TB SSD for a total cost of ~$250. And so it was that in early 2022 I bought one of these machines and had three x86-64 nodes to run as my control planes, officially relegating the Pis to worker status.

The cluster existed in this state for about a year, but I began to hunger for more power, especially as I had more disposable income. In early 2023, I caught a sale on the same mini PC I had previously purchased, and so bought and fit out a second one.

Hardware decoding and power domain nirvana

In late 2023, I joined NVIDIA. Around that time, I began to get frustrated with essentially needing to commit one of the tower nodes to Plex, due to the fact that AMD hardware decoding is relatively terrible and the Celerons, while having decent decoders, could not really handle both decoding and running the control plane. Plus, I was starting to get the itch to do some PC gaming again. I treated myself to my first full build since 2018: a 16-core/24-thread Intel Core i7 (the first Intel box I’d ever built) with an Arc A750 that would be decent for gaming if I got the urge, and absolutely excellent for video decoding when it was in the cluster, which I (correctly) guessed would be most of the time. This new box retired the now-12+-year-old AMD FX box, which was really beginning to show its age (the Celeron mini PCs were faster at just 10% of the power draw).

Not too long after this, in early 2024, I decided that it was time to truly have three failure domains, and purchased a third UPS. With my control-plane nodes spread out across three UPSes, I finally had the peace of mind that I would never (barring a long power outage) again experience the complete quorum collapse of my first cluster failure back in 2020.

Of course, what I did not account for was my OCD. At that point I had two Celeron N5100 control-plane nodes and one Ryzen 1700 node (the 2018 tower). Thus, when a closeout sale came up on the Celeron N5100 mini PCs, I bought a third one to complement the other two, and now owned three identical control-plane nodes. The towers were dedicated exclusively to workloads, and the Celerons handled the control plane and some other lightweight tasks, with the Raspberry Pis getting random low-power assignments from the Kubernetes scheduler.

The last piece

Finally, a few weeks ago, the OCD kicked in again. I had 2x redundancy across two power zones, but the third had only one x86 node and the Raspberry Pi cluster. Additionally, as I’ve been getting involved with OpenShift at work, I’ve been looking to more closely mirror the upstream stack Red Hat uses for OpenShift. One component there is the cluster storage operator Rook-Ceph, which would replace Longhorn. Coincidentally, I had been getting frustrated with Longhorn’s performance and its inability to quickly sync up a replica that had been offline for a short time (say, for a package upgrade and reboot). Ceph ideally wants six nodes, each with a dedicated storage disk, so I added slower, cheaper boot/OS SSDs to each of the x86-64 nodes, leaving the faster NVMe SSDs dedicated to cluster storage.

However, even with this upgrade, I was left with only five Ceph-capable nodes, as the Raspberry Pis just don’t have enough resources to run Ceph effectively. So, I decided to add a sixth x86-64 node. The Celeron N5100 nodes were no longer available, having been replaced with the less capable Intel Processor N100 (which, among other things, only supports a single memory channel). Additionally, I had determined at this point that I would prefer a box closer in performance to the towers, effectively creating performance tiers.

After a lot of research, I found a mini PC that met my requirements, based on the Ryzen 5700U laptop chip. It was effectively equivalent in CPU power to my Ryzen 1700 node, drew only 15 watts, and was only slightly more expensive than the original Celeron node had been when I bought it. I bought one, and effectively “completed” my cluster. (Quotes because a computing environment is completed much like a work of art is completed, which is to say, never.) This node was added to my cluster, giving me six Ceph-capable nodes across three power domains, with the Raspberry Pi cluster still available in one of those domains for smaller workloads.

Current state

My “complete” cluster now consists of the following:

  • Performance tier 1: Three Celeron N5100 nodes (4C/4T) with 16GB of RAM, a 250GB SATA boot/OS SSD, and a 1TB PCIe 3.0 NVMe SSD for distributed cluster storage. These nodes are my control plane, and take workloads as capacity allows. These nodes are split across three power zones.

  • Performance tier 2:

    • One Ryzen 1700 node (8C/16T) with 32GB of RAM, a 250GB SATA boot/OS SSD, and a 1TB PCIe 3.0 NVMe SSD for distributed cluster storage
    • One Ryzen 5700U node (8C/16T) with 32GB of RAM, a 250GB PCIe 3.0 NVMe boot/OS SSD, and a 1TB PCIe 3.0 NVMe SSD for distributed cluster storage
    • One Core i7-13700 node (16C/24T) with 64GB of RAM, a 250GB PCIe 3.0 NVMe boot/OS SSD, and a 2TB PCIe 4.0 NVMe SSD for distributed cluster storage, plus an Intel Arc A750 for video decoding

    These nodes take the majority of the workload and are split across three power zones.
  • Performance tier 3: Four Raspberry Pi 4B (4C/4T) with 4GB of RAM and a 500GB USB 3.0 SSD boot drive, currently unused for cluster storage. Small workloads are scheduled to these nodes. All nodes are in one power zone, powered by one plug.

  • A NAS with a Celeron J3160, 2GB of RAM, and 32TB (4x8TB) of raw spindle-disk storage in RAID5 (24TB usable).

All of this is connected to a Mikrotik RB4011 via gigabit links (with the Raspberry Pis sharing a single gigabit link).
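
For a rough sense of what that inventory adds up to in cluster storage, here’s a back-of-the-envelope sketch in Python. The node names are just illustrative labels, and the three-way replication factor is an assumption on my part (a common Ceph default) rather than a statement of the actual pool configuration:

```python
# Back-of-the-envelope Ceph capacity estimate for the inventory above.
# Node names are illustrative; replication_factor = 3 is an assumed default.
cluster_ssds_tb = {
    "celeron-1": 1, "celeron-2": 1, "celeron-3": 1,    # performance tier 1
    "ryzen-1700": 1, "ryzen-5700u": 1, "i7-13700": 2,  # performance tier 2
}

replication_factor = 3
raw_tb = sum(cluster_ssds_tb.values())
usable_tb = raw_tb / replication_factor

print(f"raw:    {raw_tb} TB across {len(cluster_ssds_tb)} nodes")
print(f"usable: {usable_tb:.2f} TB at {replication_factor}x replication")
```

The lone 2TB drive in the Core i7 box is also what makes the disks lopsided, which is the imbalance the SSD-upgrade item below is meant to address.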

Future considerations

Some upgrades that I’m considering making:

  • Upgrade the networking for the x86-64 nodes to 2.5GbE. All except the Ryzen 1700 node already support it natively, so the upgrade would involve a new router or 2.5GbE switch and a 2.5GbE NIC for the Ryzen 1700 node
  • Upgrade the CPU in the Ryzen 1700 (Zen 1) node to a Ryzen 5700X (Zen 3) now that the motherboard supports it; same core count but much improved IPC and memory bandwidth
  • Upgrade memory in the Ryzen 1700 and Ryzen 5700U nodes to 64GB
  • Upgrade SSDs in all x86-64 cluster nodes (except the Core i7 box which already has it) to 2TB; Rook-Ceph has been complaining about imbalances due to the larger SSD in the Core i7 box
  • Upgrade the NAS to a newer model with 2x2.5GbE ports (5x improvement in network bandwidth) and the capability to add NVMe SSDs for caching (the 1TB SSDs in the x86-64 nodes would go to this effort)
  • Upgrade the memory in the Celeron nodes to 32GB. I’m not sure I will do this one, because I’m not convinced of the necessity from a CPU/RAM balance perspective based on current cluster utilization

If you’re still awake, thanks for sticking with me. This has been quite the journey over the last four years, and I’m quite happy with and proud of the current state of the lab from a hardware perspective, the upgrades above notwithstanding.