Smol Data Centre

I've been listening to several podcasts and watching some YouTube videos by tech hobbyists, and many of them talk about setting up a data centre at home for running various things (like a NAS, Home Assistant, etc.). Almost all of them seem to want to go as over the top as possible: terabytes of NVMe storage with ZFS or RAID or some other complicated setup, Kubernetes clusters running on their Pi's, complaints that they aren't getting their full 10 gigabits of throughput from their NAS even though they re-wired the house with Cat6a, that their Starlink isn't roaming so they have to suffer the indignity of using 5G, etc.

Meanwhile, back at my house, I'm very happily routing my (capped) LTE internet connection through a Pi 2, connected to a 2.4GHz WiFi AP at 100 Mbit/s, with total shared storage of about 120 GB (of which more than 50% is unused). I didn't want to waste my Pi 4 by using it in place of the bottleneck that is the Pi 2, even though it would increase the throughput of my internet connection, because it's fast enough as it is. That is to say, my setup is the polar opposite of what seems to be the norm these days among tech enthusiasts, even though I consider myself a tech enthusiast.

I get that many of these enthusiasts want to use the hobby as an opportunity to learn some enterprise-grade technology. However, I still clearly remember the days before technology got "good enough" (by my standards anyway), and so I started thinking deeply about what I would consider a minimum viable home data centre, or as I'm calling it here, a Smol Data Centre (inspired by the smolnet). The intention is to host smolnet stuff like Gemini content, but also maybe run my mail server, calendar/contact DAV server, and other small services at home.

Simplicity

My over-arching guiding philosophy in this endeavour is to make it as cheap, simple, and efficient as possible. Unfortunately "simple" can mean different things to different people, so I'd like to define it a bit more clearly. For me, simplicity means:

Being able to understand all of the components that make up the system and what they do
Being able to intuitively identify a problem when something goes wrong and rectify it
Tangentially, not being reliant on a specific product or service that may not exist in 10 years' time

This basically translates to: I would prefer to use simple UNIX command line tools that have been around forever, rather than YAML incantations to big shiny new monoliths. Also, for efficiency, I would like to have as few "moving parts" as possible.

Physical

My current physical implementation of my Smol Data Centre is:

Any spare Raspberry Pi I find in my drawers, which currently consists of several Pi 1's, Pi Zero's, a Pi 2, and a Pi 3 A+
Mount them vertically (like dishes in a dish rack) for passive cooling and minimal dust accumulation
Put a "roof" over them to protect them from dust and possible water leaks from the ceiling.
Powered by car phone chargers (i.e. 12V to 5V USB), connected to a 12V Lithium battery that is charged constantly from mains. This acts as a strong DC UPS at low cost. I could have used powerbanks, but the 12V battery also supplies my modem, switch, and home alarm.
The Pi's boot off a microSD card, and mount a USB thumbdrive as their main storage (/home partition). USB thumbdrives are really cheap, and I find 32 GB to be more than enough for my needs. I have several spare that I can swap in when needed.
The Pi's with ethernet jacks are wired into a switch, but the Pi 3 A+ is connected via WiFi (I know! The horror! It's placed close to the AP though).

OS

I would have liked to run OpenBSD on all the Pi's, but alas it only runs on a Pi 3 or Pi 4. I have it running on the Pi 3. One thing I really like about OpenBSD is that it's easy to get my head around and to administer. As an example, running `mount`, `set`, `export`, or even `ps ax` returns a sane screenful or less of understandable output, unlike in most modern Linuxes, where you have to `grep` the result to find the needle in the messy haystack.

For the older Pi's, I chose to run Alpine Linux, which is also small, stable, and easy to understand.

Where possible, I make the OS microSD card read-only (for reliability), and mount `tmpfs` for things like `/var`. I have to temporarily make the filesystem read-write to do OS updates.
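
A minimal sketch of that read-write window for updates, assuming Alpine with `apk`, a root filesystem normally mounted read-only, and a root shell. The names `run`, `update_os`, and `DRY_RUN` are made up for illustration and are not part of any tool:

```shell
#!/bin/sh
# Sketch: remount / read-write, update packages, then go back to read-only.
# Assumptions: Alpine Linux (apk), run as root, / normally mounted ro.
# run, update_os and DRY_RUN are illustrative names only.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_os() {
    run mount -o remount,rw / || return 1
    # Remember the upgrade result so the card is remounted read-only
    # even when the upgrade itself fails.
    run apk update && run apk upgrade
    status=$?
    run sync
    run mount -o remount,ro /
    return "$status"
}

# Usage (as root):       update_os
# Preview the commands:  DRY_RUN=1 update_os
```

Remounting read-only regardless of the upgrade result means a failed update never leaves the card writable by accident.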

Backups

I'm not using RAID or ZFS, but plain old ext4 on a USB thumbdrive. Sounds crazy, but I've got thumbdrives from over 15 years ago that are still working reliably! As long as it's backed up, a failure shouldn't be a problem. Here are my thoughts on backups:

I prefer backups to be interactive and manual, rather than automated, for several reasons: I back up to hard drives so I'd like them to be powered off when not in use (reliability and security); I'd have to enter passwords to do backups (security); I get to see if there are errors; I know for sure the backups ran.
I make a `dd` backup of the OS microSD cards at every OS update. Easy to recover.
I make incremental `rsync` backups of the attached USB drives. Also easy to recover.
I back up to 2 or more drives and create checksums of all backup images/sets to verify integrity.
I don't encrypt backups (sensitive files should be encrypted on the original storage), for reliability and future-proofing (I got bitten by TrueCrypt).
Offsite backups: how "offsite" do I really need my backup to be? I'd simply like to protect my backups from fire, theft, or minor flooding. So my "offsite" backup can be one of the backup hard drives stored in the garage, or outhouse, or maybe in a Tupperware container inside the pool motor housing outside!
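
As a rough sketch, the `dd`, `rsync`, and checksum steps above could be wrapped in a few shell functions. Everything named here (the function names, the `SHA256SUMS` manifest, the `latest` symlink, and the example paths) is a placeholder rather than the actual setup, and the `rsync` hard-link approach is just one common way to do incremental snapshots:

```shell
#!/bin/sh
# Sketch of the manual backup steps. All names and paths are placeholders.

# Image a whole microSD card to a file on the backup drive.
# $1 = card device (e.g. /dev/sdX), $2 = image file
backup_card() {
    dd if="$1" of="$2" bs=4M && sync
}

# Incremental snapshot: files unchanged since the previous snapshot are
# hard-linked to it, so every snapshot looks complete but only the
# changes take up new space. Pass absolute paths.
# $1 = source directory, $2 = backup root on the backup drive
backup_home() {
    src=$1
    root=$2
    stamp=$(date +%Y-%m-%d_%H%M%S)
    mkdir -p "$root"
    if [ -e "$root/latest" ]; then
        rsync -a --delete --link-dest="$root/latest" "$src/" "$root/$stamp/"
    else
        rsync -a --delete "$src/" "$root/$stamp/"
    fi &&
    ln -sfn "$stamp" "$root/latest"
}

# Write a checksum manifest for every file under a backup directory.
# $1 = directory to checksum
make_checksums() {
    ( cd "$1" &&
      find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Re-read every file and compare it against the manifest.
# $1 = directory containing SHA256SUMS
verify_checksums() {
    ( cd "$1" && sha256sum -c SHA256SUMS )
}

# Example session (placeholders):
#   backup_card /dev/sdX /mnt/backup/pi2-os.img
#   backup_home /home /mnt/backup/pi2-home
#   make_checksums /mnt/backup/pi2-home/latest/
#   verify_checksums /mnt/backup/pi2-home/latest/
```

Restoring is then a plain copy from any snapshot directory, or a `dd` of the image back onto a fresh card.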

Security and connecting to the Internet

In order to simplify connecting my servers to the Internet, I decided to use a tiny VPS at a provider. I am currently just using `autossh` to maintain a constant SSH port forward from each server to the VPS. I would like to learn and use WireGuard, but that adds a lot of complexity and for now the port forward works fine. For web-based services, I have `nginx` on the VPS, which manages the Let's Encrypt SSL certificate, reverse proxies to my various servers, and adds rate-limiting and other protections. This greatly simplifies the admin required on each of my servers, as it's all centrally managed.
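
For illustration, a persistent reverse port forward with `autossh` might look like the sketch below. The host `vps.example.org`, the `tunnel` user, and both port numbers are placeholders, and the `DRY_RUN` switch is only there so the command can be inspected without connecting:

```shell
#!/bin/sh
# Sketch: keep a reverse SSH port forward open from a home server to the VPS.
# vps.example.org, the tunnel user and both ports are placeholders.

VPS_HOST=${VPS_HOST:-tunnel@vps.example.org}
REMOTE_PORT=${REMOTE_PORT:-8081}   # port opened on the VPS (loopback only)
LOCAL_PORT=${LOCAL_PORT:-80}       # service port on the home server

start_tunnel() {
    # -M 0 turns off autossh's own monitor port; the ServerAlive options
    # let ssh notice a dead link and exit, so autossh can restart it.
    # -N means no remote command, forwarding only.
    set -- autossh -M 0 -N \
        -o ServerAliveInterval=30 \
        -o ServerAliveCountMax=3 \
        -o ExitOnForwardFailure=yes \
        -R "127.0.0.1:${REMOTE_PORT}:127.0.0.1:${LOCAL_PORT}" \
        "$VPS_HOST"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        exec "$@"
    fi
}

# Usage:    start_tunnel
# Preview:  DRY_RUN=1 start_tunnel
```

With the forward bound to loopback on the VPS, `nginx` there would reach the home server with something like `proxy_pass http://127.0.0.1:8081;`, and nothing else on the Internet can hit the forwarded port directly.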

Other layers of security can be added to each server. For example, instead of running services inside Docker containers, I'm running them in sandboxes created using Bubblewrap. I think `bwrap` is a very under-appreciated tool and I've been using it to great effect even for my dev workflow, but that's a topic for another post.
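
To give a flavour of what a `bwrap` sandbox for a service can look like, here is a hedged sketch. It assumes a filesystem layout with separate `/bin`, `/lib`, and `/usr` (as on Alpine); `run_sandboxed`, `DRY_RUN`, the `/data` mount point, and the example daemon are invented for illustration, and a real service will usually need a few more binds (for example `/etc/resolv.conf` or its config files):

```shell
#!/bin/sh
# Sketch: run a command inside a bubblewrap sandbox.
# Assumes separate /bin, /lib and /usr (non-merged layout, as on Alpine).
# run_sandboxed, DRY_RUN and /data are illustrative names only.

# $1 = writable data directory on the host, rest = command to run
run_sandboxed() {
    data_dir=$1
    shift
    # System directories are read-only; only the data directory is
    # writable. All namespaces are unshared, then networking is shared
    # back in so the service can still listen on a port.
    set -- bwrap \
        --ro-bind /bin /bin \
        --ro-bind /lib /lib \
        --ro-bind /usr /usr \
        --proc /proc \
        --dev /dev \
        --tmpfs /tmp \
        --bind "$data_dir" /data \
        --chdir /data \
        --unshare-all --share-net \
        --die-with-parent \
        --new-session \
        "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        exec "$@"
    fi
}

# Usage (hypothetical daemon):
#   run_sandboxed /home/gemini/data some-gemini-server --port 1965
```

Inside the sandbox the process sees no `/home`, no other users' files, and no host processes, which covers much of what a container would be used for here without needing a daemon or images.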

Thoughts?

This is an ongoing exploration for me, and I'm struggling to find online discussions around the topic of minimal self-hosting at home. If you have ideas or want to have a discussion, do reply to this post on Antenna or get in touch. I'd love to hear from you!