Managing a fleet of NixOS Part 2 - A KISS design

In this series of articles, I'll explain my steps toward designing an infrastructure to centrally manage a fleet of NixOS systems.


Introduction

Let's continue my series on designing a fleet management system for NixOS.

Yesterday, I figured out 3 solutions:

1. periodic data checkout

2. pub/sub - event driven

3. push from central management to workstations

I retained only solutions 2 and 3 because they were the only ones providing instantaneous updates. However, I didn't want to give up on the KISS solution 1, and I realized we could have a hybrid setup.

In my opinion, the best we can create is a hybrid setup of 1 and 3.

A new solution

In this setup, all workstations will connect periodically to the central server to look for changes, and then trigger a rebuild. This simple mechanism can be greatly extended per-host to fit all our needs:

The mechanism is so simple that it could be adapted to many cases, such as using GitHub or any other data source instead of a central server. I will personally use it with my laptop as the central system to manage remote servers, which is funny as my goal is to use a server to manage workstations :-)
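To make the periodic check more concrete, here is a minimal sketch of what the client side could look like as a NixOS module. Everything named here is an assumption made for illustration and not part of the design above: the unit name fleet-update, the server central.example.org, the user host1, the key path /etc/fleet/id_ed25519, the remote directory /config and the 15 minute interval. It also assumes the server's host key is already known to the client, as sftp batch mode cannot ask for confirmation.

```nix
{ config, pkgs, ... }:
{
  # Hypothetical service: download the configuration, rebuild only if it changed.
  systemd.services.fleet-update = {
    description = "Fetch this host's configuration and rebuild when it changed";
    path = with pkgs; [
      coreutils diffutils openssh gnutar xz gzip gitMinimal
      config.nix.package.out
    ];
    serviceConfig.Type = "oneshot";
    script = ''
      state=/var/lib/fleet-update
      rm -rf "$state/new"
      mkdir -p "$state/new" "$state/current"

      # Download the remote /config directory into $state/new/config.
      # Server name, user and key path are placeholders.
      printf 'get -R /config %s\n' "$state/new" \
        | sftp -b - -i /etc/fleet/id_ed25519 host1@central.example.org

      # Rebuild only when the downloaded tree differs from the last applied one.
      if ! diff -r -q "$state/current" "$state/new/config" >/dev/null; then
        ${config.system.build.nixos-rebuild}/bin/nixos-rebuild switch \
          --flake "$state/new/config"
        # Only remember the new tree once the rebuild succeeded,
        # so a failed rebuild is retried on the next run.
        rm -rf "$state/current"
        mv "$state/new/config" "$state/current"
      fi
    '';
  };

  # Hypothetical schedule: shortly after boot, then every 15 minutes.
  systemd.timers.fleet-update = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnBootSec = "5min";
      OnUnitActiveSec = "15min";
    };
  };
}
```

Swapping the sftp line for a git fetch or any other download command is enough to use a different data source, which is what makes the mechanism easy to adapt.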

File access design

One important issue I didn't address in the previous article is how to distribute the configuration files:

Addressing each of these requirements is hard, but in the end I've been able to design a solution that is simple and flexible:

Figure: Design pattern for managing users

The workflow is the following:

The directory holding the configuration is likely to contain a flake.nix file (which can be a symlink to something generic), a configuration file, a directory with a hierarchy of files to copy as-is into the system (for things like secrets or configuration files not managed by NixOS), and a symlink to a directory of nix files shared by all hosts.

The NixOS clients will connect to their dedicated users over ssh using their private keys. This keeps each client separate on the host system and restricts what it can access using the SFTP chroot feature.

A diagram of a real world case with 3 users would look like this:

Figure: Real world example with 3 users
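On the central server, the dedicated users and their SFTP chroots could be declared roughly as follows. This is only a sketch under assumptions: the client names, the placeholder public keys, the group name fleet and the /var/lib/fleet location are all hypothetical, and the actual implementation may be organized differently.

```nix
{ lib, ... }:
let
  # Hypothetical clients, each with a placeholder public key.
  clients = {
    host1 = "ssh-ed25519 AAAA...placeholder host1";
    host2 = "ssh-ed25519 AAAA...placeholder host2";
    host3 = "ssh-ed25519 AAAA...placeholder host3";
  };
in
{
  users.groups.fleet = { };

  # One dedicated system user per client, authenticated by its own key.
  users.users = lib.mapAttrs (name: key: {
    isSystemUser = true;
    group = "fleet";
    openssh.authorizedKeys.keys = [ key ];
  }) clients;

  # sshd requires the chroot directory and its parents to be owned by root
  # and not writable by anyone else.
  systemd.tmpfiles.rules =
    [ "d /var/lib/fleet 0755 root root -" ]
    ++ lib.mapAttrsToList
      (name: _: "d /var/lib/fleet/${name} 0750 root fleet -")
      clients;

  services.openssh = {
    enable = true;
    # Members of the group only get SFTP, locked inside their own directory.
    extraConfig = ''
      Match Group fleet
        ChrootDirectory /var/lib/fleet/%u
        ForceCommand internal-sftp
        AllowTcpForwarding no
        X11Forwarding no
    '';
  };
}
```

With this layout, a client logging in as host1 sees /var/lib/fleet/host1 as its root directory and cannot reach the files of host2 or host3.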

Work required for the implementation

The setup is very easy and requires only a few components:

Conclusion

I absolutely love this design: it's simple, and each piece can easily be replaced to fit one's needs. Now, I need to start writing all the bits to make it real, and offer it to the world 🎉.

There is a NixOS module named autoUpgrade. I'm aware of its existence, but while it's absolutely perfect for the average user's workstation or server, it's not practical for managing a fleet of NixOS systems efficiently.