💾 Archived View for perso.pw › blog › articles › managing-a-fleet-of-nixos-part1.gmi captured on 2023-09-28 at 16:08:13. Gemini links have been rewritten to link to archived content

Managing a fleet of NixOS Part 1 - Design choices

Introduction

I have a grand project in mind, and I need to think about it before starting any implementation. This blog is the right place for me to explain what I want to do and the different solutions.

It's related to NixOS. I would like to ease the management of a fleet of NixOS workstations that could be anywhere.

This could be useful for companies using NixOS for their employees, to manage all the workstations remotely, but also for people who may manage NixOS systems in various places (cloud, datacenter, house, family computers).

With central management, it makes sense for users not to have root access. They would contact their technical support to ask for a change, and their system could be updated quickly to reflect the request. This can be super useful for remote family computers when they need an extra program that is not currently installed, and you have taken on the responsibility of handling their system...

With NixOS, this setup totally makes sense: you can potentially reproduce users' bugs as you have their configuration, stage new changes for testing, and users can roll back to a previous working state in case of a big regression.

The company Cachix made this possible before I figured out a solution. It's still not too late to propose an open source alternative.

Cachix Deploy

Defining the project

The purpose of this project is to have a central management system on which you keep the configuration files for all your NixOS systems, and to let the administrator make the remote NixOS systems pick up a new configuration as soon as possible when required.

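We can imagine three different implementations at the highest level:

* a scheduled job on each remote system checks for changes
* the remote systems listen for changes (publisher / subscriber)
* the central management pushes the updates to the remote systems

These designs all have pros and cons. Let's look at them in more detail.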

Solution 1 - Scheduled job

In this scenario, the NixOS system would use a cron job or a systemd timer to periodically check for changes and trigger the update.
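
The periodic check described above could be sketched as follows. NixOS already ships a `system.autoUpgrade` module built on a systemd timer, so this is only an illustration of the decision logic, not a proposed implementation. The function names, the flake URL and the hostname are hypothetical, and the sketch assumes the configuration is published as a flake identified by a revision string.

```python
from typing import List, Optional


def needs_update(deployed_rev: Optional[str], latest_rev: str) -> bool:
    """Return True when the published revision differs from the deployed one."""
    return deployed_rev != latest_rev


def rebuild_command(flake_url: str, hostname: str) -> List[str]:
    """Build the local rebuild command for one host (assumes a flake-based setup)."""
    return ["nixos-rebuild", "switch", "--flake", f"{flake_url}#{hostname}"]


def check_once(
    deployed_rev: Optional[str],
    latest_rev: str,
    flake_url: str,
    hostname: str,
) -> Optional[List[str]]:
    """One run of the scheduled job: return the command to execute, or None.

    The caller (a cron job or systemd timer) would pass the returned list
    to subprocess.run() and record latest_rev once the rebuild succeeds.
    """
    if needs_update(deployed_rev, latest_rev):
        return rebuild_command(flake_url, hostname)
    return None
```

Nothing here reports back to the central management: the timer only decides locally whether a rebuild is due.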

Pros

Cons

Solution 2 - Remote systems are listening for changes (publisher / subscriber)

In this scenario, the NixOS system would always be connected to the central management, using some kind of protocol like MQTT, BOSH or similar.
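
To make the publisher / subscriber idea concrete, here is a minimal sketch that uses an in-memory broker as a stand-in for a real MQTT or similar server. It does not speak any real protocol. The class names, the `fleet/<hostname>/update` topic layout and the revision payload are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class InMemoryBroker:
    """Toy stand-in for a message broker: delivers payloads by exact topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


class Agent:
    """Runs on a remote NixOS system and listens for its own update topic."""

    def __init__(self, hostname: str, broker: InMemoryBroker) -> None:
        self.hostname = hostname
        self.pending: List[str] = []
        broker.subscribe(f"fleet/{hostname}/update", self.on_update)

    def on_update(self, revision: str) -> None:
        # A real agent would fetch the configuration at this revision and
        # rebuild; here the request is only recorded.
        self.pending.append(revision)
```

The central management would call `broker.publish("fleet/laptop1/update", "rev42")`, and the return value tells it whether any agent was connected to receive the message.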

Pros

Cons

Solution 3 - The central management pushes the updates to the remote systems

In this scenario, the NixOS system would be reachable over a protocol that allows running commands, such as SSH. The central management system would run a remote upgrade on it, or push the changes using tools like deploy-rs, colmena, morph or similar...

Awesome-nix list: deployment-tools
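
As a rough illustration of the push workflow, the sketch below builds one `nixos-rebuild switch --target-host` command per machine, which is one way to run a remote upgrade over SSH without any of the dedicated tools. The fleet mapping, the hostnames and the flake directory are hypothetical, and the sketch assumes a flake-based configuration with root SSH access to every host.

```python
from typing import Dict, List


def push_command(flake_dir: str, hostname: str, ssh_target: str) -> List[str]:
    """Command run on the central machine to build and activate one host."""
    return [
        "nixos-rebuild",
        "switch",
        "--flake",
        f"{flake_dir}#{hostname}",
        "--target-host",
        ssh_target,
    ]


def plan_fleet_push(flake_dir: str, fleet: Dict[str, str]) -> List[List[str]]:
    """Return the commands for the whole fleet, ordered by hostname.

    fleet maps a configuration name to its SSH target, for example
    {"laptop1": "root@laptop1.example.org"} (made-up values).
    """
    return [
        push_command(flake_dir, name, target)
        for name, target in sorted(fleet.items())
    ]
```

Each returned list can be passed to `subprocess.run()`, whose exit code gives the central management direct feedback about every host.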

Pros

Cons

Making a choice

I tried to state the pros and cons of each setup, but I can't see a clear winner. However, I'm not convinced by Solution 1, as you don't have any feedback from or direct control over the systems, so I prefer to abandon it.

Solutions 2 and 3 are still in the competition. We basically ended up with a choice between a PUSH and a PULL workflow.

Conclusion

In order to choose between 2 and 3, I will need to experiment with the Solution 2 technologies, as I have never used them (MQTT, RabbitMQ, BOSH, etc.).