Thursday, 25. November 2021
[This article has been bi-posted to Gemini and the Web]
This is the first of two articles on cross-platform package management and package building. It covers the basics by discussing why it is (perhaps surprisingly) difficult to do and what some of the problems are. It also takes a quick look at some strategies to solve the problem.
One of the strengths of today's Unix-like operating systems is that they offer proper package management. This makes installing and maintaining software both simple and convenient - as long as we are talking about homogeneous environments, that is! As soon as you have a heterogeneous environment to work in, things get complicated rather quickly. Packages and package systems are usually closely tied to their platform. It's not a coincidence that people talk about e.g. "RHEL-like" distributions and "RPM-based" ones more or less interchangeably. The package manager is so close to the core of an OS or a distribution that it can be used as a synonym for the platform itself.
It's usually not the package manager itself that makes the difference (unless it is a special one whose features define the whole distribution that uses it, like Gentoo's _portage_, which allows for fine-grained custom building of applications thanks to its _USE flags_ among other things, or NixOS's _nix_, a purely functional package manager that supports concurrent installations of multiple versions, atomic upgrades, etc.). Whenever an application (or something else like a theme pack, documentation, etc.) is to be packaged, the maintainer has to make a couple of decisions - and maintainers of the same software for another OS / distribution might have a different opinion on them.
One of the first things that comes to mind is that package names may differ between operating systems / distributions. A popular example is the Apache webserver, which is available as _httpd_ in Linux distributions based on Red Hat's while in Debian-based ones it's known as _apache2_. FreeBSD uses the name _apache24_ and OpenBSD calls the same webserver _apache-httpd_. But those are only the names; any configuration management system (or even any admin by hand!) can handle that easily. The software is the same after all, right? Yes and no!
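To make this tangible, here is a minimal Python sketch of how a configuration management layer might abstract the naming differences away. The platform identifiers are invented for the example:

```python
# Hypothetical sketch: map one logical package to its platform-specific
# name, roughly the way a configuration management layer has to.
# The platform identifiers are made up for illustration.

APACHE_PACKAGE = {
    "rhel":    "httpd",
    "debian":  "apache2",
    "freebsd": "apache24",
    "openbsd": "apache-httpd",
}

def apache_package_name(platform_id: str) -> str:
    """Return the native package name of the Apache webserver."""
    try:
        return APACHE_PACKAGE[platform_id]
    except KeyError:
        raise ValueError(f"no Apache package name known for {platform_id!r}")

print(apache_package_name("freebsd"))  # prints: apache24
```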
While it's all built from the code that is released by the same upstream project, all the platforms _organize_ the software differently. Sticking to the Apache example, it's pretty well known that Debian-based distributions use a mechanism called "sites-enabled" while others like Red Hat do not. This means either embracing the multiple schemes native to the platforms you use or creating your own and bending the default installations on all platforms to use it. Harmonizing configuration like that is not such an uncommon thing to do, and doing it is not incredibly hard.
It comes at a price, though. Hired a new admin? You could probably expect him or her to be familiar with the standard scheme. But if you're using a custom one, the new employee will need time to become familiar with it. It's also not something you do once and are done with. The default configuration is likely to change over time. In the case of our webserver, for example, the recommended ciphers for TLS encryption may change. If you use the default configuration you'll probably get important changes like this for free when updating. Forsaking it and rolling your own means more homework for you to keep the configuration in good shape.
Speaking of which: On FreeBSD you will even find the configuration files in another place (/usr/local/etc/apache24) while on Linux it's commonly something below /etc (like /etc/apache2 or /etc/httpd). Other things like databases are frequently in different places, too. On FreeBSD it's /var/db/mysql for example while on Linux it's usually /var/lib/mysql. By itself that's just another small detail to take into account. But those details add up and should not be neglected entirely.
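The same lookup approach extends to file system locations. Another small, hypothetical sketch (the paths are the common defaults mentioned above, so double-check them on your own machines):

```python
# Hypothetical sketch: the lookup idea applied to file system paths.
# The locations are the common defaults mentioned above - verify them on
# your own systems before relying on them.

APACHE_CONF_DIR = {
    "rhel":    "/etc/httpd",
    "debian":  "/etc/apache2",
    "freebsd": "/usr/local/etc/apache24",
}

MYSQL_DATA_DIR = {
    "linux":   "/var/lib/mysql",
    "freebsd": "/var/db/mysql",
}

# A deployment script would pull the right path per host, e.g.:
print(APACHE_CONF_DIR["freebsd"])  # prints: /usr/local/etc/apache24
```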
And even worse: The package maintainers for each platform make decisions on _compile-time options_! So the resulting software _will_ differ even if you go the extra mile and configure it alike with your custom runtime configuration scheme! Even seemingly basic things should not just be taken for granted. Once upon a time, the Apache webserver often came _without_ SSL support - sometimes there was an extra package which had it enabled for people who needed that functionality! Sometimes you had to build the software yourself (or use ports that let you set or unset the option as you please). On Debian, the Apache package has Lua support enabled but not LuaJIT support. On FreeBSD both are disabled by default. The FreeBSD port offers more than _130_ (!) options - it's not hard to see how much of a difference the package maintainer's choices can make for more advanced software!
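If you want to see such differences on your own machines, one rough approach is to compare the modules each Apache instance reports as loaded. Here is a hedged sketch; the name of the control command and the output parsing are assumptions that may need adjusting per platform:

```python
# Hypothetical sketch: collect which Apache modules a host reports as
# loaded. "apachectl -M" (the command may be called apache2ctl on
# Debian-based systems) dumps the static and shared modules; differences
# between hosts often trace back to compile-time options or packaging
# decisions. The parsing below assumes lines like " ssl_module (shared)".

import subprocess

def loaded_modules(ctl: str = "apachectl") -> set[str]:
    """Return the module names reported by '<ctl> -M'."""
    out = subprocess.run([ctl, "-M"], capture_output=True, text=True,
                         check=True).stdout
    return {line.split()[0] for line in out.splitlines() if "_module" in line}

# Usage idea: capture the set on each host and diff them, e.g.
#   sorted(modules_host_a - modules_host_b)
```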
While probably any reader will understand pretty well by now that our topic is rather far away from peaceful unicorns and sunny weather, it gets worse still. It's not uncommon that package maintainers choose to apply patches instead of using the code exactly as upstream provided it. This may be due to an incompatibility (maybe some dependency that this OS / distro ships is too old or too new for this software and so a patch is required to make it play along nicely). It may be because the maintainer feels that a fix that was not deemed important enough to warrant another upstream release should still be applied. It could be because additional features not supported upstream are desired (many maintainers choose to ship a pretty popular but unofficial additional MPM called _ITK_, for example). Or it could be because of any number of other good or bad reasons. Therefore the software might differ between platforms even if the exact same compile-time options were chosen...
And because we of course saved the best for last, the biggest problem is an even more demanding one: package versions... Not only do the various operating systems / distributions update to newer versions at their own pace, but they may or may not backport fixes or newer features into the packages that they release! Keeping track of this is already a major hassle for a couple of programs - and it becomes a downright daunting task if you need to do it for many! And you _have_ to. It's _not_ optional. Why? Because newer versions of your software might introduce new features or configuration directives that you definitely want to use. However, you cannot simply enable them on all of your servers, as the older versions will probably refuse to even start due to invalid (unknown) settings!
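A small, purely illustrative sketch of the kind of bookkeeping this forces on you. The hosts, versions and the version threshold are made up, and backports make a bare version comparison only a rough heuristic:

```python
# Hypothetical sketch: flag hosts whose Apache is too old for a new
# configuration directive before rolling it out everywhere. Host names,
# versions and the required version are made up for illustration.
# Caveat: backported features (see above) can make the bare version
# number an unreliable signal.

REQUIRED = (2, 4, 41)  # oldest version assumed to understand the directive

versions = {
    "web1.example.org": "2.4.54",  # Debian
    "web2.example.org": "2.4.6",   # older RHEL, heavily patched
    "web3.example.org": "2.4.57",  # FreeBSD
}

def parse(version: str) -> tuple[int, ...]:
    """Turn '2.4.54' into (2, 4, 54) for a simple comparison."""
    return tuple(int(part) for part in version.split("."))

too_old = [host for host, v in versions.items() if parse(v) < REQUIRED]
print("hold the new directive back on:", ", ".join(too_old) or "nobody")
```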
Newer versions may also deprecate or remove previously supported features. There are features that may only be available on certain platforms (maybe additional dependencies are required which are not ported to every platform that you use). All the kind of fun stuff that can totally ruin your weekend when it eventually bites you, even if you had been lucky for a long time before.
The more diverse your environment is, the more the consequences of the issues only touched on above are going to make your job look like one of the inner circles of hell. What's the best way out of this misery?
Well, what about compiling the most important software yourself and deploying it to e.g. /opt? This is technically very much possible, but is it feasible at scale? You're almost guaranteed to drown eventually. Don't give in to the temptation of going down this road! That way lies insanity.
If you've only got _a few_ different systems in use and not too many complex programs, you might get away with careful planning and careful configuration management. It won't be pretty but it's possible to do. Got _several_ different systems that you need to support? Do yourself a favor and find another solution.
There is in fact a proper solution to this: using a package framework _that supports multiple platforms_. Doing so comes with its own set of challenges and pains, but they are much easier to bear. You probably have to learn a new package management tool. Depending on your choice, you might need to understand the concept of a _ports tree_ if you are not already familiar with it. But in return you will be able to use packages of the _same version_, built with the _same compile-time options_ (or at least very close ones if various platforms force diverging settings) and so on across your entire landscape!
Sounds too good to be true? Let's put two options which claim to be able to do just that to the test! For the Advance!BSD project we plan to use at least four different operating systems (for more information see the links below). Using the native packages on each one is basically out of the question - especially since we anticipate that we'll need to add some software packages of our own to the mix and totally lack the manpower to maintain them across four package systems.
Advance!BSD - thoughts on a not-for-profit project to support *BSD (1/2)
Advance!BSD - thoughts on a not-for-profit project to support *BSD (2/2)
The next article will introduce Pkgsrc and Ravenports and present the results of a two-month evaluation of Pkgsrc on four BSD operating systems plus Linux and illumos. It will also compare the advantages and disadvantages of both contenders in heterogeneous environments.
Cross-platform package building: Pkgsrc vs. Ravenports (2/2)