IT File

The IT file is an assignment in which we have to write about six different topics, in an encyclopedic style, in about 500 words. Everything stated in the text should be factual and sourced.

Technically it should be bullet points and not paragraphs but Gemini doesn't support bullet points, so I'll just break them down later

You can find some other draft topics here. I couldn't include them in this file because they weren't "technical enough".

Linux

Description

Linux is a family of open-source operating systems based on the Linux kernel. The kernel is the part of an operating system that bridges the hardware and the software of the computer.

The operating systems based on that kernel are called "distributions", often shortened to "distros".

Many Linux distributions ship with default tooling from the GNU project, which is why some people ask for Linux distributions to be called GNU/Linux distributions.

The GNU project is a free software project that develops utilities (text editors, command-line tools, etc.) and system libraries (the pieces of software that developers use to build their programs and interact with the operating system).

Types of Linux distributions

There are a lot of Linux distributions because technically anyone can create one. However, the most defining characteristic of a distribution remains its package manager.

The package manager is a piece of software used for installing, removing and updating software on a computer. On Linux machines, unlike on Windows ones, it's generally not recommended to install software from downloaded installation files. Software should preferably come from the distribution's official repositories, where packages are verified. Packages are the units handled by the package manager: each one can contain one or several programs and one or several libraries.

Distributions using Pacman, such as Arch Linux, Manjaro, EndeavourOS or CachyOS, are rolling-release distributions. That means there are no big version upgrades: packages are updated continuously as new versions come out. Pacman repositories also have the particularity of shipping only the latest version of each package, which is why those distributions are called "bleeding-edge" distributions. It should however be noted that this can be risky, because some of those packages haven't been thoroughly tested yet and might break or be vulnerable. These distributions mostly target enthusiasts who want to tinker with their system.

Distributions using APT, such as Debian, Ubuntu (and its derivatives) or Tails, are fixed-release distributions that favor stability, some of them with long-term support (LTS) versions. The packages available through APT are usually older but well tested, stable and lightweight. This is one reason why Debian is often used as a base for lightweight Linux distributions. Debian also has a testing branch where more up-to-date software (closer to what Pacman offers) can be installed.

Distributions using RPM, such as Fedora, CentOS or openSUSE, are often semi-rolling distributions, which means they try to balance fixed releases (with LTS) for critical components and rolling updates for less critical software. Those distributions are often viewed as targeting "professionals".

Nix (used as the system package manager only by NixOS) is a very special kind of package manager. Packages are made to be reproducible, are easy to create, never conflict with each other and can be installed temporarily or permanently, and the state of the system can be rolled back at any time. Nix supports both stable releases and a rolling release. Its rolling-release branch also has more packages, and more up-to-date ones, than Pacman-based distributions.

Use cases of Linux

Linux distributions are the most used operating systems for servers. Most of the world's websites run on Linux.

Linux also powers many embedded systems, such as those found in televisions and cars, typically through distributions such as Android TV or Android Auto.

All mobile devices running Android use Linux, Android itself being a Linux distribution.

Sources

Android Auto's website

Android TV's website

Wikipedia - Linux Adoption

Wikipedia - Kernel (operating system)

Wikipedia - Linux

The Arch Linux documentation

Comparison of Linux release models

NixOS

Description

NixOS is a Linux distribution built on the Nix package manager. All of its components, called "modules", are built by the package manager. Nix aims to offer the most reliable, declarative and reproducible system possible.

The Nix package manager is a purely functional package manager. That means that packages are built by functions that take inputs and return an output (for example, a package). Every function can use other functions to do its work, but there can never be any side effect on the system.

Nix supports having several versions of the same software installed without any interference because it follows a different file organization than most distributions. Each software version is identified by a hash (a unique identifier) and directly references the other software and libraries it needs by their hashes. All of them live in a directory called the Nix store, which is where every package that has been installed or downloaded is kept.
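
The idea of identifying each package version by a hash of its inputs can be sketched as follows. This is only a toy illustration: the PackageInputs type and the store_path function are hypothetical, and DefaultHasher stands in for the cryptographic hash that Nix actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical description of a package: its name, its version and
/// the store paths of the packages it depends on.
#[derive(Hash)]
struct PackageInputs<'a> {
    name: &'a str,
    version: &'a str,
    dependencies: Vec<String>,
}

/// Derive a store path from the inputs. The same inputs always give
/// the same path; any change in the inputs gives a different one.
/// (Assumption: DefaultHasher is a stand-in, not Nix's real scheme.)
fn store_path(inputs: &PackageInputs<'_>) -> String {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    format!(
        "/nix/store/{:016x}-{}-{}",
        hasher.finish(),
        inputs.name,
        inputs.version
    )
}
```

Because two versions of the same software end up in two different directories, they can be installed side by side, and a dependent package simply lists the exact store path it was built against.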

Nix also supports atomic upgrades and rollbacks, which means that it's possible to roll back to the previous state of a system at any time in case something breaks.
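
A rough way to picture upgrades and rollbacks is to keep every past system state (a "generation") and only move a pointer to the active one. The sketch below is a simplified model with hypothetical names (System, upgrade, rollback), not how NixOS is implemented.

```rust
/// Toy model: every upgrade produces a new generation next to the old ones.
struct System {
    generations: Vec<Vec<String>>, // each generation lists its packages
    current: usize,                // index of the active generation
}

impl System {
    fn new(initial: Vec<String>) -> Self {
        System { generations: vec![initial], current: 0 }
    }

    /// Store the new generation, then switch to it in a single step.
    fn upgrade(&mut self, packages: Vec<String>) {
        self.generations.push(packages);
        self.current = self.generations.len() - 1;
    }

    /// Switch back to the previous generation, if there is one.
    fn rollback(&mut self) -> bool {
        if self.current == 0 {
            return false;
        }
        self.current -= 1;
        true
    }

    /// Packages of the generation that is currently active.
    fn active(&self) -> &[String] {
        &self.generations[self.current]
    }
}
```

Since old generations are never modified, switching back is only a matter of pointing at an earlier entry.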

Nix also has its own functional packaging language, which is used to define packages using other functions.

It also supports "trying" new tools by installing them temporarily in the current session, after which they are forgotten. It also supports creating cloud, Docker and operating system images, which lets users create their own operating system based on NixOS and redistribute it easily.

Nix also helps developers define a shared development environment, where any other developer using Nix can load it and have the same exact environment as other developers.

Use cases

Nix is often used by system administrators to define an entire server or computer configuration easily and without fear of breaking it.

Nix is also popular among Linux enthusiasts who tinker with their systems, as a way to prevent breakage and, in case of a mistake, to roll back the changes.

It is also widely used by developers who have to manage many different tools that often conflict with one another, and because it's an easy way to make sure everybody shares the same environment.

Installation and usage

To be used, Nix has to be either installed on an existing Linux or macOS system or installed as a full operating system, NixOS. There are limitations to using Nix on systems other than NixOS: graphical software may not work properly, NixOS modules are unavailable and the system as a whole cannot be rolled back. The rest of this section assumes NixOS is used.

The user then has to define their preferences in the configuration.nix file. This file contains options for configuring the modules of the operating system. Anything from the installed software and fonts to the timezone, the language, the users and more can be configured via this file. Some software can even be individually configured via this file.

Once this is done, the user has to rebuild their system by running a command (nixos-rebuild switch). The new system is then configured and installed.

Sources

The official NixOS website

The "How Nix Works" guide

NixOS Wikipedia article

NixOS installation guide

YouTube channel explaining NixOS concepts in short videos

Rust Programming Language

Definition

Rust is a programming language whose development was started in 2006 by a Mozilla employee. It emphasizes performance, type safety and concurrency, all without a virtual machine or a garbage collector.

Features

In order to prevent memory errors such as use-after-free or dangling references, Rust tracks the "lifetime" of every reference and imposes rules on the developer via the compiler, in such a way that a memory-unsafe program can't be compiled (outside of code explicitly marked as unsafe).

Rust is also known for its strict compiler checks: many things that are considered "good practices" in other languages, such as checking for possible null values or handling errors, are mandatory in Rust by design.

Null values do not exist in safe Rust. When a value can be "something" or "nothing", it's represented by the Option type. This type is an enum (note that in Rust, enum variants can contain values) that is either None or Some(value). In order to get to the value, the developer is forced to handle the possibility that it is absent.
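
As a small illustration of Option, the sketch below uses two hypothetical functions, nickname and greeting. The match expression has to cover both Some and None, otherwise the code does not compile.

```rust
/// Look up a user's nickname; not every user has one.
/// (Hypothetical lookup: only user 1 has a nickname here.)
fn nickname(user_id: u32) -> Option<&'static str> {
    match user_id {
        1 => Some("ferris"),
        _ => None,
    }
}

/// The inner value is only reachable once both cases are handled.
fn greeting(user_id: u32) -> String {
    match nickname(user_id) {
        Some(name) => format!("Hello, {}!", name),
        None => String::from("Hello, stranger!"),
    }
}
```

Calling greeting(1) uses the Some branch, while any other identifier falls back to the None branch.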

Thrown exceptions don't exist in Rust either. Any function that can fail returns a Result type, which is also an enum and can be either Err(some error) or Ok(some value). Again, the developer is forced to handle the possibility of an error.

Note that the developer can also easily convert an Option to a Result or a Result to an Option.
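
Result and the conversions between Result and Option might look like this in practice. The function names (parse_port, describe_port, port_or_none, first_word) are made up for the example; parse, ok and ok_or come from the standard library.

```rust
use std::num::ParseIntError;

/// Parsing can fail, so the function returns a Result instead of throwing.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// The caller has to deal with both Ok and Err before using the value.
fn describe_port(text: &str) -> String {
    match parse_port(text) {
        Ok(port) => format!("port {}", port),
        Err(error) => format!("invalid port: {}", error),
    }
}

/// Result -> Option: `.ok()` keeps the value and drops the error details.
fn port_or_none(text: &str) -> Option<u16> {
    parse_port(text).ok()
}

/// Option -> Result: `.ok_or(...)` supplies an error for the None case.
fn first_word(text: &str) -> Result<&str, String> {
    text.split_whitespace()
        .next()
        .ok_or(String::from("empty input"))
}
```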

Rust is also known for its performance. Rust is very fast and, unlike C, which is also very fast, it is memory-safe. Since it doesn't have a garbage collector, it also avoids garbage-collection pauses, which matters on large systems. This is why Discord decided to replace its previous programming language with Rust for some parts of its infrastructure.

Language specificities

Every value in Rust has a single owner: the variable, and therefore the function or scope, that currently holds the value. When a function A passes one of its values to a function B, ownership moves to B and function A cannot access the value anymore. To still access the value, A needs to either pass a reference (which only lends the value) or use a smart pointer to share ownership of the value (the value is then allocated on the heap, a bit like with malloc in C, but freed automatically).
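
The three options (moving, borrowing, sharing through a smart pointer) can be sketched like this. The function names are hypothetical; Rc is the reference-counted smart pointer from the standard library.

```rust
use std::rc::Rc;

/// Takes ownership: the caller can no longer use the String afterwards.
fn consume(text: String) -> usize {
    text.len()
}

/// Borrows: the caller keeps ownership and can keep using the value.
fn inspect(text: &str) -> usize {
    text.len()
}

fn ownership_demo() -> (usize, usize, usize) {
    let message = String::from("hello");

    let borrowed_len = inspect(&message); // `message` is only lent
    let moved_len = consume(message); // ownership moves into `consume`
    // inspect(&message); // would not compile: `message` was moved

    // Rc lets several owners share one heap-allocated value.
    let shared = Rc::new(String::from("shared"));
    let other_owner = Rc::clone(&shared);
    let owners = Rc::strong_count(&shared); // two owners at this point
    drop(other_owner);

    (borrowed_len, moved_len, owners)
}
```

Uncommenting the marked line makes the compiler reject the program, which is how the ownership rule is enforced.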

Every value also has a defined scope. The scope is usually delimited by curly braces, so a variable declared in an if statement, a loop or a function is freed when execution leaves the matching closing brace. The lifetime of a reference represents the scope for which that reference to a value is valid.
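
To make the moment of freeing visible, the following sketch uses a hypothetical Tracked type that writes its name into a log when it is dropped. The Drop trait and RefCell are standard; everything else is an assumption made for the example.

```rust
use std::cell::RefCell;

/// A value that records its name in a shared log when it is freed.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn scope_demo() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
        } // `_inner` is freed at this closing brace
        log.borrow_mut().push("between");
    } // `_outer` is freed at this closing brace
    log.into_inner()
}
```

The returned log lists "inner" first, then "between", then "outer", matching the order of the closing braces.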

Sometimes the compiler cannot tell what the lifetime of a reference stored in a structure or returned by a function should be. For a function taking references A and B as arguments, should the returned reference have the lifetime of A or the lifetime of B? In such cases, the compiler asks the developer to annotate the lifetimes explicitly to remove the ambiguity.
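
A classic case where the annotation is required is a function returning one of its two reference arguments. In this sketch, the `'a` annotation states that the result is valid only as long as both arguments are.

```rust
/// Without the `'a` annotation the compiler rejects this function,
/// because it cannot know whether the result borrows from `a` or `b`.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}
```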

Use cases

Rust can be used for any purpose, but it's particularly well suited to low-level work, as a safe alternative to C, and to anything demanding very high performance and reliability.

Rust is, for example, used in critical infrastructure such as the Linux kernel or Android core libraries. It is also used by many companies with large infrastructures, such as Google, Meta, Amazon, Microsoft, Discord and Dropbox.

Sources

The official Rust documentation "The Rust Book"

Wikipedia article about Rust

Article on the Google Security Blog explaining why they started using Rust to build core Android modules

YouTube playlist of videos presenting various aspects of Rust, including a lot that aren't covered in this introduction

Gemini protocol

Description

Gemini is an application-layer internet protocol for accessing remote documents (mostly gemtext files). It's meant to be a middle ground between HTTPS and Gopher. HTTPS is the protocol used on the web today. It's very secure, but it is complex and tends to become heavier as time passes.

The fact that websites, and software in general, become heavier and more complex with time is described by Wirth's law. This principle states that software becomes more complex and more resource-intensive faster than hardware improves.

Gopher is an older internet protocol designed for slow connections. Unlike HTTP, it doesn't allow embedding images or videos, adding scripts or using complex markup. However, Gopher is insecure because it was created before standards such as TLS made encrypted connections the norm.

Gemini is a middle ground: it has a very simple markup, like Gopher, but has the security of the modern web, like HTTPS.

On Gemini, websites are called "capsules".

Protocol specification

Gemini is a very lightweight protocol that mostly consists of serving gemtext files, just like HTTP primarily serves HTML files. The gemtext markup resembles a simplified Markdown and supports quote blocks, simple lists, headings, preformatted text, paragraphs and links. However, as in Gopher, a link must be on a line of its own, unlike in Markdown or HTML where links can appear within a paragraph.
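
Because every link sits on its own line, starting with "=>", reading gemtext links takes very little code. The parse_link_line function below is a hypothetical helper written for this example, and gemini://example.org/ is a placeholder address.

```rust
/// Parse one gemtext line. Returns Some((url, optional label)) when the
/// line is a link line ("=> URL optional label"), None otherwise.
fn parse_link_line(line: &str) -> Option<(&str, Option<&str>)> {
    let rest = line.strip_prefix("=>")?.trim();
    if rest.is_empty() {
        return None; // "=>" with no URL is not a usable link
    }
    match rest.split_once(char::is_whitespace) {
        Some((url, label)) => Some((url, Some(label.trim_start()))),
        None => Some((rest, None)),
    }
}
```

A client can run this on every line of a document: lines that return None are handled as headings, list items, quotes or plain paragraphs.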

A Gemini request is as simple as sending the URL of the resource to the server over TLS, on port 1965. The response contains a status code, a meta field (used for error messages, redirection targets or the MIME type) and, on success, the content body.
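
The request and the first line of the response are simple enough to sketch without any library. The two helper functions below are hypothetical and leave out the TLS connection itself (the Rust standard library has no TLS support), so they only show the text that travels over the connection.

```rust
/// A Gemini request is just the absolute URL followed by CRLF.
fn build_request(url: &str) -> String {
    format!("{}\r\n", url)
}

/// Split a response header such as "20 text/gemini\r\n" into its
/// two-digit status code and its meta field.
fn parse_response_header(header: &str) -> Option<(u8, &str)> {
    let line = header.strip_suffix("\r\n")?;
    let (status, meta) = match line.split_once(' ') {
        Some((status, meta)) => (status, meta),
        None => (line, ""),
    };
    if status.len() != 2 || !status.bytes().all(|b| b.is_ascii_digit()) {
        return None; // not a valid two-digit status code
    }
    Some((status.parse().ok()?, meta))
}
```

For a status in the 20s the meta field holds the MIME type of the body, while for a status in the 30s it holds the URL to redirect to.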

Goals

The goal of the Gemini protocol is to provide a simpler, human-scale, distraction-free and privacy-protecting internet, where creating and viewing content is easier because it looks the same everywhere, where there are no pop-ups or advertisements, and where the content is the only thing that matters.

Gemini puts an emphasis on textual content. Anything else is secondary, and large files are discouraged by the Gemini developers.

Finding content

Finding content on Gemini is done via directories (collections of links), aggregators (which combine the posts of many users) or search engines. Many capsules include a personal list of favorite Gemini capsules, so it's possible to explore more capsules after finding a first one.

Following content

Content on Gemini can be followed using feeds such as Atom or RSS. Atom, published in 2005, is an improvement on RSS: it is more strictly standardized and supports more features, such as multilingual feeds and better date formats.

The feeds can then be followed or automatically downloaded by the viewers or fed into aggregators for creating a sort of "public timeline" of posts. Those aggregators work by simply requesting the feed file from time to time and checking if anything was modified.

Creating content

For creating content on Gemini, the only things required are a host and a Gemini server. Since the protocol is very simple, there is a large number of hosts and servers available. Once a host is found, the user needs to create a file named "index.gmi" in the content directory. This file is written in gemtext markup (a very simple markup language) and is the entry point of their capsule. From there, people can organize their capsule however they want.

Writers might also have to create a feed if they want people to be able to follow them, either through their own client or through a public aggregator.

Sources

Quickstart guide about Gemini

Gemini protocol's official website

Gemini protocol's Wikipedia page

Wirth's law Wikipedia page

BitTorrent & Kademlia hash table

Description

BitTorrent is a peer-to-peer communication protocol to distribute data in a decentralized manner. It's mostly intended for sharing large files such as operating systems, videos, software, archives of books, etc.

In BitTorrent, files are downloaded from several "seeders". Seeders include the original uploader of the file, as well as all the people who downloaded the file afterward. Once the data is downloaded, the downloader (also called a leecher) becomes a seeder in turn.

This way of working means that the more a file is downloaded, the faster it can be downloaded in the future. With HTTP servers, by contrast, the more a file is downloaded, the slower it becomes because the server gets overloaded.

How it works

In BitTorrent, a resource is identified by a hash called the "infohash", computed from the torrent's metadata (which describes the files and their pieces). The infohash is the unique identifier of the resource to be downloaded or shared.

In order to get the list of seeders, that is, the list of computers that have the file we're looking for, the leecher has to ask one or several trackers or use the Mainline DHT.

Trackers are special entities in the BitTorrent network. They are servers that keep track of which computer has which file. That way, they can give the addresses of the computers that have a requested file. Trackers have the advantage of being very fast, but the disadvantage of being points of failure for the network.
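
At its core, a tracker is a lookup table from infohash to peer addresses. The following in-memory sketch uses a hypothetical Tracker type and ignores the real announce protocol, so it only illustrates the bookkeeping.

```rust
use std::collections::{HashMap, HashSet};

/// Toy in-memory tracker: maps an infohash to the peers announcing it.
#[derive(Default)]
struct Tracker {
    swarms: HashMap<String, HashSet<String>>,
}

impl Tracker {
    /// A peer announces that it can share the given resource.
    fn announce(&mut self, infohash: &str, peer_addr: &str) {
        self.swarms
            .entry(infohash.to_string())
            .or_default()
            .insert(peer_addr.to_string());
    }

    /// Return the known peers for a resource, sorted for stable output.
    fn peers(&self, infohash: &str) -> Vec<String> {
        let mut peers: Vec<String> = self
            .swarms
            .get(infohash)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        peers.sort();
        peers
    }
}
```

The whole table lives on one server, which explains both the speed of trackers and why they are a point of failure.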

The Mainline DHT (distributed hash table) spreads the information about which computer has which file across all participating computers. Each computer holds a part of the information and can ask the other computers for the rest. While this system is slower, it's a lot more resilient than trackers.

How the DHT works

The Mainline DHT is based on a DHT design called Kademlia. In this implementation, each computer (or "node") has a large randomly generated number as its unique identifier. It first contacts a "bootstrap node", an initial node of the network that the client knows beforehand, and asks it for more nodes whose numbers are close to its own identifier.

The BitTorrent client will then store the information about those other nodes in such a way that it remembers more about the nodes with identifiers close to its own than about the ones far away. It then repeats the process to fill its database of peers.

Then, when someone seeds a file, the information about who has that file is sent into the DHT. The infohash of the file is treated as a number, and the nodes whose identifiers are closest to that number get to store that information.
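
In Kademlia, "closest" is measured with the XOR operation: the distance between two identifiers is their bitwise XOR, read as a number. The sketch below uses 64-bit identifiers to stay short (Mainline DHT identifiers are 160 bits long), and the function names are only illustrative.

```rust
/// Kademlia distance: the bitwise XOR of the two identifiers.
/// The smaller the result, the closer the identifiers.
fn distance(a: u64, b: u64) -> u64 {
    a ^ b
}

/// Pick the node whose identifier is closest to the key
/// (for example, an infohash treated as a number).
fn closest_node(key: u64, node_ids: &[u64]) -> Option<u64> {
    node_ids.iter().copied().min_by_key(|id| distance(*id, key))
}
```

For the key 0b0101 and the nodes 0b0001, 0b0100 and 0b1100, the distances are 4, 1 and 9, so the node 0b0100 is the one that stores the information.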

When a computer then wants to know who has that file, it asks the nodes it knows that are closest to the infohash. They reply with nodes that are even closer, and the process repeats until a node holding the information is reached. The BitTorrent client then asks the seeders for the file and the transfer starts.

This DHT system is built in such a way that each node knows more about its neighbors (in terms of identifier numbers) than about nodes far away, so that it has an approximate map of the whole network without having to hold too much data.
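
This uneven knowledge comes from how Kademlia sorts known nodes into "buckets" according to their XOR distance. As a sketch with 64-bit identifiers (an assumption made for brevity) and a hypothetical bucket_index function:

```rust
/// Index of the bucket that `other` falls into, seen from `own`:
/// bucket i holds the nodes whose XOR distance d satisfies
/// 2^i <= d < 2^(i+1).
fn bucket_index(own: u64, other: u64) -> Option<u32> {
    let d = own ^ other;
    if d == 0 {
        None // a node does not store itself
    } else {
        Some(63 - d.leading_zeros())
    }
}
```

Each bucket keeps at most the same small number of contacts, but bucket 63 covers half of all possible identifiers while bucket 0 covers a single one. Nearby regions are therefore known in much finer detail than distant ones.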

Sources

BitTorrent's Wikipedia page

Kademlia's Wikipedia page

Relatively short video explanation of the Kademlia hash table

Longer video explanation of the Kademlia hash table

Git and Plaintext

Description

Git is a distributed version control system that tracks changes in computer files (mostly plain-text ones) and is widely used in software development for managing the source code of applications.

A version control system keeps track of the whole history of a project and of the work done in parallel on it. Git is "distributed" because everybody working on the project has a full copy of that project, so the project isn't centralized on one single server. If the server goes down, everybody who has a copy of the project is still able to work on it.

When working with Git, each member first has to download (clone) the project to their local machine. They can then create a new branch for their work and start working on it. For each change, they add the files they've modified and give a description of the changes (a commit). They can then push the changes to a "remote". Remotes are servers that host Git projects so that everyone can stay up to date.

Once the feature is finished, its branch has to be merged into the main branch. This can be done either manually or through pull requests, where other members of the project are able to review the code. Once everything has been verified, the code is merged into the main branch and everyone can update their copy by merging the newly updated main branch into their own branch.

Use cases

Git works best for anything based on plaintext documents. It's mostly used for writing software and documentation for technical projects, but some people also use it for building wikis, books or other artistic works.

Git forges

The websites hosting Git projects, called forges, often come with many additional features that help users with project management. One of them is a sort of forum called "issues", where collaborators can discuss different aspects of the software and organize themselves. Git forges often also include a "merge request" or "pull request" feature, where code can be reviewed before being officially merged into the project.

Most Git forges also include a wiki, mostly used to write documentation alongside the project without hassle.

Continuous integration

Some more advanced Git forges also include options for continuous integration. Continuous integration is an automatic process triggered by Git events such as pushing (sending changes to the server), merging (combining branches) or tagging (marking a specific version of the code, for example a release). These automatic actions are usually tests of the code's quality and reliability, but they can also build and upload the software or deploy it automatically, for example in the case of websites.
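
The link between Git events and automatic jobs can be pictured as a simple mapping. The policy below (which event triggers which job) is invented for the example, since every project configures its own rules on its forge.

```rust
/// Git events that can trigger a pipeline.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GitEvent {
    Push,
    Merge,
    Tag,
}

/// Jobs a pipeline might run.
#[derive(Debug, PartialEq)]
enum Job {
    RunTests,
    Build,
    Deploy,
}

/// Hypothetical policy: test on every push, also build on merges,
/// and deploy only when a version is tagged.
fn jobs_for(event: GitEvent) -> Vec<Job> {
    match event {
        GitEvent::Push => vec![Job::RunTests],
        GitEvent::Merge => vec![Job::RunTests, Job::Build],
        GitEvent::Tag => vec![Job::RunTests, Job::Build, Job::Deploy],
    }
}
```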

Universality

Plain text files are usually encoded in UTF-8, a universally understood encoding that any text editor on any system can read, unlike Microsoft Word or LibreOffice Writer documents, which use the docx and odt formats and need dedicated software to be opened.
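
UTF-8 simply maps each character to one or more bytes. Rust strings are always valid UTF-8, so the short sketch below (with two illustrative helper functions) shows exactly the bytes a plain text file would contain.

```rust
/// The bytes of a Rust string are its UTF-8 encoding.
fn utf8_bytes(text: &str) -> Vec<u8> {
    text.as_bytes().to_vec()
}

/// Decoding returns None if the bytes are not valid UTF-8.
fn decode(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

The letter "A" is stored as the single byte 0x41, as in ASCII, while "é" takes the two bytes 0xC3 and 0xA9.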

Most source code is written as plain text, but many other kinds of content can be too. A lot of documentation is written in plaintext using formats such as Markdown, a lightweight and very easy markup language, or, for more complex rendering, Typst or LaTeX. All these formats are written in plaintext first and can then be converted into other formats such as web pages or PDF documents.

Those formats also have the advantage of being very reliable, since they can be understood with close to no software at all.

Sources

Git's Wikipedia article

Video talking about the use of Git and plaintext

Wikipedia page about distributed version control

Wikipedia page about continuous integration

Wikipedia page about code forges

Wikipedia page about Markdown markup language

Wikipedia page about UTF-8 encoding