My BTRFS cheatsheet

Introduction

I recently switched my home "NAS" (single disk!) to BTRFS. It's a different ecosystem with many features and commands, so I had to write a bit about it to remember the various possibilities...

BTRFS is an advanced file-system supported by Linux; it's somewhat comparable to ZFS.

Layout

A BTRFS file-system can span multiple disks, aggregated either as a mirror or "concatenated". It can be split into subvolumes, each of which may have its own settings.

Snapshots and quotas apply to subvolumes, so it's important to plan ahead when creating them. In most cases, one may want a subvolume for /home and another for /var.
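
As a rough sketch of the layout described above, the shell functions below create a mirrored file-system and one subvolume each for /home and /var. The function names, the device arguments and the DRY_RUN variable (set it to "echo" to print the commands instead of running them) are assumptions made for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.

# Create a mirrored (raid1) file-system on two disks ($1 and $2).
# Use "-d single" instead of "-d raid1" to concatenate the disks.
make_mirror() {
    ${DRY_RUN:-} mkfs.btrfs -d raid1 -m raid1 "$1" "$2"
}

# Create one subvolume per tree that needs its own snapshots or quota.
# $1 is the mount point of the file-system, e.g. /mnt.
make_subvolumes() {
    ${DRY_RUN:-} btrfs subvolume create "$1/home"
    ${DRY_RUN:-} btrfs subvolume create "$1/var"
    ${DRY_RUN:-} btrfs subvolume list "$1"
}
```

For example, "DRY_RUN=echo make_subvolumes /mnt" only prints the three commands, which is a safe way to review them first.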

Snapshots / Clones

It's possible to take an instant snapshot of a subvolume, which can be used as a backup. Snapshots can be browsed like any other directory. They exist in two flavors: read-only and writable. ZFS users will recognize writable snapshots as "clones" and read-only ones as regular ZFS snapshots.

Snapshots are an effective way to make a backup and to roll back changes in a second.
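
The two snapshot flavors map to a single command with or without the -r flag. Here is a minimal sketch; the function names and the DRY_RUN variable (set it to "echo" to preview the commands) are hypothetical helpers, not something BTRFS provides.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.

# Read-only snapshot of subvolume $1, stored at path $2.
snapshot_ro() {
    ${DRY_RUN:-} btrfs subvolume snapshot -r "$1" "$2"
}

# Writable snapshot (a "clone" in ZFS terms).
snapshot_rw() {
    ${DRY_RUN:-} btrfs subvolume snapshot "$1" "$2"
}

# Delete a snapshot that is no longer needed.
snapshot_delete() {
    ${DRY_RUN:-} btrfs subvolume delete "$1"
}
```

A dated read-only snapshot could then be taken with something like: snapshot_ro /home "/home/.snapshots/$(date +%F)" (the .snapshots directory is an arbitrary choice and must exist beforehand).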

Send / Receive

Raw file-system data can be sent and received over the network (or anything supporting a pipe), which allows backing up only the incremental differences. This is a very effective way to do incremental backups without having to scan the entire file-system each time you run your backup.
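
In practice, send and receive work on read-only snapshots. The sketch below wraps the three commands involved; the function names, the snapshot paths and the DRY_RUN variable (set it to "echo" to preview the commands) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.

# Full copy of the read-only snapshot $1, written as a stream on stdout.
send_full() {
    ${DRY_RUN:-} btrfs send "$1"
}

# Incremental stream: only what changed between parent $1 and snapshot $2.
# Both must be read-only, and the parent must already exist on the receiver.
send_incremental() {
    ${DRY_RUN:-} btrfs send -p "$1" "$2"
}

# Rebuild a snapshot under directory $1 from a stream read on stdin.
receive_into() {
    ${DRY_RUN:-} btrfs receive "$1"
}
```

Since the stream goes through a pipe, a remote backup could look like: send_incremental /snap/monday /snap/tuesday | ssh backuphost btrfs receive /backup (where "backuphost" and the paths are made-up names).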

Deduplication

I covered deduplication with bees, but one can also use the program "duperemove" (works on XFS too!). They work a bit differently, but in the end they serve the same purpose. Bees operates on the whole BTRFS file-system while duperemove operates on files, so they suit different use cases.

duperemove GitHub project page

Bees GitHub project page
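
For the file-based approach, a duperemove run could be sketched as below. The function name, the hash file location and the DRY_RUN variable (set it to "echo" to preview the command) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print the command instead of running it.

# Deduplicate the files under directory $1, recursively.
# -d submits the duplicates for deduplication (without it, they are only reported),
# -r recurses into sub-directories,
# --hashfile=$2 keeps the computed hashes on disk so later runs are faster.
dedupe_dir() {
    ${DRY_RUN:-} duperemove -dr --hashfile="$2" "$1"
}
```

Example: dedupe_dir /data /var/tmp/duperemove.hash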

Compression

BTRFS supports on-the-fly compression per subvolume, meaning the content of each file is stored compressed and decompressed on demand. Depending on the files, this can improve performance, because less data is read from and written to the disk and the workload is less likely to be I/O bound, and it also improves storage efficiency. This is really content dependent: already-compressed files such as pictures, videos or music won't shrink further, but if you have a lot of text and source files, you can achieve great ratios.

From my experience, compression is always helpful for a regular user workload, and newer algorithms are smart enough to skip binary data that wouldn't yield any benefit.

There is a program named compsize that reports compression statistics for a file/directory. It's very handy to know whether the compression is beneficial and to what extent.

compsize GitHub project page
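
As a sketch of how compression can be turned on and then measured: the functions below use the compress mount option, the compression property and compsize. The choice of zstd, the function names and the DRY_RUN variable (set it to "echo" to preview the commands) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.

# Mount device $1 on $2 with zstd compression applied to newly written data.
mount_compressed() {
    ${DRY_RUN:-} mount -o compress=zstd "$1" "$2"
}

# Alternatively, request compression for a single directory or file
# through its "compression" property.
set_compression() {
    ${DRY_RUN:-} btrfs property set "$1" compression zstd
}

# Report how well the data under $1 actually compressed.
compression_stats() {
    ${DRY_RUN:-} compsize "$1"
}
```

Only data written after the option or property is set gets compressed; existing files stay as they are until rewritten.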

Defragmentation

Fragmentation is a real thing and not specific to Windows. It matters a lot for mechanical hard drives, but not really for SSDs.

Fragmentation happens when you create files on your file-system, and delete them: this happens very often due to cache directories, updates and regular operations on a live file-system.

When you delete a file, this creates a "hole" of free space. After some time, you may want to gather all these small parts of free space into big chunks. This matters for mechanical disks, as the physical location of data is tied to raw performance. The defragmentation process simply reorganizes data physically, ordering file chunks and free space into contiguous blocks.

Defragmentation can also be used to force compression in a subvolume, for example if you changed the compression algorithm or enabled compression after the files were saved.

The command line is: btrfs filesystem defragment
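
Here is a minimal sketch of the "defragment to force compression" use case. The function name, the choice of zstd and the DRY_RUN variable (set it to "echo" to preview the command) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print the command instead of running it.

# Recursively defragment $1 and recompress its files with zstd on the way.
# -r recurses into directories, -v lists the processed files,
# -czstd selects the compression algorithm (zlib and lzo are also accepted).
defrag_recompress() {
    ${DRY_RUN:-} btrfs filesystem defragment -r -v -czstd "$1"
}
```

Keep in mind that defragmenting rewrites the data, so extents shared with snapshots or deduplicated files get unshared and may take more space afterwards.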

Scrubbing

The scrubbing feature is one of the most valuable features provided by BTRFS and ZFS. The data stored in these file-systems is associated with a checksum kept in a metadata index, which means you can actually check the integrity of each file by comparing its current content with the checksum known in the index.

Scrubbing costs a lot of I/O and CPU because the checksum of all the data has to be computed, but it's a guarantee for validating the stored data. If a file turns out to be corrupted and the file-system is composed of multiple disks (raid1 / raid5), it can be repaired from the redundant data on the other disks. This should work most of the time because such corruption is often related to the drive itself, so the other drives shouldn't be affected.

Scrubbing can be started / paused / resumed, which is handy if you need to run a heavy I/O job and don't want the scrubbing process to slow it down. While the scrub commands accept a device or a path, the path parameter is only used to find the related file-system; it won't scrub just the files in that directory.

The command line is: btrfs scrub
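
The start / pause / resume cycle can be sketched with the scrub subcommands below. Note that "pausing" is done with the cancel subcommand, which saves the progress so that resume continues from there. The function names and the DRY_RUN variable (set it to "echo" to preview the commands) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.
# $1 is a mount point or a device of the file-system to scrub.

# Start a scrub in the background.
scrub_start() {
    ${DRY_RUN:-} btrfs scrub start "$1"
}

# Show progress and the number of errors found so far.
scrub_status() {
    ${DRY_RUN:-} btrfs scrub status "$1"
}

# Interrupt a running scrub; the position reached is remembered.
scrub_pause() {
    ${DRY_RUN:-} btrfs scrub cancel "$1"
}

# Continue an interrupted scrub from the last saved position.
scrub_resume() {
    ${DRY_RUN:-} btrfs scrub resume "$1"
}
```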

Rebalancing

When you aggregate multiple disks into one BTRFS file-system, some files are written to one disk and some to the other. After a while, one disk may contain more data than the other.

The purpose of rebalancing is to redistribute data across the disks more evenly.
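
A rebalance could be driven as sketched below. The function names, the usage threshold passed as an argument and the DRY_RUN variable (set it to "echo" to preview the commands) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.
# $1 is the mount point of the file-system.

# Show how much data each device currently holds.
usage_per_device() {
    ${DRY_RUN:-} btrfs filesystem usage "$1"
}

# Rewrite only the data chunks that are at most $2 percent full,
# which is much cheaper than rewriting everything.
balance_partial() {
    ${DRY_RUN:-} btrfs balance start -dusage="$2" "$1"
}

# Rewrite every chunk; this can take a long time on large file-systems.
balance_full() {
    ${DRY_RUN:-} btrfs balance start --full-balance "$1"
}

# Check the progress of a running balance.
balance_status() {
    ${DRY_RUN:-} btrfs balance status "$1"
}
```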

Swap file

You can't create a swap file on a BTRFS disk without a tweak. You must create the file in a directory carrying the special "no COW" attribute, set with "chattr +C /tmp/some_directory". The file inherits the "no COW" flag when it's created, and keeps it when you move it afterwards.

If you try to use a swap file with COW enabled on it, swapon will report a weird error, but you get more details in the dmesg output.
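
The whole procedure could be sketched as follows. The function name, the "swapfile" file name, the size argument and the DRY_RUN variable (set it to "echo" to preview the commands) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.

# Create and enable a swap file of $2 MiB inside directory $1.
make_swapfile() {
    # The directory gets the "no COW" attribute first, so the file
    # created in it afterwards inherits the flag.
    ${DRY_RUN:-} mkdir -p "$1"
    ${DRY_RUN:-} chattr +C "$1"
    # Allocate the file with real blocks (no holes).
    ${DRY_RUN:-} dd if=/dev/zero of="$1/swapfile" bs=1M count="$2"
    ${DRY_RUN:-} chmod 600 "$1/swapfile"
    ${DRY_RUN:-} mkswap "$1/swapfile"
    ${DRY_RUN:-} swapon "$1/swapfile"
}
```

Example: make_swapfile /swap 2048 would create a 2 GiB swap file at /swap/swapfile. Running "lsattr /swap/swapfile" should then show the C flag.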

Converting

It's possible to convert an ext2/3/4 file-system into BTRFS; obviously it must not be in use during the conversion. The conversion can be rolled back, but only up to a certain point, such as defragmenting or rebalancing the new file-system.
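
Conversion and rollback both go through the btrfs-convert tool, as sketched below. The function names, the device argument and the DRY_RUN variable (set it to "echo" to preview the commands) are assumptions for illustration.

```shell
# Sketch only: set DRY_RUN=echo to print commands instead of running them.
# $1 is the unmounted device holding the ext2/3/4 file-system.

# Check the ext file-system first, then convert it in place.
convert_to_btrfs() {
    ${DRY_RUN:-} e2fsck -f "$1" &&
    ${DRY_RUN:-} btrfs-convert "$1"
}

# Roll back to the original ext file-system. This only works while the
# image saved by the conversion (the ext2_saved subvolume) is intact.
convert_rollback() {
    ${DRY_RUN:-} btrfs-convert -r "$1"
}
```

Once you are happy with the result, deleting the ext2_saved subvolume frees the space it holds, at the price of making the rollback impossible.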