Sharing some statistics about BTRFS compression

Introduction

As I'm moving to Linux more and more, I took the opportunity to explore the BTRFS file system which was mostly unknown to me.

Let me share some data about the compression ratios I get with BTRFS (ZFS should give similar results).

Work laptop

First data

This is my work computer with a big Nix store, and some build programs involving a lot of cache files and many git repositories.

Processed 3570629 files, 894690 regular extents (1836135 refs), 2366783 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       61%       55G          90G         155G
none       100%       35G          35G          52G
zlib        37%       20G          54G         102G
prealloc   100%      138M         138M          67M

The TOTAL line shows that the data uses 61% of its uncompressed size on disk, so compression saves 39%. The other lines give details per compression algorithm: `none` is data stored uncompressed and `zlib` is data compressed with that algorithm.
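To make the percentage column less magical, here is a minimal Python sketch that recomputes it from the two size columns. The function name `ratio_percent` is mine, and the inputs are the rounded gigabyte figures from the report above, so the result can be off by a point compared to what the tool computes from exact byte counts.

```python
def ratio_percent(disk_usage: float, uncompressed: float) -> int:
    """Return disk usage as a rounded percentage of the uncompressed size."""
    if uncompressed <= 0:
        raise ValueError("uncompressed size must be positive")
    return round(100 * disk_usage / uncompressed)

# (disk usage, uncompressed) in gigabytes, copied from the report above.
rows = {"TOTAL": (55, 90), "none": (35, 35), "zlib": (20, 54)}

for name, (disk, uncompressed) in rows.items():
    print(f"{name:<6} {ratio_percent(disk, uncompressed)}%")
```

With these figures, the sketch gives 61% for TOTAL, 100% for `none` and 37% for `zlib`, matching the report.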

Files compressed with zlib are down to 37% of their original size, which is not bad. I made a mistake when creating the BTRFS mount point: I used the zlib compression algorithm, which is quite obsolete nowadays. For the record, zlib is the library that provides the "deflate" compression algorithm found in zip and gzip.

Let's switch to the zstd algorithm instead. This can be done with the command `btrfs filesystem defrag -czstd -r /`. Basically, all files are scanned, and those that can be compressed with zstd are rewritten on disk with the new algorithm.

Data after switching to zstd

After 37 minutes of recompressing everything, the results are surprising. It didn't change much!

Processed 3570427 files, 928646 regular extents (1869080 refs), 2364661 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       60%       54G          90G         155G
none       100%       33G          33G          51G
zstd        37%       21G          56G         104G
prealloc   100%      138M         138M          67M

Real disk usage is now 60% instead of 61% with zlib, which is not much of an improvement. I expected zstd to perform a lot better.
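If you want to compare two such reports without reading them by eye, a small parser is enough. This is only a sketch: the helper names `to_bytes` and `disk_usage_by_type` are hypothetical, and I assume the size suffixes are binary multiples (K = 1024 bytes, and so on), which the report itself does not state.

```python
# Assumption: size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(text: str) -> float:
    """Convert a size such as '55G' or '138M' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]

def disk_usage_by_type(report: str) -> dict[str, float]:
    """Map each row name (TOTAL, none, zlib, ...) to its disk usage in bytes."""
    usage = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows have exactly five fields, with a percentage in second place.
        if len(fields) == 5 and fields[1].endswith("%"):
            usage[fields[0]] = to_bytes(fields[2])
    return usage

# The two reports from the work laptop, before and after recompression.
ZLIB_REPORT = """\
TOTAL       61%       55G          90G         155G
none       100%       35G          35G          52G
zlib        37%       20G          54G         102G
prealloc   100%      138M         138M          67M
"""
ZSTD_REPORT = """\
TOTAL       60%       54G          90G         155G
none       100%       33G          33G          51G
zstd        37%       21G          56G         104G
prealloc   100%      138M         138M          67M
"""

before = disk_usage_by_type(ZLIB_REPORT)["TOTAL"]
after = disk_usage_by_type(ZSTD_REPORT)["TOTAL"]
gained_g = (before - after) / UNITS["G"]
gained_pct = 100 * (before - after) / before
print(f"zstd freed {gained_g:.0f}G ({gained_pct:.1f}% of the previous usage)")
```

On these two reports, the difference is a single gigabyte, which is under 2% of the space used before.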

However, I didn't measure compression and decompression times. zstd should perform a lot better in this area, so I'll stick with zstd.

LinuxReviews: comparison of compression algorithms

Personal computer

My own laptop has a huge Nix store, a lot of binary files (music, pictures), and a few hundred gigabytes of video games. I suppose it's quite a realistic and balanced environment.

Processed 1804099 files, 755845 regular extents (1295281 refs), 980697 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       93%      429G         459G         392G
none       100%      414G         414G         332G
zstd        34%       15G          45G          59G
prealloc   100%       92M          92M          91M

The saving due to compression is 30 GB, but this only accounts for 7% of the whole file system. That's not impressive compared to the other computer, but having an extra 30 GB for free is clearly something I enjoy.
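The 30 GB and 7% figures come straight from the TOTAL line. As a quick check, here is a small Python sketch (the `savings` helper is hypothetical) applied to the rounded values of both computers.

```python
def savings(disk_usage: float, uncompressed: float) -> tuple[float, float]:
    """Return (space saved, saved share in percent of the uncompressed size)."""
    saved = uncompressed - disk_usage
    return saved, 100 * saved / uncompressed

# (disk usage, uncompressed) in gigabytes, from the TOTAL lines above.
machines = {"work laptop (zstd)": (54, 90), "personal laptop": (429, 459)}

for name, (disk, uncompressed) in machines.items():
    saved, share = savings(disk, uncompressed)
    print(f"{name}: {saved} GB saved, {share:.1f}% of the uncompressed data")
```

The personal laptop comes out at 30 GB and about 6.5%, which agrees with the 93% shown by the report once rounding is taken into account, while the work laptop saves 36 GB, or 40%.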