As I'm moving to Linux more and more, I took the opportunity to explore the BTRFS file system, which was mostly unknown to me.
Let me share some data about compression ratio with BTRFS (ZFS should give similar results).
This is my work computer, with a big Nix store, build tools that generate a lot of cache files, and many git repositories.
Processed 3570629 files, 894690 regular extents (1836135 refs), 2366783 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       61%       55G          90G         155G
none       100%       35G          35G          52G
zlib        37%       20G          54G         102G
prealloc   100%      138M         138M          67M
The output shows that the data takes 61% of its uncompressed size on disk, so compression saves 39%. The per-type lines give more detail: `none` is data stored uncompressed, and `zlib` is data compressed with that algorithm.
Files compressed with zlib are down to 37% of their original size, which is not bad. I made a mistake when creating the BTRFS mount point: I used the zlib compression algorithm, which is quite dated nowadays. For the record, zlib is the library that implements the "deflate" compression algorithm found in zip and gzip.
Let's switch to the zstd algorithm instead. Existing files can be recompressed with the command `btrfs filesystem defrag -czstd -r /`. Basically, all files are scanned, and those that can be compressed with zstd are rewritten on disk with the new algorithm.
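As a quick sanity check, the Perc column appears to be simply the disk usage divided by the uncompressed size. Here is a minimal Python sketch using the rounded gigabyte values from the output above; the helper name `percent_on_disk` is my own and not part of any tool.

```python
def percent_on_disk(disk_usage_gb: float, uncompressed_gb: float) -> int:
    """Return disk usage as a rounded percentage of the uncompressed size."""
    return round(100 * disk_usage_gb / uncompressed_gb)


# Values copied from the output above (already rounded to whole gigabytes,
# so the result can be off by a point compared to the real byte counts).
rows = {
    "TOTAL": (55, 90),
    "none": (35, 35),
    "zlib": (20, 54),
}

for name, (disk, uncompressed) in rows.items():
    print(f"{name}: {percent_on_disk(disk, uncompressed)}%")
```

Because the inputs are rounded, this only approximates what the tool computes from exact byte counts.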
After 37 minutes of recompressing everything, the results are surprising. It didn't change much!
Processed 3570427 files, 928646 regular extents (1869080 refs), 2364661 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       60%       54G          90G         155G
none       100%       33G          33G          51G
zstd        37%       21G          56G         104G
prealloc   100%      138M         138M          67M
Real data usage on the disk is now 60%, instead of 61% with zlib. That's not much of an improvement; I'd have expected zstd to perform a lot better.
However, I didn't measure compression and decompression times. zstd should perform a lot better in this area, so I'll stick with zstd.
LinuxReviews: comparison of compression algorithms
My own laptop has a huge Nix store, a lot of binary files (music, pictures), and a few hundred gigabytes of video games. I suppose it's quite a realistic and balanced environment.
Processed 1804099 files, 755845 regular extents (1295281 refs), 980697 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       93%      429G         459G         392G
none       100%      414G         414G         332G
zstd        34%       15G          45G          59G
prealloc   100%       92M          92M          91M
Compression saves 30 GB, but that is only 7% of the whole file system. That's not impressive compared to the other computer, but having an extra 30 GB for free is clearly something I enjoy.
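To make the comparison between the two machines explicit, here is a small Python sketch of the arithmetic, using the rounded TOTAL values from the outputs above. The function name `savings` is hypothetical, and the percentages differ slightly from the Perc column because the inputs are rounded to whole gigabytes.

```python
def savings(disk_usage_gb: float, uncompressed_gb: float) -> tuple[float, float]:
    """Return (gigabytes saved, percentage of the uncompressed size saved)."""
    saved = uncompressed_gb - disk_usage_gb
    return saved, 100 * saved / uncompressed_gb


# TOTAL lines from the outputs above: (disk usage, uncompressed), in GB.
machines = {
    "work computer (zlib)": (55, 90),
    "laptop (zstd)": (429, 459),
}

for name, (disk, uncompressed) in machines.items():
    saved_gb, saved_pct = savings(disk, uncompressed)
    print(f"{name}: {saved_gb} GB saved, {saved_pct:.1f}% of uncompressed data")
```

The absolute savings are close on both machines, but the laptop holds far more data that is stored uncompressed, which is why its percentage is so much lower.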