# Introduction
In this blog post, you will learn how to make secure backups using Restic and an S3-compatible object storage.
Backups are incredibly important: you may lose important files that only existed on your computer, or lose access to some encrypted accounts or drives. When you need backups, you need them to be reliable and secure.
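There are two methods to handle backups:
* pull backups: a central server connects to each system and fetches the data
* push backups: each system runs the backup tool and sends its data to the backup storage
Both workflows have pros and cons. Pull backups are not encrypted, and a single central server owns everything, which is rather bad from a security point of view. Push backups keep all the encryption and access on the system where the tool runs, but an attacker on that system could destroy the backup using the backup tool.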
I will explain how to leverage S3 features to protect your backups from an attacker.
S3 is the name of an AWS service used for Object Storage. Basically, it is a huge key-value store in which you can put data and retrieve it, and there is very little metadata associated with an object. Objects are all stored in a "bucket". They have a path, and you can organize the bucket with directories and subdirectories.
Buckets can be encrypted, which is an important feature if you do not want your S3 provider to be able to access your data. However, most backup tools already encrypt their repository, so adding encryption to the bucket is not really useful. I will not explain how to use bucket encryption in this guide, although you can enable it if you want. It requires storing more secrets outside of the backup system if you want to restore, and it does not provide real benefits because the repository is already encrypted.
S3 was designed to be highly efficient for storing and retrieving data, but it is not a competitor to POSIX file systems. A bucket can be public or private, and you can host your website in a public bucket (it is rather common!). A bucket has permissions associated with it: you certainly do not want to allow random people to put files in your public bucket (or list the files), but you need to be able to do so yourself.
The protocol designed around S3 was reused for what we call "S3-compatible" services on which you can directly plug any "S3-compatible" client, so you are not stuck with AWS.
This blog post exists because I wanted to share a cool S3 feature (not really S3 specific, but almost everyone implemented this feature) that goes well with backups: a bucket can be versioned, so every change happening on a bucket can be reverted. Now, think about an attacker escalating to root privileges: they can access the backup repository and delete all the files there, then destroy the server. With a backup on a versioned S3 storage, you could revert your bucket to just before the deletion happened and recover your backup. To prevent this recovery, the attacker would also need the credentials of the S3 storage account itself, which are different from the credentials required to use the bucket.
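As a rough illustration of such a revert, here is a minimal sketch for the simplest case, where the attacker only deleted objects. It assumes the provider implements the S3 `ListObjectVersions` call and delete markers, that the `aws` CLI is installed, and that it runs with the storage account credentials rather than the restic key. The bucket name and endpoint are hypothetical.

```shell
#!/bin/sh
# Hypothetical names, adjust to your provider.
BUCKET="my-backup-bucket"
ENDPOINT="https://s3.example.com"

# On a versioned bucket, a deletion only adds a "delete marker" on top of
# the object. Removing the latest delete markers makes the objects visible again.
restore_deleted_objects() {
    aws s3api list-object-versions --endpoint-url "$ENDPOINT" --bucket "$BUCKET" \
        --query 'DeleteMarkers[?IsLatest==`true`].[Key,VersionId]' --output text |
    while read -r key version_id; do
        # The CLI prints "None" when there is no delete marker at all.
        [ "$key" = "None" ] && continue
        [ -n "$version_id" ] || continue
        aws s3api delete-object --endpoint-url "$ENDPOINT" --bucket "$BUCKET" \
            --key "$key" --version-id "$version_id"
    done
}
```

Calling `restore_deleted_objects` undoes every pending deletion, it is not a true point-in-time restore. Some providers offer their own tooling for that, which may be more convenient.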
Finally, restic supports S3 as a backend, and this is what we want.
There is a list of open source and free S3-compatible storage, I played with them all, and they have different goals and purposes, they all worked well enough for me:
A quick note about those:
You need to pick an S3 provider: you can self-host it or use a paid service, it is up to you. I like Backblaze as it is super cheap, at $6/TB/month, but I also have a local MinIO instance for some needs.
Create a bucket, enable versioning on it and define the data retention. For the current scenario, I think a few days is enough.
Create an application key for your restic client with the following permissions: "GetObject", "PutObject", "DeleteObject", "GetBucketLocation", "ListBucket". The names can change, but the key needs to be able to put/delete/list data in the bucket (and only this bucket!). Once this is done, you will get a pair of values: an identifier and a secret key.
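Most providers expose this in their web interface. For those that implement the corresponding S3 API calls, a sketch with the `aws` CLI could look like the following. The bucket name, the endpoint and the 7-day retention are assumptions for the example, not values from my setup.

```shell
#!/bin/sh
# Hypothetical names, adjust to your provider.
BUCKET="my-backup-bucket"
ENDPOINT="https://s3.example.com"

# Lifecycle rule: drop noncurrent (overwritten or deleted) versions
# 7 days after they stopped being the current version.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
    }
  ]
}
EOF

# Run this once with the storage account credentials, not the restic key.
enable_versioning() {
    aws s3api put-bucket-versioning --endpoint-url "$ENDPOINT" \
        --bucket "$BUCKET" --versioning-configuration Status=Enabled
    aws s3api put-bucket-lifecycle-configuration --endpoint-url "$ENDPOINT" \
        --bucket "$BUCKET" --lifecycle-configuration file://lifecycle.json
}
```

The retention is the window during which you can still revert a deletion, so pick it according to how fast you would notice a problem.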
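On providers using AWS-style policies, the permissions above could be written as the policy below. This is only a sketch: the bucket name is hypothetical, and some providers use their own capability names instead of a JSON policy.

```shell
#!/bin/sh
# Hypothetical bucket name.
BUCKET="my-backup-bucket"

# Object-level actions apply to the bucket content, bucket-level actions
# to the bucket itself. The policy deliberately grants nothing about
# object versions, so this key cannot purge the history of the bucket.
cat > restic-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    }
  ]
}
EOF
```

The important part is what is missing: if the restic key were allowed to delete specific object versions, an attacker holding it could defeat the versioning.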
Now, you will have to provide the following environment variables to restic when it runs:
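A minimal sketch of such an environment, using variables that restic documents for its S3 backend. Every value is a placeholder, and your provider may require additional settings (a region, for instance).

```shell
#!/bin/sh
# Placeholder values: replace them with the application key pair,
# your provider endpoint and your bucket name.
export AWS_ACCESS_KEY_ID="replace-with-key-identifier"
export AWS_SECRET_ACCESS_KEY="replace-with-secret-key"
# Repository format for an S3-compatible server: s3:https://HOST/BUCKET
export RESTIC_REPOSITORY="s3:https://s3.example.com/my-backup-bucket"
# File holding the repository password (RESTIC_PASSWORD also works).
export RESTIC_PASSWORD_FILE="/root/.restic-password"
```

Keep a copy of the repository password somewhere outside of the backed-up system, otherwise the backup is useless the day you need it.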
If you want a simple script to back up some directories, and remove old data after a retention of 5 hourly, 2 daily, 2 weekly and 2 monthly backups:
restic backup -x /home /etc /root /var
restic forget --prune -H 5 -d 2 -w 2 -m 2
Do not forget to run `restic init` the first time, to initialize the restic repository.
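If you prefer the script to handle the first run itself, here is one possible sketch. It assumes the environment variables are already exported, and it uses `restic cat config` as a cheap test for an existing repository. The `RESTIC` variable and the `run_backup` function are names I made up for the example.

```shell
#!/bin/sh
# Allows overriding the binary, for instance RESTIC=echo for a dry run.
RESTIC="${RESTIC:-restic}"

run_backup() {
    # "cat config" fails when the repository has no config yet,
    # in which case the repository gets initialized first.
    "$RESTIC" cat config >/dev/null 2>&1 || "$RESTIC" init
    # Only clean up old snapshots when the backup itself succeeded.
    "$RESTIC" backup -x /home /etc /root /var &&
        "$RESTIC" forget --prune -H 5 -d 2 -w 2 -m 2
}
```

Note that `restic cat config` can also fail for other reasons, such as a network issue. In that case `restic init` would fail too, so it should be harmless, but check the output of the first run.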
I really like this backup system as it is cheap, very efficient and provides a fallback in case of a problem with the repository (mistakes happen, you do not always need an attacker to lose data ^_^').
If you do not want to use S3 backends, you need to know that Borg backup and Restic both support an "append-only" method, which prevents an attacker from doing damage or even reading the backup. However, I always found it hard to use, and you need another system to do the prune/cleanup on a regular basis.
This approach could work on any backend supporting snapshots, like BTRFS or ZFS. If you can recover the backup repository to a previous point in time, you will be able to access the working backup repository.
You could also do a backup of the backup repository, on the backend side, but you would waste a lot of disk space.