Recovering SeaweedFS

Lately, I've been quite fond of [[SeaweedFS]]. It isn't as powerful as [[Ceph]], but it is considerably easier to maintain and manage. There are some tradeoffs, such as it being harder to find bit rot (when the disks start to fail), but I find it not quite as “fragile” when it comes to using a random collection of Linux machines.

One of the SeaweedFS features I want to play with is the ability to upload a directory transparently to an S3 bucket (not AWS though, they are too big). I'm thinking about that for later, when I want to make an extra, off-site backup of critical files, including Partner's photo shoots.

Overfilling

Last week, I worked on one of the tasks I've been stalling on: archiving my dad's artwork. He had a lot of copies of nearly identical files and I didn't have the working storage on my laptop. I figured that since I had this huge (22 TB, though mostly full) cluster, I could use that.

Yeah… not the best of ideas.

I didn't realize I had made a mistake until everything started to fail because all of the nodes were at 98% or more full and the system couldn't replicate even the replication logs. I didn't even realize that until Partner said [[Plex]] was down.

Well, with replication down, I couldn't even use the weed shell to remove a file. When I tried, it just hung for hours.

$ weed-shell
> rm -rf in/dad-pictures
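
A guard along these lines might catch the problem before a large copy starts. It is only a sketch: `parse_capacity`, `used_percent`, and `has_headroom` are made-up names, the 90% limit is arbitrary, and it assumes that `df` on the mount reports numbers that reflect how full the volume servers really are, which may not hold for every setup.

```shell
#!/bin/sh
# Pull the "Capacity" column (5th field) out of `df -P` output on stdin.
parse_capacity() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Used space, as a whole-number percentage, for the filesystem holding a path.
used_percent() {
  # -P forces the portable format with one line per filesystem.
  df -P "$1" | parse_capacity
}

# Succeed only when used space is below the limit (hypothetical default: 90).
has_headroom() {
  [ "$(used_percent "$1")" -lt "${2:-90}" ]
}

# Example use (not executed here):
#   has_headroom /mnt/seaweed 90 && rsync -a dad-pictures/ /mnt/seaweed/in/dad-pictures/
```

The check only looks at the state before the copy begins, so a transfer that is itself larger than the free space would still need a size estimate on top of this.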

Nix Shell Scripts

Above, I use `weed-shell`. This is a custom script I generate with [[NixOS]] that is installed on any server that can talk to my SeaweedFS cluster.

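{ pkgs, ... }:
let
  # Wrapper that always passes the filer and master addresses of the cluster.
  shellScript = (
    pkgs.writeShellScriptBin "weed-shell" ''
      weed shell -filer fs.local:8888 -master fs.local:9333 "$@"
    ''
  );
in
{
  # Make `weed-shell` available on the PATH for every user.
  environment.systemPackages = [ shellScript ];
}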

This lets me handle common functions I use when maintaining things. In this case, I don't have to enter the common parameters needed to talk to my SeaweedFS cluster.
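
On a machine without NixOS, a plain shell function can do roughly the same job. This is a sketch and not the script above: `weed_shell` uses an underscore because hyphens are not portable in function names, and `WEED_FILER` and `WEED_MASTER` are hypothetical variable names introduced here so the addresses can be overridden.

```shell
# Run `weed shell` with the cluster addresses filled in.
# WEED_FILER and WEED_MASTER are hypothetical overrides; the defaults are
# the addresses used in the NixOS script.
weed_shell() {
  weed shell \
    -filer "${WEED_FILER:-fs.local:8888}" \
    -master "${WEED_MASTER:-fs.local:9333}" \
    "$@"
}
```

Dropping this into a shell startup file gives the same "no common parameters to type" convenience, though it has to be copied to each machine by hand.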

Cleaning Up

I tried a bunch of things, such as forcing a more aggressive vacuum (reclaiming the space held by deleted files):

> volume.vacuum --help
Usage of volume.vacuum:
  -collection string
    	vacuum this collection
  -garbageThreshold float
    	vacuum when garbage is more than this limit (default 0.3)
  -volumeId uint
    	the volume id
> volume.vacuum -garbageThreshold 0.1

This didn't help as much as I had hoped, but it did allow some replication and some commands to go through. I needed to clear a lot more space so I could remove files properly, do a wholesale `rm -rf` to blow away my father's files, and try again later once I had more space.

Replication

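I have my volumes set to `010` replication. The three digits count the extra copies kept in other data centers, on other racks in the same data center, and on other hosts in the same rack, so `010` keeps one extra copy on a different rack.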
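
Since each digit counts extra copies, the total number of copies for a replication code is one plus the sum of its digits. A small sketch of that arithmetic, with `total_copies` as a made-up helper name:

```shell
# Total copies kept for a three-digit replication code such as 010.
# Each digit counts extra copies, so the total is 1 + the sum of the digits.
total_copies() {
  code="$1"
  dc=${code%??}      # first digit: other data centers
  rest=${code#?}     # last two digits
  rack=${rest%?}     # second digit: other racks, same data center
  host=${code#??}    # third digit: other hosts, same rack
  echo $((1 + dc + rack + host))
}

total_copies 010   # prints 2
total_copies 000   # prints 1
```

With `010`, every volume takes up twice its size across the cluster, which is why dropping to `000` frees space so quickly.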

If I ever find a friend who would let me set up a server at their place, I would consider setting up a second “data center” to have an off-site backup. That probably would require [[Tailscale]] but that's beyond my current scope.

Volumes

SeaweedFS basically creates multiple 30 GB files, called “volumes”, each of which acts as a blob with many files stored inside it. That way, the usual problems with thousands of small files aren't an issue, since everything operates on the 30 GB volumes.

Replication is done at the volume level, which means I was able to turn off replication for a series of volumes.

> lock
> volume.configure.replication -replication 000 -volumeId 1
> volume.configure.replication -replication 000 -volumeId 2
> volume.fix.replication
> volume.balance -force
> unlock

The `lock` and `unlock` are important when making changes like this; they prevent some critical operations from corrupting the cluster. The commands will tell you when a lock is needed.

`volume.configure.replication` changes those volumes to no replication (risky). Once that is done, `volume.fix.replication` and `volume.balance -force` delete the excess copies and shuffle things around, giving me some breathing room to get replication running again so I can mass delete files.

When I'm done, I just go and change all the volumes back to `-replication 010` to give me the second copy.
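
Typing one `volume.configure.replication` line per volume gets old quickly. The sketch below only prints the same sequence of commands for a run of volume ids so they can be reviewed and pasted into the weed shell; it does not talk to the cluster. `replication_commands` is a hypothetical name, and the assumption that the volume ids form a contiguous range will not fit every cluster.

```shell
# Print the weed shell commands that set one replication value on a range of
# volume ids, bracketed by lock/unlock. Nothing is executed against the cluster.
replication_commands() {
  replication="$1"   # e.g. 000 or 010
  first="$2"         # first volume id in the range
  last="$3"          # last volume id in the range
  echo "lock"
  id="$first"
  while [ "$id" -le "$last" ]; do
    echo "volume.configure.replication -replication $replication -volumeId $id"
    id=$((id + 1))
  done
  echo "volume.fix.replication"
  echo "volume.balance -force"
  echo "unlock"
}

# Example: the two volumes from above, then the same range restored to 010.
replication_commands 000 1 2
replication_commands 010 1 2
```

Printing instead of executing keeps the risky step manual, which seems like the right default when the change removes the only spare copy of the data.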

Data Hoarding

The ultimate problem is data hoarding. My father and I both have multiple copies of files floating around. It isn't great, but when you don't have time to clean up a copy of a dying laptop, it is sometimes easier to `rsync` the entire laptop into a directory on the new machine and then move on.

In this case, I needed to do some trimming of the duplicates from his files. The script is based on the one from a StackOverflow answer[1]:

1: https://stackoverflow.com/a/19552048

find . -not -empty -type f -printf "%s\n" \
    | sort -rn \
    | uniq -d \
    | xargs -I{} find . -type f -size {}c -print0 \
    | xargs -0 sha256sum \
    | sort \
    | uniq -w64 --all-repeated=separate

The output is pretty simple: it lists only the duplicates, grouped by hash, along with the path to each copy.

$ echo one > a.txt
$ echo one > b.txt
$ echo two > c.txt
$ echo two > d.txt
$ echo three > e.txt
$ find . -not -empty -type f -printf "%s\n" \
    | sort -rn \
    | uniq -d \
    | xargs -I{} find . -type f -size {}c -print0 \
    | xargs -0 sha256sum \
    | sort \
    | uniq -w64 --all-repeated=separate
27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a  ./c.txt
27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a  ./d.txt

2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806  ./a.txt
2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806  ./b.txt
$

With this, I can find the duplicates in my own system and delete them to clear out a few terabytes worth of data. It takes time, but I haven't done it before so I pointed it at `/mnt/seaweed` and let it run.
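
Picking through thousands of groups by hand is slow. One possible helper, sketched here, reads the grouped output and prints every path except the first in each group. Treating the first path as the copy to keep is an arbitrary assumption, `list_extra_copies` is a made-up name, and the function only prints; it deletes nothing.

```shell
# Read `sha256sum` lines grouped by blank lines (the duplicate finder's
# output) and print every path except the first one in each group.
list_extra_copies() {
  awk '
    /^$/   { seen = 0; next }          # a blank line starts a new group
    seen++ { print substr($0, 67) }    # skip the 64-char hash and 2 separators
  '
}

# Example with the two groups from the sample output:
printf '%s\n' \
  "27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a  ./c.txt" \
  "27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a  ./d.txt" \
  "" \
  "2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806  ./a.txt" \
  "2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806  ./b.txt" \
  | list_extra_copies
# prints ./d.txt then ./b.txt
```

Paths containing newlines or backslashes are escaped by `sha256sum` and are not handled by this sketch, so the list is worth reading over before anything is removed.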

Once that is done, I can turn replication back on, fix replication, rebalance, and I should be good to go.

Forward Steps

I knew I was running out of storage for a while now, so I blew my monthly budget and ordered a fourth server. This one has three 4 TB NVMe sticks (about 11 TB which will be added to the cluster) and should give me enough room to get my dad's files collected, deduplicated, and then look into uploading them to cheap S3 storage ([[Backblaze]]) for later.
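
The gap between 12 TB of drives and about 11 TB of capacity is roughly what the decimal-to-binary unit conversion gives. That reading is an assumption, since filesystem overhead could account for part of it too. A quick check of the arithmetic, with `tb_to_tib` as a made-up helper:

```shell
# Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes).
tb_to_tib() {
  awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 / 2^40 }'
}

tb_to_tib 12   # three 4 TB drives; prints 10.91
```

With `010` replication every volume is stored twice, so the room actually gained for new files would be about half of that figure.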

Metadata

Categories: Development

Tags: Backblaze, Ceph, NixOS, Plex, SeaweedFS, Tailscale

https://d.moonfire.us/blog/2024/10/25/recovering-seaweedfs/