Lunch break. What can I do?
I would prefer to use torrent files. Let’s see.
Firstly you do need to start with the Advanced Search form. Using the second form on that page, in the query box put collection:georgeblood, select the identifier field (only), set the format to CSV. Set the limit to 30000 (there are about 25000+ records), and download the huge CSV… – Downloading all the 78rpm rips at the Internet Archive
If you read the page, you’ll find that there are a lot more these days. Use an appropriate number.
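If you'd rather skip the form, the same query can probably be issued from the command line. This is only a sketch: it assumes the form submits to `advancedsearch.php` with the parameters `q`, `fl[]`, `rows` and `output`, and the helper name `search_url` is made up for this example. Compare the URL with what your browser shows after submitting the form before relying on it.

```shell
# Build the search URL for a collection (hypothetical helper).
# Assumption: endpoint and parameter names match the Advanced Search form.
search_url() {
  collection=$1
  rows=$2
  printf 'https://archive.org/advancedsearch.php?q=collection%%3A%s&fl%%5B%%5D=identifier&rows=%s&output=csv\n' \
    "$collection" "$rows"
}

# Usage (not run here, needs network access):
#   curl --silent --output search.csv "$(search_url georgeblood 200000)"
```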
    $ wc -l search.csv
    200001 search.csv
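Let’s turn these into a list of URLs of the format `https://archive.org/download/<id>/<id>_archive.torrent`, skipping the CSV header line. The `sed` expression strips the first and last character of each line (the quotes around the identifier) and puts what remains into the URL twice.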
    tail --lines=+2 search.csv \
      | sed --expression='s/^.\(.*\).$/https:\/\/archive.org\/download\/\1\/\1_archive.torrent/' \
      > torrents.txt
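To see what the substitution does to a single record, here is the same pipeline wrapped in a function that reads from standard input. The function name `csv_to_torrent_urls` and the identifier `78_example_gbia0001` are invented for illustration, and I'm assuming every line of the CSV is one quoted identifier.

```shell
# Same transformation as above, reading CSV on stdin (hypothetical helper).
# Uses | as the sed delimiter so the slashes in the URL need no escaping.
csv_to_torrent_urls() {
  tail -n +2 \
    | sed -e 's|^.\(.*\).$|https://archive.org/download/\1/\1_archive.torrent|'
}

# Usage with a made-up identifier:
#   printf '"identifier"\n"78_example_gbia0001"\n' | csv_to_torrent_urls
#   prints https://archive.org/download/78_example_gbia0001/78_example_gbia0001_archive.torrent
```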
Ready to download all the torrent files?
wget --quiet --show-progress --continue --execute robots=off --input-file=torrents.txt
It’s going to take a long time. I stopped after a few thousand files. You can always come back later and get more.
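When you do come back for more, it may help to feed `wget` only the URLs you don't have yet, in batches. A sketch, assuming the torrent files sit in the directory where `wget` saved them under the last path component of the URL; `pending_urls` is a name I made up.

```shell
# Print the URLs from list $1 whose file is not yet present in directory $2
# (hypothetical helper).
pending_urls() {
  while IFS= read -r url; do
    # ${url##*/} is the part after the last slash, i.e. the file name
    [ -e "$2/${url##*/}" ] || printf '%s\n' "$url"
  done < "$1"
}

# Usage (not run here, needs network access): fetch the next 1000 missing files.
#   pending_urls torrents.txt . | head -n 1000 \
#     | wget --quiet --show-progress --execute robots=off --input-file=-
```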
There are so many torrent files, we need a torrent daemon to manage it all, and a Perl script to do it…
Ok, daemon first.
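Install the Transmission daemon, stop it, edit the settings to change the password, and start it again. Stopping it first matters: the daemon writes its settings back to `settings.json` when it exits, which would overwrite changes made while it was running.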
    sudo apt install transmission-daemon
    sudo systemctl stop transmission-daemon
    $EDITOR /etc/transmission-daemon/settings.json
    sudo systemctl start transmission-daemon
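The settings that matter for the rest of this page are the RPC user name and password (used with `--auth` below) and the download directory. A small sketch for printing just those lines; `show_rpc_settings` is a made-up name. As far as I know, the daemon replaces the plain-text `rpc-password` with a hash the next time it starts, so don't be surprised if the value looks different afterwards.

```shell
# Print the settings this setup relies on from a settings.json file
# (hypothetical helper; the file is usually only readable by root).
show_rpc_settings() {
  grep -E '"(rpc-username|rpc-password|rpc-authentication-required|download-dir)"' "$1"
}

# Usage (not run here), from a root shell:
#   show_rpc_settings /etc/transmission-daemon/settings.json
```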
Check it out via your web browser at localhost:9091. You should see a web user interface to Transmission.
By default the files end up in `/var/lib/transmission-daemon/downloads` (as seen in the `settings.json` file).
Create a Perl script called `add-torrents.pl` with the following code.
    #!/usr/bin/env perl
    use Modern::Perl;
    use File::Slurper qw(read_dir);

    my $pwd = shift or die "No password provided\n";
    my @list = qx(transmission-remote --auth=transmission:$pwd --torrent all --list);
    shift @list; # drop header
    pop @list; # drop footer
    my %states;
    my %names;
    my @stopped;
    for my $line (@list) {
      chomp($line);
      my $status = substr($line, 57,13);
      $status =~ s/\s+//g;
      $states{$status}++;
      my $name = substr($line, 70);
      $names{$name} = 1;
      push(@stopped, $1) if $status eq 'Stopped' and $line =~ /(\d+)/;
    }

    if ($states{'Queued'}) {
      say "There are $states{'Queued'} queued torrents, try again later?";
      exit;
    }

    if (not $states{'Stopped'}) {
      my $limit = 100;
      for my $file (read_dir('.')) {
        next unless $file =~ /\.torrent$/;
        my $name = substr($file, 0, -8);
        $name =~ s/_archive$//; # some torrent files have this attached?
        next if $names{$name};
        say "Adding $name.torrent";
        qx(transmission-remote --auth=transmission:$pwd --start-paused --add $file);
        last if $limit-- <= 0;
      }
      # wait for the daemon to list them
      say "Giving the daemon some time...";
      sleep(10);
      # need to find their ids!
      @list = qx(transmission-remote --auth=transmission:$pwd --torrent all --list);
      shift @list; # drop header
      pop @list; # drop footer
      for my $line (@list) {
        chomp($line);
        my $status = substr($line, 57,13);
        $status =~ s/\s+//g;
        push(@stopped, $1) if $status eq 'Stopped' and $line =~ /(\d+)/;
      }
    }

    say "Looking at " . scalar(@stopped) . " paused torrents.";

    for my $id (@stopped) {
      my $n = 0;
      # don't get all the files
      qx(transmission-remote --auth=transmission:$pwd --torrent $id --no-get all);
      # find the mp3 files that don't start with an underscore and get those
      for (qx(transmission-remote --auth=transmission:$pwd --torrent $id --files)) {
        chomp;
        next if /\/_/;
        next unless /\.mp3$/;
        next unless /^ *(\d+)/;
        qx(transmission-remote --auth=transmission:$pwd --torrent $id --get $1);
        $n++;
      }
      qx(transmission-remote --auth=transmission:$pwd --torrent $id --start);
      say "Started torrent $id with $n files";
    }
You run it with the password:
perl add-torrents.pl *secret*
Here’s what it does.
It creates a list of all the torrents you already added to the daemon and notes their state. We are interested in the stopped torrents because those are the ones we want to start. We’re also interested in the queued torrents: if there are any, the script does nothing and exits. The reason is that if I start too many torrents at once, I get errors about there being too many open files. Yikes.
If you get into that state, I’m not sure what to do. Shutting down the daemon, starting it up again, something like that? Stopping all the torrents and running the script again?
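To check what state the daemon is in before deciding, you can tally the status column yourself. This sketch reads the status from the same character positions the Perl script uses (offset 57, width 13, which is columns 58 to 70 for `cut`); those positions are taken from the script and may not fit every version of `transmission-remote`. The name `count_states` is made up.

```shell
# Tally torrent states from `transmission-remote --list` output on stdin
# (hypothetical helper). Drops the header and footer lines, cuts out the
# status column, removes the padding, and prints "State count" pairs.
count_states() {
  sed -e '1d' -e '$d' \
    | cut -c58-70 \
    | tr -d ' ' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage (not run here, needs a running daemon):
#   transmission-remote --auth=transmission:PASSWORD --torrent all --list | count_states
```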
If there are neither stopped nor queued torrents, we go through the current directory looking for torrent files, and if we find one that wasn’t listed by the daemon, we add it in “paused” state, i.e. stopped (about a hundred per run at most). Adding them paused matters: if they started right away, you’d get all the FLAC and WAV files, too. Your poor disk.
So now we have a bunch of stopped torrents. The script then goes through them one by one: it tells the daemon not to get any files, lists the torrent’s files, picks the good MP3 files, and tells the daemon to get just those (usually one per torrent). And then it starts the torrent.
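The file selection can be tried on its own, without the Perl script. The sketch below applies the same three tests to the output of `transmission-remote --files`: skip names with an underscore right after a slash, keep names ending in `.mp3`, and take the number at the start of the line. The name `wanted_file_ids` is made up, and the listing format is assumed to be what the script expects.

```shell
# Print the file indexes of the MP3 files whose name doesn't start with an
# underscore, given a `--files` listing on stdin (hypothetical helper).
wanted_file_ids() {
  grep -v '/_' \
    | grep '\.mp3$' \
    | sed -n 's/^ *\([0-9][0-9]*\).*/\1/p'
}

# Usage (not run here, needs a running daemon; 5 is an example torrent id):
#   transmission-remote --auth=transmission:PASSWORD --torrent 5 --files \
#     | wanted_file_ids \
#     | while read -r n; do
#         transmission-remote --auth=transmission:PASSWORD --torrent 5 --get "$n"
#       done
```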
You can call this script regularly.
You might be wondering: How do we find the “good” MP3 files?
The preferred version suggested by an audio engineer at George Blood, L.P. is the equalized version recorded with the 3.5 mil truncated elliptical stylus, and has been copied to have the more friendly filename.
I think the filenames you don’t want start with an underscore, so all the Perl script does is ignore those.
Good luck!
If you’re anything like me, you’ll get quite experienced at running `transmission-remote` from the command line because of all the ways this can mess up. I’m trying to keep the Perl script up to date to help my future self, but working with so many files all at once is a pain.
On my system, the transmission daemon saves the files in `/var/lib/transmission-daemon/downloads`.
While the transmission-daemon continues running, you don’t want to move the downloaded files away. At the same time, you can’t listen to them because they don’t belong to your username. My solution is to create a hard link in my own music directory, pointing to the same file, but with myself being the owner. The only problem when doing that is that there are so many files, you have to use `find` to run anything on all of them.
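    mkdir -p ~/Music/78rpm/
    cd ~/Music/78rpm/
    # dry run: prints the ln commands for files that have no second link yet
    find /var/lib/transmission-daemon/downloads \
      -name '*.mp3' \
      -links 1 \
      -exec echo sudo ln -f '{}' . ';'
    # substitute the owner directly: with "env user=... find ... $user" the
    # shell expands $user and $group before env gets to set them
    find . \
      -name '*.mp3' \
      -user debian-transmission \
      -exec sudo chown "$(id -un):$(id -gn)" '{}' ';'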
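The `-links 1` test is what keeps the first `find` from linking the same file twice: once a file has a hard link in the music directory, its link count is 2 and it no longer matches. Note that hard links only work within one filesystem, so this assumes the downloads directory and your music directory share one. A small sketch to see the effect; `unlinked_mp3s` is a made-up name.

```shell
# List MP3 files below directory $1 that have no second hard link yet
# (hypothetical helper).
unlinked_mp3s() {
  find "$1" -name '*.mp3' -links 1
}

# Usage in a scratch directory:
#   mkdir src dst; touch src/a.mp3 src/b.mp3
#   ln src/a.mp3 dst/a.mp3
#   unlinked_mp3s src     # only src/b.mp3 is left to link
```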
I don’t quite understand how this is possible, but whenever I check the downloads directory, there are always a bunch of PNG files or partial FLAC files that also got downloaded. How annoying. This is how I clean out the garbage, every now and then. Also useful if you’re tinkering with the script and make mistakes.
If you like what you see, remove the ’echo’ and rerun to actually do it, or pipe into a shell.
    sudo find /var/lib/transmission-daemon/downloads -type f -not -name '*.mp3' -exec echo rm '{}' ';'
    sudo find /var/lib/transmission-daemon/downloads -type f -name '_*' -exec echo rm '{}' ';'
Related:
2020-12-15 The David W. Niven Collection of Early Jazz Legends, 1921-1991
#Music