2024-01-21 Download the whole podcast

Feeling evil… Assume that there's a podcast with a feed and you want to download all the episodes. Like a music show, for example.

First, get the feed, saving it as `rss.xml` because we're going to need it in the next step.

curl -o rss.xml https://musicforprogramming.net/rss.xml
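
To see what the loop below will be matching, peek at one of the enclosures in the feed. The URL and attributes here are illustrative, not copied from the actual feed:

grep --max-count=1 '<enclosure' rss.xml
# prints something along the lines of:
# <enclosure url="https://musicforprogramming.net/some-episode.mp3" length="123456789" type="audio/mpeg"/>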

Then, get every enclosure:

for f in $(perl -ne '/<enclosure url="(.*?)"/ && print "$1\n"' rss.xml); do
  curl --continue-at - --remote-name "$f"
done
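
To sanity-check the extraction before downloading anything, run just the Perl part on its own:

perl -ne '/<enclosure url="(.*?)"/ && print "$1\n"' rss.xml | head
# one enclosure URL per line, first ten only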

The `--continue-at -` option should make it possible to resume downloads at a later stage: `curl` skips past the bytes already on disk, so an episode that has already been downloaded completely is effectively skipped. That's what I'm hoping for, in any case.

The `--remote-name` option saves the episode under its remote name. This is what most people would expect, I think.
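
A quick way to convince yourself of both options, using a made-up episode URL (any URL from the feed would do): start the download, interrupt it with Ctrl-C, then run the same command again.

curl --continue-at - --remote-name https://musicforprogramming.net/some-episode.mp3
# on the second run, curl resumes at the size of the partial
# some-episode.mp3 already on disk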

You know how it goes, though: never parse HTML (or XML) using regular expressions. I'm doing it here and it is definitely a bad idea!

I'm also too lazy to install a real XML parser and do the real thing, so it is what it is.
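
For the record, if `xmllint` (part of libxml2, and often already installed) happens to be around, the extraction works without the dodgy regex. This sketch assumes the enclosures sit in the default namespace, which is the case for plain RSS 2.0 feeds:

# xmllint emits the matched url="..." attributes; grep keeps just the URLs
xmllint --xpath '//enclosure/@url' rss.xml \
  | grep --only-matching 'http[^"]*' \
  | while read -r url; do
      curl --continue-at - --remote-name "$url"
    done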

#Podcast

@takeonrules@dice.camp wrote in to say that, as someone "who doesn't use Perl", they have a solution using ripgrep and xargs. I like it very much and feel like I should be using `rg` and `xargs` more often.

rg "<enclosure url=['\"]([^'\"]+)['\"]" rss.xml -r '$1' --only-matching \
  | xargs curl --continue-at - --remote-name
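
Note the `-n 1`: `curl` applies `--remote-name` to a single URL, so when `xargs` batches several URLs into one invocation, only the first would be saved under its remote name (`--remote-name-all` would be the other fix). With `-n 1` in place, `xargs` can also run a few downloads in parallel, assuming your `xargs` has `-P` and the server doesn't mind:

rg "<enclosure url=['\"]([^'\"]+)['\"]" rss.xml -r '$1' --only-matching \
  | xargs -n 1 -P 4 curl --continue-at - --remote-name
# -P 4 keeps up to four curl processes running at once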