💾 Archived View for gamma.lyk.so › systems › food › scripts › scraping-foodista › recipe-urls.sh captured on 2024-06-16 at 12:30:13.
-=-=-=-=-=-=-
#!/usr/bin/env sh
set -e

# The "pause" indicates how many seconds to wait between pages.
# Pages are 0-indexed. To start from the beginning, pass "0" as the start.
# The "end" is exclusive. To pull through page 282, pass "282" as the end.
# (Page 282 is at index 281.)

[ "$3" ] || { >&2 echo "usage: $0 <pause> <start> <end>" && exit 1; }

pause="$1"
page="$2"
end="$3"

# A numeric comparison stops the loop even if <start> is already past <end>;
# a string inequality test would never terminate in that case.
# grep -P (needed for \K) requires GNU grep built with PCRE support.
while [ "$page" -lt "$end" ]; do
    # only show diagnostic output if this is an interactive terminal
    [ ! -t 1 ] || echo "Fetching page $((page + 1))..."

    curl -s "https://www.foodista.com/browse/recipes?page=$page" \
        | grep -oP '<a href="/recipe/\K[^\"]+' \
        | sed 's|^|https://www.foodista.com/recipe/|'

    sleep "$pause"
    page=$((page + 1))
done