💾 Archived View for gamma.lyk.so › systems › food › scripts › scraping-foodista › recipe-urls.sh captured on 2024-06-16 at 12:30:13.

-=-=-=-=-=-=-

#!/usr/bin/env sh

set -e

# The "pause" indicates how many seconds to wait between pages.
# Pages are 0-indexed. To start from the beginning, pass "0" as the start.
# The "end" is exclusive. To pull through page 282, pass "282" as the end.
# (Page 282 is at index 281.)
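#
# Example (a sketch; the output file name "recipe-urls.txt" is only an
# illustration, and it assumes the script is saved as recipe-urls.sh):
#   sh recipe-urls.sh 2 0 282 > recipe-urls.txt
# This would wait 2 seconds between pages and fetch indexes 0 through 281.
# With stdout redirected, the "Fetching page" messages are not printed.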

[ "$3" ] || { >&2 echo "usage: $0 <pause> <start> <end>" && exit 1; }

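pause="$1"
page="$2"
end="$3"

# Compare numerically so that a start at or past the end exits
# instead of looping forever.
while [ "$page" -lt "$end" ]; do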
  # only show diagnostic output if this is an interactive terminal
  [ ! -t 1 ] || echo "Fetching page $((page + 1))..."

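  # grep -P (PCRE) is a GNU grep extension; \K drops the matched prefix,
  # leaving only the recipe slug, which sed turns back into a full URL.
  curl -s "https://www.foodista.com/browse/recipes?page=$page" \
  | grep -oP '<a href="/recipe/\K[^"]+' \
  | sed 's|^|https://www.foodista.com/recipe/|'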

  sleep "$pause"

  page=$((page + 1))
done