Pagat Archive

siiky

2022/11/09

en

I sent another email asking for permission to make a mirror/archive, this time to Pagat, a site with tons and tons of card games. Like last time, permission was given on the condition that I don't make any archives/mirrors public. Fair enough!

Pagat

Content-based Mirrors

My Raspberry Pi has been busy downloading the whole thing, pausing a randomized 15 to 45 seconds between requests:

wget -o download.log -w 30 --random-wait --mirror -k -K -p -i links.txt

The links.txt file was generated from the sitemap.xml with this CHICKEN script:

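(import srfi-1 ssax)
;; Parse the sitemap from stdin into SXML, with no namespace prefix mappings.
(let* ((sitemap (ssax:xml->sxml (current-input-port) '()))
       ;; The urlset is the third element of the parsed document; its cdr is the list of url entries.
       (entries (cdaddr sitemap))
       ;; Each entry's children form an alist; take the text of its loc element.
       (urls (map (o car (cute alist-ref 'http://www.sitemaps.org/schemas/sitemap/0.9:loc <>) cdr) entries)))
  ;; Print one URL per line.
  (for-each print urls))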

https://www.pagat.com/sitemap.xml
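
For anyone without CHICKEN at hand, roughly the same extraction can be sketched in Python with only the standard library. This is an illustrative equivalent, not what was actually run: the sitemap_urls name and the example.com sample document are made up for the example, and only the namespace URI comes from the script above.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemap protocol, the same one used in the Scheme script.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample sitemap, standing in for the real sitemap.xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/games.html</loc><lastmod>2022-11-01</lastmod></url>
</urlset>
"""


def sitemap_urls(xml_bytes):
    """Return the text of the <loc> child of every <url> entry, in document order."""
    root = ET.fromstring(xml_bytes)
    urls = []
    for entry in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = entry.find(f"{{{SITEMAP_NS}}}loc")
        # Skip entries without a usable <loc> instead of failing on them.
        if loc is not None and loc.text:
            urls.append(loc.text.strip())
    return urls


# Print one URL per line, the format wget expects from -i links.txt.
for url in sitemap_urls(SAMPLE):
    print(url)
```

Passing the bytes of the real sitemap.xml instead of SAMPLE and redirecting the output to a file should give a list usable as links.txt.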

Some details so far:

$ find www.pagat.com/ -type f | wc -l
2941
$ find www.pagat.com/ -type f -iname '*.html' | wc -l
1812
$ du -bchs www.pagat.com/
66M	www.pagat.com/
66M	total