💾 Archived View for danq.me › posts › far-side-new-stuff-xpath captured on 2024-05-12 at 15:25:09. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
2023-03-29
I got some great feedback on yesterday's post about using FreshRSS + XPath to subscribe to Forward, including helpful comments from FreshRSS developer Alexandre Alapetite and from somebody who appreciated both it and my Far Side "Daily Dose" recipe, and who wondered whether it was possible to get the new Far Side content in FreshRSS too.
Wait, there's new Far Side content? Yup: it turns out Gary Larson's dusted off his pen and started drawing again. That's awesome! But the last thing I want is to have to go to the website once every few... what: days? weeks? months? He's not syndicated any more so he's not got a deadline to work to! If only there were some way to have my feed reader, y'know, do it for me and let me know whenever he draws something new.
Screenshot showing new content from The Far Side in my FreshRSS reader.
Here's my setup for getting Larson's new funnies right where I want them:
The feed URL I use isn't a valid address for any of the new stuff, but it always seems to redirect to somewhere that is, so that's nice.
Turns out all the "recent" new stuff gets loaded in the HTML and then JavaScript turns it into a slider etc.; some of the CSS classes change when the JavaScript runs so I needed to View Source rather than use my browser's inspector to find everything.
Ugh. The easiest place I could find a "clean" comic ID number was in a data- attribute of the "share" button, where it's presumably used for engagement tracking. Still, whatever works, right?
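As a rough illustration of the idea (not the actual FreshRSS configuration), the sketch below runs XPath-style queries against raw page source, as opposed to the JavaScript-modified DOM, and pulls an ID out of a data- attribute on a share button. The markup, the class names and the `data-comic-id` attribute name are all made up for the example; the real site's names will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the page's raw source ("View Source").
# The class names and the data-comic-id attribute are placeholders, not the
# site's real ones.
RAW_SOURCE = """
<div class="slider">
  <div class="slide">
    <img src="/img/one.jpg" alt="" />
    <button class="share" data-comic-id="1234">Share</button>
  </div>
  <div class="slide">
    <img src="/img/two.jpg" alt="" />
    <button class="share" data-comic-id="1235">Share</button>
  </div>
</div>
"""


def comic_ids(source: str) -> list[str]:
    """Return the ID found on each slide's share button, in document order."""
    root = ET.fromstring(source.strip())
    ids = []
    # One query finds the repeating items; a second, relative query pulls
    # a value out of each item.
    for slide in root.findall(".//div[@class='slide']"):
        button = slide.find(".//button[@data-comic-id]")
        if button is not None:
            ids.append(button.get("data-comic-id"))
    return ids


print(comic_ids(RAW_SOURCE))
```

ElementTree only supports a small subset of XPath and needs well-formed markup, so this is a toy. The point is the two-step shape: one expression to find the items, then relative expressions evaluated within each item.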
When Larson captions a comic, the caption is important.
The URLs work as direct links to the content, and because they're unique, they make a reasonable unique ID too (so long as their numbering scheme is internally consistent, this should stop a re-run of new content popping up in your feed reader if the same comic comes around again).
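To show why a stable URL works as a unique ID, here is a minimal sketch of the kind of de-duplication a feed reader performs. It is an assumption about the general approach, not FreshRSS's actual code, and the `example.com` URLs are placeholders.

```python
def new_items(items: list[dict], seen_ids: set[str]) -> list[dict]:
    """Return only items whose link has not been seen, recording new links."""
    fresh = []
    for item in items:
        unique_id = item["link"]  # the item URL doubles as its unique ID
        if unique_id not in seen_ids:
            seen_ids.add(unique_id)
            fresh.append(item)
    return fresh


seen: set[str] = set()

first_fetch = [
    {"title": "Comic A", "link": "https://example.com/new-stuff/1"},
    {"title": "Comic B", "link": "https://example.com/new-stuff/2"},
]
# A later fetch in which Comic A comes around again alongside a new one.
second_fetch = [
    {"title": "Comic A", "link": "https://example.com/new-stuff/1"},
    {"title": "Comic C", "link": "https://example.com/new-stuff/3"},
]

print([i["title"] for i in new_items(first_fetch, seen)])
print([i["title"] for i in new_items(second_fetch, seen)])
```

The re-run is only suppressed if it reappears under the same URL, which is why the numbering scheme needs to be consistent.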
The Far Side uses Referer: headers as an anti-hotlinking measure, which prevents us easily loading the images directly in an RSS reader. I use this tiny PHP script as a proxy to mitigate that. If you don't have such a proxy set up, you could simply omit the "Item thumbnail" and "Item content" fields and click the link to go to the original page.
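For anyone wondering what such a proxy boils down to, here is a minimal Python sketch of the core step: re-requesting the image with a Referer: header attached. This is not the PHP script mentioned above. The allowed hostname is a placeholder, and the Referer value is an assumption about what the site expects.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical values: swap in the real image host, and whatever Referer
# the site actually checks for.
ALLOWED_HOSTS = {"images.example.com"}
REFERER = "https://www.thefarside.com/"


def build_image_request(image_url: str) -> Request:
    """Build a request for image_url that carries the Referer header."""
    parts = urlsplit(image_url)
    # Refuse anything outside the allow-list so the proxy can't be abused
    # to fetch arbitrary URLs.
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"refusing to proxy {image_url!r}")
    return Request(image_url, headers={"Referer": REFERER})


def fetch_image(image_url: str, timeout: float = 10.0) -> bytes:
    """Fetch the image bytes; a real proxy would stream these to the client."""
    with urlopen(build_image_request(image_url), timeout=timeout) as response:
        return response.read()
```

A real deployment would wrap `fetch_image` in a small web endpoint and pass through the upstream Content-Type. The allow-list matters because an open proxy is an invitation for misuse.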
The date is spread through two separate text nodes, so we get the content of their wrapper and use normalize-space to tidy the whitespace up. The date format then looks like "Wednesday, March 29, 2023", which we can parse using a custom date/time format string.
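The same two steps can be sketched in Python, which is handy for checking a format against sample text. Python's strptime directives are used here, not the notation FreshRSS expects: assuming that field takes PHP-style date formats, the equivalent would presumably be something like `l, F j, Y`.

```python
from datetime import date, datetime


def parse_far_side_date(raw: str) -> date:
    """Tidy whitespace the way XPath's normalize-space does, then parse."""
    # Collapse runs of whitespace (including newlines between the two text
    # nodes) into single spaces and trim both ends.
    tidy = " ".join(raw.split())
    # %A = full weekday name, %B = full month name, %d = day, %Y = year.
    return datetime.strptime(tidy, "%A, %B %d, %Y").date()


print(parse_far_side_date("  Wednesday,\n      March 29, 2023  "))
```

Weekday and month names are matched according to the current locale, so this assumes an English (or default C) locale.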
I promise I'll stop writing about how awesome FreshRSS + XPath is someday. Today isn't that day.
Meanwhile: if you used to use a feed reader but gave up when the Web started to become hostile to them and big social media systems started to wall you in, you should really consider picking one up again. The stuff I write about is complex edge cases that most folks don't need to think about in order to benefit from RSS... but it's super convenient to have the things you care about online (news, blogs, social media, videos, newsletters, comics, search trends...) collated and sorted for you... without interference from algorithms that want to push "sticky" content, without invasive tracking or advertisements (or cookie banners or privacy popups), without something "disappearing" simply because you put off reading it for a few days.
My blog post: Subscribing to Forward using FreshRSS's XPath scraping
My blog post about subscribing to The Far Side "Daily Dose" in FreshRSS using XPath scraping
My PHP proxy script for adding Referer: headers to bypass anti-hotlinking code