2020-12-31T08:32
A few days ago I created a mirror of The Guardian.
Between 2016 and 2018 I used an HTC Desire phone running a very old version of Android that didn't support SNI¹. This was while Let's Encrypt was giving away certs and the whole HTTPS everywhere thing was in full swing. Needless to say, the mobile web wasn't a fun place for me, so I mostly avoided it. During that time I discovered CNN Lite, which became my main source of news, mostly because it was the only news site accessible to me. Lately I've been reading CNN a lot more again because it's mirrored on Gemini News². What was happening in the US back when I was using that old HTC was pretty interesting, but now that the Trump circus is packing up, I would prefer to read local news.
As I've become more involved in Gemini, I'm coming here instead of the web to kill time, but there are still a few sites that drag me back to the web: Hacker News and The Guardian.
I had an idea that making a fast mirror of The Guardian that had clean, consistent and complete articles wouldn't be hard and that I might even prefer it over The Guardian website. I would send the RSS feed through one of the readability³ tools like rdrview⁴ then convert it to Gemtext. After making some indexes, the job would be done.
A few hours later, with few surprises I pretty much had it made. Parsing feed.xml with xidel or pup was too awkward, so I ported it to Ruby (unbelievably for someone around here, I haven't got around to learning Go yet).
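To illustrate the "making some indexes" step, here is a rough sketch of rendering an index page as Gemtext. It is not the code from the repository: the `Article` struct, its field names, the `gemtext_index` method and the `/articles/...` paths are all made up for the example, and the feed parsing and rdrview steps are assumed to have already happened. The only thing taken as given is the Gemtext link-line format, `=> URL label`.

```ruby
# Hypothetical record for one mirrored article; the field names are
# assumptions for this sketch, not taken from the real project.
Article = Struct.new(:title, :path, :published)

# Render a Gemtext index: a heading, a blank line, then one link line
# per article, newest first. Gemtext link lines look like "=> URL label".
def gemtext_index(heading, articles)
  lines = ["# #{heading}", ""]
  articles.sort_by(&:published).reverse_each do |article|
    stamp = article.published.utc.strftime("%Y-%m-%d %H:%M")
    # Collapse whitespace so a title can never spill over several lines.
    title = article.title.strip.gsub(/\s+/, " ")
    lines << "=> #{article.path} #{stamp} #{title}"
  end
  lines.join("\n") + "\n"
end

# Example data, invented for illustration.
articles = [
  Article.new("First story", "/articles/first.gmi", Time.utc(2020, 12, 30, 9, 0)),
  Article.new("Second  story", "/articles/second.gmi", Time.utc(2020, 12, 31, 8, 0)),
]

puts gemtext_index("Latest articles", articles)
```

Sorting by publication time and putting the timestamp in the label keeps the index readable in any Gemini client, since clients show link labels as plain text.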
As is always the way, it isn't until I've satisfied my curiosity that I stop thinking about whether it *can* be made and begin to really consider *what* I've made.
The biggest thing now on my mind is whether it's in the spirit of Gemini. Are 30-minute updates too frequent? Is it too close to a never-ending stream that encourages unhealthy consumption of news? I don't know. Tell me if you have opinions.
And I've also worked out my position on how it should be used:
Should I provide an RSS/Atom feed to the articles? Nope, subscribe to the official RSS feeds.
Should I format the links so that the page is subscribable⁵? Nope, subscribe to the official RSS feeds.
How long should the articles be retained? 3 days, but may require further tuning.
Should people link to articles hosted here? Nope, the article will be gone in a few days, and I won't change the policy. I would prefer you to link to www.theguardian.com or archive.org instead. They have a better track record for running a service than shit.cx.
Will I allow robots to index the articles? Nope, the articles will likely be gone before anyone finds them.
Is it wrong to mirror The Guardian? That's a hard one. They claim (on the page that encourages you to sign in, which I can't find right now) that their content will always be accessible, even to those who cannot pay. I donate to them, but refuse to use their app or sign in. I think that allowing myself and perhaps others to be more engaged in their content meets their goal of creating an informed public.
What's wrong with just subscribing to the RSS feed using an RSS reader? Nothing. I should be using RSS readers more than I do.
Can I see the code? Yep, on Sourcehut⁶ and I welcome contributions.
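The 3-day retention policy above could be enforced with something like the following. This is only a sketch under assumptions, not the project's actual cleanup code: it assumes articles are stored as `.gmi` files in one directory and that file modification time is a good enough proxy for article age. The names `prune_articles` and `RETENTION_SECONDS` are invented for the example.

```ruby
require "fileutils"
require "tmpdir"

# Three days, matching the retention policy described in the post.
RETENTION_SECONDS = 3 * 24 * 60 * 60

# Delete every *.gmi file in dir whose mtime is older than max_age
# seconds relative to now. Returns the list of removed paths.
def prune_articles(dir, now: Time.now, max_age: RETENTION_SECONDS)
  expired = Dir.glob(File.join(dir, "*.gmi")).select do |path|
    now - File.mtime(path) > max_age
  end
  FileUtils.rm_f(expired)
  expired
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  old_path = File.join(dir, "old.gmi")
  new_path = File.join(dir, "new.gmi")
  File.write(old_path, "# An old article\n")
  File.write(new_path, "# A fresh article\n")

  # Backdate one file by four days so it falls outside the window.
  four_days_ago = Time.now - 4 * 24 * 60 * 60
  File.utime(four_days_ago, four_days_ago, old_path)

  removed = prune_articles(dir)
  puts "removed: #{removed.map { |p| File.basename(p) }.join(', ')}"
end
```

Passing `now:` and `max_age:` as keyword arguments makes the window easy to tune later, which fits the "may require further tuning" caveat.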
See it at gemini://guardian.shit.cx/
⁶ guardian-mirror on Sourcehut
Oh, and happy new year everyone. I would be out partying, only NYE is my least favourite day of the year and I don't like parties. I'll chill out at home, look after a sleeping toddler and go to bed at my 9pm bedtime. Hmm. It seems that is just 8 minutes away. I'd better hurry.
---
The content for this site is CC-BY-SA-4.0.