BBC News... without the crap

2024-03-09

Did I mention recently that I love RSS? That it brings me great joy? That I start and finish almost every day in my feed reader? Probably.

My very recent blog post about how RSS is better than ActivityPub

My blog post about using RSS for joy, and not pursuing "RSS Zero"

My 2021 blog note about starting and ending my days in FreshRSS

I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn't care about. So I wrote a script that downloaded it, stripped sports news, and re-exported the feed for me to subscribe to. Magic.

Screenshot of what annoys me

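But lately - presumably as a result of technical changes at the Beeb's side - this feed has found two fresh ways to annoy me: it "republishes" stories under guids suffixed with #0, #1, #2 etc, which results in duplicates in feed readers, and it now includes links to iPlayer and BBC Sounds content.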

Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let's look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'

# # Sample crontab:
# # Every 20 minutes, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
 source 'https://rubygems.org'
 gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Load and filter the original RSS
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each { |g| g.content = g.content.gsub(/#.*$/, '') }

File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }

It's amazing what you can do with Nokogiri and a half dozen lines of Ruby.

That revised script removes from the feed anything whose <guid> suggests it's sports news or from BBC Sounds or iPlayer, and also strips any "anchor" part of the <guid> before re-exporting the feed. Much better.
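To see both rules in isolation, here's a minimal sketch that applies them to a handful of made-up guid strings rather than to a live feed. The sample guids and the clean_guids helper are assumptions for illustration and aren't part of the script above; only the two regular expressions are taken from it.

```ruby
# Same pattern as the script: reject sport, iPlayer and Sounds URLs.
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Hypothetical helper (not in the original script): drop rejected guids,
# then strip any "#..." anchor from the ones that remain.
def clean_guids(guids)
  guids
    .reject { |guid| guid =~ REJECT_GUIDS_MATCHING }
    .map { |guid| guid.gsub(/#.*$/, '') }
end

# Made-up guids, for illustration only.
sample = [
  'https://www.bbc.co.uk/news/uk-00000001#0',
  'https://www.bbc.co.uk/news/uk-00000001#1',
  'https://www.bbc.co.uk/sport/football/00000002#0',
  'https://www.bbc.co.uk/iplayer/episode/x0000000',
  'https://www.bbc.co.uk/sounds/play/y0000000'
]

puts clean_guids(sample)
```

Only the two news guids survive, and after stripping they're identical. The sketch doesn't remove either of them: it just normalises the guid, which is what lets a feed reader recognise a "republished" story as one it has already seen.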

You're free to take and adapt the script to your own needs, or - if you don't mind being tied to my opinions about what should be in BBC News' RSS feed - just subscribe to my copy, linked below.
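If you do adapt it, one possible approach is to build the rejection pattern from a list of section names instead of editing the regular expression by hand. This is only a sketch: the reject_pattern_for helper is hypothetical, and "weather" is just an example of a section you might want to drop, not something the script above filters.

```ruby
# Hypothetical helper (not in the original script): build a guid-rejecting
# pattern from a list of top-level site sections.
def reject_pattern_for(sections)
  # Regexp.union escapes each string and joins them as alternatives.
  alternatives = Regexp.union(sections)
  /\Ahttps:\/\/www\.bbc\.co\.uk\/#{alternatives}\//
end

# The three sections from the script, plus an assumed extra one.
pattern = reject_pattern_for(%w[sport iplayer sounds weather])

puts 'rejected' if 'https://www.bbc.co.uk/weather/0000000' =~ pattern
```

The resulting pattern could stand in for REJECT_GUIDS_MATCHING in the script. Because it still requires a slash after the section name, a guid under a path that merely starts with the same letters isn't caught.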

Links

My earlier blog post about scripting-out sport from BBC News' RSS feed

Script on GitHub

https://fox.q-t-a.uk/bbc-news-no-sport.xml - my filtered RSS feed of what I think BBC News should look like