💾 Archived View for gemi.dev › gemlog › 2022-08-20-adding-aggregators.gmi captured on 2023-04-19 at 23:16:14. Gemini links have been rewritten to link to archived content

News Aggregators and 🧇 NewsWaffle

2022-08-22 | #newswaffle | @Acidus

I added a news aggregator to 🧇 NewsWaffle, powered by Yahoo News. You can access it here:

Current News via NewsWaffle (Powered by Yahoo News)

Stories are automatically sorted into categories, and you can switch categories.

Switch category

"News Aggregator"

The term "news aggregator" is a little overloaded. People use it to mean anything from an RSS reader, to Google News, to something like Pocket or Instapaper. What I mean by "news aggregator" is a service that takes news articles from multiple sources, combines them, and automatically sorts them into sections by category or topic. This sorting is usually done algorithmically, meaning that stories from the BBC may appear under "World" or "Technology", depending on the contents of the story.

Advanced news aggregators go beyond just grouping by category, and are able to determine when multiple news articles are about the same event. Google News is the best example of this. It will take a news story, such as the upcoming premiere of 'House of the Dragon', and group together 5 news stories, all from different sources, all related to that specific event:

'House Of The Dragon': HBO's Largest Marketing Push Ever Valued At $100M+ Tentpole Proportions - Deadline
1. 'House Of The Dragon': HBO's Largest Marketing Push Ever Valued At $100M+ Tentpole Proportions - Deadline
2. House of the Dragon: Series Premiere Review - IGN
3. House of the Dragon is a grim remix of an already gruesome song you've heard before - The Verge
4. 'House of the Dragon' Review: Domesticating 'Game of Thrones' - The New York Times
5. House of the Dragon Review: HBO's Game of Thrones Prequel Flies High, But Does Take a While to Catch Fire - TVLine

Why?

I'm really happy with NewsWaffle. It's become the primary way I read news, whether on my phone, laptop, or desktop. I've heard from many people, via email or Station, that it's one of their favorite things on Gemini. Thank you all for these kind words.

The primary thing I did miss was having news sorted into categories. With NewsWaffle, if I wanted to read technology stories, I would have to read a news site that was specific to that category, such as Wired, or The Verge, or Six Colors. If I wasn't interested in the stories on one of these sites, I'd have to click into another. If I wanted to read World news, I would need to read a news site that focused on that type of reporting. While I have some broad categories of news sites on the home page of NewsWaffle, there really was no way to say "show me all the technology news articles."

I could just solve this problem by using an RSS reader. However: 1) I didn't want to install a new app, 2) I really like NewsWaffle's streamlined reading interface, and 3) I wanted to be able to access this from anywhere I had a Gemini client.

So while I still read specific sites like Al Jazeera, the New York Times, Apple Insider, and Hacker News via NewsWaffle, now I can dip into a news aggregator to more easily read specific categories of news without leaving the site.

Why Yahoo News?

Why not build this using Google News? After all, Google News is, by far, the best news aggregator out there.

In fact, many of my favorite smol web news sites, like 68k.news, are just wrappers around Google News.

68k News

The answer is I'm trying hard to remove Google services from my personal life. The biggest web-based news aggregators I know besides Google News are Bing News and Yahoo News.

Neither Bing News nor Yahoo News includes a link to an RSS or Atom feed in their markup, or on their site. So just finding the URLs to the RSS feeds is a challenge. The URLs for Bing's RSS/Atom feeds that you can find online no longer work. Yahoo's feeds are a little more discoverable. Unfortunately, some of the category RSS feeds seem abandoned. For example, "Sports" is not a topic available via the news aggregator, because the Yahoo News Sports RSS feed is full of stories from 3+ years ago!
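
Spotting an abandoned feed like that can be automated. The following is only a rough sketch, not how NewsWaffle does it: it assumes an RSS 2.0 document whose items carry a standard pubDate, and the function names and the 7 day cutoff are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def newest_item_date(rss_xml):
    """Return the most recent item <pubDate> in an RSS 2.0 document, or None."""
    root = ET.fromstring(rss_xml)
    dates = []
    for item in root.iter("item"):
        raw = item.findtext("pubDate")
        if not raw:
            continue  # item has no date at all
        try:
            parsed = parsedate_to_datetime(raw.strip())
        except (TypeError, ValueError):
            continue  # skip malformed dates
        if parsed.tzinfo is None:
            # Assumption: treat dates without a usable offset as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        dates.append(parsed)
    return max(dates) if dates else None


def is_stale(rss_xml, now, max_age=timedelta(days=7)):
    """A feed is stale if it has no dated items or its newest item is too old."""
    newest = newest_item_date(rss_xml)
    return newest is None or now - newest > max_age
```

Calling is_stale(feed_text, datetime.now(timezone.utc)) on each category feed would flag something like the Sports feed before it ever shows up as a category.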

The good news is that for the most part, Yahoo News does have updated RSS feeds for the categories I do care about. And despite the gross, fluffy pop nonsense content I think of when I think of Yahoo's media content, their news aggregation actually has pretty good stuff. So Yahoo News kind of wins by default, and luckily it's pretty good too.

Future Work

Yahoo News doesn't have aggregation at the article level like Google News does, and that kind of grouping is something I think is really cool. So there are definitely fun areas to hack on here. I started looking at some papers on algorithmic classification of news content. Some are quite complex, leveraging trained machine learning models. I wonder how far something like a modified version of a string distance algorithm like Levenshtein would go for article grouping. Seems like something fun.
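
To make the idea concrete, here is one possible shape for it. This is a sketch under assumptions, not something NewsWaffle does today: the "modification" I picked is computing Levenshtein distance over lowercase words instead of characters, the grouping is a naive greedy pass, and the 0.5 threshold is an arbitrary guess.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(title_a, title_b):
    """Similarity in [0, 1], based on edit distance over lowercase words."""
    words_a = title_a.lower().split()
    words_b = title_b.lower().split()
    longest = max(len(words_a), len(words_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(words_a, words_b) / longest


def group_headlines(headlines, threshold=0.5):
    """Greedily add each headline to the first group whose first member is similar enough."""
    groups = []
    for headline in headlines:
        for group in groups:
            if similarity(headline, group[0]) >= threshold:
                group.append(headline)
                break
        else:
            groups.append([headline])
    return groups
```

Headlines that are near-copies of each other would group well with this. Headlines that cover the same event in completely different words, like the 'House of the Dragon' reviews above, probably would not, which is likely where the heavier approaches from the papers come in.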

More broadly, I wonder how difficult it would be to build a good enough news aggregator. NewsWaffle already has a way to extract links to news stories from arbitrary HTML, as well as discover RSS feeds. So getting a firehose of current news stories from a variety of sources isn't an issue.
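
For anyone curious what feed discovery usually looks like, the common convention is a link tag in the page head with rel="alternate" and an RSS or Atom MIME type. The sketch below shows that general technique using only the Python standard library. It is not NewsWaffle's actual code, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value arrive as None, so normalize to "".
        attributes = {name: (value or "") for name, value in attrs}
        rel = attributes.get("rel", "").lower().split()
        mime = attributes.get("type", "").lower().strip()
        href = attributes.get("href", "")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return absolute feed URLs advertised in the page markup, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

A page that advertises nothing, which is what Bing News and Yahoo News did when I looked, simply yields an empty list, and you are back to hunting for feed URLs by hand.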

Levenshtein Distance on Gemipedia