RSS-proxy: create an RSS/ATOM or JSON feed of almost any website

Author: ingve

Score: 198

Comments: 25

Date: 2021-11-28 15:39:40

________________________________________________________________________________

smbv wrote at 2021-11-28 19:23:43:

As a big fan of RSS (I read HN through hnrss[0]), this is wonderful. It seems to be a nice catch-all for whatever websites rss-bridge[1] doesn't support.

If any of you are looking into RSS, check out FreshRSS[2] (an RSS aggregator). A mobile client I especially like is NetNewsWire[3] (for Apple products only, but imo amazing).

[0] https://hnrss.github.io/

[1] https://github.com/RSS-Bridge/rss-bridge

[2] https://www.freshrss.org/

[3] https://netnewswire.com/

phil294 wrote at 2021-11-28 19:54:39:

To elaborate on this, RSS-Bridge maintains custom extraction rules for many sites [1], but RSS-proxy attempts to do that for _any_ site using some pretty nifty logic [2]. I tried it on a few pages and it seems to do its job accurately, if the HTML is good enough.

Semantic HTML5 can be so helpful, especially with regard to tools like this one, crawlers, and accessibility in general. Unfortunately, its syntax is not enforced by browsers (for good reasons), so many developers never leverage it, and we end up with many custom-built sites falling back to <div>s everywhere, JS click handlers on non-interactive elements, etc. I really wish modern framework docs and guides would point out accessibility concerns more strongly.

[1] https://github.com/RSS-Bridge/rss-bridge/tree/master/bridges

[2] https://github.com/damoeb/rss-proxy/blob/master/packages/cor...
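RSS-proxy's actual rules are in the file linked as [2] and are not reproduced here. The sketch below shows one common heuristic for "div soup" pages that lack semantic markup. It assumes the feed items are the links that share the most frequently repeated tag/class ancestor path. `LinkPathCollector` and `guess_items` are hypothetical names, and the code uses only Python's standard `html.parser`.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkPathCollector(HTMLParser):
    """Records every <a href> together with the tag/class path leading to it."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open elements as "tag.firstclass" strings
        self.links = []       # (path, href, text) tuples
        self._current = None  # [path, href, text parts] while inside an <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Path is taken before pushing <a>, so it describes the ancestors only.
            self._current = [tuple(self.stack), attrs["href"], []]
        if tag not in VOID_TAGS:
            cls = (attrs.get("class") or "").split()
            # Only the first class is kept, to tolerate modifier classes.
            self.stack.append(tag + ("." + cls[0] if cls else ""))

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            path, href, parts = self._current
            self.links.append((path, href, " ".join("".join(parts).split())))
            self._current = None
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)


def guess_items(html):
    """Return (title, href) pairs from the most repeated link path."""
    parser = LinkPathCollector()
    parser.feed(html)
    groups = defaultdict(list)
    for path, href, text in parser.links:
        groups[path].append((text, href))
    if not groups:
        return []
    return max(groups.values(), key=len)
```

On a page whose navigation has two links and whose post list has three, the post links win because their shared path (`div.posts > div.post > h2`) occurs most often. Real pages need more signals, such as text length or link position, which is probably why this only works "if the HTML is good enough."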

stanislavb wrote at 2021-11-28 19:42:44:

Thanks! RSSHub is another option we could add to the mix:

https://github.com/DIYgod/RSSHub

Also, all projects mentioned in this thread can be tracked and easily compared on LibHunt:

https://www.libhunt.com/posts/500676-rss-proxy-create-an-rss...

mderazon wrote at 2021-11-28 19:45:49:

NetNewsWire looks great! Any similar recommendations for Android?

lelandfe wrote at 2021-11-28 21:34:57:

Seconding NetNewsWire. A recent version added iCloud sync, so your feeds and unread statuses sync across the desktop and mobile versions. Somehow it's free.

rambambram wrote at 2021-11-28 19:28:09:

Ah, I like FreshRSS, based on what I can see from it. It's written in PHP and free.

ss2f wrote at 2021-11-29 02:40:43:

Monitoring public Facebook pages? A very painful process. Facebook requires a login to view public pages, and most public offices, such as governments, post content exclusively on Facebook. As things stand, you have to have a Facebook account to access government communication, which is plain sad.

I can't think of any good tool for accessing Facebook pages. If you find one (RSSHub), it breaks within a week, as soon as Facebook changes its API endpoints.

If you have one, let me know.

smusamashah wrote at 2021-11-30 03:21:16:

It doesn't work well everywhere, and there isn't a way to customize its selection of divs; see, e.g., this thread itself:

https://rssproxy-v1.migor.org/?url=https:%2F%2Fnews.ycombina...

xrd wrote at 2021-11-29 02:03:24:

Great idea, but it seems it does not work for basic static sites like those generated by SvelteKit (http://svekyll.com generated this site: https://webiphany.com). That's too bad.

pedro1976 wrote at 2021-11-29 08:28:06:

Author here, thanks for those tough nuts :) I am working on a successor to this POC called rich-rss [0], which contains a full rewrite of rss-proxy in a strongly typed language.

[0] https://github.com/damoeb/rich-rss

lapinot wrote at 2021-11-29 08:48:11:

Well, that site is a bunch of divs. AFAIK the tool leverages semantic HTML (section, article, etc.).
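To show why semantic markup helps, here is a minimal sketch of feed extraction from a page that does use it. It is not RSS-proxy's code. It assumes each feed item is a top-level `<article>` element, that the item's first heading is the title, and that the first link is the URL. `ArticleExtractor` and `extract_articles` are hypothetical names.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleExtractor(HTMLParser):
    """Turns each top-level <article> into {"title": ..., "link": ...}."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished items, in document order
        self._item = None    # item under construction while inside <article>
        self._depth = 0      # nesting depth of <article> elements
        self._heading = 0    # 0 = no heading yet, 1 = inside first heading, 2 = done

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._depth += 1
            if self._depth == 1:
                self._item = {"title": "", "link": None}
                self._heading = 0
            return
        if self._item is None:
            return  # outside any <article>: ignore
        if tag in HEADINGS and self._heading == 0:
            self._heading = 1
        if tag == "a" and self._item["link"] is None:
            self._item["link"] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._heading == 1:
            self._heading = 2
        elif tag == "article" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace collected from the heading's text nodes.
                self._item["title"] = " ".join(self._item["title"].split())
                self.items.append(self._item)
                self._item = None

    def handle_data(self, data):
        if self._item is not None and self._heading == 1:
            self._item["title"] += data


def extract_articles(html):
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

A page built from bare `<div>`s gives this parser nothing to anchor on, which matches the failure reported above.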

davidpfarrell wrote at 2021-11-30 00:38:14:

Throwing https://heartfeedrss.wordpress.com/usage/ into the ring, as it's been my go-to RSS reader since LinkedIn's Pulse was decommissioned.

I really dig the grid-based approach for both increased content density and easy cross-feed navigation.

thirdplace_ wrote at 2021-11-28 20:11:10:

Good work there. Imagine if we had a universal feed generator that could generate a feed out of everything. No more manual checking labor.

acidburnNSA wrote at 2021-11-28 20:26:49:

RSS Hub also attempts to be exactly that:

https://docs.rsshub.app/en/

mmahemoff wrote at 2021-11-29 00:16:29:

Interesting idea, but the fact that so many of the feeds are broken probably indicates how difficult this is to maintain. Ideally this would be done with algorithms and machine learning rather than custom parsing for each site (not unlike what search engines already do internally).

smbv wrote at 2021-11-28 20:25:51:

RSS saved me the "right click on a bookmark folder -> open all" every hour or so. Does RSS increase my productivity or decrease it? On one hand, I'm not obsessively checking a bunch of websites. On the other hand, I have a firehose of content directed at my face with hundreds of feeds.

walterbell wrote at 2021-11-29 04:34:36:

Next step is local analytics to sort articles based on local-only rules/analytics/goals.
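One way "local-only rules" could work is sketched below. Every article is scored against a small keyword table that lives on your own machine, and the list is sorted by score. The rule weights, the per-feed bonus, and the `Item`, `score`, and `rank` names are all hypothetical. Plain substring matching is a deliberate simplification; for example, "rust" also matches "trust".

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    feed: str


# Hypothetical local rules: lowercase substring -> weight. Nothing leaves the machine.
RULES = {"rust": 3, "rss": 2, "crypto": -5}
FEED_BONUS = {"hnrss": 1}


def score(item, rules=RULES, feed_bonus=FEED_BONUS):
    """Sum the weights of all matching keywords, plus a per-feed bonus."""
    text = item.title.lower()
    total = sum(weight for keyword, weight in rules.items() if keyword in text)
    return total + feed_bonus.get(item.feed, 0)


def rank(items):
    # sorted() is stable even with reverse=True, so ties keep their feed order.
    return sorted(items, key=score, reverse=True)
```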

lostsoul8282 wrote at 2021-11-28 23:48:46:

This is great. Many scrapers require constant monitoring, which is a bit heavy for the scraping process. I can see this tool making it easy to monitor for changes fairly quickly. Great work!
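Once a page is exposed as a feed, change monitoring can shrink to "poll the feed and keep the items you haven't seen." The sketch below assumes each item is a dict with optional `id`, `link`, and `title` keys. When no `id` (guid) is present, it falls back to a hash of link and title. The function names are hypothetical. Persisting `seen` between runs, for example to a file, is left out.

```python
import hashlib


def item_key(item):
    """Stable key: the feed's guid/id if present, else a hash of link + title."""
    if item.get("id"):
        return item["id"]
    raw = (item.get("link", "") + "\n" + item.get("title", "")).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def new_items(items, seen):
    """Return items not seen in earlier polls, adding their keys to `seen`."""
    fresh = []
    for item in items:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```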

thedougd wrote at 2021-11-29 02:24:47:

A replacement for Yahoo Pipes!

In the early Android days, I wrote a few apps that relied on web site scraping to provide their data. Even back then, these websites evaded scraping. Yahoo Pipes to the rescue. You'd set it up to scrape their site and provide a well-formed RSS feed of the data. Since these sites saw Yahoo connecting, they typically didn't block it, presumably because they thought it was search crawling.

eric4smith wrote at 2021-11-29 01:28:10:

A good way to find new pages on a site is to use the sitemap, in XML or text format.

Then look for Open Graph tags in each page for titles, descriptions, canonical URLs and images.

Most modern websites have all of those these days, and they are updated automatically.
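The two steps above can be sketched with Python's standard library. This version assumes the XML sitemap follows the sitemaps.org 0.9 schema and that all `lastmod` values use the same ISO 8601 format, so string sorting orders them by date. Fetching pages is left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return (loc, lastmod) pairs from a sitemap.xml, newest first."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        entries.append((loc, lastmod))
    # Assumes one consistent ISO 8601 format, so string order matches date order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def text_sitemap_urls(text):
    """A text sitemap is simply one absolute URL per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


class MetaExtractor(HTMLParser):
    """Collects og:* meta properties and the canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            # Keep the first value if a property is repeated.
            self.og.setdefault(prop[3:], attrs.get("content"))
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")


def page_metadata(html):
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "title": parser.og.get("title"),
        "description": parser.og.get("description"),
        "image": parser.og.get("image"),
        "url": parser.canonical or parser.og.get("url"),
    }
```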

Making an RSS feed reader is thus easier and less prone to errors from analyzing the markup of pages.
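The final step, serializing the collected metadata as a feed, is also small. Here is a minimal RSS 2.0 writer using `xml.etree.ElementTree`. It assumes each item is a dict with `url` and optional `title`, `description`, and timezone-aware `published` keys. `build_rss` and that dict shape are hypothetical, and the channel description text is a placeholder.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, items):
    """Serialize items (dicts with url/title/description/published) as RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Generated from " + channel_link
    for it in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = it.get("title") or it["url"]
        ET.SubElement(item, "link").text = it["url"]
        ET.SubElement(item, "guid", isPermaLink="true").text = it["url"]
        if it.get("description"):
            ET.SubElement(item, "description").text = it["description"]
        if it.get("published"):
            # RSS 2.0 uses RFC 822 dates, e.g. "Sun, 28 Nov 2021 00:00:00 +0000".
            ET.SubElement(item, "pubDate").text = format_datetime(it["published"])
    # ElementTree escapes &, < and > in text for us.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss("Example", "https://example.com", [
        {"title": "Hello", "url": "https://example.com/new",
         "published": datetime(2021, 11, 28, tzinfo=timezone.utc)},
    ]))
```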

Neat product though.

busymom0 wrote at 2021-11-29 00:27:23:

This is quite useful, especially the self host option.

What type of websites don't work?

1vuio0pswjnm7 wrote at 2021-11-28 21:05:11:

"... almost any website."

What are the domain names of some (dynamic) websites where this will not work?

marginalia_nu wrote at 2021-11-28 21:55:48:

Only guessing, but

https://oeis.org/

shellmachine wrote at 2021-11-29 01:26:50:

Great tool, something I didn't know I needed in my life. Thanks.

dirtyid wrote at 2021-11-29 05:14:33:

RSS scrapers I want: something that consolidates Twitter threads into articles, like threadreaderapp does.