2020-01-20 How to deal with weird stuff in feed aggregation

Planet Jupiter is entering the phase where all the features I want have been implemented and now I’m only dealing with weird stuff and workarounds. I’ve seen blog posts containing double-escaped HTML. I unescape the first layer and turn the post into plain text (adding pilcrow signs for paragraph breaks) and so on. But if somebody sends double-escaped HTML, that still leaves escaped HTML in the output. I’m guessing this is a bug in the WordPress plugins responsible for their indie web support or something. But what am I going to do: tell them to fix their setup? Or add a workaround that simply tries to unescape the HTML a second time? Guess what I’m doing...
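To make the workaround concrete, here is a minimal sketch in Python. It is not Planet Jupiter's actual code: the function name `unescape_layers` and the `max_layers` parameter are made up for illustration. It unescapes HTML entities a second time only if the first pass changed something, and stops as soon as a pass has no effect.

```python
import html


def unescape_layers(text: str, max_layers: int = 2) -> str:
    """Unescape HTML entities up to max_layers times.

    Hypothetical helper: stops early once a pass changes nothing,
    so correctly escaped feeds only get a single effective pass.
    """
    for _ in range(max_layers):
        unescaped = html.unescape(text)
        if unescaped == text:
            break  # nothing left to unescape
        text = unescaped
    return text


# A double-escaped paragraph as it might appear in a broken feed.
broken = "&amp;lt;p&amp;gt;Hello&amp;lt;/p&amp;gt;"
print(unescape_layers(broken))  # <p>Hello</p>
```

The obvious downside of this kind of workaround: a post that legitimately talks about escaped HTML (and therefore contains text like `&amp;lt;` on purpose) gets unescaped once too often.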

I had to consider other things as well: what about elements that aren’t rendered by browsers? They also shouldn’t be “rendered” by the feed aggregator. One feed, for example, contains a style element. The style element contains CSS instructions, and these should not end up in the page excerpt. Anything else I need to remove from the excerpt?
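One way to sketch this filtering, again in Python and again not the aggregator's real implementation: walk the HTML with the standard library's `html.parser.HTMLParser`, drop all text inside elements that browsers don't display, and insert a pilcrow between paragraphs. The set `SKIPPED` is an assumption; `script` and `template` are guesses at what else might need removing besides `style`, and the names `ExcerptParser` and `excerpt` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: elements whose content should never reach the excerpt.
SKIPPED = {"style", "script", "template"}


class ExcerptParser(HTMLParser):
    """Collect visible text, marking paragraph breaks with a pilcrow."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0    # > 0 while inside a skipped element
        self.has_text = False  # True once any visible text was seen

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag == "p" and self.has_text and self.skip_depth == 0:
            self.parts.append(" ¶ ")

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)
            if data.strip():
                self.has_text = True


def excerpt(html_text: str) -> str:
    """Return the plain-text excerpt with whitespace collapsed."""
    parser = ExcerptParser()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())


post = "<style>p { color: red }</style><p>One</p><p>Two</p>"
print(excerpt(post))  # One ¶ Two
```

Keeping a depth counter instead of a simple flag means nested markup inside a skipped element (a `template` with paragraphs in it, say) stays hidden until the outer element closes.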

​#Blogs ​#Jupiter

Comments

(Please contact me if you want to remove your comment.)

Blocking CSS (and more) is easy if you use CSP: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy

– Ashwin 2020-04-21 00:40 UTC

---

Good point.

– Alex Schroeder 2020-04-21 06:05 UTC