A few days ago I came across JSON Feed [3], a new syndication feed format along the lines of RSS (Really Simple Syndication) [1] or Atom [2]:
We — Manton Reece and Brent Simmons — have noticed that JSON (JavaScript Object Notation) has become the developers’ choice for APIs (Application Programming Interfaces), and that developers will often go out of their way to avoid XML (eXtensible Markup Language). JSON is simpler to read and write, and it’s less prone to bugs.
So we developed JSON Feed, a format similar to RSS (Really Simple Syndication) and Atom but in JSON. It reflects the lessons learned from our years of work reading and publishing feeds.
See the spec. It’s at version 1, which may be the only version ever needed. If future versions are needed, version 1 feeds will still be valid feeds.
“JSON Feed: Home [4]”
It's not like I need another syndication format, and it's still unclear just how popular JSON Feed really is, but hey, I thought, it should be pretty easy to add this. It looks simple enough:
{
  "version": "https://jsonfeed.org/version/1",
  "title": "My Example Feed",
  "home_page_url": "https://example.org/",
  "feed_url": "https://example.org/feed.json",
  "items": [
    {
      "id": "2",
      "content_text": "This is a second item.",
      "url": "https://example.org/second-item"
    },
    {
      "id": "1",
      "content_html": "<p>Hello, world!</p>",
      "url": "https://example.org/initial-post"
    }
  ]
}
I just need to add another entry to the template section of the configuration file [5], create a few template files, and as they say in England, “the brother of your mother is Robert [6]” (how they know my mother's brother is Robert, I don't know; the English are weird [7] like that).
But the issue is filling in the content_text field. The first issue—JSON is encoded using UTF-8 [8]. For me, that's not an issue, as I'm using UTF-8 (and even before I switched to using UTF-8, I was using ASCII (American Standard Code for Information Interchange) [9], which is valid UTF-8 by design). But in theory, someone could be using mod_blog [10] with some other encoding scheme, which means an invalid JSON Feed unless fed through a character set conversion routine, which I don't support in mod_blog.
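For the one encoding named later in this post, ISO-8859-1, such a conversion routine is small, because every ISO-8859-1 byte maps directly to the Unicode code point of the same value. The following is only a sketch under that assumption; `latin1_to_utf8` is a hypothetical name, not a function in mod_blog, and any other source encoding would need lookup tables or a library such as POSIX `iconv`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of mod_blog): convert a NUL-terminated
 * ISO-8859-1 string to UTF-8.  Returns the number of bytes written, not
 * counting the terminating NUL, or (size_t)-1 if dst is too small.
 */
size_t latin1_to_utf8(char *dst, size_t dstsize, const char *src)
{
  size_t used = 0;

  assert(dst != NULL);
  assert(src != NULL);

  for ( ; *src != '\0' ; src++)
  {
    unsigned char c = (unsigned char)*src;

    if (c < 0x80)
    {
      /* ASCII is already valid UTF-8; copy the byte unchanged */
      if (used + 1 >= dstsize) return (size_t)-1;
      dst[used++] = (char)c;
    }
    else
    {
      /* 0x80..0xFF become a two-byte sequence: 110000xx 10xxxxxx */
      if (used + 2 >= dstsize) return (size_t)-1;
      dst[used++] = (char)(0xC0 | (c >> 6));
      dst[used++] = (char)(0x80 | (c & 0x3F));
    }
  }

  if (used >= dstsize) return (size_t)-1;
  dst[used] = '\0';
  return used;
}
```

Given the four bytes of "café" in ISO-8859-1 (the last being 0xE9), this writes five bytes, ending in 0xC3 0xA9.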
But even assuming I did, that still doesn't mean I'm out of the woods.
Suppose this was my content:
<p>"Hello," said the politician, lying.</p>

<p>"Back up!" I said, using my left hand to quickly cover my wallet in my back pocket. "You aren't getting any money from me!"</p>
If you check the syntax of JSON [11], you'll see that the double quote character " needs to be converted to \". A similar transformation is required for the blank line between the two paragraphs: each newline has to be converted to \n. And I have no code written in mod_blog for such conversions.
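As a rough idea of the code involved, here is a minimal sketch of such a conversion in C. The name `json_escape` and its interface are assumptions made for illustration, not anything in mod_blog. It assumes the input is already valid UTF-8 and produces only the body of the JSON string, without the surrounding quotes. Besides the double quote and the newline, the JSON grammar also requires escaping the backslash and the remaining control characters below 0x20.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mod_blog): copy src to dst, escaped for
 * use inside a JSON string.  Returns the number of bytes written, not
 * counting the terminating NUL, or (size_t)-1 if dst is too small.
 * Assumes src is NUL-terminated, valid UTF-8.
 */
size_t json_escape(char *dst, size_t dstsize, const char *src)
{
  static const char hex[] = "0123456789abcdef";
  size_t            used  = 0;

  assert(dst != NULL);
  assert(src != NULL);

  for ( ; *src != '\0' ; src++)
  {
    unsigned char c = (unsigned char)*src;
    char          buf[7];  /* large enough for "\u00XX" plus NUL */
    size_t        len;

    switch(c)
    {
      case '"':  strcpy(buf,"\\\""); break;
      case '\\': strcpy(buf,"\\\\"); break;
      case '\n': strcpy(buf,"\\n");  break;
      case '\r': strcpy(buf,"\\r");  break;
      case '\t': strcpy(buf,"\\t");  break;
      default:
           if (c < 0x20)
           {
             /* other control characters use the \u00XX form */
             buf[0] = '\\';
             buf[1] = 'u';
             buf[2] = '0';
             buf[3] = '0';
             buf[4] = hex[c >> 4];
             buf[5] = hex[c & 0x0F];
             buf[6] = '\0';
           }
           else
           {
             /* everything else, including UTF-8 bytes, passes through */
             buf[0] = (char)c;
             buf[1] = '\0';
           }
           break;
    }

    len = strlen(buf);
    if (used + len >= dstsize)  /* keep room for the final NUL */
      return (size_t)-1;
    memcpy(dst + used,buf,len);
    used += len;
  }

  if (used >= dstsize) return (size_t)-1;
  dst[used] = '\0';
  return used;
}
```

The caller supplies the output buffer, so the routine never allocates. In the worst case, an input consisting entirely of control characters, the output is six times the input length plus one byte for the NUL.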
It's not like it would be that much code to write. When I added support for RSS and Atom, I had to write code. But it irks me that I have to special case a lot of string processing.
Yes, yes, I know: mod_blog is written in C, which is a horrible choice for string processing. But even if I picked a language better suited to the task, I would still have to write code to transform strings from, say, ISO-8859-1 [12] to UTF-8, and code to convert HTML (HyperText Markup Language) to a form of non-HTML:
&lt;p&gt;"Hello," said the politician, lying.&lt;/p&gt;

&lt;p&gt;"Back up!" I said, using my left hand to quickly cover my wallet in my back pocket. "You aren't getting any money from me!"&lt;/p&gt;
(Not to get all meta, but to display the first example HTML, I had to encode it into the non-HTML you see above, and to display the non-HTML you see above, I have to encode the non-HTML into non-non-HTML; in other words, convert the output yet again. So, to show a simple & in this page, I have to encode it as &amp;, and to show that, I have to encode it as &amp;amp;, in ever deepening layers of Inception [13]-like encoding. By the way, that was encoded as &amp;amp;amp;, just for your information.)
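The layering described in that parenthetical can be reproduced by feeding an HTML escaper its own output. The sketch below is illustrative only: `html_escape` is a hypothetical name rather than a mod_blog function, and it handles just the three characters that matter in text content (&, < and >). Attribute values would also need the quote characters escaped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mod_blog): escape &, < and > so the
 * text displays literally in HTML.  Returns the number of bytes written,
 * not counting the terminating NUL, or (size_t)-1 if dst is too small.
 */
size_t html_escape(char *dst, size_t dstsize, const char *src)
{
  size_t used = 0;

  assert(dst != NULL);
  assert(src != NULL);

  for ( ; *src != '\0' ; src++)
  {
    const char *rep;
    size_t      len;

    switch(*src)
    {
      case '&': rep = "&amp;"; break;
      case '<': rep = "&lt;";  break;
      case '>': rep = "&gt;";  break;
      default:  rep = NULL;    break;
    }

    len = (rep != NULL) ? strlen(rep) : 1;
    if (used + len >= dstsize)  /* keep room for the final NUL */
      return (size_t)-1;

    if (rep != NULL)
      memcpy(dst + used,rep,len);
    else
      dst[used] = *src;
    used += len;
  }

  if (used >= dstsize) return (size_t)-1;
  dst[used] = '\0';
  return used;
}
```

Each pass adds one layer. Starting from a single &, one call yields the five characters of an ampersand entity, and every further call prepends another "amp;" to what the previous call produced.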
I spent way too much time trying to generalize a solution, only to ultimately reject the code. I'll probably just add the code I need to support JSON Feed and call it a day, because solving the issue once and for all is just too much work.
[1] https://en.wikipedia.org/wiki/RSS
[2] https://en.wikipedia.org/wiki/Atom_(standard)
[6] https://www.phrases.org.uk/meanings/bobs-your-uncle.html
[7] http://www.montypython.com/
[8] https://en.wikipedia.org/wiki/UTF-8
[9] https://en.wikipedia.org/wiki/ASCII
[10] https://github.com/spc476/mod_blog/
[12] https://en.wikipedia.org/wiki/ISO/IEC_8859-1