My XML sunk cost

I’m often around people who hate XML (because yeah, it is pretty awful, almost as bad as YAML or JSON) and I kinda squirm during the two-minute hate sessions because I’ve spent so much time learning a bunch of XML-related tech.

Which sometimes comes in handy when I wanna do things like this. I just subscribed to the Atom feed of some webcomic I like (name and URL obscured in the post because I don’t wanna hammer them with a bunch of image scrapers other than my own) and I didn’t like that the feed didn’t have the images, only links to posts. Which of course is their livelihood: they want me to visit their page for upsell purps.

But it’s so easy to just transform their feed into something that does have the images:

(import matchable (chicken port) http-client html-parser
        sxml-serializer sxml-transforms ssax sxpath)
(serialize-sxml
 (pre-post-order-splice*
  ;; Fetch the feed and parse it into SXML.
  (with-input-from-request
   "https://somewebcomic.url/feed/" #f
   (cut ssax:xml->sxml (current-input-port) '()))
  `((description .
                 ,(match-lambda*
                   ;; The description that only says "Webcomics" gets a new blurb.
                   (('description '("Webcomics")) '(description "Images from such-and-such webcomic name"))
                   ;; Any other description: parse its HTML, follow the first href,
                   ;; fetch that page, and swap in an img tag for the comic image.
                   ((tag body)
                    `(description
                      ,(serialize-sxml
                       `(img (@ (src
                                ,(cadar ((sxpath "//figure/img/@src")
                                         (with-input-from-request
                                          (cadar ((sxpath "//@href")
                                                  (with-input-from-string (car body) (cut html->sxml (current-input-port)))))
                                          #f (cut html->sxml (current-input-port)))))))))))))
    ;; Text nodes and all other elements pass through unchanged.
    (*text* . ,(lambda (tag body) body))
    (*default* . ,cons)))
 output: (current-output-port))

Or for a repo, git clone https://idiomdrottning.org/my-xml-sunk-cost

Now, my first reaction upon doing this was “See! XML can be awesome!”

But on reflecting a bit… it’s not as if it would’ve been hard to do an awk pass on something like the format I use for https://idiomdrottning.org/blog/pixra.txt which is basically just a list of time stamps, png urls, and titles.

I was happy with the one-pager above, but with these kinds of simpler text formats it’s more one-liners than one-pagers.
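To make the one-liner claim concrete, here’s a rough sketch of what such an awk pass could look like. The exact layout is an assumption on my part for illustration: one entry per line, tab-separated, in the order time stamp, PNG URL, title. The function name `to_img_tags` is made up too, and titles are assumed not to contain quotes or angle brackets, since nothing gets escaped.

```shell
# Hypothetical input layout (an assumption, not a documented format):
#   timestamp <TAB> png-url <TAB> title
# Reads such lines on stdin and prints one HTML img tag per entry.
to_img_tags() {
  awk -F'\t' '{ printf "<img src=\"%s\" alt=\"%s\">\n", $2, $3 }'
}
```

Something like `to_img_tags < comics.tsv` (with `comics.tsv` being whatever file holds the list) would then do the whole job, no parser, no tree walk.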

The problem, of course, is standardization. That’s how XML started in the first place. Design by committee.

Update

Thinking some more about this, I’m pretty happy SVG isn’t in TSV or JSON. Sometimes XML is the right way to go♥