It being a Monday, I spent most of today at work, so it didn't matter that I was using a modern computer. I did bend the rules slightly by using it to check that the blogpost I made yesterday was properly viewable though.
I made progress on my scripts late yesterday and this evening!
This script will convert an Atom or RSS feed to a webpage that works with Links:
let $c := count(//*:entry)
let $cr := count(//*:item)
(: for $m in //*:feed :)
return
  if ($c > 0) then (
    <a name="toc"></a>,
    <nav><ol>{
      for $x at $n in //*:entry
      let $xt:=inner-html($x/*:title)
      let $xda:=(
        if ($x/*:published != '') then inner-html($x/*:published)
        else if ($x/*:updated != '') then inner-html($x/*:updated)
        else ()
      )
      order by $xda descending
      return <li><a href="#{$n}">{$xt}</a></li>
    }</ol></nav>,
    <main>{
      for $x at $n in //*:entry
      let $xc:=(
        if ($x/*:content != '') then inner-html($x/*:content)
        else if ($x/*:summary != '') then inner-html($x/*:summary)
        else ()
      )
      let $xt:=inner-html($x/*:title)
      let $xl:=(
        if ($x/*:link/@href != '') then $x/*:link/@href
        else ()
      )
      let $xda:=(
        if ($x/*:published != '') then inner-html($x/*:published)
        else if ($x/*:updated != '') then inner-html($x/*:updated)
        else ()
      )
      return <article>
        <a name="{$n}"></a>
        <h1>{$xt}</h1>
        <p><strong>Date: {$xda}</strong></p>
        {parse-html(parse-html($xc))//body/*}
        <a href="{$xl}">Article</a>
        <a href="#toc">Back to TOC</a>
      </article>
    }</main>
  )
  else if ($cr > 0) then (
    <h1>{//channel/title}</h1>,
    <a name="toc"></a>,
    <nav><ol>{
      for $x at $n in //*:item
      let $xt:=(
        if ($x/*:title != '') then inner-html($x/*:title)
        else inner-html($x/pubDate)
      )
      let $xd:=inner-html($x/pubDate)
      return <li><a href="#{$n}">{if ($xt = $xd) then $xt else $xd,": ",$xt}</a></li>
    }</ol></nav>,
    <main>{
      for $x at $n in //*:item
      let $xc:=(
        if ($x/*:description != '') then inner-html($x/*:description)
        else inner-html($x/*:title)
      )
      let $xt:=(
        if ($x/*:title != '') then inner-html($x/*:title)
        else inner-html($x/pubDate)
      )
      let $xd:=inner-html($x/pubDate)
      let $xl:=(
        if ($x/*:link != '') then inner-html($x/*:link)
        else ()
      )
      return <article>
        <a name="{$n}"></a>
        <h1>{if ($xt = $xd) then $xt else $xd,": ",$xt}</h1>
        {parse-html(parse-html($xc))//body/*}
        <a href="{$xl}">Article</a>
        <a href="#toc">Back to TOC</a>
      </article>
    }</main>
  )
  else ( )
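The query gets run through Xidel. My actual feed-to-html.sh wrapper isn't shown here, but it amounts to something like this sketch (the query-file name feed-to-html.xq is just a placeholder I'm making up):

#!/bin/sh
# Sketch of feed-to-html.sh: run the XQuery above (saved as feed-to-html.xq)
# against whatever feed file is passed in, and let the HTML land on stdout.
xidel "$1" --xquery "$(cat feed-to-html.xq)"

The index-building script further down calls it in roughly that shape: sh ./feed-to-html.sh xml/somefeed.xml > html/somefeed.html.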
Sure, the code is still terrible and messy and I don't fully understand what I'm doing, but it works, darn it! :D
I have also been able to build on this with other scripts. This one converts an OPML export from my feed reader into a TSV I can parse with the shell:
#!/bin/sh
xidel feeds.opml --extract 'for $x in //*:outline return $x/@text | $x/@xmlUrl' |
	awk 'NR % 2 == 0 {printf "%s\t" $0}; NR % 2 == 1 {printf "%s\n" $0}'
(That only needed doing once, but I saved it for posterity.)
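For reference, each line of the resulting feeds.txt (I just redirect the script's output there) is meant to be the feed's name, a tab, then its URL, something like these made-up entries:

Example Blog	https://example.com/atom.xml
Another Feed	https://feeds.example.org/rss.xml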
I can then use that file with my script to download all the feeds (yes, I know Xidel is supposed to have a downloader in it too, but it doesn't like this laptop):
#!/bin/sh
awcl="$(wc -l feeds.txt|awk '{print $1}')"

dodl() {
	url="$(cat feeds.txt|sed -n "${1}p"|cut -f 2)"
	txt="$(cat feeds.txt|sed -n "${1}p"|cut -f 1|sed "s/[^A-Za-z0-9]/_/g")"
	echo "$url ==> xml/$txt.xml"
	curl -L "$url" > "xml/$txt.xml"
}

if test "$#" = 1; then
	dodl "$1"
	exit
fi

for i in $(seq 1 $awcl); do
	dodl "$i"
done
If given an argument, this script just downloads the feed at that line number; otherwise it downloads all of them. It's a bit of a bodge, but it works.
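So, for example (the script name here is just a placeholder; it isn't fixed by anything above):

sh ./get-feeds.sh 3    # re-download only the feed on line 3 of feeds.txt
sh ./get-feeds.sh      # download everything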
Then I have a script that uses the earlier script to turn them into HTML files and build an index:
#!/bin/sh
tohtml() {
	sh ./feed-to-html.sh "xml/$(basename "$1")" > "html/$(basename "$1" .xml).html"
}

getlbd() {
	xidel "$1" --extract '(
		if (//*:lastBuildDate != "") then inner-html(//*:lastBuildDate)
		else if (//*:channel/*:pubDate) then inner-html(//*:channel/*:pubDate)
		else inner-html(//*:feed/*:updated)
	)'
}

gettitle() {
	xidel "$1" --extract '(
		if (//*:channel != "") then inner-html(//*:channel/*:title)
		else inner-html(//*:feed/*:title)
	)'
}

echo "<!DOCTYPE html>">index.html
echo "<html><head><title>Feeds $(date)</title></head><body><h1>Feeds $(date)</h1><ol>">>index.html

for i in $(ls -1 xml/*); do
	echo "$i"
	wcl="$(wc -l "$i"|awk '{print $1}')"
	echo $wcl
	if test "$wcl" -gt 1 && test "$wcl" -lt 25000; then
		tohtml "$i"
		echo "<li><a href=\"html/$(basename "$i" ".xml").html\">$(gettitle "$i") ($(getlbd "$i"))</a></li>">>index.html
	fi
done

echo "</ol></body></html>">>index.html
(./feed-to-html.sh is the earlier script. The test to ensure fewer than 25000 lines is needed because one of my feeds uses more memory than this laptop possesses, so I'm filtering it out for now.)
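If I want to see exactly which feeds that filter is skipping, a quick sketch reusing the same line-count test would list them:

#!/bin/sh
# List the downloaded feeds that fall outside the 1..25000 line range
# and so get left out of the index.
for i in xml/*; do
	wcl="$(wc -l "$i"|awk '{print $1}')"
	if test "$wcl" -le 1 || test "$wcl" -ge 25000; then
		echo "skipped: $i ($wcl lines)"
	fi
done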
Obviously, this is still very much a bodge, but it works!
Of course, YMMV with running any of these scripts, but they work with all the feeds I regularly use.
Discovered that cURL doesn't run at the same time as Mutt - I just get crashes because it can't use getaddrinfo(). Looks like I can't do much while I read emails, then! (Fortunately, Links still works).
Additionally/relatedly: Shout out to Amnesty International's website for being almost fully usable on Links!
For some reason, I've been trying to post this update for hours and it keeps failing. I hope I haven't broken SmolPub...