Archived view of republic.circumlunar.space › users › flexibeast › gemlog › 2022-08-30.gmi, captured on 2023-01-29 at 02:47:35.
i've recently had something of a Knuth moment, resulting in me writing a small POSIX shell script for generating an EPUB.
When putting together his epic “Art of Computer Programming” (“TAOCP”) series[a], computing scientist Don Knuth[b] took a slight detour to develop the TeX typesetting system[c]:
In 1976, Knuth prepared a second edition of Volume 2, requiring it to be typeset again, but the style of type used in the first edition (called hot type) was no longer available. In 1977, he decided to spend some time creating something more suitable. Eight years later, he returned with TeX
on top of which computing scientist Leslie Lamport created LaTeX[d], which today is still the de facto standard for typesetting mathematics.
In my own case, last week i went looking for an EPUB version of a physical book i own, so that i could have it on my Kobo. It turned out the author made the contents available years ago, but as a number of distinct Web pages, rather than a single file, with many of them containing _bad_ HTML. By “bad” i don't mean “a bit inelegant”, i mean “CSS is a faraway land, let's use a morass of HTML elements with formatting attributes”.
This is a problem, because the EPUB format is based on XHTML files, and XHTML, as an XML application, has to be valid markup; it doesn't allow the sort of tag soup that a number of HTML parsers are forgiving of. (Which has been to the short-term benefit of some but the long-term cost of many.)
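To make the “based on XHTML files” point concrete, here is a minimal sketch of the skeleton an EPUB 3 reading system expects around those XHTML files. It is an illustration only, not the script discussed in this post. The names content.opf, nav.xhtml and ch1.xhtml, the placeholder metadata, and the two function names are all hypothetical; only “mimetype” and “META-INF/container.xml” are fixed names. The packing step assumes Info-ZIP's zip(1).

```sh
#!/bin/sh
# Sketch only: write a minimal EPUB 3 skeleton into directory $1.
# content.opf, nav.xhtml and ch1.xhtml are hypothetical file names.
make_epub_skeleton() {
    dir=$1
    mkdir -p "$dir/META-INF" || return 1

    # Exactly this string, with no trailing newline.
    printf 'application/epub+zip' > "$dir/mimetype"

    # Tells the reading system where the package document lives.
    cat > "$dir/META-INF/container.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
EOF

    # Package document: metadata, manifest (all files), spine (reading order).
    modified=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    cat > "$dir/content.opf" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">placeholder-id</dc:identifier>
    <dc:title>Placeholder title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">$modified</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
EOF

    # Navigation document (table of contents); must itself be valid XHTML.
    cat > "$dir/nav.xhtml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol>
    </nav>
  </body>
</html>
EOF

    # One content document; tag soup here is what breaks an EPUB.
    cat > "$dir/ch1.xhtml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Placeholder text.</p></body>
</html>
EOF
}

# Sketch only: pack directory $1 into the EPUB at absolute path $2.
# "mimetype" goes in first, stored uncompressed and without extra fields.
pack_epub() {
    ( cd "$1" &&
      zip -X0 "$2" mimetype &&
      zip -Xr9D "$2" META-INF content.opf nav.xhtml ch1.xhtml )
}
```

Assuming those two functions, something like “make_epub_skeleton /tmp/book && pack_epub /tmp/book /tmp/book.epub” would produce a one-chapter file. The cleaned-up XHTML pages would take the place of ch1.xhtml, each with its own manifest and spine entry.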
The reason i know that EPUB is based on XHTML files is because i asked myself, “Okay, how can I turn this collection of crappy HTML files into an EPUB?” There doesn't seem to be an obvious way to use the Pandoc document-conversion tool[e] for this - please correct me if i'm wrong! - and it seemed excessive to have to install the Sigil EPUB authoring software[f], based as it is on QtWebEngine, which is a _huge_ dependency (particularly on Gentoo, where it has to be compiled; a “-bin” package for it is not currently available).
As a result, i found myself spelunking through the various EPUB specs to try to work out the minimum that i needed to do to create a usable EPUB. Now that i've done that, you don't have to. i've added another item to my “guides” collection:
and publicly released the small POSIX shell script i mentioned in my opening paragraph:
The latter is intended to be as portable as possible, having only cat(1p), date(1p) and zip(1) as external dependencies. i've run it through checkbashisms (though i use zsh myself); it flagged “read -p” as a possible bashism, though POSIX doesn't seem to specify a “read” built-in[g][h], and i'm using “-p” to specify a prompt, as supported by dash(1). (EDIT: The “-p” option for “read” for OpenBSD 7.1's ksh(1) is about co-processes, and this isn't affected by “set -o posix”, so i've removed the “-p” option from “read” calls, relying instead on “echo -n”.) i've also run it through shellcheck, which in addition to the “read -p” issue (SC2039), flagged various other issues, a number of which i've addressed, but one of which is “echo -n” - not specified by POSIX, but also supported by dash(1). (EDIT: u9000-Nine submitted, and i've merged, a PR which replaces “echo -n” with a call to “printf”, which for some reason i keep forgetting is a POSIX utility as well as a function .... ) Please let me know if there are any ways the script's portability could be improved. :-)
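As a rough illustration of the “printf” approach described in the second EDIT, a prompt can be written without “read -p” or “echo -n” along these lines. The helper name “ask” is hypothetical and this is not taken from the script itself; it only assumes a POSIX sh with the “printf” and “read” utilities.

```sh
#!/bin/sh
# Hypothetical helper: show a prompt, read one line, print the answer.
ask() {
    # printf is specified by POSIX; "echo -n" and "read -p" are not portable.
    # The prompt goes to stderr so that $(ask ...) captures only the answer.
    printf '%s' "$1" >&2
    # IFS= and -r keep leading whitespace and backslashes intact.
    IFS= read -r reply || return 1
    printf '%s\n' "$reply"
}
```

It would be used as “title=$(ask 'Title: ')”, with the prompt still visible on the terminal because it is written to stderr rather than stdout.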
--
🏷 documentation,ict
--
[a] Wikipedia: “The Art of Computer Programming”
[b] Wikipedia: “Donald Knuth”
[g] “Shell Command Language”
[h] It turns out “read” _is_ in POSIX, just as a standard utility, not a shell builtin: