Document processing

In the past I did some research on what the _contents_ of software system documentation should be. It wasn't really relevant to that, but around the same time I also got interested in tools for formatting technical documents. I recently revisited this, just to see what has changed.

First, I'm only interested in markup languages. Opaque binary formats like Word just create a lot of work for people. With a text file, particularly if you use semantic line breaking, you can branch and merge without problems. I'd recommend putting the documentation in a "doc" directory at the root of your project, so that tags & releases always tie the source code to the relevant documentation.

Semantic Line Breaks
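
To make the merge argument concrete, here is a small sketch of how a one-word edit shows up in a diff when a paragraph is written with one clause per line. The sample sentences and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: a one-word edit in a paragraph that uses semantic line breaks.
# The sentences and file names are hypothetical.
tmp=$(mktemp -d)

printf '%s\n' \
  'Text files merge well.' \
  'Binary formats do not,' \
  'so keep the docs in markup.' > "$tmp/old.txt"

printf '%s\n' \
  'Text files merge well.' \
  'Opaque binary formats do not,' \
  'so keep the docs in markup.' > "$tmp/new.txt"

# Count the lines that diff reports as replaced in the old version.
changed=$(diff "$tmp/old.txt" "$tmp/new.txt" | grep -c '^<')
echo "lines changed: $changed"

rm -rf "$tmp"
```

Only the clause that was edited is flagged, so two people touching different clauses of the same paragraph are less likely to collide when merging.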

Markdown seems to be the most popular choice nowadays. I think djot is a worthy cleaned-up version of this. The first drafts can be text-only and just formatted to HTML using djot.

CommonMark, a markdown dialect

djot

Later you might want to add more varied content. This is when Pandoc will be useful. You can also write diagrams in markup using PlantUML, and integrate them into the document using the Panda filter for Pandoc.

Pandoc

PlantUML

Panda, useful for file & diagram inclusion
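
As a rough idea of what "diagrams in markup" looks like, the sketch below writes a small PlantUML sequence diagram to a text file and renders it if the plantuml command happens to be installed. The file name and the diagram contents are invented for the example, and the Panda integration itself is not shown.

```shell
#!/bin/sh
# Sketch: keep a diagram as PlantUML text next to the document source.
# The file name and the diagram contents are hypothetical.
src=arch.puml

cat > "$src" <<'EOF'
@startuml
actor User
User -> Server : request
Server --> User : response
@enduml
EOF

# Render to SVG only if the plantuml command is available.
if command -v plantuml >/dev/null 2>&1; then
  plantuml -tsvg "$src"
fi
```

Because the diagram is plain text, it can live in the same "doc" directory and be branched and merged like the rest of the document.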

Pandoc also allows you to target Gemtext, using the md2gemini script. This is a nice fit with the view of Gemini as a "read-only document distribution protocol".

md2gemini

If you want to do things beyond this djot/panda/pandoc combination, I think it's best to switch to PDF output (via LaTeX). djot allows adding fragments of LaTeX. This might mean that, e.g., your bibliography won't appear in HTML or Gemtext, but that's OK: all the core text and diagrams should still be there. With LaTeX you can format bibliographies, etc., and I think this allows a gradual imposition of more structure on a simple djot first draft (if required at all).

It can be difficult to persuade some software engineers to write documents, so you really don't want to introduce any obstacles (like mandating the use of MS Word).

Note also that I did try ConTeXt, but it was missing quite a few features I wanted, e.g. a Harvard bibliography style.

LaTeX Wikibook

I've seen the recommended LaTeX setup change quite a bit. In particular:

Also, it seems best to pass title, author, etc. to Pandoc in a YAML metadata file, so that it can fill these fields into its template. For example, you could use a command like the following:

djot doc.dj -t pandoc | pandoc -f json -t latex --metadata-file=doc.yaml -s -o doc.tex

In practice, though, with current versions of djot and pandoc there is a compatibility check that stops this working. I've been running very recent versions of djot and pandoc; you might need to go to their home pages to get them. You actually need the following now:

djot doc.dj -t pandoc | jq '."pandoc-api-version" |= [1,23]' | pandoc -f json -t latex --metadata-file=doc.yaml -s -o doc.tex
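
To see what the jq step does in isolation, here is a sketch that applies the same filter to a tiny hand-written Pandoc JSON document. The starting version [1,22] is a made-up value, and the block is skipped if jq is not installed.

```shell
#!/bin/sh
# Sketch: apply the same jq filter to a minimal Pandoc JSON document.
# The starting version [1,22] is a hypothetical value for illustration.
ast='{"pandoc-api-version":[1,22],"meta":{},"blocks":[]}'

if command -v jq >/dev/null 2>&1; then
  # Overwrite only the version field; "meta" and "blocks" pass through untouched.
  patched=$(printf '%s\n' "$ast" | jq -c '."pandoc-api-version" |= [1,23]')
  echo "$patched"

  # Read the field back to confirm the new value.
  version=$(printf '%s\n' "$patched" | jq -c '."pandoc-api-version"')
  echo "version is now $version"
fi
```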

In these commands, doc.yaml was:

documentclass: scrartcl
title: P and NP
author:
- Smart Person
header-includes: |
  \usepackage[backend=biber]{biblatex}
  \addbibresource{refs.bib}
abstract: |
  The matter of whether or not P = NP is finally settled.
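
Since the metadata selects biblatex with the biber backend, getting from doc.tex to a PDF takes more than one LaTeX run. The following is only a sketch of the usual sequence, assuming pdflatex and biber are installed; build_pdf is a hypothetical helper name, and the build is skipped if the tools or doc.tex are missing.

```shell
#!/bin/sh
# Sketch: build doc.pdf from doc.tex when biblatex uses the biber backend.
# build_pdf is a hypothetical helper; pdflatex and biber are assumed installed.
build_pdf() {
  base=$1
  pdflatex -interaction=nonstopmode "$base.tex" &&  # first pass records the citations
  biber "$base" &&                                   # resolve them against refs.bib
  pdflatex -interaction=nonstopmode "$base.tex" &&  # pull in the bibliography data
  pdflatex -interaction=nonstopmode "$base.tex"     # settle cross-references
}

# Only attempt the build when the tools and the input file are present.
if command -v pdflatex >/dev/null 2>&1 \
   && command -v biber >/dev/null 2>&1 \
   && [ -f doc.tex ]; then
  build_pdf doc
fi
```

Note that the bibliography is only printed if the document body contains a \printbibliography command, which a raw LaTeX fragment in the djot source could supply.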

I've verified that this collection of tools at least works together. There's no need to use everything: as I said, you can start out with just djot and gradually introduce the others as needed (if at all).
