2020-07-24 Writing Markdown, generating PDF

I find that I like this Markdown → HTML → PDF process more and more because I get to leverage skills that I use on a regular basis: writing Markdown files and fiddling with simple CSS files.

This is how it works: I use a Makefile. Makefiles are good for tying command-line tools together, because otherwise I’d forget how to do things.

Example:

all: a-thousand-dungeons.pdf

%.pdf: %.html %.css map1.svg map2.svg map3.svg
	weasyprint $< $@

%.html: %.md prefix suffix
	python3 -m markdown \
		--extension=markdown.extensions.tables \
		--extension markdown.extensions.smarty \
		--file=x$@ $<
	cat prefix x$@ suffix > $@
	rm x$@

clean:
	rm -f *.html *.pdf

Let’s look at it. The stuff that isn’t preceded by a TAB character is a rule. The file before the colon is the target. The files after the colon are the prerequisites. The percent character is used for patterns. The lines indented with a TAB are the commands to run in order to make the target from the prerequisites. There are also a few variables: $@ is the target name, $< is the name of the first prerequisite.

Here’s how to read the file: if you want to make all, you need to make a-thousand-dungeons.pdf (first rule); if you want to make any kind of PDF file, you need a matching HTML and CSS file, and the three images map1.svg, map2.svg, and map3.svg (second rule); if you want to make any kind of HTML file, you need a matching MD file, and the two files prefix and suffix (third rule).

Also, if you want a clean slate, you don’t need anything. Just remove all the HTML and PDF files (last rule).
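If you want to see the pattern and the two variables $@ and $< in action without touching the real project, here is a throwaway sketch. It assumes GNU make is installed; the file names demo.in and demo.out are made up for illustration and are not part of the Makefile above.

```shell
# Throwaway demo: demo.in and demo.out are made-up names, not part of the project.
dir=$(mktemp -d)

# Write a one-rule Makefile. printf turns %% into %, \n into a newline and
# \t into the TAB that must start every command line.
printf '%%.out: %%.in\n\t@echo "target=$@ first=$<"\n' > "$dir/Makefile"
touch "$dir/demo.in"

# Ask make for demo.out: the pattern matches, so $@ is demo.out and $< is demo.in.
result=$(cd "$dir" && make demo.out)
echo "$result"

rm -rf "$dir"
```

This should print target=demo.out first=demo.in, which is the same substitution that happens for a-thousand-dungeons.pdf and a-thousand-dungeons.html in the real rules.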

Let’s look at the commands used by the various rules.

To turn a Markdown file into an HTML file (third rule), we use Python with the markdown module. On my system, you can install it as follows:

sudo apt install python3-markdown

The problem is that the HTML generated has neither header nor footer. That’s where the prefix and suffix files come in. What actually happens in order to make the HTML file:

python3 -m markdown \
	--extension=markdown.extensions.tables \
	--extension markdown.extensions.smarty \
	--file=xa-thousand-dungeons.html a-thousand-dungeons.md
cat prefix xa-thousand-dungeons.html suffix > a-thousand-dungeons.html
rm xa-thousand-dungeons.html

That is, we use Python to make a temporary file prefixed with an “x” and then we concatenate the prefix, the x file, and the suffix to create our HTML file.
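The temporary file isn’t strictly necessary. A possible variation, as a sketch: when no --file option is given, python3 -m markdown writes to standard output, and cat reads standard input wherever a “-” appears in its argument list. The helper name wrap_body is made up for this example.

```shell
# wrap_body is a hypothetical helper: it prints the prefix file, then whatever
# arrives on standard input, then the suffix file.
wrap_body() {
    cat "$1" - "$2"
}

# Usage (assumes python3-markdown is installed and the three files exist):
# python3 -m markdown \
#     --extension=markdown.extensions.tables \
#     --extension=markdown.extensions.smarty \
#     a-thousand-dungeons.md | wrap_body prefix suffix > a-thousand-dungeons.html
```

One trade-off to be aware of: in a plain pipeline the exit status is that of cat, so if the Markdown step fails, make won’t notice and an incomplete HTML file may be left behind. The version with the temporary file stops at the failing command instead.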

The prefix simply starts the HTML file and links the CSS file:

<!doctype html>
<html lang=en>
  <head>
    <meta charset="utf-8"/>
    <link type="text/css" rel="stylesheet" href="a-thousand-dungeons.css"/>
  </head>
  <body>

The suffix simply closes the two tags we opened in the prefix:

</body>
</html>

To turn it into a PDF, we use weasyprint. I think I installed that one via pip:

pip3 install WeasyPrint

What the command does is simple:

weasyprint a-thousand-dungeons.html a-thousand-dungeons.pdf

So now, whenever I make changes, I can use “make” to update the PDF:

make a-thousand-dungeons.pdf

Sometimes you’ll find that you need some Markdown extensions (the alternative is of course to just use HTML in your Markdown). You can find the available extensions here:

Python Markdown: Officially Supported Extensions

The CSS can contain some of the rules relevant for pagination, like page numbers:

@page {
    @bottom-center {
        content: counter(page);
    }
}

And that’s it. I find this good enough for many documents. No need to dive into LaTeX anymore!

Examples:

Knives

Halberts

Farnthal

Altenstein

Helmbarten

Just Halberds

Spellcasters

Tag: #Publishing #Markdown

Comments

I like this use of Makefiles to minimize the build time of your pdf. I assume this is because you generate new pdfs every time someone posts a comment on your website, and you have such high traffic that anything short of this would be impossible.

However, it’s a shame that your Makefile encodes the fact that every markdown file will include those three images; you might as well have done `*.svg`, which is just as wrong, but more flexible.

Worst of all is the prefix that includes the css filename, making it hard to carry this over to other markdown files.

Have you looked at this pdfkit / python-markdown combination? Seems a very flexible way of doing the same thing, and I like how I get an index for free on each title.

– Tama McGlinn 2020-10-05 13:44 UTC

---

I don’t generate enough such PDF files for this to be a concern of mine. The next project might not use any SVG file, or some PNG and JPG files, who knows.

I have used `wkhtmltopdf` before. This wiki offers a PDF download of each page, at the very bottom, using `wkhtmltopdf`, since the wiki isn’t written in clean Markdown but in a glorious, eclectic mix of markup rules.

PDF Button

– Alex 2020-10-05 15:51 UTC