2024-11-01 Gzip them all!

I keep an archive of the One Page Dungeon Contest. It's pretty big.

+------+------+
| Size | Year |
+------+------+
| 48M  | 2009 |
| 193M | 2010 |
| 121M | 2011 |
| 199M | 2012 |
| 149M | 2013 |
| 278M | 2014 |
| 364M | 2015 |
| 233M | 2016 |
| 250M | 2017 |
| 493M | 2018 |
| 345M | 2019 |
| 472M | 2020 |
| 217M | 2021 |
| 353M | 2022 |
| 492M | 2023 |
| 486M | 2024 |
+------+------+

So I decided to gzip ("pre-compress") the PDF files. I found the answer I was looking for in Serving pre-compressed files using Apache by François Marier. Sometimes searching for stuff is hard just because you don't know what it's called. 😅

Serving pre-compressed files using Apache

AddEncoding gzip gz
Options +Multiviews
SetEnv force-no-vary
Header set Cache-Control "private"
<FilesMatch "\.pdf\.gz$">
ForceType application/pdf
</FilesMatch>

OK, time to gzip them all!

for d in 2*; cd /home/alex/campaignwiki.org/1pdc/$d; echo $d; gzip *.pdf; end

Aaaaand … the gains are abysmal! 😓

+------+------+
| Size | Year |
+------+------+
| 46M  | 2009 |
| 173M | 2010 |
| 110M | 2011 |
| 190M | 2012 |
| 126M | 2013 |
| 261M | 2014 |
| 351M | 2015 |
| 226M | 2016 |
| 225M | 2017 |
| 471M | 2018 |
| 325M | 2019 |
| 448M | 2020 |
| 206M | 2021 |
| 339M | 2022 |
| 472M | 2023 |
| 471M | 2024 |
+------+------+

The PDFs really are that big! 🤨
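
One likely reason: PDF writers usually store page content as Flate-compressed streams and photos as JPEG, so gzip has little redundancy left to remove. The Python sketch below illustrates the effect with made-up data (random words standing in for page content). It does not read any of the contest PDFs, and the word list is purely hypothetical.

```python
import gzip
import random
import zlib

# Made-up stand-in for uncompressed page text: 20,000 random words.
rng = random.Random(0)
words = ["dungeon", "goblin", "trap", "door", "treasure", "map", "room", "stairs"]
raw = " ".join(rng.choices(words, k=20000)).encode("ascii")

# Stand-in for what typically ends up inside a PDF: a Flate-compressed stream.
stream = zlib.compress(raw)

# Size after gzip, relative to the size before gzip.
raw_ratio = len(gzip.compress(raw)) / len(raw)
stream_ratio = len(gzip.compress(stream)) / len(stream)

print(f"gzip on raw text:          {raw_ratio:.0%} of the input size")
print(f"gzip on compressed stream: {stream_ratio:.0%} of the input size")
```

The first number should be small and the second close to 100%, which is the same pattern as in the tables above.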

Somebody should put a size limit on submissions!

The whole collection is still 4.4G. 😞
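
To put a number on the gains, the two tables can be added up. A small Python sketch, using the rounded per-year sizes copied from the tables above, so the totals are approximate:

```python
# Sizes per year (2009 to 2024) in M, copied from the two tables above.
before = [48, 193, 121, 199, 149, 278, 364, 233, 250, 493, 345, 472, 217, 353, 492, 486]
after = [46, 173, 110, 190, 126, 261, 351, 226, 225, 471, 325, 448, 206, 339, 472, 471]

total_before = sum(before)
total_after = sum(after)
saved = total_before - total_after
percent = 100 * saved / total_before

print(f"{total_before}M before, {total_after}M after, {saved}M saved ({percent:.1f}%)")
```

That comes to 253M saved out of 4693M, or about 5.4%.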

#RPG #1PDC #Administration

**2024-11-04**. I started looking at Ghostscript to reduce file size. The result of using `-dPDFSETTINGS=/ebook` is disappointing: in the first PDF I opened, text from the original had been turned into a badly pixelated image.
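
For reference, a typical Ghostscript call with that preset looks roughly like the command built below. This is a minimal sketch, not the exact command used here: the function name and the filenames `entry.pdf` and `entry-ebook.pdf` are hypothetical, and it assumes Ghostscript is installed as `gs`.

```python
import shlex


def ebook_command(src, dst):
    """Build a Ghostscript command that rewrites src to dst with the /ebook preset."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",       # write a new PDF
        "-dPDFSETTINGS=/ebook",    # the preset mentioned above
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


# Hypothetical filenames; this only prints the command, it does not run it.
command = ebook_command("entry.pdf", "entry-ebook.pdf")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would actually run it.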

I started reading Optimizing PDFs on the Ghostscript blog and my head started smoking.

Optimizing PDFs

I ended up writing the following:

markdown-links to turn the filenames into Markdown links for the wiki page
zip-original to create the first zip file of the originals
shrink-pdfs to shrink all the PDFs using `pdf-shrink`
pdf-shrink to reduce the images in a PDF file to 150 dpi (ebook quality)
zip-dir to create the zip file of the shrunk files for upload
upload-dir to upload the files to the site itself
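
The scripts themselves are not reproduced here. As a rough idea of what the image-reduction step could look like, here is a Python sketch that builds a Ghostscript command for downsampling images to 150 dpi. The function name, the threshold of 1.0 and the filenames are assumptions for illustration, not the settings of the actual `pdf-shrink`.

```python
def shrink_command(src, dst, dpi=150, threshold=1.0):
    """Build a Ghostscript command that downsamples images in src and writes dst.

    Assumption: every image above dpi * threshold gets reduced to dpi.
    """
    args = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET"]
    # pdfwrite has separate settings for color, grayscale and monochrome images.
    for kind in ("Color", "Gray", "Mono"):
        args += [
            f"-dDownsample{kind}Images=true",
            f"-d{kind}ImageResolution={dpi}",
            f"-d{kind}ImageDownsampleThreshold={threshold}",
        ]
    args += [f"-sOutputFile={dst}", src]
    return args


# Hypothetical filenames; print one argument per line instead of running gs.
for arg in shrink_command("entry.pdf", "entry-small.pdf"):
    print(arg)
```

Setting the image parameters explicitly instead of relying on a preset leaves text and fonts alone, which is the point of the exercise.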

To this, @mxp@mastodon.acm.org replied:

My invocation is less elaborate (w/o the threshold, filter settings, etc.), but similar in that I also downsample images to 150 dpi. In addition, I have `-dSubsetFonts=true -dCompressFonts=true`, but since I use this for my own LaTeX-generated documents, I guess I could drop this.

I didn't look into fonts because I don't mind people using weird fonts; for the moment images are a bigger problem than fonts.

Then I went through my local directories and called `pdf-shrink` on them all, regenerated the zip file containing the year's entries and gzipped the individual files.

As I was going through the files for 2024 I noticed that the filenames sometimes betray different names (from email senders, I presume), leaking privacy-related information. I wanted to make sure that the filenames reflected the authors of the works, and that made me realize two things:

1. not every entry has the license URL clearly visible

2. not every entry has the author names clearly visible

Then again, anonymous works are OK, but it would have saved me some time if it said "anonymous" somewhere. 😏

In any case, if you publish PDF files somewhere, here's what I'm planning to do from here on out:

1. add copyright information and the license (if any), i.e. a date or at least a year and the names of all the people that have rights to the work (text, art, layout, editing, maps, and so on)

2. add the work's title and the names of all the people to the PDF's metadata ("properties")

3. put the work's title, date or version and the name of the main author (your name?) into the filename
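
The filename part of that list is easy to automate. A minimal Python sketch, assuming an underscore between the three parts and hyphens inside each part; the naming scheme, the function names and the example title and author are all made up for illustration:

```python
import re
import unicodedata


def slug(text):
    """Reduce text to ASCII letters and digits, joined by hyphens."""
    # Split accented letters into base letter plus accent, then drop the accents.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")


def pdf_filename(title, date_or_version, author):
    """Combine title, date or version, and main author into one filename."""
    return f"{slug(title)}_{slug(date_or_version)}_{slug(author)}.pdf"


# Hypothetical entry, not a real contest submission.
print(pdf_filename("The Sunken Crypt", "2024-11-04", "Jane Doe"))
```

Dropping accents and punctuation keeps the names safe for URLs and zip files, at the cost of mangling some author names; the PDF metadata is the better place for the exact spelling.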