Created: 2022-08-27T06:52:40-05:00
This card pertains to a resource available on the internet.
$ infile=scan.pdf
$ tmpfile=$(mktemp)
$ outfile=searchable-scan.pdf
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$tmpfile" "$infile"
$ ocrmypdf -l eng --deskew "$tmpfile" "$outfile"
$ rm "$tmpfile"
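The same two-step pipeline (Ghostscript recompression, then OCRmyPDF) can be wrapped in a reusable function so the temporary file is cleaned up even when one of the tools fails. This is a minimal sketch, not the article's script: the function name `make_searchable` is hypothetical, while the `gs` and `ocrmypdf` flags are copied unchanged from the commands above.

```shell
#!/usr/bin/env bash
# Minimal sketch: the gs -> OCRmyPDF pipeline as a reusable function.
# make_searchable is a hypothetical name; the tool flags are the card's own.

# The body uses ( ) instead of { } so it runs in a subshell: the EXIT trap
# and the strict-mode settings stay local to this function.
make_searchable() (
  set -euo pipefail
  infile=$1
  outfile=$2
  tmpfile=$(mktemp)
  # Delete the intermediate PDF on any exit, including when gs or ocrmypdf fails.
  trap 'rm -f "$tmpfile"' EXIT

  # Step 1: recompress the scan with Ghostscript's /ebook preset.
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
    -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$tmpfile" "$infile"

  # Step 2: add an English OCR text layer and straighten skewed pages.
  ocrmypdf -l eng --deskew "$tmpfile" "$outfile"
)
```

Usage: `make_searchable scan.pdf searchable-scan.pdf`. Using a subshell body with an `EXIT` trap avoids the manual `rm` step and does not leave the strict-mode options switched on in the calling shell.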
The order of the steps matters. The article's author found that running the Ghostscript optimization before OCRmyPDF brought the final file down from 1.5 MB to 1 MB. Running OCRmyPDF alone took the scanner's raw output from 7.9 MB to 2.7 MB.
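To check whether the order makes a similar difference for your own scans, you can produce outputs both ways and compare their sizes. The helpers below are a minimal sketch; `size_bytes` and `pct_smaller` are hypothetical names, not tools from the article. With the article's figures, OCRmyPDF alone (7.9 MB to 2.7 MB) works out to roughly a two-thirds reduction.

```shell
#!/usr/bin/env bash
# Minimal sketch: compare output sizes of different pipeline orders.
# size_bytes and pct_smaller are hypothetical helper names.
set -euo pipefail

# Print a file's size in bytes (wc -c behaves the same on GNU and BSD;
# tr strips the padding that BSD wc adds).
size_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Print how much smaller file $2 is than file $1, as a whole-number percentage.
# A negative result means the second file is larger.
pct_smaller() {
  local before after
  before=$(size_bytes "$1")
  after=$(size_bytes "$2")
  (( before > 0 )) || { echo "empty file: $1" >&2; return 1; }
  echo $(( (before - after) * 100 / before ))
}
```

For example, after running the gs-then-OCRmyPDF commands to produce `searchable-scan.pdf` and `ocrmypdf -l eng --deskew scan.pdf ocr-only.pdf` for comparison, `pct_smaller ocr-only.pdf searchable-scan.pdf` reports how much the extra Ghostscript pass saved.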
jbig2enc is an aggressive compressor for purely black-and-white (bilevel) images.
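OCRmyPDF only uses JBIG2 when the encoder built from jbig2enc is installed, so it can be worth checking for it before relying on that compression. This is a minimal sketch under one stated assumption: that the jbig2enc command-line binary is named `jbig2` and is on `PATH`. The function name `have_jbig2` is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: report whether the jbig2enc encoder is available.
# Assumption: the jbig2enc binary is installed under the name "jbig2".
# have_jbig2 is a hypothetical helper name.
have_jbig2() {
  command -v jbig2 >/dev/null 2>&1
}

if have_jbig2; then
  echo "jbig2 found: JBIG2 compression is available to OCRmyPDF"
else
  echo "jbig2 not found: install jbig2enc to get JBIG2 compression" >&2
fi
```

If the encoder is present, OCRmyPDF's `--optimize` option (levels 0 to 3) controls how aggressively it recompresses images; see the OCRmyPDF documentation for what each level does.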