2012-08-03 Free French OCR for Mac

Once again I decided I needed to work on the memoirs of my grandfather Roland Li-Marchetti. I had digitized 16 pages many years ago, but his memoirs contain a total of 45 pages of typed text. The last time I worked on this, I was using the OCR software that came with my scanner (a cheap Canon LiDE 25), but today the scanner was no longer recognized by the operating system. I faintly remember having experienced this before when I upgraded my system. Bit rot!

Anyway, I was in the mood to try something new. Free Software?

1. Tesseract

2. requires Leptonica

3. and I needed to install GNU Libtool because I was getting this error: “Libtool library used but `LIBTOOL' is undefined. The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL' to `configure.ac' and run `aclocal' and `autoconf' again. If `AC_PROG_LIBTOOL' is in `configure.ac', make sure its definition is in aclocal's search path.”
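
The error above means that part of the autotools chain was missing when the build scripts were generated. A quick pre-flight check before building can save a round trip. This is a minimal sketch, assuming bash. The function name `check_build_tools` is hypothetical, and the tool list (aclocal, autoconf, automake, and GNU Libtool's libtoolize) is my assumption about what an autotools build like this one needs, not something taken from the Tesseract documentation. On a Mac, GNU Libtool may be installed as `glibtoolize` to avoid clashing with Apple's own `libtool`, so the sketch accepts either name.

```shell
# Hypothetical helper: report which autotools programs are missing from PATH.
# Prints one "missing: ..." line per absent tool; returns 1 if any are missing.
check_build_tools() {
  local missing=0 tool
  for tool in aclocal autoconf automake; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # GNU Libtool's libtoolize; some Mac package managers install it as glibtoolize
  if ! command -v libtoolize >/dev/null 2>&1 && ! command -v glibtoolize >/dev/null 2>&1; then
    echo "missing: libtoolize (GNU Libtool)"
    missing=1
  fi
  return "$missing"
}
```

Running `check_build_tools` before `./configure` lists whatever still has to be installed. An empty output means the tools are at least on the PATH.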

(While the stuff is compiling, I am in fact using a free online OCR service.)

Here is the original, photographed with my Pentax K100D and loaded into Gimp, where I rotated it, cropped it, and auto-adjusted the levels.

/pics/7703476934_8e4cde0f8b_z.jpg

The Tesseract output is pretty cool:

que mes vingt prochaines années soient aussi riches d'aventures
et de bonheur auprès des miens, main dans la main avec Agnès mon
inséparable complice qui a beaucoup sacrifié et que j'espère
pouvoir encore rendre heureuse.

(When I tried it on the unprocessed photo of the page, the result was far less pleasing.)

Yay!

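# Photos IMGP5230.JPG to IMGP5256.JPG hold pages 20 to 46; tesseract writes
# its output to page-20.txt through page-46.txt, and -l fra selects French.
for ((i=20; i<=46; i++)); do
    tesseract "IMGP$((5210 + i)).JPG" "page-$i" -l fra
done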
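
The loop leaves one text file per page, so the pages still have to be joined in order to read the memoirs as a whole. For pages 20 to 46 a plain `cat page-*.txt` happens to work because every number has two digits, but the glob sorts alphabetically and would put page-10 before page-9. This is a minimal sketch, assuming bash and the `page-N.txt` names produced above. The function name `join_pages` and the output file name are hypothetical.

```shell
# Hypothetical helper: concatenate page-FIRST.txt .. page-LAST.txt into OUT,
# in numeric page order, warning (on stderr) about any page that is missing.
join_pages() {
  local first=$1 last=$2 out=$3 i
  : > "$out"   # start with an empty output file
  for ((i=first; i<=last; i++)); do
    if [ -f "page-$i.txt" ]; then
      cat "page-$i.txt" >> "$out"
    else
      echo "warning: page-$i.txt not found" >&2
    fi
  done
}
```

For the pages above this would be `join_pages 20 46 memoirs.txt`.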

#OCR #Software

Comments

(Please contact me if you want to remove your comment.)

⁂

Nice, there are multiple ring binders of my grandfather’s memoirs as well. One day I should digitize them too.

BTW, you might want to have a look at what I wrote about other OCR solutions (I guess most of it would apply to Mac as well).

– Andreas Gohr 2012-08-03 13:09 UTC

---

Excellent! Thank you very much. I feel relieved that I seem to have picked the best free option. 😄

– Alex Schroeder 2012-08-03 14:09 UTC