👽 mimas

Quick question: what is the best workflow to convert PDFs to gemtext and set up a small archive? Thanks a lot

2 years ago

3 Replies

👽 cipres

Really strange to have pdf and gemtext in the same sentence ^_^ · 2 years ago

👽 mfoo2

By design, PDF is a bit of a dead end for onward processing or parsing, since it has very little semantic content, mostly just letters on a page. Any conversion will probably be pretty rough and unsatisfying, or time-consuming if you expect to clean up by hand afterwards. Why not simply keep them as PDFs, but generate or write a gemtext page that lists the entries with links to them? · 2 years ago
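
For what it's worth, the "index page that links to the PDFs" idea could look roughly like the sketch below. It assumes the PDFs sit in one folder next to the index; the function names, the `index.gmi` file name and the default title are made up for illustration. The only gemtext feature it relies on is the link line (`=> URL label`).

```python
from pathlib import Path
from urllib.parse import quote


def gemtext_index(pdf_names, title="PDF archive"):
    """Return a gemtext page with one link line per PDF file name."""
    lines = [f"# {title}", ""]
    for name in sorted(pdf_names):
        # Use the file name without extension as a human-readable label.
        label = Path(name).stem.replace("_", " ")
        # Gemtext link line: "=> URL label"; quote() escapes spaces in the URL.
        lines.append(f"=> {quote(name)} {label}")
    return "\n".join(lines) + "\n"


def write_index(archive_dir, title="PDF archive"):
    """Scan archive_dir for *.pdf files and write index.gmi next to them."""
    folder = Path(archive_dir)
    names = [p.name for p in folder.glob("*.pdf")]
    target = folder / "index.gmi"
    target.write_text(gemtext_index(names, title), encoding="utf-8")
    return target
```

Calling `write_index("archive")` (a hypothetical folder name) after adding new PDFs would regenerate the listing, so the archive stays a plain directory of files plus one generated page.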

👽 smokey

So I haven't done such a thing, but I have kind of done the reverse: taking in gemtext from various feeds and outputting it as an EPUB, so perhaps I can help you theorycraft a workflow.

1. PDF parser software: if you are a Unix user, see if such a command is available in your distribution's repository; if not, search GitHub for a CLI PDF parser and compile the program.

2. A simple shell script will suffice to feed each PDF to the parser and write the output text to a generated text file.
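
The two steps above can be sketched roughly as follows, in Python rather than shell. This assumes the parser is `pdftotext` from Poppler (one possible choice, not something I have tested for this purpose); the function names and the `.gmi` output naming are made up for illustration. Since gemtext expects one paragraph per line, the sketch also rejoins the hard-wrapped lines that the extracted text usually contains.

```python
import re
import subprocess
from pathlib import Path


def text_to_gemtext(text, title):
    """Join hard-wrapped lines into one-line paragraphs under a heading."""
    out = [f"# {title}", ""]
    # pdftotext separates pages with form feeds; treat them as paragraph breaks.
    blocks = re.split(r"\n\s*\n", text.replace("\f", "\n\n"))
    for block in blocks:
        words = [line.strip() for line in block.splitlines() if line.strip()]
        if words:
            out.extend([" ".join(words), ""])
    return "\n".join(out)


def pdf_to_gemtext(pdf_path, out_dir):
    """Run pdftotext on one PDF and write a .gmi file into out_dir."""
    pdf = Path(pdf_path)
    # Assumption: Poppler's pdftotext is installed; "-" sends text to stdout.
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        check=True, capture_output=True, encoding="utf-8",
    )
    target = Path(out_dir) / (pdf.stem + ".gmi")
    target.write_text(text_to_gemtext(result.stdout, pdf.stem), encoding="utf-8")
    return target
```

Expect the result to need hand cleanup: headings, lists and tables in the PDF all come out as plain paragraphs, and lines that happen to start with `#`, `*`, `>` or `=>` will be read as gemtext markup.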

If you wanted to look at the source for the gemtext to epub script I have, here it is.

gemini://tilde.team/~smokey/extras/script_dailydigest.gmi · 2 years ago
