💾 Archived View for station.martinrue.com › mimas › 4947c6da81ae4f7f862e08c8936ad49c captured on 2024-08-31 at 15:07:21. Gemini links have been rewritten to link to archived content

👽 mimas

Quick question: what is the best workflow for converting PDFs to gemtext and setting up a small archive? Thanks a lot

2 years ago

3 Replies

👽 cipres

Really strange to have pdf and gemtext in the same sentence ^_^ · 2 years ago

👽 mfoo2

By design, PDF is a bit of a dead end for onward processing or parsing, since it has very little semantic content: mostly just letters positioned on a page. Any conversion will probably be pretty rough and unsatisfying, or time-consuming if you expect to clean it up by hand afterwards. Why not simply keep the files as PDF and generate or write a gemtext page that lists the entries with links to them? · 2 years ago
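
A rough sketch of that "index page" idea, in Python. Everything named here is an assumption for illustration only: the helper names `build_index` and `write_index`, the output name `index.gmi`, and the idea that all PDFs sit in one directory served by the capsule. It only builds gemtext link lines (`=> URL label`) for each PDF, nothing more.

```python
from pathlib import Path
from urllib.parse import quote


def build_index(pdf_names, title="PDF archive"):
    """Return a gemtext page with one link line per PDF file name."""
    lines = [f"# {title}", ""]
    for name in sorted(pdf_names):
        # Use the file name without extension as a readable label.
        label = Path(name).stem.replace("_", " ").replace("-", " ")
        # Percent-encode the name so spaces do not break the link line.
        lines.append(f"=> {quote(name)} {label}")
    return "\n".join(lines) + "\n"


def write_index(directory):
    """Write index.gmi (hypothetical name) next to the PDFs in `directory`."""
    directory = Path(directory)
    names = [p.name for p in directory.glob("*.pdf")]
    (directory / "index.gmi").write_text(build_index(names), encoding="utf-8")
```

Calling `write_index("archive")` on a folder of PDFs would produce a single `index.gmi` with relative links, so the folder can be moved around without rewriting anything.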

👽 smokey

So I haven't done such a thing, but I have kind of done the reverse: taking in gemtext from various feeds and outputting it as an epub. So perhaps I can help you theorycraft a workflow.

1. Get PDF parser software. If you are a Unix user, see if such a command is available in your distribution's repository; if not, search GitHub for a CLI PDF parser and compile the program.

2. A simple shell script will suffice to run the parser: input PDF -> output text data into a generated text file.
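
The two steps above can be sketched roughly as follows. This is not the script mentioned in this reply; it is a minimal Python sketch (rather than shell) that assumes the `pdftotext` tool from poppler-utils is installed, and the function names `extract_text`, `text_to_gemtext` and `convert_directory` are made up for the example. The text cleanup is deliberately naive: it only joins hard-wrapped lines into paragraphs and puts the title in a heading.

```python
import subprocess
from pathlib import Path


def extract_text(pdf_path):
    """Run pdftotext (poppler-utils, assumed installed) and return the text."""
    # The trailing "-" asks pdftotext to write to stdout instead of a file.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def text_to_gemtext(text, title):
    """Join hard-wrapped lines into paragraphs under a gemtext heading."""
    out = [f"# {title}", ""]
    paragraph = []
    # pdftotext separates pages with form feeds; treat them as paragraph breaks.
    for raw in text.replace("\f", "\n\n").splitlines():
        line = raw.strip()
        if line:
            paragraph.append(line)
        elif paragraph:
            out.extend([" ".join(paragraph), ""])
            paragraph = []
    if paragraph:
        out.extend([" ".join(paragraph), ""])
    return "\n".join(out)


def convert_directory(directory):
    """Write a .gmi file next to every PDF in `directory`."""
    for pdf in sorted(Path(directory).glob("*.pdf")):
        gemtext = text_to_gemtext(extract_text(pdf), pdf.stem)
        pdf.with_suffix(".gmi").write_text(gemtext, encoding="utf-8")
```

Expect to clean up the result by hand. Tables, columns and footnotes come out as plain text, and a paragraph that happens to start with `#`, `*`, `>` or `=>` will be read as gemtext markup.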

If you wanted to look at the source for the gemtext to epub script I have, here it is.

gemini://tilde.team/~smokey/extras/script_dailydigest.gmi · 2 years ago