💾 Archived View for station.martinrue.com › mimas › 4947c6da81ae4f7f862e08c8936ad49c captured on 2023-09-08 at 17:57:19. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Quick question: what is the best workflow to convert PDFs to gemtext and set up a small archive? Thanks a lot
1 year ago
Really strange to have pdf and gemtext in the same sentence ^_^ · 1 year ago
By design PDF is a bit of a dead end in terms of onward processing or parsing, since it has very little semantic content, just letters on a page mostly. Any conversion of it will probably be pretty rough and unsatisfying, or time-consuming if you expect to have to clean up by hand afterwards. Why not simply keep them as PDF but generate or write a gemtext page that lists the entries with links to them? · 1 year ago
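For what it's worth, the "keep the PDFs, generate a gemtext index" idea can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the function names `build_index` and `write_index`, the `index.gmi` file name and the default title are all made up for illustration, and it assumes the PDFs sit in one flat directory that the index is served from.

```python
from pathlib import Path
from urllib.parse import quote


def build_index(pdf_names, title="PDF archive"):
    """Return a gemtext page with one link line per PDF file name."""
    lines = [f"# {title}", ""]
    for name in sorted(pdf_names):
        # Use the file name without extension as a readable label.
        label = Path(name).stem.replace("_", " ")
        # Gemtext link lines have the form "=> URL optional label".
        # quote() percent-encodes spaces so the URL stays one token.
        lines.append(f"=> {quote(name)} {label}")
    return "\n".join(lines) + "\n"


def write_index(archive_dir):
    """Write index.gmi (hypothetical name) next to the PDFs in archive_dir."""
    archive = Path(archive_dir)
    names = [p.name for p in archive.glob("*.pdf")]
    (archive / "index.gmi").write_text(build_index(names), encoding="utf-8")
```

Calling `write_index("archive")` would then produce an `index.gmi` with relative links, which should keep working if the whole directory is moved to another capsule.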
So I haven't done such a thing, but I have kind of done the reverse: taking in gemtext from various feeds and outputting it as an epub, so perhaps I can help you theorycraft a workflow.
1. PDF parser software: if you are a Unix user, see if there's such a command available in your package repository; if not, search GitHub for a CLI PDF parser and compile the program.
2. A simple shell script will suffice to run the PDF parser on each input PDF and write the output text data into a generated text file.
If you wanted to look at the source for the gemtext to epub script I have, here it is.
gemini://tilde.team/~smokey/extras/script_dailydigest.gmi · 1 year ago
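The two steps above could look roughly like this, written in Python rather than shell. It is a sketch under assumptions, not a tested pipeline: nobody in the thread named a parser, so it assumes the `pdftotext` command from poppler-utils is installed, and the function names and the `.gmi` output naming are made up for illustration. The cleanup step only joins hard-wrapped lines into one line per paragraph, since gemtext clients wrap long lines themselves; expect to tidy headings and lists by hand.

```python
import re
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path):
    """Run pdftotext (assumed to be installed) and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def text_to_gemtext(text):
    """Join hard-wrapped lines so each paragraph becomes one gemtext line."""
    text = text.replace("\f", "\n\n")  # pdftotext separates pages with form feeds
    paragraphs = re.split(r"\n\s*\n", text)  # blank lines delimit paragraphs
    joined = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in joined if p) + "\n"


def convert_all(pdf_dir, out_dir):
    """Convert every PDF in pdf_dir to a rough .gmi file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        gmi = out / (pdf.stem + ".gmi")
        gmi.write_text(text_to_gemtext(pdf_to_text(pdf)), encoding="utf-8")
```

One caveat: extracted lines that happen to start with `#`, `=>` or `*` will be read as gemtext markup, so the output is worth a quick look before publishing.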