💾 Archived View for gemi.dev › gemini-mailing-list › 000411.gmi captured on 2024-06-16 at 13:09:11. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Greetings fellows,

I just found out about Gemini recently, got interested in the project, and wanted to be involved. In the process of setting up my gempod (is that what you call them?), I wanted to have an HTML/HTTP mirror for it. I hadn't found a gemtext to HTML converter, so I decided to write my own. Midway through, I thought that if I'm going to write a full parser for gemtext, I might as well make the code reusable and package it as a library, so the project shifted from a gemtext to HTML tool to a gemtext processing library, and here I am.

However, while implementing the parser, I ran into many questions about the spec. I can't recall all of them right now, but here are a few:

- In 5.5.2 it is stated that lines beginning with "* " are elements of an unordered list. I assumed the space after '*' is required, since it is included in the double quotes. When I asked for clarification on IRC, others suggested I handle '*' with or without a space afterwards, so I did.
- In 5.5.3, I had the same question about quotes. The spec says ">", which sounds like no space is required after ">", and that if there is a space, I should treat it as part of the quote. I asked on IRC whether I should follow that behaviour, but others recommended that I handle spaces after ">" just in case, which leaves me wondering whether the spec intended that or not.
- In 5.5.1, it is not explicitly stated how many heading levels the spec allows. I assumed it follows Markdown in allowing up to 6 levels. However, after asking on IRC, I was told 3 is the maximum, so I went with that.
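For the heading question, a minimal sketch of how a parser might cap headings at three levels, as reported above. The function name `parse_heading` is hypothetical and is not taken from libgemtext. It also assumes that whitespace between the '#' run and the heading text is optional and that a fourth '#' is simply part of the heading text.

```c
#include <stddef.h>

/* Hypothetical helper (not libgemtext's API).
 * Returns the heading level (1..3) of a gemtext line, or 0 if the line
 * is not a heading. On success, *text (if non-NULL) points at the
 * heading text inside `line`, with leading spaces/tabs skipped. */
int parse_heading(const char *line, const char **text)
{
    int level = 0;

    /* Count at most three leading '#'; a fourth stays in the text. */
    while (level < 3 && line[level] == '#')
        level++;
    if (level == 0)
        return 0;

    /* Assumption: whitespace after the '#' run is optional. */
    const char *p = line + level;
    while (*p == ' ' || *p == '\t')
        p++;

    if (text != NULL)
        *text = p;
    return level;
}
```

With this sketch, "# Title" and "#Title" are both level 1 headings, and "#### x" is a level 3 heading whose text is "# x".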
As of now, my implementation is complete, and it is almost usable for anyone willing to test it. I wrote manpages for all functions currently implemented, but not for the data types yet; I'm going to work on that. As part of my project, I want to write a manpage for the text/gemini format (gemtext(5)), and I want it to be precise and spec compliant. If you don't mind, I'll go ahead and write the manpage as a proposal to standardize some of the unclear cases of the spec. If the rest of the community agrees, maybe we can get the spec updated too?

Attached is a tarball of my current implementation (WIP)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: libgemtext.tar.gz
Type: application/x-gzip
Size: 59403 bytes
Desc: not available
URL: <https://lists.orbitalfox.eu/archives/gemini/attachments/20201022/fef7c467/attachment-0001.bin>
It was thus said that the Great Ali Fardan once stated:
> Greetings fellows, I just found out about gemini recently and I got
> interested in the project and wanted to be involved, In the process of
> setting up my gempod (that's how you call them?), I wanted to be able
> to have an HTML/HTTP mirror for my gempod and I haven't found a gemtext
> to HTML converter

Given how easy it is, I'm surprised there aren't more. But by searching the mailing list, I did find references to two Gemini-text-to-HTML converters:

https://github.com/LukeEmmet/GemiNaut/blob/master/GemiNaut/GmiConverters/GmiToHtml.r3
(written in Rebol, a blast from the past)

https://git.sr.ht/~sotirisp/qute-gemini
(Gemini text to Markdown to HTML in python3)

> so I decided to write my own, and in the middle of
> the process I thought if I'm going write a full parser for gemtext,
> I might as well make the code reusable and package it as a library, so
> the project shifted from a gemtext to HTML tool to a gemtext processing
> library, and here I am.

Hello.

> As of now, my implementation is complete, It is almost usable for
> anyone willing to test it, I wrote manpages for all functions currently
> implemented, but not for the data types yet, I'm going to work on that,
> and as part of my project, I want to write a manpage for the text/gemini
> format (gemtext(5)) and I want it to be precise and spec compliant,
> if you don't mind, I'll go ahead and write the manpage as a proposal to
> standardize some of the unclear cases of the spec, if the rest of the
> community agrees, maybe get the spec updated too?
>
> Attached is a tarball of my current implementation (WIP)

And here are some comments from trying it out. I wrote a simple Gemini text file (with very long lines) and ran your test program over it.
In the output you have some garbage data on the very first line:

00000000: 88 DB CB 23 20 4C 6F 72 65 6D 20 69 70 73 75 6D ...# Lorem ipsum
00000010: 20 64 6F 6C 6F 72 20 73 69 74 20 61 6D 65 74 2C  dolor sit amet,

Thoughts: sounds like you have some uninitialized memory. Aside from the garbage bytes, the output did not match the input as the pre-formatted block input did not have the ``` guards. And the last blank line was not included in the output either.

I also ran it under valgrind [1] and found a leak in the happy path:

[spc]lucy:/tmp/libgemtext>valgrind --show-reachable=yes --leak-check=full ./test </tmp/text.gemini >/tmp/t.gmi
==26859== Memcheck, a memory error detector.
==26859== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==26859== Using LibVEX rev 1575, a library for dynamic binary translation.
==26859== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==26859== Using valgrind-3.1.1, a dynamic binary instrumentation framework.
==26859== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==26859== For more details, rerun with: -v
==26859==
==26859== Conditional jump or move depends on uninitialised value(s)
==26859==    at 0x804A2C1: strlcat (strlcat.c:38)
==26859==    by 0x8049C7E: _line_append (encode.c:198)
==26859==    by 0x8049EF5: gemtext_encode (encode.c:263)
==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
==26859==    by 0x804867A: main (test.c:15)
==26859==
==26859== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
==26859== malloc/free: in use at exit: 7,200 bytes in 1 blocks.
==26859== malloc/free: 86 allocs, 85 frees, 84,247 bytes allocated.
==26859== For counts of detected errors, rerun with: -v
==26859== searching for pointers to 1 not-freed blocks.
==26859== checked 55,588 bytes.
==26859==
==26859==
==26859== 7,200 bytes in 1 blocks are possibly lost in loss record 1 of 1
==26859==    at 0x400579F: realloc (vg_replace_malloc.c:306)
==26859==    by 0x8049C56: _line_append (encode.c:194)
==26859==    by 0x8049D5D: gemtext_encode (encode.c:224)
==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
==26859==    by 0x804867A: main (test.c:15)
==26859==
==26859== LEAK SUMMARY:
==26859==    definitely lost: 0 bytes in 0 blocks.
==26859==      possibly lost: 7,200 bytes in 1 blocks.
==26859==    still reachable: 0 bytes in 0 blocks.
==26859==         suppressed: 0 bytes in 0 blocks.

You will also want to check the non-happy paths for memory leaks. In my experience, memory leaks are more likely in the non-happy path because programmers rarely think through the non-happy path, and it's annoying to write code to properly handle the non-happy paths in C.

But I think it's wonderful that there was only one leak, and possibly an easy one to fix. The library itself appears easy to use (if you know C). Good job.

-spc

[1] If you are doing C, and have access to valgrind (it's almost always installed on every Linux system, or available to be installed), use it. It is a fantastic tool to find memory leaks and issues with uninitialized memory. Yes, it's annoying having to track all the issues down, but I feel it's worth it.
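The valgrind report above points at a string concatenation on a buffer grown with realloc. The actual `_line_append` source is not shown in this thread, so the following is only a sketch of one common way to avoid both problems, under the assumption that the buffer was concatenated onto before it held a NUL terminator. The names `struct linebuf`, `linebuf_append` and `linebuf_free` are hypothetical. The sketch tracks the length explicitly, so it never scans uninitialized bytes, and it keeps the old pointer when realloc fails, so the non-happy path does not leak.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable line buffer; initialise with {0}. */
struct linebuf {
    char  *data; /* NUL-terminated once anything has been appended */
    size_t len;  /* bytes used, not counting the terminator */
    size_t cap;  /* bytes allocated */
};

/* Appends s to buf. Returns 0 on success, -1 if allocation fails,
 * in which case buf is left unchanged and still owns its memory. */
int linebuf_append(struct linebuf *buf, const char *s)
{
    size_t n = strlen(s);

    if (buf->len + n + 1 > buf->cap) {
        size_t newcap = buf->cap ? buf->cap : 64;
        while (newcap < buf->len + n + 1)
            newcap *= 2;

        /* Use a temporary so the old block is not lost on failure. */
        char *tmp = realloc(buf->data, newcap);
        if (tmp == NULL)
            return -1;
        buf->data = tmp;
        buf->cap = newcap;
    }

    /* Copy s including its terminator; no scan of old contents needed. */
    memcpy(buf->data + buf->len, s, n + 1);
    buf->len += n;
    return 0;
}

/* Releases the buffer and resets it to the empty state. */
void linebuf_free(struct linebuf *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->len = 0;
    buf->cap = 0;
}
```

A caller would start from `struct linebuf b = {0};`, call `linebuf_append` for each piece of output, and call `linebuf_free` on every exit path, including the error ones.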
On 2020-10-23 01:01, Sean Conner wrote:
> Given how easy it is, I'm surprised there aren't more. But by searching
> the mailing list, I did find references to two Gemini-text-to-HTML
> converters:
>
> https://github.com/LukeEmmet/GemiNaut/blob/master/GemiNaut/GmiConverters/GmiToHtml.r3
> (written in Rebol, a blast from the past)
>
> https://git.sr.ht/~sotirisp/qute-gemini
> (Gemini text to Markdown to HTML in python3)

Interesting, still would like to have my own.

> And here are some comments from trying it out. I wrote a simple Gemini
> text file (with very long lines) and ran your test program over it. In the
> output you have some garbage data on the very first line:
>
> 00000000: 88 DB CB 23 20 4C 6F 72 65 6D 20 69 70 73 75 6D ...# Lorem ipsum
> 00000010: 20 64 6F 6C 6F 72 20 73 69 74 20 61 6D 65 74 2C  dolor sit amet,
>
> Thoughts: sounds like you have some uninitialized memory. Aside from the
> garbage bytes, the output did not match the input as the pre-formatted block
> input did not have the ``` guards. And the last blank line was not included
> in the output either.

Thanks for taking the time. I'm going to write tests for the code now; I haven't done that yet. I should probably be able to encounter that bug again. However, just in case I don't, would you like to send me the file you used to test with?

> I also ran it under valgrind [1] and found a leak in the happy path:
>
> [spc]lucy:/tmp/libgemtext>valgrind --show-reachable=yes --leak-check=full ./test </tmp/text.gemini >/tmp/t.gmi
> ==26859== Memcheck, a memory error detector.
> ==26859== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==26859== Using LibVEX rev 1575, a library for dynamic binary translation.
> ==26859== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
> ==26859== Using valgrind-3.1.1, a dynamic binary instrumentation framework.
> ==26859== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==26859== For more details, rerun with: -v
> ==26859==
> ==26859== Conditional jump or move depends on uninitialised value(s)
> ==26859==    at 0x804A2C1: strlcat (strlcat.c:38)
> ==26859==    by 0x8049C7E: _line_append (encode.c:198)
> ==26859==    by 0x8049EF5: gemtext_encode (encode.c:263)
> ==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
> ==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
> ==26859==    by 0x804867A: main (test.c:15)
> ==26859==
> ==26859== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
> ==26859== malloc/free: in use at exit: 7,200 bytes in 1 blocks.
> ==26859== malloc/free: 86 allocs, 85 frees, 84,247 bytes allocated.
> ==26859== For counts of detected errors, rerun with: -v
> ==26859== searching for pointers to 1 not-freed blocks.
> ==26859== checked 55,588 bytes.
> ==26859==
> ==26859==
> ==26859== 7,200 bytes in 1 blocks are possibly lost in loss record 1 of 1
> ==26859==    at 0x400579F: realloc (vg_replace_malloc.c:306)
> ==26859==    by 0x8049C56: _line_append (encode.c:194)
> ==26859==    by 0x8049D5D: gemtext_encode (encode.c:224)
> ==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
> ==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
> ==26859==    by 0x804867A: main (test.c:15)
> ==26859==
> ==26859== LEAK SUMMARY:
> ==26859==    definitely lost: 0 bytes in 0 blocks.
> ==26859==      possibly lost: 7,200 bytes in 1 blocks.
> ==26859==    still reachable: 0 bytes in 0 blocks.
> ==26859==         suppressed: 0 bytes in 0 blocks.
>
> You will also want to check the non-happy paths for memory leaks. In my
> experience, memory leaks are more likely in the non-happy path because
> programmers rarely think through the non-happy path, and it's annoying to
> write code to properly handle the non-happy paths in C.
>
> But I think it's wonderful that there was only one leak, and possibly an
> easy one to fix. The library itself appears easy to use (if you know C).
> Good job.
Oh yeah, I did run valgrind, but only on my own test file, which didn't trigger that. I wouldn't call it a real test file; it was the end of the day and I just wanted to see if it works.
On 2020-10-23 01:01, Sean Conner wrote:
> I also ran it under valgrind [1] and found a leak in the happy path:
>
> [spc]lucy:/tmp/libgemtext>valgrind --show-reachable=yes --leak-check=full ./test </tmp/text.gemini >/tmp/t.gmi
> ==26859== Memcheck, a memory error detector.
> ==26859== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==26859== Using LibVEX rev 1575, a library for dynamic binary translation.
> ==26859== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
> ==26859== Using valgrind-3.1.1, a dynamic binary instrumentation framework.
> ==26859== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==26859== For more details, rerun with: -v
> ==26859==
> ==26859== Conditional jump or move depends on uninitialised value(s)
> ==26859==    at 0x804A2C1: strlcat (strlcat.c:38)
> ==26859==    by 0x8049C7E: _line_append (encode.c:198)
> ==26859==    by 0x8049EF5: gemtext_encode (encode.c:263)
> ==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
> ==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
> ==26859==    by 0x804867A: main (test.c:15)
> ==26859==
> ==26859== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
> ==26859== malloc/free: in use at exit: 7,200 bytes in 1 blocks.
> ==26859== malloc/free: 86 allocs, 85 frees, 84,247 bytes allocated.
> ==26859== For counts of detected errors, rerun with: -v
> ==26859== searching for pointers to 1 not-freed blocks.
> ==26859== checked 55,588 bytes.
> ==26859==
> ==26859==
> ==26859== 7,200 bytes in 1 blocks are possibly lost in loss record 1 of 1
> ==26859==    at 0x400579F: realloc (vg_replace_malloc.c:306)
> ==26859==    by 0x8049C56: _line_append (encode.c:194)
> ==26859==    by 0x8049D5D: gemtext_encode (encode.c:224)
> ==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
> ==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
> ==26859==    by 0x804867A: main (test.c:15)
> ==26859==
> ==26859== LEAK SUMMARY:
> ==26859==    definitely lost: 0 bytes in 0 blocks.
> ==26859==      possibly lost: 7,200 bytes in 1 blocks.
> ==26859==    still reachable: 0 bytes in 0 blocks.
> ==26859==         suppressed: 0 bytes in 0 blocks.

Leak is fixed, thank you. I missed that.

https://git.tilde.institute/raiz/libgemtext/commit/?id=2024b2562ad83a04fbfb6699ca8dc4b877a676e4
> - In 5.5.2 it is stated that lines beginning with "* " are elements of
> an unordered list, and I assumed the space after '*' is required since
> it is included in the double quotes, when I asked for clarification on IRC,
> others suggested I handle '*' with or without space afterwards, so I
> did.

I would disagree with this. Originally there was no space required in the spec, but this was changed, because some people may start lines with asterisks as an "ASCII-art" way to show emphasis. To allow this writing style to continue, you should require the space after the asterisk. This is what the spec defines.

> - In 5.5.3, I had the same question with quotes, the spec says ">" which
> sounds like there should be no space after ">", and if there is a space,
> I should treat it like a part of the quote, I asked on IRC if I should
> follow that behaviour but others also recommended that I should handle
> spaces after ">" just in case, which leaves me to think again if the
> spec intended that or not.

Initially no line markers required a space after them. The list line marker now does, as I described above, but the quote line marker was not changed, as there doesn't seem to be a reason to. There don't seem to be any writing styles that start lines with a '>' but aren't referring to a quote. Working without the space also follows Markdown, which is a plus.

Cheers,
makeworld
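The two rules described above (list items need "* ", quotes need only ">") can be sketched as a small classifier. The names `enum line_kind` and `classify_line` are hypothetical and not from libgemtext or any other library mentioned in this thread. Skipping whitespace after ">" is an assumption made here for presentation; the thread leaves open whether that space belongs to the quote text.

```c
#include <stddef.h>

/* Hypothetical line kinds; a real parser would have more. */
enum line_kind { LINE_TEXT, LINE_LIST, LINE_QUOTE };

/* Classifies one gemtext line (without its newline).
 * *text is set to the start of the content inside `line`. */
enum line_kind classify_line(const char *line, const char **text)
{
    /* List item: the space after '*' is mandatory, so a line such as
     * "*emphasis*" stays ordinary text. */
    if (line[0] == '*' && line[1] == ' ') {
        *text = line + 2;
        return LINE_LIST;
    }

    /* Quote: '>' alone is enough. Assumption: leading whitespace
     * after '>' is dropped rather than kept in the quote. */
    if (line[0] == '>') {
        const char *p = line + 1;
        while (*p == ' ' || *p == '\t')
            p++;
        *text = p;
        return LINE_QUOTE;
    }

    *text = line;
    return LINE_TEXT;
}
```

Under these rules "* item" is a list item, "*emphasis*" is plain text, and ">quote" and "> quote" are both quotes with the text "quote".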
On 2020-10-23 21:48, colecmac at protonmail.com wrote:
>> - In 5.5.2 it is stated that lines beginning with "* " are elements of
>> an unordered list, and I assumed the space after '*' is required since
>> it is included in the double quotes, when I asked for clarification on IRC,
>> others suggested I handle '*' with or without space afterwards, so I
>> did.
>
> I would disagree with this. Originally there was no space required in the
> spec, but this was changed, because some people may start lines with
> asterisks as an "ASCII-art" way to show emphasis. To allow this writing
> style to continue, you should require the space after the asterisk. This is
> what the spec defines.

I'm with you on that. Mandatory whitespace also makes it look cleaner anyway. However, I'll keep handling both cases until the spec is updated to a clarified version.

>> - In 5.5.3, I had the same question with quotes, the spec says ">" which
>> sounds like there should be no space after ">", and if there is a space,
>> I should treat it like a part of the quote, I asked on IRC if I should
>> follow that behaviour but others also recommended that I should handle
>> spaces after ">" just in case, which leaves me to think again if the
>> spec intended that or not.
>
> Initially no line markers required a space after them. The list line marker
> now does, as I described above, but the quote line marker was not changed,
> as there doesn't seem to be a reason to. There doesn't seem to be writing
> styles that start lines with a '>', but aren't referring to a quote. Working
> without the space also follows markdown, which is a plus.

Again, I'll keep handling both cases for this too, until the spec clears this up.
On Thu, 22 Oct 2020 18:01:34 -0400 Sean Conner <sean at conman.org> wrote:
> It was thus said that the Great Ali Fardan once stated:
> > Greetings fellows, I just found out about gemini recently and I got
> > interested in the project and wanted to be involved, In the process
> > of setting up my gempod (that's how you call them?), I wanted to be
> > able to have an HTML/HTTP mirror for my gempod and I haven't found
> > a gemtext to HTML converter
>
> Given how easy it is, I'm surprised there aren't more. But by
> searching the mailing list, I did find references to two
> Gemini-text-to-HTML converters:
>
> https://github.com/LukeEmmet/GemiNaut/blob/master/GemiNaut/GmiConverters/GmiToHtml.r3
> (written in Rebol, a blast from the past)
>
> https://git.sr.ht/~sotirisp/qute-gemini
> (Gemini text to Markdown to HTML in python3)

dillo-gemini (gemini://celehner.com/dillo-gemini/) includes one using AWK. I pulled out the current version and put it here:
=> gemini://celehner.com/gemini-utils/gmi2html

There is also this one in Go, a previous version of which is used by the Kineto proxy I think:
=> https://git.sr.ht/~adnano/go-gemini/tree/master/text.go

Other Gemini-web proxies like Mozz.us, Vulpes, and RPoD would also have their own Gemini-text-to-HTML converters, but I don't know where the source is for those.

--
Charles
On Mon Oct 26, 2020 at 12:17 PM EDT, Charles E. Lehner wrote:
> There is also this one in Go, a previous version of which is used by the
> Kineto proxy I think:
> => https://git.sr.ht/~adnano/go-gemini/tree/master/text.go

Kineto doesn't use this. I wrote my own HTML converter:

https://git.sr.ht/~sircmpwn/kineto/tree/master/main.go

I think anything related to HTML is out of scope for a generic Gemini library like go-gemini, and should not be included.
On Mon Oct 26, 2020 at 8:17 AM EDT, Charles E. Lehner wrote:
> There is also this one in Go, a previous version of which is used by the
> Kineto proxy I think:
> => https://git.sr.ht/~adnano/go-gemini/tree/master/text.go

I have moved this function out of the package to an example as it is somewhat out of scope. It can now be found here:

https://git.sr.ht/~adnano/go-gemini/tree/master/examples/html.go
On Mon, 26 Oct 2020 12:19:10 -0400 "Drew DeVault" <sir at cmpwn.com> wrote:
> Kineto doesn't use this. I wrote my own HTML converter:
>
> https://git.sr.ht/~sircmpwn/kineto/tree/master/main.go

Thank you for the correction.

Also here is another Gemtext-HTML converter in Go:
https://github.com/boomlinde/gemini/blob/master/gemini/html.go

Used in https://github.com/boomlinde/gemini.filter.dpi
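Whichever converter is used, one step they all need is escaping gemtext content before it is placed inside HTML tags. A minimal sketch of that step follows. The function name `html_escape` and its snprintf-like return convention are assumptions for illustration; this is not code from any of the converters linked in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes an HTML-escaped copy of src into dst.
 * Returns the length the full escaped string needs (excluding the
 * terminator). If that is >= dstsize, the output was truncated at a
 * whole-character boundary; dst is still NUL-terminated when
 * dstsize > 0. */
size_t html_escape(char *dst, size_t dstsize, const char *src)
{
    size_t used = 0;    /* length needed so far */
    size_t written = 0; /* bytes actually stored in dst */

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };
        const char *rep;

        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = one;      break;
        }

        size_t n = strlen(rep);
        /* Store only if the piece and a terminator still fit. Once a
         * piece fails to fit, `used` is past the limit, so no later
         * piece is stored and the output stays contiguous. */
        if (used + n < dstsize) {
            memcpy(dst + used, rep, n);
            written = used + n;
        }
        used += n;
    }

    if (dstsize > 0)
        dst[written] = '\0';
    return used;
}
```

A converter would call this on the text of each line before wrapping it in the tag chosen for that line type, for example a paragraph or list-item element.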