💾 Archived View for gemi.dev › gemini-mailing-list › 000411.gmi captured on 2024-06-16 at 13:09:11. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Spec proposal

1. Ali Fardan (raiz (a) stellarbound.space)

Greetings! I just found out about Gemini recently, got interested in the
project and wanted to get involved. While setting up my gempod (is that
what you call them?), I wanted to have an HTML/HTTP mirror of it, and
since I couldn't find a gemtext-to-HTML converter, I decided to write my
own. Midway through, I figured that if I was going to write a full parser
for gemtext, I might as well make the code reusable and package it as a
library. So the project shifted from a gemtext-to-HTML tool to a gemtext
processing library, and here I am.

However, while implementing the parser, I ran into many questions about
the spec. I can't recall all of them right now, but here are a few:

- In 5.5.2 it is stated that lines beginning with "* " are elements of
an unordered list. I assumed the space after '*' is required, since it
is included in the double quotes. When I asked for clarification on IRC,
others suggested I handle '*' with or without a space afterwards, so I
did.

- In 5.5.3, I had the same question about quotes. The spec says ">",
which suggests no space is expected after ">", and that if there is one,
I should treat it as part of the quoted text. I asked on IRC whether I
should follow that behaviour, but others again recommended handling
spaces after ">" just in case, which leaves me wondering whether the
spec intended that or not.

- In 5.5.1, it is not explicitly stated how many heading levels are
allowed. I assumed it works the way Markdown does, allowing up to 6
levels; however, after asking on IRC, I was told 3 is the maximum, so I
went with that.
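To make the heading answer concrete, here is a minimal C sketch of heading detection under the reading given on IRC (three levels at most). The function name `gemtext_heading_level` is hypothetical and not part of libgemtext, and treating "#### x" as a level-3 heading whose text begins with "#" is an assumption, not something the spec spells out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: return the heading level (1-3) of a gemtext line,
 * or 0 if the line is not a heading. At most three leading '#' characters
 * are counted, so "#### x" becomes a level-3 heading whose text is "# x"
 * (an assumption). If text_out is non-NULL it receives a pointer to the
 * heading text, with whitespace after the '#' markers skipped.
 */
int gemtext_heading_level(const char *line, const char **text_out)
{
    int level = 0;

    assert(line != NULL);
    while (level < 3 && line[level] == '#')
        level++;

    if (level > 0 && text_out != NULL) {
        const char *p = line + level;

        while (*p == ' ' || *p == '\t')
            p++;
        *text_out = p;
    }
    return level;
}
```

Capping the count at three rather than rejecting longer runs keeps every line classifiable; a stricter parser could instead treat "####" lines as plain text.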

As of now, my implementation is complete and almost usable for anyone
willing to test it. I wrote manpages for all functions currently
implemented, but not yet for the data types; I'm going to work on that.
As part of the project, I also want to write a manpage for the
text/gemini format (gemtext(5)), and I want it to be precise and spec
compliant. If you don't mind, I'll go ahead and write that manpage as a
proposal to standardize some of the unclear cases in the spec, and, if
the rest of the community agrees, maybe get the spec updated too?

Attached is a tarball of my current implementation (WIP)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libgemtext.tar.gz
Type: application/x-gzip
Size: 59403 bytes
Desc: not available
URL: <https://lists.orbitalfox.eu/archives/gemini/attachments/20201022/fef7c467/attachment-0001.bin>

2. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ali Fardan once stated:
> Greetings follows, I just found out about gemini recently and I got
> interested in the project and wanted to be involved, In the process of
> setting up my gempod (that's how you call them?), I wanted to be able
> to have an HTML/HTTP mirror for my gempod and I haven't found a gemtext
> to HTML converter 

  Given how easy it is, I'm surprised there aren't more.  But by searching
the mailing list, I did find reference to two Gemini-text-to-HTML
converters:

	https://github.com/LukeEmmet/GemiNaut/blob/master/GemiNaut/GmiConverters/GmiToHtml.r3
	(written in Rebol, a blast from the past)

	https://git.sr.ht/~sotirisp/qute-gemini
	(Gemini text to Markdown to HTML in python3)
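As an illustration of why such converters tend to be small, here is a C sketch of one piece of one: turning a link line ("=>", optional whitespace, a URL, then an optional label) into an HTML anchor, escaping the text along the way. The function names are hypothetical, the caller is assumed to have stripped the trailing newline already, and a full converter would also need headings, lists, quotes and preformatted blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append the NUL-terminated string s to out (capacity cap, current length
 * *len) without escaping. Returns 0, or -1 if it would not fit. */
static int append_raw(char *out, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);

    if (*len + n >= cap)
        return -1;
    memcpy(out + *len, s, n + 1);   /* copies the terminating NUL too */
    *len += n;
    return 0;
}

/* Append the first n bytes of s, escaping characters that are special in
 * HTML text and attribute values. Returns 0, or -1 if it would not fit. */
static int append_escaped(char *out, size_t cap, size_t *len,
                          const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        char one[2] = { s[i], '\0' };
        const char *rep = one;

        switch (s[i]) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        }
        if (append_raw(out, cap, len, rep) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: convert a gemtext link line to an HTML anchor.
 * The URL doubles as the label when none is given. Returns 0 on success,
 * or -1 if the line is not a link line or out is too small. */
int link_line_to_html(const char *line, char *out, size_t cap)
{
    size_t len = 0, url_len, label_len;
    const char *url, *label;

    assert(line != NULL && out != NULL && cap > 0);
    if (strncmp(line, "=>", 2) != 0)
        return -1;

    url = line + 2 + strspn(line + 2, " \t");
    url_len = strcspn(url, " \t");
    if (url_len == 0)
        return -1;                      /* "=>" with no URL */

    label = url + url_len;
    label += strspn(label, " \t");
    label_len = strlen(label);
    if (label_len == 0) {
        label = url;
        label_len = url_len;
    }

    if (append_raw(out, cap, &len, "<a href=\"") != 0 ||
        append_escaped(out, cap, &len, url, url_len) != 0 ||
        append_raw(out, cap, &len, "\">") != 0 ||
        append_escaped(out, cap, &len, label, label_len) != 0 ||
        append_raw(out, cap, &len, "</a>") != 0)
        return -1;
    return 0;
}
```

Falling back to the URL as the visible text is a common choice for converters, not a requirement; writing into a caller-sized buffer keeps the sketch allocation-free.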

> so I decided to write my own, and in the middle of
> the process I thought if I'm going write a full parser for gemtext,
> I might as well make the code reusable and package it as a library, so
> the project shifted from a gemtext to HTML tool to a gemtext processing
> library, and here I am.

  Hello.

> As of now, my implementation is complete, It is almost usable for
> anyone willing to test it, I wrote manpages for all functions currently
> implemented, but not for the data types yet, I'm going to work on that,
> and as part of my project, I want to write a manpage for the text/gemini
> format (gemtext(5)) and I want it to be precise and spec compliant,
> if you don't mind, I'll go ahead and write the manpage as a proposal to
> standardize some of the unclear cases of the spec, if the rest of the
> community agrees, maybe get the spec updated too?
> 
> Attached is a tarball of my current implementation (WIP)

  And here are some comments from trying it out.  I wrote a simple Gemini
text file (with very long lines) and ran your test program over it.  In the
output you have some garbage data on the very first line:

00000000: 88 DB CB 23 20 4C 6F 72 65 6D 20 69 70 73 75 6D ...# Lorem ipsum
00000010: 20 64 6F 6C 6F 72 20 73 69 74 20 61 6D 65 74 2C  dolor sit amet,

  Thoughts:  sounds like you have some uninitialized memory.  Aside from the
garbage bytes, the output did not match the input: the pre-formatted block
was missing its ``` guard lines, and the last blank line was not included in
the output either.
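The missing guard lines suggest a common round-trip pitfall: the parser consumes the toggle lines while building a preformatted block, so the encoder has to write them back, alt text included. A minimal C sketch of that step, assuming a hypothetical `encode_preformatted` that writes into a caller-supplied buffer (libgemtext's actual data types may well differ):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not libgemtext's API: re-encode a preformatted block
 * as gemtext. Both toggle lines are written back, the opening one with any
 * alt text. Empty lines inside the block are kept as-is.
 * Returns the number of bytes written (excluding the NUL), or -1 if the alt
 * text contains a newline or out is too small.
 */
int encode_preformatted(const char *alt, const char *const *lines,
                        size_t nlines, char *out, size_t cap)
{
    size_t len;
    int n;

    assert(out != NULL && cap > 0);
    if (alt == NULL)
        alt = "";
    if (strchr(alt, '\n') != NULL)
        return -1;                      /* alt text must stay on one line */

    n = snprintf(out, cap, "```%s\n", alt);    /* opening toggle line */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    len = (size_t)n;

    for (size_t i = 0; i < nlines; i++) {
        n = snprintf(out + len, cap - len, "%s\n", lines[i]);
        if (n < 0 || (size_t)n >= cap - len)
            return -1;
        len += (size_t)n;
    }

    n = snprintf(out + len, cap - len, "```\n");   /* closing toggle line */
    if (n < 0 || (size_t)n >= cap - len)
        return -1;
    return (int)(len + (size_t)n);
}
```

A round-trip test that parses a file, encodes it again and compares the bytes with the input would catch both this and the dropped final blank line.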

  I also ran it under valgrind [1] and found a leak in the happy path:

[spc]lucy:/tmp/libgemtext>valgrind --show-reachable=yes --leak-check=full ./test </tmp/text.gemini >/tmp/t.gmi
==26859== Memcheck, a memory error detector.
==26859== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==26859== Using LibVEX rev 1575, a library for dynamic binary translation.
==26859== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==26859== Using valgrind-3.1.1, a dynamic binary instrumentation framework.
==26859== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==26859== For more details, rerun with: -v
==26859== 
==26859== Conditional jump or move depends on uninitialised value(s)
==26859==    at 0x804A2C1: strlcat (strlcat.c:38)
==26859==    by 0x8049C7E: _line_append (encode.c:198)
==26859==    by 0x8049EF5: gemtext_encode (encode.c:263)
==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
==26859==    by 0x804867A: main (test.c:15)
==26859== 
==26859== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
==26859== malloc/free: in use at exit: 7,200 bytes in 1 blocks.
==26859== malloc/free: 86 allocs, 85 frees, 84,247 bytes allocated.
==26859== For counts of detected errors, rerun with: -v
==26859== searching for pointers to 1 not-freed blocks.
==26859== checked 55,588 bytes.
==26859== 
==26859== 
==26859== 7,200 bytes in 1 blocks are possibly lost in loss record 1 of 1
==26859==    at 0x400579F: realloc (vg_replace_malloc.c:306)
==26859==    by 0x8049C56: _line_append (encode.c:194)
==26859==    by 0x8049D5D: gemtext_encode (encode.c:224)
==26859==    by 0x804A182: gemtext_encode_fd (encode.c:339)
==26859==    by 0x804A1FB: gemtext_encode_file (encode.c:359)
==26859==    by 0x804867A: main (test.c:15)
==26859== 
==26859== LEAK SUMMARY:
==26859==    definitely lost: 0 bytes in 0 blocks.
==26859==      possibly lost: 7,200 bytes in 1 blocks.
==26859==    still reachable: 0 bytes in 0 blocks.
==26859==         suppressed: 0 bytes in 0 blocks.
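Both traces point at the same spot: a realloc() in _line_append followed by strlcat() reading uninitialised bytes. Without seeing encode.c this is only a guess, but a common cause is appending with strlcat() to memory freshly returned by realloc(NULL, n), which has no terminating NUL, so strlcat() scans garbage to find the end of the "existing" string; that would fit the stray bytes before "# Lorem ipsum". A C sketch of a growable buffer that tracks its length explicitly instead; the `strbuf` type and its functions are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; the names are illustrative only. */
struct strbuf {
    char  *data;  /* NUL-terminated once non-NULL */
    size_t len;   /* bytes used, excluding the NUL */
    size_t cap;   /* bytes allocated */
};

/*
 * Append s to b. The length is tracked explicitly instead of letting
 * strlcat() scan the destination, because memory returned by
 * realloc(NULL, n) is uninitialized and may contain no NUL at all.
 * Returns 0 on success, or -1 on allocation failure (b is left unchanged).
 */
int strbuf_append(struct strbuf *b, const char *s)
{
    size_t n;

    assert(b != NULL && s != NULL);
    n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        char *p;

        while (newcap < b->len + n + 1)
            newcap *= 2;
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;      /* old b->data is still valid and owned by b */
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the NUL as well */
    b->len += n;
    return 0;
}

/* Release the buffer and reset it so it can be reused safely. */
void strbuf_free(struct strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

Assigning realloc()'s result to a temporary first matters: writing it straight back into b->data would lose the only pointer to the old block when realloc() fails.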

  You will also want to check the non-happy paths for memory leaks.  In my
experience, memory leaks are more likely in the non-happy path because
programmers rarely think through the non-happy path, and it's annoying to
write code to properly handle the non-happy paths in C.
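One common C idiom for keeping error paths leak-free is to funnel every failure through a single cleanup label. A sketch with a hypothetical `struct link` (not libgemtext's type) whose constructor makes three allocations, any of which can fail:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed link line; not libgemtext's actual type. */
struct link {
    char *url;
    char *label;
};

/* Copy n bytes of s into a new NUL-terminated string. (strndup() is POSIX,
 * not ISO C, so a local helper keeps the sketch portable.) */
static char *dup_n(const char *s, size_t n)
{
    char *p = malloc(n + 1);

    if (p != NULL) {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

/* Build a link from its parts. Every failure releases whatever was already
 * allocated through one cleanup label, so a later extra allocation only
 * needs one more free() there. Returns NULL on allocation failure. */
struct link *link_new(const char *url, size_t url_len,
                      const char *label, size_t label_len)
{
    struct link *l;

    assert(url != NULL && label != NULL);
    l = malloc(sizeof *l);
    if (l == NULL)
        return NULL;
    l->url = NULL;      /* so the cleanup below can free() unconditionally */
    l->label = NULL;

    l->url = dup_n(url, url_len);
    if (l->url == NULL)
        goto fail;
    l->label = dup_n(label, label_len);
    if (l->label == NULL)
        goto fail;
    return l;

fail:
    free(l->url);       /* free(NULL) is a no-op */
    free(l->label);
    free(l);
    return NULL;
}

void link_free(struct link *l)
{
    if (l == NULL)
        return;
    free(l->url);
    free(l->label);
    free(l);
}
```

To actually exercise these paths in tests, one option is to wrap malloc() with a counter that fails the Nth call and run the suite under valgrind for each N.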

  But I think it's wonderful that there was only one leak, and possibly an
easy one to fix.  The library itself appears easy to use (if you know C). 
Good job.

  -spc

[1]	If you are doing C, and have access to valgrind (it's almost always
	installed on Linux systems, or available to be installed), use
	it.  It is a fantastic tool to find memory leaks and issues with
	uninitialized memory.  Yes, it's annoying having to track all the
	issues down, but I feel it's worth it.

3. Ali Fardan (raiz (a) stellarbound.space)

On 2020-10-23 01:01, Sean Conner wrote:
>   Given how easy it is, I'm surprised there aren't more.  But by searching
> the mailing list, I did find reference to two Gemini-text-to-HTML
> converters:
> 
> 	https://github.com/LukeEmmet/GemiNaut/blob/master/GemiNaut/GmiConverters/GmiToHtml.r3
> 	(written in Rebol, a blast from the past)
> 
> 	https://git.sr.ht/~sotirisp/qute-gemini
> 	(Gemini text to Markdown to HTML in python3)

Interesting. I'd still like to have my own, though.

>   And here are some comments from trying it out.  I wrote a simple Gemini
> text file (with very long lines) and ran your test program over it.  In the
> output you have some garbage data on the very first line:
>
> 00000000: 88 DB CB 23 20 4C 6F 72 65 6D 20 69 70 73 75 6D ...# Lorem ipsum
> 00000010: 20 64 6F 6C 6F 72 20 73 69 74 20 61 6D 65 74 2C  dolor sit amet,
>
>   Thoughts:  sounds like you have some uninitialized memory.  Aside from the
> garbage bytes, the output did not match the input as the pre-formatted block
> input did not have the ``` guards.  And the last blank line was not included
> in the output either.

Thanks for taking the time. I'm going to write tests for the code now
(I haven't done that yet), so I should be able to run into that bug
again. However, just in case I don't, would you mind sending me the file
you tested with?

>   I also ran it under valgrind [1] and found a leak in the happy path:
> 
> [...]
> 
>   You will also want to check the non-happy paths for memory leaks.  In my
> experience, memory leaks are more likely in the non-happy path because
> programmers rarely think through the non-happy path, and it's annoying to
> write code to properly handle the non-happy paths in C.
>
>   But I think it's wonderful that there was only one leak, and possibly an
> easy one to fix.  The library itself appears easy to use (if you know C).
> Good job.

Oh yeah, I did run valgrind, but on my own test file, which didn't
trigger that. I wouldn't call it a real test file, though: it was the
end of the day and I just wanted to see if it worked.

4. Ali Fardan (raiz (a) stellarbound.space)

On 2020-10-23 01:01, Sean Conner wrote:
>   I also ran it under valgrind [1] and found a leak in the happy path:
> 
> [...]

Leak is fixed, thank you. I missed that.
https://git.tilde.institute/raiz/libgemtext/commit/?id=2024b2562ad83a04fbfb6699ca8dc4b877a676e4

5. Martin Bays (mbays (a) sdf.org)



>Given how easy it is, I'm surprised there aren't more.  But by 
>searching the mailing list, I did find reference to two 
>Gemini-text-to-HTML converters

Here's another which I didn't bother announcing to the list at the time:
gemini://gemini.thegonz.net/gmi2html.sed

6. colecmac (a) protonmail.com (colecmac (a) protonmail.com)

> - In 5.5.2 it is stated that lines beginning with "* " are elements of
> an unordered list, and I assumed the space after '*' is required since
> it is included in the double quotes, when I asked for clarification on IRC,
> others suggested I handle '*' with or without space afterwards, so I
> did.

I would disagree with this. Originally there was no space required in the
spec, but this was changed, because some people may start lines with
asterisks as an "ASCII-art" way to show emphasis. To allow this writing
style to continue, you should require the space after the asterisk. This is
what the spec defines.
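Under that strict reading, detecting a list item is a two-character prefix test. A minimal C sketch; the function name `gemtext_is_list_item` is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the strict reading: a line is an unordered list
 * item only if it starts with '*' followed by a space, so "*emphasis*" at
 * the start of a line stays ordinary text. If item_out is non-NULL it
 * receives the item text after the "* " marker.
 */
bool gemtext_is_list_item(const char *line, const char **item_out)
{
    assert(line != NULL);
    /* Short-circuit keeps line[1] from being read when line is "". */
    if (line[0] != '*' || line[1] != ' ')
        return false;
    if (item_out != NULL)
        *item_out = line + 2;
    return true;
}
```

With the lenient handling described at the start of the thread, "*emphasis*" at the start of a line would become a list item, which is the case the space requirement is meant to avoid.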

> - In 5.5.3, I had the same question with quotes, the spec says ">" which
> sounds like there should be no space after ">", and if there is a space,
> I should treat it like a part of the quote, I asked on IRC if I should
> follow that behaviour but others also recommended that I should handle
> spaces after ">" just in case, which leaves me to think again if the
> spec intended that or not.

Initially no line markers required a space after them. The list line marker
now does, as I described above, but the quote line marker was not changed,
as there doesn't seem to be a reason to: there don't seem to be any writing
styles that start lines with a '>' but aren't referring to a quote. Working
without the space also follows Markdown, which is a plus.
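For quote lines, a C sketch of the lenient handling discussed in this thread: only the '>' is required, and spaces or tabs directly after it are skipped when extracting the text. The function name is hypothetical, and whether that whitespace should be stripped or kept is exactly the question left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: a quote line is any line starting with '>'; no
 * space is required. As one possible choice (not settled by the spec),
 * spaces and tabs right after the '>' are skipped, so ">text" and
 * "> text" yield the same quoted text via text_out.
 */
bool gemtext_is_quote(const char *line, const char **text_out)
{
    assert(line != NULL);
    if (line[0] != '>')
        return false;
    if (text_out != NULL) {
        const char *p = line + 1;

        while (*p == ' ' || *p == '\t')
            p++;
        *text_out = p;
    }
    return true;
}
```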

Cheers,
makeworld

7. Ali Fardan (raiz (a) stellarbound.space)

On 2020-10-23 21:48, colecmac at protonmail.com wrote:
>> - In 5.5.2 it is stated that lines beginning with "* " are elements of
>> an unordered list, and I assumed the space after '*' is required since
>> it is included in the double quotes, when I asked for clarification on IRC,
>> others suggested I handle '*' with or without space afterwards, so I
>> did.
>
> I would disagree with this. Originally there was no space required in the
> spec, but this was changed, because some people may start lines with
> asterisks as an "ASCII-art" way to show emphasis. To allow this writing
> style to continue, you should require the space after the asterisk. This is
> what the spec defines.

I'm with you on that; mandatory whitespace makes it look cleaner
anyway. However, I'll keep handling both cases until the spec is updated
with a clarified version.

>> - In 5.5.3, I had the same question with quotes, the spec says ">" which
>> sounds like there should be no space after ">", and if there is a space,
>> I should treat it like a part of the quote, I asked on IRC if I should
>> follow that behaviour but others also recommended that I should handle
>> spaces after ">" just in case, which leaves me to think again if the
>> spec intended that or not.
>
> Initially no line markers required a space after them. The list line marker
> now does, as I described above, but the quote line marker was not changed,
> as there doesn't seem to be a reason to. There doesn't seem to be writing
> styles that start lines with a '>', but aren't referring to a quote. Working
> without the space also follows markdown, which is a plus.

Again, I'll keep handling both cases here too, until the spec is
clarified.

8. Charles E. Lehner (cel (a) celehner.com)

On Thu, 22 Oct 2020 18:01:34 -0400
Sean Conner <sean at conman.org> wrote:

> It was thus said that the Great Ali Fardan once stated:
> > Greetings follows, I just found out about gemini recently and I got
> > interested in the project and wanted to be involved, In the process
> > of setting up my gempod (that's how you call them?), I wanted to be
> > able to have an HTML/HTTP mirror for my gempod and I haven't found
> > a gemtext to HTML converter   
> 
>   Given how easy it is, I'm surprised there aren't more.  But by
> searching the mailing list, I did find reference to two
> Gemini-text-to-HTML converters:
> 
> 	https://github.com/LukeEmmet/GemiNaut/blob/master/GemiNaut/GmiConverters/GmiToHtml.r3
> 	(written in Rebol, a blast from the past)
> 
> 	https://git.sr.ht/~sotirisp/qute-gemini
> 	(Gemini text to Markdown to HTML in python3)

dillo-gemini (gemini://celehner.com/dillo-gemini/) includes one using AWK. 
I pulled out the current version and put it here:
=> gemini://celehner.com/gemini-utils/gmi2html

There is also this one in Go, a previous version of which is used by the 
Kineto proxy I think:
=> https://git.sr.ht/~adnano/go-gemini/tree/master/text.go

Other Gemini-web proxies like Mozz.us, Vulpes, and RPoD would also have 
their own Gemini-text-to-HTML converters - but I don't know where the 
source is for those.

-- 
Charles

9. Drew DeVault (sir (a) cmpwn.com)

On Mon Oct 26, 2020 at 12:17 PM EDT, Charles E. Lehner wrote:
> There is also this one in Go, a previous version of which is used by the
> Kineto proxy I think:
> => https://git.sr.ht/~adnano/go-gemini/tree/master/text.go

Kineto doesn't use this. I wrote my own HTML converter:

https://git.sr.ht/~sircmpwn/kineto/tree/master/main.go

I think anything related to HTML is out of scope for a generic Gemini
library like go-gemini, and should not be included.

10. Adnan Maolood (me (a) adnano.co)

On Mon Oct 26, 2020 at 8:17 AM EDT, Charles E. Lehner wrote:
> There is also this one in Go, a previous version of which is used by the
> Kineto proxy I think:
> => https://git.sr.ht/~adnano/go-gemini/tree/master/text.go

I have moved this function out of the package to an example as it is
somewhat out of scope. It can now be found here:
https://git.sr.ht/~adnano/go-gemini/tree/master/examples/html.go

11. Charles E. Lehner (cel (a) celehner.com)

On Mon, 26 Oct 2020 12:19:10 -0400
"Drew DeVault" <sir at cmpwn.com> wrote:

> Kineto doesn't use this. I wrote my own HTML converter:
> 
> https://git.sr.ht/~sircmpwn/kineto/tree/master/main.go

Thank you for the correction.

Also here is another Gemtext-HTML converter in Go:

https://github.com/boomlinde/gemini/blob/master/gemini/html.go

Used in https://github.com/boomlinde/gemini.filter.dpi
