💾 Archived View for gemi.dev › gemini-mailing-list › 000840.gmi captured on 2024-05-26 at 16:48:16. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

How do you buffer a Gemini connection?

1. almaember (almaember (a) disroot.org)

Hello,

I'm planning on creating my own Gemini client (with C99 and OpenSSL, if
you want to know). However, I have a problem with how I should be
implementing the downloading part.

So, how should I be buffering a Gemini connection? Since there is no
size indicator, I can't be sure everything even fits into memory, or
how much time it's going to take to download, or if it's even going to
end at all.

Another problem is long-polling. Through my exploration of Geminispace,
I found a few capsules that used a hack to push live updates to the
client. Specifically, these servers didn't close the connection, but
instead kept it open and just sent new data when something happened.

The spec says nothing (or I'm blind) about buffering, so I want
to ask others what I should be doing, while still being able to
actually parse the output.

My ideas as of right now:

 - Have a big buffer and store everything in it until the connection is
   closed. The connection would be terminated if:
    - the connection closes
    - the server didn't send anything for X seconds
 - Buffer by line and don't close the connection unless the user
   terminates it (by pressing the stop key or by loading another
   capsule). Compatible with long polling.

Thanks in advance for any help!


~almaember

2. Ecmel Berk Canlıer (me (a) ecmelberk.com)

On 3/28/21 11:07 PM, almaember wrote:

> or how much time it's going to take to download, or if it's even going
> to end at all.

As far as I'm aware no client has solved this. If TLS close_notify gets 
standardized we might get a "connection failed" state, but other than that 
you won't be able to know how long the content is and whether you 
successfully downloaded it or not.

> My ideas as of right now:
> 
>   - Have a big buffer and store everything in it until the connection is
>     closed. The connection would be terminated if:
>      - the connection closes
>      - the server didn't send anything for X seconds

If I know which client you use I can just send a huge file over and 
possibly overflow your buffer. And it needs to be BIG. I'm aware of at 
least one capsule serving ~70MB files. (gemini://konpeito.media)

>   - Buffer by line and don't close the connection unless the user
>     terminates it (by pressing the stop key or by loading another
>     capsule). Compatible with long polling.

This is what I do on Moonlander. Not by line, as binary data can be sent 
over Gemini, but in 1029-byte chunks (exactly the maximum length of one 
Gemini response header, which also saves me code for splitting the header 
off).
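
The 1029-byte figure follows from the response header layout: two status digits, one space, up to 1024 bytes of META, and CRLF. A minimal sketch in C of pulling the header out of the first chunk is shown here. The names `gemini_header` and `parse_gemini_header` are hypothetical and are not taken from Moonlander; accepting a header with no space and an empty META is a leniency assumed for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 2 status digits + 1 space + up to 1024 bytes of META + CRLF */
#define GEMINI_HEADER_MAX 1029

/* Hypothetical result type for this sketch. */
struct gemini_header {
    int status;         /* two-digit status code, e.g. 20 */
    char meta[1025];    /* META string, NUL-terminated */
    size_t body_offset; /* index of the first body byte in the buffer */
};

/*
 * Returns 1 when a header was parsed, 0 when more bytes are needed,
 * and -1 when the data cannot be a valid header.
 */
int parse_gemini_header(const unsigned char *buf, size_t len,
                        struct gemini_header *out)
{
    assert(buf != NULL && out != NULL);

    /* Never look past the maximum header length. */
    size_t limit = len < GEMINI_HEADER_MAX ? len : GEMINI_HEADER_MAX;
    size_t end = 0;
    int found = 0;

    for (size_t i = 0; i + 1 < limit; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            end = i;
            found = 1;
            break;
        }
    }
    if (!found)
        return len >= GEMINI_HEADER_MAX ? -1 : 0;

    /* Status is two digits; the first one is in the range 1..6. */
    if (end < 2 || buf[0] < '1' || buf[0] > '6' ||
        buf[1] < '0' || buf[1] > '9')
        return -1;
    out->status = (buf[0] - '0') * 10 + (buf[1] - '0');

    size_t meta_len = 0;
    if (end > 2) {
        if (buf[2] != ' ')
            return -1;
        meta_len = end - 3; /* at most 1024, since end <= 1027 */
        memcpy(out->meta, buf + 3, meta_len);
    }
    out->meta[meta_len] = '\0';
    out->body_offset = end + 2;
    return 1;
}
```

Any bytes in the chunk from `body_offset` onwards already belong to the response body and would be passed on to whatever handles the content.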

I send each chunk into a separate "renderer" for the given file type, 
which in text's case decodes it into the given encoding (default UTF-8) 
and sends it over to the text/gemini parser which then does the line 
splitting and parsing there. (And in other cases it will just save the 
bytes directly to a file)
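
The choice of renderer described above can be sketched in C as a small dispatch on the META string of a success response. This is only an illustration: the enum and the function `pick_renderer` are hypothetical names, not Moonlander's API, and charset handling is left out. An empty META is treated as text/gemini, which is the default the Gemini specification gives for success responses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical renderer kinds for this sketch. */
enum renderer {
    RENDER_GEMTEXT,     /* text/gemini: parse line by line */
    RENDER_PLAIN_TEXT,  /* any other text/*: show as text */
    RENDER_SAVE_TO_FILE /* everything else: write bytes to disk */
};

/* Case-insensitive prefix test; MIME type names are not case-sensitive. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}

/* Pick a renderer from the META string of a 2x response. */
enum renderer pick_renderer(const char *meta)
{
    static const char gemtext[] = "text/gemini";

    assert(meta != NULL);
    while (*meta == ' ' || *meta == '\t')
        meta++;

    /* An empty META means the default media type, text/gemini. */
    if (*meta == '\0')
        return RENDER_GEMTEXT;

    if (has_prefix_nocase(meta, gemtext)) {
        /* Make sure the type name really ends here ("text/geminix" is not gemtext). */
        char next = meta[sizeof(gemtext) - 1];
        if (next == '\0' || next == ';' || next == ' ' || next == '\t')
            return RENDER_GEMTEXT;
    }
    if (has_prefix_nocase(meta, "text/"))
        return RENDER_PLAIN_TEXT;
    return RENDER_SAVE_TO_FILE;
}
```

Each chunk read from the connection would then go to the chosen renderer, so nothing but the current chunk has to stay in memory for binary downloads.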

It's a bit more work, but I have successfully downloaded a few of those 
70MB files with this approach, so I'd say it's pretty robust.

3. almaember (almaember (a) disroot.org)

On Mon, 29 Mar 2021 07:17:31 +0300
Ecmel Berk Canlıer <me@ecmelberk.com> wrote:
 
> This is what I do on Moonlander. Not by line, as binary data can be
> sent over Gemini, but in 1029 byte chunks (exactly the max length of
> one Gemini header. Saves me code in trying to split that too)

That makes sense, it will even save me from having to mess with
malloc and realloc.

> I send each chunk into a separate "renderer" for the given file type, 
> which in text's case decodes it into the given encoding (default
> UTF-8) and sends it over to the text/gemini parser which then does
> the line splitting and parsing there. (And in other cases it will
> just save the bytes directly to a file)

One question, what do you do if a single line is split into multiple
chunks?

I will try checking your source for it, but I don't really understand
or know Rust so I'm not sure I will understand it.


~almaember

4. Ecmel Berk Canlıer (me (a) ecmelberk.com)

On 3/29/21 9:22 AM, almaember wrote:
> One question, what do you do if a single line is split into multiple
> chunks?

https://git.ebc.li/_/moonlander/src/branch/main/gemtext/src/lib.rs

The chunked_parse (Line 160) function reads character by character, saving 
everything into an "incomplete line" buffer (Line 31), and when it 
encounters a newline it empties and processes the incomplete line buffer 
and returns the list of "line objects" corresponding to the markup inside 
the buffer.

When the connection is closed, the finalize function (Line 221) runs and 
parses the incomplete line buffer one last time to complete the document.
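
For a C client, the same idea can be sketched roughly as follows. This is not a translation of the Rust code linked above: the `line_buf` type and its functions are hypothetical names, and instead of going character by character it appends whole chunks and searches for newlines, which is a design choice of this sketch. Lines handed out are not NUL-terminated and stay valid only until the next append.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "incomplete line" buffer for this sketch. */
struct line_buf {
    char *data;  /* heap storage, grown on demand */
    size_t len;  /* bytes currently stored */
    size_t cap;  /* allocated size */
    size_t pos;  /* start of the first line not yet handed out */
};

void line_buf_init(struct line_buf *lb)
{
    assert(lb != NULL);
    memset(lb, 0, sizeof(*lb));
}

void line_buf_free(struct line_buf *lb)
{
    assert(lb != NULL);
    free(lb->data);
    memset(lb, 0, sizeof(*lb));
}

/* Append one chunk from the network. Returns 0, or -1 if out of memory. */
int line_buf_append(struct line_buf *lb, const void *chunk, size_t n)
{
    assert(lb != NULL);
    /* Drop the bytes that were already returned as lines. */
    if (lb->pos > 0) {
        memmove(lb->data, lb->data + lb->pos, lb->len - lb->pos);
        lb->len -= lb->pos;
        lb->pos = 0;
    }
    if (lb->len + n > lb->cap) {
        size_t ncap = lb->cap ? lb->cap : 1024;
        while (ncap < lb->len + n)
            ncap *= 2;
        char *p = realloc(lb->data, ncap);
        if (p == NULL)
            return -1;
        lb->data = p;
        lb->cap = ncap;
    }
    if (n > 0)
        memcpy(lb->data + lb->len, chunk, n);
    lb->len += n;
    return 0;
}

/* Fetch the next complete line (without CRLF or LF). Returns 1 if there was one. */
int line_buf_next(struct line_buf *lb, const char **line, size_t *len)
{
    assert(lb != NULL && line != NULL && len != NULL);
    if (lb->pos >= lb->len)
        return 0;
    char *start = lb->data + lb->pos;
    char *nl = memchr(start, '\n', lb->len - lb->pos);
    if (nl == NULL)
        return 0; /* the line is still incomplete */
    size_t n = (size_t)(nl - start);
    lb->pos += n + 1;
    if (n > 0 && start[n - 1] == '\r')
        n--;
    *line = start;
    *len = n;
    return 1;
}

/* After the connection closes: fetch a final line that had no newline. */
int line_buf_rest(struct line_buf *lb, const char **line, size_t *len)
{
    assert(lb != NULL && line != NULL && len != NULL);
    if (lb->pos >= lb->len)
        return 0;
    *line = lb->data + lb->pos;
    *len = lb->len - lb->pos;
    lb->pos = lb->len;
    return 1;
}
```

A caller would append each chunk, loop on `line_buf_next` until it returns 0, and call `line_buf_rest` once when the server closes the connection. The buffer only grows to the size of the longest line plus one chunk, not to the size of the document.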

5. Omar Polo (op (a) omarpolo.com)


almaember <almaember@disroot.org> writes:

> Hello,
>
> I'm planning on creating my own Gemini client (with C99 and OpenSSL, if
> you want to know). However, I have a problem with how I should be
> implementing the downloading part.
>
> So, how should I be buffering a Gemini connection? Since there is no
> size indicator, I can't be sure everything even fits into memory, or
> how much time it's going to take to download, or if it's even going to
> end at all.
>
> Another problem is long-polling. Through my exploration of Geminispace,
> I found a few capsules that used a hack to push live updates to the
> client. Specifically, these servers didn't close the connection, but
> instead kept it open and just sent new data when something happened.
>
> The spec says nothing (or I'm blind) about buffering, so I want
> to ask others what I should be doing, while still being able to
> actually parse the output.
>
> My ideas as of right now:
>
>  - Have a big buffer and store everything in it until the connection is
>    closed. The connection would be terminated if:
>     - the connection closes
>     - the server didn't send anything for X seconds
>  - Buffer by line and don't close the connection unless the user
>    terminates it (by pressing the stop key or by loading another
>    capsule). Compatible with long polling.
>
> Thanks in advance for any help!
>
>
> ~almaember

I'm working on a client and doing exactly what Ecmel Berk Canlıer outlined
in his mail.  I'm fetching the page one chunk at a time (which lets me
use a fixed buffer and plays nicely with asynchronous I/O); each chunk then
gets sent to a proper renderer (atm only text/gemini and one generic text/*
handler).  There it gets parsed and split line by line.

The only part where I actually have to keep a dynamic buffer is when
splitting into lines, because a line can be (and often is) split across
chunks, and can also be bigger than one chunk.

I'm using LibreSSL (plus some other OpenBSD goodies), but if it saves
you some time here is:


https://github.com/omar-polo/telescope/blob/main/parser.c


Warning: the client is not yet 100% functional, but I'm quite fond of
this design.  One additional benefit is that you can render the page
on the go, which also makes it possible to stream text/* content.

Cheers,

Omar Polo

6. Stephane Bortzmeyer (stephane (a) sources.org)

On Mon, Mar 29, 2021 at 07:17:31AM +0300,
 Ecmel Berk Canlıer <me@ecmelberk.com> wrote 
 a message of 36 lines which said:

> As far as I'm aware no client has solved this. If TLS close_notify
> gets standardized we might get a "connection failed" state,

Note that the question is currently under discussion in the
specification work
<https://gitlab.com/gemini-specification/protocol/-/issues/2>.
