Gemini file downloading client - a thought experiment

1. Katarina Eriksson (gmym (a) coopdot.com)

This is in response to the thread on content length. I propose making a
supporting specification separate from the Gemini specification.

Feel free to extend the experiment with more problems and scenarios or
point out if I made the wrong conclusion.

Problem A: clients don't have enough information to show a progress bar

Scenario 1: download a tiny file

The user clicks the download button and the download is completed within a
second. No need to show progress.

Scenario 2: download a medium-sized file

The user clicks the download button and the download starts. The client
shows the number of bytes downloaded, download speed and elapsed time. A
button labeled "show progress" appears after 5 seconds or so. The download
completes before the user is tempted to click that button.

Scenario 3: download a big file

Like scenario 2 except the user gets impatient enough to click the "show
progress" button. The client makes a separate request to a standardized
endpoint to get the needed information. If the server supports this
standard, the information is sent and the client can show the progress bar
to the user.
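
A minimal Python sketch of the client-side status line from scenarios 2 and 3. It assumes the client tracks elapsed time itself and only knows `total` when some out-of-band source supplies it (the standardized endpoint is hypothetical). The function names and the 5-second constant are illustrative, not taken from any spec.

```python
from typing import Optional

SHOW_PROGRESS_BUTTON_AFTER_S = 5.0  # "after 5 seconds or so" (illustrative)


def format_progress(received: int, elapsed_s: float, total: Optional[int] = None) -> str:
    """Return a one-line status for a download in progress.

    Without a known total (plain Gemini sends no content length) only the
    byte count, average speed and elapsed time can be shown; a percentage
    is appended when a total has been learned some other way.
    """
    speed_kb_s = (received / elapsed_s) / 1000 if elapsed_s > 0 else 0.0
    parts = [f"{received} bytes", f"{speed_kb_s:.1f} kB/s", f"{elapsed_s:.0f} s"]
    if total:
        parts.append(f"{100 * received / total:.1f}%")
    return " | ".join(parts)


def offer_progress_button(elapsed_s: float, total: Optional[int]) -> bool:
    """Show the hypothetical "show progress" button once a download drags on."""
    return total is None and elapsed_s >= SHOW_PROGRESS_BUTTON_AFTER_S


if __name__ == "__main__":
    print(format_progress(2_500_000, 5.0))
    print(format_progress(2_500_000, 5.0, total=10_000_000))
```

Scenario 1 needs none of this: a download that finishes within a second never gets past the first status update.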

Problem B: we can't tell if the download succeeded

Scenario 4: the TLS session terminated cleanly but we don't have full
confidence the download succeeded

Verify the file using the information we have (from scenario 3) or present
the user with the option to fetch a hash from the standardized endpoint.
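
For scenario 4, a minimal Python sketch that checks a downloaded file against whatever the client already knows: an expected byte count (as in scenario 3) and/or a SHA-256 hex digest. `verify_download` is a hypothetical helper name, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import io
from typing import BinaryIO, Optional, Tuple


def verify_download(f: BinaryIO,
                    expected_size: Optional[int] = None,
                    expected_sha256: Optional[str] = None,
                    chunk_size: int = 64 * 1024) -> Tuple[bool, str]:
    """Check downloaded data against whatever the client already knows.

    A clean TLS close only says the server stopped sending; matching the
    byte count and/or a published digest is what raises confidence.
    """
    if expected_size is None and expected_sha256 is None:
        return False, "nothing to verify against"
    sha256 = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: f.read(chunk_size), b""):  # read until EOF
        sha256.update(chunk)
        size += len(chunk)
    if expected_size is not None and size != expected_size:
        return False, f"size mismatch: got {size} bytes, expected {expected_size}"
    if expected_sha256 is not None and sha256.hexdigest() != expected_sha256.strip().lower():
        return False, "SHA-256 mismatch"
    return True, "ok"


if __name__ == "__main__":
    # io.BytesIO stands in for the downloaded file; use open(path, "rb") in practice.
    print(verify_download(io.BytesIO(b"abc"), expected_size=3))
```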

--
Katarina

2. Emery (ehmry (a) posteo.net)

On Tuesday, 3 November 2020 22:35:14 CET, Katarina Eriksson wrote:
> This is in response to the thread on content length. I propose making a
> supporting specification separate from the Gemini specification.
>
> Feel free to extend the experiment with more problems and scenarios or
> point out if I made the wrong conclusion.

Please, just use http or rsync or even better, magnet links for
directing to bulk data. GUS stats already show that there are ~1.5
octet-stream links for each Gemini page.

Cheers
E.

3. Ali Fardan (raiz (a) stellarbound.space)

On Tue, 3 Nov 2020 22:35:14 +0100
Katarina Eriksson <gmym at coopdot.com> wrote:
> Problem A: clients don't have enough information to show a progress
> bar

It is stated in
gemini://gemini.circumlunar.space/docs/best-practices.gmi that Gemini
is not well suited for large files, and the transmission of these could
be done using an external protocol.

> Scenario 3: download a big file
> 
> Like scenario 2 except the user gets impatient enough to click the
> "show progress" button. The client makes a separate request to a
> standardized endpoint to get the needed information. If the server
> supports this standard, the information is sent and the client can
> show the progress bar to the user.

Implementing this would be tricky on the server side and would add
unnecessary complexity. What I would suggest instead, if you really
insist on using Gemini for large file transfers, is specifying the file
size in the link label like so:

=> gemini://example.tld/large_file.bin Download (140 MB)

and then have the client show how much of the file has been downloaded.
Now that you know the file size and how much has been downloaded, you
don't need a progress indicator to be implemented within the protocol.
Of course this is up to the client to implement, but it wouldn't be
tricky at all.
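
A minimal Python sketch of the client side of this suggestion, assuming the size is written in parentheses at the end of the link label with a unit of B, kB/MB/GB (powers of 1000) or KiB/MiB/GiB (powers of 1024). That label format is a convention assumed here, not something the Gemini specification defines, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical convention, not part of the Gemini spec: the link label ends
# with a size in parentheses, e.g. "Download (140 MB)" or "foo.mp3 (14 MiB)".
_UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,     # decimal units
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,  # binary units
}
_SIZE_RE = re.compile(r"\(\s*(\d+(?:\.\d+)?)\s*((?:[KMG]i?)?B)\s*\)\s*$", re.IGNORECASE)


def parse_link_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a gemtext link line "=>[ws]URL[ws label]" into (url, label)."""
    if not line.startswith("=>"):
        return None
    parts = line[2:].split(None, 1)
    if not parts:
        return None
    url = parts[0]
    label = parts[1].strip() if len(parts) > 1 else ""
    return url, label


def size_from_label(label: str) -> Optional[int]:
    """Return the advertised size in bytes, or None if the label has none."""
    m = _SIZE_RE.search(label)
    if m is None:
        return None
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit.upper()])
```

The client can then use this number as the total when drawing a progress bar, and fall back to a plain byte counter when the label carries no size.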

> Scenario 4: the TLS session terminated cleanly but we don't have full
> confidence the download succeeded
> 
> Verify the file using the information we have (from scenario 3) or
> present the user with the option to fetch a hash from the
> standardized endpoint.

This does not have to be embedded in the protocol. Just do this:

SHA1: ad7ff785f989c9ff9cec92bd3d0bab035a54e997
SHA256: d2af944fe8b3af5cf5a9d8c3226d06ac6369d036fc591363dedc85344bc4daa7
=> gemini://example.tld/large_file.bin Download (140 MB)

Now the user can verify their downloaded file. This is done all the
time, even in HTTP land.
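
A minimal Python sketch of how a client might pick such lines out of a gemtext page, assuming one `ALGORITHM: hexdigest` pair per line (the space after the colon is optional) and a page that lists checksums for a single file. The helper name and the set of recognised algorithms are assumptions; values of the wrong length for their algorithm are ignored.

```python
import re
from typing import Dict

# Hex digest length for each recognised algorithm (names as hashlib spells them).
_HEX_LEN = {"md5": 32, "sha1": 40, "sha256": 64, "sha512": 128}

# One "ALGORITHM: hexdigest" pair per line; the space after the colon is optional.
_SUM_RE = re.compile(r"^\s*(MD5|SHA1|SHA256|SHA512)\s*:\s*([0-9a-f]+)\s*$", re.IGNORECASE)


def checksums_in_page(gemtext: str) -> Dict[str, str]:
    """Collect published checksums from a gemtext page as {algorithm: hex}."""
    found: Dict[str, str] = {}
    for line in gemtext.splitlines():
        m = _SUM_RE.match(line)
        if m is None:
            continue
        algo, digest = m.group(1).lower(), m.group(2).lower()
        if len(digest) == _HEX_LEN[algo]:  # skip truncated or mangled values
            found[algo] = digest
    return found
```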

4. Katarina Eriksson (gmym (a) coopdot.com)

Emery <ehmry at posteo.net> wrote:

> On Tuesday, 3 November 2020 22:35:14 CET, Katarina Eriksson wrote:
> > This is in response to the thread on content length. I propose making a
> > supporting specification separate from the Gemini specification.
> >
> > Feel free to extend the experiment with more problems and scenarios or
> > point out if I made the wrong conclusion.
>
> Please, just use http or rsync or even better, magnet links for
> directing to bulk data. GUS stats already show that there are ~1.5
> octet-stream links for each Gemini page.
>
> Cheers
> E.
>

Yes, I agree. There are plenty of other protocols to choose from which are
better suited for transferring large files. But that's beside the point.

The purpose of this thought experiment is to find a use case where you
can't work within the constraints and have to add a content length header.

My expectation is that there is no such use case, certainly not one worth
breaking compatibility for, but I'm keeping an open mind.

New people will join and ask for a content length header again in a few
months. It has happened before, more than once, in this very young mailing
list. If we had a supporting specification which deals with this, we
would have something to point at.

Anyone who wants to join me in my experiment?

-- 
Katarina

5. acdw (acdw (a) acdw.net)

On 2020-11-04 (Wednesday) at 01:25, Katarina Eriksson <gmym at coopdot.com> wrote:
> 
> Yes, I agree. There are plenty of other protocols to choose from which 
> are better suited for transferring large files. But that's beside the 
> point.

I'd say that's precisely the point -- Gemini is built for the serving of 
text files. It'd be like (pardon the HTTP-only metaphor) if you decided 
to tweet out the contents of /War and Peace/ -- you could do it, but why 
would you want to?

> 
> The purpose of this thought experiment is to find a use case where you 
> can't work within the constraints and have to add a content length 
> header.
> 
> My expectation is that there is no such use case, certainly not one 
> worth breaking compatibility for, but I'm keeping an open mind.

I agree that there's not enough of a use case for a Content-Length: header 
-- see above on the purpose of Gemini.  I mean, people can host whatever 
they'd like (I love the Konpeito mixtapes as much as anyone, for example), 
but I don't think anyone should *expect* to easily download large files 
from Gemini capsules.

> 
> New people will join and ask for a content length header again in a few 
> months. It has happened before, more than once, in this very young 
> mailing list. If we had a supporting specification which deals 
> with this, we would have something to point at.

There was some talk about adding something addressing Content-Length in 
the FAQ, which I 1000% support.

-- 
~ acdw
acdw.net | breadpunk.club/~breadw

6. Katarina Eriksson (gmym (a) coopdot.com)

Ali Fardan <raiz at stellarbound.space> wrote:

> > Scenario 3: download a big file
>

[...]

> Implementing this would be tricky on the server side and would add
> unnecessary complexity.

Yes, the server would need to be able to accept and respond to other
connections while the file is transmitted. It would also need either to be
able to run CGI scripts or to have custom code for that endpoint.

> What I would suggest instead, if you really insist on using Gemini for large
> file transfers, is specifying the file size in the link label like so:
>
> => gemini://example.tld/large_file.bin Download (140 MB)
>
> and then have the client show how much of the file has been downloaded.
> Now that you know the file size and how much has been downloaded, you
> don't need a progress indicator to be implemented within the protocol.
> Of course this is up to the client to implement, but it wouldn't be
> tricky at all.
>

Foster an environment where capsule authors are good enough internet
citizens to always provide the file size on the page? Sure, that wouldn't
be tricky at all.

Corrected:
Scenario 3.1: download a big file and the capsule author neglected to
provide a file size

> > Scenario 4: the TLS session terminated cleanly but we don't have full
> > confidence the download succeeded
> >
> > Verify the file using the information we have (from scenario 3) or
> > present the user with the option to fetch a hash from the
> > standardized endpoint.
>
> This does not have to be embedded in the protocol. Just do this:
>
> SHA1: ad7ff785f989c9ff9cec92bd3d0bab035a54e997
> SHA256: d2af944fe8b3af5cf5a9d8c3226d06ac6369d036fc591363dedc85344bc4daa7
> => gemini://example.tld/large_file.bin Download (140 MB)
>
> Now the user can verify their downloaded file. This is done all the
> time, even in HTTP land.
>

I would personally prefer the way we did it with FTP and provide a
large_file.bin.md5sum or now large_file.bin.sha1sum, but I would use
another protocol for large files.
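
A minimal Python sketch of that FTP-style approach, assuming the checksum file sits at the data file's URL plus a `.sha1sum` (or `.md5sum`, ...) suffix and uses the `<hex digest>  <file name>` line format that the coreutils `sha1sum`/`md5sum` tools print. Both helper names are hypothetical, and escaped file names are not handled.

```python
import re
from typing import Dict


def sidecar_url(url: str, algo: str = "sha1") -> str:
    """Hypothetical naming scheme: "large_file.bin" -> "large_file.bin.sha1sum"."""
    return f"{url}.{algo}sum"


# Each line is "<hex digest>", a space, then a second space (text mode) or
# "*" (binary mode), then the file name. Lines with escaped names
# (starting with "\") are not handled in this sketch.
_LINE_RE = re.compile(r"^([0-9a-fA-F]+) [ *](.+)$")


def parse_sum_file(text: str) -> Dict[str, str]:
    """Map file names to lowercase hex digests."""
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        m = _LINE_RE.match(line)
        if m:
            sums[m.group(2)] = m.group(1).lower()
    return sums
```

A client fetching `large_file.bin` could then request `sidecar_url(...)` as an ordinary second Gemini request and compare the entry for `large_file.bin` with the digest it computed.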

-- 
Katarina

7. Ali Fardan (raiz (a) stellarbound.space)

On Thu, 5 Nov 2020 22:09:54 +0100
Katarina Eriksson <gmym at coopdot.com> wrote:
> Yes, the server would need to be able to accept and respond to other
> connections while the file is transmitted. It would also need either to
> be able to run CGI scripts or to have custom code for that endpoint.

Not only that, but how would a request sent separately tell the server
which connection of its dozen is the one that we want progress on?

That kills a nice feature of the protocol: being able to serve
connections from a queue instead of forking for each connection. Because
the connection is over as soon as the transmission is over, this
simplifies server code A LOT.
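
A minimal Python sketch of the client side of this property, following the response format in the Gemini specification (`<STATUS><SPACE><META><CR><LF>`, with META at most 1024 bytes). There is no length field, so the body ends when the server closes the connection. In a real client `stream` would be something like the file object returned by `ssl_sock.makefile("rb")`; here `io.BytesIO` stands in for it, and the helper names are hypothetical.

```python
import io
from typing import BinaryIO, Callable, Tuple

MAX_META_BYTES = 1024  # Gemini spec: META is at most 1024 bytes
_HEADER_LIMIT = 2 + 1 + MAX_META_BYTES + 2  # STATUS, SPACE, META, CRLF


def read_header(stream: BinaryIO) -> Tuple[int, str]:
    """Parse the "<STATUS><SPACE><META><CR><LF>" response header."""
    line = stream.readline(_HEADER_LIMIT)
    if not line.endswith(b"\r\n"):
        raise ValueError("response header missing, truncated or too long")
    status, _, meta = line[:-2].decode("utf-8").partition(" ")
    if len(status) != 2 or not all(c in "0123456789" for c in status):
        raise ValueError(f"bad status code: {status!r}")
    return int(status), meta


def read_body(stream: BinaryIO, on_chunk: Callable[[bytes], None],
              chunk_size: int = 16 * 1024) -> int:
    """Hand each chunk to on_chunk until EOF; return the total byte count.

    EOF (the server closing the connection) is the only end-of-body marker.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        on_chunk(chunk)
        total += len(chunk)


if __name__ == "__main__":
    # io.BytesIO stands in for the TLS connection in this demo.
    response = io.BytesIO(b"20 application/octet-stream\r\n" + b"x" * 40000)
    print(read_header(response))
    print(read_body(response, lambda chunk: None))
```

Because one connection carries exactly one request and one response, the server side needs no bookkeeping to tie a later request back to an earlier transfer.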

> Foster an environment where capsule authors are good enough internet
> citizens to always provide the file size on the page? Sure, that
> wouldn't be tricky at all.

If that becomes a guideline or an etiquette everyone should follow, it
doesn't create an environment of division. Take this mailing list as an
example, or for that matter almost all mailing lists: plain-text email is
a guideline, it's part of the culture, and sending HTML email can be seen
as disrespectful at times. This is the result of social rules that have
evolved over time, and I don't see how it would be any different for
Gemini.

Also, if capsule authors insist on serving large content over this
protocol, they should make it less annoying; the protocol should not
accommodate such people by giving them more features. If you want
HTTP, you know where to find it.

> Corrected:
> Scenario 3.1: download a big file and the capsule author neglected to
> provide a file size

Bummer.

> I would personally prefer the way we did it with FTP and provide a
> large_file.bin.md5sum or now large_file.bin.sha1sum

Brilliant, I don't see anything wrong with that.

> but I would use another protocol for large files.

Ok then, there's nothing to discuss further.

8. Drew DeVault (sir (a) cmpwn.com)

Weighing in after reading 1/3rd of the discussion and getting somewhat
tired of having it bubble back up to my inbox all the time:

TCP is pretty good. I've been tethered to my phone for a few days, and
I've had my connection drop out for minutes at a time, and my TCP
connections aren't dying.

=> /foo.mp3 foo.mp3 (14 MiB) 
SHA256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

This is good enough for me. Implement a progress viewer in your client
that shows the total amount downloaded so far and prints out the
checksum at the end and that'll do just fine.
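
A minimal Python sketch of the client behaviour described here, assuming the body arrives as a sequence of byte chunks: it counts bytes as they arrive and, when the download ends, produces a digest in the same `SHA256:<hex>` form as the example above. The class name is hypothetical.

```python
import hashlib


class DownloadMonitor:
    """Track bytes received and a running SHA-256 while a body streams in.

    No total size is needed: the client shows how much has arrived so far
    and prints the digest at the end, for the user to compare with the one
    published on the page.
    """

    def __init__(self) -> None:
        self.received = 0
        self._sha256 = hashlib.sha256()

    def feed(self, chunk: bytes) -> None:
        """Record one chunk of the response body."""
        self.received += len(chunk)
        self._sha256.update(chunk)

    def status(self) -> str:
        """Progress text to redraw while the download runs."""
        return f"{self.received / 2**20:.1f} MiB received"

    def finish(self) -> str:
        """Digest line to print once the server closes the connection."""
        return "SHA256:" + self._sha256.hexdigest()
```

Hashing incrementally means the file never has to be read a second time just to compute its checksum.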

9. Jordan (jordan (a) crowesnest.io)

On Thu Nov 5, 2020 at 5:02 PM EST, Ali Fardan wrote:
>
> > I would personally prefer the way we did it with FTP and provide a
> > large_file.bin.md5sum or now large_file.bin.sha1sum
>
> Brilliant, I don't see anything wrong with that.
>
> > but I would use another protocol for large files.
>
> Ok then, there's nothing to discuss further.

The client displaying and verifying the checksum seems like a way to go,
as long as the host provides the checksum. Though, as Ali Fardan (I
think it was this person) stated, another protocol would make it safer
and easier to download the file.

Jordan
