💾 Archived View for gemi.dev › gemini-mailing-list › 000197.gmi captured on 2024-06-16 at 12:48:53. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

"Wide load" status code(s)?

📧 Messages: 39
🗣️ Authors: 12
📅 First Message: 2020-06-10 07:32
📅 Last Message: 2020-06-12 22:48

1. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 07:32
📧 Message 1 of 39

Hey all,

Just throwing out a quick idea I had last night while trying to sleep,
to see how people feel about it.  It's simple and easily ignorable and I
think it's kind of neat.

It's been a small pain point for some people for a while now that there
is no way for a Gemini client to know how large a file it's downloading
is without simply downloading the whole thing.  This is inconvenient
from a UI perspective, as there is no way to display a progress bar, and
simple clients like AV-98 which simply download a complete file and then
pass it off to a handler program appear to "freeze" on large downloads
with no clear indication that anything is happening.  This is a much
bigger problem for people on limited machines (e.g. low memory diskless
systems are perfectly viable for reading text/gemini content and
displaying small images but not for downloading large binaries, but they
can't gracefully opt out of the big stuff and are forced to simply
terminate the connection once a threshold amount of content has been
downloaded, and then wastefully discard that partial content) or
internet connections (e.g. people at sea using satellite data plans
which are billed per megabyte).

People usually want to address this by having the server somehow declare
the file size upfront in the header, as per HTTP's "Content-Length", but
I've resisted that tooth and nail because there's no sensible way to do
it which doesn't turn the response header into an infinitely extensible
thing which people can tack their own variables onto, leaving us no
better off than HTTP (so, please, no more ideas on this front)...

I was very happy to realise that a lot of these problems can be solved,
or at least ameliorated, in a very simple way with an additional 2x
status code.  Since I plan to deprecate the current 21 code for ending a
transient certificate session, we could reuse 21 to mean "SUCCESS, a
response body follows, and it's larger than $THRESHOLD MiB" (note I am
proposing no change to the existing code 20 - 20 does NOT mean "what's
coming is less than $THRESHOLD").  Simple clients could simply treat 21
as 20 and be in exactly the situation they are in now, so the graceful
degradation of status codes to their x0 form works nicely here.  But
e.g. AV-98 could print "Downloading large file, please wait..." upon
receiving a 21, and then proceed as usual.  This is a very low effort
client change, but it solves the problem of people thinking something
has gone wrong when they unknowingly start a large download.  More
importantly, users in resource-limited environments could use clients
which simply terminate the connection immediately on seeing a header
starting with 21, providing a quick and low-waste "opt out" of large
content.
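
As a rough illustration of the client behaviour described above (not AV-98's actual code), a client could look at just the two status digits before deciding whether to keep reading. The names `read_status`, `on_success` and `refuse_large` are hypothetical, and 21 is used here in the proposed "large body" sense, which is an assumption and not part of any published spec.

```python
import io

LARGE_BODY = 21  # proposed "large body" meaning; an assumption, not in the spec


def read_status(stream):
    """Read only the two status digits at the start of a response."""
    digits = stream.read(2)
    if len(digits) != 2 or not digits.isdigit():
        raise ValueError("malformed response header")
    return int(digits)


def on_success(status, refuse_large=False):
    """Decide what to do with a 2x response: 'abort', 'warn' or 'proceed'."""
    if status // 10 != 2:
        raise ValueError("not a success status")
    if status == LARGE_BODY:
        return "abort" if refuse_large else "warn"
    return "proceed"  # plain 20, or an unknown 2x degraded to 20


# Stand-in for the file object of a TLS socket.
response = io.BytesIO(b"21 application/zip\r\n...")
print(on_success(read_status(response), refuse_large=True))  # abort
```

A simple client would ignore the distinction and always get "proceed" or "warn"; only a client configured for a constrained environment would act on "abort" by closing the connection.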

On the server side, again, a dirt simple server could just always serve
up 20 without actually breaking anything.  I realise that knowing
whether to use 20 or 21 for dynamically generated content may not be
straightforward - no problem, it is *always* valid to just send 20 when
in doubt.  21 is nothing more than a helpful hint to clients who might
need it.  It doesn't need to be 100% reliable to still have value.

This feels like a good idea to me.  It's totally optional and very easy
to handle on both the client and server sides, and I feel like being
friendly to small/slow computers and slow/intermittent network
connections is a good fit with the overall "vibe" of Gemini, provided
doing so does not conflict with overall simplicity.

Naturally, deciding to do this will lead immediately to a weeks-long
heated debate on what the appropriate value of $THRESHOLD should be.  We
*could* wade into those waters, but I'll just also throw out that we
could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or
100MiB respectively and leave it at that.  Clients targeting
resource-limited environments could let their users configure their own
threshold for early termination of downloads.
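
A minimal sketch of how the three-tier variant might look on both sides, assuming the 21/22/23 meanings floated above were adopted. The table `TIERS` and the functions `status_for_size` and `over_user_limit` are hypothetical names for illustration only.

```python
MIB = 1024 * 1024

# Proposed tiers from the message above: status -> size the body exceeds.
TIERS = {21: 1 * MIB, 22: 10 * MIB, 23: 100 * MIB}


def status_for_size(size):
    """Server side: pick a 2x status; 20 is always valid if size is unknown."""
    if size is None:
        return 20
    status = 20
    for code, floor in sorted(TIERS.items()):
        if size > floor:
            status = code
    return status


def over_user_limit(status, user_limit):
    """Client side: True if the hinted minimum size already reaches the limit."""
    floor = TIERS.get(status)
    return floor is not None and floor >= user_limit
```

Note the asymmetry: a client with a 5 MiB limit learns nothing decisive from a 21 (the body exceeds 1 MiB, but may or may not exceed 5 MiB), so it would still need to count bytes in that case.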

Cheers,
Solderpunk

2. Koushik Roy (koushik (a) meff.me)

📅 Sent: 2020-06-10 07:41
📧 Message 2 of 39

Sorry replying to the list address this time...

How about letting the server decide? I can imagine a scenario where the 
server defaults to sending text/* mime types as 20 responses and 
everything else as 21 responses. Then servers can let configuration or 
(s)CGI output determine how to decide on 20 vs 21. While leaving it up 
to the server is subjective, I think most content authors understand 
whether something is meant to be quickly digested and rendered or 
whether something should be downloaded/queued/status line shown.
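
The server-side default suggested here could be sketched roughly as follows. The function `default_status` and its `override` parameter (standing in for configuration or CGI output) are assumptions made for illustration, and 21 again carries only its proposed meaning.

```python
def default_status(mime_type, override=None):
    """Return 20 for text/* and 21 otherwise, unless explicitly overridden."""
    if override is not None:
        return override  # e.g. set by server config or a CGI script
    major = mime_type.split("/", 1)[0].strip().lower()
    return 20 if major == "text" else 21
```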

- meff


3. Felix Queißner (felix (a) masterq32.de)

📅 Sent: 2020-06-10 10:33
📧 Message 3 of 39

Heya!

> This feels like a good idea to me.  It's totally optional and very easy
> to handle on both the client and server sides, and I feel like being
> friendly to small/slow computers and slow/intermittent network
> connections is a good fit with the overall "vibe" of Gemini, provided
> doing so does not conflict with overall simplicity.
Yes, I totally like that idea!

> Naturally, deciding to do this will lead immediately to a weeks-long
> heated debate on what the appropriate value of $THRESHOLD should be.  We
> *could* wade into those waters, but I'll just also throw out that we
> could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or
> 100MiB respectively and leave it at that.  Clients targeting
> resource-limited environments could let their users configure their own
> threshold for early termination of downloads.
I had the exact same thoughts, and I think the idea of "wasting" 3
status codes for 1, 10 and 100 MB is totally okay. It allows clients to
better display a loading indication.

Regards
- xq

4. Natalie Pendragon (natpen (a) natpen.net)

📅 Sent: 2020-06-10 10:36
📧 Message 4 of 39

I like the spirit of this idea a lot - Gemini has an opportunity to do
a lot more for users in resource-limited environments, and in addition
to the explicit austerity in the protocol itself, this is another way
to proactively respect users' resources and time.

What I feel less excited about is the specification of a hard-coded
$THRESHOLD. It feels like a magic number that's not going to fit all
situations well - adding three of them like you brought up at the end
improves the situation, but nevertheless still feels like a magic
number solution. And depending on how future-proof you want these
$THRESHOLDs to be, no matter how good the magic numbers are today, as
years pass and internet access/quality across the world changes, for
better or worse, the magic numbers will become more and more out of
date.

The only way I see to make it not a magic number is to allow clients
to specify a $THRESHOLD as part of their request - that, however,
feels like too big of a change to introduce to the Request structure
(even if we could make it gracefully degrade).

So, my conclusions are:

A) I love the idea
B) I don't love the design
C) I worry there may be no better design

And I would support speccing this if no better design arrives, because
it is a meaningful issue to solve!

Nat

5. Petite Abeille (petite.abeille (a) gmail.com)

📅 Sent: 2020-06-10 11:16
📧 Message 5 of 39

> On Jun 10, 2020, at 09:32, solderpunk <solderpunk at SDF.ORG> wrote:
> 
> are forced to simply terminate the connection once a threshold amount of
> content has been downloaded

Is it really a problem? It should be the user agent's prerogative to drop
the connection anytime it sees fit. And/or only handle a small subset of
media types (e.g. text/* only). Ditto for showing network activity. Even
the simplest of clients could count how many bytes it has read so far, no?

6. Tadeusz Sosnierz (tadeusz (a) sosnierz.com)

📅 Sent: 2020-06-10 11:16
📧 Message 6 of 39


2020-06-10 12:36 GMT+02:00 Natalie Pendragon<natpen at natpen.net>:
> What I feel less excited about is the specification of a hard-coded
> $THRESHOLD. It feels like a magic number that's not going to fit all
> situations well - adding three of them like you brought up at the end
> improves the situation, but nevertheless still feels like a magic
> number solution. And depending on how future-proof you want these
> $THRESHOLDs to be, no matter how good the magic numbers are today, as
> years pass and internet access/quality across the world changes, for
> better or worse, the magic numbers will become more and more out of
> date.

I feel the same way; it sounds like whatever you pick will become the
"64 kilobytes" of tomorrow's jokes. And ultimately it doesn't allow the
introduction of a meaningful progress indicator. Is 100 MB a lot? Is it
a long wait? I don't know, is the upstream server fast? Is my wifi
having a good day?

The alternative is tricky to come up with without making the response
structure arbitrary and complex. Here's a take though: anything bigger
than a megabyte or ten is realistically not going to be text, or
anything text-like that can be displayed inline. I can think of images,
videos, maybe PDFs or just a bunch of encrypted data in the form of
whatever. I wonder if it'd make sense to have a status code that
indicates "mimetype is basically meaningless, it's a big whatever, so
I'll give you the content length instead".

The client could then choose to receive a few more bytes and check a
magic byte for something it recognizes - or just prompt the user, saying
"that's not exactly something we can display anyway, (how) do you want
it saved?"

--
tadzik

7. Thomas Karpiniec (tkarpiniec (a) icloud.com)

📅 Sent: 2020-06-10 12:16
📧 Message 7 of 39

On Wed, Jun 10, 2020 at 07:32:16AM +0000, solderpunk wrote:
> Naturally, deciding to do this will lead immediately to a weeks-long
> heated debate on what the appropriate value of $THRESHOLD should be.  We
> *could* wade into those waters, but I'll just also throw out that we
> could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or
> 100MiB respectively and leave it at that.  Clients targeting
> resource-limited environments could let their users configure their own
> threshold for early termination of downloads.

That's the real question isn't it? I don't feel strongly about this
either way but I'll share a couple of thoughts.

One place I previously used gopher was IP routed over VHF packet
radio. These links are 1200 baud simplex with an effective throughput
of some 80 B/s. I'm not saying gemini can or should care what weirdos
are doing with VHF radios, but it may one day find use in scenarios
where quantities much less than 1 MB matter. If you're using such a
slow link, you might have to waste quite a lot of time to realise that
you're getting an above-average amount of content, reducing the
effectiveness of a client-side threshold.

One way to offer more flexibility could be to use the second digit of
the "2n" response code as saying "this content >= 10^n bytes".
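
To make the power-of-ten idea concrete, here is a small sketch. It assumes that plain 20 stays the unhinted success code and that only n = 1..9 carry a size hint; the function names are hypothetical, and the 80 B/s figure is the packet-radio throughput mentioned above.

```python
def status_for_size_pow10(size):
    """Return 2n for the largest n in 1..9 with size >= 10**n, else 20."""
    n = 0
    while n < 9 and size >= 10 ** (n + 1):
        n += 1
    return 20 + n


def min_seconds(status, bytes_per_second):
    """Lower bound on transfer time implied by a 2n hint (n in 1..9)."""
    n = status - 20
    if not 1 <= n <= 9:
        raise ValueError("status carries no size hint")
    return 10 ** n / bytes_per_second


print(status_for_size_pow10(250_000))  # 25
print(min_seconds(25, 80))             # 1250.0
```

On an 80 B/s link, a 25 would already tell the client that the transfer needs at least about 21 minutes, before any body bytes have been read.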

But I would raise no objections to maintaining the status quo, or to
adopting the three codes suggested here.

Cheers, Tom

8. Matthew Graybosch (hello (a) matthewgraybosch.com)

📅 Sent: 2020-06-10 13:49
📧 Message 8 of 39

On Wed, 10 Jun 2020 07:32:16 +0000
solderpunk <solderpunk at SDF.ORG> wrote:

> Hey all,
> 
> It's been a small pain point for some people for a while now that
> there is no way for a Gemini client to know how large a file it's
> downloading is without simply downloading the whole thing.

I agree that it's important to let people using Gemini clients know how
big a file they're about to download is, and I think the status codes
for indicating large, huge, and colossal files are a good idea.

This inspired me to manually add file sizes to my link descriptions for
any file I host on my own capsules as a courtesy to visitors so that
people can decide for themselves whether they want to open a particular
link. I've done the same for my capsules' atom feeds so that people
visiting CAPCOM can also see how large my files are.

I won't presume to recommend that everybody do this, even though it
would be nice to see my approach become a convention. It takes time to
go through your directory tree and get file sizes even if you're just
using shell commands like "ls -hal", but it's a low-tech approach I can
implement today instead of waiting for client and server developers to
catch up.
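
The manual step described above could be automated with a short script. This is only a sketch under assumptions: `human_size`, `link_lines` and the `url_prefix` parameter are hypothetical names, and the label format (file name followed by the size in parentheses) is just one possible convention.

```python
import os


def human_size(n):
    """Format a byte count using binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


def link_lines(directory, url_prefix=""):
    """Yield gemtext link lines with the file size appended to each label."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            size = human_size(os.path.getsize(path))
            yield f"=> {url_prefix}{name} {name} ({size})"
```

Printing the result of `link_lines("files", "/files/")` would give lines that can be pasted into an index page, so readers see the size before following the link.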

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg,PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

9. Peter Vernigorov (pitr.vern (a) gmail.com)

📅 Sent: 2020-06-10 16:20
📧 Message 9 of 39

Although this makes sense at first glance, I believe that a status code is
too late to let user know that the file is large. Bytes are already being
transferred and received by user's OS, even if client is not reading them
yet. By the time any of this information is displayed to the user, and the
time it takes them to react, damage has already been done. I think the only
sensible solution is for pages to display size information to user next to
the link.


10. Matthew Graybosch (hello (a) matthewgraybosch.com)

📅 Sent: 2020-06-10 16:54
📧 Message 10 of 39

On Wed, 10 Jun 2020 18:20:03 +0200
Peter Vernigorov <pitr.vern at gmail.com> wrote:

> Although this makes sense at first glance, I believe that a status
> code is too late to let user know that the file is large. Bytes are
> already being transferred and received by user's OS, even if client
> is not reading them yet. By the time any of this information is
> displayed to the user, and the time it takes them to react, damage
> has already been done. I think the only sensible solution is for
> pages to display size information to user next to the link.

I agree, but I suspect that unless content authors take the time to
provide this information themselves, displaying size info next to links
will require multiple rounds of communication between the user agent
and the server just to query file sizes. That probably isn't what
solderpunk, etc. had in mind.

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg, PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

11. Luke Emmet (luke (a) marmaladefoo.com)

📅 Sent: 2020-06-10 19:04
📧 Message 11 of 39

On 10-Jun-2020 13:16, Thomas Karpiniec wrote:
> One way to offer more flexibility could be to use the second digit of
> the "2n" response code as saying "this content >= 10^n bytes".
>
> But I would raise no objections to maintaining the status quo, or to
> adopting the three codes suggested here.
This seems nice and future proof, but uses up a lot of status codes 
potentially (29 takes us up to 10^9)

However, it seems we are making a lot of effort to find the wrong way to 
communicate important information about download size. In any networked 
application this information is really useful for clients and users to 
make good choices about whether to wait for something to come or not. If 
we can have lang and mime-type in the response for 20 response, is it 
really that philosophically disturbing to include this important 
information that we know makes network bandwidth negotiation better for 
everyone?

If we have to, I think it could be a client option:

[ ] limit non-text content to 5 MB per request
[x] download all text/* content

Best Wishes

  - Luke

12. Luke Emmet (luke (a) marmaladefoo.com)

📅 Sent: 2020-06-10 19:20
📧 Message 12 of 39

On 10-Jun-2020 14:49, Matthew Graybosch wrote:
>
> I agree that it's important to let people using Gemini clients know how
> big a file they're about to download is, and I think the status codes
> for indicating large, huge, and colossal files are a good idea.
>
> This inspired me to manually add file sizes to my link descriptions for
> any file I host on my own capsules as a courtesy to visitors so that
> people can decide for themselves whether they want to open a particular
> link. I've done the same for my capsules' atom feeds so that people
> visiting CAPCOM can also see how large my files are.
>
> I won't presume to recommend that everybody do this, even though it
> would be nice to see my approach become a convention. It takes time to
> go through your directory tree and get file sizes even if you're just
> using shell commands like "ls -hal", but it's a low-tech approach I can
> implement today instead of waiting for client and server developers to
> catch up.
>
Again, the client can do a lot of work for you in this regard. As we all
know, the heavy content is mostly going to be images and things like
linked mp3, zip, pdf etc.

Generally speaking a client can examine the URL and make an educated 
assumption about the target mime type from the file extension. Such 
links can be decorated or hinted to the user who may or may not decide 
to download them.

In the upcoming release of GemiNaut I have implemented a simple
decoration scheme that lets users infer this from the link. This
decoration is added by the client irrespective of the server.

=> /normal/path Normal gemini link

These can generally be assumed to be text/gemini or maybe text/*, no
extra decoration, so displayed like this:

? Normal gemini link

=> /path/to/file.png Link to an image

These can be decorated to hint this to the user, so what you actually 
see is this. The icon hints at the content you will likely get.

? ? Link to an image

Do I want to click on all links to images? Probably not for various 
reasons including bandwidth/performance, and the client helped me make 
that judgement.

At the moment I have an opt in list of the most likely file types that 
might be linked to:

[png gif jpg etc] are images

[mp3 mov pdf zip gz] are decorated as "binary" files

This gives the user a good idea what they will get and helps make a 
choice to click on the link or not
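
The extension-based hinting described above might be sketched like this. It is not GemiNaut's code: the function `link_kind` and the exact extension sets are assumptions based on the lists in this message.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE_EXTS = {"png", "gif", "jpg"}
BINARY_EXTS = {"mp3", "mov", "pdf", "zip", "gz"}


def link_kind(url):
    """Guess 'image', 'binary' or 'text' from the URL's file extension."""
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in BINARY_EXTS:
        return "binary"
    return "text"  # no known heavy extension: assume text/gemini or text/*


print(link_kind("/path/to/file.png"))  # image
print(link_kind("/normal/path"))       # text
```

Because the guess relies only on the link itself, it costs no extra requests, but it can be wrong whenever a server's paths do not reflect the content type.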

Best wishes

- Luke

13. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 19:47
📧 Message 13 of 39

On Wed, Jun 10, 2020 at 06:36:05AM -0400, Natalie Pendragon wrote:

> What I feel less excited about is the specification of a hard-coded
> $THRESHOLD. It feels like a magic number that's not going to fit all
> situations well - adding three of them like you brought up at the end
> improves the situation, but nevertheless still feels like a magic
> number solution. And depending on how future-proof you want these
> $THRESHOLDs to be, no matter how good the magic numbers are today, as
> years pass and internet access/quality across the world changes, for
> better or worse, the magic numbers will become more and more out of
> date.

Hmm.  I'll admit I didn't think about this at all, so good job flagging
it!  That said, I'm not sure this is a problem for us.  Gemini's lack
of caching, lack of compression, and lack of resumable downloads all
mean it's *never* going to be a sensible choice for really big downloads,
no matter how fast internet speeds get.  I don't see a future where
Gemini clients wanting to choose between thresholds of 100MiB, 1GiB and
10GiB are sensible.

Cheers,
Solderpunk

14. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 19:57
📧 Message 14 of 39

On Wed, Jun 10, 2020 at 01:16:12PM +0200, Petite Abeille wrote:
> 
> 
> > On Jun 10, 2020, at 09:32, solderpunk <solderpunk at SDF.ORG> wrote:
> > 
> > are forced to simply terminate the connection once a threshold amount
> > of content has been downloaded
> 
> Is it really a problem? It should be the user agent's prerogative to drop
> the connection anytime it sees fit. And/or only handle a small subset of
> media types (e.g. text/* only). Ditto for showing network activity. Even
> the simplest of clients could count how many bytes it has read so far, no?
> 

Yes, of course, the client can always drop the connection whenever,
whyever.  But if your goal is to conserve limited or expensive network
traffic, being able to sever the connection immediately after seeing the
first two bytes of the header will be much more effective than
downloading the first MiB of data and then saying "Nope, I can't afford
to finish this" and then throwing away that already-downloaded MiB.

As for counting bytes, in fact AV-98, after parsing the header, simply
reads from the socket until EOF with a single line:

`body = f.read()`

Replacing this with a loop to make smaller reads, calculate their
length, update a counter, and append them to a buffer is much more work
than simply printing a single "This may take a while..." statement after
parsing the header.
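
For comparison, the loop-based alternative mentioned above might look roughly like the sketch below. It is not AV-98's code: `read_body`, `max_bytes`, `chunk_size` and `on_progress` are hypothetical names, and `f` is assumed to be a binary file-like object such as the one used in the single-line version.

```python
import io


def read_body(f, max_bytes=None, chunk_size=4096, on_progress=None):
    """Read a response body in chunks, counting bytes as they arrive."""
    chunks = []
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break  # EOF: the server closed the connection
        total += len(chunk)
        if max_bytes is not None and total > max_bytes:
            raise ValueError(f"body exceeds {max_bytes} bytes, giving up")
        chunks.append(chunk)
        if on_progress is not None:
            on_progress(total)
    return b"".join(chunks)


# Stand-in for a socket file object carrying a 10000-byte body.
body = read_body(io.BytesIO(b"x" * 10000), on_progress=print)
```

This gives a running byte count and an early exit, at the cost of the extra bookkeeping the message refers to; any bytes read before the limit is hit have still been transferred.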

Cheers,
Solderpunk

15. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 20:00
📧 Message 15 of 39

On Wed, Jun 10, 2020 at 10:16:03PM +1000, Thomas Karpiniec wrote:

> One place I previously used gopher was IP routed over VHF packet
> radio. These links are 1200 baud simplex with an effective throughput
> of some 80 B/s. I'm not saying gemini can or should care what weirdos
> are doing with VHF radios, but it may one day find use in scenarios
> where quantities much less than 1 MB matter. If you're using such a
> slow link, you might have to waste quite a lot of time to realise that
> you're getting an above-average amount of content, reducing the
> effectiveness of a client-side threshold.

In general I'm very happy to entertain the needs of weirdos doing
non-conventional networking where it doesn't conflict with wider goals.

That said, the unavoidable TLS overhead in Gemini means it's unlikely to
be a good match for super slow network scenarios like packet radio.  We
might be able to squeeze it down by pushing adoption of compact
signature algorithms and encouraging use of TLS 1.3's session
resumption, but in general there will be kilobytes of overhead going on.
If quantities much less than 1 MiB matter, I don't think any status
codes are going to make Gemini feasible.

Cheers,
Solderpunk

16. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 20:03
📧 Message 16 of 39

On Wed, Jun 10, 2020 at 09:49:14AM -0400, Matthew Graybosch wrote:

> I won't presume to recommend that everybody do this, even though it
> would be nice to see my approach become a convention.

I was going to quip that I had already recommended everybody do this in
the Best Practices document and moan once again that nobody ever reads
it... then I checked and, whoops, it's not in there!  How'd that
happen?

I think it is definitely polite to explicitly indicate file sizes for
anything larger than, I dunno, 10 MiB or thereabouts?

Relatedly, as of a semi-recent update, the automatically generated
directory listings produced by Molly Brown include file sizes.

Cheers,
Solderpunk

17. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 20:07
📧 Message 17 of 39

On Wed, Jun 10, 2020 at 06:20:03PM +0200, Peter Vernigorov wrote:
> Although this makes sense at first glance, I believe that a status code is
> too late to let user know that the file is large. Bytes are already being
> transferred and received by user's OS, even if client is not reading them
> yet. By the time any of this information is displayed to the user, and the
> time it takes them to react, damage has already been done. I think the only
> sensible solution is for pages to display size information to user next to
> the link.

It's true, and a good point, that stuff may be buffering up in kernel
space even when the client has only read a little bit of it.  If the
idea was to present a prompt to the user, I would agree this may be of
little use.  But if the client itself peeks at the first two bytes and
then immediately closes the connection, I would have thought that would
be quick enough to make a worthwhile impact.  But I admit I'm not too
sure of that.  Maybe actual real world testing would be in order...

Cheers,
Solderpunk
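
The early-termination idea discussed in this thread could look roughly like the sketch below. It assumes the mapping floated at the start of the thread, where 21, 22 and 23 would signal payloads exceeding 1 MiB, 10 MiB and 100 MiB. Those codes are only a proposal, and the function name and threshold parameter are hypothetical.

```python
# Sketch only: codes 21/22/23 are the thread's proposal, not part of the Gemini spec.
MIB = 1024 * 1024

# Lower bound on payload size implied by each proposed status code.
PROPOSED_MIN_SIZE = {b"21": 1 * MIB, b"22": 10 * MIB, b"23": 100 * MIB}


def should_abort(first_two_bytes: bytes, user_threshold: int) -> bool:
    """Return True if the status code alone shows the payload exceeds the threshold."""
    min_size = PROPOSED_MIN_SIZE.get(first_two_bytes)
    if min_size is None:
        # Plain 20 (or anything else): no size hint, so keep reading.
        return False
    # The payload is larger than min_size, so it is also larger than any
    # threshold at or below min_size.
    return min_size >= user_threshold
```

A client would feed this the first two bytes it reads after the TLS handshake and close the socket when it returns True. Whether that is fast enough to beat kernel buffering is exactly the open question raised above.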

Link to individual message.

18. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 20:14
📧 Message 18 of 39

On Wed, Jun 10, 2020 at 08:04:54PM +0100, Luke Emmet wrote:

> If we can
> have lang and mime-type in the response for 20 response, is it really that
> philosophically disturbing to include this important information that we
> know makes network bandwidth negotiation better for everyone?

Important distinction: we don't have lang *and* MIME type, we have MIME
type.  The text/gemini MIME type has a lang parameter defined.  We're
allowed to define parameters on text/gemini because it's our type (we
can't prescribe parameters on any other text/* type, which by the way is
the answer to some of the questions in
gemini://mozz.us/journal/2020-06-08.gmi).

Adding a content-size parameter to text/gemini *is* deeply
philosophically disturbing because MIME types are, well, *types*.
Categories.  It makes no sense whatsoever to put details specific to a
particular token as part of type declaration.

So the only way to add an analogue of content-length would be to send a
MIME type *and* content-length, and then we need some kind of delimiter.
For the sake of argument, let's say a tab character.

So you send <MIME><TAB><CONTENT-LENGTH> and then, boom, next week it's
<MIME><TAB><CONTENT-LENGTH><TAB><SOMETHING-ELSE>, and it never stops.
Exactly one parameter is a stable, defensible position (what Gemini has
now).  Arbitrarily many delimited parameters is a stable, defensible
position (what the web has now).  Anything in between is at very real
risk, IMHO, of mutating into the latter over time.  Once a delimiter
comes along, it's impossible to stop subsequent expansions.

> If we have to, I think it could be a client option:

> [ ] limit non-text content to 5Mb per request
> [x] download all text/* content

Well, this second option is actually already totally possible.  As far
as I know, it's not implemented by anybody yet.  I kind of like the
idea, I may add it to AV-98.

Cheers,
Solderpunk
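
As a rough illustration of the two points above, the sketch below splits a success header into status, MIME type and parameters, then applies a "download all text/*, cap everything else" policy. It assumes the header has the form `<STATUS><SPACE><META>`; the function names and the cap value are hypothetical and not taken from AV-98 or any other client.

```python
def parse_header(line: str):
    """Split a header like '20 text/gemini; lang=en' into (status, mime, params)."""
    status, _, meta = line.rstrip("\r\n").partition(" ")
    mime, *raw_params = [part.strip() for part in meta.split(";")]
    params = {}
    for raw in raw_params:
        if raw:  # ignore empty segments such as a trailing ';'
            key, _, value = raw.partition("=")
            params[key.strip().lower()] = value.strip()
    return status, mime.lower(), params


def keep_downloading(mime: str, bytes_so_far: int, non_text_cap: int) -> bool:
    """Always continue for text/*; stop other types once they pass the cap."""
    if mime.startswith("text/"):
        return True
    return bytes_so_far <= non_text_cap
```

Note that `lang` shows up as a parameter of the MIME type, not as a second header field, which is the distinction drawn above.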

Link to individual message.

19. Matthew Graybosch (hello (a) matthewgraybosch.com)

📅 Sent: 2020-06-10 21:15
📧 Message 19 of 39

On Wed, 10 Jun 2020 20:03:37 +0000
solderpunk <solderpunk at SDF.ORG> wrote:

> I was going to quip that I had already recommended everybody do this
> in the Best Practices document and moan once again that nobody ever
> reads it...then I checked and, whoops, it's not in there!  How'd
> that happen?

Maybe you had more pressing concerns?

> I think it is definitely polite to explicitly indicate file sizes for
> anything larger than, I dunno, 10 MiB or thereabouts?

I'd suggest warning for 1MB because I doubt that most text content
served on Gemini protocol would exceed 100KB per file, and because
section 3.3 of the spec explicitly states that servers are not supposed
to compress content before sending it down the pipe. I mean, I'd love to
see more writers adopt Gemini for long-form work, but I think most
gemlog posts range from 250-2000 words, which might be 25KB at most.

I looked at the markdown file for my trunk novel, and it weighs in at
1.6MB for 289,000 words. I shudder to think of what one might find in a
text file weighing 10MB or more. I'm thinking of something like the
collected works of Leo Tolstoy or a partial dump of Equifax's data
as of 2020.

Of course, 1.6MB is fine on a decent broadband connection, and even
10MB, but I still remember what it was like to try to install Gentoo from
stage one (hundreds of megabytes of code) on a dialup connection. It
wasn't fun.

> Relatedly, as of a semi-recent update, the automatically generated
> directory listings produced by Molly Brown include file sizes.

That's sure to come in handy.

-- 
Matthew Graybosch		https://www.matthewgraybosch.com
#include <disclaimer.h>		gemini://starbreaker.org
Harrisburg, PA			gemini://demifiend.org
"Out of order?! Even in the future nothing works."

Link to individual message.

20. Sean Conner (sean (a) conman.org)

📅 Sent: 2020-06-10 21:22
📧 Message 20 of 39

It was thus said that the Great solderpunk once stated:
> On Wed, Jun 10, 2020 at 09:49:14AM -0400, Matthew Graybosch wrote:
>  
> Relatedly, as of a semi-recent update, the automatically generated
> directory listings produced by Molly Brown include file sizes.

  I just added that to GLV-1.12556.  And because of the way I signal CGI and
SCGI scripts, I can also detect those as well.

Index of private/
---------------------------

=> Welcome.txt  Welcome.txt (352 bytes)
=> cgi-sample   cgi-sample (CGI script)
=> christmas-carol.txt  christmas-carol.txt (187995 bytes)
=> mondrian.gif mondrian.gif (3068 bytes)
=> scgi-sample  scgi-sample (SCGI script)

---------------------------
GLV-1.12556

  -spc
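
A listing in this style is straightforward to generate. The following is a minimal sketch, not the GLV-1.12556 or Molly Brown implementation; the function name is hypothetical, and it only handles regular files (no CGI/SCGI detection).

```python
import os
from urllib.parse import quote


def listing_lines(directory: str) -> list[str]:
    """Build gemtext link lines with a size annotation for each regular file."""
    lines = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file():
            size = entry.stat().st_size
            # Percent-encode the URL part; keep the label human-readable.
            lines.append(f"=> {quote(entry.name)} {entry.name} ({size} bytes)")
    return lines
```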

Link to individual message.

21. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-10 21:25
📧 Message 21 of 39

On Wed, Jun 10, 2020 at 05:15:38PM -0400, Matthew Graybosch wrote:

> I'd suggest warning for 1MB because I doubt that most text content
> served on Gemini protocol would exceed 100KB per file, and because
> section 3.3 of the spec explicitly states that servers are not supposed
> to compress content before sending it down the pipe.

Just to be clear: servers can use MIME types like application/gzip or
application/zip to serve compressed content if they want to.  There's
just no way to do what HTTP allows, to send compressed data while
specifying both the compression *and* the underlying media type.

Cheers,
Solderpunk

Link to individual message.

22. Sean Conner (sean (a) conman.org)

📅 Sent: 2020-06-10 22:21
📧 Message 22 of 39

It was thus said that the Great solderpunk once stated:
> Hey all,
> 
> Just throwing out a quick idea I had last night while trying to sleep,
> to see how people feel about it.  It's simple and easily ignorable and I
> think it's kind of neat.

  [ snip ]

> Naturally, deciding to do this will lead immediately to a weeks-long
> heated debate on what the appropriate value of $THRESHOLD should be.  We
> *could* wade into those waters, but I'll just also throw out that we
> could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or
> 100MiB respectively and leave it at that.  Clients targeting
> resource-limited environments could let their users configure their own
> threshold for early termination of downloads.

  I'm replying here because I think this is the best place to reply with my
thoughts.  I have read the rest of this thread and will be referencing some
later emails.  You have been warned.

  Despite being the one who pushed for larger status codes, I'm not a fan of
this proposal, but I can't fully explain *why* other than to say "where does
it stop?"  Like Tadeusz Sosnierz alluded to,  what's huge today may be small
tomorrow [1].  Personally, I think adding the filesize to the MIME type *is*
the best answer, but I can see and even agree with the arguments against it.
And I think I'm justified in saying that [2].

  Petite Abeille has listed the options that are open today, but solderpunk
didn't like the suggestion(s) as they might complicate the client.  But a
client that simply reads the entire response with a single call, while
simple, is a problem waiting to happen.  What if the response doesn't fit
into memory?  It may be reasonable to say that "text/*" can fit into memory,
but video/*?  image/*?  (seriously, I have a 306MB beautiful image of the
moon, probably from NASA).

  I'm not sure what the best solution is though.

  -spc

[1]	The first real editor I used was 40k in size.  Today, there are
	people who use editors that consume over 1G of RAM when running.

[2]	Visual pun, but you have to use a monospace font to see it.
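
The alternative to reading the whole response in one call is to copy it in chunks and give up once a byte budget is exceeded. Below is a minimal sketch under that assumption; `copy_capped` and `TooLarge` are hypothetical names, and `src` and `dst` are any binary file-like objects (for example a socket wrapped with `makefile("rb")` and a file on disk).

```python
class TooLarge(Exception):
    """Raised when a response body exceeds the caller's byte budget."""


def copy_capped(src, dst, max_bytes: int, chunk_size: int = 4096) -> int:
    """Copy src to dst in chunks, raising TooLarge once max_bytes is exceeded."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total  # EOF: the whole body fit within the budget
        total += len(chunk)
        if total > max_bytes:
            raise TooLarge(f"body exceeded {max_bytes} bytes")
        dst.write(chunk)
```

Memory use stays bounded by `chunk_size` regardless of how large the response turns out to be, at the cost of a slightly more involved read loop.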

Link to individual message.

23. Petite Abeille (petite.abeille (a) gmail.com)

📅 Sent: 2020-06-10 22:43
📧 Message 23 of 39

> On Jun 10, 2020, at 23:22, Sean Conner <sean at conman.org> wrote:
> 
> => Welcome.txt  Welcome.txt (352 bytes)
> => cgi-sample   cgi-sample (CGI script)
> => christmas-carol.txt  christmas-carol.txt (187995 bytes)
> => mondrian.gif mondrian.gif (3068 bytes)
> => scgi-sample  scgi-sample (SCGI script)

Cool, if not machine readable.

Perhaps would benefit from a human touch ala ls -h:

When used with the -l option, use unit suffixes: Byte, Kilobyte,
Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
number of digits to three or less using base 2 for sizes.

So gemini answer to metadata is to emulate ls in text/gemini? :P
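
For reference, a base-2 size formatter takes only a few lines. This is a simple sketch and not a reimplementation of `ls -h` (it always prints one decimal place rather than limiting output to three digits); the function name is hypothetical.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with base-2 unit suffixes."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    value = float(num_bytes)
    for unit in ("KiB", "MiB", "GiB", "TiB", "PiB"):
        value /= 1024
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == "PiB":
            return f"{value:.1f} {unit}"
```

Applied to the listing quoted above, the three file sizes would come out as 352 B, 183.6 KiB and 3.0 KiB.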

Link to individual message.

24. Petite Abeille (petite.abeille (a) gmail.com)

📅 Sent: 2020-06-10 22:52
📧 Message 24 of 39



> On Jun 10, 2020, at 21:57, solderpunk <solderpunk at SDF.ORG> wrote:
> 
> As for counting bytes, in fact AV-98, after parsing the header, simply
> reads from the socket until EOF with a single line:

surely the mighty python ecosystem must have the equivalent of something
like pipe viewer, no?

http://www.ivarch.com/programs/pv.shtml

Link to individual message.

25. Sean Conner (sean (a) conman.org)

📅 Sent: 2020-06-10 23:17
📧 Message 25 of 39

It was thus said that the Great Petite Abeille once stated:
> 
> 
> > On Jun 10, 2020, at 23:22, Sean Conner <sean at conman.org> wrote:
> > 
> > => Welcome.txt  Welcome.txt (352 bytes)
> > => cgi-sample   cgi-sample (CGI script)
> > => christmas-carol.txt  christmas-carol.txt (187995 bytes)
> > => mondrian.gif mondrian.gif (3068 bytes)
> > => scgi-sample  scgi-sample (SCGI script)
> 
> Cool, if not machine readable.

  Okay.  The above is the format for GLV-1.12556.  I found two other pages
(from a quick search) that also include file sizes:

Via <gemini://gemini.circumlunar.space/docs/>

# Directory listing

=> / ..
=> best-practices.gmi best-practices.gmi                             5 KiB   Jun  6 2020
=> faq.gmi faq.gmi                                       17 KiB   Jun  7 2020
=> specification.gmi specification.gmi                             30 KiB   Jun  7 2020

And via <gemini://gemini.circumlunar.space/capcom/>

## 2020-06-10

=> gemini://demifiend.org/journal/2020/i-wanted-to-like-mate-too.gemini demifiend - I Wanted to Like MATE, Too (2.2K)
=> gemini://demifiend.org/journal/2020/exit-bug-reports-loyalty.gemini demifiend - Exit, Bug Reports, and Loyalty (4.1K)
=> gemini://gemini.circumlunar.space/~shufei/phlog/Shufei-ThisAndThat-Weiphlog.gmi Shufei's Gmiphlog - The ?phlog (Weiphlog)
=> gemini://demifiend.org/journal/2020/choose-life.gemini demifiend - Choose Life; or, The Problem with Video Games (12K)
=> gemini://acidic.website/musings/npr-bridge.gmi Musings of Meff - NPR Text Portal and Spec Changes

> Perhaps would benefit from a human touch ala ls -h:

  Perhaps.

> So gemini answer to metadata is to emulate ls in text/gemini? :P

  Appears to be so.

  -spc

Link to individual message.

26. Ivy Foster (escondida (a) iff.ink)

📅 Sent: 2020-06-11 05:52
📧 Message 26 of 39

On 10 Jun 2020, at  6:21 pm -0400, Sean Conner wrote:
> It was thus said that the Great solderpunk once stated:
> > Just throwing out a quick idea I had last night while trying to sleep,
> > to see how people feel about it.  It's simple and easily ignorable and I
> > think it's kind of neat.
> 
>   [ snip ]
> 
> > Naturally, deciding to do this will lead immediately to a weeks-long
> > heated debate on what the appropriate value of $THRESHOLD should be.  We
> > *could* wade into those waters, but I'll just also throw out that we
> > could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or
> > 100MiB respectively and leave it at that.  Clients targeting
> > resource-limited environments could let their users configure their own
> > threshold for early termination of downloads.

>   Despite being the one who pushed for larger status codes, I'm not a fan of
> this proposal, but I can't fully explain *why* other than to say "where does
> it stop?"  Like Tadeusz Sosnierz alluded to,  what's huge today may be small
> tomorrow [1].  Personally, I think adding the filesize to the MIME type *is*
> the best answer, but I can see and even agree with the arguments against it.
> And I think I'm justified in saying that [2].

I'm aware that this is a very contentious point, but just to throw a
wrench into things...there actually is [an RFC precedent][rfc1341] for
including size as a parameter to a MIME type, even though MIME type is
generally intended to be a classification. Granted, it's intended
specifically for external message bodies, so the user can decide
whether it's worth their while to download...but isn't an external
message body ultimately what any arbitrary file you fetch is anyway?

[rfc1341]: https://tools.ietf.org/html/rfc1341

I totally get it if the answer's still no to a size parameter; the
"where does it end?" argument is a strong one. However, a precedent
for including a very useful piece of information as a MIME type
parameter is still something to consider.

Cheers,
Ivy

Link to individual message.

27. Petite Abeille (petite.abeille (a) gmail.com)

📅 Sent: 2020-06-11 11:49
📧 Message 27 of 39

> On Jun 10, 2020, at 21:57, solderpunk <solderpunk at SDF.ORG> wrote:
> 
> Yes, of course, the client can always drop the connection whenever,
> whyever.  But if your goal is to conserve limited or expensive network
> traffic, being able to sever the connection immediately after seeing the
> first two bytes of the header will be much more effective than
> downloading the first MiB of data and then saying "Nope, I can't afford
> to finish this" and then throwing away that already-downloaded MiB.

Perhaps servers could introduce a slight delay between sending the status 
line and the content itself.

That way, a client may have enough time to drop the connection before the 
server has actually started to saturate the link with content.

This may address Peter Vernigorov showstopper:

"I believe that a status code is too late to let user know that the file 
is large. Bytes are already being transferred and received by user's OS, 
even if client is not reading them yet. By the time any of this 
information is displayed to the user, and the time it takes them to react, 
damage has already been done."
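
On the server side, the suggested pause might be sketched as follows. This is an illustration only: `send_response` is a hypothetical helper, `out` is any binary file-like object wrapping the connection, and the 0.5 second default is an arbitrary placeholder rather than a recommended value.

```python
import time


def send_response(out, header: bytes, body_chunks, pause_seconds: float = 0.5) -> None:
    """Write the header, flush, wait briefly, then stream the body."""
    out.write(header)
    out.flush()  # make sure the status line leaves before we pause
    time.sleep(pause_seconds)  # window in which the client may hang up
    for chunk in body_chunks:
        out.write(chunk)
    out.flush()
```

If the client drops the connection during the pause, the first body write would be expected to fail, and the caller would need to handle that error. The trade-off is that every response, wanted or not, pays the added latency.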

Link to individual message.

28. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-11 17:34
📧 Message 28 of 39

On Thu, Jun 11, 2020 at 12:52:36AM -0500, Ivy Foster wrote:

> I'm aware that this is a very contentious point, but just to throw a
> wrench into things...there actually is [an RFC precedent][rfc1341] for
> including size as a parameter to a MIME type, even though MIME type is
> generally intended to be a classification. Granted, it's intended
> specifically for external message bodies, so the user can decide
> whether it's worth their while to download...but isn't an external
> message body ultimately what any arbitrary file you fetch is anyway?
> 
> [rfc1341]: https://tools.ietf.org/html/rfc1341

Well, that is unexpected!  I guess practicality beats the idea of
semantic purity even in "real specs".

> I totally get it if the answer's still no to a size parameter; the
> "where does it end?" argument is a strong one. However, a precedent
> for including a very useful piece of information as a MIME type
> parameter is still something to consider.

It definitely weakens my argument that file size is not appropriate
information to attach to a MIME type.  Nevertheless, it's still true
that we only get to dictate the permissible parameters for the one MIME
type that we are actually defining ourselves.  All other registered MIME
types, including all the image/*, audio/* and video/* types which are
liable to be the most common large files, have their own pre-defined
list of registered parameters and we shouldn't be adding extras of our
own.

Maybe we just need to (continue to) let the file size issue go.  I won't
deny that it's useful, but (as so often) Gopherspace is the existence
proof that useful and valuable stuff can be built without it.

The concern about users without fast and cheap internet which was part
of the motivation for my recent suggestion for more 2x codes was
genuine, and it would have been one more nice utilisation of the "two
digit codes which degrade gracefully into one digit codes" philosophy
(which I think is neat but worry that we perhaps don't utilise enough
to make it worth while), but as was pointed out in the ensuing
discussion, most text/* content is likely to be quite small and clients
can already terminate connections early on the basis of a non text/*
MIME type in precisely the way that I was proposing they should do on
the receipt of a 2x code above their threshold.  So, they can get quite
a lot of the benefit of that proposal with no changes required.  I think
I will add an option to quick-terminate on non-text content to AV-98.

Cheers,
Solderpunk

Link to individual message.

29. Natalie Pendragon (natpen (a) natpen.net)

📅 Sent: 2020-06-12 11:21
📧 Message 29 of 39

On Wed, Jun 10, 2020 at 08:03:37PM +0000, solderpunk wrote:
> I think it is definitely polite to explicitly indicate file sizes for
> anything larger than, I dunno, 10 MiB or thereabouts?
>
> Relatedly, as of a semi-recent update, the automatically generated
> directory listings produced by Molly Brown include file sizes.

Okay, I will jump on the update train!

GUS now provides, by default, size information for every result*.

It's also exposed as a new query filter, in case you want to explore
more on your own. I put an example query below to show usage, but you
can find more documentation on the about page [0].

"computer AND size:>2000"

gemini://gus.guru/search?computer%20AND%20size%3A%3E2000

[0] gemini://gus.guru/about

* except for content_type input results, because it is generally less
  useful for them, and I already use that UI space in input results
  for giving the user a preview of what the actual prompt is.
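
The search URL above is just the query percent-encoded and appended after the `?`. A small sketch of that encoding step, using only the standard library (the helper name is hypothetical; the endpoint is the one given in this message):

```python
from urllib.parse import quote


def gus_search_url(query: str) -> str:
    """Percent-encode a query and append it to the GUS search endpoint."""
    # safe="" forces ':' and '>' (and '/') to be encoded as well.
    return "gemini://gus.guru/search?" + quote(query, safe="")
```

Calling `gus_search_url("computer AND size:>2000")` reproduces the example URL shown above.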

Link to individual message.

30. Natalie Pendragon (natpen (a) natpen.net)

📅 Sent: 2020-06-12 11:57
📧 Message 30 of 39

On Wed, Jun 10, 2020 at 07:47:00PM +0000, solderpunk wrote:
> Gemini's lack of caching, lack of compression, and lack of resumable
> downloads all mean it's *never* going to be a sensible choice for
> really big downloads, no matter how fast internet speeds get.

I agree with this! Gemini isn't suited for big downloads. But, (and I
realize this is very speculative) I think it's very plausible and even
likely that what is "big" will change, perhaps substantially, over
time. We've definitely seen that happen with average file sizes of
say, images, over the past few decades.

A decade ago I would have wanted resumability for a couple hundred MB
download. Now I don't even give it a second thought!

Link to individual message.

31. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-12 12:32
📧 Message 31 of 39

On Fri, Jun 12, 2020 at 07:21:03AM -0400, Natalie Pendragon wrote:
> Okay, I will jump on the update train!
> 
> GUS now provides, by default, size information for every result*.
> 
> It's also exposed as a new query filter, in case you want to explore
> more on your own. I put an example query below to show usage, but you
> can find more documentation on the about page [0].

These are fantastic updates, well done!

Out of curiosity, can you use your GUSly powers to easily tell us, say,
the mean and median size of resources served via Gemini?

How do you feel about the (wonderful!) GUS statistics page being updated
to give information on the size distribution?  Perhaps you could bin
resources into size ranges, say semi-open intervals, [0, 1KiB), [1KiB,
10KiB), [10KiB, 100KiB), etc, etc?

Since I am being greedy and asking you to do things, let me close by
singing the praises of GUS!

Back when new Gemini content was popping up at an insane rate, I would
spend a lot of time exploring and reading.  Weeks later I wonder things
like "where did I read that great retrospective write-up on the career
of a recently deceased motorsport legend?  I wonder if the author has
written anything more?", have absolutely *no* recollection of who wrote
it, or where it was hosted, or much else (I'm not a motorsport fan at
all so did not remember the names of any people, cars, courses, etc.
involved - but the thing was so well written and full of evident passion
that I enjoyed reading it as a complete outsider).  So I'd GUS for some
random small detail I could recall like "oil pressure" and, boom, there
it is.

Anybody who follows my phlog knows that once every few months I'll refer
to something I read in gopherspace but that I have forgotten the source
of and could not find later by checking likely places, so I have to
leave a note saying "if this was you, or you remember who it was, please
let me know!".  I'm thrilled that Geminispace may never have this
problem.

Seriously, Gemini search is already, somehow, a hundred, nay a thousand
times better than Gopher search has ever been, despite the great
disparity in time and attention between the two.  I would love to
understand why, and if there is some difference in the protocols that
explains it.  I have always suspected that Gopher's complete absence of
machine-readable signalling of whether a request succeeded or failed
must be a huge impediment to building a reliable indexer, but I have no
idea if that's actually the reason for the dramatic difference.

Anybody have any insight?

Cheers,
Solderpunk

Link to individual message.

32. Natalie Pendragon (natpen (a) natpen.net)

📅 Sent: 2020-06-12 12:59
📧 Message 32 of 39

Aww, thanks!! That makes me really happy to hear.

On Fri, Jun 12, 2020 at 12:32:51PM +0000, solderpunk wrote:
> Out of curiosity, can you use your GUSly powers to easily tell us, say,
> the mean and median size of resources served via Gemini?

Sure, just give me a little bit to do some index hackery and I'll
share results back.

> How do you feel about the (wonderful!) GUS statistics page being updated
> to give information on the size distribution?  Perhaps you could bin
> resources into size ranges, say semi-open intervals, [0, 1KiB), [1KiB,
> 10KiB), [10KiB, 100KiB), etc, etc?

Yes! I was thinking about that too, but hadn't convinced myself of the
best way to present the data. I was also thinking of binning the data
to make histograms, or maybe just avg/median/max to start with.

I think it needs to be segmented by content_type to be useful though. Or at
least by the first/major component of content_type (i.e., all images
are bundled together for measurement).
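
The binning and per-type summary being discussed could be sketched like this. It is not how GUS computes its statistics; the names are hypothetical, and stopping the bins at 1 MiB is an arbitrary choice for the example.

```python
from collections import defaultdict
from statistics import mean, median

KIB = 1024
# Upper edges of semi-open bins [0, 1 KiB), [1 KiB, 10 KiB), [10 KiB, 100 KiB), ...
BIN_EDGES = [1 * KIB, 10 * KIB, 100 * KIB, 1024 * KIB]


def bin_index(size: int) -> int:
    """Return the index of the first bin whose upper edge is above size."""
    for i, edge in enumerate(BIN_EDGES):
        if size < edge:
            return i
    return len(BIN_EDGES)  # overflow bin: everything >= the last edge


def summarise(resources):
    """Group (content_type, size) pairs by major type and compute simple stats."""
    by_major = defaultdict(list)
    for content_type, size in resources:
        by_major[content_type.split("/", 1)[0]].append(size)
    return {
        major: {"mean": mean(sizes), "median": median(sizes), "max": max(sizes)}
        for major, sizes in by_major.items()
    }
```

Grouping on the part before the `/` bundles all `image/*` resources together, as suggested above, while `bin_index` supplies the histogram bucket for each size.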

Link to individual message.

33. Natalie Pendragon (natpen (a) natpen.net)

📅 Sent: 2020-06-12 13:25
📧 Message 33 of 39

On Fri, Jun 12, 2020 at 08:59:45AM -0400, Natalie Pendragon wrote:
> On Fri, Jun 12, 2020 at 12:32:51PM +0000, solderpunk wrote:
> > Out of curiosity, can you use your GUSly powers to easily tell us, say,
> > the mean and median size of resources served via Gemini?
>
> Sure, just give me a little bit to do some index hackery and I'll
> share results back.

Okay, here are some statistics for the most prevalent content types.
Please do share any feedback you have, and I'll likely add this or
something similar to the statistics page soon so it's continually
available and up-to-date.

The number in parentheses after content_type is the number of pages
in Geminispace with that content_type, and I hope everything else is
self-explanatory!

# text/gemini (10378)
Mean   :    2.1 K
Median :    1.2 K
Max    :  753.3 K

# text/plain (1390)
Mean   :    4.2 K
Median :    1.4 K
Max    :  333.8 K

# image/png (281)
Mean   :  462.2 K
Median :  170.6 K
Max    :    3.4 M

# image/jpeg (159)
Mean   :  998.7 K
Median :  507.9 K
Max    :    4.2 M

# image/gif (18)
Mean   :   35.7 K
Median :    5.9 K
Max    :  356.4 K

# audio/mpeg (367)
Mean   :    3.4 M
Median :    3.3 M
Max    :   16.7 M

# application/octet-stream (40)
Mean   :  497.4 K
Median :   94.1 K
Max    :   10.7 M

# application/pdf (165)
Mean   :    4.2 M
Median :  223.9 K
Max    :  126.9 M

# audio/midi (72)
Mean   :    5.2 K
Median :    4.7 K
Max    :   20.7 K

# audio/flac (48)
Mean   :   21.1 M
Median :   15.4 M
Max    :  118.6 M

# text/x-lilypond (122)
Mean   :    4.7 K
Median :    4.6 K
Max    :    7.6 K

Link to individual message.

34. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-12 16:35
📧 Message 34 of 39

On Fri, Jun 12, 2020 at 09:25:55AM -0400, Natalie Pendragon wrote:
> 
> Okay, here are some statistics for the most prevalent content types.
> Please do share any feedback you have, and I'll likely add this or
> something similar to the statistics page soon so it's continually
> available and up-to-date.
> 
> The number in parentheses after content_type is the number of pages
> in Geminispace with that content_type, and I hope everything else is
> self-explanatory!
> 
> # text/gemini (10378)
> Mean   :    2.1 K
> Median :    1.2 K
> Max    :  753.3 K
> 
> # text/plain (1390)
> Mean   :    4.2 K
> Median :    1.4 K
> Max    :  333.8 K

Thanks for sharing these!  This is actually really useful information to
have, for thinking about the role of TLS overhead in Gemini.  It
actually seems like most text/gemini content is in fact about as big as
the server's TLS certificate or smaller, which kind of blows the idea of
Gemini being a good match for constrained networks out of the water -
even if we did have those 2x status codes, you'd still need to download
the server's cert, plus the (comparatively small) handshake traffic,
before you could make the decision to terminate early.

This overhead also scuttles lots of simple ideas to work around Gemini's
lack of a HEAD request equivalent.  Defining a well-known endpoint where
clients can fetch, e.g. a small file with the timestamps of the N most
recently updated files the server hosts makes no sense whatsoever if the
TLS overhead of fetching *that* list is, in fact, just as big as the
possibly out-of-date file the client actually wants in the first place.

It doesn't need to make it into the spec, but over time we should
develop and publicise best practices for minimising this overhead.  This
may include choosing signature schemes which are secure with smaller
key sizes, like ed25519.  Self-signed certificates also have a strong
advantage here by virtue of not needing to include a long chain of certs
leading to a trusted root.

These measures will bump heads with Petite Abeille's point about TLS
fingerprinting and how it's good, from an anti-censorship point of view,
to blend in with the crowd by not using "unusual" certificates.

Cheers,
Solderpunk
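
One way to put numbers on this comparison is to measure a server's certificate and set it against a typical body size. The sketch below uses Python's standard `ssl` module; the helper names are hypothetical, only the leaf certificate is measured (no chain, no handshake traffic), and the certificate is fetched without verification.

```python
import ssl


def cert_der_size(host: str, port: int = 1965) -> int:
    """Fetch a server's leaf certificate (unverified) and return its DER size in bytes."""
    pem = ssl.get_server_certificate((host, port))
    return len(ssl.PEM_cert_to_DER_cert(pem))


def overhead_fraction(cert_bytes: int, body_bytes: int) -> float:
    """Fraction of (certificate + body) bytes that is certificate."""
    return cert_bytes / (cert_bytes + body_bytes)
```

Feeding `overhead_fraction` a measured certificate size together with the median text/gemini size reported above would show how much of a typical fetch is certificate rather than content.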

Link to individual message.

35. Matthew Graybosch (hello (a) matthewgraybosch.com)

📅 Sent: 2020-06-12 18:29
📧 Message 35 of 39

On Fri, 12 Jun 2020 16:35:11 +0000
solderpunk <solderpunk at SDF.ORG> wrote:

> These measures will bump heads with Petite Abeille's point about TLS
> fingerprinting and how it's good, from an anti-censorship point of
> view, to blend in with the crowd by not using "unusual" certificates.

Not to disparage Petite Abeille's point about TLS fingerprinting and
blending in to avoid notice, but aren't we sticking out anyway by
listening on port 1965?

This is a bit outside my expertise, but wouldn't an adversary
determined to find gemini servers just have to do a port scan?

-- 
Matthew Graybosch		gemini://starbreaker.org
#include <disclaimer.h>		gemini://demifiend.org
https://matthewgraybosch.com	gemini://tanelorn.city
"Out of order?! Even in the future nothing works."

Link to individual message.

36. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-06-12 19:11
📧 Message 36 of 39

On Fri, Jun 12, 2020 at 02:29:15PM -0400, Matthew Graybosch wrote:

> Not to disparage Petite Abeille's point about TLS fingerprinting and
> blending in to avoid notice, but aren't we sticking out anyway by
> listening on port 1965?

By default, yes, but if somebody wanted to host a server on port 443 in
an attempt to "blend in", they could.  How effectively they would blend
in would then be a function of how typical their certificate looked.

But maybe there's not such a conflict here.  Somebody wanting to run a
server in extreme stealth mode might just have to accept that this
involves sacrificing some efficiency and use fat certs.

Cheers,
Solderpunk

Link to individual message.

37. Petite Abeille (petite.abeille (a) gmail.com)

📅 Sent: 2020-06-12 19:24
📧 Message 37 of 39



> On Jun 12, 2020, at 21:11, solderpunk <solderpunk at SDF.ORG> wrote:
> 
> But maybe there's not such a conflict here.  Somebody wanting to run a
> server in extreme stealth mode might just have to accept that this
> involves sacrificing some efficiency and use fat certs.

My name is Fat Tony, and I approve this message.

Link to individual message.

38. Luke Emmet (luke (a) marmaladefoo.com)

📅 Sent: 2020-06-12 20:46
📧 Message 38 of 39

On 12-Jun-2020 12:21, Natalie Pendragon wrote:
> Okay, I will jump on the update train!
> GUS now provides, by default, size information for every result*.

Just a quick thought - is this really necessary for the text/* content? 
It seems extra info that is not really important to know? As you have 
demonstrated, these files are all *tiny* and it is not something the 
user needs to take on board to act on. The others, yes, this is a nice 
touch and will help the user.

I'd also like to add my moral support to the work of GUS - it is great 
to have such a nice simple search engine that works well :-)

Best Wishes

  - Luke

Link to individual message.

39. Sean Conner (sean (a) conman.org)

📅 Sent: 2020-06-12 22:48
📧 Message 39 of 39

It was thus said that the Great Natalie Pendragon once stated:
> On Wed, Jun 10, 2020 at 08:03:37PM +0000, solderpunk wrote:
> > I think it is definitely polite to explicitly indicate file sizes for
> > anything larger than, I dunno, 10 MiB or thereabouts?
> >
> > Relatedly, as of a semi-recent update, the automatically generated
> > directory listings produced by Molly Brown include file sizes.
> 
> Okay, I will jump on the update train!
> 
> GUS now provides, by default, size information for every result*.
> 
> It's also exposed as a new query filter, in case you want to explore
> more on your own. I put an example query below to show usage, but you
> can find more documentation on the about page [0].

  Thank you for your work.  This is wonderful.

  -spc

Link to individual message.
