Archived View for gemi.dev › gemini-mailing-list › 000197.gmi captured on 2024-06-16 at 12:48:53. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Hey all, Just throwing out a quick idea I had last night while trying to sleep, to see how people feel about it. It's simple and easily ignorable and I think it's kind of neat. It's been a small pain point for some people for a while now that there is no way for a Gemini client to know how large a file it's downloading is without simply downloading the whole thing. This is inconvenient from a UI perspective, as there is no way to display a progress bar, and simple clients like AV-98 which simply download a complete file and then pass it off to a handler program appear to "freeze" on large downloads with no clear indication that anything is happening. This is a much bigger problem for people on limited machines (e.g. low memory diskless systems are perfectly viable for reading text/gemini content and displaying small images but not for downloading large binaries, but they can't gracefully opt out of the big stuff and are forced to simply terminate the connection once a threshold amount of content has been downloaded, and then wastefully discard that partial content) or internet connections (e.g. people at sea using satellite data plans which are billed per megabyte). People usually want to address this by having the server somehow declare the file size upfront in the header, as per HTTP's "Content-Length", but I've resisted that tooth and nail because there's no sensible way to do it which doesn't turn the response header into an infinitely extensible thing which people can tack their own variables onto, leaving us no better off than HTTP (so, please, no more ideas on this front)... I was very happy to realise that a lot of these problems can be solved, or at least ameliorated, in a very simple way with an additional 2x status code. 
Since I plan to deprecate the current 21 code for ending a transient certificate session, we could reuse 21 to mean "SUCCESS, a response body follows, and it's larger than $THRESHOLD MiB" (note I am proposing no change to the existing code 20 - 20 does NOT mean "what's coming is less than $THRESHOLD"). Simple clients could simply treat 21 as 20 and be in exactly the situation they are in now, so the graceful degradation of status codes to their x0 form works nicely here. But e.g. AV-98 could print "Downloading large file, please wait..." upon receiving a 21, and then proceed as usual. This is a very low effort client change, but it solves the problem of people thinking something has gone wrong when they unknowingly start a large download. More importantly, users in resource-limited environments could use clients which simply terminate the connection immediately on seeing a header starting with 21, providing a quick and low-waste "opt out" of large content.

On the server side, again, a dirt simple server could just always serve up 20 without actually breaking anything. I realise that knowing whether to use 20 or 21 for dynamically generated content may not be straightforward - no problem, it is *always* valid to just send 20 when in doubt. 21 is nothing more than a helpful hint to clients who might need it. It doesn't need to be 100% reliable to still have value.

This feels like a good idea to me. It's totally optional and very easy to handle on both the client and server sides, and I feel like being friendly to small/slow computers and slow/intermittent network connections is a good fit with the overall "vibe" of Gemini, provided doing so does not conflict with overall simplicity.

Naturally, deciding to do this will lead immediately to a weeks-long heated debate on what the appropriate value of $THRESHOLD should be. We *could* wade into those waters, but I'll just also throw out that we could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or 100MiB respectively and leave it at that. Clients targeting resource-limited environments could let their users configure their own threshold for early termination of downloads.

Cheers,
Solderpunk
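To make the client side of this concrete, here is a minimal sketch of the "peek at the first two bytes and opt out" behaviour, assuming the 21/22/23 variant of the proposal (payloads exceeding 1 MiB, 10 MiB and 100 MiB). These codes are only proposed in this thread and are not part of the Gemini specification; the names `PROPOSED_MINIMUM_SIZE`, `should_abort` and `notice_for` are hypothetical.

```python
# Hypothetical sketch: status codes 21-23 as size hints are only a proposal
# from this thread, not part of the Gemini specification.
MIB = 1024 * 1024

# Proposed meaning: the body exceeds this many bytes. Code 20 makes no claim.
PROPOSED_MINIMUM_SIZE = {"21": 1 * MIB, "22": 10 * MIB, "23": 100 * MIB}


def should_abort(header: bytes, user_limit: int) -> bool:
    """Return True if the status code alone promises a body over user_limit."""
    status = header[:2].decode("ascii", errors="replace")
    minimum = PROPOSED_MINIMUM_SIZE.get(status)
    if minimum is None:
        return False  # plain 20 (or any other code): no size hint available
    # The body is larger than `minimum`, so it is certainly larger than any
    # limit that is less than or equal to `minimum`.
    return minimum >= user_limit


def notice_for(header: bytes) -> str:
    """Return a message a simple client could print before reading the body."""
    status = header[:2].decode("ascii", errors="replace")
    if status in PROPOSED_MINIMUM_SIZE:
        return "Downloading large file, please wait..."
    return ""
```

A client that ignores the hint simply treats 21, 22 and 23 as 20, which is the graceful degradation described above.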
Sorry replying to the list address this time... How about letting the server decide? I can imagine a scenario where the server defaults to sending text/* mime types as 20 responses and everything else as 21 responses. Then servers can let configuration or (s)CGI output determine how to decide on 20 vs 21. While leaving it up to the server is subjective, I think most content authors understand whether something is meant to be quickly digested and rendered or whether something should be downloaded/queued/status line shown. - meff
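A rough sketch of the "let the server decide" default suggested in this reply: text/* goes out as 20, everything else as 21, with an optional override standing in for configuration or (s)CGI output. The function names and the `override` parameter are hypothetical, and 21 as a "large response" code is only the proposal under discussion.

```python
from typing import Optional


def choose_status(mime_type: str, override: Optional[str] = None) -> str:
    """Pick 20 or 21 for a successful response.

    `override` is a hypothetical hook for server configuration or script
    output; when absent, text/* is treated as quick to render (20) and
    everything else as something to download (21).
    """
    if override in ("20", "21"):
        return override
    return "20" if mime_type.lower().startswith("text/") else "21"


def response_header(mime_type: str, override: Optional[str] = None) -> bytes:
    """Build a Gemini response header: status, space, meta, CRLF."""
    status = choose_status(mime_type, override)
    return f"{status} {mime_type}\r\n".encode("utf-8")
```

Sending 20 when unsure stays valid under the proposal, so the override only ever needs to express the author's intent.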
Heya!

> This feels like a good idea to me. It's totally optional and very easy to handle on both the client and server sides, and I feel like being friendly to small/slow computers and slow/intermittent network connections is a good fit with the overall "vibe" of Gemini, provided doing so does not conflict with overall simplicity.

Yes, I totally like that idea!

> Naturally, deciding to do this will lead immediately to a weeks-long heated debate on what the appropriate value of $THRESHOLD should be. We *could* wade into those waters, but I'll just also throw out that we could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or 100MiB respectively and leave it at that. Clients targeting resource-limited environments could let their users configure their own threshold for early termination of downloads.

I had the exact same thoughts, and I think the idea of "wasting" 3 status codes for 1, 10 and 100 MB is totally okay. It allows clients to better display a loading indication.

Regards
- xq
I like the spirit of this idea a lot - Gemini has an opportunity to do a lot more for users in resource-limited environments, and in addition to the explicit austerity in the protocol itself, this is another way to proactively respect users' resources and time.

What I feel less excited about is the specification of a hard-coded $THRESHOLD. It feels like a magic number that's not going to fit all situations well - adding three of them like you brought up at the end improves the situation, but nevertheless still feels like a magic number solution. And depending on how future-proof you want these $THRESHOLDs to be, no matter how good the magic numbers are today, as years pass and internet access/quality across the world changes, for better or worse, the magic numbers will become more and more out of date.

The only way I see to make it not a magic number is to allow clients to specify a $THRESHOLD as part of their request - that, however, feels like too big of a change to introduce to the Request structure (even if we could make it gracefully degrade).

So, my conclusions are:

A) I love the idea
B) I don't love the design
C) I worry there may be no better design

And I would support speccing this if no better design arrives, because it is a meaningful issue to solve!

Nat
> On Jun 10, 2020, at 09:32, solderpunk <solderpunk at SDF.ORG> wrote:
>
> are forced to simply terminate the connection once a threshold amount of content has been downloaded

Is it really a problem? It should be the user agent's prerogative to drop the connection any time it sees fit. And/or only handle a small subset of media types (e.g. text/* only). Ditto for showing network activity. Even the simplest of clients could count how many bytes it has read so far, no?
2020-06-10 12:36 GMT+02:00 Natalie Pendragon<natpen at natpen.net>:

> What I feel less excited about is the specification of a hard-coded $THRESHOLD. It feels like a magic number that's not going to fit all situations well - adding three of them like you brought up at the end improves the situation, but nevertheless still feels like a magic number solution. And depending on how future-proof you want these $THRESHOLDs to be, no matter how good the magic numbers are today, as years pass and internet access/quality across the world changes, for better or worse, the magic numbers will become more and more out of date.

I feel the same way; it sounds like whatever you pick will become the "64 kilobytes" of tomorrow's jokes. And ultimately it doesn't allow the introduction of a meaningful progress indicator. Is 100 MB a lot? Is it a long wait? I don't know, is the upstream server fast? Is my wifi having a good day?

An alternative is tricky to come up with without making the response structure arbitrary and complex. Here's a take though: anything bigger than a megabyte or ten is realistically not going to be text, or anything text-like that can be displayed inline. I can think of images, videos, maybe PDFs or just a bunch of encrypted data in the form of whatever.

I wonder if it'd make sense to have a status code that indicates "mimetype is basically meaningless, it's a big whatever, so I'll give you the content length instead". The client could then choose to receive a few more bytes and check a magic byte for something it recognizes, or just prompt the user, saying "that's not exactly something we can display anyway, (how) do you want it saved?"

--
tadzik
On Wed, Jun 10, 2020 at 07:32:16AM +0000, solderpunk wrote:

> Naturally, deciding to do this will lead immediately to a weeks-long heated debate on what the appropriate value of $THRESHOLD should be. We *could* wade into those waters, but I'll just also throw out that we could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or 100MiB respectively and leave it at that. Clients targeting resource-limited environments could let their users configure their own threshold for early termination of downloads.

That's the real question isn't it? I don't feel strongly about this either way but I'll share a couple of thoughts.

One place I previously used gopher was IP routed over VHF packet radio. These links are 1200 baud simplex with an effective throughput of some 80 B/s. I'm not saying gemini can or should care what weirdos are doing with VHF radios, but it may one day find use in scenarios where quantities much less than 1 MB matter. If you're using such a slow link, you might have to waste quite a lot of time to realise that you're getting an above-average amount of content, reducing the effectiveness of a client-side threshold.

One way to offer more flexibility could be to use the second digit of the "2n" response code as saying "this content >= 10^n bytes".

But I would raise no objections to maintaining the status quo, or to adopting the three codes suggested here.

Cheers,
Tom
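As an illustration of the "2n means at least 10^n bytes" idea, a server could derive the second digit from the number of decimal digits in the size. This is only a sketch of the suggestion in this message: it would redefine every 2x code, the function name `status_for_size` is hypothetical, and treating an empty body as 20 is an assumption.

```python
def status_for_size(size_bytes: int) -> str:
    """Return "2n" where n is the largest digit 0-9 with size_bytes >= 10**n.

    Hypothetical mapping for the idea floated in the thread; an empty body
    is reported as plain 20 (an assumption, since 10**0 is already 1 byte).
    """
    if size_bytes < 1:
        return "20"
    # A positive integer with d decimal digits satisfies 10**(d-1) <= value.
    n = min(len(str(size_bytes)) - 1, 9)
    return f"2{n}"
```

Under this mapping a 1 MiB file (1,048,576 bytes) would be served as 26, and anything of a gigabyte or more as 29.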
On Wed, 10 Jun 2020 07:32:16 +0000 solderpunk <solderpunk at SDF.ORG> wrote: > Hey all, > > It's been a small pain point for some people for a while now that > there is no way for a Gemini client to know how large a file it's > downloading is without simply downloading the whole thing. I agree that it's important to let people using Gemini clients know how big a file they're about to download is, and I think the status codes for indicating large, huge, and colossal files are a good idea. This inspired me to manually add file sizes to my link descriptions for any file I host on my own capsules as a courtesy to visitors so that people can decide for themselves whether they want to open a particular link. I've done the same for my capsules' atom feeds so that people visiting CAPCOM can also see how large my files are. I won't presume to recommend that everybody do this, even though it would be nice to see my approach become a convention. It takes time to go through your directory tree and get file sizes even if you're just using shell commands like "ls -hal", but it's a low-tech approach I can implement today instead of waiting for client and server developers to catch up. -- Matthew Graybosch https://www.matthewgraybosch.com #include <disclaimer.h> gemini://starbreaker.org Harrisburg,PA gemini://demifiend.org "Out of order?! Even in the future nothing works."
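For anyone who would rather not collect sizes by hand with "ls -hal", the same link lines could be generated by a small script. This is a minimal sketch, not a description of how any capsule in this thread is built; the function names, the size format and the URL prefix are all assumptions.

```python
import os
from typing import List
from urllib.parse import quote


def human_size(n: int) -> str:
    """Format a byte count using binary units, e.g. 3068 -> '3.0 KiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


def link_line(url_path: str, label: str, size_bytes: int) -> str:
    """Build one text/gemini link line with the size in the description."""
    return f"=> {url_path} {label} ({human_size(size_bytes)})"


def links_for_directory(root: str, url_prefix: str = "/") -> List[str]:
    """Walk `root` and return a link line for every file found under it."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            lines.append(link_line(url_prefix + quote(rel), name, os.path.getsize(full)))
    return lines
```

The output can be pasted into an index page or regenerated whenever files change.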
Although this makes sense at first glance, I believe that a status code is too late to let user know that the file is large. Bytes are already being transferred and received by user's OS, even if client is not reading them yet. By the time any of this information is displayed to the user, and the time it takes them to react, damage has already been done. I think the only sensible solution is for pages to display size information to user next to the link.
On Wed, 10 Jun 2020 18:20:03 +0200 Peter Vernigorov <pitr.vern at gmail.com> wrote: > Although this makes sense at first glance, I believe that a status > code is too late to let user know that the file is large. Bytes are > already being transferred and received by user's OS, even if client > is not reading them yet. By the time any of this information is > displayed to the user, and the time it takes them to react, damage > has already been done. I think the only sensible solution is for > pages to display size information to user next to the link. I agree, but I suspect that unless content authors take the time to provide this information themselves displaying size info next to links will require multiple rounds of communication between the user agent and the server just to query file sizes. That probably isn't what solderpunk, etc. had in mind. -- Matthew Graybosch https://www.matthewgraybosch.com #include <disclaimer.h> gemini://starbreaker.org Harrisburg, PA gemini://demifiend.org "Out of order?! Even in the future nothing works."
On 10-Jun-2020 13:16, Thomas Karpiniec wrote:

> One way to offer more flexibility could be to use the second digit of the "2n" response code as saying "this content >= 10^n bytes".
>
> But I would raise no objections to maintaining the status quo, or to adopting the three codes suggested here.

This seems nice and future proof, but potentially uses up a lot of status codes (29 takes us up to 10^9).

However, it seems we are making a lot of effort to find the wrong way to communicate important information about download size.

In any networked application this information is really useful for clients and users to make good choices about whether to wait for something to come or not. If we can have lang and mime-type in the response for a 20 response, is it really that philosophically disturbing to include this important information that we know makes network bandwidth negotiation better for everyone?

If we have to, I think it could be a client option:

[ ] limit non-text content to 5Mb per request
[x] download all text/* content

Best Wishes

- Luke
On 10-Jun-2020 14:49, Matthew Graybosch wrote:

> I agree that it's important to let people using Gemini clients know how big a file they're about to download is, and I think the status codes for indicating large, huge, and colossal files are a good idea.
>
> This inspired me to manually add file sizes to my link descriptions for any file I host on my own capsules as a courtesy to visitors so that people can decide for themselves whether they want to open a particular link. I've done the same for my capsules' atom feeds so that people visiting CAPCOM can also see how large my files are.
>
> I won't presume to recommend that everybody do this, even though it would be nice to see my approach become a convention. It takes time to go through your directory tree and get file sizes even if you're just using shell commands like "ls -hal", but it's a low-tech approach I can implement today instead of waiting for client and server developers to catch up.

Again, the client can do a lot of work for you in this regard. As we all know, the heavy content is mostly going to be images and things like linked mp3, zip, pdf etc.

Generally speaking a client can examine the URL and make an educated assumption about the target MIME type from the file extension. Such links can be decorated or hinted to the user who may or may not decide to download them.

In the upcoming release of GemiNaut I have implemented a simple decoration scheme that lets users infer this from the link. This decoration is added by the client irrespective of the server.

=> /normal/path Normal gemini link

These can generally be assumed to be text/gemini or maybe text/*, so there is no extra decoration and they are displayed like this:

? Normal gemini link

=> /path/to/file.png Link to an image

These can be decorated to hint this to the user, so what you actually see is this. The icon hints at the content you will likely get.

? ? Link to an image

Do I want to click on all links to images? Probably not for various reasons including bandwidth/performance, and the client helped me make that judgement.

At the moment I have an opt-in list of the most likely file types that might be linked to:

[png gif jpg etc] are images
[mp3 mov pdf zip gz] are decorated as "binary" files

This gives the user a good idea what they will get and helps make a choice to click on the link or not.

Best wishes

- Luke
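The extension-based decoration described here could be sketched roughly as follows. This is not GemiNaut's code: the function names are hypothetical, the extension sets are taken from the two lists in the message (without the "etc"), and the "[img]" and "[bin]" markers are placeholder text standing in for whatever icons a client would actually draw.

```python
import posixpath
from urllib.parse import urlsplit

# Extension lists as given in the message above.
IMAGE_EXTENSIONS = {"png", "gif", "jpg"}
BINARY_EXTENSIONS = {"mp3", "mov", "pdf", "zip", "gz"}

# Placeholder markers; a real client would show icons of its own choosing.
MARKERS = {"image": "[img] ", "binary": "[bin] ", "text": ""}


def classify_link(url: str) -> str:
    """Guess 'image', 'binary' or 'text' from the URL's file extension."""
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in BINARY_EXTENSIONS:
        return "binary"
    return "text"  # no known extension: assume text/gemini or other text/*


def decorate(link_line: str) -> str:
    """Turn a '=> URL label' line into the label with a type marker."""
    parts = link_line[2:].split(maxsplit=1)
    if not parts:
        return link_line  # malformed link line: leave it untouched
    url = parts[0]
    label = parts[1] if len(parts) > 1 else url
    return MARKERS[classify_link(url)] + label
```

Because the guess comes only from the URL, it needs no extra request to the server, at the cost of being wrong when an extension is misleading or missing.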
On Wed, Jun 10, 2020 at 06:36:05AM -0400, Natalie Pendragon wrote:

> What I feel less excited about is the specification of a hard-coded $THRESHOLD. It feels like a magic number that's not going to fit all situations well - adding three of them like you brought up at the end improves the situation, but nevertheless still feels like a magic number solution. And depending on how future-proof you want these $THRESHOLDs to be, no matter how good the magic numbers are today, as years pass and internet access/quality across the world changes, for better or worse, the magic numbers will become more and more out of date.

Hmm. I'll admit I didn't think about this at all, so good job flagging it!

That said, I'm not sure this is a problem for us. Gemini's lack of caching, lack of compression, and lack of resumable downloads all mean it's *never* going to be a sensible choice for really big downloads, no matter how fast internet speeds get. I don't see a future where Gemini clients wanting to choose between thresholds of 100MiB, 1GiB and 10GiB are sensible.

Cheers,
Solderpunk
On Wed, Jun 10, 2020 at 01:16:12PM +0200, Petite Abeille wrote: > > > > On Jun 10, 2020, at 09:32, solderpunk <solderpunk at SDF.ORG> wrote: > > > > are forced to simply terminate the connection once a threshold amount of content has been downloaded > > Is it really a problem? It should be user-agents prerogative to drop the connection anytime they see fit. And/or only handle a small subset of media type (e.g. text/* only). Ditto for showing network activities. Even the simplest of client could count how many bytes it has read so far, no? > Yes, of course, the client can always drop the connection whenever, whyever. But if your goal is to conserve limited or expensive network traffic, being able to sever the connection immediately after seeing the first two bytes of the header will be much more effective than downloading the first MiB of data and then saying "Nope, I can't afford to finish this" and then throwing away that already-downloaded MiB. As for counting bytes, in fact AV-98, after parsing the header, simply reads from the socket until EOF with a single line: `body = f.read()` Replacing this with a loop to make smaller reads, calculate their length, update a counter, and append them to a buffer is much more work than simply printing a single "This may take a while..." statement after parsing the header. Cheers, Solderpunk
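For comparison, the loop being described (smaller reads, a running counter, a buffer) might look something like the sketch below. It is not AV-98's code; `read_body`, its parameters and the 4096-byte chunk size are assumptions chosen for illustration.

```python
import io
from typing import BinaryIO, Callable, Optional


def read_body(
    f: BinaryIO,
    chunk_size: int = 4096,
    limit: Optional[int] = None,
    progress: Optional[Callable[[int], None]] = None,
) -> bytes:
    """Read until EOF in chunks, counting bytes as they arrive.

    Raises ValueError once more than `limit` bytes have been read, so the
    caller can close the connection instead of finishing the download.
    """
    buf = bytearray()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # an empty read means EOF
            break
        buf.extend(chunk)
        if progress is not None:
            progress(len(buf))  # report the running total to the caller
        if limit is not None and len(buf) > limit:
            raise ValueError(f"body exceeded {limit} bytes")
    return bytes(buf)


if __name__ == "__main__":
    # Stand-in for the socket file object a real client would read from.
    fake_socket = io.BytesIO(b"x" * 10000)
    read_body(fake_socket, progress=lambda total: print(total, "bytes so far"))
```

This is more code than a single print statement, which is the trade-off the message points out, and a limit enforced this way still pays for every byte read before it trips.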
On Wed, Jun 10, 2020 at 10:16:03PM +1000, Thomas Karpiniec wrote:

> One place I previously used gopher was IP routed over VHF packet radio. These links are 1200 baud simplex with an effective throughput of some 80 B/s. I'm not saying gemini can or should care what weirdos are doing with VHF radios, but it may one day find use in scenarios where quantities much less than 1 MB matter. If you're using such a slow link, you might have to waste quite a lot of time to realise that you're getting an above-average amount of content, reducing the effectiveness of a client-side threshold.

In general I'm very happy to entertain the needs of weirdos doing non-conventional networking where it doesn't conflict with wider goals. That said, the unavoidable TLS overhead in Gemini means it's unlikely to be a good match for super slow network scenarios like packet radio. We might be able to squeeze it down by pushing adoption of compact signature algorithms and encouraging use of TLS 1.3's session resumption, but in general there will be kilobytes of overhead going on. If quantities much less than 1 MiB matter, I don't think any status codes are going to make Gemini feasible.

Cheers,
Solderpunk
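To put rough numbers on this, the arithmetic for the 80 B/s link mentioned in the quoted message can be sketched as below. The 4 KiB handshake figure is purely an illustrative assumption standing in for "kilobytes of overhead"; real TLS overhead depends on certificates, cipher suites and session resumption.

```python
# Effective throughput quoted in the thread for 1200 baud simplex packet radio.
LINK_BYTES_PER_SECOND = 80.0

# Assumption for illustration only: 4 KiB of TLS handshake traffic.
ASSUMED_TLS_OVERHEAD = 4 * 1024


def transfer_seconds(size_bytes: int, bytes_per_second: float = LINK_BYTES_PER_SECOND) -> float:
    """Time needed to move size_bytes at a steady throughput."""
    return size_bytes / bytes_per_second


if __name__ == "__main__":
    for label, size in (("assumed TLS overhead", ASSUMED_TLS_OVERHEAD),
                        ("1 MiB payload", 1024 * 1024)):
        seconds = transfer_seconds(size)
        print(f"{label}: {seconds:.1f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions the handshake alone costs about 51 seconds, and a 1 MiB body about 13,107 seconds, or a little over three and a half hours.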
On Wed, Jun 10, 2020 at 09:49:14AM -0400, Matthew Graybosch wrote:

> I won't presume to recommend that everybody do this, even though it would be nice to see my approach become a convention.

I was going to quip that I had already recommended everybody do this in the Best Practices document and moan once again that nobody ever reads it...then I checked and, whoops, it's not in there! How'd that happen?

I think it is definitely polite to explicitly indicate file sizes for anything larger than, I dunno, 10 MiB or thereabouts?

Relatedly, as of a semi-recent update, the automatically generated directory listings produced by Molly Brown include file sizes.

Cheers,
Solderpunk
On Wed, Jun 10, 2020 at 06:20:03PM +0200, Peter Vernigorov wrote:

> Although this makes sense at first glance, I believe that a status code is too late to let user know that the file is large. Bytes are already being transferred and received by user's OS, even if client is not reading them yet. By the time any of this information is displayed to the user, and the time it takes them to react, damage has already been done. I think the only sensible solution is for pages to display size information to user next to the link.

It's true, and a good point, that stuff may be buffering up in kernel space even when the client has only read a little bit of it. If the idea was to present a prompt to the user, I would agree this may be of little use. But if the client itself peeks at the first two bytes and then immediately closes the connection, I would have thought that would be quick enough to make a worthwhile impact. But I admit I'm not too sure of that. Maybe actual real world testing would be in order...

Cheers,
Solderpunk
On Wed, Jun 10, 2020 at 08:04:54PM +0100, Luke Emmet wrote:

> If we can have lang and mime-type in the response for 20 response, is it really that philosophically disturbing to include this important information that we know makes network bandwidth negotiation better for everyone?

Important distinction: we don't have lang *and* MIME type, we have MIME type. The text/gemini MIME type has a lang parameter defined. We're allowed to define parameters on text/gemini because it's our type (we can't prescribe parameters on any other text/* type, which by the way is the answer to some of the questions in gemini://mozz.us/journal/2020-06-08.gmi).

Adding a content-size parameter to text/gemini *is* deeply philosophically disturbing because MIME types are, well, *types*. Categories. It makes no sense whatsoever to put details specific to a particular token as part of a type declaration.

So the only way to add an analogue of content-length would be to send a MIME type *and* content-length, and then we need some kind of delimiter. For the sake of argument, let's say a tab character. So you send <MIME><TAB><CONTENT-LENGTH> and then, boom, next week it's <MIME><TAB><CONTENT-LENGTH><TAB><SOMETHING-ELSE>, and it never stops.

Exactly one parameter is a stable, defensible position (what Gemini has now). Arbitrarily many delimited parameters is a stable, defensible position (what the web has now). Anything in between is at very real risk, IMHO, of mutating into the latter over time. Once a delimiter comes along, it's impossible to stop subsequent expansions.

> If we have to, I think it could be a client option:
> [ ] limit non-text content to 5Mb per request
> [x] download all text/* content

Well, this second option is actually already totally possible. As far as I know, it's not implemented by anybody yet. I kind of like the idea, I may add it to AV-98.

Cheers,
Solderpunk
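Both halves of this message can be illustrated with one small sketch: the header carries a single MIME type whose parameters (such as lang on text/gemini) belong to the type, and a client can already use that type to apply the quoted options. The names `parse_header`, `byte_limit_for` and `NON_TEXT_LIMIT` are hypothetical, reading "5Mb" as 5,000,000 bytes is an assumption, and the parameter parsing is deliberately naive (no quoted values).

```python
from typing import Dict, Optional, Tuple

# Assumption: "5Mb per request" is read here as 5,000,000 bytes.
NON_TEXT_LIMIT = 5 * 1000 * 1000


def parse_header(header: str) -> Tuple[str, str, Dict[str, str]]:
    """Split '<STATUS> <META>' into status, MIME type and MIME parameters.

    Only meaningful for 2x responses, where META is a MIME type.
    """
    status, _, meta = header.rstrip("\r\n").partition(" ")
    pieces = [piece.strip() for piece in meta.split(";")]
    mime = pieces[0].lower()
    params: Dict[str, str] = {}
    for piece in pieces[1:]:
        key, sep, value = piece.partition("=")
        if sep:  # ignore fragments that are not key=value
            params[key.strip().lower()] = value.strip()
    return status, mime, params


def byte_limit_for(mime: str) -> Optional[int]:
    """Return None (no cap) for text/*, otherwise the non-text byte cap."""
    return None if mime.startswith("text/") else NON_TEXT_LIMIT
```

The cap would then be enforced while reading the body, which needs no change to the protocol at all.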
On Wed, 10 Jun 2020 20:03:37 +0000 solderpunk <solderpunk at SDF.ORG> wrote: > I was going to quip that I had already recommended everybody do this > in the Best Practices document and moan once again that nobody ever > reads it...then and checked and, whoops, it's not in there! How'd > that happen? Maybe you had more pressing concerns? > I think it is definitely polite to explicitly indicate file sizes for > anything larger than, I dunno, 10 MiB or thereabouts? I'd suggest warning for 1MB because I doubt that most text content served on Gemini protocol would exceed 100KB per file, and because section 3.3 of the spec explicitly states that servers are not supposed to compress content before sending it down the pipe. I mean, I'd love to see more writers adopt Gemini for long-form work, but I think most gemlog posts range from 250-2000 words, which might be 25KB at most. I looked at the markdown file for my trunk novel, and it weighs in at 1.6MB for 289,000 words. I shudder to think of what one might find in a text file weighing 10MB or more. I'm thinking of something like the collected works of Leo Tolstoy or a partial dump of Equifax's data as of 2020. Of course, 1.6MB is fine on a decent broadband connection, and even 10MB, but I still remember what it was like to try to install Gentoo from stage one (hundreds of megabytes of code) on a dialup connection. It wasn't fun. > Relatedly, as of a semi-recent update, the automatically generated > directory listings produced by Molly Brown include file sizes. That's sure to come in handy. -- Matthew Graybosch https://www.matthewgraybosch.com #include <disclaimer.h> gemini://starbreaker.org Harrisburg, PA gemini://demifiend.org "Out of order?! Even in the future nothing works."
It was thus said that the Great solderpunk once stated: > On Wed, Jun 10, 2020 at 09:49:14AM -0400, Matthew Graybosch wrote: > > Relatedly, as of a semi-recent update, the automatically generated > directory listings produced by Molly Brown include file sizes. I just added that to GLV-1.12556. And because of the way I signal CGI and SCGI scripts, I can also detect those as well.

Index of private/
---------------------------
=> Welcome.txt Welcome.txt (352 bytes)
=> cgi-sample cgi-sample (CGI script)
=> christmas-carol.txt christmas-carol.txt (187995 bytes)
=> mondrian.gif mondrian.gif (3068 bytes)
=> scgi-sample scgi-sample (SCGI script)
---------------------------
GLV-1.12556

-spc
On Wed, Jun 10, 2020 at 05:15:38PM -0400, Matthew Graybosch wrote: > I'd suggest warning for 1MB because I doubt that most text content > served on Gemini protocol would exceed 100KB per file, and because > section 3.3 of the spec explicitly states that servers are not supposed > to compress content before sending it down the pipe. Just to be clear: servers can use MIME types like application/gzip or application/zip to serve compressed content if they want to. There's just no way to do what HTTP allows, to send compressed data while specifying both the compression *and* the underlying media type. Cheers, Solderpunk
It was thus said that the Great solderpunk once stated: > Hey all, > > Just throwing out a quick idea I had last night while trying to sleep, > to see how people feel about it. It's simple and easily ignorable and I > think it's kind of neat. [ snip ] > Naturally, deciding to do this will lead immediately to a weeks-long > heated debate on what the appropriate value of $THRESHOLD should be. We > *could* wade into those waters, but I'll just also throw out that we > could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or > 100MiB respectively and leave it at that. Clients targeting > resource-limited environments could let their users configure their own > threshold for early termination of downloads. I'm replying here because I think this is the best place to reply with my thoughts. I have read the rest of this thread and will be referencing some later emails. You have been warned. Despite being the one who pushed for larger status codes, I'm not a fan of this proposal, but I can't fully explain *why* other than to say "where does it stop?" Like Tadeusz Sosnierz alluded to, what's huge today may be small tomorrow [1]. Personally, I think adding the filesize to the MIME type *is* the best answer, but I can see and even agree with the arguments against it. And I think I'm justified in saying that [2]. Petite Abeille has listed the options that are open today, but solderpunk didn't like the suggestion(s) as they might complicate the client. But a client that simply reads the entire response with a single call, while simple, is a problem waiting to happen. What if the response doesn't fit into memory? It may be reasonable to say that "text/*" can fit into memory, but video/*? image/*? (seriously, I have a 306MB beautiful image of the moon, probably from NASA). I'm not sure what the best solution is though. -spc [1] The first real editor I used was 40k in size. Today, there are people who use editors that consume over 1G of RAM when running.
[2] Visual pun, but you have to use a monospace font to see it.
> On Jun 10, 2020, at 23:22, Sean Conner <sean at conman.org> wrote: > > => Welcome.txt Welcome.txt (352 bytes) > => cgi-sample cgi-sample (CGI script) > => christmas-carol.txt christmas-carol.txt (187995 bytes) > => mondrian.gif mondrian.gif (3068 bytes) > => scgi-sample scgi-sample (SCGI script) Cool, if not machine readable. Perhaps would benefit from a human touch ala ls -h: When used with the -l option, use unit suffixes: Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the number of digits to three or less using base 2 for sizes. So gemini answer to metadata is to emulate ls in text/gemini? :P
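For what the `ls -h` suggestion might look like in a directory-listing generator, here is a rough sketch. It follows the quoted description (base 2, unit suffixes, aiming for three digits or fewer) but is not an exact reimplementation of `ls`; the function name `human_size` and the rounding choices are assumptions made for illustration.

```python
def human_size(num_bytes):
    """Format a byte count with a base-2 unit suffix, roughly like `ls -h`.

    Hypothetical helper: values just under a unit boundary may still
    round up to four digits, which real `ls` handles more carefully.
    """
    units = ["B", "K", "M", "G", "T", "P"]
    size = float(num_bytes)
    for unit in units:
        # Stop once the value is short enough, or we run out of units.
        if size < 1000 or unit == units[-1]:
            break
        size /= 1024
    if unit == "B":
        return f"{int(size)}B"
    if size < 10:
        return f"{size:.1f}{unit}"  # keep one decimal for small values
    return f"{size:.0f}{unit}"
```

Applied to the listing quoted above, `christmas-carol.txt` (187995 bytes) would be shown as `184K` and `Welcome.txt` as `352B`.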
> On Jun 10, 2020, at 21:57, solderpunk <solderpunk at SDF.ORG> wrote: > > As for counting bytes, in fact AV-98, after parsing the header, simply > reads from the socket until EOF with a single line: surely the mighty python ecosystem must have the equivalent of something like pipe viewer, no? http://www.ivarch.com/programs/pv.shtml
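A pv-style byte counter does not need an external tool; replacing a single read-until-EOF call with a chunked loop is enough. The sketch below is not AV-98's actual code: `read_with_progress` and its `report` callback are hypothetical names, and it assumes the response body is available as a binary file-like object (for a socket, something like `sock.makefile("rb")`).

```python
import io


def read_with_progress(stream, chunk_size=4096, report=None):
    """Read a binary stream until EOF in chunks, reporting bytes so far.

    Hypothetical helper: `report` is called with the running total after
    each chunk, so a client can print a counter or check a size limit.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means EOF
            break
        chunks.append(chunk)
        total += len(chunk)
        if report is not None:
            report(total)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for a network response body.
    fake_body = io.BytesIO(b"x" * 10000)
    read_with_progress(fake_body, report=lambda n: print(f"{n} bytes so far"))
```

Without a declared length there is still no percentage to show, but a running count is enough to tell the user that the client has not frozen.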
It was thus said that the Great Petite Abeille once stated:
> > On Jun 10, 2020, at 23:22, Sean Conner <sean at conman.org> wrote:
> >
> > => Welcome.txt Welcome.txt (352 bytes)
> > => cgi-sample cgi-sample (CGI script)
> > => christmas-carol.txt christmas-carol.txt (187995 bytes)
> > => mondrian.gif mondrian.gif (3068 bytes)
> > => scgi-sample scgi-sample (SCGI script)
>
> Cool, if not machine readable.

Okay. The above is the format for GLV-1.12556. I found two other pages (from a quick search) that also include file sizes:

Via <gemini://gemini.circumlunar.space/docs/>

# Directory listing
=> / ..
=> best-practices.gmi best-practices.gmi 5 KiB Jun 6 2020
=> faq.gmi faq.gmi 17 KiB Jun 7 2020
=> specification.gmi specification.gmi 30 KiB Jun 7 2020

And via <gemini://gemini.circumlunar.space/capcom/>

## 2020-06-10
=> gemini://demifiend.org/journal/2020/i-wanted-to-like-mate-too.gemini demifiend - I Wanted to Like MATE, Too (2.2K)
=> gemini://demifiend.org/journal/2020/exit-bug-reports-loyalty.gemini demifiend - Exit, Bug Reports, and Loyalty (4.1K)
=> gemini://gemini.circumlunar.space/~shufei/phlog/Shufei-ThisAndThat-Weiphlog.gmi Shufei's Gmiphlog - The ?phlog (Weiphlog)
=> gemini://demifiend.org/journal/2020/choose-life.gemini demifiend - Choose Life; or, The Problem with Video Games (12K)
=> gemini://acidic.website/musings/npr-bridge.gmi Musings of Meff - NPR Text Portal and Spec Changes

> Perhaps would benefit from a human touch ala ls -h:

Perhaps.

> So gemini answer to metadata is to emulate ls in text/gemini? :P

Appears to be so.

-spc
On 10 Jun 2020, at 6:21 pm -0400, Sean Conner wrote: > It was thus said that the Great solderpunk once stated: > > Just throwing out a quick idea I had last night while trying to sleep, > > to see how people feel about it. It's simple and easily ignorable and I > > think it's kind of neat. > > [ snip ] > > > Naturally, deciding to do this will lead immediately to a weeks-long > > heated debate on what the appropriate value of $THRESHOLD should be. We > > *could* wade into those waters, but I'll just also throw out that we > > could use 21, 22 and 23 to indicate payloads exceeding 1MiB, 10MiB or > > 100MiB respectively and leave it at that. Clients targeting > > resource-limited environments could let their users configure their own > > threshold for early termination of downloads. > Despite being the one who pushed for larger status codes, I'm not a fan of > this proposal, but I can't fully explain *why* other than to say "where does > it stop?" Like Tadeusz Sosnierz alluded to, what's huge today may be small > tomorrow [1]. Personally, I think adding the filesize to the MIME type *is* > the best answer, but I can see and even agree with the arguments against it. > And I think I'm justified in saying that [2]. I'm aware that this is a very contentious point, but just to throw a wrench into things...there actually is [an RFC precedent][rfc1341] for including size as a parameter to a MIME type, even though MIME type is generally intended to be a classification. Granted, it's intended specifically for external message bodies, so the user can decide whether it's worth their while to download...but isn't an external message body ultimately what any arbitrary file you fetch is anyway? [rfc1341]: https://tools.ietf.org/html/rfc1341 I totally get it if the answer's still no to a size parameter; the "where does it end?" argument is a strong one. However, a precedent for including a very useful piece of information as a MIME type parameter is still something to consider.
Cheers, Ivy
> On Jun 10, 2020, at 21:57, solderpunk <solderpunk at SDF.ORG> wrote: > > Yes, of course, the client can always drop the connection whenever, > whyever. But if your goal is to conserve limited or expensive network > traffic, being able to sever the connection immediately after seeing the > first two bytes of the header will be much more effective than > downloading the first MiB of data and then saying "Nope, I can't afford > to finish this" and then throwing away that already-downloaded MiB. Perhaps servers could introduce a slight delay between sending the status line and the content itself. That way, a client may have enough time to drop the connection before the server has actually started to saturate the link with content. This may address Peter Vernigorov's showstopper: "I believe that a status code is too late to let user know that the file is large. Bytes are already being transferred and received by user's OS, even if client is not reading them yet. By the time any of this information is displayed to the user, and the time it takes them to react, damage has already been done."
On Thu, Jun 11, 2020 at 12:52:36AM -0500, Ivy Foster wrote: > I'm aware that this is a very contentious point, but just to throw a > wrench into things...there actually is [an RFC precedent][rfc1341] for > including size as a parameter to a MIME type, even though MIME type is > generally intended to be a classification. Granted, it's intended > specifically for external message bodies, so the user can decide > whether it's worth their while to download...but isn't an external > message body ultimately what any arbitrary file you fetch is anyway? > > [rfc1341]: https://tools.ietf.org/html/rfc1341 Well, that is unexpected! I guess practicality beats the idea of semantic purity even in "real specs". > I totally get it if the answer's still no to a size parameter; the > "where does it end?" argument is a strong one. However, a precedent > for including a very useful piece of information as a MIME type > parameter is still something to consider. It definitely weakens my argument that file size is not appropriate information to attach to a MIME type. Nevertheless, it's still true that we only get to dictate the permissible parameters for the one MIME type that we are actually defining ourselves. All other registered MIME types, including all the image/*, audio/* and video/* types which are liable to be the most common large files, have their own pre-defined list of registered parameters and we shouldn't be adding extras of our own. Maybe we just need to (continue to) let the file size issue go. I won't deny that it's useful, but (as so often) Gopherspace is the existence proof that useful and valuable stuff can be built without it. 
The concern about users without fast and cheap internet which was part of the motivation for my recent suggestion for more 2x codes was genuine, and it would have been one more nice utilisation of the "two digit codes which degrade gracefully into one digit codes" philosophy (which I think is neat but worry that we perhaps don't utilise enough to make it worth while), but as was pointed out in the ensuing discussion, most text/* content is likely to be quite small and clients can already terminate connections early on the basis of a non text/* MIME type in precisely the way that I was proposing they should do on the receipt of a 2x code above their threshold. So, they can get quite a lot of the benefit of that proposal with no changes required. I think I will add an option to quick-terminate on non-text content to AV-98. Cheers, Solderpunk
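The quick-terminate option described above comes down to a decision made from the header line alone, before any of the body is read. A minimal sketch, assuming the client has already read the header as text; `wants_body` and its `allowed_prefixes` parameter are hypothetical names, not anything that exists in AV-98.

```python
def wants_body(header_line, allowed_prefixes=("text/",)):
    """Decide from the response header alone whether to read the body.

    Hypothetical policy helper: returns False for anything that is not a
    2x success, and for successes whose MIME type is not in the allowed
    set, so the caller can close the connection straight away.
    """
    fields = header_line.strip().split(None, 1)
    if not fields:
        return False  # malformed or empty header
    status = fields[0]
    meta = fields[1] if len(fields) > 1 else ""

    if not status.startswith("2"):
        return False  # only 2x responses carry a body

    # Ignore MIME parameters such as lang or charset.
    mime_type = meta.split(";", 1)[0].strip().lower()
    return mime_type.startswith(allowed_prefixes)
```

A client on a metered link would call this right after reading the header and close the socket when it returns `False`; a user who does want a particular image or audio file could widen `allowed_prefixes` for that one request.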
On Wed, Jun 10, 2020 at 08:03:37PM +0000, solderpunk wrote: > I think it is definitely polite to explicitly indicate file sizes for > anything larger than, I dunno, 10 MiB or thereabouts? > > Relatedly, as of a semi-recent update, the automatically generated > directory listings produced by Molly Brown include file sizes. Okay, I will jump on the update train! GUS now provides, by default, size information for every result*. It's also exposed as a new query filter, in case you want to explore more on your own. I put an example query below to show usage, but you can find more documentation on the about page [0]. "computer AND size:>2000" gemini://gus.guru/search?computer%20AND%20size%3A%3E2000 [0] gemini://gus.guru/about
On Wed, Jun 10, 2020 at 07:47:00PM +0000, solderpunk wrote: > Gemini's lack of caching, lack of compression, and lack of resumable > downloads all mean it's *never* going to be sensible choice for > really big downloads, no matter how fast internet speeds get. I agree with this! Gemini isn't suited for big downloads. But, (and I realize this is very speculative) I think it's very plausible and even likely that what is "big" will change, perhaps substantially, over time. We've definitely seen that happen with average file sizes of say, images, over the past few decades. A decade ago I would have wanted resumability for a couple hundred MB download. Now I don't even give it a second thought!
On Fri, Jun 12, 2020 at 07:21:03AM -0400, Natalie Pendragon wrote: > Okay, I will jump on the update train! > > GUS now provides, by default, size information for every result*. > > It's also exposed as a new query filter, in case you want to explore > more on your own. I put an example query below to show usage, but you > can find more documentation on the about page [0]. These are fantastic updates, well done! Out of curiosity, can you use your GUSly powers to easily tell us, say, the mean and median size of resources served via Gemini? How do you feel about the (wonderful!) GUS statistics page being updated to give information on the size distribution? Perhaps you could bin resources into size ranges, say semi-open intervals, [0, 1Kib), [1Kib, 10Kib), [10Kib, 100Kib), etc, etc? Since I am being greedy and asking you to do things, let me close by singing the praises of GUS! Back when new Gemini content was popping up at an insane rate, I would spend a lot of time exploring and reading. Weeks later I wonder things like "where did I read that great retrospective write-up on the career of a recently deceased motorsport legend? I wonder if the author has written anything more?", have absolutely *no* recollection of who wrote it, or where it was hosted, or much else (I'm not a motorsport fan at all so did not remember the names of any people, cars, courses, etc. involved - but the thing was so well written and full of evident passion that I enjoyed reading it as a complete outsider). So I'd GUS for some random small detail I could recall like "oil pressure" and, boom, there it is. Anybody who follows my phlog knows that once every few months I'll refer to something I read in gopherspace but that I have forgotten the source of and could not find later by checking likely places, so I have to leave a note saying "if this was you, or you remember who it was, please let me know!". I'm thrilled that Geminispace may never have this problem.
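The binning suggested above is simple to state in code. The sketch below assigns a size to the semi-open interval [low, high) that contains it, with boundaries at 1 KiB, 10 KiB, 100 KiB and so on; the function names and the use of a `Counter` are illustrative choices, not a description of how GUS works.

```python
from collections import Counter

KIB = 1024


def size_bin(num_bytes):
    """Return (low, high) such that low <= num_bytes < high.

    Bins are [0, 1 KiB), [1 KiB, 10 KiB), [10 KiB, 100 KiB), ...
    """
    low, high = 0, KIB
    while num_bytes >= high:
        low, high = high, high * 10
    return low, high


def size_histogram(sizes):
    """Count how many sizes (in bytes) fall into each bin."""
    return Counter(size_bin(size) for size in sizes)
```

Because the intervals are semi-open, a resource of exactly 1024 bytes lands in [1 KiB, 10 KiB) and never in two bins at once.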
Seriously, Gemini search is already, somehow, a hundred, nay a thousand times better than Gopher search has ever been, despite the great disparity in time and attention between the two. I would love to understand why, and if there is some difference in the protocols that explains it. I have always suspected that Gopher's complete absence of machine-readable signalling of whether a request succeeded or failed must be a huge impediment to building a reliable indexer, but I have no idea if that's actually the reason for the dramatic difference. Anybody have any insight? Cheers, Solderpunk
Aww, thanks!! That makes me really happy to hear. On Fri, Jun 12, 2020 at 12:32:51PM +0000, solderpunk wrote: > Out of curiosity, can you use your GUSly powers to easily tell us, say, > the mean and median size of resources served via Gemini? Sure, just give me a little bit to do some index hackery and I'll share results back. > How do you feel about the (wonderful!) GUS statistics page being updated > to give information on the size distribution? Perhaps you could bin > resources into size ranges, say semi-open intervals, [0, 1Kib), [1Kib, > 10Kib), [10Kib, 100Kib), etc, etc? Yes! I was thinking about that too, but hadn't convinced myself of the best way to present the data. I was also thinking of binning the data to make histograms, or maybe just avg/median/max to start with. I think it needs segmented by content_type to be useful though. Or at least by the first/major component of content_type (i.e., all images are bundled together for measurement).
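Segmenting by the major component of the content type, as suggested above, could look something like the following sketch. It assumes the index can be iterated as `(content_type, size_in_bytes)` pairs, which is a guess about the data's shape rather than a description of GUS internals; `size_stats_by_major_type` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean, median


def size_stats_by_major_type(resources):
    """Summarise sizes per major content type (text, image, audio, ...).

    `resources` is an iterable of (content_type, size_in_bytes) pairs;
    that input shape is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for content_type, size in resources:
        # "image/png" and "image/jpeg" are both counted under "image".
        major = content_type.split("/", 1)[0].strip().lower()
        groups[major].append(size)

    return {
        major: {
            "count": len(sizes),
            "mean": mean(sizes),
            "median": median(sizes),
            "max": max(sizes),
        }
        for major, sizes in groups.items()
    }
```

Reporting the median alongside the mean matters here because a handful of very large files can pull the mean far away from the typical resource.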
On Fri, Jun 12, 2020 at 08:59:45AM -0400, Natalie Pendragon wrote: > On Fri, Jun 12, 2020 at 12:32:51PM +0000, solderpunk wrote: > > Out of curiosity, can you use your GUSly powers to easily tell us, say, > > the mean and median size of resources served via Gemini? > > Sure, just give me a little bit to do some index hackery and I'll > share results back. Okay, here are some statistics for the most prevalent content types. Please do share any feedback you have, and I'll likely add this or something similar to the statistics page soon so it's continually available and up-to-date. The number in parentheses after content_type is the number of pages in Geminispace with that content_type, and I hope everything else is self-explanatory!

# text/gemini (10378)
Mean   : 2.1 K
Median : 1.2 K
Max    : 753.3 K

# text/plain (1390)
Mean   : 4.2 K
Median : 1.4 K
Max    : 333.8 K

# image/png (281)
Mean   : 462.2 K
Median : 170.6 K
Max    : 3.4 M

# image/jpeg (159)
Mean   : 998.7 K
Median : 507.9 K
Max    : 4.2 M

# image/gif (18)
Mean   : 35.7 K
Median : 5.9 K
Max    : 356.4 K

# audio/mpeg (367)
Mean   : 3.4 M
Median : 3.3 M
Max    : 16.7 M

# application/octet-stream (40)
Mean   : 497.4 K
Median : 94.1 K
Max    : 10.7 M

# application/pdf (165)
Mean   : 4.2 M
Median : 223.9 K
Max    : 126.9 M

# audio/midi (72)
Mean   : 5.2 K
Median : 4.7 K
Max    : 20.7 K

# audio/flac (48)
Mean   : 21.1 M
Median : 15.4 M
Max    : 118.6 M

# text/x-lilypond (122)
Mean   : 4.7 K
Median : 4.6 K
Max    : 7.6 K
On Fri, Jun 12, 2020 at 09:25:55AM -0400, Natalie Pendragon wrote: > > Okay, here are some statistics for the most prevalent content types. > Please do share any feedback you have, and I'll likely add this or > something similar to the statistics page soon so it's continually > available and up-to-date. > > The number in parentheses after content_type is the number pages > in Geminispace with that content_type, and I hope everything else is > self-explanatory! > > # text/gemini (10378) > Mean : 2.1 K > Median : 1.2 K > Max : 753.3 K > > # text/plain (1390) > Mean : 4.2 K > Median : 1.4 K > Max : 333.8 K Thanks for sharing these! This is actually really useful information to have, for thinking about the role of TLS overhead in Gemini. It actually seems like most text/gemini content is in fact about as big as the server's TLS certificate or smaller, which kind of blows the idea of Gemini being a good match for constrained networks out of the water - even if we did have those 2x status codes, you'd still need to download the server's cert, plus the (comparatively small) handshake traffic, before you could make the decision to terminate early. This overhead also scuttles lots of simple ideas to work around Gemini's lack of a HEAD request equivalent. Defining a well-known endpoint where clients can fetch, e.g. a small file with the timestamps of the N most recently updated files the server hosts makes no sense whatsoever if the TLS overhead of fetching *that* list is, in fact, just as big as the possibly out-of-date file the client actually wants in the first place. It doesn't need to make it into the spec, but over time we should develop and publicise best practices for minimising this overhead. This may include choosing signature schemes which are secure with smaller key sizes, like ed25519. Self-signed certificates also have a strong advantage here by virtue of not needing to include a long chain of certs leading to a trusted root. 
These measures will bump heads with Petite Abeille's point about TLS fingerprinting and how it's good, from an anti-censorship point of view, to blend in with the crowd by not using "unusual" certificates. Cheers, Solderpunk
On Fri, 12 Jun 2020 16:35:11 +0000 solderpunk <solderpunk at SDF.ORG> wrote: > These measures will bump heads with Petite Abeille's point about TLS > fingerprinting and how it's good, from an anti-censorship point of > view, to blend in with the crowd by not using "unusual" certificates. Not to disparage Petite Abeille's point about TLS fingerprinting and blending in to avoid notice, but aren't we sticking out anyway by listening on port 1965? This is a bit outside my expertise, but wouldn't an adversary determined to find gemini servers just have to do a port scan? -- Matthew Graybosch gemini://starbreaker.org #include <disclaimer.h> gemini://demifiend.org https://matthewgraybosch.com gemini://tanelorn.city "Out of order?! Even in the future nothing works."
On Fri, Jun 12, 2020 at 02:29:15PM -0400, Matthew Graybosch wrote: > Not to disparage Petite Abeille's point about TLS fingerprinting and > blending in to avoid notice, but aren't we sticking out anyway by > listening on port 1965? By default, yes, but if somebody wanted to host a server on port 443 in an attempt to "blend in", they could. How effectively they would blend in would then be a function of how typical their certificate looked. But maybe there's not such a conflict here. Somebody wanting to run a server in extreme stealth mode might just have to accept that this involves sacrificing some efficiency and use fat certs. Cheers, Solderpunk
> On Jun 12, 2020, at 21:11, solderpunk <solderpunk at SDF.ORG> wrote: > > But maybe there's not such a conflict here. Somebody wanting to run a > server in extreme stealth mode might just have to accept that this > involves sacrificing some efficiency and use fat certs. My name is Fat Tony, and I approve this message.
On 12-Jun-2020 12:21, Natalie Pendragon wrote: > Okay, I will jump on the update train! > GUS now provides, by default, size information for every result*. Just a quick thought - is this really necessary for the text/* content? It seems like extra info that is not really important to know. As you have demonstrated, these files are all *tiny* and it is not something the user needs to take on board to act on. For the others, yes, this is a nice touch and will help the user. I'd also like to add my moral support to the work of GUS - it is great to have such a nice simple search engine that works well :-) Best Wishes - Luke
It was thus said that the Great Natalie Pendragon once stated: > On Wed, Jun 10, 2020 at 08:03:37PM +0000, solderpunk wrote: > > I think it is definitely polite to explicitly indicate file sizes for > > anything larger than, I dunno, 10 MiB or thereabouts? > > > > Relatedly, as of a semi-recent update, the automatically generated > > directory listings produced by Molly Brown include file sizes. > > Okay, I will jump on the update train! > > GUS now provides, by default, size information for every result*. > > It's also exposed as a new query filter, in case you want to explore > more on your own. I put an example query below to show usage, but you > can find more documentation on the about page [0]. Thank you for your work. This is wonderful. -spc
---