File size issues

🗣️ From: Sean Conner (sean (a) conman.org)
📅 Sent: 2019-08-15 18:06
📧 Message 2 of 11
It was thus said that the Great solderpunk once stated:
> Two quick points with regard to the fact that Gemini currently does not
> convey file sizes to users at any point:
> 
> * Sean has pointed out in one of his RFCs that this means there is no
>   way for a client to know whether or not a download completed
>   successfully or was interrupted due to an accidentally dropped or
>   even a maliciously severed connection
> 
> * I've received an email from somebody watching the Gemini design unfold
>   with interest, who is concerned about Gemini clients with limited
>   system resources unwittingly downloading large files (such as PDFs of
>   scanned documents) which they aren't even capable of opening.  While I
>   quite like the idea of Gemini being friendly to low-end systems, I do
>   wonder whether or not the TLS requirement makes this a little moot.
> 
> Anyway, the question is do we want to change anything to address these
> issues and if so how do we want to do it?
> 
> I'll quickly note in passing that both of these problems also exist in
> exactly the same form for Gopher, but I've never once heard Gopher users
> complain about them. 

  Gopher does address this rather obliquely---text files (and gopher
indexes) are supposed to end with a '.' on a line by itself.  This lets the
client know it received the data correctly, and it says as much in RFC-1436,
section 3.8:

	Note that for type 5 or type 9 the client must be prepared to read
	until the connection closes.  There will be no period at the end of
	the file; ...
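
  As a rough sketch of that check (Python; the function name is made up for
illustration, and it assumes the whole response is already in memory), a
client could treat a text response as complete only if it saw the lone '.':

```python
def read_gopher_text(raw: bytes):
    """Return (lines, complete) for a Gopher text response.

    complete is True only if a line holding a lone '.' was seen.
    """
    lines = []
    for line in raw.split(b"\r\n"):
        if line == b".":
            return lines, True
        # RFC 1436 has servers double a leading '.' on content lines,
        # so strip the extra one here.
        if line.startswith(b".."):
            line = line[1:]
        lines.append(line)
    # No terminator seen: the transfer was probably cut short.
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left after a trailing CRLF
    return lines, False
```

  This only works for the text types; for type 5 or type 9 there is no
terminator to look for, which is the gap noted in the RFC text above.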

  Not knowing the file size beforehand isn't necessarily a pain point, but
it does make displaying a progress bar (for example) difficult to
implement.

> One possibility, as proposed by Sean, is to add file size to the
> response header, with it optionally appearing after the MIME type.  I'm
> not hugely fond of this myself, simply because it complicates parsing of
> the response header.  

  I'm not seeing much of an issue.  Assuming tabs separate the components on
the status line, then

	(\d+)\t([^\t]+)(\t([^\t]+))*

would parse the line (I suspect, I'm not a fan of regex but I think the
above would work to parse the status line).  I don't see much of an issue in
parsing any of the following:

	20<HTAB>text/plain; charset=utf-8<HTAB>2123<CRLF>
	20<HTAB>text/plain<CRLF>
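
  For what it's worth, here is a minimal Python sketch of that parse.  It
uses a tightened variant of the pattern above (the third field is restricted
to digits and may appear at most once); the names STATUS_LINE and
parse_status_line are hypothetical, and the tab-separated layout is the
proposal under discussion, not anything settled:

```python
import re

# STATUS <TAB> META [<TAB> SIZE] <CRLF>, with no tabs allowed inside META.
STATUS_LINE = re.compile(rb"(\d+)\t([^\t\r\n]+)(?:\t(\d+))?\r\n")

def parse_status_line(line: bytes):
    """Return (status, meta, size); size is None when the field is absent."""
    m = STATUS_LINE.fullmatch(line)
    if m is None:
        raise ValueError("malformed status line")
    status = int(m.group(1))
    meta = m.group(2).decode("utf-8")
    size = int(m.group(3)) if m.group(3) is not None else None
    return status, meta, size
```

  Because the split is on tabs only, the spaces inside "text/plain;
charset=utf-8" never need to be inspected to find the optional size.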

> Remember that the MIME type can have multiple
> components specifying encodings etc.  If you just split the META part of
> the header on whitespace, the number of components is variable, so
> recognising whether or not an optional filesize is present requires
> actually inspecting the parts and looking for a number.  In fairness to
> Sean, at the time of writing of his RFC the spec said META was
> separated from STATUS by a tab (whereas now it is just whitespace), so
> tacking something after META with another tab was unambiguous, assuming
> nobody put tabs in their MIME types...

  Which could be specified, "don't put tabs in the MIME type section."

> Another possibility ties into another request I got from somebody very
> early on - it would be nice if there was some way to query a Gemini
> server for the time a resource was last modified, so that Gemini
> equivalents of tools like moku pona could avoid needlessly fetching
> unchanged resources over and over again.  At that point I started
> wondering about giving Gemini some equivalent of HTTP HEAD, although I
> abandoned it pretty quickly when I realised that substantial TLS
> overhead probably made making a whole second request to check if a
> resource had changed not such a worthwhile idea.  

  One way would be to query a well-known endpoint (these exist in the HTTP
world---robots.txt is one such file) that contains timestamps for various
resources.  Slap a MIME type of text/gemini-timestamp and call it done:

gemini://example.com/	2019-08-15T13:53:00-05:00
gemini://example.com/feed	2019-07-29T00:00:00-05:00
gemini://example.com/other	2019-08-01T00:00:00-05:00

  That's one way.
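
  A client-side sketch of how such a listing might be consumed, in Python.
The format (one URL, a tab, then an ISO 8601 timestamp per line) is taken
from the example above; the function names are hypothetical, and nothing
here is part of any spec:

```python
from datetime import datetime

def parse_timestamps(body: str):
    """Map each URL to an aware datetime; one 'URL<TAB>timestamp' per line."""
    result = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        url, stamp = line.split("\t", 1)
        result[url] = datetime.fromisoformat(stamp.strip())
    return result

def needs_fetch(listing, url, last_seen):
    """True if url is unlisted or was modified after last_seen."""
    modified = listing.get(url)
    return modified is None or modified > last_seen
```

  A feed checker would then make one request for the listing and fetch only
the resources for which needs_fetch() is true, instead of one request per
resource.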

> But, we could possibly
> bring this idea back, as the response to such a request could naturally
> include the file size as well.  The real question is how to *make* such
> a request, ideally in a way which doesn't open the door to a half dozen
> other new "methods".

  As I mentioned in a private email to solderpunk earlier, one could always
take advantage of the sub-delimiters in the path portion.  I had at one
point mentioned using those to specify the prompt (otherwise the server
would return a status of 10):

	gemini://example.com/search;Search%20for

This could be formalized:

	gemini://example.com/search;prompt=Search%20for
	gemini://example.com/blogfeed;timestamp=2019-08-15T00:00:00Z
	gemini://example.com/wildexample;prompt=Search%20for;timestamp=2019-08-15T00:00:00Z?query=foo&username=bar
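
  On the server side, pulling those parameters out of the path is not much
work.  A minimal Python sketch, assuming the ';key=value' form shown above
(split_path_params is a made-up name, and "prompt" and "timestamp" are only
the proposed keys):

```python
from urllib.parse import urlsplit, unquote

def split_path_params(url: str):
    """Return (path, params) where params come from ';key=value' segments."""
    path = urlsplit(url).path  # the query string is already split off here
    base, *segments = path.split(";")
    params = {}
    for seg in segments:
        key, _, value = seg.partition("=")
        params[key] = unquote(value)  # values arrive percent-encoded
    return base, params
```

  The query string stays separate from the parameters, so a request can
carry both without either one needing to know about the other.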

  So, you have "prompt" and "timestamp".  Others could be proposed.  If the
"timestamp" thing above is accepted, then you might want to have a new
status code meaning "no change" or "okay, but there's no content".

> Regarding ways to enable something like a HEAD request without changing
> the request format to include a method field - I'm not quite sure
> whether using a fixed URL fragment, like #meta, on requests would be a
> kosher way to do this.  Does metadata count as "some portion or subset
> of the primary resource, some view on representations of the primary
> resource, or some other resource defined or described by those
> representations" (from RFC3986)?

  Well, there are RFC-5147 and RFC-7111 that give semantics to the URI
fragment section, but I still think using the sub-delimiter of ';' in the
path portion is the way to go.

  -spc