File size issues

🗣️ From: solderpunk (solderpunk (a) SDF.ORG)
📅 Sent: 2019-08-16 10:43
📧 Message 3 of 11
>   Gopher does address this rather obliquely---text files (and gopher
> indexes) are supposed to end with a '.' on a line by itself.  This lets the
> client know it received the data correctly, and it says as much in RFC-1436,
> section 3.8:

Whoops, true!  In my defence, I think this is very rarely used nowadays.
VF-1 includes no code whatsoever to detect and strip this from files it
downloads and I've never seen one appear on screen.
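
For clients that do want to handle the terminator, a minimal sketch follows. It assumes the whole response is already in memory as bytes; the function name `strip_gopher_terminator` is hypothetical and not taken from VF-1 or any other client.

```python
from typing import Tuple


def strip_gopher_terminator(data: bytes) -> Tuple[bytes, bool]:
    """Return (body, saw_terminator) for a Gopher text response.

    The terminator is a '.' on a line by itself at the very end of the
    data. Both CRLF and bare LF line endings are accepted here, which
    is a leniency chosen for this sketch.
    """
    for tail in (b".\r\n", b".\n"):
        if data == tail:
            # The response was nothing but the terminator line.
            return b"", True
        if data.endswith(b"\n" + tail):
            # Drop the terminator line, keep the newline that ends the
            # last real line of the body.
            return data[: -len(tail)], True
    return data, False
```

A missing terminator (the second return value is False) could then be treated as a hint that the transfer was cut short.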
 
>   I'm not seeing much of an issue.  Assuming tabs separate the components on
> the status line, then
> 
> 	(\d+)\t([^\t]+)(\t([^\t]+))*
> 
> would parse the line (I suspect, I'm not a fan of regex but I think the
> above would work to parse the status line).  I don't see much of an issue in
> parsing any of the following:
> 
> 	20<HTAB>text/plain; charset=utf-8<HTAB>2123<CRLF>
> 	20<HTAB>text/plain<CRLF>
> 
>   Which could be specified, "don't put tabs in the MIME type section."

Yes, with sufficient prescription of whitespace practices in response
headers it could be made sufficiently parsable, but it would be nice if
things weren't so brittle.
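
As a rough illustration of the tab-separated header discussed above, here is a sketch that avoids regex and simply splits on tabs. It assumes the layout from the quoted examples (status, MIME type, optional third field) and assumes that the third field is a size in bytes; the names `Header` and `parse_header` are made up for this example, and none of this is settled spec.

```python
from typing import NamedTuple, Optional


class Header(NamedTuple):
    status: int
    meta: str
    size: Optional[int]  # assumed meaning of the optional third field


def parse_header(line: str) -> Header:
    """Parse 'status<TAB>meta[<TAB>size]<CRLF>' into a Header."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2 or not fields[0].isdecimal():
        raise ValueError(f"malformed header: {line!r}")
    size = None
    if len(fields) > 2 and fields[2].isdecimal():
        size = int(fields[2])
    return Header(int(fields[0]), fields[1], size)
```

Note that any extra tab-separated fields are silently ignored here, which is exactly the kind of open-ended growth the header could suffer from.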

This also, of course, sets a precedent of "whenever we decide a little
bit of extra metadata would be handy in the header, just append it after
a tab", which over time could bloat our header until it's basically just
an HTTP header in disguise with tabs instead of newlines.

(not a fan of regex either, by the way, and was quite happy to discover
Lua's lightweight alternative system when I first picked it up)
 
>   One way would be to query a well-known endpoint (these exist in the HTTP
> world---robots.txt is one such file) that contains timestamps for various
> resources.  Slap a MIME type of text/gemini-timestamp and call it done:
> 
> gemini://example.com/	2019-08-15T13:53:00-05:00
> gemini://example.com/feed	2019-07-29T00:00:00-05:00
> gemini://example.com/other	2019-08-01T00:00:00-05:00
> 
>   That's one way.

I actually quite like this idea.  No need to make it timestamp-specific
either.  We could have a well-known endpoint for general file metadata,
which listed modification time, file size, checksum, MIME type, etc.  It
could accept queries for a specific path, and *that* could be the way to
do an equivalent of a HEAD request.
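
To make the quoted timestamp listing concrete, here is a small sketch of how a client might read it. It assumes one entry per line with a tab between the URL and an ISO 8601 timestamp, as in the quoted example; the function name `parse_timestamps` is hypothetical, and a general metadata format would need more fields than this handles.

```python
from datetime import datetime
from typing import Dict


def parse_timestamps(body: str) -> Dict[str, datetime]:
    """Map each URL in a timestamp listing to its modification time."""
    stamps = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        url, _, stamp = line.partition("\t")
        # fromisoformat accepts offsets like -05:00 (Python 3.7+).
        stamps[url] = datetime.fromisoformat(stamp.strip())
    return stamps
```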

This would let clients for specific scenarios do the extra work
themselves to work around their problems, e.g. clients with very low
memory or storage space could request the metadata for all files before
attempting to access them and warn the user if the file size exceeds a
threshold; clients on unreliable connections could request the metadata
before downloading and then warn the user if file size and/or checksum did
not agree.  Most "normal" clients could do neither and just operate as
they already do.
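
The post-download check could look something like the sketch below. It assumes the client has already obtained an expected size and/or checksum from some metadata endpoint; the choice of SHA-256 and the name `check_download` are assumptions for illustration only, since no checksum algorithm has been proposed.

```python
import hashlib
from typing import List, Optional


def check_download(
    data: bytes,
    size: Optional[int] = None,
    sha256: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the data agrees."""
    problems = []
    if size is not None and len(data) != size:
        problems.append(f"size mismatch: expected {size}, got {len(data)}")
    if sha256 is not None:
        actual = hashlib.sha256(data).hexdigest()
        if actual != sha256.lower():
            problems.append("checksum mismatch")
    return problems
```

A client that never fetches the metadata simply never calls this, which is the point: the extra work is opt-in.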

I think it's kind of neat to keep solutions to edge problems outside of
the protocol itself and push them into things like well-known endpoints
like the above where they can easily be ignored when they are not
needed/wanted.  The downside is that server developers have to do the
work to add support for these things - but it's expected, I think, that
servers are harder to write than clients.  Ease of client implementation
is very important - it leads to a large number of independent clients,
which means unofficial extensions of the standard can only really take
off if a large number of people with presumably diverse opinions can be
convinced they are worthwhile.  And, of course, some server authors can
just choose not to support some of these endpoints, and when queried can
just return status 51 and then the client understands they are on their
own.  All of this can be done without any change to the core Gemini spec
(each well-known endpoint, of course, would need its own spec).

>   As I mentioned in a private email to solderpunk earlier, one could always
> take advantage of the sub-delimiters in the path portion.  I had at one
> point mentioned using those to specify the prompt (otherwise the server
> would return a status of 10):
> 
> 	gemini://example.com/search;Search%20for
> 
> This could be formalized:
> 
> 	gemini://example.com/search;prompt=Search%20for
> 	gemini://example.com/blogfeed;timestamp=2019-08-15T00:00:00Z
> 	gemini://example.com/wildexample;prompt=Search%20for;timestamp=2019-08-15:00:00:00Z?query=foo&usename=bar
> 
>   So, you have "prompt" and "timestamp".  Others could be proposed.  If the
> "timestamp" thing above is accepted, then you might want to have a new
> status code meaning "no change" or "okay, but there's no content".

I think I prefer the well-known endpoint over this, but that's right now
more of a gut reaction and not a well thought-out and defensible
position. 
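
For comparison, here is a sketch of what server-side handling of the quoted ';' parameters might involve. It assumes parameters are `name=value` pairs separated by ';' and applies them to the path as a whole rather than per segment; the function name `split_path_params` is hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import unquote, urlsplit


def split_path_params(url: str) -> Tuple[str, Dict[str, str]]:
    """Split a URL's path into (base_path, params) on ';' delimiters."""
    path = urlsplit(url).path  # the query string is already removed
    base, *raw_params = path.split(";")
    params = {}
    for item in raw_params:
        name, _, value = item.partition("=")
        # Percent-decode after splitting so an encoded ';' or '='
        # inside a value is not mistaken for a delimiter.
        params[unquote(name)] = unquote(value)
    return base, params
```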
 
>   Well, there are RFC-5147 and RFC-7111 that give semantics to the URI
> fragment section, but I still think using the sub-delimiter of ';' in the
> path portion is the way to go.

Ah, more for the reading list!

-Solderpunk