I am proposing a convention of putting human and machine readable metadata in documents (in ''geminisphere''). This is completely optional for document writers. The metadata should be placed at the end of the document so that the viewers can view the content first. For now, I am proposing the following metadata for inclusion in documents (all of which is optional):
On Sat Nov 14, 2020 at 12:07 PM EST, wrote: > * the date (and maybe time) when the document was published > > * the date (and maybe time) when the document was last modified Can already pull this out of RSS
It was thus said that the Great Drew DeVault once stated: > On Sat Nov 14, 2020 at 12:07 PM EST, wrote: > > * the date (and maybe time) when the document was published > > > > * the date (and maybe time) when the document was last modified > > Can already pull this out of RSS Which version? There are around half a dozen variations that are not all compatible with each other. -spc (And that also assumes you have an RSS feed for every page on the site)
It was thus said that the Great smlckz at tilde.pink once stated:
> I am proposing a convention of putting human and machine readable metadata
> in documents (in ''geminisphere''). This is completely optional for
> document writers.
>
> The metadata should be placed at the end of the document so that the
> viewers can view the content first.
>
> For now, I am proposing the following metadata for inclusion in documents
> (all of which is optional):
>
> * the date (and maybe time) when the document was published
>
> * the date (and maybe time) when the document was last modified
>
> * copyright information and/or license of the document
>
> IMO, we should use ISO 8601 for the date/time in metadata.
>
> The clients may use the information, but may not hide the metadata.
> The spiders/bots can also use the information
> (when indexing/archiving documents) as well.
>
> Now the question for you is how the metadata is formatted?
> Please share your thoughts on it.

Okay.

Created 2020-11-14T17:34:19-0500
Modified 2020-11-14T17:50:03-0500
Copyright 2020 by Sean Conner.

The timestamp was created with the following Unix command: "date +%FT%T%z" so that's pretty easy. And you know, if you move the lines to the top of the document, put the Modified: header first, a client would only have to read the first 34 bytes of the document to see if it's modified, and if it hasn't since the client last read it, the client can close the connection. Caching solved!

-spc (Add a Size header and you solve the size problem as well!)
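For readers who want to try the timestamp idea, here is a minimal Python sketch of it. It assumes the exact layout shown in the message (the word "Modified", one space, then a stamp in the `date +%FT%T%z` layout, 33 characters plus a newline). The names `STAMP_FORMAT`, `format_stamp` and `modified_since` are hypothetical and not part of any Gemini client or specification.

```python
from datetime import datetime, timedelta, timezone

# Same layout as `date +%FT%T%z`, e.g. 2020-11-14T17:50:03-0500
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def format_stamp(moment: datetime) -> str:
    """Render a timezone-aware datetime in the layout used in the thread."""
    return moment.strftime(STAMP_FORMAT)


def modified_since(first_line: str, cached: datetime) -> bool:
    """Return True if a leading 'Modified <stamp>' line is newer than `cached`.

    Hypothetical helper: `cached` must be timezone-aware. A missing or
    malformed line is reported as True, so the caller falls back to
    fetching the whole document.
    """
    key, _, value = first_line.strip().partition(" ")
    if key != "Modified":
        return True
    try:
        stamp = datetime.strptime(value, STAMP_FORMAT)
    except ValueError:
        return True
    return stamp > cached


if __name__ == "__main__":
    est = timezone(timedelta(hours=-5))
    last_seen = datetime(2020, 11, 14, 17, 34, 19, tzinfo=est)
    print(format_stamp(last_seen))
    print(modified_since("Modified 2020-11-14T17:50:03-0500", last_seen))
```

A client following the suggestion would read only the first line, call something like `modified_since`, and close the connection when it returns False. Whether that is desirable is exactly what the rest of the thread argues about.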
> On Nov 14, 2020, at 23:57, Sean Conner <sean at conman.org> wrote: > > And you know, if you move the lines to the top of the document Furthermore, if you add an empty line after all these, hmmm, lines, you have reinvented MIME! Hurray! Smells like 1982 all over again! Long live RFC822!
> On Nov 14, 2020, at 23:52, Sean Conner <sean at conman.org> wrote: > > Which version? Atom (Web standard), RFC 4287, December 2005. > There are around half a dozen variations that are not all compatible with each other. Ignore them. No point in replaying a decade of bickering. > > -spc (And that also assumes you have an RSS feed for every page on the > site) ...
On Sat, 14 Nov 2020, Drew DeVault wrote: > Can already pull this out of RSS Not every document has, or needs to have, an RSS feed associated with it. On pages that do have an RSS feed, you need to parse XML, and who likes that? The proposed convention is meant to be simple to parse, write and understand. No need for another library. ~smlckz
On Sat, 14 Nov 2020, Sean Conner wrote:
> Okay.
>
> Created 2020-11-14T17:34:19-0500
> Modified 2020-11-14T17:50:03-0500
> Copyright 2020 by Sean Conner.
>
> The timestamp was created with the following Unix command: "date +%FT%T%z"
> so that's pretty easy.

That's one way of doing that. Can we do better than that?

> And you know, if you move the lines to the top of
> the document, put the Modified: header first, a client would only have to
> read the first 34 bytes of the document to see if it's modified, and if it
> hasn't since the client last read it, the client can close the connection.
> Caching solved!
>
> -spc (Add a Size header and you solve the size problem as well!)

We don't want or need anything like that. That is a breaking change to the spec so breaks all existing clients.

Let me change my wording a little bit.

>> The metadata should be placed at the end of the document so that the
>> viewers can view the content first.

I should have said ''must'' instead of ''should''.

>> The clients may use the information, but may not hide the metadata.

''must'' not hide.

mmmh..

~smlckz
On 14/11/20 17:07, smlckz at tilde.pink wrote: > For now, I am proposing the following metadata for inclusion in > documents (all of which is optional): > > * the date (and maybe time) when the document was published > > * the date (and maybe time) when the document was last modified > > * copyright information and/or license of the document I would also like a field for the source of the document. This will allow people to take local copies of documents without losing track of where they came from and where updated versions may be found. It may also simplify mirroring sites, especially given there is a field for the license information. I've had a dilemma about whether I should be placing the site's name at the top-level heading or elsewhere. If there were a location field, I would avoid placing the site's name at the top-most header, reclaiming another layer of heading depth in the process. Regardless of whether this becomes an official standard, I will probably adopt this or something like it because it makes so much sense. I would additionally like a field for the author and author's email address, but that's less important to me. I'm also proposing that these additional fields are optional. -- Jon
It was thus said that the Great smlckz at tilde.pink once stated:
> On Sat, 14 Nov 2020, Sean Conner wrote:
> > Okay.
> >
> > Created 2020-11-14T17:34:19-0500
> > Modified 2020-11-14T17:50:03-0500
> > Copyright 2020 by Sean Conner.
> >
> > The timestamp was created with the following Unix command: "date +%FT%T%z"
> > so that's pretty easy.
>
> That's one way of doing that. Can we do better than that?

What's wrong with that format? It's an ISO standard, it's locale neutral, easy to parse, and it's easy for humans to read. I don't think you really can do better than that. Unless you really want to parse dates like

vuos, sk?b 16. b. 2020 02:27:32 CET

> > And you know, if you move the lines to the top of
> > the document, put the Modified: header first, a client would only have to
> > read the first 34 bytes of the document to see if it's modified, and if it
> > hasn't since the client last read it, the client can close the connection.
> > Caching solved!
> >
> > -spc (Add a Size header and you solve the size problem as well!)
>
> We don't want or need anything like that. That is a breaking change to the
> spec so breaks all existing clients.

Like adding headers isn't a breaking change? And I'm not adding the size to the MIME type, but to the other "fields", something like:

Created 2020-11-15T20:29:01-0500
Modified 2020-11-15T20:29:01-0500
Copyright 2020 by Sean Conner
Size 806
Cache not-on-your-life
User-Agent myGeminiClient-1.13
MD5sum fd888c3218f34e71dc57221143d44ccb

> Let me change my wordings a little bit.
>
> >> The metadata should be placed at the end of the document so that the
> >> viewers can view the content first.
> I should have said ''must'' instead of ''should''.

Kill joy.

> >> The clients may use the information, but may not hide the metadata.
> ''must'' not hide.
>
> mmmh..

Mmmmmh indeed ...

-spc
On Sat, 14 Nov 2020 17:57:34 -0500 Sean Conner <sean at conman.org> wrote: > -spc (Add a Size header and you solve the size problem as well!) Nope, not for as long as it's optional (can never reliably tell that you've received the complete document if the connection dies before receiving the full header) and only part of text/gemini (can never tell what the size is for other document types, which IMO are more likely to be a concern size-wise; a photo's size easily exceeds that of a Bible sized text/gemini document). -- Philip
On Sun, 15 Nov 2020 19:41:47 +0000 (UTC) smlckz at tilde.pink wrote: > We don't want or need anything like that. That is a breaking change to the spec so breaks all existing clients. How so? My client will never interpret such meta-data, and I see the lack of provisions for it in text/gemini as a feature, but I don't see how adding it would break my client. To my client, it's regular text lines in the document body. The main advantage of this proposal is that the spec really doesn't need to be concerned with it. It's still text/gemini and there are no changes to the protocol. That makes it a great honeypot. Hopefully, most future suggestions for changes to the protocol can instead be added to an evergrowing list of in-document header fields that no one will implement. -- Philip
On Mon, 16 Nov 2020, Philip Linde wrote: > On Sun, 15 Nov 2020 19:41:47 +0000 (UTC) > smlckz at tilde.pink wrote: > >> We don't want or need anything like that. That is a breaking change to the spec so breaks all existing clients. > > How so? My client will never interpret such meta-data, and I see the > lack of provisions for it in text/gemini as a feature, but I don't see > how adding it would break my client. To my client, it's regular text > lines in the document body. What I thought was that if header metadata were to be introduced, they need to be hidden (at least) as they degrade user experience. You as a visitor do not want to see the document is X bytes in size or the md5sum or sha512sum of the document before actual content. And if they need to be hidden, the spec needs to be changed and that would break existing clients. Hopefully you can see what I meant. For this reason, I had to be ''Kill joy'' and amend my proposal so that the metadata must be placed at the end of the document. > The main advantage of this proposal is that the spec really doesn't need > to be concerned with it. It's still text/gemini and there are no > changes to the protocol. That makes it a great honeypot. Hopefully, > most future suggestions for changes to the protocol can instead be > added to an evergrowing list of in-document header fields that no one > will implement. Not only text/gemini, but also any other text/* format. I have clearly stated that this is just metadata. As each field is optional, unrecognised fields would be ignored. > -- > Philip > ~smlckz
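As an illustration of the "metadata at the end, shown as ordinary text" position, the following Python sketch separates a document body from a trailing block of fields. It assumes the `Key value` layout from the examples earlier in the thread and a fixed set of field names; `KNOWN_KEYS` and `split_trailing_metadata` are hypothetical names, not an agreed convention.

```python
from typing import Dict, List, Tuple

# Assumed field names, taken from the examples earlier in this thread.
KNOWN_KEYS = {"Created", "Modified", "Copyright"}


def split_trailing_metadata(text: str) -> Tuple[str, Dict[str, str]]:
    """Split a document into (body, metadata) by scanning up from the last line."""
    lines: List[str] = text.splitlines()
    metadata: Dict[str, str] = {}
    end = len(lines)
    # Ignore blank lines at the very end of the document.
    while end > 0 and not lines[end - 1].strip():
        end -= 1
    start = end
    while start > 0:
        key, _, value = lines[start - 1].partition(" ")
        # Stop at the first line that is not "<known key> <value>".
        if key not in KNOWN_KEYS or not value.strip():
            break
        # Scanning bottom-up, so the lowest occurrence of a key wins.
        metadata.setdefault(key, value.strip())
        start -= 1
    body = "\n".join(lines[:start]).rstrip("\n")
    return body, metadata
```

A client that does not know about the convention simply renders those last lines as text, which is the behaviour described above. Note the limitation of this sketch: an unrecognised field ends the scan, because without a separator or prefix a plain-text scan cannot tell an unknown field from ordinary prose.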
On Sun, 15 Nov 2020, Sean Conner wrote: > What's wrong with that format? It's an ISO standard, it's locale neutral, > easy to parse, and it's easy for humans to read. I don't think you really > can do better than that. Unless you really want to parse dates like > > vuos, sk?b 16. b. 2020 02:27:32 CET I am not against ISO 8601 format and don't want to dive into l10n mess. I wonder if we need a separator between content and metadata or not, or it'd be better to put the whole metadata into a preformatted text block with alt-text of `metadata`, or using some prefix for each line of metadata field. What do you think? ~smlckz
November 16, 2020 4:17 AM, smlckz at tilde.pink wrote:
> I am not against ISO 8601 format and don't want to dive into l10n mess. I wonder if we need a
> separator between content and metadata or not, or
> it'd be better to put the whole metadata into a preformatted text block
> with alt-text of `metadata`, or using some prefix for each line of metadata
> field. What do you think?
>
> ~smlckz

The least-friction way to implement metadata would be to present it as preformatted yaml:

``` yaml
title: 'This is the title: it contains a colon'
author:
- Author One
- Author Two
keywords: [nothing, nothingness]
abstract: |
  This is the abstract.

  It consists of two paragraphs.
```

A variation that I believe would have the most utility in clients would be a new syntactic feature called metadata, implemented similarly to preformatted blocks, like:

--- yaml
title: 'This is the title: it contains a colon'
author:
- Author One
- Author Two
keywords: [nothing, nothingness]
abstract: |
  This is the abstract.

  It consists of two paragraphs.
---

Not evangelizing yaml in this context, though it's not a bad fit. Just taking my example from Pandoc documentation - https://pandoc.org/MANUAL.html#metadata-blocks

Obviously document writers *can* include metadata in any document they write, so the question would be whether the value added by encouraging a uniform presentation is worth defining a metadata line type in the specification. I think doing so enables interesting options for client code, and that creating a line type as opposed to dictating page placement is more idiomatically gemini text.

Chris
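The first variant needs no change to text/gemini, since a client can already see the alt text on the line that opens a preformatted block. Below is a minimal Python sketch, assuming the alt text is used as the label (`yaml` here, or `metadata` as suggested earlier). The function name `preformatted_blocks` is hypothetical, and the block contents are returned as raw text rather than parsed, because YAML parsing would need a third-party library.

```python
from typing import List

FENCE = "`" * 3  # the three-backtick toggle line of text/gemini


def preformatted_blocks(gemtext: str, alt_text: str) -> List[str]:
    """Return the contents of every preformatted block whose alt text matches."""
    blocks: List[str] = []
    current: List[str] = []
    inside = False  # True while between an opening and a closing toggle line
    wanted = False  # True if the open block carries the requested alt text
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            if inside:
                if wanted:
                    blocks.append("\n".join(current))
                inside = False
            else:
                inside = True
                wanted = line[len(FENCE):].strip() == alt_text
                current = []
        elif inside and wanted:
            current.append(line)
    # A block that is never closed is dropped rather than guessed at.
    return blocks
```

A crawler could call `preformatted_blocks(document, "yaml")` and hand each result to whatever YAML parser it prefers, while clients that ignore the convention keep displaying the block as ordinary preformatted text.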
On Saturday, November 14, 2020 6:24 PM, Petite Abeille <petite.abeille at gmail.com> wrote: > > On Nov 14, 2020, at 23:52, Sean Conner sean at conman.org wrote: > > Which version? > > Atom (Web standard), RFC 4287, December 2005. > > > There are around half a dozen variations that are not all compatible with each other. > > Ignore them. No point in replaying a decade of bickering. Seconded. Atom is a strong spec and has no other versions, extensions, date issues, etc. to deal with. It's already used for much of Gemini, thanks to CAPCOM. makeworld
Reading about favicon in Gemini by way of an extra request to the host serving a requested document, I searched for "metadata" in old threads. It seems to me the quoted message from 2020-11-14 below could serve as a solution to the debate. Note that the favicon RFC basically uses this approach itself, stating some `key: value` pairs within the document. => gemini://mozz.us/files/rfc_gemini_favicon.gmi RFC: Adding Emoji Favicons to Gemini Why not use this kind of structured metadata lines for an (as per RFC still unmotivated) favicon convention? Favicon: # On Sat, 14 Nov 2020 17:07:30 +0000, smlckz wrote: > I am proposing a convention of putting human and machine readable > metadata in documents (in ''geminisphere''). This is completely optional > for document writers. > > The metadata should be placed at the end of the document so that the > viewers can view the content first. > > For now, I am proposing the following metadata for inclusion in > documents (all of which is optional): > > * the date (and maybe time) when the document was published > > * the date (and maybe time) when the document was last modified > > * copyright information and/or license of the document > > IMO, we should use ISO 8601 for the date/time in metadata. > > The clients may use the information, but may not hide the metadata. > The spiders/bots can also use the information (when indexing/archiving > documents) as well. > > Now the question for you is how the metadata is formatted? > Please share your thoughts on it. > > > ~smlckz
On Sun, 21 Feb 2021 13:06:53 -0000 (UTC) text at sdfeu.org: > Reading about favicon in Gemini by way of an extra request to the host > serving a requested document, I searched for "metadata" in old threads. > > It seems to me the quoted message from 2020-11-14 below could serve as a > solution to the debate. > > Note that the favicon RFC basically uses this approach itself, stating > some `key: value` pairs within the document. > > => gemini://mozz.us/files/rfc_gemini_favicon.gmi RFC: Adding Emoji > Favicons to Gemini > > Why not use this kind of structured metadata lines for an (as per RFC > still unmotivated) favicon convention? > > Favicon: # > It seems you are suggesting implementing the equivalent of HTTP headers, which are key: value pairs that are not part of the document but are transmitted in the reply. Currently Gemini only returns the status code, the content type and potentially the language (this is not mandatory). That's an endless rabbit hole that the Gemini protocol had better not explore, because it allows endless extensibility.
On Sun, 21 Feb 2021 at 13:13, Solene Rapenne <solene at perso.pw> wrote: > > It seems your are suggesting implementing equivalent of http headers > that are key: values pair and are not part of the document but is transmitted > in the reply. Currently gemini only returns the status code, the content type > and potentially the language (this is not mandatory). > > That's an endless rabbithole that the Gemini protocol should better > not explore because it allows endless extendability. This isn't headers of any sort, it's document metadata, similar to HTML's <head> and <meta>.
On Sun, 21 Feb 2021 19:51:39 +0000, Oliver Simmons wrote: > This isn't headers of any sort, it's document metadata, similar to > HTML's <head> and <meta>. I loved Opera's native navigation support for HTML's rel prev/next tags. https://www.w3.org/TR/2018/SPSD-html32-20180315/#link states: > LINK provides a media independent method for defining relationships with other documents and resources. LINK has been part of HTML since the very early days, although few browsers as yet take advantage of it (most still ignore LINK elements). https://news.ycombinator.com/item?id=11515888 has some comments on it: > Literally one of the greatest things about the Opera browser was that you could browse an entire forum or whatever (longform article etc) with the Space key
On Sun, 21 Feb 2021 at 20:29, <text at sdfeu.org> wrote: > > I loved Opera's native navigation support for HTML's rel prev/next tags. > > https://www.w3.org/TR/2018/SPSD-html32-20180315/#link states: > > LINK provides a media independent method for defining relationships > with other documents and resources. LINK has been part of HTML since the > very early days, although few browsers as yet take advantage of it (most > still ignore LINK elements). > > https://news.ycombinator.com/item?id=11515888 has some comments on it: > > Literally one of the greatest things about the Opera browser was that > you could browse an entire forum or whatever (longform article etc) with > the Space key > > That sounds neat, would be useful for orbits/webrings (such as LEO). The current link system works ok, but can be a bit clunky.
It was thus said that the Great Oliver Simmons once stated: > On Sun, 21 Feb 2021 at 13:13, Solene Rapenne <solene at perso.pw> wrote: > > > > It seems your are suggesting implementing equivalent of http headers > > that are key: values pair and are not part of the document but is transmitted > > in the reply. Currently gemini only returns the status code, the content type > > and potentially the language (this is not mandatory). > > > > That's an endless rabbithole that the Gemini protocol should better > > not explore because it allows endless extendability. > > This isn't headers of any sort, it's document metadata, similar to > HTML's <head> and <meta>. One can go overboard on metadata. Check out the source on this joker's website: http://boston.conman.org/2021/02/17.1 Almost a *hundred* lines of metadata! Madness! Madness I say! -spc (and he's not even sure he has enough! The fool!)
I've seen some people put key/value type metadata in their gmi files already ("tags: this,that,whatevs" for example). Go ahead and do it if you want; I personally like that you want to put them at the end of the document, where they won't bother anybody. As for including it in the spec... I'd rather not. Treat them as optional extensions :) Cheers, ew0k
On Mon, 22 Feb 2021 at 07:57, Björn Wärmedal <bjorn.warmedal at gmail.com> wrote: > > I've seen some people put key/value type metadata in their gmi files > already ("tags: this,that,whatevs" for example). Go ahead and do it if > you want; I personally like that you want to put them at the end of > the document, where they won't bother anybody. > > As for including it in the spec... I'd rather not. Treat them as > optional extensions :) > The spec has "advanced line types" which are treated as optional: > 5.5 Advanced line types > The following advanced line types MAY be recognised by advanced clients. Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function. Having the format as part of the spec would be good. I don't think having an official list of key: value pairs in the spec should be a thing, though; that should be separate.
> On Feb 22, 2021, at 11:24, Oliver Simmons <oliversimmo at gmail.com> wrote:
>
> Having the format as part of the spec would be good

text/parameters

The ABNF [RFC5234] grammar for "text/parameters" content is:

file                 = *((parameter / parameter-value) CRLF)
parameter            = 1*visible-except-colon
parameter-value      = parameter *WSP ":" value
visible-except-colon = %x21-39 / %x3B-7E ; VCHAR - ":"
value                = *(TEXT-UTF8char / WSP)
TEXT-UTF8char        = <as defined in Section 20.1>
WSP                  = <See RFC 5234> ; Space or HTAB
VCHAR                = <See RFC 5234>
CRLF                 = <See RFC 5234>

https://tools.ietf.org/html/rfc7826#page-305
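To show how little code the quoted grammar needs, here is a Python sketch of a reader for it. It is a loose reading and not a validating parser: the value is accepted as any remaining characters instead of being checked against TEXT-UTF8char, surrounding whitespace in the value is trimmed, and lines that do not fit the grammar are skipped. All three are assumptions of this sketch, and `parse_parameters` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# parameter       = 1*visible-except-colon  (%x21-39 / %x3B-7E)
# parameter-value = parameter *WSP ":" value
LINE = re.compile(r"^([\x21-\x39\x3B-\x7E]+)(?:[ \t]*:(.*))?$")


def parse_parameters(text: str) -> Dict[str, Optional[str]]:
    """Map each parameter name to its value, or to None for a bare parameter."""
    result: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():  # splitlines() also handles CRLF endings
        match = LINE.match(raw)
        if match is None:
            continue  # assumption: silently skip lines outside the grammar
        name, value = match.groups()
        # Assumption: trim the whitespace that the grammar allows in `value`.
        result[name] = value.strip() if value is not None else None
    return result
```

Because only the name is forbidden from containing a colon, a value such as an ISO 8601 timestamp with colons in it passes through unchanged.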
On Mon, 22 Feb 2021 at 10:31, Petite Abeille <petite.abeille at gmail.com> wrote:
>
> text/parameters
>
> The ABNF [RFC5234] grammar for "text/parameters" content is:
>
> file                 = *((parameter / parameter-value) CRLF)
> parameter            = 1*visible-except-colon
> parameter-value      = parameter *WSP ":" value
> visible-except-colon = %x21-39 / %x3B-7E ; VCHAR - ":"
> value                = *(TEXT-UTF8char / WSP)
> TEXT-UTF8char        = <as defined in Section 20.1>
> WSP                  = <See RFC 5234> ; Space or HTAB
> VCHAR                = <See RFC 5234>
> CRLF                 = <See RFC 5234>
>
> https://tools.ietf.org/html/rfc7826#page-305

That's perfect! It's pretty much what I was describing, following an existing spec would be great! :)
> On Feb 22, 2021, at 11:51, Oliver Simmons <oliversimmo at gmail.com> wrote: > > That's perfect! It's pretty much what I was describing, following an > existing spec would be great! :) In terms of keys, RFC822 & Co. is not a bad place to start. + modernization, i.e. ISO 8601.