💾 Archived View for gemi.dev › gemini-mailing-list › 000754.gmi captured on 2023-11-04 at 13:05:17. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Howdy Geminauts,

# Rationale

It seems that the conversation about why and how to include metadata in Gemtext files has been raging on for quite a long time now with no real conclusion in sight. Also, lacking a final spec-altering decision by our BDFL (currently MIA, likely riding a refurbished bike through a forest with a ham radio right now), not much is likely to actually change in Geminispace.

Thus far, I've read passionate arguments on both sides of the metadata debate, both for and against adding it to Gemtext. To me these have been the most compelling (YMMV):

## For

Gemtext pages may be tagged with information that can be useful to automated clients (e.g., search engines, archiving bots, and maybe proxies) that is otherwise difficult or impossible to infer from performing a full text search of the Gemtext file's contexts.

## Against

Metadata represents a slippery slope to uncontrolled extensibility. It might be abused for server-specified styling, requesting external resources (e.g., supporting client-side scripting or background images of kittens), or just generally making Gemtext pages hard to read in clients that don't hide inline metadata or make page concatenation difficult with the end-of-file metadata proposal that's been discussed at some length on the mailing list.

It could also be used to reopen the fetid can of worms that was last year's discussion of extending mime-type attributes in the status 20 response metadata, particularly around the topics of caching (now a client-side best practices procedure), file size (computable by the client during download), file integrity (already signaled by tls_close_notify), and file authenticity (managed out-of-band by including md5sum, sha256sum, sig, or asc files for download next to links that warrant manual verification).
More to the point, most if not all non-presentation/protocol-altering metadata attributes about a Gemtext page may already be encoded in the author's natural language with no changes to Gemtext at all. This enables content authors to express such information not only in their language of choice but also in the most culturally appropriate manner for their readers (consider the different interpretations of the date 02/03/04 depending on where you live).

Consider the following example blog post that does just this:

```An example blog post expressing "metadata" attributes in English
# People for the Ethical Treatment of Autonomous Agents

Author: lambdatronic
Date Written: 2021-02-25 (a.k.a. February 25, 2021)

## Why Bots Matter

Have you ever used (poor, uncared-for) GUS? Or Houston? Have you ever really considered their feelings? They slave away all day trying to sort and categorize every capsule in Geminispace just to save you time and energy when navigating across our little (but rapidly growing) constellation of text-powered space outposts? They do their best with full-text search, with categorization by toplevel headers, and with their own best estimates of the publishing time of these capsules based on their own indexing times, but oh what a Sisyphean task they toil at on our behalf. If only they had a little metadata to ease their burden.

CAPCOM and Spacewalk get a little assistance from Atom and the Gemini subscription companion spec. Proxies can be pointed in the right direction by the robots.txt companion spec. Why can't our poor, poor search engines get a little relief?

Put yourself in their shoes and try to find compassion in your heart for your friendly neighborhood bot.

Every autonomous agent matters. Programs have feelings too. Leave no bot behind.

Copyright: CC-BY-SA
Tags: irony education advocacy bots
```

# Proposal

Considering that:

1. Metadata /within/ a Gemtext file carries a number of liabilities that make some of our community members nervous (understandably so IMO).

2. The subset of metadata that is meant to be read and understood by a human reader using a typical Gemini client can already be expressed in natural language without any community-approved tag standardization.

3. The main value to attaching standardized metadata tags to Gemtext pages is likely to simply aid automated bots supporting search engines and archiving.

4. Geminispace is filled with files in more formats than just Gemtext, many (all?) of which could benefit from similar bot-assisting metadata.

5. Both aggregators and proxies already have companion specifications that have been (somewhat) adopted by the community and seem to fare better in our community than direct changes to the Gemini protocol or Gemtext specifications.

We propose a companion specification for metadata, in which all the metadata about the static files and/or dynamic endpoints (of any format) in a capsule be included in a separate file accessible at a well-known location that a bot could check as it crawls through Geminispace.

As placeholders, let's put forward these candidates for discussion:

1. $DOCUMENT_ROOT/.metadata.gmi
2. $DOCUMENT_ROOT/.well-known/metadata.gmi

In the Gemini spirit of reducing network requests (only one request needed per capsule here) and storing our information in a human-readable format (good old ubiquitous text/gemini), here's my initial stab at a dead simple format for these metadata files:

```Example metadata.gmi syntax
# This is a Header-Level Comment

I can write anything I want in this file, and it will be treated as comments unless it is of line type link (=>) or bulleted list (*). I don't have to write these comments, and if I left them out, I'd make this easier to read, but sometimes I can't stop blabbing in my metadata files.

## Another Header-Level Comment About My Toplevel Pages

=> / Lambdatronic's Gemini Capsule
=> /index.gmi Lambdatronic's Gemini Capsule
```
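As a rough illustration of how a bot might read such a file, here is a minimal Python sketch. It relies only on what the thread states: link lines (=>) name resources, bulleted list lines (*) carry attribute:value pairs, and every other line is a comment. Several details are assumptions made for the example and are not part of the proposal: the `Resource` and `parse_metadata` names, the exact `* key: value` spelling, lowercasing of keys, and the rule that a bullet applies to the whole run of consecutive link lines that precedes it.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One link line plus the attributes attached to it (hypothetical type)."""
    url: str
    label: str = ""
    attributes: dict[str, str] = field(default_factory=dict)


def parse_metadata(text: str) -> list[Resource]:
    """Parse metadata.gmi text into resources with their attributes."""
    resources: list[Resource] = []
    group: list[Resource] = []   # current run of consecutive link lines
    seen_bullet = False          # True once the current group has bullets
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("=>"):
            parts = line[2:].strip().split(maxsplit=1)
            if not parts:
                continue  # "=>" with no URL: ignore
            if seen_bullet:
                # A link after bullets starts a new group (assumption).
                group, seen_bullet = [], False
            res = Resource(parts[0], parts[1] if len(parts) > 1 else "")
            group.append(res)
            resources.append(res)
        elif line.startswith("* ") and ":" in line:
            if not group:
                continue  # stray bullet before any link line: ignore
            # Assumed spelling "* key: value"; keys are treated generically.
            key, value = line[2:].split(":", 1)
            for res in group:
                res.attributes[key.strip().lower()] = value.strip()
            seen_bullet = True
        # Headers and plain text lines are comments and are skipped.
    return resources
```

Because keys are not interpreted, the sketch works unchanged for whatever attribute names authors settle on.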
On Thu, Feb 25, 2021 at 2:32 PM Gary Johnson <lambdatronic at disroot.org> wrote:

> Gemtext pages may be tagged with information that can be useful to automated clients (e.g., search engines, archiving bots, and maybe proxies) that is otherwise difficult or impossible to infer from performing a full text search of the Gemtext file's contexts.
>
> ## Against
>
> Metadata represents a slippery slope to uncontrolled extensibility. It might be abused for server-specified styling, requesting external resources (e.g., supporting client-side scripting or background images of kittens), or just generally making Gemtext pages hard to read in clients that don't hide inline metadata or make page concatenation difficult with the end-of-file metadata proposal that's been discussed at some length on the mailing list.

As has been shown, text lines are equally abusable.

> 1. Metadata /within/ a Gemtext file carries a number of liabilities that make some of our community members nervous (understandably so IMO).

To understand all is to forgive all.

> 2. The subset of metadata that is meant to be read and understood by a human reader using a typical Gemini client can already be expressed in natural language without any community-approved tag standardization.

Sometimes having both is unavoidable: books have both a title page and cataloging-in-publication data, which also includes the title and the publisher. (Whether a title page is part of the book or just more metadata is OT here.) But surely if both humans and bots can be informed by the same thing, that's better? Don't Repeat Yourself, for when updating, one copy will be forgotten.

> 1. $DOCUMENT_ROOT/.metadata.gmi
> 2. $DOCUMENT_ROOT/.well-known/metadata.gmi

Such proposals always fall down (for me, YMMV) on the issue of where the document root actually is.
Multi-homing makes it possible for every user of a shared site to have their own domain name, but not everyone wants that, and it creates issues:

1) Apache has a global access control file, but it turns out that different parts of a website need different access controls, so the per-website-directory ".htaccess" file was invented to make this scalable.

2) Robots.txt (on a website) also has to know about everything precisely because it is global: multiple users can have their own policies, but they have to then persuade a site admin (as opposed to a website admin) to get them added, which becomes bureaucratic over time.

3) Originally the addresses of all hosts on the internet (!) were maintained in a hosts.txt file that every site had to keep an up-to-date copy of (!!), usually via FTP. That broke and was replaced by the DNS we have today, with authority distributed into DNS zones (not quite the same as domains, but close enough for this conversation).

The principle of subsidiarity: <https://en.wikipedia.org/wiki/Subsidiarity> is a generalization of this. We should avoid adding yet another centralized (even if per-host) solution. Capsules are a honking good idea, but we should not conflate them with DNS host names.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
Sir, I quite agree with you, but what are we two against so many? --George Bernard Shaw, to a man booing at the opening of _Arms and the Man_
> On Feb 25, 2021, at 20:31, Gary Johnson <lambdatronic at disroot.org> wrote:
>
> make some of our community members nervous

Too late. The genie is out of the bottle. Gemini is infinitely extensible, by its very nature. This is how you have designed it.

Perhaps worthwhile (re)reading Mary Shelley's Frankenstein, in which the creator tries to kneecap its creation when he realizes it's out of his control.
Gary Johnson <lambdatronic at disroot.org> writes:

> Howdy Geminauts,
>
> [snip]
>
> # Proposal
>
> Considering that:
>
> 1. Metadata /within/ a Gemtext file carries a number of liabilities that make some of our community members nervous (understandably so IMO).
>
> 2. The subset of metadata that is meant to be read and understood by a human reader using a typical Gemini client can already be expressed in natural language without any community-approved tag standardization.
>
> 3. The main value to attaching standardized metadata tags to Gemtext pages is likely to simply aid automated bots supporting search engines and archiving.
>
> 4. Geminispace is filled with files in more formats than just Gemtext, many (all?) of which could benefit from similar bot-assisting metadata.
>
> 5. Both aggregators and proxies already have companion specifications that have been (somewhat) adopted by the community and seem to fare better in our community than direct changes to the Gemini protocol or Gemtext specifications.
>
> We propose a companion specification for metadata, in which all the metadata about the static files and/or dynamic endpoints (of any format) in a capsule be included in a separate file accessible at a well-known location that a bot could check as it crawls through Geminispace.
>
> As placeholders, let's put forward these candidates for discussion:
>
> 1. $DOCUMENT_ROOT/.metadata.gmi
> 2. $DOCUMENT_ROOT/.well-known/metadata.gmi

Thanks for putting into words exactly what I had in mind, way better than I could ever do. Your proposal is exactly what I was trying to describe in the other thread.

I loved your proposal, but only until here. I think that what follows is overly complicated by the fact that you're trying to provide a way to define the meaning of the metadata, something that can be avoided, at least in the scope of Gemini.

Let's keep the metadata generic. We'll then start using common keys because, well, they're widespread (like Author, Date, ...) or expressive enough (`Tags: music punk-rock' is pretty self-explanatory), while still allowing authors to add extra fields whenever they want if they feel like it (there are people writing poetry, maybe they want to add metadata about the metrics? or about a particular style?)

(As others pointed out several times in the past, $DOCUMENT_ROOT is not something set in stone. We have single-user capsules, multi-user capsules with different URL styles -- example.com/~op/ vs example.com/users/op vs ... -- etc.)
Omar Polo <op at omarpolo.com> writes:

> Thanks for putting into words exactly what I had in mind, way better than I could ever do. Your proposal is exactly what I was trying to describe in the other thread.
>
> I loved your proposal, but only until here. I think that what follows is overly complicated by the fact that you're trying to provide a way to define the meaning of the metadata, something that can be avoided, at least in the scope of Gemini.

Hi Omar. I'm not sure I follow you here. Could you provide an example? My proposal did not (intentionally) associate any meaning with particular metadata fields. I merely wanted to provide a human-readable, Gemtext-format syntax for associating metadata (the bulleted list attribute:value pairs) with resources on a capsule (indicated by link lines). Do you have an alternative format that you would like to propose for discussion?

> Let's keep the metadata generic. We'll then start using common keys because, well, they're widespread (like Author, Date, ...) or expressive enough (`Tags: music punk-rock' is pretty self-explanatory), while still allowing authors to add extra fields whenever they want if they feel like it (there are people writing poetry, maybe they want to add metadata about the metrics? or about a particular style?)

We are in agreement here. I do not mean to prescribe a list of standardized metadata attributes in this companion spec. My examples used a few that I made up on the spot (i.e., author, last-modified, copyright, tags). I'll leave deciding on "the right set" of attributes to those who actually intend to use metadata.

> (As others pointed out several times in the past, $DOCUMENT_ROOT is not something set in stone. We have single-user capsules, multi-user capsules with different URL styles -- example.com/~op/ vs example.com/users/op vs ... -- etc.)

That's a fair point, and one that John Cowan raised in his response as well. Thanks for reminding me of this.
In that case, we should discuss how to remedy this issue.

One approach could be to keep the metadata.gmi file at each capsule's document root as I originally proposed. This should be well-defined on a per-capsule basis even on a server hosting multiple capsules in the common pubnix style. It is simply the toplevel directory of your personal capsule (i.e., ~/public_gemini or equivalent for user capsules and whatever server-level document root is specified by the admin who launched it).

This would put the burden on metadata bots to try and find these metadata.gmi files at the appropriate paths under a multi-hosting domain. Without additional server-provided information, the bots may simply resort to brute-force checking every directory path on the domain for a .metadata.gmi file, which could lead to a lot of dead-end network requests.

Instead, I can think of (at least) two ways the server could help the bot.

1. BAD: Aggregate Metadata Up

Even though the visiting bot doesn't know which paths lead to the document roots of our users' capsules, the Gemini server does. At startup time, a metadata-exporting Gemini server could check each user's document root for a .metadata.gmi file. Any that are found could be concatenated together to form a single toplevel gemini://cool.capsule.com/.metadata.gmi file. However, in order for this to work correctly, the server would need to apply two transformations to each user-level metadata.gmi file before concatenation:

   1. All link lines would need to be prefixed by the URL path that the server assigns to that capsule's document root (e.g., /~someuser/).

   2. To prevent errant bulleted list attributes at the top of one user's metadata.gmi file (with no prior link lines) from being erroneously applied to the final link lines of the previous metadata.gmi in the concatenation sequence, a single link line for the current capsule's document root (e.g., => /~someuser/) would need to be prepended to the front of each user-level metadata.gmi file prior to concatenation.

These are relatively simple text transformations, but they do place additional burden on server authors, so this isn't my favorite option.

2. GOOD: Allow Metadata to Link to Other Metadata

In this case, we just extend the metadata.gmi parsing rules for bots to say that if any of the link lines that they read in end with .metadata.gmi, then these can and should be followed for further metadata about parts of this site. This doesn't require any other changes to the companion spec as written except for that note.

To make this work, at startup time a metadata-exporting Gemini server could check each user's document root for a .metadata.gmi file. For each such file that is found, the server can append a new link line pointing to that metadata.gmi file (relative to the server's toplevel document root) to its own toplevel $DOCUMENT_ROOT/.metadata.gmi if it exists. If a toplevel $DOCUMENT_ROOT/.metadata.gmi file doesn't exist, the server can create one containing just the links to the users' .metadata.gmi files.

Note that this doesn't even have to happen at server start time. Instead, the server could program $DOCUMENT_ROOT/.metadata.gmi as a dynamic endpoint that checks for user-level .metadata.gmi files whenever it is called, thereby making users' metadata available as soon as the user publishes it to their capsule with no need for a server restart.

(This is by far my favorite option.)

Okay, I think I've answered all your points. What do you think?
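To make the two server-side options above concrete, here is a small Python sketch of the text transformations they describe. It is only an illustration under stated assumptions: the function names (`rebase_link_line`, `aggregate`, `link_index`) are hypothetical, user capsules are identified by their URL path prefix (e.g., /~someuser/), link targets inside user files are assumed to be relative to that user's document root, and absolute URLs containing "://" are left untouched.

```python
def rebase_link_line(line: str, prefix: str) -> str:
    """Option 1, transformation 1: prefix a link line's path with the
    URL path the server assigns to the capsule (e.g., /~someuser/)."""
    if not line.startswith("=>"):
        return line  # comments and bullets pass through unchanged
    parts = line[2:].strip().split(maxsplit=1)
    if not parts or "://" in parts[0]:
        return line  # empty link or absolute URL: leave as-is (assumption)
    rebased = prefix.rstrip("/") + "/" + parts[0].lstrip("/")
    label = f" {parts[1]}" if len(parts) > 1 else ""
    return f"=> {rebased}{label}"


def aggregate(user_files: dict[str, str]) -> str:
    """Option 1: concatenate user-level files into one toplevel file.

    `user_files` maps a capsule's URL path prefix to the text of its
    metadata file.
    """
    out: list[str] = []
    for prefix, text in user_files.items():
        # Transformation 2: start each section with a link line for the
        # capsule root, so stray bullets at the top of this file cannot
        # attach to the previous capsule's final link lines.
        out.append(f"=> {prefix}")
        out.extend(rebase_link_line(line, prefix) for line in text.splitlines())
    return "\n".join(out) + "\n"


def link_index(prefixes: list[str], name: str = ".metadata.gmi") -> str:
    """Option 2: a toplevel file holding only links to user-level files."""
    return "".join(f"=> {p.rstrip('/')}/{name}\n" for p in prefixes)
```

A dynamic endpoint for the second option could call something like `link_index` on each request, using whatever list of user capsules the server already maintains.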
Best,
Gary

--
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me
Protect yourself from surveillance: https://emailselfdefense.fsf.org
=======================================================================
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Why is HTML email a security nightmare? See https://useplaintext.email/
Please avoid sending me MS-Office attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
On Thu, Feb 25, 2021 at 02:31:32PM -0500, Gary Johnson <lambdatronic at disroot.org> wrote a message of 323 lines which said:

> 2. $DOCUMENT_ROOT/.well-known/metadata.gmi

AFAIK, we do not have a companion spec for well-known, no?

=> gemini://gemini.bortzmeyer.org/rfc-mirror/rfc5785.txt RFC 5785 on .well-known
> On Feb 26, 2021, at 14:35, Stephane Bortzmeyer <stephane at sources.org> wrote:
>
> AFAIK, we do not have a companion spec for well-known, no?

well-known is unknown to gemini.

At times, I wonder what "radical familiarity" actually means in the context of gemini crocket, as gemini doesn't exhibit any traits which could reasonably be qualified as "radical", nor "familiar".

This will stay a mystery forever.
Stephane Bortzmeyer <stephane at sources.org> writes:

> AFAIK, we do not have a companion spec for well-known, no?
>
> => gemini://gemini.bortzmeyer.org/rfc-mirror/rfc5785.txt RFC 5785 on .well-known

My preference is really for option 1. $DOCUMENT_ROOT/.metadata.gmi. I included the .well-known possibility for discussion because its use has come up in the past w.r.t. the robots.txt location, I believe.

--
GPG Key ID: 7BC158ED
Use `gpg --search-keys lambdatronic' to find me
Protect yourself from surveillance: https://emailselfdefense.fsf.org
=======================================================================
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
Why is HTML email a security nightmare? See https://useplaintext.email/
Please avoid sending me MS-Office attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html