[SPEC] Backwards-compatible metadata in Gemini

John Cowan cowan at ccil.org

Wed Feb 24 14:03:23 GMT 2021

- - - - - - - - - - - - - - - - - - - 

On Tue, Feb 23, 2021 at 4:16 AM Lars Noodén <lars.nooden at gmx.com> wrote:

The metadata does not have to be marked up in a difficult manner to be
both machine readable and human readable. Borrowing from the link
syntax [3],
=:[<whitespace>]<TERM><whitespace><METADATA>
which could look like this in the body, but would be up to the client as
to how it is dealt with.

I agree that this syntax makes sense and is easy to read: as you say, it degrades to plain text nicely, so there is no pressure to do anything in particular with it. However, it makes the information available to search engines in such a way that it is possible to find documents.

VERY IMPORTANT: I do *not* think that this convention needs to be part of the text/gemini spec, because understanding it is not a requirement for clients. One client (for human use) can render an =: line as an ordinary gemtext line; another client can ignore such lines; a smarter client can render them but translate "language en" into "English language" and "format text/html" into "HTML".
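To make the three client choices concrete, here is a minimal sketch in Python. The function name, the mode names ("plain", "hide", "smart") and the two lookup tables are all hypothetical, chosen only for illustration; nothing here is part of any spec or existing client.

```python
# Hypothetical display names; a real client would pick its own tables.
LANGUAGES = {"en": "English"}
FORMATS = {"text/html": "HTML"}

def render(line, mode):
    """Render one gemtext line under one of three illustrative policies.

    mode is "plain" (show =: lines as ordinary text), "hide" (drop them),
    or "smart" (translate values the client happens to recognise).
    Returns the text to display, or None if the line should be skipped.
    """
    if not line.startswith("=:"):
        return line              # not a metadata line: always shown as-is
    if mode == "plain":
        return line
    if mode == "hide":
        return None
    # "smart" mode: split into term and value, then translate known values.
    parts = line[2:].strip().split(None, 1)
    if len(parts) == 2:
        term, value = parts
        if term == "language" and value in LANGUAGES:
            return f"{LANGUAGES[value]} language"
        if term == "format" and value in FORMATS:
            return FORMATS[value]
    return line                  # unrecognised metadata degrades to plain text

print(render("=: language en", "smart"))
print(render("=: format text/html", "smart"))
```

Anything the "smart" branch does not recognise falls through to the plain line, which is the point: no client is obliged to understand anything.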

Search engines and other metadata processors have the same three choices. A simple approach to making use of metadata is to treat "creator Crowder, Mary" in the index as if it were "creator:Crowder creator:Mary", thus allowing people to search for these things Google-style, without confusing them with subject:Crowder.
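The indexing idea could be sketched roughly as follows. This is only an illustration: the function name is made up, and splitting the value on non-word characters is my assumption about what "simple" means here.

```python
import re

def index_terms(line):
    """Turn one "=:" metadata line into field-qualified search terms."""
    if not line.startswith("=:"):
        return []
    # Whitespace after "=:" is optional; the first word is the term,
    # everything after it is the free-form value.
    parts = line[2:].strip().split(None, 1)
    if len(parts) < 2:
        return []
    term, value = parts
    # Break the value into words, dropping punctuation such as the comma.
    words = re.findall(r"\w+", value)
    return [f"{term}:{word}" for word in words]

print(index_terms("=: creator Crowder, Mary"))
# prints ['creator:Crowder', 'creator:Mary']
```

A query for creator:Crowder then matches this line, while a query for subject:Crowder does not.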

Here are some lines in the above format for characterizing one of Project Gutenberg's books. This is an extensive example: I do not mean that typical metadata creators will use anything this complex.

=: pgterms.ebook 22222
=: creator Crowther, Mary Owens
=: language en
=: subject Etiquette
=: type Text
=: title How to Write Letters (Formerly The Book of Letters) A Complete Guide to Correct Business and Personal Correspondence
=: issued 2007-08-02
=: lcc PE
=: rights Public domain in the USA
=: publisher Project Gutenberg
=> https://www.gutenberg.org/files/22222/22222.txt =: format text/plain;charset=us-ascii =: size 392109 bytes
=> https://www.gutenberg.org/ebooks/22222.kindle.images =: format application/x-mobipocket-ebook =: size 3304322 bytes
=> https://www.gutenberg.org/files/22222/22222-8.txt =: format text/plain;charset=iso-8859-1 =: size 392115 bytes
=> https://www.gutenberg.org/ebooks/22222.kindle.noimages =: format application/x-mobipocket-ebook =: size 917781 bytes
=> https://www.gutenberg.org/files/22222/22222-h/22222-h.htm =: media-type text/html; charset=iso-8859-1 =: size 508856 bytes
=> https://www.gutenberg.org/ebooks/22222.rdf =: format application/rdf+xml

Note that some metadata lines are actually links to other formats of this book, so a metadata-aware processor would look at links and see that after the URL there is an "=:" and process it as metadata. For this reason, I do not think that metadata lines should be required to be in a fixed place in the document: I have put the links at the end because they are most likely less important to people than the rest of the metadata.

In addition, "=:" lines can be joined together if they are related, with a second "=:" on the same line, since that is unlikely to be part of the value. This provides the benefit of structured metadata with a depth of 1.
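A metadata-aware processor along these lines might look like the sketch below. It assumes that link lines start with gemtext's usual "=>", that "=:" never occurs inside a URL or a value, and that a term with no value can be dropped; the function names are hypothetical.

```python
def parse_pairs(text):
    """Split 'term value =: term value ...' into a list of (term, value) pairs."""
    pairs = []
    for segment in text.split("=:"):
        parts = segment.strip().split(None, 1)
        if len(parts) == 2:          # skip empty or value-less segments
            pairs.append((parts[0], parts[1]))
    return pairs

def parse_line(line):
    """Return (url, pairs) for a line carrying metadata, else None.

    url is None for a bare "=:" line and the link target for a "=>" line
    that has "=:" metadata after the URL.
    """
    if line.startswith("=:"):
        return None, parse_pairs(line[2:])
    if line.startswith("=>") and "=:" in line:
        target, _, rest = line[2:].partition("=:")
        fields = target.split(None, 1)
        url = fields[0] if fields else None
        return url, parse_pairs(rest)
    return None                      # ordinary text or an ordinary link

print(parse_line("=: creator Crowther, Mary Owens"))
print(parse_line(
    "=> https://www.gutenberg.org/ebooks/22222.rdf =: format application/rdf+xml"
))
```

All the pairs found on one line belong together (here, to the same URL), which is the depth-1 structure mentioned above. Because the processor checks each line on its own, it does not care where in the document the metadata appears.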

Note to Lars and other metadata people: I have simplified "dcterms.creator" to "creator" and "dcterms.subject.LCSH" to "subject", so as not to be too scary-looking. I have also omitted some of the available formats of this particular book.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
Unless it was by accident that I had offended someone, I never apologized.
        --Quentin Crisp