💾 Archived View for gemi.dev › gemini-mailing-list › 000756.gmi captured on 2024-03-21 at 18:03:07. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Metadata Without A Proposal

1. nothien (a) uber.space (nothien (a) uber.space)

Hi!

I've lost track of the currently raging metadata thread entirely, and so
I've started this as a new post.

Thus far, I think there's general consensus on the following needs for
any metadata proposal:

1. Must degrade gracefully for clients that don't understand metadata.

2. Must not be English-specific.
  
   Although the majority of gemtext/Gemini content is in English at the
   moment, we want more diversity.  "Forcing" (by convention) the usage
   of English upon non-English users is unwanted.
  
   This rules out some of the current proposals which are oriented around
   'tags', e.g. 'author' or 'license'.  Theoretically, you could have a
   list of tags for different languages, but that would grow into a
   horrifically long list, and is generally unsustainable.
  
3. Must be machine-parsable.
  
   Search engines, archivers, and other crawler-style clients need to be
   attended to.  Some of the information they need is: date, author, and
   license.
  
4. Should affect presentation.
  
   gemtext as a whole is about separating content from presentation.
   Some of the earlier metadata proposals referred to metadata for
   presentation, e.g. to specify a color to view the text in.  This is
   against the spirit of gemtext/Gemini (if not the spec).
  
5. Must be difficult to extend.
  
   Again, this comes from the general Gemini philosophy that anything
   that can be misused will be misused.  This rules out lots of current
   proposals because they specify tags, and the usage of tags can only be
   controlled by convention, which is subject to change.
  
6. Must be accessible.
  
   Some proposals discussed the usage of emojis, and others have opted
   for creating new unofficial line types.  These don't degrade
   gracefully for things like screen readers, until they adopt the
   metadata proposal.  That's not great.

I think that we don't need a "metadata proposal" to solve any of these
problems.  We already have everything we need in pre-existing formats
and specifications.  Only three metadata fields are really necessary:
date, author, and license.  New fields, if completely necessary, need to
be handled on a case-by-case basis.

## Dates

Dating content is mostly relevant to search engines, so that old (or
new) results can be filtered out.  My proposal with dates is to use what
we already have - the gmisub companion spec.  If any content (e.g. an
article) has an associated date, the index page should list the content
page with that date, in gmisub format.  If content pages don't have any
associated date, simply don't list a date in the index.  Search engines
and crawlers can still choose to include date information based on when
they last crawled the page.

=> gemini://gemini.circumlunar.space/docs/companion/
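A minimal Python sketch of the crawler side of this, assuming the gmisub rule that an entry is a link line whose label starts with a YYYY-MM-DD date. `dates_from_index` is a hypothetical helper, not part of any existing tool; it resolves relative links against the index URL and skips invalid dates.

```python
import re
from datetime import date
from urllib import parse

# urljoin only resolves relative links for schemes it knows; register "gemini".
for registry in (parse.uses_relative, parse.uses_netloc):
    if "gemini" not in registry:
        registry.append("gemini")

# gmisub entry: a link line whose label begins with a YYYY-MM-DD date.
ENTRY = re.compile(r"^=>\s*(\S+)\s+(\d{4}-\d{2}-\d{2})\b")

def dates_from_index(index_url: str, gemtext: str) -> dict[str, date]:
    """Map absolute content URLs to the dates a gmisub index lists for them."""
    dates: dict[str, date] = {}
    for line in gemtext.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # plain text, or a link without a leading date
        try:
            day = date.fromisoformat(m.group(2))
        except ValueError:
            continue  # date-shaped but invalid, e.g. 2021-02-30
        dates[parse.urljoin(index_url, m.group(1))] = day
    return dates
```

A crawler could run this over each index it already fetches, and fall back to its own last-crawled time for pages no index lists.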

One question this raises is what index page to use.  I think that the
engine should search through parent directories until it finds one which
fits the gmisub format and has the content page (they would need to do
this anyway in order to crawl the capsule containing the content page).
If the engine already knows about an index page which is on the same
capsule and that has the content page, it can use that.
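The parent-directory search could look roughly like this. It is a sketch: `fetch_dates` stands in for whatever fetch-and-parse step a crawler already has (hypothetical here), and the walk stops at the capsule root.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit

def parent_indexes(page_url: str) -> list[str]:
    """Directory URLs above a page, nearest first, ending at the capsule root."""
    scheme, netloc, path, _, _ = urlsplit(page_url)
    parts = path.split("/")[:-1]  # drop the page's own file name
    candidates = []
    while parts:
        candidates.append(urlunsplit((scheme, netloc, "/".join(parts) + "/", "", "")))
        parts.pop()
    return candidates

def find_listed_date(page_url: str,
                     fetch_dates: Callable[[str], dict[str, str]]) -> Optional[str]:
    """Return the date from the nearest parent index that lists page_url, if any."""
    for index_url in parent_indexes(page_url):
        listed = fetch_dates(index_url)  # hypothetical: fetch the index, parse gmisub entries
        if page_url in listed:
            return listed[page_url]
    return None  # caller may fall back to its own last-crawled time
```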

## Licenses

We already have a great convention for licenses: giving it on the last
line of the document, with the line starting with `--`.  For example:

 ```An example of page with a license line
Hello world!

-- CC-BY-SA nothien
 ```

All we need to do with this convention is to formalize it as a companion
specification, maybe as `-- [SPDX license identifier] [owner]`.
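A minimal parser for the `-- [SPDX license identifier] [owner]` form suggested here, assuming the license line is the last non-blank line and that the identifier contains no spaces. It does not check the identifier against the SPDX list; note that the list itself uses versioned identifiers such as `CC-BY-SA-4.0`.

```python
from typing import NamedTuple, Optional

class LicenseLine(NamedTuple):
    spdx_id: str
    owner: str

def parse_license_line(gemtext: str) -> Optional[LicenseLine]:
    """Read a '-- <SPDX id> <owner>' line from the last non-blank line of a page."""
    lines = [line for line in gemtext.splitlines() if line.strip()]
    if not lines or not lines[-1].startswith("-- "):
        return None  # no license line: the page simply has no license metadata
    fields = lines[-1][3:].split(maxsplit=1)  # identifier, then the owner (may contain spaces)
    if len(fields) != 2:
        return None
    return LicenseLine(fields[0], fields[1].strip())
```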

## Authors

There are two possibilities I see with author metadata: either take it
from the license line, discussed above, or extend the gmisub spec to
also allow for an optional author field.

 ```A possible extension to the gmisub syntax with an author field
=> URL YYYY-MM-DD (Author) Title
 ```

We can tweak the format around a bit so that currently existing titles
which start with parenthesized text aren't misinterpreted.  In addition,
one shouldn't have to repeat the author field for every line; we can
have some system like only requiring the author field when it is
different from the immediately previous author.  I prefer the first
option, but I haven't explored when the license owner would differ from
the author (which I think is the case for e.g. news companies).
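The hypothetical `(Author)` extension, with the author carried forward from the previous entry when it is omitted, could be parsed as below. This is only a sketch of the syntax floated above, and it shows the ambiguity just mentioned: a title such as `(Draft) Notes` would be read as an author.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical gmisub extension: "=> URL YYYY-MM-DD (Author) Title", author optional.
ENTRY = re.compile(r"^=>\s*(\S+)\s+(\d{4}-\d{2}-\d{2})\b(?:\s+\(([^)]*)\))?\s*(.*)$")

class Entry(NamedTuple):
    url: str
    date: str
    author: Optional[str]
    title: str

def parse_entries(gemtext: str) -> list[Entry]:
    """Parse dated entries, reusing the previous author when none is given."""
    entries = []
    author: Optional[str] = None
    for line in gemtext.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # not a dated link line
        url, day, explicit_author, title = m.groups()
        if explicit_author is not None:
            author = explicit_author.strip()  # only repeat when the author changes
        entries.append(Entry(url, day, author, title))
    return entries
```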

## Other Fields

Clearly, other fields aren't supported by this.  If you want to place
additional metadata in your content, then I suggest writing it in
natural language.  If it is absolutely necessary to have it
machine-parsable (so that it can be specially understood by e.g. search
engines) then we can talk about that here on the ML, but others have
argued against e.g. tags because they allow easily manipulating search
results.  Expect resistance.

## Metadata for Storage

Author and license metadata is stored within the page itself, and so
that's not a problem.  Personally, I store date information in the file
name of the document (e.g. 2021-02-26-proposal.gmi), but I understand
that this doesn't work for everyone: in that case, see below.

There are legitimate uses for additional metadata when storing gemtext,
such as for capsule-local tagging.  These fields should be stored using
any arbitrary convention in the content: after all, these fields are not
meant to be parsed by external client software (i.e. search engines and
crawlers), but are only parsed by capsule-local software (such as to
organize content by tag).

## Conclusion

I don't think we need a 'metadata proposal' to achieve the goals we're
looking for.  The format conventions are already mostly in place; we
just need to formalize them.

~aravk | ~nothien

2. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 11:51, nothien at uber.space wrote:
> 
> I don't think we need a 'metadata proposal' to achieve the goals we're
> looking for.  The format conventions are already mostly in place; we
> just need to formalize them.

FWIW, I personally agree with the thrust of your position.

Even though I personally prefer using the existing capabilities of the 
link construct, which is the only structured artifact in text/gemini, e.g.:

LICENSE
=> https://creativecommons.org/publicdomain/zero/1.0/ rel=license Licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

AUTHOR
=> gemini://gemini.circumlunar.space/users/solderpunk/solderpunk.vcf rel=author Authored by The One & Only Solderpunk

DATE
=> tag:gemini.circumlunar.space,2020-05-26:/dns/gemini.circumlunar.space/tcp/1965/gemini/users/solderpunk/gemlog/the-mercury-protocol.gmi Created on May 26th 2020
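A sketch of how a client might pick out `rel=` link lines like these, assuming the convention is exactly `=> URL rel=<relation> <label>`. The `DATE` example uses a `tag:` URI rather than `rel=`, so it is not handled here.

```python
import re
from typing import NamedTuple

# Assumed convention: "=> URL rel=<relation> <label>", label optional.
LINK = re.compile(r"^=>\s*(\S+)\s+rel=(\S+)(?:\s+(.*))?$")

class RelLink(NamedTuple):
    url: str
    rel: str
    label: str

def rel_links(gemtext: str) -> dict[str, list[RelLink]]:
    """Group rel-annotated link lines by relation name (e.g. 'license', 'author')."""
    found: dict[str, list[RelLink]] = {}
    for line in gemtext.splitlines():
        m = LINK.match(line)
        if m:
            url, rel, label = m.group(1), m.group(2).lower(), m.group(3) or ""
            found.setdefault(rel, []).append(RelLink(url, rel, label))
    return found
```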

Furthermore, all the RELs & schemas are already defined:

Uniform Resource Identifier (URI) Schemes
https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

Link Relations
https://www.iana.org/assignments/link-relations/link-relations.xml

There is literally nothing for Gemini to invent. 

Just use what already exists.

?0?

3. Côme Chilliet (come (a) chilliet.eu)

On Friday, 26 February 2021, 11:51:08 CET, nothien at uber.space wrote:
> ## Licenses
> 
> We already have a great convention for licenses: giving it on the last
> line of the document, with the line starting with `--`.  For example:
> 
> ```An example of page with a license line
> Hello world!
> 
> -- CC-BY-SA nothien
> ```
> 
> All we need to do with this convention is to formalize it as a companion
> specification, maybe as `-- [SPDX license identifier] [owner]`.

Thank you, I did not know of this convention, it seems fitting for the license.

SPDX ids are also a good fit; it's a standard we can rely on.

Is there a Gemini mirror of common license texts?
Intuitively I would've put the license as a link to the legal text.

Côme

4. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 12:11, Côme Chilliet <come at chilliet.eu> wrote:
> 
> Intuitively I would've put the license as a link to the legal text.

=> https://creativecommons.org/publicdomain/zero/1.0/ rel=license Licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

license
Refers to a license associated with this context.
[RFC4946]
For implications of use in HTML, see: http://www.w3.org/TR/html5/links.html#link-type-license

https://tools.ietf.org/html/rfc4946

?0?

5. Philip Linde (linde.philip (a) gmail.com)

On Fri, 26 Feb 2021 11:51:08 +0100
nothien at uber.space wrote:

> Hi!
> 
> I've lost track of the currently raging metadata thread entirely, and so
> I've started this as a new post.
> 
> Thus far, I think there's general consensus on the following needs for
> any metadata proposal:
> 
> 1. Must degrade gracefully for clients that don't understand metadata.

Agreed.

> 2. Must not be English-specific.

What is the preferable alternative? We could use numbers to indicate
element type, but ultimately numbers are dependent on numeral systems,
which depend on language and culture.

If instead of using English directly, we define opaque strings of
characters for the tags, such that the tag "author" consistently means
"author", we really achieve the same thing. That is a simple solution
that is language independent.

Or we could use emoji, although I believe most computer users in the
world would have a harder time typing out a given emoji than a given
opaque, ASCII- and English-compatible string.

> 3. Must be machine-parsable.

We should consider the difference between needs and wants here. If I
have no interest in specifying another license to use my work than what
is implied from my sharing it, that doesn't necessarily mean I don't
want to specify date or author, so perhaps all or most elements should
be optional.

>   
> 4. Should affect presentation.
>   
>    gemtext as a whole is about separating content from presentation.
>    Some of the earlier metadata proposals referred to metadata for
>    presentation, e.g. to specify a color to view the text in.  This is
>    against the spirit of gemtext/Gemini (if not the spec).

Agreed, but as I understand it you do *not* want it to affect
presentation.

>   
> 5. Must be difficult to extend.
>   
>    Again, this comes from the general Gemini philosophy that anything
>    that can be misused will be misused.  This rules out lots of current
>    proposals because they specify tags, and the usage of tags can only be
>    controlled by convention, which is subject to change.

What do you propose that prevents conventional use from dictating
reality? And why is it important that the specification can not be
extended? Unlike e.g. text/gemini, if a client doesn't support some
superset of the tags initially specified, there is no degradation. If
in the future we want to extend a meta data format to support e.g.
specifying where, in addition to when, it was written, the clients that
don't support it shouldn't suffer from it.

The only important concern to me is that there is a canonical
description of tags. That description can be extended indefinitely as
far as I'm concerned, for as long as the original meanings of the
initial set of supported tags aren't changed or overloaded by newer
tags.

> 6. Must be accessible.
>   
>    Some proposals discussed the usage of emojis, and others have opted
>    for creating new unofficial line types.  These don't degrade
>    gracefully for things like screen readers, until they adopt the
>    metadata proposal.  That's not great.

Agreed.

I think that instead of defining ourselves what fields are important we
should start from a standard, e.g. DCMI with the element set defined in
IETF RFC 5013.

With that as a basis, if there is no suitable format already, we can
define a human readable, text-compatible data format and a corresponding
text/xyz MIME type. Then, a text/gemini document that feels like
supplying additional metadata can link to a metadata file which the
server serves with the above MIME type. A client that does not support
the MIME type should defer to serving unknown text/* types as plain
text. A client that does support it can localize the elements, including
things like names and date and time formats. If the client is a
crawler, it should find the linked metadata document as a matter of its
normal operation because it is linked from the document.
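The client-side fallback described here amounts to a small dispatch on the response MIME type. `text/xyz` is the placeholder name from the message, and `handling_for` is a hypothetical helper, not an existing client API.

```python
def handling_for(mime: str, supported: set[str]) -> str:
    """Decide how a client treats a response body under the fallback described above."""
    base = mime.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    if base in supported:
        return base          # understood, e.g. metadata fields can be localized
    if base.startswith("text/"):
        return "text/plain"  # unknown text/* subtype: show it as plain text
    return "unsupported"     # e.g. offer to save the body instead
```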

Such formats already exist, but there is little interest in authoring
such files.

In that way, no extension or change to Gemini is necessary. No
specialized sub-formats for existing line types either.

Personally I don't think this is a standard I would use either way.
It's mostly for the benefit of robots that there's a point in
formalizing information like this. Humans can interpret such
information as indicated in the document itself in a much wider variety
of formats. It's not my intention, primarily, to serve robots.

-- 
Philip

6. Solene Rapenne (solene (a) perso.pw)

On Fri, 26 Feb 2021 13:16:31 +0100
Philip Linde <linde.philip at gmail.com>:

> On Fri, 26 Feb 2021 11:51:08 +0100
> nothien at uber.space wrote:
> 
> > Hi!
> > 
> > I've lost track of the currently raging metadata thread entirely, and so
> > I've started this as a new post.
> > 
> > Thus far, I think there's general consensus on the following needs for
> > any metadata proposal:
> > 
> > 1. Must degrade gracefully for clients that don't understand metadata.  
> 
> Agreed.
> 
> > 2. Must not be English-specific.  
> 
> What is the preferable alternative? We could use numbers to indicate
> element type, but ultimately numbers are dependent on numeral systems,
> which depend on language and culture.
> 
> If instead of using English directly, we define opaque strings of
> characters for the tags, such that the tag "author" consistently means
> "author", we really achieve the same thing. That is a simple solution
> that is language independent.
> 
> Or we could use emoji, although I believe most computer users in the
> world would have a harder time typing out a given emoji than a given
> opaque, ASCII- and English-compatible string.

So you wouldn't mind using ?????? instead of "author" if we agree
it can be opaque to the users?

7. nothien (a) uber.space (nothien (a) uber.space)

Philip Linde <linde.philip at gmail.com> wrote:
> > 2. Must not be English-specific.
> 
> What is the preferable alternative? We could use numbers to indicate
> element type, but ultimately numbers are dependent on numeral systems,
> which depend on language and culture.

... Did you read the rest of the e-mail?  I listed specific ways in
which we can support date, author, and license metadata, using existing
formats and conventions, none of which use English (for licenses,
instead of SPDX license identifiers, Petite Abeille has neatly suggested
using links, although I don't agree with their format).

> If instead of using English directly, we define opaque strings of
> characters for the tags, such that the tag "author" consistently means
> "author", we really achieve the same thing. That is a simple solution
> that is language independent.
> 
> Or we could use emoji, although I believe most computer users in the
> world would have a harder time typing out a given emoji than a given
> opaque, ASCII- and English-compatible string.

So you want to force non-English Gemini writers to use English words?
When (seeing my original proposal) it's unnecessary?  Imagine if you had
to end every Gemini document with the magic incantation "?????????G?".
That doesn't seem fun.

> > 3. Must be machine-parsable.
> 
> We should consider the difference between needs and wants here. If I
> have no interest in specifying another license to use my work than what
> is implied from my sharing it, that doesn't necessarily mean I don't
> want to specify date or author, so perhaps all or most elements should
> be optional.

The sole purpose of giving a fixed format to metadata is so that it is
machine parsable; all other metadata can simply be stated using natural
language.  And yes, if you read the rest of my e-mail, you would notice
that everything in it is completely optional.

> > 4. Should affect presentation.
> >   
> >    gemtext as a whole is about separating content from presentation.
> >    Some of the earlier metadata proposals referred to metadata for
> >    presentation, e.g. to specify a color to view the text in.  This is
> >    against the spirit of gemtext/Gemini (if not the spec).
> 
> Agreed, but as I understand it you do *not* want it to affect
> presentation.

Yep, typo.

> > 5. Must be difficult to extend.
> > 
> >   ...
> 
> What do you propose that prevents conventional use from dictating
> reality? And why is it important that the specification can not be
> extended? Unlike e.g. text/gemini, if a client doesn't support some
> superset of the tags initially specified, there is no degradation. If
> in the future we want to extend a meta data format to support e.g.
> specifying where, in addition to when, it was written, the clients
> that don't support it shouldn't suffer from it.
> 
> The only important concern to me is that there is a canonical
> description of tags. That description can be extended indefinitely as
> far as I'm concerned, for as long as the original meanings of the
> initial set of supported tags aren't changed or overloaded by newer
> tags.

Non-extensibility is a fundamental part of the spirit of Gemini.  We
want to prevent metadata from being used for all but the specified
purposes so that it is not misused in the future.  Consider, for
example, a 'color' metadata key that had been suggested early on in the
original metadata thread.  We want to prevent these kinds of misuses
from happening at all.  Notice that my proposal-not-proposal handles
each metadata field on a case-by-case basis; there is no way provided to
handle additional fields.  In addition, I've stated that other metadata
fields, which don't have to be known to search engines, can use an
arbitrary, capsule-specific convention, so that you can use additional
metadata fields internally.

> I think that instead of defining ourselves what fields are important
> we should start from a standard, e.g. DCMI with the element set
> defined in IETF RFC 5013.
> 
> With that as a basis, if there is no suitable format already, we can
> define a human readable, text-compatible data format and a
> corresponding text/xyz MIME type. Then, a text/gemini document that
> feels like supplying additional metadata can link to a metadata file
> which the server serves with the above MIME type. A client that does
> not support the MIME type should defer to serving unknown text/* types
> as plain text. A client that does support it can localize the
> elements, including things like names and date and time formats. If
> the client is a crawler, it should find the linked metadata document
> as a matter of its normal operation because it is linked from the
> document.

This has a few problems:

1. It is extensible.  As I've argued above, we don't want extensibility.
   This would mean that we have to have a very strict format for this
   metadata file, and given how few fields are really necessary to be
   machine-parsable, this would be a very small file.  With my proposal,
   we can embed all the necessary metadata into the existing files.

2. The keys specified in IETF RFC 5013 are English-specific.  As I've
   explained in my original mail, this is not sustainable for
   non-English Gemini clients and writers, as either the writers are
   forced to use English (bad), or the clients are forced to support the
   same keywords across a /lot/ of languages (bad).

> Personally I don't think this is a standard I would use either way.
> It's mostly for the benefit of robots that there's a point in
> formalizing information like this. Humans can interpret such
> information as indicated in the document itself in a much wider
> variety of formats. It's not my intention, primarily, to serve robots.

Many gemlogs use the gmisub format, which is essentially providing date
metadata.  There are uses, and making your content understandable to
'robots' will also make it understandable to the users behind them.  One
particularly helpful area that my proposal-not-proposal provides for is
basic search engine filtering (by date, author, and license).

~aravk | ~nothien

8. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 13:54, Solene Rapenne <solene at perso.pw> wrote:
> 
> So you wouldn't mind using ?????? instead of "author" if we agree
> it can be opaque to the users?

Wow. Brutal :P

One thing is crystal clear: we do not share any common ground whatsoever 
on what "metadata" means :D

You should go talk to the RDF people [1]. They would love your collective input :)

?0?


[1] https://en.wikipedia.org/wiki/Resource_Description_Framework

9. nothien (a) uber.space (nothien (a) uber.space)

Petite Abeille <petite.abeille at gmail.com> wrote:
> Even though I personally prefer using the existing capabilities of the
> link construct, which is the only structured artifact in text/gemini,
> e.g.:
> 
> LICENSE
> => https://creativecommons.org/publicdomain/zero/1.0/ rel=license Licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
> 
> AUTHOR
> => gemini://gemini.circumlunar.space/users/solderpunk/solderpunk.vcf rel=author Authored by The One & Only Solderpunk
> 
> DATE
> => tag:gemini.circumlunar.space,2020-05-26:/dns/gemini.circumlunar.space/tcp/1965/gemini/users/solderpunk/gemlog/the-mercury-protocol.gmi Created on May 26th 2020

The primary issue with this is that you're relying on English words
('LICENSE', 'AUTHOR', 'DATE'), which isn't sustainable or accessible.
The same holds for the rel= type tagging.  In addition, it is
extensible, which (as I've detailed) is not necessary and is against the
spirit of Gemini.

Typing out these links is also going to be time-consuming, which is
irritating when you're in the middle of writing a post.  It's like we're
adding boilerplate text; we want to make it as small and unnoticeable as
possible, which is contradictory to using links.

Finally, (and this is a personal opinion), I don't like
non-network-related schemes (except for mailto).  I also think that a
lot of the information is repetitive.

However, I think that using a URL for licenses isn't a bad idea, because
it removes dependency upon SPDX license identifiers, which are
English-oriented.  At the same time, I don't know of any licenses which
are written in other languages (please correct me if I'm wrong), which
means that anybody licensing their work is using a license written in
English, and so they may as well use the SPDX license identifier.

~aravk | ~nothien

10. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 14:17, nothien at uber.space wrote:
> 
> The primary issue with this is that you're relying on English words

Thank you for reaching out.

But really... it's all good. We have no common ground to share in this 
conversation; that much is clear.

No drama though. This is all a hobby. No one cares :)

In the memorable words of President Dale, of Mars Attacks! fame:

"Why can't we work out our differences? Why can't we work things out? 
Little people, why can't we all just get along?"

Sometimes, there is nothing whatsoever to work out at all. 

Mutual indifference is best :)

?0?

11. Jason McBrayer (jmcbray (a) carcosa.net)

nothien at uber.space writes:
> ## Conclusion
>
> I don't think we need a 'metadata proposal' to achieve the goals we're
> looking for.  The format conventions are already mostly in place; we
> just need to formalize them.

That's pretty fair. I'd add that metadata is not really all it's cracked
up to be, in an open system like Gemini (or the web). It works fine
within a site or an organization, but it almost never works to depend on
"other people's" metadata for anything.

Cory Doctorow wrote about it in 2001:
https://people.well.com/user/doctorow/metacrap.htm 


-- 
Jason McBrayer      | "Strange is the night where black stars rise,
jmcbray at carcosa.net | and strange moons circle through the skies,
                    | but stranger still is lost Carcosa."
                    | - Robert W. Chambers, The King in Yellow

12. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 14:30, Jason McBrayer <jmcbray at carcosa.net> wrote:
> 
> Cory Doctorow wrote about it in 2001:
> https://people.well.com/user/doctorow/metacrap.htm 

Ohhhhh! METACRAP! Blast from the past. Thanks :)

And yet. All those endless conversations, 20 years ago, did bear fruit. 
And helped move the discussion along. Except for RDF. :P

?0?

13. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 14:30, Jason McBrayer <jmcbray at carcosa.net> wrote:
> 
> https://people.well.com/user/doctorow/metacrap.htm 

Here is another one:

Semantic Web, proper noun An attempt to apply the Dewey Decimal system to an orgy.

The good old days :D

?0?

14. Petite Abeille (petite.abeille (a) gmail.com)



> On Feb 26, 2021, at 11:51, nothien at uber.space wrote:
> 
> I've lost track of the currently raging metadata thread entirely, and so
> I've started this as a new post.

TGIF silliness:

meta, modifier Masturbatory, but without the potential for a satisfying conclusion.

"Let's discuss the mailing list's meta-issues."

?0?

15. Oliver Simmons (oliversimmo (a) gmail.com)

On Fri, 26 Feb 2021 at 10:51, <nothien at uber.space> wrote:
>
> I've lost track of the currently raging metadata thread entirely, and so
> I've started this as a new post.

Good choice :)



> 3. Must be machine-parsable.
>
>    Search engines, archivers, and other crawler-style clients need to be
>    attended to.  Some of the information they need is: date, author, and
>    license.

Every form of somewhat organised info is "machine readable", only
sentences and stuff aren't.
(although ML is getting really good - but I don't think anyone wants to use that)



> 5. Must be difficult to extend.
>
>    Again, this comes from the general Gemini philosophy that anything
>    that can be misused will be misused.  This rules out lots of current
>    proposals because they specify tags, and the usage of tags can only be
>    controlled by convention, which is subject to change.

We use a text-based format, so this is semi-bogus.
I can easily add stuff, such as styling, to my documents *without* a
tag format and make software to support it - without extending the
spec.
Gemini wants to be "non-extensible", but having freeform text breaks that.
This is an unfixable problem though, and just a side effect of what Gemini is.



> ## Dates
>
> My proposal with dates is to use what we already have - the gmisub companion spec.
[...]
> Search engines
> and crawlers can still choose to include date information based on when
> they last crawled the page.

This would only really work for things that are looking at sites as a
whole, mainly search engines.
My issue with these metadata in separate location ideas is that it
creates additional work, and network requests, to get the info about
one file.
Also for more one-off things with dates, creating gmisub stuff for it
is slightly overboard.



> ## Licenses

This is really nice, I didn't know there was a convention for it.

> ## Authors
>
> There are two possibilities I see with author metadata: either take it
> from the license line, discussed above, or extend the gmisub spec to
> also allow for an optional author field.

See above about gmisub.
The licence line makes most sense to me, however not everyone adds
licenses (meaning they get copyright), and may still want their name
on it, the current method of licence-first doesn't work in this case.

- Oliver Simmons
-- DBAD

(`- name` is how I sign my emails and stuff when I remember)
There's probably many other ways this could be done, the above was
just a quickly typed example.

In the example I have pointed out a second issue - licenses that aren't in SPDX.
I'm not entirely sure what SPDX is, but from a quick search it appears
it doesn't contain the DBAD license (which is what I personally use
for stuff I really don't care about).

=> https://dbad-license.org/



> ## Other Fields
>
> Clearly, other fields aren't supported by this.  If you want to place
> additional metadata in your content, then I suggest writing it in
> natural language.  If it is absolutely necessary to have it
> machine-parsable (so that it can be specially understood by e.g. search
> engines) then we can talk about that here on the ML, but others have
> argued against e.g. tags because they allow easily manipulating search
> results.  Expect resistance.

Agreed on this, tag metadata formats are just a catch-all, and
catch-alls are typically bad.



> ## Conclusion
>
> I don't think we need a 'metadata proposal' to achieve the goals we're
> looking for.  The format conventions are already mostly in place; we
> just need to formalize them.

I agree that the catch-all metadata proposals are unneeded, I think we
should stop with them.
I would also think we should start calling them catch-all metadata or
something similar, there's a distinction between a generic format that
allows any metadata, and dedicated formats for individual pieces of
metadata, such as dates, authors and licenses.



- Oliver Simmons

16. nothien (a) uber.space (nothien (a) uber.space)

Glad to see we agree on most of the details.

Oliver Simmons <oliversimmo at gmail.com> wrote:
> Every form of somewhat organised info is "machine readable", only
> sentences and stuff aren't.

My point is that thus far, we've kept Gemini software from trying to
read any natural language.  I want to keep it that way.

> > 5. Must be difficult to extend.
> >
> >    ...
> 
> We use a text-based format, so this is semi-bogus.  I can easily add
> stuff, such as styling, to my documents *without* a tag format and
> make software to support it - without extending the spec.  Gemini
> wants to be "non-extensible", but having freeform text breaks that.
> This is an unfixable problem though, and just a side effect of what
> Gemini is.

I disagree, because although you _can_ use additional syntaxes and write
software to support them, Gemini is already too big for any one person's
content alone to make such a syntax a shared consensus.  However, I want
to stay on the safe side (in particular, with the 'individual actor'
assumption), and so I'm trying to prevent adding new areas of
extensibility.

> > ## Dates
> >
> > My proposal with dates is to use what we already have - the gmisub
> > companion spec.
> > [...]
> > Search engines and crawlers can still choose to include date
> > information based on when they last crawled the page.
> 
> This would only really work for things that are looking at sites as a
> whole, mainly search engines.  My issue with these metadata in
> separate location ideas is that it creates additional work, and
> network requests, to get the info about one file.  Also for more
> one-off things with dates, creating gmisub stuff for it is slightly
> overboard.

You're right, this only works for crawlers.  What other use cases can
you think of where the software has to know the date of external
content?

If there are actual use cases, then we can probably tweak the license
line format, using full dates instead of just the year.  For example:

 ```Example license line using an ISO 8601 date
-- © 2021-02-26 nothien
 ```
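If the license line gained a full ISO 8601 date, pulling it out takes a single regular expression. This sketch assumes the shape `-- [symbol] YYYY-MM-DD owner`, with the leading symbol (a copyright sign in the example) optional; `parse_dated_line` is a hypothetical name.

```python
import re
from datetime import date
from typing import Optional, Tuple

# "-- [symbol] YYYY-MM-DD owner"; the symbol before the date is optional.
DATED = re.compile(r"^--\s+(?:\S+\s+)?(\d{4}-\d{2}-\d{2})\s+(.+)$")

def parse_dated_line(line: str) -> Optional[Tuple[date, str]]:
    """Return (date, owner) from a dated license line, or None if it has no valid date."""
    m = DATED.match(line.strip())
    if not m:
        return None
    try:
        return date.fromisoformat(m.group(1)), m.group(2).strip()
    except ValueError:
        return None  # date-shaped but not a real calendar date
```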

> > ## Licenses
> 
> This is really nice, I didn't know there was a convention for it.

Yeah, it's pretty neat!

> > ## Authors
> >
> > There are two possibilities I see with author metadata: either take
> > it from the license line, discussed above, or extend the gmisub spec
> > to also allow for an optional author field.
> 
> See above about gmisub.  The licence line makes most sense to me,
> however not everyone adds licenses (meaning they get copyright), and
> may still want their name on it, the current method of licence-first
> doesn't work in this case.

The format can be tweaked, sure.  But I think it's more important first
to agree that this is the way to go before trying to choose a specific
syntax.

> In the example I have pointed out a second issue - licenses that
> aren't in SPDX.  I'm not entirely sure what SPDX is, but from a quick
> search it appears it doesn't contain the DBAD license (which is what I
> personally use for stuff I really don't care about).
> 
> => https://dbad-license.org/

Oh no.  I don't want non-SPDX licenses to be second-class citizens.  The
most 'correct' way to deal with this in Gemini, IMO, is to use a URL (so
a link line) to the license, but I'm worried about the extra typing
needed, which would put authors off typing out license lines.
Boilerplate is a powerful tool for generating irritation.  However,
short links in most cases (e.g. 'gemini://spdx.dev/GPL-3.0-or-later',
although this doesn't work) should help ease that issue, while allowing
for other licenses to be used.  Of course, this would require reworking
the syntax, but as I said above, that's fine.  Thoughts?
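One way to keep link-style license lines short while still letting non-SPDX licenses in: derive the URL from the SPDX identifier when there is one, and accept an explicit URL otherwise. This sketch swaps the non-working `gemini://spdx.dev/` idea for the SPDX License List's HTTPS pages (`https://spdx.org/licenses/<id>.html`); the `=> URL name owner` layout and `license_link` itself are assumptions.

```python
from typing import Optional

SPDX_PAGES = "https://spdx.org/licenses/"  # SPDX License List pages: <id>.html

def license_link(owner: str, *, spdx_id: Optional[str] = None,
                 url: Optional[str] = None, name: Optional[str] = None) -> str:
    """Build a hypothetical '=> URL name owner' license line."""
    if spdx_id is not None:
        url = url or f"{SPDX_PAGES}{spdx_id}.html"  # writer types only the short id
        name = name or spdx_id
    if url is None or name is None:
        raise ValueError("give an SPDX id, or both a license URL and a name")
    return f"=> {url} {name} {owner}"
```

With this, `license_link("Oliver Simmons", url="https://dbad-license.org/", name="DBAD")` produces a line of the same shape as an SPDX one, so neither kind is second-class.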

> > ## Conclusion
> 
> I agree that the catch-all metadata proposals are unneeded, I think we
> should stop with them.  I would also think we should start calling
> them catch-all metadata or something similar, there's a distinction
> between a generic format that allows any metadata, and dedicated
> formats for individual pieces of metadata, such as dates, authors and
> licenses.

Not a bad idea.

~aravk | ~nothien
