Unicode vs. the World

On Thu, Dec 17, 2020 at 2:39 AM Björn Wärmedal <bjorn.warmedal at gmail.com>
wrote:

> How can the client tell if it's percent encoded or not? If you start
> by decoding it you distort the filename. If you just assume it isn't
> percent encoded and go ahead and do that you will handle this link
> correctly but break any links that are already percent encoded.

Exactly.  To make things worse, space is a protocol element in link lines
and *can't* be left unencoded by the author, whichever way we choose.


> We can decide to *always* percent encode links in gemtext (as the spec
> states now) or to *never* do it, but I don't see how we can reasonably
> have both.


I agree.  But what we can have (and it's messy, but not as messy as the
alternatives) is "authors encode percent and space" and "clients encode all
other reserved and non-ASCII characters."
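
The author's half of that split could be sketched in Python like this. The helper name `author_encode` is hypothetical and not anything the spec defines; it escapes only percent and space and leaves everything else, non-ASCII included, for the client.

```python
def author_encode(target: str) -> str:
    """Escape only the two characters the author is responsible for.

    Hypothetical helper: a sketch of the "authors encode percent and
    space" rule, not part of any Gemini specification.
    """
    # Escape '%' first, so the '%' introduced by '%20' is not escaped again.
    return target.replace("%", "%25").replace(" ", "%20")


# The Cyrillic text is left as-is for the client to deal with.
print(author_encode("//teddybearoftheyear.com/vote?Иван Грозный"))
```

The order of the two replacements matters: doing the space first would turn the resulting `%20` into `%2520`.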

> Consider another hypothetical case:
>
> => teddybearoftheyear.com/vote?ew0k%20The%20Great Vote for me!

That's the best you can do.  But in the case where the link line is

=> teddybearoftheyear.com/vote?Иван%20Грозный Голосуй за меня! [1] [2] [3]

then the client must translate it for sending over the wire into

gemini://teddybearoftheyear.com/vote?%D0%98%D0%B2%D0%B0%D0%BD%20%D0%93%D1%80%D0%BE%D0%B7%D0%BD%D1%8B%D0%B9

because making the author type all that is wholly abominable.  Online
URL-encoders are not much help, because many of them produce form-style
encoding, which gives you + instead of %20.
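
The client's half might look roughly like the following Python sketch, using the standard library's `urllib.parse.quote`. The name `client_encode` and the choice of "safe" characters are assumptions for illustration: which reserved characters a client ought to escape depends on the URI component, so this sketch simply leaves them (and any existing `%`) alone and encodes everything else as UTF-8 percent-escapes. A non-ASCII hostname would need separate treatment, which is not attempted here.

```python
from urllib.parse import quote, quote_plus

# Assumption: '%' and the RFC 3986 reserved characters pass through
# untouched, so the author's %20 survives and delimiters keep their meaning.
RESERVED_AND_PERCENT = "%:/?#[]@!$&'()*+,;="


def client_encode(iri: str) -> str:
    """Percent-encode non-ASCII (as UTF-8) and other unsafe characters."""
    return quote(iri, safe=RESERVED_AND_PERCENT)


print(client_encode("gemini://teddybearoftheyear.com/vote?Иван%20Грозный"))

# The + versus %20 difference: quote_plus does form-style encoding.
print(quote("Иван Грозный"))       # space becomes %20
print(quote_plus("Иван Грозный"))  # space becomes +
```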

[1] This is Ivan the Terrible, who for most of his life was actually a
quite effective tsar despite his (occupational) paranoia and a serious
outbreak of madness just before he died; a better translation would be
"Ivan the Formidable".  Still, nobody would call him a teddy bear (and so
his ukase "Vote for me!" would probably be in vain).

[2] The latest spec change makes this line incorrect unless
"teddybearoftheyear.com/vote" is to be interpreted as a relative path.  It
needs to be prefixed by "gemini://" or at the very least "//".

[3] If the space had not been %-encoded by the author, the Tsar's second
name would be part of the link name and not part of the IRI.
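
To make footnote [3] concrete, here is a rough Python sketch of how a client might split a link line. The function name `parse_link_line` is hypothetical, and the splitting rule (target ends at the first run of whitespace) is a simplified reading of the link-line format, not a full gemtext parser.

```python
from typing import Optional, Tuple


def parse_link_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a '=>' link line into (target, link name).

    Hypothetical helper: the target ends at the first whitespace, and
    whatever follows is the human-readable link name.
    """
    if not line.startswith("=>"):
        return None
    parts = line[2:].strip().split(maxsplit=1)
    if not parts:
        return None
    target = parts[0]
    name = parts[1] if len(parts) > 1 else ""
    return target, name


# Space encoded by the author: the whole name stays in the target.
print(parse_link_line("=> //teddybearoftheyear.com/vote?Иван%20Грозный Vote for me!"))

# Space left raw: the second name leaks into the link name.
print(parse_link_line("=> //teddybearoftheyear.com/vote?Иван Грозный Vote for me!"))
```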



John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
'My young friend, if you do not now, immediately and instantly, pull
as hard as ever you can, it is my opinion that your acquaintance in the
large-pattern leather ulster' (and by this he meant the Crocodile) 'will
jerk you into yonder limpid stream before you can say Jack Robinson.'
        --the Bi-Coloured-Python-Rock-Snake