💾 Archived View for gemi.dev › gemini-mailing-list › 000398.gmi captured on 2023-11-04 at 12:46:45. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Asian Text in URLs

mieum <mieum (a) tilde.team>

Hi Everyone,

I recently decided to start an experimental Gemlog written in Korean. I
found, however, that some clients do not like having Asian text in the
path of the URL---either as the name of a folder or a file.

What is the general consensus about non-latin characters in URLs? The
Gemini Spec doesn't say anything that would lead me to believe I
shouldn't name my files and folders in Korean. Also, for most clients it
is apparently a non-issue. 

For an example, you can point your client to gemini://namu.blue/쌍록/

Some clients will crash, some will not attempt to load the page, and
some will just return a Permanent Failure. Some will even crash 
attempting to parse the link to that page on gemini://namu.blue/. 

Anyway, I just wanted to see what everyone thinks about the best 
practice in this situation. 

Thanks!
~mieum

gemini://rawtext.club/~mieum/

Sandra Snan <sandra.snan (a) idiomdrottning.org>

https://stackoverflow.com/questions/2742852/unicode-characters-in-urls
Gemini uses URLs.

So ? ??, but, percent encoded.

Sean Conner <sean (a) conman.org>

It was thus said that the Great mieum once stated:
> Hi Everyone,
> 
> I recently decided to start an experimental Gemlog written in Korean. I
> found, however, that some clients do not like having Asian text in the
> path of the URL---either as the name of a folder or a file.
> 
> What is the general consensus about non-latin characters in URLs? 

  The actual specification for URLs is RFC-3986, and that lists the valid
characters for the path portion of a URL, which are:

	abcdefghijklmnopqrstuvwxyz
	ABCDEFGHIJKLMNOPQRSTUVWXYZ
	0123456789
	-._~
	!$&'()*+,;=:@

  ANY OTHER CHARACTER has to be percent-encoded.  
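
To illustrate the rule above, here is a minimal Python sketch that percent-encodes a single path segment by hand. The helper name `percent_encode_segment` is hypothetical, and the sample input is the pair of Hangul syllables obtained by decoding the `%EC%8C%8D%EB%A1%9D` example given in this thread. Treat it as an illustration under those assumptions, not as what any particular client or server does.

```python
import string

# Characters that may appear literally in a path segment (RFC 3986 "pchar",
# minus percent-escapes): unreserved characters, sub-delimiters, ":" and "@".
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS_COLON_AT = "!$&'()*+,;=:@"
ALLOWED = frozenset(UNRESERVED + SUB_DELIMS_COLON_AT)


def percent_encode_segment(segment: str) -> str:
    """Percent-encode one path segment (hypothetical helper)."""
    pieces = []
    for ch in segment:
        if ch in ALLOWED:
            pieces.append(ch)
        else:
            # Everything else is written as one %XX triplet per UTF-8 byte.
            pieces.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(pieces)


print(percent_encode_segment("\uC30D\uB85D"))  # %EC%8C%8D%EB%A1%9D
```

Because `/` is not in the allowed set, this helper is only meant for one segment at a time; a full path would be split on `/` first.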


> For an example, you can point your client to gemini://namu.blue/쌍록/

  It should be:

	gemini://namu.blue/%EC%8C%8D%EB%A1%9D/

The server will then have to decode the percent-encoded data to get the
proper file to use.
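
The server-side decoding step might look roughly like the following Python sketch. The function name `decoded_request_path` is an assumption for illustration, and mapping the decoded path onto an actual file is left out.

```python
from urllib.parse import unquote, urlsplit


def decoded_request_path(url: str) -> str:
    """Return the request path with percent-escapes decoded as UTF-8."""
    raw_path = urlsplit(url).path
    # errors="strict" raises UnicodeDecodeError on escapes that are not valid
    # UTF-8, so the server can answer with a failure instead of guessing.
    return unquote(raw_path, encoding="utf-8", errors="strict")


path = decoded_request_path("gemini://namu.blue/%EC%8C%8D%EB%A1%9D/")
print(path == "/\uC30D\uB85D/")  # True
```

A real server would still need to check the decoded path for `..` segments before touching the filesystem; that check is not shown here.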
  
> Anyway, I just wanted to see what everyone thinks about the best 
> practice in this situation. 

  Ensure you are using UTF-8 on the server, and percent-encode the path when
generating the URL.
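
As a sketch of that advice, a link generator could build URLs with Python's standard `urllib.parse.quote`. The helper name `gemini_url` and the `SAFE` constant are assumptions made for this example, and it expects a recent Python 3, an ASCII host name, and a path that has not already been encoded.

```python
from urllib.parse import quote

# Kept literally: "/" as the segment separator plus the RFC 3986
# sub-delimiters, ":" and "@". quote() never escapes letters, digits or "-._~".
SAFE = "/!$&'()*+,;=:@"


def gemini_url(host: str, path: str) -> str:
    """Build a gemini:// URL from a host and an unencoded path (hypothetical helper)."""
    return "gemini://" + host + quote(path, safe=SAFE, encoding="utf-8")


print(gemini_url("namu.blue", "/\uC30D\uB85D/"))
# gemini://namu.blue/%EC%8C%8D%EB%A1%9D/
```

Passing an already-encoded path would turn each `%` into `%25`, so the encoding should happen exactly once, at the point where the URL is generated.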

  -spc

mieum <mieum (a) tilde.team>

Thank you, Sandra! I was unaware of this!
Sorry to ask such a basic question here!

~mieum

mieum <mieum (a) tilde.team>

Thanks Sean, I appreciate your thorough response! I'll look into
properly encoding everything :)

Sandra Snan <sandra.snan (a) idiomdrottning.org>

Now, I wouldn't be opposed to Gemini instead using IRIs with utf8mb4,
supporting both RTL and LTR. I rail against change on here, but that would
be a welcome one. There are some severe security risks that come with
such a change, so it's not to be undertaken lightly, as discussed earlier
with the NN/.. issue.

