It was thus said that the Great Björn Wärmedal once stated:
>
> > but unreasonable for it to have to urlencode the path (a common encoding
> > for which libraries are ubiquitous)?
>
> Because - as I tried to point out - there is no reasonably simple
> heuristic for determining whether a URL is already percent encoded or not.
> And percent encoding a URL that is already percent encoded exchanges all %
> characters with %25. Attempting to punycode a domain name that is already
> punycoded, however, changes nothing at all. No heuristics are needed, the
> client can just punycode everything.

I can't say for certain what most clients do, but I'm under the impression
that some (the majority?) use an existing library to parse links. The
specification states that relative links are allowed in text/gemini:

=> ../%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E.txt Some 𝒻𝒶𝓃𝒸𝓎 stuff here

but a full URI needs to be sent to the server, so some processing of the
link is required (specifically, section 5.2 of RFC-3986). And existing
libraries help here. The library I'm currently using will parse the above
link into the following structure:

    {
      path = "../𝒻𝒶𝓃𝒸𝓎.txt"
    }

Note how the text has been translated and any percent encoding has been
decoded. Next, the base URL of the page:

    gemini://example.com/files/others/

has previously been parsed (because it was needed to retrieve the page
currently being viewed):

    {
      path = "/files/others/",
      port = 1965.000000,
      host = "example.com",
      scheme = "gemini",
    }

The two are then merged into a single reference:

    {
      path = "/files/𝒻𝒶𝓃𝒸𝓎.txt",
      port = 1965.000000,
      host = "example.com",
      scheme = "gemini",
    }

Then, to make a request, this new link is converted into a URI:

    gemini://example.com/files/%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E.txt

As you can see, that process has re-encoded the path, percent-encoding it.
I would expect that some (the majority?) of clients are doing something
similar to this---decoding the percent-encoding, merging references, then
converting back to percent-encoding (except for the host, which needs to be
converted to punycode).

It would be instructive to know how clients are handling this---do they
decode percent-encoded data, merge the base link with the relative link and
re-encode? Or something different?

  -spc
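
For anyone who wants to experiment with the decode, merge, re-encode sequence described above, here is a rough sketch in Python using only the standard library. It is not the library I'm using, and not what any particular client does. The names resolve() and remove_dot_segments() are made up for illustration, the dot-segment removal is a simplified take on section 5.2.4 of RFC-3986, and the reference is assumed to be a plain relative path (no scheme, host, or query). The merge is done by hand because, as far as I can tell, urllib.parse.urljoin() leaves relative references untouched for schemes it does not know about, such as gemini.

```python
from urllib.parse import quote, unquote, urlsplit

DEFAULT_PORT = 1965  # Gemini's default port, as in the parsed structures above


def remove_dot_segments(path):
    """Drop "." and ".." segments (simplified form of RFC 3986, section 5.2.4)."""
    out = []
    for seg in path.split("/"):
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment (the root)
                out.pop()
        elif seg != ".":
            out.append(seg)
    if path.endswith(("/.", "/..")):
        out.append("")  # a path ending in a dot segment names a directory
    return "/".join(out)


def resolve(base, ref):
    """Hypothetical helper: decode, merge, then re-encode a relative path."""
    b = urlsplit(base)
    base_path = unquote(b.path) or "/"
    # Caveat: decoding first turns an encoded "%2F" into a real "/".
    ref_path = unquote(ref)
    if ref_path.startswith("/"):
        merged = ref_path
    else:
        # Keep the base path up to and including its last "/".
        merged = base_path[: base_path.rfind("/") + 1] + ref_path
    path = remove_dot_segments(merged)
    # Punycode the host; an already-ASCII host passes through unchanged.
    host = b.hostname.encode("idna").decode("ascii")
    port = "" if b.port in (None, DEFAULT_PORT) else f":{b.port}"
    return f"{b.scheme}://{host}{port}{quote(path)}"


if __name__ == "__main__":
    link = "../%F0%9D%92%BB%F0%9D%92%B6%F0%9D%93%83%F0%9D%92%B8%F0%9D%93%8E.txt"
    print(resolve("gemini://example.com/files/others/", link))
    # Percent-encoding twice is not harmless: "%" itself becomes "%25".
    print(quote("%F0"))  # %25F0
```

Because the path is always decoded before it is re-encoded, the sketch never needs to guess whether its input was already percent-encoded, which sidesteps the double-encoding problem from the quoted message. The price is that an encoded slash in the original link is not preserved.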