What is required to be IRI compliant?

On Mon, 28 Dec 2020 12:41:15 +0100
"Solderpunk" <solderpunk at posteo.net>:

> On Mon Dec 28, 2020 at 12:15 PM CET, Solene Rapenne wrote:
> 
> > Requests such as the following are working well:
> >
> > - gemini://ho?t/? ?.gmi
> > - gemini://?//??.gmi
> >
> > Honestly, I am very surprised it works  
> 
> Me too!  Are you using a third party library to parse URIs/IRIs, or did
> you implement it yourself?  People have acted like there is no easy
> availability of reliable libraries for this kind of thing in C.  If that
> is false, it would be very good to know.
> 
> To be fair, for a server, in addition to being able to parse the request
> IRI, there is also possibly the need to normalise it, e.g. the
> server's idea of its domain name might involve two separate characters
> (a basic vowel plus an accent symbol, say) while the request's version
> uses a single combined character (or the other way around).  We might
> spec that one form is required, but robustness would require checking.
> It might be that *this* is the really hard requirement, rather than
> simply parsing.
> 
> (servers seem to get off lighter than clients, as they don't e.g. need
> to do DNS lookups or resolve relative URLs - which, by the way, seems to
> be the correct terminology, not "absolutise" as I've confused people
> with earlier)
> 
> Cheers,
> Solderpunk
> 
> > in my code everything are simple char arrays (in C), but it does.
> >
> > Are there other specifics handling required for being
> > IRI compliant? I'm not sure I understood exactly what
> > it means.  
> 

The code doesn't check anything; it only serves what is requested [1].

I don't understand what you mean by normalizing the request.
For the hostname, I see no reason to write "écrire.hostname" as
"e'crire.hostname", if that is what you mean.
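As far as I can tell, the normalisation concern is that the same visible
hostname can arrive as two different byte sequences. Below is a minimal
sketch of that, assuming the request is UTF-8. The names host_composed,
host_decomposed, same_bytes and print_bytes are made up for illustration
and are not part of the code in [1].

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "écrire.hostname" with a precomposed é (U+00E9, UTF-8 bytes C3 A9) */
const char host_composed[] = "\xc3\xa9" "crire.hostname";

/* same text as a plain 'e' followed by a combining acute accent
 * (U+0301, UTF-8 bytes CC 81) */
const char host_decomposed[] = "e\xcc\x81" "crire.hostname";

/* byte-wise comparison, which is all a plain char array server can do */
int
same_bytes(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a, b) == 0;
}

/* print a string as hex bytes to make the difference visible */
void
print_bytes(const char *s)
{
	for (; *s != '\0'; s++)
		printf("%02x ", (unsigned char)*s);
	printf("\n");
}
```

Both strings render the same on screen, yet same_bytes() reports them as
different, so a server comparing the requested hostname against its
configured one would only match if both sides use the same form.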

What I see as an issue would be people using punycode if we go
with IRIs. That would mean the server has to compute the punycode
form of its hostname in order to match a request that uses punycode.

A library will certainly be required for that.
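To give an idea of what such a library has to do, here is a rough sketch
of the encoding step from RFC 3492 for a single hostname label. It
assumes the label has already been decoded from UTF-8 into an array of
code points, does no overflow checking, and skips the case mapping and
normalisation that full IDNA processing would also need. The function
name punycode_encode and its signature are my own choice for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* parameters fixed by RFC 3492 */
#define BASE         36u
#define TMIN         1u
#define TMAX         26u
#define SKEW         38u
#define DAMP         700u
#define INITIAL_BIAS 72u
#define INITIAL_N    128u

/* map a digit value 0..35 to 'a'..'z' then '0'..'9' */
static char
digit(uint32_t d)
{
	return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* bias adaptation function from RFC 3492 section 6.1 */
static uint32_t
adapt(uint32_t delta, uint32_t numpoints, int first)
{
	uint32_t k = 0;

	delta = first ? delta / DAMP : delta / 2;
	delta += delta / numpoints;
	while (delta > ((BASE - TMIN) * TMAX) / 2) {
		delta /= BASE - TMIN;
		k += BASE;
	}
	return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/*
 * encode len code points from in[] into out[] (without the "xn--" prefix)
 * returns 0 on success, -1 if out[] is too small
 */
int
punycode_encode(const uint32_t *in, size_t len, char *out, size_t outsz)
{
	uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
	size_t h, b = 0, o = 0, i;

	assert(in != NULL && out != NULL);
	if (outsz == 0)
		return -1;

	/* the ASCII code points are copied first, in order */
	for (i = 0; i < len; i++) {
		if (in[i] < 0x80) {
			if (o + 1 >= outsz)
				return -1;
			out[o++] = (char)in[i];
			b++;
		}
	}
	h = b;
	if (b > 0) {
		if (o + 1 >= outsz)
			return -1;
		out[o++] = '-';
	}

	/* then each non-ASCII code point, smallest value first */
	while (h < len) {
		uint32_t m = UINT32_MAX;

		for (i = 0; i < len; i++)
			if (in[i] >= n && in[i] < m)
				m = in[i];
		delta += (m - n) * (uint32_t)(h + 1);
		n = m;
		for (i = 0; i < len; i++) {
			if (in[i] < n)
				delta++;
			if (in[i] == n) {
				uint32_t q = delta, k, t;

				/* write delta as a variable-length integer */
				for (k = BASE;; k += BASE) {
					t = k <= bias ? TMIN :
					    k >= bias + TMAX ? TMAX : k - bias;
					if (q < t)
						break;
					if (o + 1 >= outsz)
						return -1;
					out[o++] = digit(t + (q - t) % (BASE - t));
					q = (q - t) / (BASE - t);
				}
				if (o + 1 >= outsz)
					return -1;
				out[o++] = digit(q);
				bias = adapt(delta, (uint32_t)(h + 1), h == b);
				delta = 0;
				h++;
			}
		}
		delta++;
		n++;
	}
	out[o] = '\0';
	return 0;
}
```

With the code points of "bücher" (b, U+00FC, c, h, e, r) this gives
"bcher-kva", so the hostname label on the wire would be "xn--bcher-kva".
A server that wants to accept both spellings would compare the request
against both its UTF-8 hostname and this encoded form.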


[1]:
 ```
/*
 * look for the first / after the hostname
 * in order to split hostname and uri
 */
pos = strchr(request, '/');

if (pos != NULL) {
	/* if there is a / found */
	/* separate hostname and uri */
	estrlcpy(file, pos, sizeof(file));
	/* just keep hostname in request */
	pos[0] = '\0';

	/*
	 * use a default file if no file is requested; this
	 * can happen in two cases: gemini://hostname/ and
	 * gemini://hostname/directory/
	 */
	if (strlen(file) == 0)
		estrlcpy(file, "/index.gmi", sizeof(file));
	if (file[strlen(file) - 1] == '/')
		estrlcat(file, "index.gmi", sizeof(file));
} else {
	/*
	 * there is no slash / in the request
	 */
	estrlcpy(file, "/index.gmi", sizeof(file));
}
estrlcpy(hostname, request, sizeof(hostname));
 ```
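
A likely reason this works on non-ASCII requests without any special
handling: in UTF-8, every byte of a multi-byte character has its high
bit set, so none of them can be mistaken for '/' (0x2F) by strchr().
Here is a self-contained sketch of the same split, assuming the
"gemini://" prefix was removed beforehand. The function split_request
is hypothetical and uses snprintf() in place of the estrlcpy() and
estrlcat() helpers from the snippet above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * split "hostname/path" into hostname and file, appending index.gmi
 * when the path is missing or ends with a slash
 * returns 0 on success, -1 if one of the buffers is too small
 */
int
split_request(const char *request, char *hostname, size_t hostsz,
    char *file, size_t filesz)
{
	const char *pos;
	size_t hostlen;
	int n;

	assert(request != NULL && hostname != NULL && file != NULL);

	/* first / after the hostname; UTF-8 bytes >= 0x80 never match */
	pos = strchr(request, '/');
	hostlen = pos != NULL ? (size_t)(pos - request) : strlen(request);

	if (hostlen >= hostsz)
		return -1;
	memcpy(hostname, request, hostlen);
	hostname[hostlen] = '\0';

	if (pos == NULL)
		/* no slash at all in the request */
		n = snprintf(file, filesz, "/index.gmi");
	else if (pos[strlen(pos) - 1] == '/')
		/* hostname/ or hostname/directory/ */
		n = snprintf(file, filesz, "%sindex.gmi", pos);
	else
		n = snprintf(file, filesz, "%s", pos);

	return (n < 0 || (size_t)n >= filesz) ? -1 : 0;
}
```

The bytes are passed through untouched, so whatever form the client
sent (precomposed, decomposed or punycode) is what gets looked up on
disk and compared against the configured hostname.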
