[spec] IRIs, IDNs, and all that international jazz

It was thus said that the Great Petite Abeille once stated:
> 
> 
> > On Dec 27, 2020, at 00:22, Sean Conner <sean at conman.org> wrote:
> > 
> >  For the record, my own URL parsing library will just return 
> > 
> > 	Research/A/B Testing/Results
> 
> Tragic. I take back my assessment of your LPEG grammar. It's clearly
> wrong. Oh well.

  Okay, given your two examples:

	Research%2FA%2FB%20Testing%2FResults
	Research/A%2FB%20Testing/Results

what should a "proper" URL parser return?  And how should client code handle
such a construct?  Perhaps you could even attempt to write a URL (or IRI)
parser yourself?
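
  To make the question concrete, here is a minimal sketch in Python (not the
Lua/LPEG code under discussion) of the two obvious orderings: percent-decode
and then split on "/", or split on "/" and then percent-decode each segment. 
The function names are made up for illustration; only urllib.parse.unquote is
a real library call.

```python
from urllib.parse import unquote

def decode_then_split(path):
    # Percent-decode the whole path first, then split on "/".
    # An encoded "%2F" becomes indistinguishable from a real delimiter.
    return unquote(path).split("/")

def split_then_decode(path):
    # Split on literal "/" first, then percent-decode each segment,
    # so an encoded "%2F" stays inside its segment.
    return [unquote(segment) for segment in path.split("/")]

a = "Research%2FA%2FB%20Testing%2FResults"
b = "Research/A%2FB%20Testing/Results"

print(decode_then_split(a))  # ['Research', 'A', 'B Testing', 'Results']
print(decode_then_split(b))  # ['Research', 'A', 'B Testing', 'Results']
print(split_then_decode(a))  # ['Research/A/B Testing/Results']
print(split_then_decode(b))  # ['Research', 'A/B Testing', 'Results']
```

  With the first ordering both inputs collapse to the same result; with the
second they stay distinct, at the cost of handing client code a list instead
of a string.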

  At one point, my URL parser would return the following for these:

	{
	  path =
	  {
	    "Research/A/B Testing/Results",
	  }
	}

	{
	  path = 
	  {
	    "Research",
	    "A/B Testing",
	    "Results",
	  }
	}

but I found working with such paths to be painful.  First off, how do I
distinguish between

	Research/A%2FB%20Testing/Results

and

	/Research/A%2FB%20Testing/Results
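
  A small Python sketch of that problem, assuming a segment list like the one
above that holds only the non-empty, decoded segments.  Both function names
and the "absolute" flag are hypothetical, just one way the distinction could
be carried along.

```python
from urllib.parse import unquote

def segments_only(path):
    # Hypothetical representation: decoded, non-empty segments and nothing else.
    return [unquote(s) for s in path.split("/") if s]

def segments_with_flag(path):
    # Same segments, plus a flag recording whether the path began with "/".
    return {"absolute": path.startswith("/"), "segments": segments_only(path)}

relative = "Research/A%2FB%20Testing/Results"
absolute = "/Research/A%2FB%20Testing/Results"

# The bare segment lists are identical, so the leading "/" is lost.
print(segments_only(relative) == segments_only(absolute))  # True

# Carrying an explicit flag keeps the two apart.
print(segments_with_flag(relative)["absolute"])  # False
print(segments_with_flag(absolute)["absolute"])  # True
```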

  How would I specify that any URL with a path starting with "/foo" be
redirected to a path starting with "/bar"?

		/foo/this	-> /bar/this
		/foobar		-> /barbar

  And how would I deal with this in the code?
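
  For comparison, when the path is kept as a single decoded string, the
redirect above is a plain prefix match.  This is a rough Python sketch, not
the actual server code; "redirect" is a hypothetical helper.

```python
def redirect(path, old_prefix, new_prefix):
    # Rewrite a decoded path string by plain string-prefix match.
    # Returns None when the path does not start with old_prefix.
    if path.startswith(old_prefix):
        return new_prefix + path[len(old_prefix):]
    return None

print(redirect("/foo/this", "/foo", "/bar"))  # /bar/this
print(redirect("/foobar", "/foo", "/bar"))    # /barbar
print(redirect("/other", "/foo", "/bar"))     # None
```

  With a list of segments, the "/foobar" case means matching a prefix inside
the first segment rather than comparing whole segments, which is where the
extra work comes from.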

  Yes, you can say I ruined the purity of my URL parser with an ugly
pragmatic approach (keep the path as a single decoded string and ignore the
semantics of encoded delimiters), but there's also the saying, "Perfect is
the enemy of good" [1].

  -spc

[1]	https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good
