Some new tests in the Gemini Client Torture Test

1. Sean Conner (sean (a) conman.org)


  I just added 10 new tests to the Gemini Client Torture Test, tests 41
through 50.  They all test section 5.4.1 of the Gemini Specification (text
lines).  Each page contains a line that exceeds 8,500 bytes (yes, bytes, not
characters, although some of them exceed 8,500 characters, depending upon the
characters used).  A few mild spoilers:

	Some have the spaces replaced with dashes.
	Some have no spaces, dashes or any punctuation to speak of.
	Some have Unicode combining characters.
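
  To make the bytes-versus-characters distinction concrete, here is a
minimal Python sketch.  It is not part of the test suite: the sample string
and the name measure() are made up for illustration, and it assumes the page
is encoded as UTF-8.

```python
def measure(line):
    """Return (code points, UTF-8 bytes) for one text line."""
    # len() on a Python str counts Unicode code points, not bytes.
    code_points = len(line)
    # Encoding gives the size in bytes (assuming the page is UTF-8).
    utf8_bytes = len(line.encode("utf-8"))
    return code_points, utf8_bytes

# Hypothetical sample: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# repeated three times, then three ASCII letters.
sample = "e\u0301" * 3 + "abc"
print(measure(sample))  # (9, 12)
```

  On screen that sample shows six user-perceived characters, so the
"length" of a line depends on whether bytes, code points or visible glyphs
are being counted.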

  I do apologize for the snark in test 50, but it represents one of the many
aspects that I dislike about Unicode in general.  

  I expect these tests to be among the hardest to deal with for a client. 
You have been warned.  If anyone thinks these tests are unfair, well, here's
the thread to discuss it.

  -spc


2. Luke Emmet (luke.emmet (a) gmail.com)

Hi Sean

Thanks for the new tests - I just ran them on GemiNaut.

I believe my client GemiNaut is doing the right thing for all of them. 
But this is because I'm using a system web browser control to do the 
hard work of the text rendering. Apart from the ones mentioned below 
which I disagree with, it wraps them all as I would expect.

The only quibble I would have with the tests is with the ones with no
spacers at all (43, 46 and 49). I don't agree that the client should try to
hyphenate the words. Doing so is a non-trivial problem and for a real
language is very language specific (where are the syllable boundaries,
perhaps). So the correct thing to do is to simply lay them out in a
non-wrapped line. I don't think there could be any authority about how
to wrap an arbitrary sequence of Unicode code points. If there is such a
thing, please say.

So I think for those tests, you should say that the client should not
crash, but should display the content either a) as a wrapped line (wrapped
any old how for those clients that insist on forcing a wrap, perhaps on
mobile) or b) as a single unwrapped line with a scrolling mechanism.

A suggestion for a possible improvement - it would be helpful if there 
was a "back to tests index" link on each page, that way you can choose a 
few tests from the index, then go back to the index when you are done - 
otherwise you might have to go back N times, which is not quite as nice.

Best Wishes

  - Luke

On 20-Jun-2020 02:28, Sean Conner wrote:
>    I just added 10 new tests to the Gemini Client Torture Test, tests 41
> through 50.  They all test section 5.4.1 of the Gemini Specification (text
> lines).  Each page contains a line that exceeds 8,500 bytes (yes, bytes, not
> characters, although some of them exceed 8,500 characters, depending upon the
> characters used).  A few mild spoilers:
>
> 	Some have the spaces replaced with dashes.
> 	Some have no spaces, dashes or any punctuation to speak of.
> 	Some have Unicode combining characters.
>
>    I do apologize for the snark in test 50, but it represents one of the many
> aspects that I dislike about Unicode in general.
>
>    I expect these tests to be among the hardest to deal with for a client.
> You have been warned.  If anyone thinks these tests are unfair, well, here's
> the thread to discuss it.
>
>    -spc
>


3. Luke Emmet (luke (a) marmaladefoo.com)

Hi Sean

Thanks for the new tests - I think they are all reasonable, even the 
last one.

I just ran them on GemiNaut and I believe it is doing the right thing 
for all of them, which is to wrap at the word, hyphen or soft hyphen. But
I shouldn't claim much credit, I'm just using a system control to
display the text content, which does the hard work of rendering the
Unicode into the display.

For tests 43, 46 and 49, it is unclear in the test what you expect 
should happen. Do you just want the client to confirm it doesn't barf on 
the content, but can display it somehow? For these I don't agree that 
the client should try to split the content as doing so is a non-trivial 
problem and for a real language is very language specific (where are the 
syllable boundaries perhaps). So the correct thing to do is to simply 
lay them out in a non-wrapped line. Or if your client takes a hard line 
it could arbitrarily break the content up (perhaps on mobile).

I think the main thing is that the character content is displayed and 
the client can continue. I think the preference should be for an 
unwrapped line with a scrolling mechanism.

A suggestion for a possible improvement - it would be helpful if there 
was a "back to tests index" link on each page, that way you can choose a 
few tests from the index, then go back to the index when you are done - 
otherwise you might have to go back N times, which is not quite as nice.

Best Wishes

  - Luke




4. Sean Conner (sean (a) conman.org)

It was thus said that the Great Luke Emmet once stated:
> Hi Sean
> 
> Thanks for the new tests - I think they are all reasonable, even the 
> last one.

  Heh, I was a bit worried about that last one.

> I just ran them on GemiNaut and I believe it is doing the right thing 
> for all of them, which is to wrap at the word, hyphen or soft hyphen.

  There is a Unicode Line Breaking Algorithm:

	https://www.unicode.org/reports/tr14/

  Warning:  it's long.  Longer than the Gemini specification, and it's in
its 13th revision.  This is *not* an easy problem.

> But 
> I shouldn't claim much credit, I'm just using a system control to 
> display the text content, which does the hard work of rendering the 
> unicode into the display.
> 
> For tests 43, 46 and 49, it is unclear in the test what you expect 
> should happen. 

  I think "not crash" is a good starting point.  Past that, it's up to the
client.  The way I've done it (in my gopher client) is to just simply wrap
the text (being careful to not break between a glyph codepoint and a
combining codepoint).  Given this:

LoremipsumdolorsitametconsecteturadipiscingelitInlaciniasemperfringillaDone
cvehiculafermentummaximusAliquamegetfelisquamCrasegetullamcorpernuncSuspend
isseidlaoreetrisusUtaerosmiMaurisaloremposuereelementumjustosedpellentesque
quamAeneansagittisquameupretiumporttitorInnecsemaenimvehiculafaucibusAenean
luctusnonloremacblanditIntegertinciduntlectusnecpulvinarcongueestenimpharet
raliberoutgravidadolorneque

and a "screen width" of, say, 40 characters, you end up with:

Loremipsumdolorsitametconsecteturadipisc
ingelitInlaciniasemperfringillaDonecvehi
culafermentummaximusAliquamegetfelisquam
CrasegetullamcorpernuncSuspendisseidlaor
eetrisusUtaerosmiMaurisaloremposuereelem
entumjustosedpellentesquequamAeneansagit
tisquameupretiumporttitorInnecsemaenimve
hiculafaucibusAeneanluctusnonloremacblan
ditIntegertinciduntlectusnecpulvinarcong
ueestenimpharetraliberoutgravidadolorneq
ue
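
  The wrapping described above can be sketched in Python roughly as
follows.  This is not the code from the gopher client; wrap_clusters() is a
hypothetical name, and the sketch makes two simplifying assumptions: a code
point is treated as combining only if unicodedata.combining() reports a
nonzero class, and every base character is assumed to take one column.

```python
import unicodedata

def wrap_clusters(text, width):
    """Hard-wrap text every `width` columns without splitting a base
    character from the combining marks that follow it."""
    # Group each base code point with its trailing combining marks.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # attach the mark to the previous base
        else:
            clusters.append(ch)  # start a new cluster
    # Emit `width` clusters per output line.
    return ["".join(clusters[i:i + width])
            for i in range(0, len(clusters), width)]

# Usage: wrap the start of the sample text at 40 columns.
for row in wrap_clusters("Loremipsumdolorsitametconsecteturadipiscingelit", 40):
    print(row)
```

  A fuller treatment would use grapheme cluster rules and display widths
rather than the combining class alone, but the idea is the same: count
clusters, not code points, when deciding where to cut.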

  Leaving it as one line and side-scrolling is another valid method. 
There's not much you can do in the face of such input.

> Do you just want the client to confirm it doesn't barf on 
> the content, but can display it somehow? 

  Pretty much.  I figure if the client doesn't crash, and does The Right
Thing (even if, for some tests, what The Right Thing is is quite arbitrary),
then it's good to go.

> A suggestion for a possible improvement - it would be helpful if there 
> was a "back to tests index" link on each page, that way you can choose a 
> few tests from the index, then go back to the index when you are done - 
> otherwise you might have to go back N times, which is not quite as nice.

  Sounds like a good idea.  I'll probably have them added in the next day or
so.  Thank you for the suggestion.

  -spc
