💾 Archived View for gemi.dev › gemini-mailing-list › 000237.gmi captured on 2024-08-31 at 16:19:04. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I just added 10 new tests to the Gemini Client Torture Test, tests 41 through 50. They all test section 5.4.1 of the Gemini Specification (text lines). Each page contains a line that exceeds 8,500 bytes (yes, bytes, not characters, although some of them also exceed 8,500 characters, depending upon the characters used). A few mild spoilers:

Some have the spaces replaced with dashes.
Some have no spaces, dashes or any punctuation to speak of.
Some have Unicode combining characters.

I do apologize for the snark in test 50, but it represents one of the many aspects that I dislike about Unicode in general.

I expect these tests to be among the hardest to deal with for a client. You have been warned. If anyone thinks these tests are unfair, well, here's the thread to discuss it.

-spc
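The bytes-versus-characters distinction above can be illustrated with a small sketch. It assumes the line is UTF-8 encoded; the helper name `line_sizes` is hypothetical and is not part of the torture test. It treats any code point in a Unicode "Mark" category as a combining character, which is only an approximation of what a reader sees as one character.

```python
import unicodedata


def line_sizes(line: str) -> tuple[int, int, int]:
    """Return (UTF-8 bytes, code points, code points that are not marks)."""
    n_bytes = len(line.encode("utf-8"))
    n_codepoints = len(line)
    # Categories Mn, Mc and Me are combining marks; everything else is
    # counted as a base character here (an approximation, not full
    # grapheme-cluster segmentation).
    n_base = sum(
        1 for ch in line if not unicodedata.category(ch).startswith("M")
    )
    return n_bytes, n_codepoints, n_base


# Plain ASCII: one byte per character.
print(line_sizes("a" * 8500))

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: 3 bytes and 2 code
# points for each accented letter on screen.
print(line_sizes("e\u0301" * 3000))
```

With the second input the line is 9,000 bytes long but only 6,000 code points, and it shows 3,000 accented letters, so the same line lands on either side of an 8,500 limit depending on what is being counted.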
Hi Sean

Thanks for the new tests - I just ran them on GemiNaut.

I believe my client GemiNaut is doing the right thing for all of them. But this is because I'm using a system web browser control to do the hard work of the text rendering. Apart from the ones mentioned below, which I disagree with, it wraps them all as I would expect.

The only quibble I have is with the tests that have no spacers at all (43, 46 and 49). I don't agree that the client should try to hyphenate the words. Doing so is a non-trivial problem, and for a real language it is very language-specific (where are the syllable boundaries, for instance?). So the correct thing to do is to simply lay them out as a non-wrapped line. I don't think there could be any authority on how to wrap an arbitrary sequence of Unicode code points. If there is such a thing, please say.

So I think for those tests, you should say that the client should not crash, but should display the content either a) as a wrapped line (wrapped any old how, for those clients that insist on forcing a wrap, perhaps on mobile) or b) as a single unwrapped line with a scrolling mechanism.

A suggestion for a possible improvement - it would be helpful if there was a "back to tests index" link on each page. That way you can choose a few tests from the index, then go back to the index when you are done - otherwise you might have to go back N times, which is not quite as nice.

Best Wishes

- Luke

On 20-Jun-2020 02:28, Sean Conner wrote:
> I just added 10 new tests to the Gemini Client Torture Test, tests 41
> through 50. They all test section 5.4.1 of the Gemini Specification (text
> lines). Each page contains a line that exceeds 8,500 bytes (yes, bytes, not
> characters, although some of them exceed 8,500 characters, depends upon the
> characters used). A few mild spoilers:
>
> Some have the spaces replaced with dashes.
> Some have no spaces, dashes or any punctuation to speak of.
> Some have Unicode combining characters.
>
> I do apologize for the snark in test 50, but it represents one of the many
> aspects that I dislike about Unicode in general.
>
> I expect these tests to be among the hardest to deal with for a client.
> You have been warned. If anyone thinks these tests are unfair, well, here's
> the thread to discuss it.
>
> -spc
Hi Sean

Thanks for the new tests - I think they are all reasonable, even the last one.

I just ran them on GemiNaut and I believe it is doing the right thing for all of them, which is to wrap at the word, hyphen or soft hyphen. But I shouldn't claim much credit, I'm just using a system control to display the text content, which does the hard work of rendering the Unicode into the display.

For tests 43, 46 and 49, it is unclear in the test what you expect should happen. Do you just want the client to confirm it doesn't barf on the content, but can display it somehow?

For these I don't agree that the client should try to split the content, as doing so is a non-trivial problem and for a real language it is very language-specific (where are the syllable boundaries, for instance?). So the correct thing to do is to simply lay them out as a non-wrapped line. Or, if your client takes a hard line, it could arbitrarily break the content up (perhaps on mobile). I think the main thing is that the character content is displayed and the client can continue. I think the preference should be for an unwrapped line with a scrolling mechanism.

A suggestion for a possible improvement - it would be helpful if there was a "back to tests index" link on each page. That way you can choose a few tests from the index, then go back to the index when you are done - otherwise you might have to go back N times, which is not quite as nice.

Best Wishes

- Luke

On 20-Jun-2020 02:28, Sean Conner wrote:
> I just added 10 new tests to the Gemini Client Torture Test, tests 41
> through 50. They all test section 5.4.1 of the Gemini Specification (text
> lines). Each page contains a line that exceeds 8,500 bytes (yes, bytes, not
> characters, although some of them exceed 8,500 characters, depends upon the
> characters used). A few mild spoilers:
>
> Some have the spaces replaced with dashes.
> Some have no spaces, dashes or any punctuation to speak of.
> Some have Unicode combining characters.
>
> I do apologize for the snark in test 50, but it represents one of the many
> aspects that I dislike about Unicode in general.
>
> I expect these tests to be among the hardest to deal with for a client.
> You have been warned. If anyone thinks these tests are unfair, well, here's
> the thread to discuss it.
>
> -spc
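The behaviour described in this message (wrap at a word boundary, hyphen or soft hyphen, and leave content with no break opportunities on one unwrapped line) could be sketched roughly as follows. This is not GemiNaut's code, which delegates to a system control; the function names are hypothetical, width is counted in code points rather than display columns, and combining characters are not handled.

```python
import re

SOFT_HYPHEN = "\u00ad"

# A chunk is a run of text ending in a break opportunity (space, soft
# hyphen or hyphen), or a final run with no break opportunity at all.
_CHUNK = re.compile(r"[^ \u00ad-]*[ \u00ad-]|[^ \u00ad-]+")


def _visible(row: str) -> str:
    """Text as displayed if the row ends here."""
    # A soft hyphen only shows (as '-') when the break is taken at it.
    ends_soft = row.endswith(SOFT_HYPHEN)
    shown = row.replace(SOFT_HYPHEN, "").rstrip(" ")
    return shown + "-" if ends_soft else shown


def wrap_at_breaks(text: str, width: int) -> list[str]:
    """Greedy wrap at spaces, hyphens and soft hyphens only."""
    rows: list[str] = []
    row = ""
    for chunk in _CHUNK.findall(text):
        if row and len(_visible(row + chunk)) > width:
            rows.append(_visible(row))
            row = chunk
        else:
            row += chunk
    if row:
        rows.append(_visible(row))
    return rows


print(wrap_at_breaks("alpha beta-gamma del\u00adta", 10))
print(wrap_at_breaks("x" * 50, 10))  # no break opportunity: one long row
```

A chunk longer than the width is emitted as a single over-long row, which the client would then have to side-scroll or hard-wrap; that choice is exactly what tests 43, 46 and 49 leave open.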
It was thus said that the Great Luke Emmet once stated:
> Hi Sean
>
> Thanks for the new tests - I think they are all reasonable, even the
> last one.

Heh, I was a bit worried about that last one.

> I just ran them on GemiNaut and I believe it is doing the right thing
> for all of them, which is to wrap at the word, hyphen or soft hyphen.

There is a Unicode Line Breaking Algorithm:

https://www.unicode.org/reports/tr14/

Warning: it's long. Longer than the Gemini specification, and it's in its 13th revision. This is *not* an easy problem.

> But
> I shouldn't claim much credit, I'm just using a system control to
> display the text content, which does the hard work of rendering the
> unicode into the display.
>
> For tests 43, 46 and 49, it is unclear in the test what you expect
> should happen.

I think "not crash" is a good starting point. Past that, it's up to the client. The way I've done it (in my gopher client) is to just simply wrap the text (being careful to not break between a glyph codepoint and a combining codepoint).
Given this:

LoremipsumdolorsitametconsecteturadipiscingelitInlaciniasemperfringillaDone
cvehiculafermentummaximusAliquamegetfelisquamCrasegetullamcorpernuncSuspend
isseidlaoreetrisusUtaerosmiMaurisaloremposuereelementumjustosedpellentesque
quamAeneansagittisquameupretiumporttitorInnecsemaenimvehiculafaucibusAenean
luctusnonloremacblanditIntegertinciduntlectusnecpulvinarcongueestenimpharet
raliberoutgravidadolorneque

And a "screen width" of, say, 40 characters, you end up with:

Loremipsumdolorsitametconsecteturadipisc
ingelitInlaciniasemperfringillaDonecvehi
culafermentummaximusAliquamegetfelisquam
CrasegetullamcorpernuncSuspendisseidlaor
eetrisusUtaerosmiMaurisaloremposuereelem
entumjustosedpellentesquequamAeneansagit
tisquameupretiumporttitorInnecsemaenimve
hiculafaucibusAeneanluctusnonloremacblan
ditIntegertinciduntlectusnecpulvinarcong
ueestenimpharetraliberoutgravidadolorneq
ue

Leaving it as one line and side-scrolling is another valid method. There's not much you can do in the face of such input.

> Do you just want the client to confirm it doesn't barf on
> the content, but can display it somehow?

Pretty much. I figure if the client doesn't crash, and does The Right Thing (even if, for some tests, what The Right Thing is turns out to be quite arbitrary), then it's good to go.

> A suggestion for a possible improvement - it would be helpful if there
> was a "back to tests index" link on each page, that way you can choose a
> few tests from the index, then go back to the index when you are done -
> otherwise you might have to go back N times, which is not quite as nice.

Sounds like a good idea. I'll probably have them added in the next day or so. Thank you for the suggestion.

-spc
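The hard-wrap approach described in this message (break every N characters, but never between a glyph code point and a combining code point that follows it) might look something like the sketch below. It is not the code from the gopher client mentioned above; the function name is hypothetical, a "combining codepoint" is assumed to mean any code point in a Unicode Mark category, and every other code point is assumed to occupy one column.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (Unicode categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def hard_wrap(text: str, width: int) -> list[str]:
    """Break text into rows of at most `width` base characters."""
    if width < 1:
        raise ValueError("width must be at least 1")
    rows: list[str] = []
    row = ""
    cols = 0
    for ch in text:
        if is_mark(ch) and row:
            # Stays attached to the preceding character and takes no
            # column of its own, so a break never separates the two.
            row += ch
            continue
        if cols == width:
            rows.append(row)
            row, cols = "", 0
        row += ch
        cols += 1
    if row:
        rows.append(row)
    return rows


for line in hard_wrap("Loremipsumdolorsitametconsecteturadipiscingelit", 40):
    print(line)

# Each accented letter is two code points but one column.
print(hard_wrap("e\u0301" * 5, 2))
```

This ignores wide (East Asian) characters, zero-width joiners and other cases that full grapheme-cluster segmentation would cover, so it is a starting point rather than a complete answer to test 50.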
---