💾 Archived View for gemi.dev › gemini-mailing-list › 000466.gmi captured on 2023-11-04 at 12:50:39. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Given that we are *not* going to change the definition of text/gemini, but that many people want italics (including me), I have prepared some instructions aimed at programmers of client software who would like to support the _underscore emphasis_ convention.
0. Under no circumstances are underscores to be removed from the text. That compensates for less-than-infallible algorithms.
1. If an underscore appears outside an emphasized text section, and is at the beginning of a text line or after the # characters in a header line, or is preceded by a whitespace character, then it marks the beginning of an emphasized-text section (rendered as italics or in some other way).
2. If an underscore appears inside an emphasized text section, and is at the end of a line, or is followed by whitespace, sentence-terminating punctuation, or parenthesis- or quotation-terminating punctuation, then it marks the end of the emphasized-text section.
These rules exclude underscores in things like snake_case_variables, while supporting most actual uses.
3. An emphasized-text section ends unconditionally at the end of a line.
The attached file specifies all the Unicode whitespace and terminating punctuation, from the Unicode Character Database. There are quite a few, but you don't even need a regular expression, just a list of the characters.
I hope this is helpful and/or inspirational.
John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
People go through the bother of Christmas because Christmas helps them to understand why they go through the bother of living out their lives the rest of the year. For one brief instant, we see human society as it should and could be, a world in which business has become the exchanging of presents and in which nothing is important except the happiness and well-being of the ultimate consumer. --Northrop Frye (1948)
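For client authors, rules 0 through 3 above can be sketched in Python roughly as follows. This is an illustration only and does not use the attached character list: the `TERMINATORS` set and the reliance on Unicode categories Pe and Pf are stand-in assumptions, and `emphasis_spans` is a hypothetical name. Underscores are never removed; the function only reports where emphasis would start and stop.

```python
import unicodedata

# Assumption: a small stand-in for the attached list of terminating
# punctuation, which is not reproduced here.
TERMINATORS = set(".,;:!?\"'")


def is_terminator(ch):
    """True for stand-in sentence punctuation, closing brackets (Pe)
    and closing quotation marks (Pf)."""
    return ch in TERMINATORS or unicodedata.category(ch) in ("Pe", "Pf")


def emphasis_spans(line):
    """Return (start, end) pairs so that line[start:end] is emphasized.

    Rule 0: the underscores stay in the text, so spans include them.
    """
    spans = []
    # Index just past any leading '#' characters (0 on a non-header line).
    header_end = len(line) - len(line.lstrip("#"))
    open_at = None
    for i, ch in enumerate(line):
        if ch != "_":
            continue
        if open_at is None:
            # Rule 1: start of line, right after the '#'s, or after whitespace.
            if i == header_end or line[i - 1].isspace():
                open_at = i
        else:
            # Rule 2: end of line, or followed by whitespace or a terminator.
            at_end = i == len(line) - 1
            if at_end or line[i + 1].isspace() or is_terminator(line[i + 1]):
                spans.append((open_at, i + 1))
                open_at = None
    if open_at is not None:
        # Rule 3: emphasis ends unconditionally at the end of the line.
        spans.append((open_at, len(line)))
    return spans
```

With these assumptions, `"_underscore emphasis_ convention"` yields one span covering `_underscore emphasis_`, while `"use snake_case_variables here"` yields none.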
Can you guys give it a rest already? NACK on behalf of my client. Gemini is good because it's not going to change. We don't need italics for "serious writing". Everyone has something different that they NEED very BADLY or else they can't SERIOUSLY use gemini, yak, yak, yak. Stop it. Gemini is good because it's simple.
On Thu, Nov 12, 2020 at 8:07 PM Drew DeVault <sir at cmpwn.com> wrote:
> Stop it.
Y'know, if you hadn't written your message, I wouldn't have written this reply and that would be two less messages on the list. But knock yourself out.
> Gemini is good because it's simple.
I agree. Some clients are bare-bones, others add all sorts of bells and whistles. That doesn't affect the simplicity of the protocol *or* the format.
John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
The penguin geeks is happy / As under the waves they lark
The closed-source geeks ain't happy / They sad cause they in the dark
But geeks in the dark is lucky / They in for a worser treat
One day when the Borg go belly-up / Guess who wind up on the street.
Oooh! A bike sheeding thread! I know Drew DeVault might complain, but hey, this list wouldn't be this list unless the majority of messages were about text formatting (seriously---over half the messages are not about the protocol at all, but about text formatting).
So with my introduction out of the way, let me nitpick [1] this proposal with a bunch of corner cases I can already see ...
It was thus said that the Great John Cowan once stated:
> Given that we are *not* going to change the definition of text/gemini, but
^^^^^ shouldn't that be _not_? Or are you going for strong emphasis here?
> 1. If an underscore appears outside an emphasized text section, and is at the beginning of a text line or after the # characters in a header line, or is preceded by a whitespace character, then it marks the beginning of an emphasized-text section
Pattern wise, it's like:
(start_of_line ?('#'*) | whitespace) '_'
> (rendered as italics or in some other way).
Such as bold, a larger font, a smaller font, or some other way other than using italics. Okay, got it.
> 2. If an underscore appears inside an emphasized text section, and is at the end of a line, or is followed by whitespace, sentence-terminating punctuation, or parenthesis- or quotation-terminating punctuation, then it marks the end of the emphasized-text section.
I've found that terminating italic sections before sentence-terminating punctuation can lead to very ugly output. For example:
He asked about _blandit_?
Here, the italic t will run into the trailing question mark, which I feel looks terrible. That's why I tend to include sentence terminating marks within the italic section:
He asked about _blandit?_
This is of less concern with periods and commas, since there isn't much of a difference, stylistic wise, between a normal period and italic period.
It also sounds like you are expecting users to write stuff like
_lorem ipsum dolor sit amet_
or
_lorem_ipsum_dolor_sit_amet_
else, why not just say that once in an emphasized text section, the next underscore ends it. Much easier to deal with, and a bit easier to deal with when wrapping text (although I suppose one can add '_' to the list, along with whitespace and hyphens).
> These rules exclude underscores in things like snake_case_variables, while supporting most actual uses.
> 3. An emphasized-text section ends unconditionally at the end of a line.
Odd, but I can see why you say so, given the nature of parsing gemtext. But one unaware of that might end up writing:
blah blabh _lorem ipsum dolor
sit amet_ blah blah blah
and wonder why the italicised text is all wrong.
> The attached file specifies all the Unicode whitespace and terminating punctuation, from the Unicode Character Database. There are quite a few, but you don't even need a regular expression, just a list of the characters.
All 352 of the characters. For now. That might be updated at the next Unicode revision. Got it.
> I hope this is helpful and/or inspirational.
-spc (Unicode is hard! Let's do rocketry!)
[1] Can we still say that term?
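The opening pattern given in this message, (start_of_line ?('#'*) | whitespace) '_', might be written as a Python regular expression along these lines. It is a sketch that finds only candidate openers; it does not track whether the scanner is already inside an emphasized section, and `OPENER` and `opener_positions` are hypothetical names.

```python
import re

# Either the start of the line followed by any '#' characters, or a single
# whitespace character, immediately before an underscore.
# For str patterns, \s is Unicode-aware by default.
OPENER = re.compile(r"(?:^#*|\s)_")


def opener_positions(line):
    """Return the index of every underscore that could open emphasis."""
    return [m.end() - 1 for m in OPENER.finditer(line)]
```

Under this pattern, `snake_case` has no candidate opener, while `_a and _b` has two.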
On Thu, Nov 12, 2020 at 9:02 PM Sean Conner <sean at conman.org> wrote:
> Oooh! A bike sheeding thread!
Yup. Let's sheed some bikes together!
> So with my introduction out of the way, let me nitpick [1]
Yes, you can say that. Head lice do not yet have a pressure group insisting that you call their eggs something more polite.
> > Given that we are *not* going to change the definition of text/gemini, but
> ^^^^^ shouldn't that be _not_? Or are you going for strong emphasis here?
No, just habit. I write a *lot* of git-flavored Markdown.
> I've found that terminating italic sections before sentence-terminating punctuation can lead to very ugly output.
Yes, though that's up to the content author. But this isn't about terminating the italic section; it's about deciding whether an underscore actually does terminate it. See below.
> why not just say that once in an emphasized text section, the next underscore ends it.
So that lines like "It is important to understand that _although the standard in C is to use snake_case for variables, C compilers do not support numbers like 123_456_789_." are interpreted correctly. To put it in HTML terms, the first underscore is preceded by whitespace, so it is an <i>, and the last one is followed by terminating punctuation, so it is an </i>. The others, however, don't satisfy either rule 1 or rule 2, so the emphatic text just goes on right through them.
> blah blabh _lorem ipsum dolor
> sit amet_ blah blah blah
They will quickly find out that that doesn't work. Text/gemini lines are typically used in prose for paragraphs, and italic text doesn't normally cross paragraph boundaries.
> That might be updated at the next Unicode revision.
That's true. But as time goes by, the new scripts with script-specific punctuation become fewer and harder to find. Until we join the Galactic Federation, there just aren't many more scripts out there. Newly invented ones tend to use Latin/Greek/Cyrillic/etc. punctuation.
> -spc (Unicode is hard! Let's do rocketry!)
You kidding? This is one of the easy bits! All the work has been done for us. We don't even need regular expressions to figure it out, just keep the 352 characters in two arrays. Unless your browser runs on an Arduino, that is practically free.
John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
I amar prestar aen, han mathon ne nen, han mathon ne chae, a han noston ne 'wilith. --Galadriel, LOTR:FOTR
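The "two arrays" idea can be sketched in Python as two precomputed sets with plain membership tests. The attached character list is not reproduced in this thread, so the tables below are approximations built from Python's own Unicode data: their contents are an assumption and their sizes will not match the 352 mentioned above. The names `WHITESPACE`, `TERMINATORS`, `may_open_after` and `may_close_before` are hypothetical.

```python
import sys
import unicodedata


def _all_chars():
    """Yield every code point as a one-character string."""
    return (chr(cp) for cp in range(sys.maxunicode + 1))


# Table 1 (approximation): characters Python treats as whitespace.
WHITESPACE = frozenset(ch for ch in _all_chars() if ch.isspace())

# Table 2 (approximation): closing brackets (category Pe), closing quotation
# marks (category Pf), plus a hand-picked set of ASCII punctuation.
TERMINATORS = frozenset(
    ch for ch in _all_chars() if unicodedata.category(ch) in ("Pe", "Pf")
) | frozenset(".,;:!?\"'")


def may_open_after(prev_char):
    """Rule 1 lookup: an underscore after this character may open emphasis."""
    return prev_char in WHITESPACE


def may_close_before(next_char):
    """Rule 2 lookup: an underscore before this character may close emphasis."""
    return next_char in WHITESPACE or next_char in TERMINATORS
```

Building the tables is a one-time cost at startup; after that each check is a single set lookup, with no regular expression involved.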
---