Oooh! A bike shedding thread! I know Drew DeVault might complain, but hey,
this list wouldn't be this list unless the majority of messages were about
text formatting (seriously---over half the messages are not about the
protocol at all, but about text formatting).

So with my introduction out of the way, let me nitpick [1] this proposal
with a bunch of corner cases I can already see ...

It was thus said that the Great John Cowan once stated:
> Given that we are *not* going to change the definition of text/gemini, but
                    ^^^^^
    shouldn't that be _not_? Or are you going for strong emphasis here?

> 1. If an underscore appears outside an emphasized text section, and is at
> the beginning of a text line or after the # characters in a header line, or
> is preceded by a whitespace character, then it marks the beginning of an
> emphasized-text section

Pattern-wise, it's like:

    (start_of_line ?('#'*) | whitespace) '_'

> (rendered as italics or in some other way).

Such as bold, a larger font, a smaller font, or some other way other than
using italics. Okay, got it.

> 2. If an underscore appears inside an emphasized text section, and is at
> the end of a line, or is followed by whitespace, sentence-terminating
> punctuation, or parenthesis- or quotation-terminating punctuation, then it
> marks the end of the emphasized-text section.

I've found that terminating italic sections before sentence-terminating
punctuation can lead to very ugly output. For example:

    He asked about _blandit_?

Here, the italic t will run into the trailing question mark, which I feel
looks terrible. That's why I tend to include sentence-terminating marks
within the italic section:

    He asked about _blandit?_

This is of less concern with periods and commas, since there isn't much of a
difference, stylistically, between a normal period and an italic period.
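
For what it's worth, rules 1 and 2 as quoted above can be sketched in a few
lines of Python. This is only one reading of the proposal, not anything taken
from it: the name split_emphasis and the ASCII-only WHITESPACE and CLOSERS
sets are made up for illustration (the proposal points at the full Unicode
lists instead), and an unclosed section is simply treated as running to the
end of the line.

```python
# Rough sketch of rules 1 and 2 for a single gemtext line.
# ASCII-only stand-ins (assumptions); the proposal uses full Unicode lists.
WHITESPACE = " \t"
CLOSERS = ".!?" + ")]}" + "\"'"  # hypothetical subset of terminating punctuation


def split_emphasis(line):
    """Return a list of (text, is_emphasized) pieces for one line."""
    # Rule 1 treats the spot right after a header's '#' run like a line start.
    start = len(line) - len(line.lstrip("#"))
    pieces = []
    buf = line[:start]
    inside = False
    for i in range(start, len(line)):
        ch = line[i]
        if ch == "_":
            prev = line[i - 1] if i > start else None
            nxt = line[i + 1] if i + 1 < len(line) else None
            # Rule 1: opens at line start or after whitespace.
            opens = not inside and (prev is None or prev in WHITESPACE)
            # Rule 2: closes at line end or before whitespace/closing punctuation.
            closes = inside and (nxt is None or nxt in WHITESPACE or nxt in CLOSERS)
            if opens or closes:
                if buf:
                    pieces.append((buf, inside))
                buf = ""
                inside = not inside
                continue
        # Any other character (or a non-qualifying underscore) is literal text.
        buf += ch
    if buf:
        pieces.append((buf, inside))
    return pieces
```

With these stand-ins, split_emphasis("He asked about _blandit_?") gives the
emphasized piece "blandit" followed by a plain "?", while
"He asked about _blandit?_" keeps the question mark inside the emphasized
piece, and snake_case_variables comes back untouched.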

It also sounds like you are expecting users to write stuff like

    _lorem ipsum dolor sit amet_

or

    _lorem_ipsum_dolor_sit_amet_

Otherwise, why not just say that once in an emphasized-text section, the
next underscore ends it? Much easier to deal with, and a bit easier to deal
with when wrapping text (although I suppose one can add '_' to the list of
places to break a line, along with whitespace and hyphens).

> These rules exclude underscores in things like snake_case_variables, while
> supporting most actual uses.
>
> 3. An emphasized-text section ends unconditionally at the end of a line.

Odd, but I can see why you say so, given the nature of parsing gemtext. But
one unaware of that might end up writing:

    blah blah _lorem ipsum dolor
    sit amet_ blah blah blah

and wonder why the italicised text is all wrong.

> The attached file specifies all the Unicode whitespace and terminating
> punctuation, from the Unicode Character Database. There are quite a few,
> but you don't even need a regular expression, just a list of the characters.

All 352 of the characters. For now. That might be updated at the next
Unicode revision. Got it.

> I hope this is helpful and/or inspirational.

-spc (Unicode is hard! Let's do rocketry!)

[1] Can we still say that term?