Supporting optional underscores for italics


  Oooh!  A bikeshedding thread!  I know Drew DeVault might complain, but
hey, this list wouldn't be this list unless the majority of messages were
about text formatting (seriously, over half the messages are not about the
protocol at all, but about text formatting).

  So with my introduction out of the way, let me nitpick [1] this proposal
with a bunch of corner cases I can already see ...

It was thus said that the Great John Cowan once stated:
> Given that we are *not* going to change the definition of text/gemini, but
                    ^^^^^ shouldn't that be _not_?  Or are you going for
strong emphasis here?  

> 1.  If an underscore appears outside an emphasized text section, and is at
> the beginning of a text line or after the # characters in a header line, or
> is preceded by a whitespace character, then it marks the beginning of an
> emphasized-text section

  Pattern-wise, it's like:

	(start_of_line ?('#'*) | whitespace) '_'

> (rendered as italics or in some other way).

  Such as bold, a larger font, a smaller font, or anything other than
italics.  Okay, got it.
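  For what it's worth, here is a minimal Python sketch of rule 1 as I read
it.  It is not from the proposal.  The function name opens_emphasis is made
up, str.isspace() stands in for the Unicode whitespace list, and a header
line is assumed to be any line that starts with '#' characters.

```python
def opens_emphasis(line: str, i: int) -> bool:
    """Return True if the underscore at line[i] may open an emphasized section.

    Sketch of rule 1 only.  Assumptions: str.isspace() approximates the
    Unicode whitespace list; a header is any line beginning with '#'.
    """
    if line[i] != "_":
        return False
    if i == 0:
        return True                    # start of a text line
    if line[:i].strip("#") == "":
        return True                    # directly after the '#' characters of a header
    return line[i - 1].isspace()       # preceded by whitespace


print(opens_emphasis("snake_case", 5))   # False
print(opens_emphasis("a _word_", 2))     # True
```

  In snake_case the underscore at index 5 follows an 'e', so it never opens
anything, which is the point of the rule.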

> 2.  If an underscore appears inside an emphasized text section, and is at
> the end of a line, or is followed by whitespace, sentence-terminating
> punctuation, or parenthesis- or quotation-terminating punctuation, then it
> marks the end of the emphasized-text section.

  I've found that terminating italic sections before sentence-terminating
punctuation can lead to very ugly output.  For example:

	He asked about _blandit_?

  Here, the italic 't' will run into the trailing question mark, which I
feel looks terrible.  That's why I tend to include sentence-terminating
marks within the italic section:

	He asked about _blandit?_

This is of less concern with periods and commas, since there isn't much of a
difference, stylistically, between a normal period and an italic period.
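  A matching sketch of rule 2, again only my reading of it.  CLOSERS is a
hypothetical ASCII-only stand-in for the Unicode list of terminating
punctuation, and closes_emphasis is a made-up name.  Note that both of my
spellings of the example above end the section; the only difference is
whether the '?' lands inside or outside the italics.

```python
# Hypothetical ASCII stand-in for the Unicode terminating-punctuation list.
CLOSERS = set(".,;:!?)]}\"'")


def closes_emphasis(line: str, i: int) -> bool:
    """Return True if the underscore at line[i] ends an open emphasized section.

    Sketch of rule 2: the underscore must be at end of line, or be followed
    by whitespace or a character from CLOSERS.
    """
    if line[i] != "_":
        return False
    if i == len(line) - 1:
        return True                    # end of line
    nxt = line[i + 1]
    return nxt.isspace() or nxt in CLOSERS


for text in ["He asked about _blandit_?", "He asked about _blandit?_"]:
    print(closes_emphasis(text, text.rindex("_")))   # True, True
```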

  It also sounds like you are expecting users to write stuff like

	_lorem ipsum dolor sit amet_

or

	_lorem_ipsum_dolor_sit_amet_

otherwise, why not just say that, once inside an emphasized-text section, the
next underscore ends it?  That is much easier to implement, and a bit easier
to deal with when wrapping text (although I suppose one could add '_' to the
list of break points, along with whitespace and hyphens).
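  To show the difference, here is a sketch of that simpler alternative:
opening still follows rule 1 (simplified to start of line or after
whitespace), but once inside emphasis, any underscore closes it.  The name
split_simple is made up.  Under the proposed rule 2, the whole of
_lorem_ipsum_dolor_sit_amet_ would be one emphasized run, since the inner
underscores are followed by letters; under the simpler rule only "lorem" is
emphasized and the rest comes out literally.

```python
def split_simple(line: str) -> list[tuple[str, bool]]:
    """Split one line into (text, emphasized) runs.

    Hypothetical alternative rule: inside emphasis, the next '_' closes it.
    Opening is simplified rule 1 (start of line or after whitespace).
    """
    runs: list[tuple[str, bool]] = []
    buf, emph = "", False
    for i, ch in enumerate(line):
        if ch == "_" and emph:
            if buf:
                runs.append((buf, True))      # any underscore closes
            buf, emph = "", False
        elif ch == "_" and (i == 0 or line[i - 1].isspace()):
            if buf:
                runs.append((buf, False))
            buf, emph = "", True              # open emphasis
        else:
            buf += ch                         # literal character
    if buf:
        runs.append((buf, emph))              # end of line ends everything
    return runs


print(split_simple("_lorem_ipsum_dolor_sit_amet_"))
# [('lorem', True), ('ipsum_dolor_sit_amet_', False)]
```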

> These rules exclude underscores in things like snake_case_variables, while
> supporting most actual uses.
> 
> 3.  An emphasized-text section ends unconditionally at the end of a line.

  Odd, but I can see why you say so, given the nature of parsing gemtext.
But someone unaware of that might end up writing:

	blah blah _lorem ipsum dolor
	sit amet_ blah blah blah

  and wonder why the italicised text is all wrong.
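  Here is a sketch of rules 1 through 3 together, applied line by line to
that example, to show what such a writer would get.  As before, the name
parse_line is made up, str.isspace() stands in for the Unicode whitespace
list, CLOSERS is a small hypothetical ASCII subset of the terminating
punctuation, and header '#' prefixes are ignored.  The first line is
emphasized up to its end (rule 3), and the underscore in "amet_" on the
second line follows a 't', so it opens nothing and shows up literally.

```python
# Hypothetical ASCII stand-in for the Unicode terminating-punctuation list.
CLOSERS = set(".,;:!?)]}\"'")


def parse_line(line: str) -> list[tuple[str, bool]]:
    """Split one gemtext text line into (text, emphasized) runs."""
    runs: list[tuple[str, bool]] = []
    buf, emph = "", False
    for i, ch in enumerate(line):
        nxt = line[i + 1] if i + 1 < len(line) else ""
        if ch == "_" and emph and (nxt == "" or nxt.isspace() or nxt in CLOSERS):
            runs.append((buf, True))          # rule 2: close
            buf, emph = "", False
        elif ch == "_" and not emph and (i == 0 or line[i - 1].isspace()):
            if buf:
                runs.append((buf, False))
            buf, emph = "", True              # rule 1: open
        else:
            buf += ch                         # literal character
    if buf:
        runs.append((buf, emph))              # rule 3: line end ends emphasis
    return runs


for text in ["blah blah _lorem ipsum dolor", "sit amet_ blah blah blah"]:
    print(parse_line(text))
# [('blah blah ', False), ('lorem ipsum dolor', True)]
# [('sit amet_ blah blah blah', False)]
```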

> The attached file specifies all the Unicode whitespace and terminating
> punctuation, from the Unicode Character Database.  There are quite a few,
> but you don't even need a regular expression, just a list of the characters.

  All 352 of the characters.  

  For now.

  That might be updated at the next Unicode revision.

  Got it.
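  And since the list will drift with Unicode revisions, a parser could
rebuild it from whatever Unicode version it ships with rather than
hard-coding it.  This sketch does not use the attached file; it
approximates the list from Python's unicodedata general categories (Zs, Zl,
Zp for separators; Pe for closing brackets; Pf for final quotes) plus a few
hand-picked characters, because unicodedata does not expose binary
properties such as White_Space or Terminal_Punctuation.  The count it
prints depends on the Unicode version bundled with Python and should not be
expected to match the 352 characters in the attached list.

```python
import sys
import unicodedata

# Approximation only, not the attached list:
#   Zs, Zl, Zp -> space, line and paragraph separators
#   Pe         -> closing brackets and parentheses
#   Pf         -> final quotation marks
CATEGORIES = {"Zs", "Zl", "Zp", "Pe", "Pf"}
# Hand-picked additions: ASCII control whitespace, NEL, and some
# Terminal_Punctuation characters (ASCII plus CJK/fullwidth forms).
HAND_PICKED = set("\t\n\v\f\r\x85.,:;!?\u3002\uff0c\uff01\uff1f")


def build_terminators() -> frozenset[str]:
    """Collect candidate whitespace and terminating characters."""
    chars = set(HAND_PICKED)
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) in CATEGORIES:
            chars.add(chr(cp))
    return frozenset(chars)


TERMINATORS = build_terminators()
print(unicodedata.unidata_version, len(TERMINATORS))
```

  Membership is then a plain set lookup, no regular expression needed.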

> I hope this is helpful and/or inspirational.

  -spc (Unicode is hard!  Let's do rocketry!)

[1]	Can we still say that term?
