💾 Archived View for gemi.dev › gemini-mailing-list › 000466.gmi captured on 2023-11-04 at 12:50:39. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Supporting optional underscores for italics

John Cowan <cowan (a) ccil.org>

Given that we are *not* going to change the definition of text/gemini, but
that many people want italics (including me), I have prepared some
instructions aimed at programmers of client software who would like to
support the _underscore emphasis_ convention.

0.  Under no circumstances are underscores to be removed from the text.
That compensates for less-than-infallible algorithms.

1.  If an underscore appears outside an emphasized text section, and is at
the beginning of a text line or after the # characters in a header line, or
is preceded by a whitespace character, then it marks the beginning of an
emphasized-text section (rendered as italics or in some other way).

2.  If an underscore appears inside an emphasized text section, and is at
the end of a line, or is followed by whitespace, sentence-terminating
punctuation, or parenthesis- or quotation-terminating punctuation, then it
marks the end of the emphasized-text section.

These rules exclude underscores in things like snake_case_variables, while
supporting most actual uses.

3.  An emphasized-text section ends unconditionally at the end of a line.

The attached file specifies all the Unicode whitespace and terminating
punctuation, from the Unicode Character Database.  There are quite a few,
but you don't even need a regular expression, just a list of the characters.
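
A minimal sketch of rules 0 through 3 in Python, added for illustration and not part of the original message. The two character sets are tiny stand-ins for the attached Unicode lists (which are not reproduced here), and the function name `emphasis_spans` is hypothetical. Treating an unclosed section as running to the end of the line is one reading of rule 3.

```python
# Assumption: small placeholders for the attached whitespace and
# terminating-punctuation lists; a real client would load the full lists.
WHITESPACE = frozenset(" \t\u00a0\u2003\u3000")
TERMINATORS = frozenset(".,;:!?)]}\"'\u201d\u2019\u00bb")


def emphasis_spans(line):
    """Return (open_index, close_index) pairs for one text/gemini line.

    Rule 0: the line itself is never modified; only indices are reported.
    """
    # Number of leading '#' characters (0 for an ordinary text line).
    header_end = len(line) - len(line.lstrip("#"))
    spans = []
    open_at = None
    for i, ch in enumerate(line):
        if ch != "_":
            continue
        if open_at is None:
            # Rule 1: start of line, right after the '#' run, or after whitespace.
            if i == header_end or (i > 0 and line[i - 1] in WHITESPACE):
                open_at = i
        else:
            # Rule 2: end of line, or followed by whitespace or a terminator.
            at_end = i == len(line) - 1
            if at_end or line[i + 1] in WHITESPACE or line[i + 1] in TERMINATORS:
                spans.append((open_at, i))
                open_at = None
    # Rule 3: an unclosed section ends at the end of the line.
    if open_at is not None:
        spans.append((open_at, len(line)))
    return spans
```

A renderer could then style each reported range and leave every underscore in place. Underscores inside snake_case_variables never satisfy rule 1, so a line containing only such identifiers yields no spans.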

I hope this is helpful and/or inspirational.




John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
People go through the bother of Christmas because Christmas helps them
to understand why they go through the bother of living out their lives
the rest of the year. For one brief instant, we see human society as it
should and could be, a world in which business has become the exchanging
of presents and in which nothing is important except the happiness and
well-being of the ultimate consumer.  --Northrop Frye (1948)

Drew DeVault <sir (a) cmpwn.com>

Can you guys give it a rest already? NACK on behalf of my client. Gemini
is good because it's not going to change. We don't need italics for
"serious writing". Everyone has something different that they NEED very
BADLY or else they can't SERIOUSLY use gemini, yak, yak, yak.

Stop it.

Gemini is good because it's simple.

John Cowan <cowan (a) ccil.org>

On Thu, Nov 12, 2020 at 8:07 PM Drew DeVault <sir at cmpwn.com> wrote:

> Stop it.

Y'know, if you hadn't written your message, I wouldn't have written this
reply and that would be two less messages on the list.  But knock yourself
out.

> Gemini is good because it's simple.

I agree.  Some clients are bare-bones, others add all sorts of bells and
whistles.  That doesn't affect the simplicity of the protocol *or* the
format.



John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
The penguin geeks is happy / As under the waves they lark
The closed-source geeks ain't happy / They sad cause they in the dark
But geeks in the dark is lucky / They in for a worser treat
One day when the Borg go belly-up / Guess who wind up on the street.

Sean Conner <sean (a) conman.org>


  Oooh!  A bike sheeding thread!  I know Drew DeVault might complain, but
hey, this list wouldn't be this list unless the majority of messages were
about text formatting (seriously---over half the messages are not about the
protocol at all, but about text formatting).

  So with my introduction out of the way, let me nitpick [1] this proposal
with a bunch of corner cases I can already see ...

It was thus said that the Great John Cowan once stated:
> Given that we are *not* going to change the definition of text/gemini, but
                    ^^^^^ shouldn't that be _not_?  Or are you going for
strong emphasis here?  

> 1.  If an underscore appears outside an emphasized text section, and is at
> the beginning of a text line or after the # characters in a header line, or
> is preceded by a whitespace character, then it marks the beginning of an
> emphasized-text section

  Pattern wise, it's like:

	(start_of_line ?('#'*) | whitespace) '_'
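
  As a rough illustration only, the pattern above could be written as a Python regular expression. The names `OPENING_UNDERSCORE` and `opening_candidates` are hypothetical, `\s` stands in for the full whitespace list, and the sketch ignores the "outside an emphasized text section" condition, so it only finds candidate positions.

```python
import re

# Start of line followed by any number of '#', or a whitespace character,
# immediately followed by an underscore.
OPENING_UNDERSCORE = re.compile(r"(?:^#*|\s)_")


def opening_candidates(line):
    """Return the indices of underscores that could open an emphasized section."""
    # Each match ends just past the underscore, so its index is end - 1.
    return [m.end() - 1 for m in OPENING_UNDERSCORE.finditer(line)]
```

  An underscore in the middle of a word, as in snake_case, is preceded by neither alternative and is therefore skipped.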

> (rendered as italics or in some other way).

  Such as bold, a larger font, a smaller font, or some other way other than
using italics.  Okay, got it.  

> 2.  If an underscore appears inside an emphasized text section, and is at
> the end of a line, or is followed by whitespace, sentence-terminating
> punctuation, or parenthesis- or quotation-terminating punctuation, then it
> marks the end of the emphasized-text section.

  I've found that terminating italic sections before sentence-terminating
punctuation can lead to very ugly output.  For example:

	He asked about _blandit_?

  Here, the italic t will run into the trailing question mark, which I feel
looks terrible.  That's why I tend to include sentence terminating marks
within the italic section:

	He asked about _blandit?_

This is of less concern with periods and commas, since there isn't much of a
difference, stylistic wise, between a normal period and italic period.

  It also sounds like you are expecting users to write stuff like

	_lorem ipsum dolor sit amet_

or

	_lorem_ipsum_dolor_sit_amet_

else, why not just say that once in an emphasized text section, the next
underscore ends it.  Much easier to deal with, and a bit easier to deal with
when wrapping text (although I suppose one can add '_' to the list, along
with whitespace and hyphens).

> These rules exclude underscores in things like snake_case_variables, while
> supporting most actual uses.
> 
> 3.  An emphasized-text section ends unconditionally at the end of a line.

  Odd, but I can see why you say so, given the nature of parsing gemtext. 
But one unaware of that might end up writing:

	blah blabh _lorem ipsum dolor
	sit amet_ blah blah blah

  and wonder why the italicised text is all wrong.

> The attached file specifies all the Unicode whitespace and terminating
> punctuation, from the Unicode Character Database.  There are quite a few,
> but you don't even need a regular expression, just a list of the characters.

  All 352 of the characters.  

  For now.

  That might be updated at the next Unicode revision.

  Got it.

> I hope this is helpful and/or inspirational.

  -spc (Unicode is hard!  Let's do rocketry!)

[1]	Can we still say that term?

John Cowan <cowan (a) ccil.org>

On Thu, Nov 12, 2020 at 9:02 PM Sean Conner <sean at conman.org> wrote:

>   Oooh!  A bike sheeding thread!


Yup.  Let's sheed some bikes together!

> So with my introduction out of the way, let me nitpick [1]


Yes, you can say that.  Head lice do not yet have a pressure group
insisting that you call their eggs something more polite.

> > Given that we are *not* going to change the definition of text/gemini, but
>                     ^^^^^ shouldn't that be _not_?  Or are you going for
> strong emphasis here?
>

No, just habit.  I write a *lot* of git-flavored Markdown.

>   I've found that terminating italic sections before sentence-terminating
> punctuation can lead to very ugly output.


Yes, though that's up to the content author.  But this isn't about
terminating the italic section; it's about deciding whether an underscore
actually does terminate it.  See below.

> why not just say that once in an emphasized text section, the next
> underscore ends it.


So that lines like "It is important to understand that _although the
standard in C is to use snake_case for variables, C compilers do not
support numbers like 123_456_789_." are interpreted correctly.  To put it
in HTML terms, the first underscore is preceded by whitespace, so it is an
<i>, and the last one is followed by terminating punctuation, so it is an
</i>.  The others, however, don't satisfy either rule 1 or rule 2, so the
emphatic text just goes on right through them.

>         blah blabh _lorem ipsum dolor
>         sit amet_ blah blah blah
>

They will quickly find out that that doesn't work.  Text/gemini lines are
typically used in prose for paragraphs, and italic text doesn't normally
cross paragraph boundaries.

>   That might be updated at the next Unicode revision.
>

That's true.  But as time goes by, the new scripts with script-specific
punctuation become fewer and harder to find.  Until we join the Galactic
Federation, there just aren't many more scripts out there.  Newly invented
ones tend to use Latin/Greek/Cyrillic/etc. punctuation.


>   -spc (Unicode is hard!  Let's do rocketry!)
>

You kidding?  This is one of the easy bits!  All the work has been done for
us.  We don't even need regular expressions to figure it out, just keep the
352 characters in two arrays.  Unless your browser runs on an Arduino, that
is practically free.
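
A minimal sketch of the "two arrays" idea in Python, added for illustration. It does not reproduce the attached lists: as an assumption, it approximates them from Unicode general categories (`Zs` for spaces, `Pe` and `Pf` for closing brackets and closing quotes) plus a few ASCII characters, so the resulting sets will differ from the real ones. The names `chars_in_categories`, `WHITESPACE`, and `CLOSERS` are hypothetical.

```python
import sys
import unicodedata


def chars_in_categories(categories):
    """Collect every code point whose general category is in `categories`."""
    return frozenset(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in categories
    )


# Built once at startup; afterwards each lookup is a constant-time set test.
WHITESPACE = chars_in_categories({"Zs"}) | frozenset("\t")
CLOSERS = chars_in_categories({"Pe", "Pf"}) | frozenset(".,;:!?\"'")
```

With the sets in hand, checking rule 2 for the character after an underscore is a plain membership test such as `ch in WHITESPACE or ch in CLOSERS`.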



John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
I amar prestar aen, han mathon ne nen,    http://vrici.lojban.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
