I've mostly been sitting back through all of this list and raw mode
talk, only occasionally popping in small points here or there. But
I figure if this is a community decision it is good if those in the
community voice their support or lack of support for ideas.

I am mostly ambivalent about the need for a raw text mode. Terminals and
any graphical clients (GTK, FLTK, Tcl/Tk) all wrap to the window, with
either a hard wrap or a word wrap. If we aren't reflowing, then we are
already doing what needs to be done. There has been talk about not
wrapping python code. Ok. I get how that would look bad. But which is
more readable:

1. Hard-wrapped python code
2. Truncated python code where you just straight up lose the content

Clearly #1. I get it as a nice thing for ascii art, but again I feel like
using ascii art is a risk a content creator takes, knowing that it may
degrade poorly on various output systems.

Tomasino writes:
> 1. Listing things
> 2. Quick instructions where headers are overkill
> 3. Track listings
> 4. Top 10 lists
> 5. The same reasons we want bullets the reflow
> 6. Recipes!

I think Tomasino's response to Julien, in plain text, just proved the
_lack_ of need for lists.

Want a list? Write a list.... handled.

I really really really do not see the point of adding all of this
markup :-/ Just serve files as markdown if you want markdown and let
clients either support markdown or not.

I really do not want to, and likely will not, add support for lists in my
current or any future gemini client I make. I don't see the point.

I do appreciate all of the work and thought that has gone into trying to
figure these things out. It just feels unnecessary for a simple protocol
like gemini. Why get complex when just serving markdown or html or
whatever you like will do the job and people can read it as they see fit?
We are definitely starting to get into the territory with the proposals
where some client authors will support some things and some won't, which
will create a fractured view of gemini to users. So why not keep it
simple and cohesive by just having links and move on to other things?

Just my two cents at the moment.
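To make the hard-wrap versus truncation comparison concrete, here is a small sketch. The 40-column width and the sample line of code are assumptions picked for illustration; they do not come from any particular client.

```python
WIDTH = 40  # assumed viewport width, for illustration only

code_line = "result = compute_totals(first_argument, second_argument, third_argument)"

# Option 1: hard wrap at the viewport edge. Ugly, but nothing is lost.
hard_wrapped = [code_line[i:i + WIDTH] for i in range(0, len(code_line), WIDTH)]

# Option 2: truncate at the viewport edge. Tidy, but the tail is gone.
truncated = code_line[:WIDTH]

print("\n".join(hard_wrapped))
print(truncated)
```

With the hard wrap, joining the rows back together recovers the original line exactly; with truncation there is no way to get the missing part back.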
On Sun, Jan 19, 2020 at 08:15:14PM +0100, Brian Evans wrote:

> I've mostly been sitting back through all of this list and raw mode
> talk, only occasionally popping in small points here or there. But
> I figure if this is a community decision it is good if those in the community
> voice their support or lack of support for ideas.

It definitely is. If people who don't feel good about proposed changes
keep quiet and all I hear on the list is agreement, it will just lead to
the most vocal members driving things. I'm glad you've spoken up.

This will be a quick and kind of partial reply, sorry. But, very
briefly, I understand the concern about lists (of all the recent
proposed changes, they are the least valuable IMHO because they only
concern presentation). How do you feel about the heading proposals?
IMHO they are probably the most valuable of these changes because they
can be totally ignored from the perspective of presentation and yet
still do genuinely useful stuff, like automatically generated tables of
contents, which do nothing but make large and well-structured text
content easier to read, or help with automatic generation of
user-friendly menus or feeds to bodies of content. That's good stuff,
surely?

> I really do not want to, and likely will not, add support for lists in my current
> or any future gemini client I make. I don't see the point.

As I intend to actually write all this stuff in the spec, if I decide to
adopt it, supporting the list line types will be strictly optional, so
this is absolutely fine. For a text-based client like Bombadillo, the
difference between supporting lists and not is an extremely small
aesthetic thing that some users probably won't even notice.

> I do appreciate all of the work and thought that has gone into trying to
> figure these things out. It just feels unnecessary for a simple protocol
> like gemini. Why get complex when just serving markdown or html or
> whatever you like will do the job and people can read it as they see fit.
> We are definitely starting to get into the territory with the proposals
> where some client authors will support some things and some wont...
> which will create a fractured view of gemini to users. So why not keep it
> simple and cohesive by just having links and move on to other things?

The fractured Geminispace thing is a real concern, but actually I think
an argument can be made in both ways here. If text/gemini is kept
extremely simple and plain and the official policy is "Serve Markdown or
HTML if you want even basic text styling" then people may well do that.
Some clients will add support for rendering those but many won't because
it's so much more complicated (far worse than what has been proposed
here!). Then we end up with two regions of Geminispace, the text/gemini
subspace which anybody can visit and the Markdown subspace which only
people using the fanciest clients can visit. IMHO, this is a worse kind
of fracturing than one where we adopt the proposed new text/gemini
syntax and different clients implement different subsets of the optional
features. After all, the only reason I've been so positive about these
recently proposed changes is that they all seem to degrade very
gracefully if a client doesn't recognise them and treats them as
equivalent to text. The degree of fracturing possible is actually very
slight.

> Just my two cents at the moment.

One cent of opinion from client or server authors is valued equal to one
dollar of opinion from anybody else. :)

Cheers,
Solderpunk
On Sun, Jan 19, 2020 at 08:01:23PM +0000, solderpunk wrote:

> The fractured Geminispace thing is a real concern, but actually I think
> an argument can be made in both ways here. If text/gemini is kept
> extremely simple and plain and the official policy is "Serve Markdown or
> HTML if you want even basic text styling" then people may well do that.
> Some clients will add support for rendering those but many won't because
> it's so much more complicated (far worse than what has been proposed
> here!). Then we end up with two regions of Geminispace, the text/gemini
> subspace which anybody can visit and the Markdown subspace which only
> people using the fanciest clients can visit. IMHO, this is a worse kind
> of fracturing than one where we adopt the proposed new text/gemini
> syntax and different clients implement different subsets of the optional
> features. After all, the only reason I've been so positive about these
> recently proposed changes is that they all seem to degrade very
> gracefully if a client doesn't recognise them and treats them equivalent
> to text. The degree of fracturing possible is actually very slight.

Oh, there was also the argument made at some point that if Gemini
doesn't have any standardised syntax for these common formatting tasks,
ambitious clients might start trying to recognise the most popular
non-standard ways of doing it, which could easily lead to divergent
implementations across clients. So, better to provide a standard way to
do it, which gives us control and uniformity.

I don't mean to claim either of these arguments are bulletproof, I just
don't think that a principle of "we shouldn't do anything that risks
fragmentation of Geminispace" (which is obviously a good principle)
necessarily comes down clearly on one side of this question or the
other.

Cheers,
Solderpunk
I think there have been a lot of good suggestions coming from all sides.
Here's my take on a compromise format that tries to take everything into
account while also inserting a few of my own opinions:

Parser pseudo-code (actually, it's valid python):

```
preformat_mode = False
preformat_buffer = []

for line in document:
    if line.startswith('```'):
        if not preformat_mode:
            # Start preformat block
            preformat_mode = True
            preformat_buffer = []
        else:
            # End preformat block
            preformat_mode = False
            display_preformat_block(preformat_buffer)
    elif preformat_mode:
        # Inside of preformat block
        preformat_buffer.append(line)
    elif line.startswith('###'):
        display_header_level_3(line)
    elif line.startswith('##'):
        display_header_level_2(line)
    elif line.startswith('#'):
        display_header_level_1(line)
    elif line.startswith('=>'):
        display_link(line)
    elif line.startswith('---'):
        display_horizontal_rule()
    else:
        display_paragraph(line)

if preformat_mode:
    # Flush the preformat block if there was no end tag
    display_preformat_block(preformat_buffer)
```

This pseudo-code was written with "fancy" gemini clients in mind. In
other words, this should be close to the worst-case scenario for how
complicated a gemini document parser would ever need to be.

## Preformat mode

Many clients are going to want to display a preformat block of text in a
horizontally scrollable window or some other type of block widget. This
pseudo-code reflects that by sticking the pre-formatted lines in a
separate buffer until the end of the block. I think this is a more
accurate representation of what most client parsers would end up looking
like.

## Headers

I'm of the opinion that there should only be a fixed number of header
levels. It keeps the matching logic flat and straightforward. Three
levels is few enough that most clients should be able to come up with
distinct styles to display them. Fixed header lines are trivial to parse
and provide a lot of utility for organizing a document and linking to
sub-sections.
### Ordered Lists & Unordered Lists

Lists are tricky because while they would be nice to have, they
complicate the parsing significantly. In order to parse a list while
preserving its semantic structure, you will need to keep track of where
it starts and ends. Nested lists complicate this even further, no matter
which syntax for nesting is used.

Parsing lists semantically would require keeping a separate buffer for
each type of list, and then keeping flags and making sure that these
buffers are flushed after the last element in the list. Because of this,
I do not believe that they pass the power-to-weight ratio smell test.

For authors, they still have a few choices for lists:

1. Stick the list in a preformat block
2. Write the list in plain mode without special formatting

I accept that neither of these options is *ideal* for all use cases, but
I think they are *good enough* for most use cases. Don't forget that
unicode bullets can already be added directly in gemini documents if the
author wishes to do so.

### Quotes

Quote blocks with ">" would be ok if we could count on them being only a
single line long. However, many quotes will necessarily include line
breaks that should be displayed together in a single block. This
complicates parsing in the same way that lists do, so I think that
quotes should also be omitted for the same reason.

If you want to display something like a quote from a mailing list
message, I think that would be a perfect candidate for copying it into a
preformat block. For other types of quotes, stick them between two
horizontal rules to separate them from the surrounding text.

### Horizontal Rule

I find the horizontal rule useful for separating sections of a page. I
see them commonly used on gopher to add a footer to the bottom. They can
likewise be used for header sections.
```
Header
---
Content
---
Footer
```

The following gemini sites already use some form of horizontal separator
on their front pages (the precise syntax varies):

- gemini://vger.cloud/
- gemini://gemini.conman.org/
- gemini://zaibatsu.circumlunar.space/
- gemini://carcosa.net/
- gemini://yam655.com/

I think that since horizontal rules are easy to parse and they add
utility for structuring pages, they should be included in the spec.

### Other Random Opinions

- Leading and trailing whitespace should be stripped from all lines
  outside of the preformat block. If you're allowing a non-monospace
  font for these elements, then leading whitespace can look inconsistent
  and trailing whitespace serves no real purpose. By leading whitespace,
  I mean that

  ##   heading text

  should be interpreted as "heading text", not "  heading text".

- I have no opinion on whether the ``` should allow text after it on the
  same line.

I think I would be satisfied enough with the above document to at least
try it out by converting all of my existing gemini content. I also think
I would be fine keeping everything fixed-width and hard wrapping. I
*don't* think I would want to implement nested lists or quote blocks, or
anything significantly more complicated than what is outlined above.

- mozz
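For anyone who wants to experiment with the parser loop from this message, here is one way to make it self-contained. This is only a sketch: the `render` function, the `(kind, payload)` tuples it returns, and the tag names are stand-ins invented for illustration, where a real client would draw widgets instead. It also applies the whitespace-stripping rule suggested above to everything outside a preformat block.

```python
def render(document):
    """Classify each line of a text/gemini document (hypothetical helper).

    Returns a list of (kind, payload) tuples instead of drawing anything.
    """
    output = []
    preformat_mode = False
    preformat_buffer = []

    for line in document.splitlines():
        if line.startswith('```'):
            if preformat_mode:
                # End of block: hand over the collected lines
                output.append(('pre', preformat_buffer))
            preformat_buffer = []
            preformat_mode = not preformat_mode
        elif preformat_mode:
            # Preformatted lines are kept verbatim, whitespace included
            preformat_buffer.append(line)
        elif line.startswith('###'):
            output.append(('h3', line[3:].strip()))
        elif line.startswith('##'):
            output.append(('h2', line[2:].strip()))
        elif line.startswith('#'):
            output.append(('h1', line[1:].strip()))
        elif line.startswith('=>'):
            output.append(('link', line[2:].strip()))
        elif line.startswith('---'):
            output.append(('rule', None))
        else:
            output.append(('text', line.strip()))

    if preformat_mode:
        # Flush the preformat block if there was no end tag
        output.append(('pre', preformat_buffer))
    return output


sample = "##   heading text\n  some text  \n---\n```\n  art\n"
for item in render(sample):
    print(item)
```

Note that the sample deliberately leaves the preformat block unterminated, so the final flush is exercised, and that the indentation inside the block survives while the heading and paragraph are stripped.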
On Sun, Jan 19, 2020 at 08:15:14PM +0100, Brian Evans wrote:

I've been thinking more about lists and, while I understand the concern
about maintaining simplicity, I really do think that:

> I think Tomasino's response to Julien, in plain text, just proved the _lack_ of
> need for lists.
>
> Want a list? Write a list.... handled.

is a bit too simplistic. Tomasino's response consisted entirely of short
list items which didn't require wrapping, and also happened in a plain
text email, which is a hard wrapping environment and so not comparable
to a "long line" text/gemini document.

In Gopherspace, it's very common for people writing lists to format
multi-line list items "nicely", e.g. people write this:

---
Just a quick response for now: nice post, thanks, there's a lot in here
that I agree with (and I had been starting to think similar things about
quotes), but can I ask you to elaborate on:

On Sun, Jan 19, 2020 at 09:01:17PM -0500, Michael Lazar wrote:

> Lists are tricky because while they would be nice to have, the complicate the
> parsing significantly. In order to parse a list while preserving its semantic
> structure, you will need to keep track of where it starts and ends. Nested
> lists complicate this even further, no matter which syntax for nesting is used.
>
> Parsing lists semantically would require keeping a separate buffer for each type
> of list, and then keeping flags and making sure that these buffers are flushed
> after the last element in the list. Because of this, I do not believe that they
> pass the power-to-weight ratio smell test.

In particular, what do you mean by "parsing lists semantically"?

At no point in these discussions have I been envisaging anything to do
with lists which requires clients to recognise or keep track of whether
or not they are "inside" a list, or to stick lists in buffers. I have
imagined list items standing alone and "lists" being an emergent
property of a document that clients have no awareness of - in exactly
the same way that "paragraphs" are an emergent property of lines (if
some of those lines happen to be blank).

Well, that's true for unordered lists, at least. Ordered lists are
another story.

Cheers,
Solderpunk
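The "list items standing alone" idea can be sketched as a pure per-line function: nothing is remembered between lines, so the "list" exists only in the eye of the reader. The `*` prefix, the bullet glyph, the default width, and the name `render_line` are all assumptions made for illustration, since the exact syntax had not been settled at this point in the thread.

```python
import textwrap

def render_line(line, width=40):
    """Render one line with no knowledge of the lines around it."""
    if line.startswith('*'):
        text = line[1:].strip()
        # Bullet on the first row, hanging indent on any wrapped rows
        return textwrap.fill(text, width,
                             initial_indent='• ',
                             subsequent_indent='  ')
    return textwrap.fill(line.strip(), width)

document = [
    "Shopping:",
    "* eggs",
    "* a rather long item that will need to be wrapped by the client",
]
for line in document:
    print(render_line(line, width=30))
```

Because `render_line` takes a single line and returns a string, it could be run over a stream without any buffering, which is the property being argued for here.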
On 1/20/20 10:28 AM, solderpunk wrote:

> Well, that's true for unordered lists, at least. Ordered lists are
> another story

Ordered lists are--so far--the only thing that really does break the
linear line-by-line processing approach. If they are sacrificed to the
gods of a cleaner spec, I for one wouldn't cry too much. Unordered lists
that support reflow are the more important bit. One could always cheat
and just use unordered lists and start each one with a number:
On Mon, Jan 20, 2020 at 12:09:01PM +0000, James Tomasino wrote:

> One could always cheat
> and just use unordered lists and start each one with a number:
>
> * 1) item one!
> * 2) A really long item two that will be wonderfully reflowed on small
> screens by our awesome client writers.
> ** 2a) I'm lookin' at you, Michael!
> ** 2b) And the rest of you too. ;)
>
> Saves us a bit o' logic. Keeps everything line based. You could run a
> stateless renderer on a stream now and it wouldn't choke.

This has two other advantages, too.

First, when processed by a simple client which opts not to treat list
items as anything special, the result is still obviously an ordered
list. Under the + proposal, an ordered list will look just like an
unordered list whose author chose a different bullet character for
reasons of taste. Ambiguous degradation is not graceful degradation!

Second, authors can unambiguously refer back to a list item in later
writing. I can say "Tomasino used a winky face in item 2b above" and you
can all go back and find 2b and confirm this. If I said that and you
were using a simple client that just rendered + and ++, it would be up
to you to mentally figure out which point was 2b. And even if you're
using an advanced client that does render ordered lists, I might write
"2b" but your fancy client might use Roman numerals for second level
lists and print 2ii instead of 2b, and again the connection wouldn't be
immediate. This could only be solved by tediously specifying that first
level lists MUST use Arabic numerals, second level lists MUST use
lowercase letters, third level lists MUST use Roman numerals, and so on
and so on. And then what happens when somebody uses more than 26 second
level list items and we run out of lowercase letters to use? A spec that
can avoid all these problems will be exactly the kind of long, tedious,
fiddly spec that I really don't want us to use, and which nobody will
want to code to anyway.

I'm starting to think we should either drop the ordered list idea, or at
the very least strictly limit it to one level with no nesting.

Cheers,
Solderpunk
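The "more than 26 items" question does have a conventional answer, the spreadsheet-column scheme (a to z, then aa, ab, and so on), but it is one more rule a spec would have to pin down for every client to agree. A sketch of that scheme follows; `letter_marker` is a made-up name, and nothing here is proposed syntax.

```python
def letter_marker(n):
    """Convert a 1-based list position to letters (bijective base 26)."""
    if n < 1:
        raise ValueError("list positions start at 1")
    marker = ''
    while n > 0:
        n -= 1
        marker = chr(ord('a') + n % 26) + marker
        n //= 26
    return marker

for position in (1, 26, 27, 52, 53, 702, 703):
    print(position, letter_marker(position))
```

Even this small function embeds a choice (whether item 27 is "aa", "ba", or something else), which is the kind of detail two independent clients could easily render differently.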
On Mon, Jan 20, 2020 at 5:29 AM solderpunk <solderpunk at sdf.org> wrote:
>
> Just a quick response for now: nice post, thanks, there's a lot in here
> that I agree with (and I had been starting to think similar things about
> quotes), but can I ask you to elaborate on:
>
> On Sun, Jan 19, 2020 at 09:01:17PM -0500, Michael Lazar wrote:
>
> > Lists are tricky because while they would be nice to have, the complicate the
> > parsing significantly. In order to parse a list while preserving its semantic
> > structure, you will need to keep track of where it starts and ends. Nested
> > lists complicate this even further, no matter which syntax for nesting is used.
> >
> > Parsing lists semantically would require keeping a separate buffer for each type
> > of list, and then keeping flags and making sure that these buffers are flushed
> > after the last element in the list. Because of this, I do not believe that they
> > pass the power-to-weight ratio smell test.
>
> In particular, what do you mean by "parsing lists semantically"?
>
> At no point in these discussions have I been envisaging anything to do
> with lists which requires clients to recognise or keep track of whether
> or not they are "inside" a list or not, or sticking lists in buffers.
> I have imagined list items standing alone and "lists" being an emergent
> property of a document that clients have no awareness of - in exactly
> the same way that "paragraphs" are an emergenty property of lines (if
> some of those lines happen to be blank).

By "parsing lists semantically" I mean that if I build an AST, I want
all of the list items grouped together inside of a single list object.
This is how I did it when I was playing around with markdown a while ago
[0]. From my research this seems to be the common way to do it [1].

Sophisticated gemini clients could utilize this in a variety of ways.
Maybe you want to add a little bit of extra whitespace surrounding the
list. Or you want to make sure that your display does not cut off
halfway through the list. Or you want to support re-ordering list items
alphabetically. I don't know, the sky is the limit.

I'm willing to admit that HTML has perhaps tainted my thinking here, but
it just feels *wrong* to me to have an <li> without the surrounding
<ul>. Doing the same thing with "paragraphs" (i.e. each line is a new
paragraph) doesn't feel wrong in the same way. I just have a hard time
mentally getting past it.

If I understand correctly, the main argument that I'm hearing in favor
of unordered lists is so that users can visually distinguish the first
line of a list item from subsequent lines that have been wrapped by the
client. I can empathize with this. Bullet lists have been called out
because they're an obvious example of where this is painful. But this
might be a more generalized problem. For example, a poem will have
deliberate line breaks, but you would also like your poem to be wrapped
by the client.

What if I were to say this:

When a client is wrapping a line longer than the viewport, the client
may choose to add indents or other visual indicators to distinguish the
beginning of the line from a continuation line.

The simplest way to do this would be by adding a hanging indent to
continuation lines. Expanding on my previous code example:

```
import textwrap

def display_paragraph(line):
    # Strip leading and trailing whitespace
    line = line.strip()

    initial_indent = ''
    # Hanging indent applied to continuation lines only
    subsequent_indent = '  '
    # The indents must be passed by keyword; the second positional
    # argument of textwrap.wrap() is the width
    wrapped_text = textwrap.wrap(
        line,
        initial_indent=initial_indent,
        subsequent_indent=subsequent_indent,
    )
    for wrapped_line in wrapped_text:
        print(wrapped_line)
```

If we generalize this to all lines, we don't need to handle list items
as a special case. Is there anything that this would break?

[0] gemini://mozz.us/markdown/design_document.json
[1] https://github.com/syntax-tree/mdast#list

- mozz
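Here is a small self-contained sketch of the hanging-indent idea applied to a poem, to show what a reader would actually see. The function name `wrap_with_hanging_indent`, the 30-column width, and the two-space indent are assumptions chosen for illustration.

```python
import textwrap

def wrap_with_hanging_indent(line, width=30, indent='  '):
    """Wrap one line; continuation rows get a hanging indent."""
    return textwrap.wrap(line.strip(), width,
                         initial_indent='',
                         subsequent_indent=indent)

poem = [
    "Short first line",
    "A much longer second line that the client has to wrap somewhere",
]
for line in poem:
    for row in wrap_with_hanging_indent(line):
        print(row)
```

One thing to watch with this approach: `textwrap.wrap` returns an empty list for a blank line, so a client built this way would need to print blank lines itself or paragraph breaks would disappear.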
> I'm of the opinion that there should only be a fixed number of header levels.
> It keeps the matching logic flat and straightforward. Three levels is few
> enough that most clients should be able to come up with distinct styles to
> display them. Fixed header lines are trivial to parse and provide a lot of
> utility for organizing a document and linking to sub-sections.

I like that this would discourage the "markdown hacking" seen on GitHub
where unnecessary depth is used to make the HTML render look nicer.
OTOH, it would be easy for authors to:
On Mon, Jan 20, 2020 at 12:30:14PM -0500, Michael Lazar wrote:

> By "parsing lists semantically" I mean that if I build an AST, I want all of
> the list items grouped together inside of single list object. This is how I
> did it when I was playing around with markdown a while ago [0]. From my
> research this seems to be the common way to do it [1].
>
> Sophisticated gemini clients could utilize this is a variety of ways. Maybe
> you want to add a little bit of extra whitespace surrounding the list. Or you
> want to make sure that the your display does not cut-off half way through the
> list. Or you want to support re-ordering list items alphabetically. I don't
> know, the sky is the limit.

Got it, thanks for clarifying.

We'll never be able to stop people going nuts and defining their own
structure on top of the official structure in the spec if they really
want to, but I think if the official spec can define a perfectly flat
structure (such that actually building an AST is unnecessary) which is
rich enough to take care of the most compelling styling that's needed to
achieve good readability, then that's absolutely fine. There's no need
to have a concept of a list encapsulating consecutive list items in
order to implement the clean formatting I discussed previously, so I
think we can do without it. It might feel weird compared to HTML or
LaTeX, but if it works, where's the problem? I think this is how lists
work in common troff macros, actually, but I can't swear to it.

> I'm willing to admit that HTML has perhaps tainted my thinking here, but it
> just feels *wrong* to me to have an <li> without the surrounding <ul>. Doing
> the same thing with "paragraphs" (i.e. each line is a new paragraph) doesn't
> feel wrong in the same way. I just have a hard time mentally getting past it.

In the rough spec I sent around for this line-oriented syntax, each line
Attached is a <100 line python gemini renderer with the following features:
Whoops, I forgot to reset the list counter. Fixed version attached.

Still < 100 lines of code, including empty lines.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gemini.py
Type: text/x-python
Size: 2511 bytes
Desc: not available
URL: <https://lists.orbitalfox.eu/archives/gemini/attachments/20200120/da535619/attachment.py>
Thanks for taking the time to write this!

There are a few details that could be nitpicked (e.g. a lot of this code
seems to assume that *s or #s at the start of lines are followed by
whitespace, which hasn't been specced), but I'm totally happy that this
code is representative of the complexity involved in handling everything
proposed so far.

If a bare-minimum renderer implementing only the compulsory core line
types can be done in ~10 lines and a full-strength renderer implementing
everything to the max can be done in ~100 lines then I'm totally happy
with that - in terms of implementation difficulty.

I still want to think very carefully about graceful degradation, to make
sure those ~10 line renderers still yield usable results. I still have
real concerns about ordered lists in that regard.

Cheers,
Solderpunk

On Mon, Jan 20, 2020 at 11:53:51AM -0800, Aaron Janse wrote:
> Attached is a <100 line python gemini renderer with the following features:
> * no external dependencies
> * only two state variables
> * unlimited-depth ordered lists that rotates through
>   * numbers
>   * letters (does `az` after `z`, `aaa` after `zz`, etc)
>   * roman numerals
> * unlimited-depth unordered lists
> * unlimited-depth headers with rotating colors
> * wraps at word boundaries, with fancy indents for lists and quotes
> * colors for all special syntax
> * horizontal rules that span the width of the display
> * preformatted text
> * links
>
> To use this script, pipe text/gemini into stdin.
>
> I hope this makes a strong case that these features aren't too complex to
> implement.
> #!/usr/bin/env python3
>
> import sys
> import textwrap
>
> def int2roman(number):
>     numerals = { 1 : "I", 4 : "IV", 5 : "V", 9 : "IX", 10 : "X", 40 : "XL",
>         50 : "L", 90 : "XC", 100 : "C", 400 : "CD", 500 : "D",
>         900 : "CM", 1000 : "M" }
>     result = ""
>     for value, numeral in sorted(numerals.items(), reverse=True):
>         while number >= value:
>             result += numeral
>             number -= value
>     return result
>
> width = 80
>
> # only two state variables
> preformatted = False
> list_counter = [0]
>
> for line in sys.stdin:
>     if line.startswith('```'):
>         preformatted = not preformatted
>         continue
>
>     if preformatted:
>         print('\033[37m'+line+'\033[m', end='')
>         continue
>
>     line = line.rstrip()
>
>     if line.startswith('=>'):
>         parts = line[2:].strip().split(maxsplit=1)
>         print('\033[36m'+parts[0]) # url
>         print(parts[1]+'\033[m') # text
>     elif line.startswith('#'):
>         parts = line.split(maxsplit=1)
>         depth = len(parts[0]) - 1
>         colors = ['31', '93', '92', '34']
>         color = colors[depth % len(colors)]
>         print('\033['+color+'m'+line+'\033[m')
>     elif line.startswith('*'):
>         parts = line.split(maxsplit=1)
>         depth = len(parts[0])
>         text = textwrap.fill(parts[1], width)
>         text = textwrap.indent(text, ' '*(2*depth)).lstrip()
>         print(2*(depth-1)*' '+'\033[93m•\033[m '+text)
>     elif line.startswith('+'):
>         list_counter = list_counter if len(list_counter) > 0 else [0]
>
>         parts = line.split(maxsplit=1)
>         depth = len(parts[0])
>
>         if depth > len(list_counter):
>             list_counter += [0]
>         elif depth < len(list_counter):
>             list_counter = list_counter[:depth]
>
>         assert len(list_counter) == depth
>
>         marker = ''
>
>         counter_type = (len(list_counter) - 1) % 3
>         list_counter[-1] += 1
>         count = list_counter[-1]
>         if counter_type == 0:
>             marker = str(count)
>         elif counter_type == 1:
>             while True:
>                 count -= 1
>                 marker = chr(97+(count%26)) + marker
>                 count = count // 26
>                 if count == 0:
>                     break
>         else:
>             marker = int2roman(count)
>
>         text = textwrap.fill(parts[1], width)
>         text = textwrap.indent(text, ' '*(3*depth)+' '*(len(marker)-1)).lstrip()
>         print('\033[93m'+(depth-1)*3*' ' + marker + '. \033[m' + text)
>     elif line.startswith('>'):
>         depth = 0
>         while True:
>             if line.startswith('>'):
>                 line = line[1:].lstrip()
>                 depth += 1
>             else:
>                 break
>         text = textwrap.fill(line, width)
>         text = textwrap.indent(text, '\033[93m>\033[m '*depth)
>         print(text)
>     elif line.startswith('---'):
>         print('\033[37m'+'-'*width+'\033[m')
>     else:
>         print(textwrap.fill(line, width))
>
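For contrast with the full renderer quoted above, a bare-minimum renderer of the "~10 line" kind mentioned in the reply might look like the sketch below. It assumes the compulsory core is just plain text lines plus `=>` links; that reading of "core", the function name `render_minimal`, the bracketed link layout, and the placeholder URL are all assumptions for illustration, since the spec was still under discussion.

```python
import textwrap

def render_minimal(document, width=80):
    """Render links specially; treat every other line as plain text."""
    rows = []
    for line in document.splitlines():
        if line.startswith('=>'):
            parts = line[2:].strip().split(maxsplit=1)
            if not parts:
                continue  # "=>" with nothing after it: nothing to show
            url = parts[0]
            label = parts[1] if len(parts) > 1 else url
            rows.append(f"[{label}] {url}")
        else:
            # Unknown line types (#, *, +, >) fall through untouched;
            # "or ['']" keeps blank lines, which wrap() would drop
            rows.extend(textwrap.wrap(line, width) or [''])
    return "\n".join(rows)

sample = "# Title\n* item one\n=> gemini://example.org/ Example\n\nBody text"
print(render_minimal(sample))
```

The headings and list items come out exactly as the author typed them, markers and all, which is the graceful-degradation behaviour under discussion: readable, just not styled.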