Hi Geminauts,

I am trying to understand what went wrong with the document below: the very first title is not rendered.

gemini://gnuser.land/gemlog/draft/bug01.gmi

The "bug01.gmi" was started in Micro and finalized in Mousepad (Debian, XFCE4). Then I copied all the content into Geany (Debian, XFCE4), removed some whitespace, saved it as "bug02.gmi", and the title was formatted properly.

gemini://gnuser.land/gemlog/draft/bug01.gmi

A diff of both files shows there is actually a **difference** at the very first line, but what is that?

diff -bZ bug01.gmi bug02.gmi
1c1
< ?# Title 1
---
> # Title 1

Thanks!

TGL
The Gnuserland <gnuserland at mailbox.org> writes:

> A diff made to both files shows there is actually a
> **difference** a the very first line, but what is that?
>
> diff -bZ bug01.gmi bug02.gmi
> 1c1
> < ?# Title 1
> ---
> > # Title 1

Perhaps try running hexdump(1) (or something analogous) on both files and compare them byte-for-byte.

Alexis.
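Here is a minimal Python sketch of that byte-for-byte comparison, for anyone without hexdump(1) at hand. The helper names `first_difference` and `hex_line` are made up for illustration. `hex_line` only imitates a single xxd-style row. The sample bytes are the first 16 bytes of each file, as they appear in the xxd output posted later in this thread.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first byte where a and b differ, or None if they are equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other: they differ where the shorter one ends.
        return min(len(a), len(b))
    return None


def hex_line(data: bytes, width: int = 16) -> str:
    """Format the first `width` bytes roughly like one row of xxd output (offset fixed at 0)."""
    chunk = data[:width]
    # Two bytes (four hex digits) per group, as xxd prints by default.
    hex_part = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
    # Printable ASCII is shown as-is, every other byte as '.'.
    ascii_part = "".join(chr(c) if 32 <= c < 127 else "." for c in chunk)
    return f"00000000: {hex_part}  {ascii_part}"


# First 16 bytes of each file, copied from the xxd output in this thread.
bug01 = bytes.fromhex("efbb bf23 2054 6974 6c65 2031 0d0a 0d0a")
bug02 = bytes.fromhex("2320 5469 746c 6520 310d 0a0d 0a23 2054")

print(first_difference(bug01, bug02))  # 0: the files differ from the very first byte
print(hex_line(bug01))
print(hex_line(bug02))
```

For real files, read them with `open(path, "rb").read()` and pass the resulting bytes to the same functions.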
Thanks for the suggestion. Something came out, but I'm not sure what it means:

< 00000000: efbb bf23 2054 6974 6c65 2031 0d0a 0d0a  ...# Title 1....
---
> 00000000: 2320 5469 746c 6520 310d 0a0d 0a23 2054  # Title 1....# T

You can check the full output here:

gemini://gnuser.land/gemlog/draft/xxd.gmi

TGL

P.S. I really loved how sweet it was to set up this page with Gemini!

On 7/6/21 12:09 AM, Alexis wrote:
>
> The Gnuserland <gnuserland at mailbox.org> writes:
>
>> A diff made to both files shows there is actually a **difference** a
>> the very first line, but what is that?
>>
>> diff -bZ bug01.gmi bug02.gmi
>> 1c1
>> < ?# Title 1
>> ---
>> > # Title 1
>
> Perhaps try running hexdump(1) (or something analogous) on both files
> and compare them byte-for-byte.
>
>
> Alexis.
The Gnuserland <gnuserland at mailbox.org> writes:

> Thank for the suggestion, something came out, but not sure what does
> it mean:
>
> < 00000000: efbb bf23 2054 6974 6c65 2031 0d0a 0d0a  ...# Title 1....
> ---
> > 00000000: 2320 5469 746c 6520 310d 0a0d 0a23 2054  # Title 1....# T

Byte Order Mark at the start of the first file.

http://www.herongyang.com/Unicode/Notepad-Byte-Order-Mark-BOM-FEFF-EFBBBF.html

Alexis.
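The three bytes `EF BB BF` count as a BOM because they are simply U+FEFF encoded as UTF-8. The short Python sketch below decodes the first 16 bytes of bug01.gmi from the xxd output above. It shows that the decoded text no longer starts with `#`. The variable names are illustrative only. The `0d0a` pairs are CRLF line endings, which both files share.

```python
import codecs

# First 16 bytes of bug01.gmi, copied from the xxd output above.
bug01_head = bytes.fromhex("efbb bf23 2054 6974 6c65 2031 0d0a 0d0a")

# U+FEFF (ZERO WIDTH NO-BREAK SPACE) is the code point used as the byte order mark;
# UTF-8 encodes it as the three bytes EF BB BF.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

text = bug01_head.decode("utf-8")
print(bug01_head.startswith(codecs.BOM_UTF8))  # True
print(repr(text))            # '\ufeff# Title 1\r\n\r\n'
print(text.startswith("#"))  # False: the first character is U+FEFF, not '#'
```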
On 6. Jul 21, at 7.52, The Gnuserland <gnuserland at mailbox.org> wrote:

> Thank for the suggestion, something came out, but not sure what does it mean:
>
> < 00000000: efbb bf23 2054 6974 6c65 2031 0d0a 0d0a  ...# Title 1....
> ---
> > 00000000: 2320 5469 746c 6520 310d 0a0d 0a23 2054  # Title 1....# T

I checked how Lagrange handles the Byte Order Mark (BOM), and sure enough, it breaks the first line's type detection.

Fixed for future releases!

--jaakko
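Lagrange's actual fix is not shown in this thread. As a rough illustration of the failure, here is a hypothetical, simplified Gemtext line classifier in Python. It looks only at each line's leading characters. A leading U+FEFF therefore turns `# Title 1` into a plain text line unless the mark is stripped first. The function names and type labels are invented for this sketch.

```python
from typing import List, Tuple

BOM = "\ufeff"
PREFORMAT_TOGGLE = "`" * 3  # a line starting with three backticks toggles preformatted mode


def line_type(line: str) -> str:
    """Classify a non-preformatted Gemtext line by its leading characters (simplified)."""
    if line.startswith("=>"):
        return "link"
    if line.startswith("###"):
        return "heading-3"
    if line.startswith("##"):
        return "heading-2"
    if line.startswith("#"):
        return "heading-1"
    if line.startswith("* "):
        return "list-item"
    if line.startswith(">"):
        return "quote"
    return "text"


def classify_document(text: str, strip_bom: bool = True) -> List[Tuple[str, str]]:
    """Return (type, line) pairs; optionally drop a leading BOM before classifying."""
    if strip_bom and text.startswith(BOM):
        text = text[len(BOM):]
    result = []
    preformatted = False
    # splitlines() treats both CRLF and LF as line endings.
    for line in text.splitlines():
        if line.startswith(PREFORMAT_TOGGLE):
            preformatted = not preformatted
            result.append(("preformat-toggle", line))
        elif preformatted:
            result.append(("preformatted", line))
        else:
            result.append((line_type(line), line))
    return result


doc = "\ufeff# Title 1\r\n\r\nSome text.\r\n"
print(classify_document(doc, strip_bom=False)[0])  # ('text', '\ufeff# Title 1')
print(classify_document(doc)[0])                   # ('heading-1', '# Title 1')
```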
Thank you guys,

You are amazing. :)

Actually, Mousepad has an option that says: "Write Unicode BOM".

Hence my question is: does it need to be checked or unchecked?

Anyway, so far the clients I have tried that do not render the bug01.gmi page properly are:

* Amfora
* Lagrange
* Telescope

Clients that render the page properly:

* Geminauts

Cheers,

TGL
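Whatever the answer on the Mousepad option, a file already saved with the mark can be cleaned up afterwards. Below is a minimal Python sketch, assuming the file is meant to be plain UTF-8. `without_utf8_bom` and `strip_utf8_bom` are hypothetical helper names, not part of any existing tool.

```python
import codecs
from pathlib import Path


def without_utf8_bom(data: bytes) -> bytes:
    """Return `data` with one leading UTF-8 BOM (EF BB BF) removed, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_utf8_bom(path: Path) -> bool:
    """Rewrite `path` without a leading UTF-8 BOM; return True if the file changed."""
    data = path.read_bytes()
    cleaned = without_utf8_bom(data)
    if cleaned == data:
        return False
    path.write_bytes(cleaned)
    return True
```

For example, `for path in Path("gemlog").rglob("*.gmi"): strip_utf8_bom(path)` would clean every page under a hypothetical local `gemlog` directory. Working on bytes rather than decoded text leaves the CRLF line endings and everything else untouched.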
Is this bug already covered under the Torture Test?

=> gemini://gemini.conman.org/test/torture

If not, should it (and other points of concern) be appended?

I wonder, do the Torture Test and other similar services get used by browsers as part of a CI workflow?

====================
Jonathan McHugh
indieterminacy at libre.brussels

July 6, 2021 3:19 PM, "The Gnuserland" <gnuserland at mailbox.org> wrote:

> Thank you guys,
>
> You are amazing. :)
>
> Actually Mousepad has an option that says: "Write Unicode BOM"
>
> Hence my question is: does it need to be checked or unchecked?
>
> Anyway so far the clients I have tried that do not render the bug01.gmi page properly are:
>
> * Amfora
>
> * Lagrange
>
> * Telescope
>
> Clients that render the page properly:
>
> * Geminauts
>
> Cheers,
>
> TGL
>
> On 7/6/21 1:56 AM, skyjake wrote:
>
>> On 6. Jul 21, at 7.52, The Gnuserland <gnuserland at mailbox.org> wrote:
>>
>>> Thank for the suggestion, something came out, but not sure what does it mean:
>>>
>>> < 00000000: efbb bf23 2054 6974 6c65 2031 0d0a 0d0a  ...# Title 1....
>>> ---
>>> > 00000000: 2320 5469 746c 6520 310d 0a0d 0a23 2054  # Title 1....# T
>>
>> I checked how Lagrange handles the Byte Order Mark (BOM), and sure enough it breaks the first
>> line's type detection.
>>
>> Fixed for future releases!
>>
>> --jaakko
> Another thing to clarify in the next version of the spec.

Given my own recent face-slamming against BOM, I wouldn't mind having this handled as part of the spec.
On 7. Jul 21, at 20.40, Andrew Singleton <singletona082 at gmail.com> wrote:

>
>> Another thing to clarify in the next version of the spec.
>
> Given my own recent face slamming against BOM, I wouldn't mind handling of this as part of the spec.

I agree. While this is a relatively minor issue, it's always better to avoid undefined behavior (that depends on invisible characters!). Submitted to GitLab:

=> https://gitlab.com/gemini-specification/protocol/-/issues/36

--jaakko
The Gnuserland writes:

> Actually Mousepad has an option that says: "Write Unicode BOM"
>
> Hence my question is: does it need to be checked or unchecked?

The BOM at the start of the file is invalid for UTF-8 documents. I
*think* it's required for UTF-16, but I may be wrong and I'm too tired
to look it up.
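For comparison across encodings, here is a Python sketch that detects which byte order mark, if any, starts a byte string. It uses the BOM constants from the standard `codecs` module. For UTF-16 and UTF-32, the mark distinguishes the two byte orders. UTF-8 has only one byte order, so its mark carries no ordering information. `sniff_bom` is a hypothetical name, and the check is only a heuristic: a UTF-16-LE file whose first character is U+0000 would be misread as UTF-32-LE.

```python
import codecs
from typing import Optional

# Longest marks first: the UTF-32-LE mark (FF FE 00 00) begins with the UTF-16-LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, or None if there is none."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


print(sniff_bom(b"\xef\xbb\xbf# Title 1"))                        # utf-8
print(sniff_bom("# Title 1".encode("utf-16-be")))                 # None: no mark was written
print(sniff_bom(codecs.BOM_UTF16_BE + "#".encode("utf-16-be")))  # utf-16-be
```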
On 14. Jul 21, at 2.30, Jason McBrayer <jmcbray at carcosa.net> wrote:

> The BOM at the start of the file is invalid for UTF-8 documents. I
> *think* it's required for UTF-16, but I may be wrong and I'm too tired
> to look it up.

BOM is not invalid for UTF-8, although it has limited usefulness:

> UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is only used as a signature -- an indication that an otherwise unmarked text file is in UTF-8.

=> https://www.unicode.org/faq/utf_bom.html#bom1

I recommend this FAQ to anyone wondering about what to do with BOMs, there's plenty of good info.

--jaakko
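In Python, treating the mark as an optional signature takes one call. The standard `utf-8-sig` codec skips a leading `EF BB BF` when decoding and otherwise behaves like `utf-8`. Below is a sketch of how a client might decode a text/gemini body that way. `decode_gemtext` is a hypothetical name. Whether clients should do this at all is what the spec issue linked above is meant to settle.

```python
def decode_gemtext(body: bytes) -> str:
    """Decode a UTF-8 text/gemini body, dropping an optional leading BOM signature."""
    # 'utf-8-sig' removes one leading EF BB BF if present; otherwise it decodes like 'utf-8'.
    return body.decode("utf-8-sig")


with_bom = b"\xef\xbb\xbf# Title 1\r\n"
without_bom = b"# Title 1\r\n"

# Plain 'utf-8' keeps the mark as an invisible U+FEFF character...
print(repr(with_bom.decode("utf-8")))     # '\ufeff# Title 1\r\n'
# ...while 'utf-8-sig' yields the same text with or without the signature.
print(repr(decode_gemtext(with_bom)))     # '# Title 1\r\n'
print(repr(decode_gemtext(without_bom)))  # '# Title 1\r\n'
```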
---