Changing the historical record of my blog

Twenty-one years ago I was worried about losing the historical presentation of my blog [1], both because it was template driven and because of its use of CSS (Cascading Style Sheets). Changes that affect everything at once certainly appeared quite Orwellian to me, although I might be in a very small minority in worrying about this.

And yet, I've tweaked the CSS quite a bit since I wrote that. I figure I'm not changing the content, so it's okay, right?

It was over a year ago when I noticed that a lot of my earlier entries had the initial paragraph shifted over to the left, due to a change in the template file I made around 2003. The old template had an initial <P> tag so I didn't have to type it, and the new one removed said tag. That left maybe a thousand posts (give or take) that needed fixing. I started doing the job manually at first, then gave up at the sheer number of posts to fix. Again, it was not changing the content but fixing the presentation. And it bothered me that there were posts that weren't formatted correctly.

About a week or two ago, I realized that the markup I used for foreign words:

<span lang="de" title="My hovercraft is full of eels">Mein Luftkissenfahrzeug ist voller Aale</span>

is probably not semantically sound HTML (HyperText Markup Language). I even wrote about that issue twenty years ago [2], and now realize it should be:

<i lang="de" title="My hovercraft is full of eels">Mein Luftkissenfahrzeug ist voller Aale</i>

Around the same time, I read up on the “proper” use of <BLOCKQUOTE> [3] and learned that the attribution should appear outside the blockquote, not inside as I've been doing for years. I was doing The Right Thing™ when I first started blogging, but changed for some reason I've long forgotten.

And then several days ago, I noticed the sample BASIC code [4] was incorrect, and it was bugging me: the keyword THEN would always show up as THENNOT. How that happened is a topic for another post [5], but in the meantime, I decided to fix the issue without mentioning it. The fix didn't alter the intended meaning of the post; it was fixing incorrect output, not saying we were always at war with Eastasia.

After that, I decided to go back and fix the “formatting” issues in the blog. I have code that will read entries and parse the HTML I use into an AST (Abstract Syntax Tree) (or should it be a DOM (Document Object Model), even though I'm using Lua, not JavaScript?), which I use to generate the Gopher [6] and Gemini [7] versions. To fix the initial paragraph issue, all I needed to do was identify the entries that didn't start with a <P> tag and just prefix the raw text with said tag.
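
A minimal sketch of that check, written in Python rather than the Lua the real code uses. The function name and the regular expression are assumptions made for illustration; the only rule taken from the text is "if the entry doesn't start with a <P> tag, prefix one."

```python
import re

# Matches an opening <p> tag (any case, optional attributes) at the very
# start of the entry, ignoring leading whitespace. The [\s>] keeps tags
# such as <pre> from being mistaken for <p>.
STARTS_WITH_P = re.compile(r'\s*<p[\s>]', re.IGNORECASE)

def ensure_leading_p(raw: str) -> str:
    """Return the raw entry text, prefixed with <p> if it lacks one."""
    if STARTS_WITH_P.match(raw):
        return raw
    return "<p>" + raw
```

The sketch does nothing clever with entries that open with some other tag, so it only makes sense when run over entries written under the old template, where the opening <P> was always implied.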

To update the HTML for foreign words, it was enough to identify entries with <SPAN LANG="language"> and, with some sed magic, switch them to read <I LANG="language"> (and fix the corresponding closing tags). It's just fixing the semantics of the HTML, not changing the past, right?
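
The fiddly part is the closing tags: only the </SPAN> that belongs to a converted <SPAN LANG=…> should become </I>. Here is a rough Python sketch of one way to keep them paired (not the sed commands actually used, which aren't shown here). It assumes tags are properly nested and that no attribute value contains a ">" character; the function name is made up for the example.

```python
import re

SPAN_TAG = re.compile(r'<(/?)span\b([^>]*)>', re.IGNORECASE)
HAS_LANG = re.compile(r'\blang\s*=', re.IGNORECASE)

def span_lang_to_i(html: str) -> str:
    """Rewrite <span lang=...>...</span> as <i lang=...>...</i>."""
    stack = []  # one flag per open <span>: was it rewritten to <i>?

    def rewrite(match):
        closing, attrs = match.group(1), match.group(2)
        if closing:
            # Close with </i> only if the matching opener was converted.
            converted = stack.pop() if stack else False
            return "</i>" if converted else match.group(0)
        converted = bool(HAS_LANG.search(attrs))
        stack.append(converted)
        return f"<i{attrs}>" if converted else match.group(0)

    return SPAN_TAG.sub(rewrite, html)
```

Spans without a LANG attribute pass through untouched, including when they wrap a converted span.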

The fix for the <BLOCKQUOTE> issue wasn't quite so easy—I still had over 700 entries that needed to be fixed, so I ended up writing code that would spit out the parsed HTML back into HTML. It would have been easy to output it as:


<p>I've been following the various Linux <abbr title="Initial Public Offerin
g">IPO</abbr>s and today I see that <a class="external" href="http://www.val
inux.com/">VA Linux Systems</a> had their <a class="external" href="http://d
ailynews.yahoo.com/h/nm/19991209/bs/markets_valinux_1.html">IPO today.</a>. 
 Briefly, it IPOed (can you verb a TLA?  Can you verb the word “verb?” Whate
ver … ) at US$30 and opened at US$299.  Inbloodysane.</p><p><a class="extern
al" href="http://www.andover.net/">Andover.Net</a> wasn't nearly as inbloody
sane.</p>


one long line—the browsers don't care, but I do if I ever have to go back and edit this. Instead, I want the output to still be editable:

<p>I've been following the various Linux <abbr title="Initial Public
Offering">IPO</abbr>s and today I see that <a class="external"
href="http://www.valinux.com/">VA Linux Systems</a> had their <a
class="external"
href="http://dailynews.yahoo.com/h/nm/19991209/bs/markets_valinux_1.html">IPO
today.</a>. Briefly, it IPOed (can you verb a TLA? Can you verb the word
“verb?” Whatever … ) at US$30 and opened at US$299. Inbloodysane.</p>

<p><a class="external" href="http://www.andover.net/">Andover.Net</a> wasn't
nearly as inbloodysane.</p>
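
For plain paragraphs, output like the above can be approximated with an ordinary word wrap that breaks only at whitespace. The following is a Python sketch using the standard textwrap module, not the Lua code that produced the sample; the function name and the default width are guesses. It treats everything as <P> content, so it would mangle <PRE> blocks and ignores the other block-level tags.

```python
import re
import textwrap

def wrap_paragraphs(html: str, width: int = 78) -> str:
    """Put each <p>...</p> on its own wrapped lines, blank line between."""
    # Split right after every closing </p>, keeping the tag with its text.
    pieces = re.split(r'(?<=</p>)', html, flags=re.IGNORECASE)
    wrapped = []
    for piece in pieces:
        words = piece.split()
        if not words:
            continue
        # Collapse runs of whitespace, then wrap without ever splitting
        # a "word", so long URLs stay intact even if they exceed width.
        wrapped.append(textwrap.fill(" ".join(words), width=width,
                                     break_long_words=False,
                                     break_on_hyphens=False))
    return "\n\n".join(wrapped) + "\n"
```

Because breaks happen at any whitespace, a line can end in the middle of a tag or a quoted attribute value, just as it does in the sample output.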

That meant handling not only <P> but all the block-level tags in HTML: <BLOCKQUOTE>, <TABLE>, <DL> (which I use for emails [8] and screenplay dialog [9]), <UL>, <OL>, and <PRE>. Now that I have that working, I can identify the citation paragraphs for blockquotes, and move them to the appropriate location.
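
To make the move concrete, here is a small Python sketch of relocating an attribution. How the citation paragraphs are actually identified isn't stated, so the sketch assumes a hypothetical marker: the attribution is the last paragraph inside the blockquote and is written exactly as <p class="cite">. It also works on the raw text with a regular expression instead of a parse tree, and it does not handle nested blockquotes.

```python
import re

# Hypothetical convention: the attribution is the final paragraph in the
# blockquote and carries class="cite". Real entries may mark it differently.
CITED_QUOTE = re.compile(
    r'(?P<body><blockquote\b[^>]*>(?:(?!</blockquote>).)*?)'
    r'(?P<cite><p class="cite">(?:(?!</p>).)*</p>)\s*'
    r'(?P<close></blockquote>)',
    re.IGNORECASE | re.DOTALL,
)

def move_citation_out(html: str) -> str:
    """Move a trailing citation paragraph to just after its blockquote."""
    return CITED_QUOTE.sub(r'\g<body>\g<close>\n\g<cite>', html)
```

Blockquotes without such a paragraph are left exactly as they were.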

I'm about to do that, yet I'm still a bit hesitant. Yes, it's just fixing the semantic presentation, but now that I have the code to read and write HTML, future mass changes are easy to do.

I'm probably thinking too much on this.

I think.

[1] /boston/2002/07/23.1

[2] /boston/2003/02/05.2

[3] https://html.spec.whatwg.org/#the-blockquote-element

[4] /boston/2023/05/10.1

[5] /boston/2023/09/27.1

[6] gopher://gopher.conman.org:70/1Phlog:

[7] gemini://gemini.conman.org/boston/

[8] /boston/2023/03/01.1

[9] /boston/2008/06/20.1
