Hypertext editing and the Semantic Web

There's an interesting discussion [1] about Jason Kottke's new design for his weblog [2] and it brings up a topic I was thinking about earlier today.

Blogging software in general has made publishing new web pages (or entries) easier, reducing a several-step process to the click of a button. What hasn't gotten any easier is actually creating, or editing, the HTML (HyperText Markup Language) content. I've talked about this before [3]: I sometimes have problems writing hypertext because the act of creating a hyperlink isn't seamless, yet if I skip creating hyperlinks as I write and wait until I'm done, I may forget exactly what it was I wanted to link to.

Some markup, say, <EM> or <STRONG>, can be handled invisibly, as more traditional editors have done for years. For example, I could be typing along when, bam! I want to emphasize something, so I hit ALT-E, type the emphasized text, and hit ALT-E again when done. But hypertext, and any metadata associated with that hypertext, is harder to streamline like that.
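The ALT-E behaviour is essentially a toggle. Here is a minimal sketch of it, assuming a hypothetical editor buffer (the `Buffer` class and its methods are made up for illustration, not taken from any real editor) that also keeps the tags properly nested when <EM> and <STRONG> overlap:

```python
class Buffer:
    """Hypothetical editor buffer where a key binding toggles inline markup."""

    def __init__(self):
        self.parts = []       # text and tags, in the order they were entered
        self.open_tags = []   # currently open tags, innermost last

    def insert(self, text):
        self.parts.append(text)

    def toggle(self, tag):
        """What ALT-E would do for "EM": open the tag, or close it if open."""
        if tag in self.open_tags:
            # Close any tags opened inside this one first so the HTML stays
            # properly nested (they are not reopened afterwards).
            while True:
                inner = self.open_tags.pop()
                self.parts.append(f"</{inner}>")
                if inner == tag:
                    break
        else:
            self.open_tags.append(tag)
            self.parts.append(f"<{tag}>")

    def html(self):
        """Return the markup, closing anything still open at the end."""
        closing = "".join(f"</{tag}>" for tag in reversed(self.open_tags))
        return "".join(self.parts) + closing


buf = Buffer()
buf.insert("I ")
buf.toggle("EM")      # ALT-E
buf.insert("really")
buf.toggle("EM")      # ALT-E again
buf.insert(" mean it")
print(buf.html())     # I <EM>really</EM> mean it
```

The point of tracking open tags is that the writer never sees or types the closing tag; the buffer supplies it.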

For instance, when I quote a passage:

Oh, and I'd just like to point out that I'm not bashing any current weblog software for not being flexible enough or being wrong or whatever. As Anil has said, it's harder than just saying that a particular tool should do this or that. In fact, I love MT (Moveable Type) (not to mention the army of plug-in developers who put out these fantastic plug-in for free) more than ever for the amazing amount of flexibility and control that is possible (with a bit of work).

“Jason Kottke [4]”

It's actually quite a bit of work for me. First I cut-n-paste the quote from the web page into the editor I use, then go through and clean it up: changing each double quote to either two backticks or two regular single quotes (which my software then picks up and changes to &ldquo; and &rdquo; respectively), adding any appropriate HTML, and wrapping the whole thing in a <BLOCKQUOTE> with appropriate attributes:

<BLOCKQUOTE CITE="http://www.kottke.org/03/11/kottke-redesign#8304" TITLE="the redesign continues ... ">

Then I add the attribution line:

<P CLASS="cite"> <CITE> <A CLASS="external" HREF="http://www.kottke.org/03/11/kottke-redesign#8304"> Jason Kottke </A> </CITE> </P>

I used to place this outside the <BLOCKQUOTE>, but recently I moved it inside; I'm not sure which I like better. How would you automate this? Partly by integrating the editor with the browser and passing along more information in the cut buffer (like the URL (Uniform Resource Locator) and title of the page where the text was selected), but the main issue is one of layout, as I mentioned above. Context-sensitive templates for pasting, perhaps? And how do you handle links? The same way? A key sequence for pasting a blockquote and a separate one for a link? All I do know is that the HTML WYSIWYG (What You See Is What You Get) editors I've seen have never handled links cleanly. Want a link? Highlight the text, select link, type in the URL, and forget about other attributes like TITLE or CLASS. Or perhaps not, but then there are other buttons to select to set those, and by the time you're done it would have been easier to type the actual code than to have the editor so helpfully do it for you.
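To make the "context-sensitive template" idea concrete, here is a minimal sketch of two paste commands, one for a blockquote and one for a link. It assumes the page URL, title and author arrive along with the selected text (no cut buffer I know of actually does this), and the function names are hypothetical. The quote cleanup mimics what my software does with backticks and single quotes, and also converts straight double quotes directly, assuming they alternate open and close. The attribution goes inside the <BLOCKQUOTE>, which is where I've been putting it lately.

```python
from html import escape


def smarten_quotes(text):
    """Turn ``...'' pairs and straight double quotes into &ldquo;/&rdquo;.

    Straight quotes are assumed to alternate open/close, which is a
    simplification: an unbalanced quote in the source flips every later pair.
    """
    text = text.replace("``", "&ldquo;").replace("''", "&rdquo;")
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("&ldquo;" if opening else "&rdquo;")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


def blockquote(text, url, title, author):
    """'Paste as quote': wrap the selection with CITE/TITLE and attribution."""
    body = smarten_quotes(escape(text, quote=False))  # escape &, <, > first
    href = escape(url)
    return (
        f'<BLOCKQUOTE CITE="{href}" TITLE="{escape(title)}">\n'
        f"<P>{body}</P>\n"
        f'<P CLASS="cite"><CITE><A CLASS="external" HREF="{href}">'
        f"{escape(author, quote=False)}</A></CITE></P>\n"
        "</BLOCKQUOTE>"
    )


def link(text, url, title=None, css_class="external"):
    """'Paste as link': keep the TITLE and CLASS attributes editors drop."""
    attrs = [f'CLASS="{escape(css_class)}"'] if css_class else []
    attrs.append(f'HREF="{escape(url)}"')
    if title:
        attrs.append(f'TITLE="{escape(title)}"')
    return f"<A {' '.join(attrs)}>{escape(text, quote=False)}</A>"


print(blockquote(
    "Oh, and I'd just like to point out that ...",
    "http://www.kottke.org/03/11/kottke-redesign#8304",
    "the redesign continues ...",
    "Jason Kottke",
))
```

Each function would sit behind its own key sequence, which answers the "same way?" question with a tentative yes: same mechanism, different template.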

The discussion at Kottke's site is about applying different layouts to different types of posts (posts about movies formatted one way, book reviews another, and regular posts yet another) and how to trigger the appropriate template for each type of post. Granted, the software he uses, Movable Type [5], is geared more toward people who don't care to learn HTML or type it by hand, so having a different layout for different posts is a bit more difficult to achieve than in, say, mod_blog, where one pretty much has to know HTML to format posts. But there's a tradeoff to be made: since I use HTML raw (so to speak), I can go in and fudge the formatting as I see fit. My PhotoFriday [6] posts (yes, I've seriously slacked off on those) used a different format than my regular posts, and it was easy enough to handle: a new division, some definitions in the CSS (Cascading Style Sheet) file, and there you go.
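One way to trigger the appropriate template is simply to key the layout off a post type chosen when writing. A minimal sketch, assuming hypothetical post types, field names and CSS class names (this is not how Movable Type or mod_blog actually do it):

```python
from string import Template

# Hypothetical post types; each class name would need matching rules in the
# site's CSS file, the way the PhotoFriday posts got their own division.
TEMPLATES = {
    "regular": Template('<DIV CLASS="entry">\n$body\n</DIV>'),
    "photofriday": Template(
        '<DIV CLASS="photofriday">\n<H2>PhotoFriday: $theme</H2>\n$body\n</DIV>'
    ),
    "review": Template(
        '<DIV CLASS="review">\n<H2>Review: $subject</H2>\n$body\n</DIV>'
    ),
}


def render_post(body, post_type="regular", **fields):
    """Wrap an already-formatted post body in the layout for its type.

    Extra keyword arguments fill template fields such as theme or subject;
    string.Template.substitute raises KeyError if one is missing.
    """
    try:
        template = TEMPLATES[post_type]
    except KeyError:
        raise ValueError(f"unknown post type: {post_type!r}") from None
    return template.substitute(body=body, **fields)
```

So `render_post(body, "photofriday", theme="Red")` would be the "this is a PhotoFriday post" command; the hard part is still the editor UI that asks for the type.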

But the cost is that this isn't automatic. I don't have a menu item or a keyboard sequence to designate “this is a PhotoFriday post” in much the same way I don't have a menu item or keyboard sequence that says “these are a series of photos to display sequentially” or “here is a section of text I'm quoting from this web page.” Mind you, I wouldn't mind such an editor, and if done to my liking it would certainly make editing of posts much easier than it is now (and right now, I'm looking at all this text I've written so far, pretty much sans HTML and somewhat dreading having to go back and format it, but since I did skip the HTML formatting I had an easier time getting this out without forgetting what I wanted to mention, although hopefully I'll remember all the links I wanted to add).

Now, having finally formatted what I have, I will also say that this lack of good hypertext (or HTML) editors will have an effect on the Semantic Web. There's been quite a stir lately over the Semantic Web (set off by Clay Shirky's essay, The Semantic Web, Syllogism, and Worldview [7]), but except for a few (Mark Pilgrim) [8] diehard (Shelley Powers) [9] people (Dorothea Salo) [10] who add semantic information to their web pages, it won't really take off until we get good HTML editors that automagically include the required semantic information for us, and I don't see that happening any time soon.

For example, if you are using a web browser that supports the <ACRONYM> tag, you may notice that the TLAs (Three Letter Acronyms) and ETLAs (Extended Three Letter Acronyms) are lightly underlined (at least, that appears to be the default for IE (Microsoft Internet Exploder) and Mozilla), and that if you mouse over them, the acronym is expanded in a small tooltip, giving you its meaning. I add that, by hand, to every acronym I use, and yes, it does get to be a pain. I could automate it, but the problem is that computers are rather bad at figuring out context. With only 17,576 TLAs available, there is bound to be some overlap. Take, for instance, IRA.
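The 17,576 figure is every combination of three letters from A to Z, 26 × 26 × 26. The obvious automation is a glossary lookup that wraps each known acronym in an <ACRONYM TITLE="..."> tag. Here is a minimal sketch of it, assuming a hypothetical glossary with exactly one expansion per acronym, which is precisely where it falls apart:

```python
import re
from html import escape
from itertools import product
from string import ascii_uppercase

# Every three-letter combination of A-Z: 26 ** 3 = 17,576 possible TLAs.
ALL_TLAS = ["".join(letters) for letters in product(ascii_uppercase, repeat=3)]

# Hypothetical glossary holding exactly one expansion per acronym.
GLOSSARY = {
    "HTML": "HyperText Markup Language",
    "URL": "Uniform Resource Locator",
    "IRA": "Irish Republican Army",
}
# Whole-word match on any glossary key (so IRA matches, IRAN does not).
ACRONYM_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, GLOSSARY)) + r")\b")


def tag_acronyms(text):
    """Wrap every glossary acronym in <ACRONYM TITLE=...>, ignoring context."""
    def wrap(match):
        word = match.group(0)
        return f'<ACRONYM TITLE="{escape(GLOSSARY[word])}">{word}</ACRONYM>'
    return ACRONYM_RE.sub(wrap, text)
```

Run it on "Alice's IRA" and it confidently labels her retirement account as the Irish Republican Army, because nothing in a lookup table knows what the sentence is about.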

While the IRA (Irish Republican Army) may take actions against US (United States) interests that would effect Alice's (a member of the IRA) IRA, can an automated process work out which expansion of IRA should be used for each instance? Just ask yourself that question next time you ask YER (Yemeni Rial (ISO currency code)) computer two check you're spelling.

And while I'll probably never use the letters “I,” “R,” and “A” again, I would like to note that WAP, as a technical acronym, has two close meanings. There is WAP (Wireless Application Protocol), which is a proprietary and expensive replacement for HTTP (HyperText Transfer Protocol) for cellphones, and WAP (Wireless Access Point), which is how I get my laptop onto the network here in the Facility in the Middle of Nowhere. And while I tend to mention WAP quite often, I don't think I'll ever use WAP, as I think it's quite silly (and I pity the person who has to read that paragraph in a browser that doesn't support the <ACRONYM> tag).

I suppose acronym expansion could work the way spell checking does now: come across a potential TLA and, if it isn't expanded, offer up a choice of possible expansions. That may help keep IRA GERSHWIN from becoming an Individual Retirement Account GERSHWIN (fahrfenugen).
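A minimal sketch of that spell-checker-style pass, assuming a hypothetical dictionary in which one acronym can map to several expansions, and a `choose` callback standing in for the editor's pop-up menu (both made up for illustration). Acronyms already inside an <ACRONYM> element are skipped, a lone expansion is applied automatically, and ambiguous ones are put to the writer, who can decline:

```python
import re
from html import escape

# Hypothetical dictionary: one acronym may have several expansions.
EXPANSIONS = {
    "HTML": ["HyperText Markup Language"],
    "IRA": ["Irish Republican Army", "Individual Retirement Account"],
    "WAP": ["Wireless Application Protocol", "Wireless Access Point"],
}

# Either an existing <ACRONYM ...>...</ACRONYM> element or a bare run of
# two to five capital letters.
TOKEN_RE = re.compile(r"<ACRONYM\b[^>]*>.*?</ACRONYM>|\b[A-Z]{2,5}\b", re.DOTALL)


def expand_acronyms(text, choose):
    """Spell-checker style pass that adds <ACRONYM TITLE=...> markup.

    choose(word, options) is called only when an unexpanded acronym has
    more than one known expansion; it returns the chosen expansion, or
    None to leave the word alone.
    """
    def replace(match):
        word = match.group(0)
        if word.startswith("<"):       # already expanded by hand; skip
            return word
        options = EXPANSIONS.get(word, [])
        if not options:                # no known expansion; leave as typed
            return word
        title = options[0] if len(options) == 1 else choose(word, options)
        if title is None:
            return word
        return f'<ACRONYM TITLE="{escape(title)}">{word}</ACRONYM>'

    return TOKEN_RE.sub(replace, text)
```

With `choose` returning None, "IRA GERSHWIN" comes back untouched; with `choose` picking the second option, "my WAP" gets marked up as a Wireless Access Point. The computer still doesn't understand context; it just knows when to ask.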

And now I'm off to format what I've written since the last portion I've formatted. I would kill for a decent HTML editor that does The Right Thing™.

[1] http://www.kottke.org/03/11/kottke-

[2] http://www.kottke.org/

[3] /boston/2002/07/11.2

[4] http://www.kottke.org/03/11/kottke-redesign#8304

[5] http://www.moveabletype.org/

[6] http://www.photofriday.com/

[7] http://www.shirky.com/writings/semantic_syllogism.html

[8] http://www.diveintomark.org/

[9] http://weblog.burningbird.net/

[10] http://www.yarinareth.net/caveatlector/
