Semantic HTML

There's quite the buzz in the weblogging community over Mark Pilgrim's (Pushing the envelope) [1] use of the <CITE> tag (among other more esoteric tags in HTML (HyperText Markup Language)). It's a nice idea, but all the standard (HTML 4.0: § 9.2.1—Phrase elements) [2] says about <CITE> is:

CITE: Contains a citation or a reference to other sources.

“HTML 4.0 § 9.2.1 Phrase elements [3]”

Beyond that, the spec offers only a few scant and quite trivial examples, so I'm not sure of the exact usage of the <CITE> tag. Take the following:

In _Snowcrash_, Neal Stephenson explored the implications of neuro-linguistic hacking …

Now, am I supposed to mark that up like:

In <CITE>Snowcrash</CITE>, Neal Stephenson explored the implications of neuro-linguistic hacking ...

Because I'm citing the book _Snowcrash_? So, along those lines, if I had instead written it as:

Neal Stephenson, in his book _Snowcrash_, explored the implications of neuro-linguistic hacking …

Would I then mark it up as:

<CITE>Neal Stephenson</CITE>, in his book Snowcrash, explored the implications of neuro-linguistic hacking ...

since now I'm emphasizing Neal Stephenson over the book? But the book was written by Neal Stephenson so should it instead be:

In <CITE>Snowcrash</CITE>, <CITE>Neal Stephenson</CITE> explored the implications of neuro-linguistic hacking ...

Okay, so it's a contrived example, but generating semantically correct markup isn't trivial and expecting the general public to get it correct is asking a bit too much. As one person pointed out [4], given a hypothetical tag like <EDITOR>, is it:

<EDITOR>Joe Blow</EDITOR>

or

<EDITOR>vi</EDITOR>

(except when it's <EDITOR>Frontpage</EDITOR> but I won't go there)?

There are other semi-obscure tags for semantic mark-up and, fortunately, most of them are less ambiguous in their usage: <CODE> is for marking up computer source code, and <SAMP> is for program output. Unfortunately the HTML spec lists both <CODE> and <SAMP> as inline tags, not block tags, which really restricts their use. I'm not sure what the W3C (World Wide Web Consortium) [5] was thinking when they made <CODE> and <SAMP> inline. Because line breaks and runs of spaces are collapsed in inline content, using <CODE> to mark up code fragments will turn something like:

for (i = 0 ; types[i].sl != NULL ; i++)
{
  if (strstr(filename,types[i].sl) != NULL)
    return(types[i].sl);
}
return("text/plain");

into:

for (i = 0 ; types[i].sl != NULL ; i++) { if (strstr(filename,types[i].sl) != NULL) return(types[i].sl); } return("text/plain");

Nice, huh?

Dougal Campbell [6] suggests using:

CODE
{
  white-space: pre;
}

Which sounds good, but doesn't work. The CSS spec (§ 16.6 Whitespace—Cascading Style Sheets Level 2) [7] states that white-space is only valid for a display type of “block”, which <CODE> isn't (remember, it's “inline”). To work, you really need:

CODE
{
  display:     block;
  white-space: pre;
}

Which works fine in Mozilla [8], but fails in IE (Microsoft Internet Explorer) 5.x (most likely a bug) and in Lynx [9], which doesn't even look at the CSS (Cascading Style Sheets) file (and it looks like I have one regular reader who uses Lynx). As much as I would love to use <CODE> and <SAMP> for semantically better mark-up, I'm afraid I'm still stuck with using <PRE>; otherwise I'll end up with:

<CODE>for (i = 0 ; types[i].sl != NULL ; i++)</CODE><BR>
<CODE>{</CODE><BR>
<CODE>  if (strstr(filename,types[i].sl) != NULL)</CODE><BR>
<CODE>    return(types[i].sl);</CODE><BR>
<CODE>}</CODE><BR>
<CODE>return("text/plain");</CODE><BR>

Which is silly. (Okay, it's easy enough to write some code to automatically convert the source code, but semantically, does it even make sense?)
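For what it's worth, the automatic conversion mentioned above might look something like the following. This is only a minimal sketch in C, not code I actually run: the names code_wrap() and append() are made up for illustration, and the escaping of &, < and > is an assumption about what such a converter would want, since the fragment above happens not to contain any of those characters. Leading spaces are passed through untouched, just as in the hand-written version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to dst, which holds *len characters and has room for cap
 * bytes in total. Returns 1 on success, 0 if src (plus the NUL) won't fit. */
static int append(char *dst, size_t cap, size_t *len, const char *src)
{
  size_t n = strlen(src);

  if (*len + n >= cap)
    return 0;
  memcpy(dst + *len, src, n + 1);   /* the + 1 copies the trailing NUL */
  *len += n;
  return 1;
}

/* Hypothetical helper: wrap each line of src in <CODE>...</CODE><BR>,
 * escaping &, < and > along the way. The result is written to out
 * (cap bytes). Returns 1 on success, 0 if out is too small. */
int code_wrap(const char *src, char *out, size_t cap)
{
  size_t len = 0;

  assert(src != NULL && out != NULL && cap > 0);
  out[0] = '\0';

  while (*src != '\0')
  {
    if (!append(out, cap, &len, "<CODE>"))
      return 0;

    /* copy one line, replacing the characters HTML treats specially */
    for ( ; *src != '\0' && *src != '\n' ; src++)
    {
      char        single[2] = { *src , '\0' };
      const char *piece;

      switch (*src)
      {
        case '&': piece = "&amp;"; break;
        case '<': piece = "&lt;";  break;
        case '>': piece = "&gt;";  break;
        default:  piece = single;  break;
      }

      if (!append(out, cap, &len, piece))
        return 0;
    }

    if (!append(out, cap, &len, "</CODE><BR>\n"))
      return 0;

    if (*src == '\n')               /* step over the line terminator */
      src++;
  }

  return 1;
}
```

Handing it the text "{\n  return(x);\n}\n" and a large enough buffer yields three lines, each wrapped in <CODE>…</CODE><BR>. It does nothing about the indentation collapsing in a browser, which is rather the point of the complaint.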

The upshot of all this rambling about semantically correct HTML? Um … not much really. I won't be changing the mark-up I use too much since I do lose the visual appearance in most browsers (although I may try giving the <CITE> tag a bit of a go).

[1] http://diveintomark.org/archives/2002/12/27.html#pushing_the_envelope

[2] http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.2.1

[3] http://www.w3.org/TR/1998/REC-html40-19980424/struct/text.html#h-9.2.1

[4] http://www.kuro5hin.org/comments/2002/12/29/202939/15/5#5

[5] http://www.w3c.org/

[6] http://dougal.gunters.org/myphpblog//archive.php?blogid=1&tem=emc3&y=2002&m=12&d=27

[7] http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop

[8] http://www.mozilla.org/

[9] http://lynx.browser.org/
