
HTML5: What kind of standard is this, anyway?

I haven’t written much HTML since 1996. Back in those wild west days before CSS existed, we used <font> tags and <table>-based layouts to control how websites were presented to end-users. Although these ugly hacks limited the sophistication of websites and later proved to be a barrier to the development of rich Internet applications, they at least ended up as part of a standard: HTML 3.2.

Today, however, I think the state of markup on the Internet is far worse, despite the existence of this beast we call HTML5. To my utter shock, I discovered that HTML5 isn’t even what one could call a standard. On the contrary, HTML5 represents standards committees (and there are two — I’ll get into this later) throwing in the towel because of internecine fighting, to the enormous detriment of web, application and browser developers everywhere. The most poignant illustration of the problem is that there is no DTD for HTML5: you merely write <!DOCTYPE html> and you’re on your way. Nothing says “anything goes” better than “don’t even bother mechanically validating this because who the hell knows what’s valid?”

I learned all this after reading Jeremy Keith’s HTML5 For Web Designers. Keith opens the book with a chapter called “A Brief History of Markup”, describing the “standards” process(es) that led to HTML5. The sad state of HTML standardization is a direct result of the internal politics and bickering that followed the publication of the HTML 4.01 standard back in 1999. On one hand, the W3C was pushing for an HTML standard based on XML: this was XHTML 1.0 and 1.1. They then began work on XHTML 2.0, which was, as Keith says, “a pure language, unburdened by the sloppy history of previous specifications. It was a disaster.” Meanwhile, unhappy rebels calling themselves the WHATWG (Web Hypertext Application Technology Working Group) had started work on a set of competing specifications to extend HTML for web applications, and called it HTML5.

Eventually the W3C came around to reality, cancelled the XHTML 2.0 project, and began adopting WHATWG’s work as the basis for its own standard. In the meantime, ten years have passed, and developers looking for more interactivity in their webpages have adopted proprietary plug-in solutions like Flash and Silverlight. HTML5, in a way, is an attempt to regain some of that lost ground. The standard does have some great features on paper. The <audio> and <video> tags, for example, make it easier to add multimedia to web pages. But as web developer Greg Hluska points out, because of the lack of underlying standards, browser manufacturers can’t even agree on what video formats to support. As a result, web developers will probably continue using JavaScript and/or Flash/Silverlight to develop multimedia players, because the feature set of HTML5 for these use-cases is so far behind the curve as to be laughable.
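To make the format disagreement concrete, here is a rough Python sketch of the workaround it forces on developers: offering the same clip in several encodings inside one <video> element, so that each browser can pick the first one it is able to play. The function name video_markup, the file names and the MIME types are my own illustrative assumptions, not anything taken from Keith's book or Hluska's post.

```python
from html import escape


def video_markup(sources, fallback_text):
    """Build a <video> element offering the same clip in several formats.

    `sources` is a list of (url, mime_type) pairs. A browser walks the
    <source> children in order and uses the first one it can play;
    browsers without <video> support show the fallback text instead.
    """
    lines = ["<video controls>"]
    for url, mime_type in sources:
        lines.append(
            '  <source src="{}" type="{}">'.format(
                escape(url, quote=True), escape(mime_type, quote=True)
            )
        )
    lines.append("  <p>{}</p>".format(escape(fallback_text)))
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical file names: one clip, encoded twice.
markup = video_markup(
    [("clip.mp4", "video/mp4"), ("clip.ogv", "video/ogg")],
    "Your browser cannot play this video.",
)
print(markup)
```

The point of the sketch is the duplication: every clip has to be encoded and hosted once per format, purely because the browsers do not share a common one.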

What really irks me, though, is that, as Keith says, the so-called “standard” is based around concepts like “don’t reinvent the wheel” and “pave the cowpaths”. These concepts do not a standard make, because they amount to codifying bad behavior, past and future. If Internet Explorer implemented a tag tomorrow, say, <office>, to embed Microsoft Office documents in a webpage, Microsoft could argue that, because of IE’s vast adoption rate, it’s a cowpath. Would we then move to pave it? The lack of agreement on what constitutes a standard just sets up working groups for years of bickering. In most other standards processes, the standard drives the implementation, not the other way around.

HTML5 is a shiny new concept with lots of marketing buzz, but I don’t think it’s the next killer technology. It can hardly even be called a standard! This is probably what’s going to happen: The committees will continue to get bogged down in internecine bickering, and the W3C/WHATWG will eventually emit a standard that browser manufacturers will implement, sort of. They will also continue inventing their own tags and rendering “standard HTML5” markup in their own peculiar ways, thereby making a hash out of even this lax standards process. And ultimately, web developers lose, because they will forevermore have to implement ugly browser-specific hacks in their HTML/CSS, like vendor prefixes (-moz-some-shiny-new-feature), and also learn the various ways in which so-called “standard” tags behave/misbehave in assorted browsers.
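For anyone who hasn’t had the pleasure, here is a small Python sketch of what the vendor-prefix chore amounts to: one declaration fanned out into a copy per browser engine, with the unprefixed form last so that it wins once a browser supports it. The helper name with_vendor_prefixes and the example property are assumptions for illustration only, and not every engine ever shipped a prefixed version of every property.

```python
# Prefixes used by the major engines: Gecko, WebKit, Presto and Trident.
VENDOR_PREFIXES = ("-moz-", "-webkit-", "-o-", "-ms-")


def with_vendor_prefixes(prop, value):
    """Return CSS declarations for `prop`: prefixed variants, then the plain one."""
    declarations = [
        "{}{}: {};".format(prefix, prop, value) for prefix in VENDOR_PREFIXES
    ]
    # The unprefixed declaration goes last so it overrides the prefixed ones
    # in any browser that already understands the standard spelling.
    declarations.append("{}: {};".format(prop, value))
    return declarations


# Example input chosen for illustration.
for line in with_vendor_prefixes("transition", "opacity 0.3s"):
    print(line)
```

Five lines of CSS to say one thing, and that is for a single property.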

Even as someone who doesn’t write HTML any longer, I find this landscape very bleak. It’s bad for developers and bad for users. The “standard” should be called HTML 4.02, not HTML5. It’s not worthy of the prestige of a major version number.