XHTML can do everything HTML can, so the real question is: do you need any of the functions XHTML offers? And will you be able to serve it with the proper MIME type? Without the correct MIME type there is little reason to use XHTML: the browser will parse it as ordinary HTML, and in some browsers it can even end up in quirks mode. And that is something you probably don’t want.
According to the W3C:
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents.
So XHTML covers what HTML does, but with an XML flavour to it.
XML
The Extensible Markup Language (XML) is a W3C-recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. It is a simplified subset of SGML. Its primary purpose is to facilitate the sharing of data across different systems, particularly systems connected via the Internet.
Let’s start off with some important characteristics of XML. First of all, it has to be well-formed: every element needs to be closed, and elements need to be closed in the right order.
1. <b><i>Something really important</b></i>
2. <img src='images/image.gif' alt='image'>
3. <b><i>Something really important</i></b>
4. <img src='images/image.gif' alt='image' />
The first two examples are faulty XML. Number one can be found in the code of beginning web developers, but also in that of more experienced coders who don’t look carefully at the order of their elements. In HTML this is more or less tolerated; in XHTML it is just plain wrong. The right notation can be found at point 3.
The second example shows the normal way to write an image tag in HTML. In XML, however, every single element needs to be closed for the document to be processed at all. That includes img, input, br and meta tags, as shown at point 4. The space before the slash isn’t required by XML, but without it Netscape 4 can’t interpret the code at all. With the space things will probably still not look good in that browser, but at least its users will be able to see the page.
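The four examples above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser stands in for "an XML parser"; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        # Wrap the fragment in a root element so that it forms one document.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


examples = [
    "<b><i>Something really important</b></i>",    # wrong closing order
    "<img src='images/image.gif' alt='image'>",    # img never closed
    "<b><i>Something really important</i></b>",    # correct nesting
    "<img src='images/image.gif' alt='image' />",  # closed empty element
]

results = [is_well_formed(example) for example in examples]
print(results)
```

Only the last two fragments should parse; the first two make the parser stop with an error, which is the strictness described above.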
XML and errors
XML has strict rules for closing elements, and also for the character encoding (UTF-8 is the default; any other encoding should be declared in the XML declaration). When these rules aren’t followed, an XML parser simply stops processing and reports an error. So, especially for beginners to XML, it can be a daunting task to make everything work.
This means that characters such as ë have to match the declared encoding, or else be replaced by their numeric character reference, which in this case is &#235;. When converting a large website this could prove to be a difficult task. Luckily you can use something like the PHP function utf8_encode($tekst), which converts ISO-8859-1 text to UTF-8. But that does mean going back into the code and checking all output, or all input, for strange characters.
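As an illustration of the numeric route, here is a small sketch in Python rather than PHP. It relies on the standard "xmlcharrefreplace" error handler of str.encode; the function name to_numeric_references is an assumption made for this example.

```python
def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# "\u00eb" is the character ë (code point 235).
converted = to_numeric_references("Citro\u00ebn")
print(converted)  # Citro&#235;n
```

The output contains only ASCII characters, so it survives whatever encoding the document declares.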
XML and new elements
In XML there are no predefined elements. It is a language for describing data in a human-readable format that is still easy for computers to process. Because of this everyone is allowed to make up their own elements.
This is also one of the ideas behind XHTML: developers can extend it when necessary. The W3C gives the following example of XHTML used in conjunction with MathML.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A Math Example</title>
  </head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply> <log/>
        <logbase>
          <cn> 3 </cn>
        </logbase>
        <ci> x </ci>
      </apply>
    </math>
  </body>
</html>
As in XML, a namespace identifies the vocabulary that each element belongs to. Of course you can use more than MathML. That is one of the strengths of XHTML. Another is that, because the document is valid XML, you can use XML editors and checkers to verify the content, and easily process it in other programs.
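To show what "processing it in other programs" can look like, the sketch below feeds the W3C example to Python's standard xml.etree.ElementTree and groups the element names by namespace. The helper split_tag and the variable names are assumptions for this illustration, and the markup is condensed onto fewer lines.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>A Math Example</title></head>
<body>
<p>The following is MathML markup:</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply> <log/> <logbase> <cn> 3 </cn> </logbase> <ci> x </ci> </apply>
</math>
</body>
</html>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


root = ET.fromstring(DOCUMENT)
by_namespace = {}
for element in root.iter():  # iterates in document order
    namespace, local = split_tag(element.tag)
    by_namespace.setdefault(namespace, []).append(local)

for namespace, names in by_namespace.items():
    print(namespace, names)
```

The XHTML elements and the MathML elements end up in separate groups, which is how a program can tell the two vocabularies apart.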
Mime types
According to the W3C’s Note on XHTML Media Types:
- HTML 4 should be served as text/html.
- “HTML compatible” XHTML (as defined in appendix C of the XHTML 1.0 specification) may be served as text/html, but it should be served as application/xhtml+xml.
- XHTML 1.1 should not be served as text/html.
- The specifications of XHTML 2.0 aren’t finalised yet but all indications are that XHTML 2.0 must not be served as text/html.
The web server sends a MIME type along with the document when you request it. The browser reads this before the doctype declaration of the document, and it determines how the document is processed. HTML pages are normally sent as ‘text/html’, but an XHTML document should be sent as ‘application/xhtml+xml’. If the header says ‘text/html’, the browser will interpret the document as normal HTML, with its forgiving HTML parser instead of an XML parser.
There are a few different modes in which a browser can render an HTML document: standards mode, almost standards mode and quirks mode. In quirks mode the browser imitates the behaviour of older browsers so that faulty HTML keeps working. Which mode is used depends mainly on the doctype. In HTML, tags like <img> are not closed, whereas in XHTML they must be, so a browser that receives XHTML as ‘text/html’ has to treat the closing slashes as errors to recover from. Depending on the doctype and the browser, such a page is rendered in standards mode, in almost standards mode or, for example when Internet Explorer 6 finds an XML declaration before the doctype, in quirks mode.
Writing XHTML without the matching MIME type is largely pointless: you get none of the benefits of XML, and you risk quirks mode, which is exactly what you don’t want when conforming to standards. Unfortunately Internet Explorer doesn’t understand the MIME type ‘application/xhtml+xml’ yet, so for that browser you need to send ‘text/html’.
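One common way to deal with this is to look at the Accept header the browser sends and only answer with ‘application/xhtml+xml’ when the browser lists it. The sketch below illustrates that idea under simplifying assumptions: the function name choose_content_type is made up, and it only checks whether the type is listed with a non-zero q value; it does not rank the alternatives.

```python
def choose_content_type(accept_header):
    """Pick a Content-Type for an XHTML page from the request's Accept header."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        # An explicit q=0 means the client refuses this type.
        quality = 1.0
        for parameter in parts[1:]:
            name, _, value = parameter.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return "application/xhtml+xml"
    # Fallback for browsers that do not announce XHTML support.
    return "text/html"


# Hypothetical Accept headers, for illustration only.
xml_aware = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
html_only = "image/gif, image/jpeg, */*"

print(choose_content_type(xml_aware))
print(choose_content_type(html_only))
```

A browser that only sends something like the second header gets ‘text/html’, so the page stays readable there.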
Doctypes
Besides the MIME type there’s also the doctype. Where you can get away with leaving it out in HTML, it’s required in XHTML. There are three different ones, much like the ones used in HTML.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"DTD/xhtml1-frameset.dtd">
Other differences
In HTML, element names can be written in capitals; in XHTML that is definitely wrong. So even when writing HTML, start writing all tags in lowercase now. Also, attribute values should always be within quotes (single or double), and attributes cannot be minimized, that is, written without a value. In HTML it is okay to have the following code:
<input type='radio' checked name='radiobutton'>
In XHTML, however, the bare ‘checked’ would make everything screech to a halt and jump right into errors; it has to be written as checked='checked'. Not to forget the missing closing slash, of course. Although minimized attributes are allowed in HTML, it is better to write them out in full there as well. So change that today!
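Finding such attributes in existing pages can be automated. The following is a rough sketch using Python's standard html.parser module, which reports a value of None for attributes written without a value; the class and function names are assumptions for this illustration.

```python
from html.parser import HTMLParser


class MinimizedAttributeFinder(HTMLParser):
    """Collect (tag, attribute) pairs for attributes written without a value."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # HTMLParser passes None as the value of a bare attribute.
            if value is None:
                self.found.append((tag, name))


def find_minimized_attributes(markup):
    finder = MinimizedAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


html_style = "<input type='radio' checked name='radiobutton'>"
xhtml_style = "<input type='radio' checked='checked' name='radiobutton' />"

print(find_minimized_attributes(html_style))
print(find_minimized_attributes(xhtml_style))
```

The HTML-style tag is reported with its bare ‘checked’, while the XHTML-style tag produces no findings.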
HTML or XHTML?
XHTML can do everything HTML can do, so the real question is: do you need any of the functions XHTML offers? And will you be able to serve it with the proper MIME type? Without the correct MIME type there is little reason to use XHTML: the browser will parse it as ordinary HTML, and in some browsers it can even end up in quirks mode. And that is something you probably don’t want.
However, XHTML is (according to the W3C) the future of HTML, and it would be good to get some experience with it. Which one you choose is up to you.
Sources:
- W3C: XHTML Media Types
- W3C: XHTML™ 1.0: The Extensible HyperText Markup Language
- W3C: Extensible Markup Language (XML) 1.0