This document describes how white space in your HTML source code can affect a graphical browser's presentation of your document, even though the white space should make no difference.
I last updated this document on 12-Jul-1997.
(If you're interested in this document, you may also be interested in a sample document that tests the effect of white space on tables, called tables8.)
In HTML code, the presence or absence of a carriage return before an
end tag should not affect the rendering of a document, according to
SGML specifications. However, many popular browsers do not get this
right. If you are using a browser that displays an underlined
space to the right of the letter "a" at the end of this link
(or another artefact depending on
your browser and its settings), then your browser is definitely
affected to some extent:
Zeigen's Dilemma
This behavior has consequences that affect many HTML elements, especially tables and font elements.
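HTML authors should be aware of this browser behavior if they wish to avoid it. The solution, described in detail below, is to put each end tag immediately after the content it closes, with no spaces or carriage returns in between.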
I'm about to describe in detail a "bug" that browsers have regarding
the processing of white space (spaces, tabs and carriage
returns) and HTML end tags, such as
</A>.
I put the word bug in quotes above, because it seems that the browsers are doing it intentionally, despite the fact that the official HTML 3.2 specification and draft HTML 4.0 specification both explicitly dictate a different behavior.
The first part of my argument is an early draft excerpt from the third chapter of the book HTML 4.0: No experience required, which Janan Platt and I wrote for Sybex. (It was published in October 1997. The draft excerpt below changed significantly in the final manuscript, subsequent to several rounds of editing, but it still expresses the point I'm trying to explain.)
Below the excerpt are some further examples and arguments that do not appear in the book.
This whole issue is something that has bothered me for quite a while: Browsers shouldn't ignore HTML's white space rules, which are descended from SGML's end record token rules. I discovered this behavior quite by accident when helping my friend John Restrick debug extraneous blank space in bordered tables containing images.
There is a rule of white space for HTML that can help you when
you're creating an HTML document. Consider this code:
<P>
Text
</P>
Officially, the
specification
for HTML 3.2
defines the following piece of code as identical to the
above:
<P>Text</P>
Furthermore, the following piece of code is considered to be identical
to the above two arrangements:
<P>
Text</P>
The next piece of code is also equivalent:
<P>Text
</P>
Finally, because of the general white space rule, the following code is
also equivalent to all of the above examples:
<P>
Text
</P>
These five examples all use the
<P> and </P>
tags, but the rule holds for
every HTML tag. The consequence of this white space rule is that
you can put carriage returns between your tags and your text as
you see fit, in order to improve the legibility
of your HTML documents.
(There is a tag called
<PRE> (for "pre-formatted text") that can be used
to get around the collapsing of white space.
We'll learn about the pre element in Skill 5.)
However, there are two problems with the white space rules that we need to discuss: the exception to the white space rules, and the indenting dilemma.
Unfortunately, Netscape Navigator, Microsoft Internet Explorer and most
other browsers don't follow the white space rules! Sometimes, putting a
carriage return between some text and an end tag results in an
erroneous space sneaking into a browser's display of your document. To
illustrate this problem, we'll use the
<U>
tag, which is
used to underline text. Consider the following HTML code:
<TITLE>Underline Problem</TITLE>
Let's look at four examples of underlining:
<P>
<U>1. This text is underlined with no carriage returns.</U>
<BR>
<U>
2. This text is underlined with carriage returns before and after.
</U>
<BR>
<U>
3. This text is underlined with a carriage return before.</U>
<BR>
<U>4. This text is underlined with a carriage return after.
</U>
Now let's see how your browser displays this code:
Let's look at four examples of underlining:
1. This text is underlined with no carriage returns.
Caption: Navigator displays the first and third underlining samples in the same way, but a problematic and erroneous space has appeared after the period in the second and fourth underlined phrases.
The error is hard to spot--but it's there, and it can show up in all sorts of documents in a way that makes them subtly different from how you'd expect. The problem may well be corrected in future versions of Navigator and other browsers, but this isn't too likely since the problem has existed in every version of Navigator and IE so far. This is just one example of how the people who create browsers don't always follow HTML specifications.
To avoid this problem, you may want to put your end tag immediately after the text, without any extra spaces or carriage returns.
[End of excerpt]
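The advice at the end of the excerpt (keep each end tag directly after its text) can be checked mechanically. The following is a minimal sketch, not an existing tool: the function name `find_end_tags_after_line_break` and the sample string are hypothetical, and the check is a simple regular expression rather than a real HTML parser. It reports every end tag that is the first thing on its line.

```python
import re

# Hypothetical helper, not part of any existing tool.
# Matches an end tag such as </U> that is the first thing on its line,
# allowing only spaces or tabs in front of it.
END_TAG_AT_LINE_START = re.compile(r"[ \t]*(</[A-Za-z][A-Za-z0-9]*\s*>)")


def find_end_tags_after_line_break(html):
    """Return (line_number, end_tag) pairs for end tags that start a line."""
    hits = []
    for line_number, line in enumerate(html.splitlines(), start=1):
        if line_number == 1:
            continue  # the first line has no line break in front of it
        match = END_TAG_AT_LINE_START.match(line)
        if match:
            hits.append((line_number, match.group(1)))
    return hits


# Illustrative input modeled on the first and fourth underlining samples.
sample = (
    "<U>1. No carriage returns.</U>\n"
    "<BR>\n"
    "<U>4. A carriage return after.\n"
    "</U>\n"
)
for line_number, tag in find_end_tags_after_line_break(sample):
    print(f"line {line_number}: {tag} follows a line break")
# prints: line 4: </U> follows a line break
```

Note that this sketch also reports structural end tags such as </TR> and </TABLE> when they start a line. Those are usually harmless, so the output is a list of places to review rather than a list of confirmed problems.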
The behavior described above has disturbed me since I first started writing HTML in 1994. It first became apparent to me with anchors -- if I wrote an anchor this way:
<a href="blah">read about blah
</a>
...then the anchor text would have an extra space at the end that was
highlighted and underlined (or otherwise indicated, depending
on the browser I was using).
Like this
Anchors should be displayed without trailing spaces.
Like this
To see if your browser has the same problem, compare the following link
to the above two:
Like this
If there is a trailing space, then your browser is not SGML-compliant. Surprise! Navigator, IE and Lynx are not compliant.
Another major consequence of this problem shows up when two adjacent table cells contain images. A frequently asked question is why two images that should touch do not. Suppose you write this HTML:
<TABLE border=0 cellpadding=0 cellspacing=0>
<TR>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0">
</TD>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0">
</TD>
</TR>
</TABLE>
The code above is rendered like this in your browser:
[Rendered example: the two images in adjacent table cells]
Notice whether there is any space between the two images. There shouldn't be any extraneous space at all.
The example above may or may not be correct, depending on your browser. For Windows 95, Internet Explorer 3.02 works correctly, and Navigator 3.01 works incorrectly. (Please let me know what browser you're using and whether or not it works correctly for this example.)
For those browsers that don't work correctly, what's happening
is that they see the carriage return between
the <IMG> tag and the </TD> end tag
and render a space character, which makes the cell wider. I'll
force the same effect here (by hard-coding a non-breaking space
next to the image):
[Rendered example: the two images with a forced space between them]
To prevent the problem, use this HTML code:
<TABLE border=0 cellpadding=0 cellspacing=0>
<TR>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0"></TD>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0"></TD>
</TR>
</TABLE>
Then the space disappears. The ONLY difference is where I put
the </TD> tags (as emphasized in the code
above).
There should be no extraneous spacing in this second version.
(I'm just talking about space between the images, not the border, since there's another table wrapped around the images that has white space. You may be interested in seeing some test examples that include a border.)
No one I know has a good reason why browsers have ignored the specs here. Carriage returns before end tags shouldn't make a difference.
(Clarification: I understand there should be a difference between
<U>foo </U>
and
<U>foo</U>
but there shouldn't be a difference between
<U>foo</U>
and
<U>foo
</U>
That's what I mean.)
(Disclaimer: I also don't mean to advocate the use of the underline element, which should be discouraged since it's physical markup and confuses viewers about what's text and what's a link. However, it's a very visual example that unambiguously illustrates my point.)
Finally, I want to point out the specification text that shows why the browsers' behavior is incorrect. The official specification for HTML 3.2 (at http://www.w3.org/pub/WWW/TR/REC-html32.html) is unambiguous in its declaration. Here's an excerpt:
The SGML rules for record boundaries are tricky. In particular, a record end immediately following a start tag should be discarded. For example:
<P>
Text
is equivalent to:
<P>Text
Similarly, a record end immediately preceding an end tag should be discarded. For example:
Text
</P>
is equivalent to:
Text</P>
Can't get much clearer than that.
I wrote the above line too soon -- HTML 4.0's draft specification is much clearer. (Excerpted from the White Space section of the Paragraphs, Lines, and Phrases document of the HTML 4.0 draft, otherwise known as Section 7.3.1.)
A line break occurring immediately following a start tag should be discarded, as should a line break occurring immediately before an end tag. This applies to all HTML elements without exceptions.
Too bad it's not (yet?) true for most browsers.
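The two rules quoted above are simple enough to sketch in a few lines. This is only an illustration under stated assumptions: the helper name `discard_tag_adjacent_line_breaks` is hypothetical, and the regular expressions stand in for a real SGML or HTML parser (they ignore comments and attribute values that contain ">"). It discards one line break right after a start tag and one right before an end tag, and leaves ordinary spaces alone.

```python
import re

# One line break in any common form; SGML calls this a "record end".
LINE_BREAK = r"(?:\r\n|\r|\n)"


def discard_tag_adjacent_line_breaks(html):
    """Hypothetical helper: apply the two record-end rules to a string."""
    # Rule 1: discard a line break immediately following a start tag.
    html = re.sub(r"(<[A-Za-z][^<>]*>)" + LINE_BREAK, r"\1", html)
    # Rule 2: discard a line break immediately preceding an end tag.
    html = re.sub(LINE_BREAK + r"(</[A-Za-z][^<>]*>)", r"\1", html)
    return html


# Four arrangements of the paragraph example that differ only in line breaks.
variants = ["<P>\nText\n</P>", "<P>Text</P>", "<P>\nText</P>", "<P>Text\n</P>"]
normalized = {discard_tag_adjacent_line_breaks(v) for v in variants}
print(normalized)  # all four collapse to the single form <P>Text</P>

# A real space before the end tag is content and is kept.
print(discard_tag_adjacent_line_breaks("<U>foo </U>"))  # prints: <U>foo </U>
```

Under this sketch, `<U>foo` followed by a line break and `</U>` becomes `<U>foo</U>`, while `<U>foo </U>` keeps its space, which is the distinction drawn in the clarification earlier in this document.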
Please send any comments to me at the address below, or use my feedback form. I'm very interested in which browsers handle these white space rules correctly and which do not.
E. Stephen Mack
(estephen@emf.net)