White Space Bugs in Browsers

Introduction

This document describes how white space in your HTML source code can affect a graphical browser's presentation of your document, even though the white space should make no difference.

I last updated this document on 12-Jul-1997.

(If you're interested in this document, you may also be interested in a sample document that tests the effect of white space on tables, called tables8.)

Summary

In HTML code, the presence or absence of a carriage return before an end tag should not affect the rendering of a document, according to SGML specifications. However, many popular browsers do not get this right. If you are using a browser that displays an underlined space to the right of the letter "a" at the end of this link (or another artefact depending on your browser and its settings), then your browser is definitely affected to some extent:
Zeigen's Dilemma

This behavior has consequences that affect many HTML elements, especially tables and font elements.

HTML authors should be aware of this browser behavior if they wish to avoid it. The solutions are:

  1. Write to the companies that make browsers and tell them you think they should follow SGML's record end rules.
  2. Don't put carriage returns before your end tags.

Detailed Explanation

I'm about to describe in detail a "bug" that browsers have regarding the processing of white space (spaces, tabs and carriage returns) and HTML end tags, such as </A>.

I put the word bug in quotes above, because it seems that the browsers are doing it intentionally, despite the fact that the official HTML 3.2 specification and draft HTML 4.0 specification both explicitly dictate a different behavior.

The first part of my argument is an early draft excerpt from the third chapter of the book HTML 4.0: No experience required, which Janan Platt and I wrote for Sybex. (It was published in October 1997. The draft excerpt below changed significantly in the final manuscript, subsequent to several rounds of editing, but it still expresses the point I'm trying to explain.)

Below the excerpt are some further examples and arguments that do not appear in the book.

This whole issue is something that has bothered me for quite a while: Browsers shouldn't ignore HTML's white space rules, which are descended from SGML's record end rules. I discovered this behavior quite by accident when helping my friend John Restrick debug extraneous blank space in bordered tables containing images.


[From the middle of Chapter 3, where we talk about white space when you arrange HTML on the page in your document. We define white space and state the general rule, that a run of white space is collapsed into a single space. Then we talk about a more specific case.]

There is a rule of white space for HTML that can help you when you're creating an HTML document. Consider this code:
<P>
Text
</P>

Officially, the specification for HTML 3.2 defines the following piece of code as identical to the above:
<P>Text</P>

Furthermore, the following piece of code is considered to be identical to the above two arrangements:
<P>
Text</P>

The next piece of code is also equivalent:
<P>Text
</P>

Finally, because of the general white space rule, the following code is also equivalent to all of the above examples:
<P>

Text

</P>

These five examples all use the <P> and </P> tags, but the rule holds for every HTML tag. The consequence of this white space rule is that you can put carriage returns between your tags and your text as you see fit, in order to improve the legibility of your HTML documents.

(There is a tag called <PRE> (for "pre-formatted text") that can be used to get around the collapsing of white space. We'll learn about the pre element in Skill 5.)

However, there are two problems with the white space rules that we need to discuss: the exception to the white space rules, and the indenting dilemma.

The Exception to the White Space Rules

Unfortunately, Netscape Navigator, Microsoft Internet Explorer and most other browsers don't follow the white space rules! Sometimes, putting a carriage return between some text and an end tag results in an erroneous space sneaking into a browser's display of your document. To illustrate this problem, we'll use the <U> tag, which is used to underline text. Consider the following HTML code:

<TITLE>Underline Problem</TITLE>
Let's look at four examples of underlining:

<P>
<U>1. This text is underlined with no carriage returns.</U>
<BR>

<U>
2. This text is underlined with carriage returns before and after.
</U>
<BR>

<U>
3. This text is underlined with a carriage return before.</U>
<BR>

<U>4. This text is underlined with a carriage return after.
</U>

Now let's see how your browser displays this code:


Let's look at four examples of underlining:

1. This text is underlined with no carriage returns.
2. This text is underlined with carriage returns before and after.
3. This text is underlined with a carriage return before.
4. This text is underlined with a carriage return after.


Caption: Navigator displays the first and third underlining samples in the same way, but a problematic and erroneous space has appeared after the period in the second and fourth underlined phrases.

The error is hard to spot--but it's there, and it can show up in all sorts of documents in a way that makes them subtly different from how you'd expect. The problem may well be corrected in future versions of Navigator and other browsers, but this isn't too likely since the problem has existed in every version of Navigator and IE so far. This is just one example of how the people who create browsers don't always follow HTML specifications.

To avoid this problem, you may want to put your end tag immediately after the text, without any extra spaces or carriage returns.

[End of excerpt]
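
The advice that closes the excerpt (keep each end tag right after its text) can be checked mechanically. The Python sketch below is only an illustration: the name find_risky_end_tags and the shortened sample text are hypothetical, the tag pattern is naive (it assumes no ">" inside attribute values), and it assumes "\n" or "\r\n" line endings. It flags every end tag that follows a line break, so expect some false alarms, such as a </TR> on its own line.

```python
import re

# An end tag that starts a line (after optional indentation), which means
# it is preceded by a line break. Assumes "\n" or "\r\n" line endings.
END_TAG_AFTER_BREAK = re.compile(r"\r?\n[ \t]*(</[A-Za-z][^>]*>)")


def find_risky_end_tags(html):
    """Return (line_number, tag) pairs for end tags preceded by a line break."""
    hits = []
    for match in END_TAG_AFTER_BREAK.finditer(html):
        # Line numbers are 1-based: count the line breaks before the tag.
        line_number = html.count("\n", 0, match.start(1)) + 1
        hits.append((line_number, match.group(1)))
    return hits


# Shortened version of the four underlining samples from the excerpt.
SAMPLE = (
    "<U>1. No carriage returns.</U>\n"
    "<BR>\n"
    "<U>\n"
    "2. Carriage returns before and after.\n"
    "</U>\n"
    "<BR>\n"
    "<U>\n"
    "3. Carriage return before.</U>\n"
    "<BR>\n"
    "<U>4. Carriage return after.\n"
    "</U>\n"
)

if __name__ == "__main__":
    for line_number, tag in find_risky_end_tags(SAMPLE):
        print("line", line_number, "has", tag, "after a line break")
```

Run against the sample, the check reports the </U> tags of the second and fourth phrases, the same two that pick up the stray space in Navigator.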


Further Examples

The behavior described above is something that's disturbed me since I first started writing HTML in 1994. It first became apparent to me with anchors -- if I wrote an anchor this way:

<a href="blah">read about blah
</a>

...then the anchor text would have an extra space at the end that was highlighted and underlined (or otherwise indicated, depending on the browser I was using).
Like this 

Anchors should be displayed without trailing spaces.
Like this

To see if your browser has the same problem, compare the following link to the above two:
Like this

If there is a trailing space, then your browser is not SGML-compliant. Surprise! Navigator, IE and Lynx are not compliant.

Another major consequence of this problem shows up when two adjacent table cells contain images. A frequently asked question is why two images that should touch do not. Suppose you write this HTML:

<TABLE border=0 cellpadding=0 cellspacing=0>
<TR>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0">
</TD>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0">
</TD>
</TR>
</TABLE>

The code above is rendered like this in your browser:

[example image] [example image]

Notice whether there is any space between the two images. There shouldn't be any extraneous space at all.

The example above may or may not be correct, depending on your browser. For Windows 95, Internet Explorer 3.02 works correctly, and Navigator 3.01 works incorrectly. (Please let me know what browser you're using and whether or not it works correctly for this example.)

For those browsers that don't work correctly, what's happening is that they see the carriage return between the <IMG> tag and the </TD> tag and render a space character, which makes the cell wider. I'll force the same effect here (by hard-coding a non-breaking space next to the image):

[example image]  [example image] 

To prevent the problem, use this HTML code:

<TABLE border=0 cellpadding=0 cellspacing=0>
<TR>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0"></TD>
<TD><IMG src="example.jpg" alt="[example image]" hspace="0"></TD>
</TR>
</TABLE>

Then the space disappears. The ONLY difference is where I put the </TD> tags: each one now follows its <IMG> tag directly, on the same line.

There should be no extraneous spacing in this second version.

[example image] [example image]

(I'm just talking about space between the images, not the border, since there's another table wrapped around the images that has white space. You may be interested in seeing some test examples that include a border.)
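
For a page with many such cells, the same fix can be applied mechanically. The following Python sketch is one possible approach, not a tested tool: the name tighten_end_tags and its tag_names parameter are hypothetical, and it simply deletes any white space that starts at a line break and runs up to one of the named end tags. It assumes "\n" or "\r\n" line endings.

```python
import re


def tighten_end_tags(html, tag_names=("TD",)):
    """Remove line breaks (and any indentation) right before the named end tags."""
    names = "|".join(re.escape(name) for name in tag_names)
    # White space from a line break up to the end tag; the tag itself is kept.
    pattern = re.compile(
        r"\r?\n[ \t\r\n]*(</(?:" + names + r")\s*>)", re.IGNORECASE
    )
    return pattern.sub(r"\1", html)


# The first table from the text, with each </TD> on its own line.
LOOSE_TABLE = (
    '<TABLE border=0 cellpadding=0 cellspacing=0>\n'
    '<TR>\n'
    '<TD><IMG src="example.jpg" alt="[example image]" hspace="0">\n'
    '</TD>\n'
    '<TD><IMG src="example.jpg" alt="[example image]" hspace="0">\n'
    '</TD>\n'
    '</TR>\n'
    '</TABLE>\n'
)

if __name__ == "__main__":
    print(tighten_end_tags(LOOSE_TABLE))
```

Only the </TD> tags move; the line breaks before </TR> and </TABLE> are left alone because those tag names are not in the list. Spaces that come before the line break are also left alone, since a real space in the source is significant.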

No one I know has a good reason why browsers have ignored the specs here. Carriage returns before end tags shouldn't make a difference.

(Clarification: I understand there should be a difference between
<U>foo </U>
and
<U>foo</U>
but there shouldn't be a difference between
<U>foo</U>
and
<U>foo
</U>

That's what I mean.)

(Disclaimer: I also don't mean to advocate the use of the underline element, which should be discouraged since it's physical markup and confuses viewers about what's text and what's a link. However, it's a very visual example that unambiguously illustrates my point.)

Finally, I want to point out the specification text that shows the browsers are behaving incorrectly. The official specification for HTML 3.2 (at http://www.w3.org/pub/WWW/TR/REC-html32.html) is unambiguous in its declaration. Here's an excerpt:

The SGML rules for record boundaries are tricky. In particular, a record end immediately following a start tag should be discarded. For example:

<P>
Text

is equivalent to:

<P>Text

Similarly, a record end immediately preceding an end tag should be discarded. For example:

Text
</P>

is equivalent to:

Text</P>

Can't get much clearer than that.

I wrote the above line too soon -- HTML 4.0's draft specification is much clearer. (Excerpted from the White Space section of the Paragraphs, Lines, and Phrases document of the HTML 4.0 draft, otherwise known as Section 7.3.1.)

A line break occurring immediately following a start tag should be discarded, as should a line break occurring immediately before an end tag. This applies to all HTML elements without exceptions.

Too bad it's not (yet?) true for most browsers.
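
The two rules quoted above, together with the general collapsing rule, are simple enough to model in a few lines. The Python sketch below is only an illustration under stated assumptions, not a real SGML parser: the name normalize is hypothetical, tags are matched with a naive pattern that assumes no ">" inside attribute values, and several line breaks in a row next to a tag are treated as one discarded line break, following the reasoning in the book excerpt.

```python
import re

LINE_BREAK = r"(?:\r\n|\r|\n)"

# Naive tag patterns: assume attribute values never contain ">".
AFTER_START_TAG = re.compile(r"(<[A-Za-z][^>]*>)" + LINE_BREAK + r"+")
BEFORE_END_TAG = re.compile(LINE_BREAK + r"+(</[A-Za-z][^>]*>)")
WHITE_SPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize(html):
    """Simplified model of how a conforming browser should treat white space."""
    # Discard line breaks immediately following a start tag.
    html = AFTER_START_TAG.sub(r"\1", html)
    # Discard line breaks immediately preceding an end tag.
    html = BEFORE_END_TAG.sub(r"\1", html)
    # Collapse every remaining run of white space into a single space.
    return WHITE_SPACE_RUN.sub(" ", html)


# The five arrangements of <P>Text</P> from the book excerpt.
FORMS = [
    "<P>\nText\n</P>",
    "<P>Text</P>",
    "<P>\nText</P>",
    "<P>Text\n</P>",
    "<P>\n\nText\n\n</P>",
]

if __name__ == "__main__":
    for form in FORMS:
        print(repr(form), "->", normalize(form))
```

Under this model all five arrangements come out as <P>Text</P>, and <U>foo</U> matches the version with a line break before </U>, while <U>foo </U> keeps its space because a space is not a line break.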

Credits and Comments

Thanks to:

Contact

Please send any comments to me at the address below, or use my feedback form. I'm very interested in which browsers handle these white space rules correctly, and which ones do not.


Zeigen's Dilemma
  E. Stephen Mack (estephen@emf.net)