Why HTML is Inappropriate for E-Mail

Here’s the overview:

Now, I’ll explain in detail.

A little history

Netscape, once upon a time, had a web browser. To make their offering a more complete Internet client, they decided to add email support. This made some sense; they already had the most popular client for the Internet technology that was bringing people to the ’net in bigger numbers than anyone had previously imagined, and e-mail was the undisputed Internet "Killer App." Why not offer both, and take over the world?

Well, that was probably the plan, anyway.

Now, consider for a moment where e-mail was at the time. Most people used text-based mail readers, while a growing number of people were using graphical mail readers. The graphical readers (and many of the text-based ones) supported styled text, attachments, and a whole host of modern features. They did this using such well-designed and extensible standards as MIME (Multipurpose Internet Mail Extensions) and Enriched text.

Now, return to Netscape. They saw that they had a fairly extensive code base implementing their web browser, and they wanted to have to do as little work as possible to implement a mail client. They decided that they could get style and embedded images (one of the infinite things that MIME messages can embed) if they used HTML as their "rich" email format, leveraging their existing code base. And it was true; they could.

Doing that, of course, was the Wrong Thing (tm), for all the reasons that we’re going to discuss in this essay. Suffice it to say, for the moment, that what they did was good for them in the short term, but it was lazy, it departed from working and well-accepted standards, and it caused a whole host of problems.

Now, the rest of us are stuck fighting our way back out of the hole that Netscape dug for us.

Why standards are good

To be filled in...

Refer to the RFC repository at http://www.rfc-editor.org/

There was already a better way

The reason that most people think they want HTML email is to enable styled text. They want to be able to make the text larger, or make some text bold, or change the font style or color.

All of those features are already present in a more appropriate format, called Enriched text (MIME type "text/enriched"). I hope you will see in a moment why this format is more appropriate.

HTML as a text format does not allow the flexibility that enriched text does, while it adds a number of features that are entirely inappropriate for email.

For instance, HTML mail allows JavaScript to be embedded in mail messages. This means that when you view a mail message, you might be executing an arbitrary program written by the sender of the message. This should give you immediate pause. HTML mail also allows embedding links to images that your mail reader will, trying to be helpful, download and display in place. Why is this bad? For one thing, it prevents batch (or offline) email reading. For another, it allows the mail sender to determine exactly when and how often you read the mail message, since your mail reader will download the images from the sender’s server whenever you read the message. We’ll go into more detail about this when we discuss security holes, below.
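To make the two problems above concrete, here is a minimal Python sketch that scans an HTML message body for embedded scripts and for images that would be fetched from a remote server on display. The class name, the scan() helper, and the mail.example host are illustrative assumptions, not part of any mail client.

```python
from html.parser import HTMLParser


class ActiveContentScanner(HTMLParser):
    """Collects the parts of an HTML body that act on the reader's machine."""

    def __init__(self):
        super().__init__()
        self.scripts = 0          # number of <script> elements seen
        self.remote_images = []   # image URLs that would be fetched on display

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            if src.lower().startswith(("http://", "https://")):
                self.remote_images.append(src)


def scan(html_body):
    """Return (script count, list of remote image URLs) for an HTML body."""
    scanner = ActiveContentScanner()
    scanner.feed(html_body)
    scanner.close()
    return scanner.scripts, scanner.remote_images


# Hypothetical message body; mail.example is a placeholder host.
sample = (
    "<html><body><p>Big sale!</p>"
    '<script>window.open("http://mail.example/ad")</script>'
    '<img src="http://mail.example/logo.gif?msg=12345">'
    "</body></html>"
)
print(scan(sample))
```

A plain-text or text/enriched body has nothing for such a scanner to find, because neither format can carry scripts or remote image links in the first place.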

Remember, email is not hypertext. HTML is the HyperText Markup Language. Email is a message-based communication medium. It doesn’t make sense to use a language that’s intended for marking up hypertext to apply style to email messages. That’s what the Enriched text content-type is for. HTML gives you only some of the features that you want for styled text, and it brings along with it a number of features that are wholly inappropriate for email.

It reduces interoperability

Enriched text was created to be easily parsable with even pure text-based mail readers. HTML mail, on the other hand, requires a complete HTML rendering engine in your mail client to be able to make sense of it. Unless you happen to be using Netscape as your mail client, you probably don’t care to have all that baggage dragged into your mail client, making it bigger, slower, and more prone to serious bugs. And many people don’t even have the option.
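As a rough illustration of how little machinery a text-based reader needs, here is a simplified Python sketch that reduces text/enriched markup to plain text. It assumes the basic conventions of the format (formatting commands in angle brackets, "<<" for a literal "<", <param> blocks holding command arguments, and single line breaks treated as spaces) and deliberately ignores details such as <nofill>. The function name is made up for this example.

```python
import re


def enriched_to_plain(text):
    """Reduce text/enriched markup to plain text (simplified sketch)."""
    # Protect literal "<" characters, which the format writes as "<<".
    text = text.replace("<<", "\x00")
    # Drop <param>...</param> blocks; they hold arguments such as a color name.
    text = re.sub(r"<param>.*?</param>", "", text, flags=re.IGNORECASE | re.DOTALL)
    # Remove formatting commands such as <bold> and </bold>.
    text = re.sub(r"</?[A-Za-z0-9-]{1,60}>", "", text)

    # One line break reads as a space; N breaks in a row mean N-1 real breaks.
    def fold(match):
        count = len(match.group())
        return " " if count == 1 else "\n" * (count - 1)

    text = re.sub(r"\n+", fold, text)
    return text.replace("\x00", "<")


sample = (
    "<bold>Now</bold> is the time for\n"
    "<italic>all</italic> good men\n\n"
    "<color><param>red</param>to come</color> (1 << 2)"
)
print(enriched_to_plain(sample))
```

The whole conversion is a handful of string substitutions. Doing the same for HTML means handling entities, nested tables, style sheets, and scripts.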

Then there’s the issue of attachments. While attachments are a simple matter using MIME, there’s no mechanism to attach a file to an HTML mail message, other than the image links I mentioned above. It turns out that, even if you use HTML, you will need to use MIME (remember that?) to attach anything. Interestingly, you will also end up using MIME if you want to have more than one version of the text of the message, too. For instance, many HTML mail clients format messages as both HTML and text/enriched, so that they’ll be viewable in most mail readers. Text-only users still have to deal with the ugly HTML markup in the HTML section, but they can read the enriched part.
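The layering described above can be sketched with Python's standard email package: one message carrying a plain text body, a text/enriched alternative, and an attached file, all held together by MIME. The addresses, subject, file name, and file contents are placeholders invented for the example.

```python
from email.message import EmailMessage


def build_message():
    """Build a message with two text versions and one attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "reader@example.org"        # placeholder address
    msg["Subject"] = "Quarterly report"

    # Plain text version, readable everywhere.
    msg.set_content("The report is attached.\n")
    # Styled version of the same text, offered as an alternative.
    msg.add_alternative("The <bold>report</bold> is attached.\n", subtype="enriched")
    # The attachment itself; the bytes here are a stand-in for a real file.
    msg.add_attachment(
        b"placeholder file contents",
        maintype="application",
        subtype="octet-stream",
        filename="report.dat",
    )
    return msg


for part in build_message().walk():
    print(part.get_content_type())
```

The structure printed is multipart/mixed containing a multipart/alternative (with the two text versions) plus the attachment. No HTML is involved at any point.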

So, at this point, you have to ask: if we need to use MIME to do what we really want to do with mail messages, and if we need to use text/enriched to interoperate with other users, and if those formats allow us to do everything we want to do with mail, then why would we use HTML in email messages at all?

The answer is, of course, that if we thought about it at all, we wouldn’t.

It opens security holes

It’s interesting to note that spammers are particularly fond of HTML mail. Why would this be?

There are several reasons. For one thing, it takes control of your email client away from you and gives it to them. (Think: JavaScript, obfuscation of the mail message, using URL links to pull content into your mail reader window, etc.) For another, it allows much better tracking of your actions. If the HTML mail message contains links to images on the spammer’s web server (a very common technique), then the spammer automatically has a log of when you read the message, which exact message you read (from a special string encoded in the URL), and from what computer.
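To see how much a single image link can carry, consider this small Python sketch, which takes an image URL apart the way the sender's server would. The URL, its host, and the parameter names (msg, rcpt) are hypothetical; real tracking links use whatever encoding the sender likes.

```python
from urllib.parse import parse_qs, urlsplit


def what_the_sender_learns(image_url):
    """Split an image URL into the pieces that show up in the sender's log."""
    parts = urlsplit(image_url)
    return {
        "host": parts.hostname,           # server that receives the request
        "path": parts.path,               # the "image" being requested
        "tokens": parse_qs(parts.query),  # strings that can identify you
    }


# Hypothetical tracking link of the kind embedded in an HTML message.
url = "http://images.example/open.gif?msg=spring-sale&rcpt=a1b2c3"
print(what_the_sender_learns(url))
```

Because each recipient can be given a different token, one request for this image is enough to tie a specific address to a specific message. The request itself supplies the time and the network address of the reader's computer.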

A spammer is in the business of sending email messages to people who don’t want to receive them. So, even if you are very careful to protect your email address, not using it in public places, not posting it on web sites, and so on, a spammer might guess your email address. The spammer can send you a message, and if you read it, then they know that your email address is a real, valid email address, and that you read the spam that’s sent to you. Your address will go immediately onto the "A" list -- the list of addresses to which it would be particularly advantageous to send spam.

But it’s worse than that. If someone cared to, they could send you an HTML message and then be able to tell what computer, running what operating system, on what network, connected via which ISP, you used to read the message. This may be more information than you care to give out to anyone who can send you email -- which is to say, anyone in the world.

But it’s worse than just privacy violations. You’ve no doubt heard about all the email viruses that have been attacking computers and networks around the world. These attacks rely on the JavaScript (or other scripting) support enabled by using HTML as the email format.

(To be fair, several of the viruses recently have relied on really brain-dead behavior in Microsoft email clients, which are perfectly happy to run any program that an attacker sends to a victim in email. HTML isn’t strictly necessary to exploit these Microsoft bugs.)

Clearly, you don’t want to have HTML mail reading support in your mail reader. (Equally clear is that you should avoid using Microsoft email clients, too.) Unfortunately, in many email clients today, it’s not possible to turn off HTML support. In some, however, it is. And, fortunately, some HTML-capable mail readers don’t support running JavaScript within mail messages. Not very many, though.
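What "turning off HTML" amounts to can be sketched in a few lines of Python with the standard email package: when a message offers several versions of its text, pick the plain one and never hand the HTML part to a renderer. The function name and the sample messages are assumptions for illustration; a real mail reader would do more (for example, handle text/enriched as well).

```python
import email
from email import policy
from email.message import EmailMessage


def readable_text(raw_message):
    """Return the plain text body of a message, or None if it has none."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    # Ask only for a text/plain body; an HTML part is never selected.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else None


# Hypothetical incoming message with both a plain and an HTML version.
outgoing = EmailMessage()
outgoing["Subject"] = "Lunch"
outgoing.set_content("Meet at noon.\n")
outgoing.add_alternative("<p>Meet at <b>noon</b>.</p>", subtype="html")

print(readable_text(outgoing.as_string()))
```

A message that arrives with only an HTML part yields None here, which is exactly the case where the sender has left a text-only reader with nothing safe to display.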

Worse yet, at least the AOL client software (and perhaps others) doesn’t even allow users to turn off HTML in messages they send! It’s just not possible, anymore, for an AOL user to do the right thing!

Conclusion

HTML in email is neither necessary nor useful. It doesn’t do quite what you want in email messages (remember: it was designed for a wholly different purpose), and there are purpose-built technologies that do a much better job. It opens you up to severe security problems.

HTML in email was a quick kluge by Netscape, meant not to produce an innovative solution to any need that existed, but rather to get a bloated, bug-ridden product out the door a little faster. Is this really something that you want to buy into?

- Geoff Adams
18 Apr 2002