Special characters may need to be specially encoded on HTML forms. Which characters to encode depends on the situation.
Three ASCII characters must always be encoded (unless they are
part of an HTML tag or character entity, such as &quot;):
less-than (<), greater-than (>) and ampersand (&). These
are encoded as &lt;, &gt; and
&amp; respectively. The equivalent decimal codes (&#60;, &#62; and &#38;) may
be used, but hex encoding does not work (except in form data;
see below). If you forget to encode them they will be treated as part
of an HTML tag or character entity. Forgetting to encode < or >
is especially serious; it can result in extensive formatting
problems (possibly far below the site of the error) and pages that
appear radically different in different browsers (due to the
different ways they handle errors). It can also result in text
mysteriously vanishing (because it is assumed to be part of a tag).
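As an illustration of the rule above, here is a minimal Python sketch that encodes the three characters in body text, once with named entities and once with the equivalent decimal codes. The function names `encode_text` and `encode_text_decimal` are made up for this example; `html.escape` is part of the Python standard library.

```python
import html


def encode_text(text):
    """Encode <, > and & as named entities for use in HTML body text."""
    # quote=False leaves " and ' untouched, so only the three
    # always-encoded characters are changed.
    return html.escape(text, quote=False)


def encode_text_decimal(text):
    """Encode the same three characters using their decimal codes."""
    codes = {"&": "&#38;", "<": "&#60;", ">": "&#62;"}
    return "".join(codes.get(ch, ch) for ch in text)


print(encode_text("if a < b && b > c"))
print(encode_text_decimal("if a < b && b > c"))
```

Text that already contains entities should not be passed through either function a second time, since the leading & of each entity would itself be encoded.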
ISO Latin 1 characters (see below) should all be encoded, though some browsers may display some of them correctly without this. Unfortunately, not all ISO Latin 1 characters are displayed correctly by all browsers, so using them is a risk. Worse yet, some browsers don't handle certain named entities correctly, even though they handle the corresponding numeric code. Reading the ISO Latin 1 table using a variety of browsers will give you an idea of the magnitude of the problem and which characters are more risky than others.
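Because numeric codes are described as the safer choice, one way to encode ISO Latin 1 characters is to replace each of them with its decimal code. The sketch below does this in Python; `encode_latin1` is a hypothetical helper name, and the `xmlcharrefreplace` error handler it relies on is a standard Python codec feature that replaces every non-ASCII character (not only ISO Latin 1) with a decimal code.

```python
def encode_latin1(text):
    """Replace every non-ASCII character with its decimal code (&#NNN;)."""
    # Encoding to ASCII with xmlcharrefreplace substitutes &#NNN; for each
    # character that ASCII cannot represent; ASCII characters pass through.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(encode_latin1("café © 1996"))
```

This does not touch <, > or &, so it would normally be applied after those three characters have been encoded.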
HTML form data (anything appearing between double quotes within a tag) is a special case. The rules above apply, but additional characters must be encoded, and hex-encoding is available as an option for ASCII characters. Hex-encoding (see ASCII table) is shorter than the other numeric encoding, and there are few named entities for ASCII characters, so it's not necessarily any more confusing. Exactly which characters must be encoded depends on the situation. For example (showing only entities or hex encoding):
  % (as %25) and " (as %22)
  & and = (as %2526 and %253D, respectively)
Here %2526 and %253D are the hex codes %26 and %3D with their own % sign
encoded again as %25. Note that when = is used between
a field name and its value it does not need to be encoded, and
when & is used as the first character of a character entity
(such as &lt;) it does not need to be encoded. Also note that
field/value pairs are delimited by &amp; (the character entity
for &). For example:
<form method="post"
action="ROFM.acgi?_action=Add&amp;...&amp;Subject=OneAmpersandInQuotes&quot;%2526&quot;">
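The hex codes mentioned above can be produced mechanically. The following Python sketch is only an illustration under stated assumptions: `hex_encode`, `double_encode` and `attribute_query` are hypothetical helper names, and the field names are placeholders. It uses `urllib.parse.quote` for hex-encoding and `html.escape` to turn the & delimiters into &amp; for use inside a double-quoted attribute.

```python
import html
from urllib.parse import quote


def hex_encode(value):
    """Hex-encode every character that is not a letter, digit or one of _.-~"""
    return quote(value, safe="")


def double_encode(value):
    """Hex-encode twice, so that & becomes %2526 and = becomes %253D."""
    return hex_encode(hex_encode(value))


def attribute_query(pairs):
    """Join already-encoded name/value pairs for a double-quoted attribute."""
    query = "&".join(f"{name}={value}" for name, value in pairs)
    # html.escape turns each delimiter & into &amp; and leaves % codes alone.
    return html.escape(query, quote=True)


print(hex_encode('%"'))
print(double_encode("&"), double_encode("="))
print(attribute_query([("_action", "Add"), ("Subject", double_encode("&"))]))
```

The = between a field name and its value is left as is, matching the note above; only characters inside a value are double-encoded.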
This table shows the hex code, decimal code and entity name (if known) for the printable ASCII character set, omitting the letters and numbers.
Description              Hex Code  Code (Dec.)   Entity
=======================  ========  ===========   ==============
space                    %20       &#32; ->
!                        %21       &#33; -> !
"                        %22       &#34; -> "    &quot; -> "
#                        %23       &#35; -> #
$                        %24       &#36; -> $
%                        %25       &#37; -> %
&                        %26       &#38; -> &    &amp; -> &
'                        %27       &#39; -> '
(                        %28       &#40; -> (
)                        %29       &#41; -> )
*                        %2A       &#42; -> *
+                        %2B       &#43; -> +
,                        %2C       &#44; -> ,
-                        %2D       &#45; -> -
.                        %2E       &#46; -> .
/                        %2F       &#47; -> /
:                        %3A       &#58; -> :
;                        %3B       &#59; -> ;
<                        %3C       &#60; -> <    &lt; -> <
=                        %3D       &#61; -> =
>                        %3E       &#62; -> >    &gt; -> >
?                        %3F       &#63; -> ?
@                        %40       &#64; -> @
[                        %5B       &#91; -> [
\                        %5C       &#92; -> \
]                        %5D       &#93; -> ]
^                        %5E       &#94; -> ^
_                        %5F       &#95; -> _
`                        %60       &#96; -> `
{                        %7B       &#123; -> {
|                        %7C       &#124; -> |
}                        %7D       &#125; -> }
~                        %7E       &#126; -> ~
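The hex and decimal columns follow directly from each character's ASCII code, so they can be regenerated or checked with a few lines of Python. This is a sketch only; `ascii_row` is a hypothetical helper name, and `string.punctuation` is the standard library's list of ASCII punctuation characters.

```python
import string


def ascii_row(ch):
    """Return (character, hex code, decimal code) for one ASCII character."""
    code = ord(ch)
    return (ch, f"%{code:02X}", f"&#{code};")


# Space plus every ASCII punctuation character, in code order.
for ch in " " + string.punctuation:
    char, hex_code, dec_code = ascii_row(ch)
    print(f"{char:<4}{hex_code:<6}{dec_code}")
```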
This shows the ISO Latin 1 (also known as ISO 8859-1) character set, excluding ASCII characters. Not all browsers will display all these characters correctly, and browsers seem to handle even fewer named entities than numeric codes. The list may not be complete.
For more information on special characters, one source is this site in Germany.
Description                          Code          Entity
===================================  ===========   ==============
non-breaking space                   &#160; ->     &nbsp; ->
inverted exclamation mark            &#161; -> ¡   &iexcl; -> ¡
cent sign                            &#162; -> ¢   &cent; -> ¢
pound sign                           &#163; -> £   &pound; -> £
currency sign                        &#164; -> ¤   &curren; -> ¤
yen sign                             &#165; -> ¥   &yen; -> ¥
broken vertical bar                  &#166; -> ¦   &brvbar; -> ¦
section sign                         &#167; -> §   &sect; -> §
spacing dieresis                     &#168; -> ¨   &uml; -> ¨
copyright sign                       &#169; -> ©   &copy; -> ©
feminine ordinal indicator           &#170; -> ª   &ordf; -> ª
angle quotation mark, left           &#171; -> «   &laquo; -> «
negation sign                        &#172; -> ¬   &not; -> ¬
soft hyphen                          &#173; ->     &shy; ->
circled R registered sign            &#174; -> ®   &reg; -> ®
spacing macron                       &#175; -> ¯   &macr; -> ¯
degree sign                          &#176; -> °   &deg; -> °
plus-or-minus sign                   &#177; -> ±   &plusmn; -> ±
superscript 2                        &#178; -> ²   &sup2; -> ²
superscript 3                        &#179; -> ³   &sup3; -> ³
spacing acute                        &#180; -> ´   &acute; -> ´
micro sign                           &#181; -> µ   &micro; -> µ
paragraph sign                       &#182; -> ¶   &para; -> ¶
middle dot                           &#183; -> ·   &middot; -> ·
spacing cedilla                      &#184; -> ¸   &cedil; -> ¸
superscript 1                        &#185; -> ¹   &sup1; -> ¹
masculine ordinal indicator          &#186; -> º   &ordm; -> º
angle quotation mark, right          &#187; -> »   &raquo; -> »
fraction 1/4                         &#188; -> ¼   &frac14; -> ¼
fraction 1/2                         &#189; -> ½   &frac12; -> ½
fraction 3/4                         &#190; -> ¾   &frac34; -> ¾
inverted question mark               &#191; -> ¿   &iquest; -> ¿
capital A, grave accent              &#192; -> À   &Agrave; -> À
capital A, acute accent              &#193; -> Á   &Aacute; -> Á
capital A, circumflex accent         &#194; -> Â   &Acirc; -> Â
capital A, tilde                     &#195; -> Ã   &Atilde; -> Ã
capital A, dieresis or umlaut mark   &#196; -> Ä   &Auml; -> Ä
capital A, ring                      &#197; -> Å   &Aring; -> Å
capital AE diphthong (ligature)      &#198; -> Æ   &AElig; -> Æ
capital C, cedilla                   &#199; -> Ç   &Ccedil; -> Ç
capital E, grave accent              &#200; -> È   &Egrave; -> È
capital E, acute accent              &#201; -> É   &Eacute; -> É
capital E, circumflex accent         &#202; -> Ê   &Ecirc; -> Ê
capital E, dieresis or umlaut mark   &#203; -> Ë   &Euml; -> Ë
capital I, grave accent              &#204; -> Ì   &Igrave; -> Ì
capital I, acute accent              &#205; -> Í   &Iacute; -> Í
capital I, circumflex accent         &#206; -> Î   &Icirc; -> Î
capital I, dieresis or umlaut mark   &#207; -> Ï   &Iuml; -> Ï
capital Eth, Icelandic               &#208; -> Ð   &ETH; -> Ð
capital N, tilde                     &#209; -> Ñ   &Ntilde; -> Ñ
capital O, grave accent              &#210; -> Ò   &Ograve; -> Ò
capital O, acute accent              &#211; -> Ó   &Oacute; -> Ó
capital O, circumflex accent         &#212; -> Ô   &Ocirc; -> Ô
capital O, tilde                     &#213; -> Õ   &Otilde; -> Õ
capital O, dieresis or umlaut mark   &#214; -> Ö   &Ouml; -> Ö
multiplication sign                  &#215; -> ×   &times; -> ×
capital O, slash                     &#216; -> Ø   &Oslash; -> Ø
capital U, grave accent              &#217; -> Ù   &Ugrave; -> Ù
capital U, acute accent              &#218; -> Ú   &Uacute; -> Ú
capital U, circumflex accent         &#219; -> Û   &Ucirc; -> Û
capital U, dieresis or umlaut mark   &#220; -> Ü   &Uuml; -> Ü
capital Y, acute accent              &#221; -> Ý   &Yacute; -> Ý
capital THORN, Icelandic             &#222; -> Þ   &THORN; -> Þ
small sharp s, German (sz ligature)  &#223; -> ß   &szlig; -> ß
small a, grave accent                &#224; -> à   &agrave; -> à
small a, acute accent                &#225; -> á   &aacute; -> á
small a, circumflex accent           &#226; -> â   &acirc; -> â
small a, tilde                       &#227; -> ã   &atilde; -> ã
small a, dieresis or umlaut mark     &#228; -> ä   &auml; -> ä
small a, ring                        &#229; -> å   &aring; -> å
small ae diphthong (ligature)        &#230; -> æ   &aelig; -> æ
small c, cedilla                     &#231; -> ç   &ccedil; -> ç
small e, grave accent                &#232; -> è   &egrave; -> è
small e, acute accent                &#233; -> é   &eacute; -> é
small e, circumflex accent           &#234; -> ê   &ecirc; -> ê
small e, dieresis or umlaut mark     &#235; -> ë   &euml; -> ë
small i, grave accent                &#236; -> ì   &igrave; -> ì
small i, acute accent                &#237; -> í   &iacute; -> í
small i, circumflex accent           &#238; -> î   &icirc; -> î
small i, dieresis or umlaut mark     &#239; -> ï   &iuml; -> ï
small eth, Icelandic                 &#240; -> ð   &eth; -> ð
small n, tilde                       &#241; -> ñ   &ntilde; -> ñ
small o, grave accent                &#242; -> ò   &ograve; -> ò
small o, acute accent                &#243; -> ó   &oacute; -> ó
small o, circumflex accent           &#244; -> ô   &ocirc; -> ô
small o, tilde                       &#245; -> õ   &otilde; -> õ
small o, dieresis or umlaut mark     &#246; -> ö   &ouml; -> ö
division sign                        &#247; -> ÷   &divide; -> ÷
small o, slash                       &#248; -> ø   &oslash; -> ø
small u, grave accent                &#249; -> ù   &ugrave; -> ù
small u, acute accent                &#250; -> ú   &uacute; -> ú
small u, circumflex accent           &#251; -> û   &ucirc; -> û
small u, dieresis or umlaut mark     &#252; -> ü   &uuml; -> ü
small y, acute accent                &#253; -> ý   &yacute; -> ý
small thorn, Icelandic               &#254; -> þ   &thorn; -> þ
small y, dieresis or umlaut mark     &#255; -> ÿ   &yuml; -> ÿ
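Since the list may not be complete, it can be cross-checked against a machine-readable source. The sketch below uses Python's standard `html.entities.codepoint2name` mapping, which holds the HTML 4 entity names, to produce the numeric code, named entity and character for each ISO Latin 1 code from 160 to 255. The helper name `latin1_row` is made up for this example.

```python
from html.entities import codepoint2name


def latin1_row(codepoint):
    """Return (decimal code, named entity, character) for one code point."""
    name = codepoint2name.get(codepoint)
    # An empty string marks a code point with no named entity.
    entity = f"&{name};" if name else ""
    return (f"&#{codepoint};", entity, chr(codepoint))


rows = [latin1_row(cp) for cp in range(160, 256)]
for numeric, entity, char in rows:
    print(f"{numeric:<8}{entity:<10}{char}")
```

Whether a given browser actually renders each named entity still has to be tested by hand, as described above.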