Special Characters

Contents
Introduction
Set Up
Use
Special Characters
Handling Special Characters
ASCII Characters
ISO Latin-1 Characters
Misc

Handling Special Characters

Special characters may need to be specially encoded on HTML forms. Which characters to encode depends on the situation.

Three ASCII characters must always be encoded (unless they are part of an HTML tag or character entity, such as &quot;): less-than (<), greater-than (>) and ampersand (&). These are encoded as &lt;, &gt; and &amp; respectively. The equivalent decimal codes (&#60;, &#62; and &#38;) may be used, but hex encoding does not work (except in form data; see below). If you forget to encode them, they will be treated as part of an HTML tag or character entity. Forgetting to encode < or > is especially serious: it can result in extensive formatting problems (possibly far below the site of the error) and pages that appear radically different in different browsers (because browsers handle errors differently). It can also result in text mysteriously vanishing (because it is assumed to be part of a tag).
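As a rough illustration of the rule above, here is a minimal Python sketch that encodes the three characters in ordinary text. The helper name encode_text is hypothetical; html.escape is part of the Python standard library, and Python itself is only an assumption here, since the original page does not prescribe any tool.

```python
import html


def encode_text(text: str) -> str:
    """Encode &, < and > so the text is safe to place between HTML tags."""
    # quote=False leaves " and ' untouched; only the three characters
    # discussed above are replaced. The ampersand is handled first, so
    # the entities produced for < and > are not encoded a second time.
    return html.escape(text, quote=False)


print(encode_text("if a < b && b > c"))
# prints: if a &lt; b &amp;&amp; b &gt; c
```

Text that already contains entities should not be passed through such a helper again, or the ampersands in those entities will themselves be encoded.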

ISO Latin 1 characters (see below) should all be encoded, though some browsers may display some of them correctly without this. Unfortunately, not all ISO Latin 1 characters are displayed correctly by all browsers, so using ISO Latin 1 characters is a risk. Worse yet, some browsers don't handle certain named entities correctly, even though they handle the corresponding numeric code. Reading the ISO Latin 1 table using a variety of browsers will give you an idea of the magnitude of the problem and which characters are riskier than others.
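One way to apply this advice mechanically is sketched below, assuming Python is available. The function name encode_latin1 and its use_names flag are hypothetical; codepoint2name is the standard library's table of HTML entity names. The sketch defaults to numeric codes because, as noted above, some browsers handle them better than named entities.

```python
from html.entities import codepoint2name


def encode_latin1(text: str, use_names: bool = False) -> str:
    """Replace ISO Latin 1 characters (codes 160-255) with character codes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 160 <= cp <= 255:
            if use_names and cp in codepoint2name:
                # Named entity, e.g. &eacute;
                out.append(f"&{codepoint2name[cp]};")
            else:
                # Decimal numeric code, e.g. &#233;
                out.append(f"&#{cp};")
        else:
            # ASCII and anything outside the range is left unchanged.
            out.append(ch)
    return "".join(out)


print(encode_latin1("£5 café"))                  # &#163;5 caf&#233;
print(encode_latin1("£5 café", use_names=True))  # &pound;5 caf&eacute;
```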

HTML form data (anything appearing between double quotes within a tag) is a special case. The rules above apply, but additional characters must be encoded, and hex-encoding is available as an option for ASCII characters. Hex-encoding (see ASCII table) is shorter than the other numeric encoding, and there are few named entities for ASCII characters, so it's not necessarily any more confusing. Exactly which characters must be encoded depends on the situation. For example (showing only entities or hex encoding):

ASCII Characters

This table shows the hex code, decimal code and entity name (if known) for the printable ASCII character set, omitting the letters and numbers.

Description               Hex Code    Code (Dec.)    Entity
=======================   ========    ===========    ==============
space                       %20         &#32;  ->  
!                           %21         &#33;  -> !
"                           %22         &#34;  -> "    &quot; -> "
#                           %23         &#35;  -> #
$                           %24         &#36;  -> $
%                           %25         &#37;  -> %
&                           %26         &#38;  -> &    &amp; -> &
'                           %27         &#39;  -> '
(                           %28         &#40;  -> (
)                           %29         &#41;  -> )
*                           %2A         &#42;  -> *
+                           %2B         &#43;  -> +
,                           %2C         &#44;  -> ,
-                           %2D         &#45;  -> -
.                           %2E         &#46;  -> .
/                           %2F         &#47;  -> /
 
:                           %3A         &#58;  -> :
;                           %3B         &#59;  -> ;
<                           %3C         &#60;  -> <    &lt; -> <
=                           %3D         &#61;  -> =
>                           %3E         &#62;  -> >    &gt; -> >
?                           %3F         &#63;  -> ?
@                           %40         &#64;  -> @
 
[                           %5B         &#91;  -> [
\                           %5C         &#92;  -> \
]                           %5D         &#93;  -> ]
^                           %5E         &#94;  -> ^
_                           %5F         &#95;  -> _
`                           %60         &#96;  -> `
 
{                           %7B         &#123;  -> {
|                           %7C         &#124;  -> |
}                           %7D         &#125;  -> }
~                           %7E         &#126;  -> ~
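
The hex and decimal columns both follow directly from each character's ASCII code, so a row can be checked with a few lines of code. This is only a sketch, assuming Python; the helper name table_row is hypothetical.

```python
import string


def table_row(ch: str) -> tuple[str, str]:
    """Return the (hex code, decimal code) pair for one ASCII character."""
    code = ord(ch)
    # Hex code: a percent sign followed by two uppercase hex digits.
    # Decimal code: the same number written in base 10 between &# and ;
    return (f"%{code:02X}", f"&#{code};")


# The space plus every ASCII punctuation character, in code order per group.
for ch in " " + string.punctuation:
    hex_code, dec_code = table_row(ch)
    print(f"{ch!r:6} {hex_code:6} {dec_code}")
```

For example, table_row("/") gives ("%2F", "&#47;"), since the slash has ASCII code 47, which is 2F in hex.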
 

ISO Latin 1 Characters

This shows the ISO Latin 1 (also known as ISO 8859-1) character set, excluding ASCII characters. Not all browsers will display all these characters correctly, and browsers seem to handle even fewer named entities than numeric codes. The list may not be complete.

For more information on special characters, one source is this site in Germany.

Description                           Code           Entity
===================================   ===========    ==============
non-breaking space                    &#160; ->      &nbsp; ->  
inverted exclamation mark             &#161; -> ¡    &iexcl; -> ¡
cent sign                             &#162; -> ¢    &cent; -> ¢
pound sign                            &#163; -> £    &pound; -> £
currency sign                         &#164; -> ¤    &curren; -> ¤
yen sign                              &#165; -> ¥    &yen; -> ¥
broken vertical bar                   &#166; -> ¦    &brvbar; -> ¦
section sign                          &#167; -> §    &sect; -> §
spacing dieresis                      &#168; -> ¨    &uml; -> ¨
copyright sign                        &#169; -> ©    &copy; -> ©
feminine ordinal indicator            &#170; -> ª    &ordf; -> ª
angle quotation mark, left            &#171; -> «    &laquo; -> «
negation sign                         &#172; -> ¬    &not; -> ¬
soft hyphen                           &#173; -> ­    &shy; -> ­
circled R registered sign             &#174; -> ®    &reg; -> ®
spacing macron                        &#175; -> ¯    &macr; -> ¯
degree sign                           &#176; -> °    &deg; -> °
plus-or-minus sign                    &#177; -> ±    &plusmn; -> ±
superscript 2                         &#178; -> ²    &sup2; -> ²
superscript 3                         &#179; -> ³    &sup3; -> ³
spacing acute                         &#180; -> ´    &acute; -> ´
micro sign                            &#181; -> µ    &micro; -> µ
paragraph sign                        &#182; -> ¶    &para; -> ¶
middle dot                            &#183; -> ·    &middot; -> ·
spacing cedilla                       &#184; -> ¸    &cedil; -> ¸
superscript 1                         &#185; -> ¹    &sup1; -> ¹
masculine ordinal indicator           &#186; -> º    &ordm; -> º
angle quotation mark, right           &#187; -> »    &raquo; -> »
fraction 1/4                          &#188; -> ¼    &frac14; -> ¼
fraction 1/2                          &#189; -> ½    &frac12; -> ½
fraction 3/4                          &#190; -> ¾    &frac34; -> ¾
inverted question mark                &#191; -> ¿    &iquest; -> ¿
capital A, grave accent               &#192; -> À    &Agrave; -> À
capital A, acute accent               &#193; -> Á    &Aacute; -> Á
capital A, circumflex accent          &#194; -> Â    &Acirc; -> Â
capital A, tilde                      &#195; -> Ã    &Atilde; -> Ã
capital A, dieresis or umlaut mark    &#196; -> Ä    &Auml; -> Ä
capital A, ring                       &#197; -> Å    &Aring; -> Å
capital AE diphthong (ligature)       &#198; -> Æ    &AElig; -> Æ
capital C, cedilla                    &#199; -> Ç    &Ccedil; -> Ç
capital E, grave accent               &#200; -> È    &Egrave; -> È
capital E, acute accent               &#201; -> É    &Eacute; -> É
capital E, circumflex accent          &#202; -> Ê    &Ecirc; -> Ê
capital E, dieresis or umlaut mark    &#203; -> Ë    &Euml; -> Ë
capital I, grave accent               &#204; -> Ì    &Igrave; -> Ì
capital I, acute accent               &#205; -> Í    &Iacute; -> Í
capital I, circumflex accent          &#206; -> Î    &Icirc; -> Î
capital I, dieresis or umlaut mark    &#207; -> Ï    &Iuml; -> Ï
capital Eth, Icelandic                &#208; -> Ð    &ETH; -> Ð
capital N, tilde                      &#209; -> Ñ    &Ntilde; -> Ñ
capital O, grave accent               &#210; -> Ò    &Ograve; -> Ò
capital O, acute accent               &#211; -> Ó    &Oacute; -> Ó
capital O, circumflex accent          &#212; -> Ô    &Ocirc; -> Ô
capital O, tilde                      &#213; -> Õ    &Otilde; -> Õ
capital O, dieresis or umlaut mark    &#214; -> Ö    &Ouml; -> Ö
multiplication sign                   &#215; -> ×    &times; -> ×
capital O, slash                      &#216; -> Ø    &Oslash; -> Ø
capital U, grave accent               &#217; -> Ù    &Ugrave; -> Ù
capital U, acute accent               &#218; -> Ú    &Uacute; -> Ú
capital U, circumflex accent          &#219; -> Û    &Ucirc; -> Û
capital U, dieresis or umlaut mark    &#220; -> Ü    &Uuml; -> Ü
capital Y, acute accent               &#221; -> Ý    &Yacute; -> Ý
capital THORN, Icelandic              &#222; -> Þ    &THORN; -> Þ
small sharp s, German (sz ligature)   &#223; -> ß    &szlig; -> ß
small a, grave accent                 &#224; -> à    &agrave; -> à
small a, acute accent                 &#225; -> á    &aacute; -> á
small a, circumflex accent            &#226; -> â    &acirc; -> â
small a, tilde                        &#227; -> ã    &atilde; -> ã
small a, dieresis or umlaut mark      &#228; -> ä    &auml; -> ä
small a, ring                         &#229; -> å    &aring; -> å
small ae diphthong (ligature)         &#230; -> æ    &aelig; -> æ
small c, cedilla                      &#231; -> ç    &ccedil; -> ç
small e, grave accent                 &#232; -> è    &egrave; -> è
small e, acute accent                 &#233; -> é    &eacute; -> é
small e, circumflex accent            &#234; -> ê    &ecirc; -> ê
small e, dieresis or umlaut mark      &#235; -> ë    &euml; -> ë
small i, grave accent                 &#236; -> ì    &igrave; -> ì
small i, acute accent                 &#237; -> í    &iacute; -> í
small i, circumflex accent            &#238; -> î    &icirc; -> î
small i, dieresis or umlaut mark      &#239; -> ï    &iuml; -> ï
small eth, Icelandic                  &#240; -> ð    &eth; -> ð
small n, tilde                        &#241; -> ñ    &ntilde; -> ñ
small o, grave accent                 &#242; -> ò    &ograve; -> ò
small o, acute accent                 &#243; -> ó    &oacute; -> ó
small o, circumflex accent            &#244; -> ô    &ocirc; -> ô
small o, tilde                        &#245; -> õ    &otilde; -> õ
small o, dieresis or umlaut mark      &#246; -> ö    &ouml; -> ö
division sign                         &#247; -> ÷    &divide; -> ÷
small o, slash                        &#248; -> ø    &oslash; -> ø
small u, grave accent                 &#249; -> ù    &ugrave; -> ù
small u, acute accent                 &#250; -> ú    &uacute; -> ú
small u, circumflex accent            &#251; -> û    &ucirc; -> û
small u, dieresis or umlaut mark      &#252; -> ü    &uuml; -> ü
small y, acute accent                 &#253; -> ý    &yacute; -> ý
small thorn, Icelandic                &#254; -> þ    &thorn; -> þ
small y, dieresis or umlaut mark      &#255; -> ÿ    &yuml; -> ÿ