Visual Index of Punctuation & Figures

Proper typesetting of punctuation is as meaningful and important as proper spelling. Each character performs a specific task, and characters are not interchangeable even though they may look similar, as in ’ ′ ´ (right single quote, prime, and acute). A thorough understanding of the material and how to set and punctuate it is necessary, and it is important to remember that an individual punctuation mark can look completely different from one typeface to another – and can even look like a different character in another typeface. In the string of example marks, which is the apostrophe?
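Since the eye is unreliable here, one way to settle which mark is which is to ask software for the code point and official Unicode name. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for this illustration.

```python
import unicodedata

# The three look-alike marks from the paragraph above, plus the
# typewriter apostrophe for comparison.
marks = ["\u2019", "\u2032", "\u00B4", "\u0027"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in marks:
    print(ch, describe(ch))
```

Pasting a suspect character from a manuscript into `describe` shows what was actually typed, regardless of how the current font draws it.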

The business of setting type for web documents is somewhat simplified by the fact that many characters have HTML entity names (though these don’t always make sense or correspond to the correct typographic character, so it is important to know which character each one actually produces). While the concept of entity names does not extend to the entire character set, every character does occupy a defined and consistent position (called a code point) in the Unicode character specification. By using these universal designators, consistent results can be achieved across any compliant typeface (which most modern professional fonts are). Therefore, if you specify U+0024, you’ll be right on the money in any typeface you care to choose (so long as it does not omit the character).
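The relationship between entity names and code points can be checked programmatically. The sketch below assumes Python's standard `html.entities` table, which covers the HTML 4 entity names; `entity_to_codepoint` is a name invented for this example.

```python
from html.entities import name2codepoint

def entity_to_codepoint(name: str) -> str:
    """Map an HTML 4 entity name (without & and ;) to 'U+XXXX'."""
    return f"U+{name2codepoint[name]:04X}"

for name in ("dagger", "hellip", "rsquo", "prime"):
    print(f"&{name};", entity_to_codepoint(name), chr(name2codepoint[name]))

# A code point alone is enough to produce the character: U+0024 is the dollar sign.
print(chr(0x0024))
```

A name missing from the table raises `KeyError`, which is a quick way to discover that a character has no HTML 4 entity name and must be given by code point.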

Getting Unicode characters into a document can be achieved by several means (provided the target application supports Unicode). Some programs allow you to input code points directly, but most frequently you will have to rely on some sort of helper utility, usually provided by the operating system, though third-party software packages sometimes offer more features and better usability. Generically, these applications present the character set in a table from which characters can be browsed, selected, and copied to the clipboard. Since this is rather tedious and open to all sorts of potential errors, well-written applications also offer the ability to jump directly to the correct character by typing in the code point. The only downside is that you already have to know what you’re looking for.
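When the code point is not known in advance, searching by name is one workaround. This is a rough sketch, not a replacement for a proper character browser: `find_by_name` is a hypothetical helper, and it scans only the General Punctuation block (U+2000 to U+206F) unless told otherwise.

```python
import unicodedata

def find_by_name(keyword: str, start: int = 0x2000, stop: int = 0x2070):
    """List (code point, character, name) for names containing keyword.

    The default range is the General Punctuation block; widen it as needed.
    """
    hits = []
    for cp in range(start, stop):
        name = unicodedata.name(chr(cp), "")  # "" for positions without a name
        if keyword.upper() in name:
            hits.append((f"U+{cp:04X}", chr(cp), name))
    return hits

for hit in find_by_name("dagger"):
    print(*hit)
```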

In (X)HTML documents, Unicode characters can be specified by typing &#x followed by the hexadecimal code point (four digits for every character in this index) and a terminating semicolon. (Each entity name cross-references to a specific code point via a document type declaration.) The Unicode specification uses hexadecimal notation to designate code points, and any software application that references these character sets follows that model. Since the various HTML specifications support characters given in either hex (base-16) or decimal (base-10), it’s not uncommon to find characters referenced in either form (especially in online resources, where the tendency is to use decimal equivalents, owing to the inability of certain early browsers to reliably process hex notation, which wreaked havoc on the page). The problem is that these base-10 references can only be used in writing (X)HTML, and have to be converted back to base-16 to be usable in any other context. It is best, then, to stick with hex notation.
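To illustrate the hex and decimal forms side by side, here is a small sketch that rewrites decimal references as hex ones. It assumes Python's standard `html` and `re` modules; the function name `decimal_refs_to_hex` is invented for this example.

```python
import html
import re

def decimal_refs_to_hex(markup: str) -> str:
    """Rewrite decimal references such as &#8217; as hex (&#x2019;)."""
    def to_hex(match: re.Match) -> str:
        return f"&#x{int(match.group(1)):04X};"
    # \d+ cannot match the "x" of an existing hex reference,
    # so references already in hex are left untouched.
    return re.sub(r"&#(\d+);", to_hex, markup)

# Both notations decode to the same character, U+2019.
assert html.unescape("&#8217;") == html.unescape("&#x2019;") == "\u2019"

print(decimal_refs_to_hex("It&#8217;s 5&#8242; tall"))
```

The hex digits in the result are the same ones that appear in the U+ notation used throughout the index, which is the practical argument for preferring hex.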

On the specific subject of web typography, it is crucial to remember that while a given character can look different from one typeface to another (obviously), the rendering of the same character in the same typeface can also differ wildly from browser to browser, and should not be taken as an indication of how it will look in a layout application.

About the Index:

Following is a non-exhaustive list of characters which frequently appear in typesetting jobs, and a brief description. The second line of each entry gives the character and its code point; HTML entity name, if any, is also included, and if a character can be typed from the keyboard, that is noted by simply repeating the character.


acute

´ U+00B4 &acute;

An accent used on vowels. Does not appear independently (except in discussions of typography, diacritical marks, and the like).

addition

+ U+002B +

ampersand

& U+0026 &amp;

Originally a ligature styled to form et (the Latin for and). When the alphabet was repeated rapidly, the last character named was “et per se, and” (that is, et by itself, and), which became corrupted to “and per se and,” and thence to “ampersand.”1 Use the entity name in (X)HTML to prevent the character from being parsed as the start of an entity.

angle brackets

⟨ ⟩ U+27E8 U+27E9 ⟨ ⟩

Used extensively in mathematical and scientific notation, and to indicate editorial additions in the editing of classical texts.

apostrophe

' U+0027 '

Typographically, this character would be (or could be) referred to as a “vertical prime”. (Primes being either vertical or sloped.) Found on typewriters and computer keyboards, which lack typographic quotation marks. Use as single quote when setting blocks of computer code.

asterisk

* U+002A *

From the Latin, asteriscum, little star (and traditionally called star by printers). First in the traditional sequence of reference marks.

backslash

\ U+005C \

Found on computer keyboards, this ASCII character was introduced in 1961 for use in computer programming logic. Has no accepted purpose in typography.

bar

| U+007C |

Primarily used in mathematical notation. On North American keyboards, bar is located as the shift-character on the backslash; for whatever reason, the key label frequently depicts the glyph as the broken bar, an entirely different character (U+00A6). Also known as a vertical rule.

braces

{ } U+007B U+007D { }

Used to mark phrases and sets in mathematical notation, and to indicate editorial deletions in the editing of classical texts. Braces (along with square brackets) can act as an additional set of inner or outer parentheses.

bullet

• U+2022 &bull;

A larger version of the midpoint, bullets are used to flag list items and similar elements.

caron

ˇ U+02C7

An accent used on vowels and consonants in Eastern European languages, and in Asia. Does not appear independently. Also known as an inverted circumflex.

cent

¢ U+00A2 &cent;

circumflex

^ U+005E ˆ

An accent used on vowels. Does not appear independently.

copyright

© U+00A9 &copy;

Indicates that a work was registered for copyright, usually combined with the year of the filing date.

currency

¤ U+00A4 &curren;

A generic character to denote currency when the symbol for said currency is unavailable in the typeface used.

dagger

† U+2020 &dagger;

Second in the traditional sequence of reference marks.

degree

° U+00B0 &deg;

Indicator of temperature or angle.

division

÷ U+00F7 &divide;

dollar

$ U+0024 $

double acute

˝ U+02DD

An accent used on vowels. Does not appear independently.

double dagger

‡ U+2021 &Dagger;

Third in the traditional sequence of reference marks.

double prime

″ U+2033 &Prime;

Abbreviation for:

  • inches (1″ = 25.4mm)
  • seconds of arc (arcseconds)
  • seconds of time.

See single prime.

double quotation

“ ” U+201C U+201D &ldquo; &rdquo;

In North America, used as the primary quotation mark. See single quotation.

ellipsis

… U+2026 &hellip;

… vs ...

Traditionally, the horizontal space used to indicate omission or rhetorical pause is made up of three or more periods which may be set close or spaced. Most modern digital typefaces contain an ellipsis built into a single character. If this is too narrow, it is perfectly acceptable to build your own. The difference between a single character ellipsis (left) and one built from a series of closed dots can be seen in the example, and can range from subtle to extreme depending on the font.

em dash

— U+2014 &mdash;

A horizontal rule one em wide.

en dash

– U+2013 &ndash;

A horizontal rule one en (half an em) wide.

euro

€ U+20AC &euro;

grave

` U+0060

An accent used on vowels. Does not appear independently.

greater than

> U+003E &gt;

Use the entity name in (X)HTML to prevent the character from being parsed as markup.

greater than or equal to

≥ U+2265 &ge;

guilder

ƒ U+0192 &fnof;

Dutch monetary unit. The stylized “f” originates with the original currency name, florin.

guillemets

‹ › U+2039 U+203A &lsaquo; &rsaquo;

« » U+00AB U+00BB &laquo; &raquo;

Commonly used as quotation marks for the Latin, Greek, and Cyrillic alphabets in Europe, Asia, and Africa, they may point out or in, depending on country and language. Also known as chevrons or angled quotes. Named, in the diminutive, for French type-cutter Guillaume le Bé (1525–98), who is said to have invented them. Single and double versions shown.

hyphen

- U+002D -

Unicode supports the keyboard hyphen-minus (U+002D) but also offers a separate hyphen at U+2010. Oddly, this code point does not appear on the supposedly comprehensive Adobe Glyph List (a reference document for type designers), which could imply that its usage is questionable. About the AGL, Adobe writes: “[T]his specification is intended to be stable, i.e. never revised. In particular, it is intended that no mappings will ever be added to the AGL.”

less than

< U+003C &lt;

Use the entity name in (X)HTML to prevent the character from being parsed as markup.
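For text that is generated rather than typed by hand, the three characters that collide with markup (ampersand, less than, greater than) can be converted to their entity names automatically. A minimal sketch, assuming Python's standard `html` module; the sample string is made up.

```python
import html

raw = "AT&T <b>bold</b>"

# quote=False leaves " and ' alone; only &, < and > are replaced.
escaped = html.escape(raw, quote=False)
print(escaped)

# The conversion is reversible.
assert html.unescape(escaped) == raw
```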

less than or equal to

≤ U+2264 &le;

midpoint

· U+00B7 &middot;

‧ U+2027

c·o·m·p‧a‧r‧e

Used to separate syllables or letters, and in notation for mathematics and symbolic logic. In typography also known as “small bullet”. Midpoint is vaguely akin to the Unicode “hyphenation point” (U+2027), the only purpose of which is to separate syllables in dictionary layout. Depending on the typeface, the amount of built-in space can be excessive.

multiplication / dimension

× U+00D7 &times;

Commonly, and improperly, replaced with the letter “x”.
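Where a manuscript arrives with the letter already typed, the substitution can be automated with some care. The sketch below is one possible approach, not an established tool: `fix_dimensions` is a hypothetical helper that replaces x or X only when it sits between two digits, and it preserves whatever spacing the author used.

```python
import re

def fix_dimensions(text: str) -> str:
    """Replace x/X between digits with U+00D7, keeping the original spacing."""
    return re.sub(r"(?<=\d)(\s*)[xX](\s*)(?=\d)", "\\1\u00D7\\2", text)

print(fix_dimensions("Trim to 8.5 x 11 inches"))
print(fix_dimensions("4x6 prints"))
```

Note that the pattern would also alter strings such as hexadecimal literals that begin with a zero followed by x and a digit, so the output should be proofread rather than trusted blindly.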

numero

№ U+2116

№ ... vs ... No Nos

Not really a typographic character, but a form of the classical abbreviation of the word “number”, especially as it appears in addresses (No 221b Baker Street) or product models (a Stanley No 55 hand plane). Although the entire abbreviation has been defined in Unicode (with both characters contained in a single code point), only the singular is provided. It is best, therefore, to create both versions so that singular and plural will match. (See example; since <u> was deprecated in HTML 4.0 and invalid in XHTML 1.0 Strict and 1.1, CSS was used.)

parallel

‖ U+2016

‖ ... vs ... ||

Fifth in the traditional sequence of reference marks. To ensure matching heights, parallel and bar should be compared in the typeface used. If unequal, build parallel by kerning two vertical rules together.

parentheses

( ) U+0028 U+0029 ( )

Used to offset parenthetical phrases in grammar, and to mark groups in mathematics or logic notation.

per cent

% U+0025 %

One part per hundred. Also spelled percent.

per mil

‰ U+2030 &permil;

One part per thousand, or a tenth of a percent. Also spelled per mille, per mill, and permill.

per myriad

‱ U+2031

One part per ten thousand, or a hundredth of a percent (percent of a percent).

period

. U+002E .

The standard character of sentence termination in all European languages. Also known as full stop or full point, and in mathematics as decimal point.

pilcrow

¶ U+00B6 &para;

The sixth and final character in the traditional sequence of reference marks. After pilcrow, the sequence repeats with the figures doubled. Used by proofreaders and ancient scribes to indicate the beginning of a paragraph (paragraph being the traditional name of the character).

phi

Φ φ U+03A6 U+03C6 &Phi; &phi;

The twenty-first letter of the Greek alphabet (upper- and lowercase shown). Used to express the Mean of Phidias (i.e. the Golden Ratio).

pi

π U+03C0 &pi;

Lowercase form of the sixteenth letter of the Greek alphabet. Used in Euclidean geometry to indicate the ratio of a circle’s circumference to its diameter.

plus-or-minus

± U+00B1 &plusmn;

pounds sterling

£ U+00A3 &pound;

British monetary unit. Symbol also used in the currency of numerous African and Middle Eastern states.

prime

′ U+2032 &prime;

Abbreviation for:

  • feet (1′ = 12″)
  • minutes of arc (arcminute)
  • minutes of time.

See double prime.

quotation

" U+0022 &quot;

Typographically, this character would be (or could be) referred to as a “vertical double prime”. (Primes being either vertical or sloped.) Also known as a “dumb quote” from its usage on typewriters, which lack typographic quotation marks. Use when setting blocks of computer code.

registered trademark

® U+00AE &reg;

Indicates that the entity which it accompanies has been registered as a trademark with the U.S. Patent and Trademark Office (USPTO).

section

§ U+00A7 &sect;

Used in science and mathematics, and especially to denote sections of legal code. Plural is indicated by two symbols. Fourth in the traditional sequence of reference marks.

single quotation

‘ ’ U+2018 U+2019 &lsquo; &rsquo;

Use:

  • When nesting quotes
  • Close-quote for apostrophe.

See double quotation.
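As an illustration of the close-quote-for-apostrophe rule, the sketch below converts typewriter apostrophes to U+2019 when they fall between two word characters. This is a deliberately narrow, assumed heuristic (`set_apostrophes` is a made-up name): leading apostrophes and genuine quotation marks need a human decision and are left alone.

```python
import re

def set_apostrophes(text: str) -> str:
    """Replace ' with U+2019 only when it sits between two word characters."""
    return re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)

print(set_apostrophes("It's the printer's devil, isn't it?"))
print(set_apostrophes("print('code')"))  # quotes around code are untouched
```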

solidus

⁄ U+2044

Used with superior and inferior numbers to create ad hoc fractions (where the numerator is set above the baseline). See the Typographic Slash Hassle for more information on history and formatting.
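In plain text, where a font's own superior and inferior figures cannot be selected, an ad hoc fraction can be approximated with the Unicode superscript and subscript digits. That substitution is an assumption of this sketch rather than something the entry prescribes, and `fraction` is a hypothetical helper.

```python
# Digits 0-9 mapped to Unicode superscript and subscript forms.
SUPERIOR = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)
INFERIOR = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)

def fraction(numerator: int, denominator: int) -> str:
    """Join superscript and subscript digits with the solidus, U+2044."""
    top = str(numerator).translate(SUPERIOR)
    bottom = str(denominator).translate(INFERIOR)
    return f"{top}\u2044{bottom}"

print(fraction(5, 16))
```

How well the result sits on the line depends on the typeface, so it is worth comparing against any precomposed fractions the font provides.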

square brackets

[ ] U+005B U+005D [ ]

Used to interpolate and interject [usually missing] material into quoted matter, and as secondary or inner parentheses. “In the editing of classical texts, square brackets normally mark editorial restorations, angle brackets mark editorial and conjectural insertions, and braces mark deletions.”2 Traditionally referred to as simply brackets.

subtraction

− U+2212 &minus;

Not the same as the keyboard minus character, and frequently designed to be identical to the en dash.

swung dash

∼ U+223C &sim;

Used in mathematics to indicate similarity, and in lexicography to indicate repetition. Though this figure is used as a standalone character, it is not the one found on computer keyboards; that figure is the tilde, an accent mark which does not appear independent of alphabetic characters. The official Unicode name for the swung dash is tilde operator.

tilde

~ U+007E ~

An accent used on vowels in certain languages, and consonants in others. Does not appear independently, though it is a stock keyboard character. See swung dash.

trademark

™ U+2122 &trade;

Indicates that the entity which it accompanies is a trademark, but has not been registered with the U.S. Patent and Trademark Office (USPTO).

umlaut / diaeresis

¨ U+00A8 &uml;

An accent used on vowels. Does not appear independently. Linguists distinguish between the two terms though the typographic character is the same. Umlaut, as in German, marks a change in the pronunciation of a single vowel, whereas diaeresis, as in English and the Romance languages, marks the separation of adjacent vowels into individually pronounced sounds. A full suite of accented characters is available in most fonts.

unequal

≠ U+2260 &ne;

virgule

/ U+002F /

/ italicized /

Used:

  • as a sign of separation
  • to indicate line breaks

To set references to pre-decimalization British currency (i.e. £/s/d), use an italic virgule. See the Typographic Slash Hassle for more information on history and formatting.

yen

¥ U+00A5 &yen;

Notes:

1 Southward, John and Arthur Powell. Practical Printing. 2nd ed. (London: J.M. Powell & Son, 1884), 7.

2 Bringhurst, Robert. The Elements of Typographic Style. 2nd ed. (Vancouver: Hartley & Marks, Publishers, 2002), 285.