The humble escape character
by Paul Curzon, Queen Mary University of London
The escape character is a rather small and humble thing, often ignored, easily misunderstood but vital in programming languages. It is used simply to say symbols that follow should be treated differently. The n in \n is no longer just an n but a newline character, for example. It is the escape character \ that makes the change. The escape character has a long history dating back to at least Ancient Egypt and probably earlier.
The Ancient Egyptians famously used a language of pictures to write: hieroglyphs. How to read the language was lost for thousands of years, and it proved to be fiendishly difficult to decipher. The key to doing this turned out to be the Rosetta Stone, discovered when Napoleon invaded Egypt. It contained the same text in three different languages: the Hieroglyphic script, Greek and also an Egyptian script called Demotic.
A whole series of scholars ultimately contributed, but the final decipherment was done by Jean-François Champollion. Part of the difficulty in decipherment, even with a Greek translation of the Rosetta Stone text available, was because it wasn’t, as commonly thought, just a language where symbolic pictures represented words (a picture of the sun, meaning sun, for example). Instead, it combined several different systems of writing but using the same symbols. Those symbols could be read in different ways. The first way was as alphabetic letters that stood for consonants (like b, d and p in our alphabet). Words could be spelled out in this alphabet. The second was phonetically where symbols could stand for groups of such sounds. Finally, the picture could stand not for a sound but for a meaning. A picture of a duck could mean a particular sound or it could mean a duck!
Part of the reason it took so long to decipher the language was that it was assumed that all the symbols were pictures of the things they represented. It was only when eventually scholars started to treat some as though they represented sounds that progress was made. Even more progress was made when it was realised the same symbol meant different things and might be read in a different way, even in the same phrase.
However, if the same symbol meant different things in different places of a passage, how on earth could even Egyptian readers tell? How might you indicate a particular group of characters had a special meaning?
One way the Egyptians used specifically for names is called a cartouche: they enclosed the sequence of symbols that represented a name in an oval-like box, like the one shown for Cleopatra. This was one of the first keys to unlocking the language as the name of pharaoh Ptolemy appeared several times in the Greek of the Rosetta Stone. Once someone had the idea that the cartouches might be names, the symbols used to spell out Ptolemy a letter at a time could be guessed at.
Putting things in boxes works for a pictorial language, but it isn’t so convenient as a more general way of indicating different uses of particular symbols or sequences of them. The Ancient Egyptians therefore had a much simpler way too. The normal reading of a symbol was as a sound. A symbol that was to be treated as a picture of the word it represented was followed by a line (so despite all the assumptions of the translators and the general perception of them, a hieroglyph as picture is treated as the exception not the norm!)
For example, the hieroglyph that is a picture of the Egyptian eagle stands for a single consonant sound, aleph. We would pronounce it ‘ah’ and it can be seen in the cartouche for Cleopatra that sounds out her name. However, add the line after the picture of the eagle (as shown) and it just means what it looks like: the Egyptian eagle.
Cartouches actually included the line at the end too indicating in itself their special meaning, as can be seen on the Cleopatra cartouche above
The Egyptian line hieroglyph is what we would now call an escape character: its purpose is to say that the symbol it is paired with is not treated normally, but in a special way.
Computer Scientists use escape characters in a variety of ways in programming languages as well as in scripting languages like HTML. Different languages use a different symbol as the escape character, though \ is popular (and very reminiscent of the Egyptian line!). One place escapes are used is to represent special characters in strings (sequences of characters like words or sentences) so they can be manipulated or printed. If I want my program to print a word like “not” then I must pass an appropriate string to the print command. I just put the three characters in quotation marks to show I mean the characters n then o then t. Simple.
However, the string “\no\t” does not similarly mean five characters \, n, o, \ and t. It still represents three characters, but this time \n, o and \t. \ is an escape character saying that the n and the t symbols that follow it are not really representing the n or t characters but instead stand for a newline (\n : which jumps to the next line) and a tab character (\t : which adds some space). “\no\t” therefore means newline o tab.
This begs the question what if you actually want to print a \ character! If you try to use it as it is, it just turns what comes after it into something else and disappears. The solution is simple. You escape it by preceding it with a \. \\ means a single \ character! So “n\\t” means n followed by an actual \ character followed by a t. The normal meaning of \ is to escape what follows. Its special meaning when it is escaped is just to be a normal character! Other characters’s meanings are inverted like this too, where the more natural meaning is the one you only get with an escape character. For example what if you want a program to print a quotation so use quotation marks. But quotation marks are used to show you are starting and ending a String. They already have another meaning. So if you want a string consisting of the five characters “, n, o, t and ” you might try to write “”not”” but that doesn’t work as the initial “” already makes a string, just one with no characters in it. The string has ended before you got to the n. Escape characters to the rescue. You need ” to mean something other than its “normal” meeting of starting or ending a string so just escape it inside the string and write “\”not\””.
Once you get used to it, escaping characters is actually very simple, but is easy to find confusing when first learning to program. It is not surprising those trying to decipher hieroglyphs struggled so much as escapes were only one of the problems they had to contend with.
More on …
Related Magazines …
This blog is funded through EPSRC grant EP/W033615/1.