Lego computer science: representing numbers using position

Numbers represented with different sized common blocks

Continuing a series of blogs on what to do with all that lego scattered over the floor: learn some computer science…how do we represent numbers and how is it related to the representation Charles Babbage used in his design for a Victorian steam-powered computer?

We’ve seen there are lots of ways that human societies have represented numbers and that there are many ways we could represent numbers even just using lego. Computers store numbers using a different representation again called binary. Before we get to that though we need to understand how we represent bigger numbers ourselves and why it is so useful.

Numbers represented as colours.

Our number system was invented in India somewhere before the 4th century. It then spread, including to the west, via muslim scholars in Persia by the 9th century, so is called the Hindu-Arabic numeral system. Its most famous advocate was Muḥammad ibn Mūsā al-Khwārizmī. The word algorithm comes from the latin version of his name because of his book on algorithms for doing arithmetic with Hindu-arabic numbers.

The really clever thing about it is the core idea that a digit can have a different value depending on its position. In the number 555, for example, the digit 5 is representing the number five hundred, the number fifty and the number five. Those three numbers are added together to give the actual number being represented. Digit in the ‘ones’ column keep their value, those in the ‘tens’ column are ten times bigger, those in the ‘hundreds column a hundred times bigger than the digit, and so on. This was revolutionary differing from most previous systems where a different symbol was used for bigger number, and each symbol always meant the same thing. For example, in Roman numerals X is used to mean 10 and always means 10 wherever it occurs in a number. This kind of positional system wasn’t totally unique as the Babylonians had used a less sophisticated version and Archimedes also came up with a similar idea, those these systems didn’t get used elsewhere.

In the lego representations of numbers we have seen so far, to represent big numbers we would need ever more coloured blocks, or ever more different kinds of brick or ever bigger piles of bricks, to give a representation of those bigger numbers. It just doesn’t scale. However, this idea of position-valued numbers can be applied whatever the representation of digits used, not just with digits 0 to 9. So we can use the place number system to represent ever bigger numbers using our different versions of the way digits could be represented in lego. We only need symbols for the different digits, not for every number, of for every bigger numbers.

For example, if we have ten different colours of bricks to represent the 10 digits of our decimal system, we can build any number by just placing them in the right position, placing coloured bricks on a base piece.

The number 2301 represented in coloured blocks where black represents 0, red represents 1, blue represents 2 and where yellow represents 3

Numbers could be variable sized or fixed size. If as above we have a base plate, and so storage space, for four digits then we can’t represent larger numbers than 9999. This is what happens with the way computers store numbers. A fixed amount of space is allocated for each number in the computer’s memory, and if a number needs more digits then we get an “overflow error” as it can’t be stored. Rockets worth millions of pounds have exploded on take-off in the past because a programmer made the mistake of trying to store numbers too big for the space allocated for them. If we want bigger numbers, we need a representation (and algorithms) that extend the size of the number if we run out of space. In lego that means our algorithm for dealing with numbers would have to include extending the grey base plate by adding a new piece when needed (and removing it when no longer needed). That then would allow us to add new digits.

Unlike when we write numbers, where we write just as many digits as we need, with fixed-sized numbers like this, we need to add zeros on the end to fill the space. There is no such thing as an empty piece of storage in a computer. Something is always there! So the number 123 is actually stored as 0123 in a fixed 4-digit representation like our lego one.

The number 321 represented in coloured blocks where space is allocated for 4 digits as 0321: black represents 0, red represents 1, blue represents 2 and where yellow represents 3

Charles Babbage made use of this idea when inventing his Victorian machines for doing computation: had they been built would have been the first computers. Driven by steam power his difference engine and analytical engine were to have digits represented by wheels with the numbers 0-9 written round the edge, linked to the positions of cog-like teeth that turned them.

Wheels were to be stacked on top of each other to represent larger numbers in a vertical rather than horizontal position system. The equivalent lego version to Babbage’s would therefore not have blocks on a base plate but blocks stacked on top of each other.

The number 321 represented vertically in coloured blocks where space is allocated for 4 digits as 0321: black represents 0, red represents 1, blue represents 2 and where yellow represents 3

In Babbage’s machines different numbers were represented by their own column of wheels. He envisioned the analytical engine to have a room sized data store full of such columns of wheels.

Numbers stored as columns of wheels on the replica of Babbage’s Difference Engine at the Science Museum London. Carsten Ullrich: CC-BY-SA-2.5. From wikimedia.

So Babbage’s idea was just to use our decimal system with digits represented with wheels. Modern computers instead use binary … bit that is for next time.

This post was funded by UKRI, through grant EP/K040251/2 held by Professor Ursula Martin, and forms part of a broader project on the development and impact of computing.


Lego Computer Science

Part 1: Lego Computer Science: pixel pictures

Part 2: Lego Computer Science: compression algorithms

Part 3: Lego Computer Science: representing numbers

Part 4: Lego Computer Science: representing numbers using position

Lego computer science: representing numbers

Continuing a series of blogs on what to do with all that lego scattered over the floor: learn some computer science…what does number representation mean?

We’ve seen some different ways to represent images and how ultimately they can be represented as numbers but how about numbers themselves. We talk as though computers can store numbers as numbers but even they are represented in terms of simpler things in computers.

Lego numbers

But first what do we mean by a number and a representation of a number? If I told you to make the numbers 0 to 9 in lego (go on have a go) you may well make something like this…

But those symbols 0, 1, 2, … are just that. They are symbols representing numbers not the numbers themselves. They are arbitrary choices. Different cultures past and present use different symbols to mean the same thing. For example, the ancient Egyptian way of writing the number 1000 was a hieroglyph of a water lily. (Perhaps you can make that in lego!)

The ancient Egyptian way to write 1000 was a hieroglyph of a waterlily

What really are numbers? What is the symbol 2 standing for? It represents the abstract idea of twoness ie any collection, group or pile of two things: 2 pieces of lego, 2 ducks, 2 sprouts, … and what is twoness? … it is oneness with one more thing added to the pile. So if you want to get closer to the actual numbers then a closer representation using lego might be a single brick, two bricks, three bricks, … put together in any way you like.

Numbers represented by that number of lego bricks

Another way would to use different sizes of bricks for them. Use a lego brick with a single stud for 1, a 2-stud brick for two and so on (combining bricks where you don’t have a single piece with the right number of studs). In these versions 0 is the absence of anything just like the real zero.

Lego bricks representing numbers based on the number of studs showing.

Once we do it in bricks it is just another representation though – a symbol of the actual thing. You can actually use any symbols as long as you decide the meaning in advance, there doesn’t actually have to be any element of twoness in the symbol for two. What other ways can you think of representing numbers 0 to 9 in lego? Make them…

A more abstract set of symbols would be to use different coloured bricks – red for 1, blue for 2 and so on. Now 0 can have a direct symbol like a black brick. Now as long as it is the right colour any brick would do. Any sized red brick can still mean 1 (if we want it to). Notice we are now doing the opposite of what we did with images. Instead of representing a colour with a number, we are representing a number with a colour.

Numbers represented as colours.

Here is a different representation. A one stud brick means 1, a 2-stud brick means 2, a square 4 stud brick means 3, a rectangular 6 stud brick means 4 and so on. As long as we agreed that is what they mean it is fine. Whatever representation we choose it is just a convention that we have to then be consistent about and agree with others.

Numbers represented by increasing sized blocks

What has this to do with computing? Well if we are going to write algorithms to work with numbers, we need a way to store and so represent numbers. More fundamentally though, computation (and so at its core computer science) really is all about symbol manipulation. That is what computational devices (like computers) do. They just manipulate symbols using algorithms. We will see this more clearly when we get to creating a simple computer (a Turing Machine) out of lego (but that is for later).

We interpret the symbols in the inputs of computers and the symbols in the outputs with meanings and as a result they tell us things we wanted to know. So if we key the symbols 12+13= into a calculator or computer and it gives us back 25, what has happened is just that it has followed some rules (an algorithm for addition) that manipulated those input symbols and made it spew out the output symbols. It has no idea what they mean as it is just blindly following its rules about how to manipulate symbols. We also could have used absolutely any symbols for the numbers and operators as long as they were the ones the computer was programmed to manipulate. We are the ones that add the intelligence and give those symbols meanings of numbers and addition and the result of doing an addition.

This is why representations are important – we need to choose a representation for things that makes the symbol manipulation we intend to do easy. We already saw this with images. If we want to send a large image to someone else then a representation of images like run-length encoding that shrinks the amount of data is a good idea.

When designing computers we need to provide them with a representation of numbers so they can manipulate those numbers. We have seen that there are lots of representations we could choose for numbers and any in theory would do, but when we choose a representation of numbers for use to do computation, we want to pick one that makes the operations we are interested in doing easy. Charles Babbage for example chose to use cog-like wheels turned to particular positions to represent numbers as he had worked out how to create a mechanism to do calculation with them. But that is something for another time…


This post was funded by UKRI, through grant EP/K040251/2 held by Professor Ursula Martin, and forms part of a broader project on the development and impact of computing.


Lego Computer Science

Part 1: Lego Computer Science: pixel pictures

Part 2: Lego Computer Science: compression algorithms

Part 3: Lego Computer Science: representing numbers

Lego computer science: compression algorithms

Continuing a series of blogs on what to do with all that lego scattered over the floor: learn some computer science…

A giraffe as a pixel image.
Colour look-up table
Black 0
Blue 1
Yellow 2
Green 3
Brown 4

We saw in the last post how images are stored as pixels – the equivalent of square or round lego blocks of different colours laid out in a grid like a mosaic. By giving each colour a number and drawing out a gird of numbers we give ourself a map to recreate the picture from. Turning that grid of numbers into a list (and knowing the size of the rectangle that is the image) we can store the image as a file of numbers, and send it to someone else to recreate.

Of course, we didn’t really need that grid of numbers at all as it is the list we really need. A different (possibly quicker) way to create the list of numbers is work through the picture a brick at a time, row by row and find a brick of the same colour. Then make a long line of those bricks matching the ones in the lego image, keeping them in the same order as in the image. That long line of bricks is a different representation of the image as a list instead of as a grid. As long as we keep the bricks in order we can regenerate the image. By writing down the number of the colour of each brick we can turn the list of bricks into another representation – the list of numbers. Again the original lego image can be recreated from the numbers.

The image as a list of bricks and numbers
Colour look-up table: Black 0: Blue 1: Yellow 2: Green 3: Brown 4

The trouble with this is for any decent size image it is a long list of numbers – made very obvious by the very long line of lego bricks now covering your living room floor. There is an easy thing to do to make them take less space. Often you will see that there is a run of the same coloured lego bricks in the line. So when putting them out, stack adjacent bricks of the same colour together in a pile, only starting a new pile if the bricks change colour. If eventually we get to more bricks of the original colour, they start their own new pile. This allows the line of bricks to take up far less space on the floor. (We have essentially compressed our image – made it take less storage space, at least here less floor space).

Now when we create the list of numbers (so we can share the image, or pack all the lego away but still be able to recreate the image), we count how many bricks are in each pile. We can then write out a list to represent the numbers something like 7 blue, 1 green, … Of course we can replace the colours by numbers that represent them too using our key that gives a number to each colour (as above).

If we are using 1 to mean blue and the line of bricks starts with a pile of seven black bricks then write down a pair of numbers 7 1 to mean “a pile of seven blue bricks”. If this is followed by 1 green bricks with 3 being used for green then we next write down 1 3, to mean a pile of 1 green bricks and so on. As long as there are lots of runs of bricks (pixels) of the same colour then this will use far less numbers to store than the original:

7 1 1 3 6 1 2 3 1 1 1 2 3 1 2 3 2 2 3 1 2 3 …

We have compressed our image file and it will now be much quicker to send to a friend. The picture can still be rebuilt though as we have not lost any information at all in doing this (it is called a lossless data compression algorithm). The actual algorithm we have been following is called run-length encoding.

Of course, for some images, it may take more not less numbers if the picture changes colour nearly every brick (as in the middle of our giraffe picture). However, as long as there are large patches of similar colours then it will do better.

There are always tweaks you can do to algorithms that may improve the algorithm in some circumstances. For example in the above we jumped back to the start of the row when we got to the end. An alternative would be to snake down the image, working along the adjacent rows in opposite directions. That could improve run-length encoding for some images because patches of colour are likely the same as the row below, so this may allow us to continue some runs. Perhaps you can come up with other ways to make a better image compression algorithm

Run-length encoding is a very simple compression algorithm but it shows how the same information can be stored using a different representation in a way that takes up less space (so can be shared more quickly) – and that is what compression is all about. Other more complex compression algorithms use this algorithm as one element of the full algorithm.

Activities

Make this picture in lego (or colouring in on squared paper or in a spreadsheet if you don’t have the lego). Then convert it to a representation consisting of a line of piles of bricks and then create the compressed numbered list.

An image of a camel to compress: Colour look-up table: Black 0: Blue 1: Yellow 2: Green 3: Brown 4

Make your own lego images, encode and compress them and send the list of numbers to a friend to recreate.


Find more about Lego Art at lego.com.

Find more pixel puzzles (no lego needed, just coloured pens or spreadsheets) at https://teachinglondoncomputing.org/pixel-puzzles/


This post was funded by UKRI, through grant EP/K040251/2 held by Professor Ursula Martin, and forms part of a broader project on the development and impact of computing.

CS4FN Advent – Day 1 – Woolly jumpers, knitting and coding

Welcome to the first ‘window’ of the CS4FN Christmas Computing Advent Calendar. The picture on the ‘box’ was a woolly jumper with a message in binary, three letters on the jumper itself and another letter split across the arms. Can you work out what it says? (Answer at the end).

Come back tomorrow for the next instalment in our Advent series.

Cartoon of a green woolly Christmas jumper with some knitted stars and a message “knitted” in binary (zeroes and ones). Also the symbol for wifi on the cuffs.

 

Wrap up warm with our first festive CS4FN article, from Karen Shoop, which is all about the links between knitting patterns and computer code. Find out about regular expressions in her article.

Knitters and Coders:
separated at birth?

 

Image credit: Regular Expressions by xkcd

Further reading

Dickens Knitting in Code – this CS4FN article, by Paul Curzon, is about Charles Dickens’ book A Tale of Two Cities. One of the characters, Madame Defarge, takes coding to the next level by encoding hidden information into her knitting, something known as steganography (basically hiding information in plain sight). We have some more information on the history of steganography and how it is used in computing in this CS4FN article: Hiding in Elizabethan binary.

In Craft, Culture, and Code Shuchi Grover also considers the links between coding and knitting, writing that “few non-programming activities have such a close parallels to coding as knitting/crocheting” (see section 4 in particular, which talks about syntax, decomposition, subroutines, debugging and algorithms).

Something to print and colour in

This is a Christmas-themed thing you might enjoy eating, if you’ve any room left of course. Puzzle solution tomorrow. This was designed by Elaine Huen.

 

Solving the Christmas jumper code

The jumper’s binary reads

01011000

01001101

01000001

01010011

What four letters might be being spelled out here? Each binary number represents one letter and you can find out what each letter is by looking at this binary-to-letters translator. Have a go at working out the word using the translator (but the answer is at the end of this post).

 

 

 

 

 

 

 

Keep scrolling

 

 

 

 

 

 

 

Bit more

 

 

 

 

 

 

 

 

The Christmas jumper says… XMAS

Hiding in Elizabethan Binary

The great Tudor and Stuart philosopher Sir Francis Bacon was a scientist, a statesman and an author. He was also a pretty decent computer scientist. In the year of the Gunpowder plot, he published a new form of cipher, now called Bacon’s Cipher, invented when he was a teenager. Its core idea is the foundation for the way all messages are stored in computers today.

From Pixabay

The Tudor and Stuart eras were a time of plot and intrigue. Perhaps the most famous is the 1605 Gunpowder plot where Guy Fawkes tried to assassinate King James I by blowing up the Houses of Parliament. Secrets mattered! In his youth Bacon had worked as a secret agent for Elizabeth I’s spy chief, Walsingham, so knew all about ciphers. Not content with using those that existed he invented his own. The one he is best remembered for was actually both a cipher and a form of steganography. While a cipher aims to make a message unreadable, steganography is the science of secret writing: disguising messages so no one but the recipient knows there is a message there at all.

A Cipher …

Bacon’s method came in two parts. The first was a substitution cipher, where different symbols are substituted for each letter of the alphabet in the message. This idea dates back to Roman times. Julius Caesar used a version, substituting each letter for a letter from a fixed number of places down the alphabet (so A becomes E, B becomes F, and so on). Bacon’s key idea was to replace each letter of the alphabet with, not a number or letter, but it’s own series of a’s and b’s (see the cipher table). The Elizabethan alphabet actually had only 24 letters so I and J have the same code as do U and V as they were interchangeable (J was the capital letter version of i and similarly for U and v).

In Bacon’s cipher everything is encoded in two symbols, so it is a binary encoding. The letters a and b are arbitrary. Today we would use 0 and 1. This is the first use of binary as a way to encode letters (in the West at least). Today all text stored in computers is represented in this way – though the codes are different – it is all Unicode is. It allocates each character in the alphabet with a binary pattern used to represent it in the computer. When the characters are to be displayed, the computer program just looks up which graphic pattern (the actual symbol as drawn) is linked to that binary pattern in the code being used. Unicode gives a binary pattern for every symbol in every human language (and some alien ones like Klingon).

Steganography

The second part of Bacon’s cipher system was Steganography. Steganography dates back to at least the Greeks, who supposedly tattooed messages on the shaved heads of slaves, then let their hair grow back before sending them as both messenger and message. The binary encoding of Bacon’s cipher was vital to make his steganography algorithm possible. However, the message was not actually written as a’s and b’s. Bacon realised that two symbols could stand for any two things. If you could make the difference hard to spot, you could hide the messages. Bacon invented two ways of handwriting each letter of the alphabet – two fonts. An ‘a’ in the encoded message meant use one font and a ‘b’ meant use the other. The secret message could then be hidden inside an innocent one. The letters written were no longer the message, the message was in the font used. As Bacon noted, once you have the message in binary you could think of other ways to hide it. One way used was with capital and lower-case letters, though only using the first letter of words to make it less obvious.

Suppose you wanted to hide the message “no” in the innocuous message ‘hello world’. The message ‘no’ becomes ‘abbaa abbab’. So far this is just a substitution cipher. Next we hide it in, ‘hello world’. Two different kinds of fonts are those with curls on the tails of letters known as serif fonts and like this one and those without curls known as sans serif fonts and like this one. We can use a sans serif font to represent an ‘a’ in the coded message, and a serif font to represent ‘b’. We just alternate the fonts following the pattern of the a’s and b’s: ‘abbaa abbab’. The message becomes

sans serif, serif, serif, sans serif, sans serif,
sans serif, serif, serif, sans serif, serif.

Using those fonts for our message we get the final mixed font message to send:

Bacon the polymath

Bacon is perhaps best known as one of the principal advocates for rigorous science as a way of building up knowledge. He argued that scientists needed to do more than just come up with theories of how the world worked, and also guard against just seeing the results that matched their theories. He argued knowledge should be based on careful, repeated observation. This approach is the basis of the Scientific Method and one of the foundation stones of modern science.

Bacon was also a famous writer of the time, and one of many authors who has since been suggested as the person who wrote William Shakespeare’s plays. In his case it is because they claim to have found secret messages hidden in the plays in Bacon’s code. The idea that someone else wrote Shakespeare’s plays actually started just because some upper class folk with a lack of imagination couldn’t believe a person from a humble background could turn themselves into a genius. How wrong they were!

– Paul Curzon, Queen Mary University of London, Autumn 2017