Silly Sequences and Data Compression

A spiral fractal image — Image by Charles Thonney from Pixabay

Here is a fun challenge for you: can you work out the rule behind this sequence, and its next term?

1
11
21
1211
111221
312211

This is not just some maths exercise – it also gives a clue to a clever way of compressing data! Have you figured it out? What if I told you this is called the “look-and-say” sequence? Say each row out loud in turn…

Here’s the answer: the next term in the sequence is 13112221. Each term describes the previous one. The last row, 312211, read aloud becomes “one three, one one, two twos, two ones”, which when written down with numbers is 13112221!
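If you want to check the answer, or try other seeds, here is a minimal Python sketch of the "describe the previous term" rule. The names `look_and_say` and `sequence` are our own illustrative choices. The only library call is `itertools.groupby`, which groups consecutive equal characters. Those groups are exactly the runs we read aloud.

```python
from itertools import groupby


def look_and_say(term: str) -> str:
    """Read a term aloud: each run of identical digits becomes <count><digit>."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))


def sequence(seed: str, n: int) -> list[str]:
    """The first n terms of the look-and-say sequence starting from `seed`."""
    terms = [seed]
    while len(terms) < n:
        terms.append(look_and_say(terms[-1]))
    return terms


print(sequence("1", 7))
# ['1', '11', '21', '1211', '111221', '312211', '13112221']
print(sequence("22", 4))  # another seed: 22 reads as "two twos", so it never changes
# ['22', '22', '22', '22']
```

Swap in your own seed string to explore where other starting numbers lead.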

The sequence doesn’t necessarily need to start with 1: try picking a different starting number (a seed) and see where you end up!

If you’re perplexed by this sequence, you’re in good company. The look-and-say sequence, while not invented by him, was analysed in depth by the mathematician John Conway after it was introduced to him at a party: a party where maths is discussed openly – sign me up!

Conway is famous for his Game of Life: a cellular automaton which uses a few simple rules to create seemingly living patterns. We won’t go into detail on it here but, interestingly, if you look more at Conway’s Game of Life, you might start to see some similar features. Compare the look-and-say sequence starting 22… with a 2×2 block on the Life grid: both stay exactly the same forever, because 22 reads as “two twos”, which is 22 again, and a 2×2 block is a pattern in Life that never changes.

In his investigations, Conway discovered some very interesting properties of the look-and-say sequence. One of these is that no digits other than 1, 2 and 3 will ever appear in the sequence (unless the seed contains such a digit or a run of more than three of the same digit).

Take the sequence above starting 1, 11, etc. – this will never contain a digit 4 or above. Can you figure out why?

We can understand this by working backwards. What would we need to get a digit 4 in one of the terms? We’d need four of the same digit in a row in the term before (e.g. 1111). But a run like that is impossible. 1111 would have to be read as “one one” followed by “one one”, so the term that generates it would have to contain 11. However, when 11 does appear, it is read as a single run, “two ones”, and is written in the next term as 21, not 1111.
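This is not a proof, but a quick experiment lets you test the claim on the first few dozen terms. The sketch below is hedged in the same way as any experiment: it only checks the terms it generates. `largest_digit_seen` is a hypothetical helper, and `look_and_say` is the same run-reading rule as before, repeated so this block runs on its own.

```python
from itertools import groupby


def look_and_say(term: str) -> str:
    """Each run of identical digits becomes <count><digit>."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))


def largest_digit_seen(seed: str, n_terms: int) -> str:
    """The largest digit appearing anywhere in the first n_terms terms."""
    term, largest = seed, max(seed)
    for _ in range(n_terms - 1):
        term = look_and_say(term)
        largest = max(largest, max(term))  # digit characters compare like digits
    return largest


print(largest_digit_seen("1", 30))    # '3': never a 4 or above
print(largest_digit_seen("7", 30))    # '7': a digit in the seed is carried along
print(largest_digit_seen("1111", 2))  # '4': a seed containing a run of four
```

The last two calls show the two exceptions mentioned above: a large digit in the seed, and a run of more than three identical digits in the seed.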

Compress it?

You’re perhaps wondering how this links to computer science? Imagine a black-and-white image stored with 0s and 1s, where 0 is a white pixel and 1 is a black pixel: 0000111111001111… Storing this as one long sequence might seem a bit inefficient, so let’s try applying a look-and-say approach to it.

We would end up with 40 (“four zeros”), 61 (“six ones”), 20 and 41, or written in full 40612041. Having just two possible values is especially convenient: since the runs of 0s and 1s always alternate, we can even drop the 0s and 1s from the compressed version and just store the length of each run, 4624…, as long as we agree which colour comes first. That is a much shorter sequence to store than our original.
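Here is a minimal run-length encoding sketch in Python for the pixel example. Two choices here are ours, not the article’s. First, counts are stored as whole numbers in a list, since a run longer than 9 would not fit in one digit. Second, the counts-only version always starts with a run of 0s, recording a run of length 0 when the data actually begins with a 1. The function names are hypothetical.

```python
from itertools import groupby


def rle_pairs(bits: str) -> list[tuple[int, str]]:
    """One (count, symbol) pair per run, like reading the bits aloud."""
    return [(len(list(run)), symbol) for symbol, run in groupby(bits)]


def rle_counts(bits: str) -> list[int]:
    """Counts only. Convention (our assumption): runs alternate 0, 1, 0, ...
    starting with 0, so data starting with 1 gets a leading 0-length run."""
    counts = [count for count, _ in rle_pairs(bits)]
    return counts if bits.startswith("0") else [0] + counts


def rle_decode(counts: list[int]) -> str:
    """Rebuild the bits: run i is made of 0s when i is even, 1s when odd."""
    return "".join(str(i % 2) * count for i, count in enumerate(counts))


pixels = "0000111111001111"
print(rle_pairs(pixels))   # [(4, '0'), (6, '1'), (2, '0'), (4, '1')]
print(rle_counts(pixels))  # [4, 6, 2, 4]
print(rle_decode(rle_counts(pixels)) == pixels)  # True
```

The decoder is what makes the "drop the symbols" trick safe. Because the runs are known to alternate, the position of each count is enough to say which colour it describes.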

This style of compression is called Run-Length Encoding and is especially useful when you have files with long sequences of identical data (like the black-and-white binary image in our example).

Of course, it’s not universally efficient. As we’ve seen with the ever-growing look-and-say sequences, if the data doesn’t contain many long runs of repeated values, the file size may even get bigger: this is sometimes known as negative compression.
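To see negative compression happen, here is a small sketch that writes runs in the look-and-say style (count, then symbol) and compares sizes. The helper name `rle_text` is hypothetical. As with the digits in the article, this text form only works cleanly while every run is at most 9 long.

```python
from itertools import groupby


def rle_text(data: str) -> str:
    """Write each run as <count><symbol>, look-and-say style."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(data))


for data in ["0000111111001111", "0101010101010101"]:
    encoded = rle_text(data)
    print(f"{data} -> {encoded} ({len(data)} -> {len(encoded)} characters)")
```

The first image halves in size. The second has no repeated neighbours at all, so every single pixel becomes a run of length 1 and the "compressed" version is twice as long.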

This is yet another example of how mathematics and computer science can go hand-in-hand to solve real problems! Keep looking out for interesting patterns – you never know what you might discover!

Daniel Gill, Queen Mary University of London

Subscribe to be notified whenever we publish a new post to the CS4FN blog.

How did the zebra get its stripes?

Head of a fish with a distinctive stripy, spotty pattern — Image by geraldrose from Pixabay

There are many myths and stories about how different animals gained their distinctive patterns. In 1901, Rudyard Kipling wrote a “Just So Story” about how the leopard got its spots, for example. The myths are older than that though, such as a story told by the San people of Namibia (and others) of how the zebra got its stripes – during a fight with a baboon as a result of staggering through the baboon’s fire. These are just stories. It took a legendary computer scientist and mathematician, who was also interested in biology and chemistry, to work out how it actually happens.

Alan Turing is one of the most important figures in Computer Science, having made monumental contributions to the subject, including what is now called the Turing Machine (giving a model of what a computer might be before they existed) and the Turing Test (kick-starting the field of Artificial Intelligence). Towards the end of his life, in the 1950s, he also made a major contribution to Biology. He came up with a mechanism that he believed could explain the stripy and spotty patterns of animals, and he has largely been proved right. As a result, those patterns are now called Turing Patterns, and his idea is now the inspiration for a whole area of mathematical biology.

How animals come to have different patterns has long been a mystery, yet all sorts of animals, from fish to butterflies, have them. How do different zebra cells “know” they ultimately need to develop into either black ones or white ones, in a consistent way, so that stripes (not spots or no pattern at all) result, whereas leopard cells “know” they must grow into a creature with spots? They both start from similar groups of uniform cells without stripes or spots. How do some that end up in one place “know” to turn black and others ending up in another place “know” to turn white in such a consistent way?

There must be some physical process going on that makes it happen so that, as cells multiply, the right ones grow or release pigments in the right places to give the right pattern for that animal. If there were no such process, animals would either have uniform colours or totally random patterns.

Mathematicians have always been interested in patterns. It is what maths is actually all about. And Alan Turing was a mathematician. However, he was a mathematician interested in computation, and he realised the stripy, spotty problem could be thought of as a computational kind of problem. Now we use computers to simulate all sorts of real phenomena, from the weather to how the universe formed, and in doing so we are thinking in the same kind of way. We are turning a real, physical process into a virtual, computational one underpinned by maths. If the simulation gets it right, then this gives evidence that our understanding of the process is accurate. This way of thinking has given us a whole new way to do science, as well as a new way of thinking more generally (so a new kind of philosophy), and it starts with Alan Turing.

Back to stripes and spots. Turing realised it might all be explained by Chemistry and the processes that resulted from it. Thinking computationally he saw that you would get different patterns from the way chemicals react as they spread out (diffuse). He then worked out the mathematical equations that described those processes and suggested how computers could be used to explore the ideas.

Diffusion is just a way by which chemicals spread out. Imagine dropping some black ink onto some blotting paper. It starts as a drop in the middle, but gradually the black spreads out in an increasing circle until there is not enough to spread further, and the expanding circle stops. Now, suppose that instead of just ink we have a chemical (let’s call it BLACK, after its colour) that also creates more of itself as it spreads. This time, BLACK will gradually spread out evenly everywhere. So far, so expected. You would not expect spots or stripes to appear!
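The blotting-paper story can be played with in a few lines of Python. This is a minimal sketch under assumptions of our own. The paper is a single line of cells. At each time step, every cell swaps a little BLACK with its neighbours, using a standard discrete diffusion update. "Creates more of itself" is modelled as logistic growth that levels off at 1 in each cell. The name `spread` and all parameter values are hypothetical.

```python
def spread(n: int = 51, steps: int = 400, dt: float = 0.1,
           diffusion: float = 1.0, growth: float = 0.0) -> list[float]:
    """Amount of BLACK in each cell of a line after `steps` time steps.

    growth = 0: plain diffusion, like the ink drop.
    growth > 0: BLACK also makes more of itself; modelled here (an assumption)
    as logistic growth that levels off at 1 in each cell.
    """
    black = [0.0] * n
    black[n // 2] = 1.0  # the initial drop in the middle
    for _ in range(steps):
        new = []
        for i in range(n):
            # Past either end, pretend the neighbour matches the end cell,
            # so no BLACK leaks off the edge of the paper.
            left = black[i - 1] if i > 0 else black[i]
            right = black[i + 1] if i < n - 1 else black[i]
            flow_in = diffusion * (left + right - 2 * black[i])
            made = growth * black[i] * (1 - black[i])
            new.append(black[i] + dt * (flow_in + made))
        black = new
    return black


ink = spread(growth=0.0)
print(f"ink:   total {sum(ink):.3f}, darkest cell {max(ink):.3f}")
black = spread(growth=1.0)
print(f"BLACK: palest cell {min(black):.3f}")
```

With no growth, the total amount of ink stays the same while it thins out across the paper. With growth switched on, every cell ends up almost fully BLACK: a uniform result, with no pattern.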

Next, however, let’s consider what Turing thought about. What happens if that chemical BLACK produces another chemical, WHITE, as well as more BLACK? Now, starting with a drop of BLACK, as it spreads out it creates both more BLACK, to spread further, and WHITE. Gradually they both spread. If the chemicals don’t interact, then you would end up with BLACK and WHITE mixed everywhere in a uniform way, leading to a uniform greyness. Again, no spots or stripes. How patterns appear still seems to be a mystery.

However, suppose instead that the presence of the WHITE chemical actually stops BLACK creating more of itself in that region. Anywhere WHITE becomes concentrated stays WHITE. If WHITE spreads (i.e. diffuses) faster than BLACK, then it reaches places first, and they become WHITE with BLACK suppressed there. However, no new BLACK there means no more new WHITE to spread further. Where there is already BLACK, however, it continues to create more BLACK, leading to areas that become solid BLACK. Over time these BLACK areas spread around and beyond the WHITE areas that stopped spreading, and they also create new WHITE that again spreads faster. The result is a pattern. What kind of pattern depends on the speed of the chemical reactions and how quickly each chemical diffuses. Where those are the same, because the same chemicals are involved, the same kind of pattern will result: zebras will end up with stripes and leopards with spots.
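One common way to write this activator-inhibitor idea as equations is a simplified Gierer-Meinhardt model. That model came from later researchers and is not Turing’s own set of equations, so treat the sketch below as an illustration of the mechanism, not a model of a real animal. BLACK grows as BLACK²/WHITE, so it makes more of itself but is held back wherever WHITE is high. WHITE is produced as BLACK², decays, and diffuses 40 times faster than BLACK. The cells form a ring, and the parameter values are our own choice, picked so that a uniform mix is unstable.

```python
import random


def turing_pattern(n=60, steps=6000, dt=0.01, d_black=1.0, d_white=40.0,
                   white_decay=2.0, seed=0):
    """Simplified 1D activator-inhibitor sketch (Gierer-Meinhardt style).

    BLACK makes more of itself (black**2 term) but is suppressed by WHITE
    (division by white). BLACK also makes WHITE, which diffuses faster.
    """
    rng = random.Random(seed)
    steady = white_decay  # the uniform balance point: black = white = white_decay
    black = [steady + 0.1 * (rng.random() - 0.5) for _ in range(n)]  # tiny noise
    white = [steady] * n
    for _ in range(steps):
        new_black, new_white = [], []
        for i in range(n):
            left, right = i - 1, (i + 1) % n  # index -1 wraps round the ring
            lap_b = black[left] + black[right] - 2 * black[i]
            lap_w = white[left] + white[right] - 2 * white[i]
            make_black = black[i] ** 2 / white[i] - black[i]  # self-made, held back by WHITE
            make_white = black[i] ** 2 - white_decay * white[i]  # made by BLACK, decays
            new_black.append(black[i] + dt * (d_black * lap_b + make_black))
            new_white.append(white[i] + dt * (d_white * lap_w + make_white))
        black, white = new_black, new_white
    return black


profile = turing_pattern()
average = sum(profile) / len(profile)
print("".join("#" if b > average else "." for b in profile))  # BLACK-rich cells as #
```

Starting from an almost uniform mix, the printed row should break up into repeating blocks of `#` and `.`, which are one-dimensional stripes. Try setting `d_white` equal to `d_black`. The pattern should then fail to appear, which echoes the point that WHITE must spread faster.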

This is now called a Turing pattern and the process is called a reaction-diffusion system. It gives a way that patterns can emerge from uniformity. It applies not just to chemicals spreading but also to cells multiplying and creating different proteins. Detailed studies have shown it is the mechanism in play in a variety of animals that leads to their patterns. It also, as Alan Turing suggested, provides a basis to explain the way the different shapes of animals develop despite starting from identical cells. This is called morphogenesis. Reaction-diffusion systems have also been suggested as the mechanism behind other things in the natural world, such as how fingerprints develop. Despite being ignored for decades, Turing’s theory now provides a foundation for the idea of mathematical biology. It has spawned a whole new discipline within biology, showing how maths and computation can support our understanding of the natural world. Not something that the writers of all those myths and stories ever managed.

– Paul Curzon, Queen Mary University of London

This blog is funded by EPSRC on research agreement EP/W033615/1.

Lego computer science: What is computation? (simple cellular automata)

Continuing a series of blogs on what to do with all that lego scattered over the floor: learn some computer science…what is computation? Using binary.

We’ve been focussing on representing data so far, but data on its own doesn’t do a lot. It is when you combine it with computation that things get exciting and suddenly you have something that can change the world. But what is computation? We will start to explore computation using something called cellular automata. They are just one simple way (of many) to do computation.

We have seen that a data representation is just a way of storing information using symbols. It just gives meaning to otherwise arbitrary symbols. Those 1s and 0s, red blocks and blue blocks, Xs, Vs and Is could mean anything. Indeed, at different times they mean different things: sometimes a particular group of 1s and 0s stands for a number, sometimes the colour of a pixel, sometimes a letter. So symbols become interesting when we give them meanings (and that is an important point to remember).

Computation is also about symbols, but about manipulating them using sets of rules. What do the rules do? Given one or more symbols they tell you to swap those symbols for new symbols. To do computation you just repeatedly apply a given set of rules, starting with some starting symbols and the symbols change and then change again and then change again …

Elementary Cellular Automata

Cellular automata are just a particular kind of rule set that applies to grids of symbols (called cells). They were invented by one of the great original computer scientists, John von Neumann, along with Stanislaw Ulam, in the 1940s.

Elementary cellular automata, which we will look at here, are a simple version where you just have a row of cells (so a row of symbols). There are only two symbols allowed, usually 1 and 0. We will of course use lego blocks as our two symbols instead: a red brick for 1 and a blue brick for 0. A particular row of red and blue bricks is called the state. The rules change the colour of the bricks in the row and so change the state of the cellular automaton. Here is an example state of such a ‘machine’ where the rows are 16 bricks (symbols) long (essentially, the memory of the machine will be 16 bits long):

An initial state in lego bricks: a pattern of red and blue bricks. — An example cellular automaton state consisting of 16 symbols. Traditionally cellular automata have symbols 0 and 1. We use a red block to mean a 1 and a blue block to mean a 0. Image by CS4FN

Rules

One rule RED-BLUE-BLUE -> RED — A rule that says if we have RED-BLUE-BLUE then change the middle cell to a RED block.
Image by CS4FN

Now, if we are going to do computation, we need rules (essentially a program) that change the state. The rules of an elementary cellular automaton like this are applied to each lego brick, changing it to a new lego brick. To do so, though, they take the brick on either side into account. Each rule therefore looks at three bricks at a time and changes (or not) the middle brick.

We can write out the rules using lego bricks too – saying what to do for each pattern of three lego blocks. So we could have a rule that if we have a triple RED-BLUE-BLUE then we change the middle of that triple to RED so that the triple becomes RED-RED-BLUE instead. In lego we could represent this rule as shown right, where we show the new value for the middle cell that changes. (Notice we are now using lego bricks, so symbols, to represent rules: a rule is just data too!)

Now a vital thing about rules for computation is that you MUST give a rule for every possibility. Our above rule only tells us what to do for one of the eight possibilities of those triples of bricks that might occur in each position. We must give 8 different rules, so that whatever pattern we come across we have a rule that says what to do.

Here is one possible set of 8 rules we could use:

A set of rules to define how a cellular automaton will behave.
Image by CS4FN

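Altogether, there are 256 different possible sets of rules like this: each of the 8 rules can turn the middle brick either red or blue, giving 2×2×2×2×2×2×2×2 = 256 combinations.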

Notice that we have ordered our rules using a binary pattern of the triples, counting from 0 to 7, as a way to make sure we have covered every possible pattern exactly once and to make it easier to find the right rule. We could write them in any order, of course. It would make no difference to what the rules do.
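Counting the triples 0 to 7 in binary also gives a neat way to number all 256 rule sets. In the sketch below, bit k of a rule number gives the new middle brick for triple k, with 1 meaning red and 0 meaning blue. That numbering is the convention popularised by Stephen Wolfram, who is mentioned later in this post. It is our assumption here, since the post itself does not number its rule sets, and `rule_table` is a hypothetical helper name.

```python
from itertools import product


def rule_table(rule_number: int) -> dict[tuple[int, int, int], int]:
    """Rule set as a lookup: (left, middle, right) -> new middle (1 = red, 0 = blue).

    Wolfram-style numbering (an assumption here): bit k of rule_number is the
    output for the triple that spells k in binary, for k = 0..7.
    """
    table = {}
    for k in range(8):  # the triples counted 0..7 in binary
        triple = ((k >> 2) & 1, (k >> 1) & 1, k & 1)
        table[triple] = (rule_number >> k) & 1
    return table


# One rule set = one choice of red or blue output for each of the 8 triples.
all_rule_sets = list(product([0, 1], repeat=8))
print(len(all_rule_sets))          # 256
print(rule_table(90)[(1, 0, 0)])   # 1: RED-BLUE-BLUE -> RED, like the example rule
```

Rule number 90 is only an example here. It happens to include the RED-BLUE-BLUE -> RED rule shown earlier.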

Doing Computation

Now, to do computation, we just apply the rules we have chosen to every position in an initial state – an initial pattern of red and blue blocks. We start at one end of the row and, at each position, find the rule that matches the triple of blocks there, note the new middle block it gives, then move on to the next position. Once we get to the end of the row, we know the whole new state of the automaton: we have done one step of computation. For the cell at either end of the row, assume its neighbour off the end is 0 (so blue for us). At every position the rule applies to the original triple of bricks in the current state, not to ones changed by rules applied at other positions: the rules are applied to every position at the same time.
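The procedure above can be sketched in a few lines of Python. This is a minimal sketch that assumes 1 stands for a red brick and 0 for a blue brick, and that the rule set is given by a Wolfram-style rule number, where bit k is the output for triple k. The name `step` and the example rule number 90 are our own choices. Padding the row with a 0 at each end implements "off the end counts as blue". Reading only from the old `state` and building a fresh list means every position uses the original triples, all at the same time.

```python
def step(state: list[int], rule_number: int) -> list[int]:
    """One step of computation for an elementary cellular automaton.

    Every cell is updated from the ORIGINAL state (all at the same time);
    cells off either end of the row count as 0 (blue).
    """
    padded = [0] + state + [0]
    new_state = []
    for i in range(1, len(padded) - 1):
        k = 4 * padded[i - 1] + 2 * padded[i] + padded[i + 1]  # triple as 0..7
        new_state.append((rule_number >> k) & 1)  # look up that triple's rule
    return new_state


print(step([0, 0, 0, 0, 1], 90))  # [0, 0, 0, 1, 0]
```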

The easiest way to do this with lego is to line up a row of red and blue lego blocks as the initial state and apply our rules as above to get a new pattern of red and blue lego blocks, placed below it. That new pattern is the new state of the machine. Here is one step applied to the random state we gave above.

Applying the rules to a random starting state — Image by CS4FN

Calculating Number Sequences

We seem to be just replacing patterns by new patterns. Are we doing anything useful? Of course, we could give these patterns some simple meaning. Interpret the pattern as a binary number and what is happening? We are generating a number sequence. To see this, use the above rules on a shorter pattern, starting with a single red lego block at the right-hand end, with the rest blue. This is the binary for the number 1 (00001). Apply the rules and we get the number 2 (00010). Apply the rules again and the pattern of lego turns into the binary for 5 (00101), then 8 (01000) and then 20 (10100) and so on…
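The rule set pictured above isn't reproduced in this text, so the sketch below assumes rule 90 in Wolfram's numbering, where the new brick is red exactly when one, but not both, of its neighbours is red. That choice is our assumption. It reproduces the sequence 1, 2, 5, 8, 20, but so does rule 18, so the pictured rules could be different. `number_sequence` is a hypothetical helper that reads each state as a binary number, with the leftmost brick as the most significant bit.

```python
def step(state: list[int], rule_number: int) -> list[int]:
    """One synchronous step; cells off either end count as 0 (blue)."""
    padded = [0] + state + [0]
    return [(rule_number >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
            for i in range(1, len(padded) - 1)]


def number_sequence(start: int, width: int, rule_number: int, n: int) -> list[int]:
    """The first n states, each read as a binary number."""
    state = [int(bit) for bit in format(start, f"0{width}b")]  # e.g. 1 -> 00001
    numbers = []
    for _ in range(n):
        numbers.append(int("".join(str(bit) for bit in state), 2))
        state = step(state, rule_number)
    return numbers


print(number_sequence(1, 5, 90, 5))  # [1, 2, 5, 8, 20]
```

Changing the rule number, the starting number or the width gives a different number-crunching machine.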

The series of transformations through binary patterns from applying the rules. — Image by CS4FN

We have created a machine that does a calculation on a number to create a new number. Let it run and it calculates the whole number sequence. Different rules will compute different number sequences: some perhaps more interesting than others.

Images from numbers

If you think numbers are a bit boring, then instead just give a different meaning to the patterns – as giving the colour of pixels, with each new state giving the next row of an infinite lego pixel picture. Now our rules are generating art. Each rule set will compute a different image, as will different starting states (again, some images generated will be more interesting than others). Here is what our above rules generate if we start with a single red brick in the centre:

The top of a Sierpiński triangle as generated in lego bricks from our rules. — The image generated by our rules if we see them as rules to generate the next line of a lego art image.
Image by CS4FN

Sierpinski triangle image CC BY-SA 3.0 via Wikimedia

This is actually a fractal pattern called the Sierpiński triangle. It contains the same triangular pattern over and over again, and if you create a massive version of it on a large lego board you will see that each triangle has the same pattern within it. It is a beautiful recursive pattern.
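Before reaching for the lego, you could preview the picture as text. This sketch again assumes rule 90 stands in for the pictured rules, which is our assumption. It prints `#` for a red brick and `.` for a blue one, starting from a single red brick in the centre. `draw` is a hypothetical helper, and the 31-brick width is just wide enough that 16 rows never reach the edges.

```python
def step(state: list[int], rule_number: int) -> list[int]:
    """One synchronous step; cells off either end count as 0 (blue)."""
    padded = [0] + state + [0]
    return [(rule_number >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
            for i in range(1, len(padded) - 1)]


def draw(rule_number: int = 90, width: int = 31, rows: int = 16) -> list[str]:
    """Each state becomes one row of a text 'lego picture'."""
    state = [0] * width
    state[width // 2] = 1  # a single red brick in the centre
    picture = []
    for _ in range(rows):
        picture.append("".join("#" if cell else "." for cell in state))
        state = step(state, rule_number)
    return picture


for row in draw():
    print(row)
```

Doubling `rows` to 32, and making `width` at least 63, shows the same triangle repeated at a larger scale, which is the recursive structure described above.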

Apply the rules and create a Lego pixel version yourself.

Explore the different rules

Stephen Wolfram has exhaustively explored all the elementary cellular automata, categorising them and describing their properties. However, that is no reason not to explore them yourself, whether with lego, on graph paper or by writing a program to apply the rules for you.

Of course you do not have to stick to only automata with 2 symbols. Add more symbols / colours of lego blocks (so you will need lots more rules in each set) and explore some more.
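To get a feel for "lots more rules", here is a small counting sketch. It assumes that each new brick still depends on three bricks, as in the elementary case. With c colours there are c³ different triples, so that many rules per set. Each rule can output any of the c colours, so there are c^(c³) different rule sets. `rule_set_counts` is a hypothetical helper name.

```python
def rule_set_counts(colours: int, neighbourhood: int = 3) -> tuple[int, int]:
    """(rules needed in each set, number of different rule sets)."""
    rules_per_set = colours ** neighbourhood  # one rule per possible pattern
    rule_sets = colours ** rules_per_set      # any colour out for each rule
    return rules_per_set, rule_sets


print(rule_set_counts(2))  # (8, 256)
print(rule_set_counts(3))  # (27, 7625597484987)
```

Adding just one more colour takes you from 256 rule sets to over seven trillion, far too many to explore one by one.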

There is one cellular automaton, so one rule set (with only two symbols), that is very intriguing. It turns out that, rather than just generating a particular number sequence or pattern as the one above does, it can do absolutely any computation – it is a general-purpose machine that can do anything that a modern computer can do…but that is another story.

Category: Cellular Automata