Double or nothing: an extra copy of your software, just in case

by Paul Curzon, Queen Mary University of London

Ariane 5 on the launch pad. Photo Credit: (NASA/Chris Gunn) Public Domain via Wikimedia Commons.

If you spent billions of dollars on a gadget, you’d probably like it to last more than a minute before it blew up. A European Space Agency rocket didn’t even manage that. How do you make sure the worst doesn’t happen to you? How do you make machines reliable?

A powerful way to improve reliability is to use redundancy: double things up. A plane with four engines can keep flying if one fails. Worried about a flat tyre? You carry a spare in the boot. These situations are about making physical parts reliable. Most machines are a combination of hardware and software though. What about software redundancy?

You can have spare copies of software too. Rather than a single version of a program, you can have several copies running on different machines. If one program goes wrong, another can take over. It would be nice if it were that simple, but software is different to hardware. Two identical programs given the same inputs will fail in the same way at the same time: they are following the same instructions, so if one goes wrong the other will too.

That was vividly shown by the maiden flight of the Ariane 5 rocket in 1996. Less than 40 seconds after launch, things went wrong. The problem was to do with a big number that needed 64 bits of storage space to hold it. The program’s instructions moved it to a storage place with only 16 bits. The number was too big to fit, and the conversion failed. That error shut down the rocket’s guidance software, and the main flight computer then treated the resulting error data as if it were real flight data, so its calculations went wrong. The rocket veered off course and exploded. The guidance program was duplicated, but both copies were identical, so the backup failed in exactly the same way. A rocket that had taken seven billion dollars to develop went up in smoke.
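Why identical copies don’t help can be sketched in a few lines of Python. This is a simplified illustration, not the real Ariane 5 software (which was written in Ada): the function names and the value 40000.0 are made up. In the real flight the 64-bit to 16-bit conversion triggered an error instead of quietly producing a number, so the sketch raises an error too. Because the primary and the backup run the same instructions on the same input, they fail together.

```python
# A 16-bit signed integer can only hold whole numbers in this range.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Convert a 64-bit float to a 16-bit signed integer, or raise an error if it won't fit."""
    n = int(value)  # drop any fractional part
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} is too big for 16 bits")
    return n


def guidance_copy(reading: float) -> int:
    """Hypothetical guidance step: every copy runs exactly these instructions."""
    return to_int16(reading)


def run_copies(reading: float) -> dict:
    """Run a primary and a backup copy on the same input and record what happens."""
    results = {}
    for name in ("primary", "backup"):
        try:
            results[name] = guidance_copy(reading)
        except OverflowError:
            results[name] = "failed"
    return results


print(run_copies(1234.5))   # both copies give 1234
print(run_copies(40000.0))  # both copies fail, on the same input, in the same way
```

Adding more identical copies would not change the second result. Every copy would fail on that input.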

Can you get round this? One solution is to get different teams to write programs that do the same thing. The separate teams may make mistakes, but surely they won’t all get the same thing wrong! Run the programs on different machines and let them vote on what to do. Then, as long as more than half agree on the right answer, the system as a whole will do the right thing. That’s the theory, anyway. Unfortunately, in practice it doesn’t always work. Nancy Leveson, an expert in software safety from MIT, ran an experiment in which different programmers were each given the same program to write. She found that their separately written programs sometimes made the same mistakes and gave the same wrong answers. So even if Ariane 5 had used independently written redundant code, it might still have exploded.
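Here is a minimal Python sketch of the voting idea, which is often called N-version programming. The three “team” functions are made up for illustration. Each one finds the largest of some sensor readings. Two of them are written differently but share the same hidden assumption that readings are never negative. When every reading is negative, the two faulty versions agree with each other and outvote the correct one. That is exactly the kind of shared mistake that defeats voting.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: list):
    """Return the answer given by more than half of the versions, or None if there is no majority."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


# Three hypothetical "teams" each write code to find the largest sensor reading.
def team_a(readings: List[float]) -> float:
    return max(readings)


def team_b(readings: List[float]) -> float:
    best = 0  # hidden assumption: readings are never negative
    for r in readings:
        if r > best:
            best = r
    return best


def team_c(readings: List[float]) -> float:
    # Different code, but the same hidden assumption: nothing is below zero.
    return sorted(readings + [0])[-1]


def voted_answer(readings: List[float]) -> Optional[float]:
    """Run every team's version and let them vote."""
    answers = [team(readings) for team in (team_a, team_b, team_c)]
    return majority_vote(answers)


print(voted_answer([3, 8, 1]))     # → 8: all three teams agree
print(voted_answer([-5, -2, -9]))  # → 0: two teams share a mistake and outvote the correct one
```

The vote only protects you when the mistakes are independent. When two versions share a blind spot, the vote makes the wrong answer look trustworthy.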

Redundancy is a big help, but it can’t guarantee that software works correctly. When designing systems to be highly reliable, you have to assume things will still go wrong. You must still have ways to check for problems and to deal with them, so that a mistake (whether by human or machine) won’t turn into a disaster.
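One way to check for a problem and deal with it is to test a value before converting it, so that an oversized number cannot crash the program. The Python sketch below is a made-up example and is not how Ariane 5 was fixed. The name `safe_to_int16` is hypothetical, and clamping to the nearest allowed value is only one possible response. A real system might switch to a backup calculation or a safe mode instead. What matters is that the problem is detected and reported rather than ignored.

```python
import math
from typing import Tuple

# A 16-bit signed integer can only hold whole numbers in this range.
INT16_MIN, INT16_MAX = -32768, 32767


def safe_to_int16(value: float) -> Tuple[int, bool]:
    """Convert a float to a 16-bit signed integer without crashing.

    Returns (number, problem). `problem` is True when the value did not fit,
    so the rest of the system can notice and react instead of carrying on blindly.
    """
    if math.isnan(value):
        return 0, True  # not a number at all: flag it and use a neutral value
    if value >= INT16_MAX + 1:
        return INT16_MAX, True  # too big: clamp to the largest value and flag it
    if value <= INT16_MIN - 1:
        return INT16_MIN, True  # too small: clamp to the smallest value and flag it
    return int(value), False  # fits: drop any fractional part


number, problem = safe_to_int16(40000.0)
if problem:
    print("Warning: reading out of range, using", number)
```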




This page is funded by EPSRC on research agreement EP/W033615/1.


CS4FN Advent 2023 – Day 1: Woolly jumpers, knitting and coding

Welcome to the first ‘window’ of the CS4FN Christmas Computing Advent Calendar. The picture on the ‘box’ was a woolly jumper with a message in binary: three letters on the body of the jumper and a fourth split across the arms. Can you work out what it says? (Answer at the end.)

Come back tomorrow for the next instalment in our Advent series.

Cartoon of a green woolly Christmas jumper with some knitted stars and a message “knitted” in binary (zeroes and ones), plus the wifi symbol on the cuffs. Image drawn and digitised by Jo Brodie.

Wrap up warmly with our first festive CS4FN article, from Karen Shoop, which is all about the links between knitting patterns and computer code. Find out about regular expressions in her article: Knitters and Coders: separated at birth?


Image credit: Regular Expressions by xkcd

Further reading

Dickens Knitting in Code – this CS4FN article, by Paul Curzon, is about Charles Dickens’ book A Tale of Two Cities. One of the characters, Madame Defarge, takes coding to the next level by encoding hidden information into her knitting, something known as steganography (basically hiding information in plain sight). We have some more information on the history of steganography and how it is used in computing in this CS4FN article: Hiding in Elizabethan binary.

In Craft, Culture, and Code, Shuchi Grover also considers the links between coding and knitting, writing that “few non-programming activities have such a close parallels to coding as knitting/crocheting” (see section 4 in particular, which talks about syntax, decomposition, subroutines, debugging and algorithms).

Something to print and colour in

This is a Christmas-themed thing you might enjoy eating, if you’ve any room left of course. Puzzle solution tomorrow. This was designed by Elaine Huen.

Solving the Christmas jumper code

The jumper’s binary reads

01011000

01001101

01000001

01010011

Which four letters are spelled out here? Each binary number stands for one letter, and you can find out which letter by looking at this binary-to-letters translator. Have a go at working out the word using the translator (the answer is at the end of this post).

Keep scrolling

Bit more

The Christmas jumper says… XMAS
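You can also get a computer to do the decoding. This short Python sketch assumes the jumper uses ASCII, the standard character code in which, for example, 65 (01000001 in binary) stands for A. `decode_binary` is just a made-up name for this example. `int(bits, 2)` reads a string of 0s and 1s as a binary number, and `chr` turns that number into its character.

```python
def decode_binary(codes):
    """Turn a list of 8-bit binary strings into text, assuming ASCII character codes."""
    return "".join(chr(int(bits, 2)) for bits in codes)


jumper = ["01011000", "01001101", "01000001", "01010011"]
print(decode_binary(jumper))  # → XMAS
```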


