The cure that just folds away: understanding protein folding to tackle diseases, and how computers (and people) can help ^JB

by Paul Curzon, Queen Mary University of London.
This article was originally published on CS4FN.

Biologists want you to play games in the name of science. A group of researchers at the University of Washington have invented a computer game, Foldit, in which you have to pack what looks like a 3D nest of noodles and elastics into the smallest possible space. You drag, turn and squeeze the noodles until they’re packed in tight. You compete against others, and as you get better you can rise through the ranks of competitors around the world. How can that help science? It’s because the big 3D jumbles represent models of proteins, and figuring out how proteins fold themselves up is one of the biggest problems in biology. Knowing more about how they do it could help researchers design cures for some of the world’s deadliest diseases.

The perfect fit

Proteins are in every cell in your body. They help you digest your food, send signals through your brain, and fight infection. They’re made of small molecules called amino acids. It’s easy for scientists to figure out what amino acids go together to make up a protein, but it’s incredibly difficult to figure out the shape they make when they do it. That’s a shame, because the shape of a protein is what makes it able to do its job. Proteins act by binding on to other molecules – for example, a protein called haemoglobin carries oxygen around our blood. The shape of the haemoglobin molecule has to fit the shape of the oxygen molecule like a lock and key. The close tie between form and function means that if you could figure out the shape that a particular protein folds into, you would know a lot about the jobs it can do.

Completely complex

Tantrix rotation puzzle

Protein folding is part of a group of problems that are an old nemesis of computer scientists. It’s what’s known as an NP-complete problem. That’s a mathematical term that means it appears there’s no shortcut to calculating the answer to a problem. You just have to try every different possible answer before you arrive at the right one. There are other problems like this, like the Tantrix rotation puzzle. Because a computer would have to check through every possible answer, the more complex the problem is the longer it will take. Protein folding is particularly complex – an average-sized protein contains about 100 amino acids, which means it would take a computer a billion billion billion years to figure out. So a shortcut would be nice then.

Puzzling out a cure

Obviously the proteins themselves have found a shortcut. They fold up all the time without having to have computers figure it out for them. In order to get to the bottom of how they do it, though, scientists are hoping that human beings might provide a shortcut. Humans love puzzles, and we’re awfully good at visual ones. Our good visual sense means we see patterns everywhere, and we can easily develop a ‘feel’ for how to use those patterns to solve problems. We use that sense when we play games like chess or Go. The scientists behind Foldit reckon that if it turns out that humans really are more efficient at solving protein folding problems, we can teach some of our tricks to computers.

HIV-1 proteasean illustration showing the folded shape of a protein used by HIV, created by ‘Boghog’ in 2008, via Wikipedia.

If there were an efficient way to work out protein structure, it could be a huge boon to medicine. Diseases depend on proteins too, and lots of drugs work by targeting the business end of those proteins. HIV uses two proteins to infect people and replicate itself, so drugs disrupt the workings of those proteins. Cancer, on the other hand, damages helpful proteins. If scientists understood how proteins fold, they could design new proteins to counteract the effects of disease. So getting to the top of the tables in Foldit could hold even more glory for you than you bargained for – if your protein folding efforts help cure a dreaded disease, hey, maybe it’s the Nobel Prize you’ll end up winning.

 

Further reading

The coloured diagram of the enzyme above is a 3D representation to help people see how the protein folds. These are called ribbon diagrams and were invented by Jane S Richardson, find out more here.

Lego computer science: representing numbers

Continuing a series of blogs on what to do with all that lego scattered over the floor: learn some computer science…what does number representation mean?

We’ve seen some different ways to represent images and how ultimately they can be represented as numbers but how about numbers themselves. We talk as though computers can store numbers as numbers but even they are represented in terms of simpler things in computers.

Lego numbers

But first what do we mean by a number and a representation of a number? If I told you to make the numbers 0 to 9 in lego (go on have a go) you may well make something like this…

But those symbols 0, 1, 2, … are just that. They are symbols representing numbers not the numbers themselves. They are arbitrary choices. Different cultures past and present use different symbols to mean the same thing. For example, the ancient Egyptian way of writing the number 1000 was a hieroglyph of a water lily. (Perhaps you can make that in lego!)

The ancient Egyptian way to write 1000 was a hieroglyph of a waterlily

What really are numbers? What is the symbol 2 standing for? It represents the abstract idea of twoness ie any collection, group or pile of two things: 2 pieces of lego, 2 ducks, 2 sprouts, … and what is twoness? … it is oneness with one more thing added to the pile. So if you want to get closer to the actual numbers then a closer representation using lego might be a single brick, two bricks, three bricks, … put together in any way you like.

Numbers represented by that number of lego bricks

Another way would to use different sizes of bricks for them. Use a lego brick with a single stud for 1, a 2-stud brick for two and so on (combining bricks where you don’t have a single piece with the right number of studs). In these versions 0 is the absence of anything just like the real zero.

Lego bricks representing numbers based on the number of studs showing.

Once we do it in bricks it is just another representation though – a symbol of the actual thing. You can actually use any symbols as long as you decide the meaning in advance, there doesn’t actually have to be any element of twoness in the symbol for two. What other ways can you think of representing numbers 0 to 9 in lego? Make them…

A more abstract set of symbols would be to use different coloured bricks – red for 1, blue for 2 and so on. Now 0 can have a direct symbol like a black brick. Now as long as it is the right colour any brick would do. Any sized red brick can still mean 1 (if we want it to). Notice we are now doing the opposite of what we did with images. Instead of representing a colour with a number, we are representing a number with a colour.

Numbers represented as colours.

Here is a different representation. A one stud brick means 1, a 2-stud brick means 2, a square 4 stud brick means 3, a rectangular 6 stud brick means 4 and so on. As long as we agreed that is what they mean it is fine. Whatever representation we choose it is just a convention that we have to then be consistent about and agree with others.

Numbers represented by increasing sized blocks

What has this to do with computing? Well if we are going to write algorithms to work with numbers, we need a way to store and so represent numbers. More fundamentally though, computation (and so at its core computer science) really is all about symbol manipulation. That is what computational devices (like computers) do. They just manipulate symbols using algorithms. We will see this more clearly when we get to creating a simple computer (a Turing Machine) out of lego (but that is for later).

We interpret the symbols in the inputs of computers and the symbols in the outputs with meanings and as a result they tell us things we wanted to know. So if we key the symbols 12+13= into a calculator or computer and it gives us back 25, what has happened is just that it has followed some rules (an algorithm for addition) that manipulated those input symbols and made it spew out the output symbols. It has no idea what they mean as it is just blindly following its rules about how to manipulate symbols. We also could have used absolutely any symbols for the numbers and operators as long as they were the ones the computer was programmed to manipulate. We are the ones that add the intelligence and give those symbols meanings of numbers and addition and the result of doing an addition.

This is why representations are important – we need to choose a representation for things that makes the symbol manipulation we intend to do easy. We already saw this with images. If we want to send a large image to someone else then a representation of images like run-length encoding that shrinks the amount of data is a good idea.

When designing computers we need to provide them with a representation of numbers so they can manipulate those numbers. We have seen that there are lots of representations we could choose for numbers and any in theory would do, but when we choose a representation of numbers for use to do computation, we want to pick one that makes the operations we are interested in doing easy. Charles Babbage for example chose to use cog-like wheels turned to particular positions to represent numbers as he had worked out how to create a mechanism to do calculation with them. But that is something for another time…


This post was funded by UKRI, through grant EP/K040251/2 held by Professor Ursula Martin, and forms part of a broader project on the development and impact of computing.


Lego Computer Science

Part 1: Lego Computer Science: pixel pictures

Part 2: Lego Computer Science: compression algorithms

Part 3: Lego Computer Science: representing numbers

Why would you accept inefficiency?

by Paul Curzon, Queen Mary University of London

British Airways Plane x 3

In May 2017, British Airways IT system had a meltdown. Someone mistakenly disconnected the power for a short time. The fleet was grounded and tens of thousands of passengers were left stranded for days. One suggestion was it was due to “cost cutting”. Willie Walsh, the Head of BAs parent group came out fighting, defending the idea of doing things cheaply: “You talk about it as cost-cutting, I talk about it as efficiency … The idea that you would accept inefficiency – I just don’t get it.”

The fact that many business leaders don’t get it may be exactly the problem. Doing things more cheaply than the competition is an idea that is at the core of capitalism. It is often taken as a given. But, is it really always true?

The best and only the best

Computer Scientists actually use the word “efficiency” in a subtly different way. When they talk about a program or algorithm being efficient, they do not mean that it was cheap. They mean it did exactly the same job, but faster or with less memory. This is one of the really creative areas of computer science. Can you come up with an algorithm that does exactly the same thing but in fewer steps?

The business version of efficiency would be fine if it had the same underlying principle. Do it cheaper, yes, but only if it really does do exactly the same thing in all circumstances. To company bosses, however, the trade-off can be seen as cut costs at all costs. ‘Waste’ is anything you think no one will notice. You accept the 1 in a million chance of it not working at all – just as with the BA meltdown, taking the hit (or rather your passengers do) because you think you will make more money overall as a result.

Even with algorithms we do accept inefficiency though. Engineering is often about trade-offs. Sometimes, you will accept inefficiency in the use of memory because it gives a way to get a faster algorithm. Sometimes you accept a slower algorithm because it is just easier to be certain your code really does do the right thing. Sometimes slow is good enough. Sometimes it is the bigger picture that matters. The fastest algorithms for searching for information require sorted data. That is why a dictionary is in alphabetical order. Finding the word you want is quick – you don’t have to check every word in turn to find the one you want. However, if you were only ever going to look for a single thing in a data source, you wouldn’t sort it first. You would use an inefficient search algorithm, because overall that would be faster than sorting and then searching once. Efficiency can be subtle.

Inefficiently safe

There are actually even more powerful reasons for demanding inefficiency. In the area of safety-critical systems, computer scientists build in redundancy on purpose. When the consequences of the computer not working is that lives are lost, we definitely want inefficiency, as long as it is well-engineered inefficiency. Dependability and safety matter more.

An algorithm is a mathematical object. If it works, it always works. However, programs operate in the real world where things can go wrong. Hardware fails, clocks drift, criminals hack, technicians do silly things by accident (like unplug the power). Systems that matter have to be resilient. They have to cope with the unexpected, with the never before seen. One way that is achieved is by designing in inefficiency. For example, if your single computer goes down, you are stuffed. If instead two computers run the same program in parallel, then if one goes down the other can take over. Ahh, but how do you know which is wrong when they disagree? Be even more ‘inefficient’ and have three computers ‘wastefully’ doing the same thing. Then, if one goes rogue, the three vote on who is at fault … cut them out and carry on.

Computer Scientists have developed many ingenious ways to build in guarantees of safety even when the world around conspires against us. To a cost cutter these extras may seem like inefficiency but the inefficiency is there, apparently unused most of the time, waiting to step in and avert disaster, waiting to save lives. Personally, I would accept inefficiency. I hope, for the sake of saved lives, society would too.

More on …