by Paul Curzon, Queen Mary University of London
Google, one of the most powerful companies in the world, is famous for being founded by Larry Page and Sergey Brin, but a key person, the 20th person employed, was Marissa Mayer: engineer, programmer and believer in detail. Her attention to detail made a gigantic difference to Google’s success. She was involved in most of their successful products, from the search engine to Gmail to AdWords, and if she wasn’t convinced about a new feature or change, then it didn’t happen. When a designer suggested a new shade of blue for the links in ads, for example, she had to be persuaded. But how could she be sure she made the right decisions? She used a centuries-old idea from medicine, first used to help cure scurvy in sailors, and applied it to software design: the randomized controlled trial.
Randomized controlled trials revolutionized medicine. They could revolutionize many other aspects of our lives too, from education to prison reform, if they were used more. Computer scientists realized that, and more trials are now run on software than on medicines. It’s part of the Big Data revolution and is the way to avoid relying on hunches, instead relying on the scientific method to find out what the right answer really is.
But what if …?
The problem with the way we do most things is “what-if”. We make decisions, but never know what would have happened if we had taken the other choice. If things go well we pat ourselves on the back and tell ourselves we were right. But things might have gone even better had we only made the other decision. We will never know. However good or bad the outcome seems, there is no way of actually knowing whether our decision was the right one, if all we do is make it. We then delude ourselves, and so keep doing bad things, over and over. That’s why illness was treated by getting leeches to suck blood for centuries!
Controlled trials overcome this. The big idea boils down to making sure you do both alternatives! Not only do you make the change, you leave things alone too! That sounds impossible, but it’s simple. Split your population (patients, users, prisoners, students, …) into two groups at random. Apply the change to one group, but leave the other group alone. Then at the end of a suitable period, long enough for any difference to show, compare the results. You see not only the result of making the change, but also what would have happened if you hadn’t. Only then, with hard data about the consequences of both possibilities, do you take the decision.
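To make the recipe concrete, here is a minimal sketch in Python of the two steps described above: splitting a population at random, then comparing the outcomes of the two groups. The function names, the population of 1,000 numbered participants and the use of a simple difference in averages are all assumptions made for illustration. A real trial would also check whether the difference is bigger than chance alone could explain.

```python
import random
from statistics import mean


def split_at_random(population, seed=None):
    """Shuffle a copy of the population and cut it into two halves.

    Returns (changed, untouched): the group that gets the change and
    the group that is left alone. The seed only makes the shuffle
    repeatable for this example.
    """
    rng = random.Random(seed)
    shuffled = list(population)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(changed_outcomes, untouched_outcomes):
    """Average outcome with the change minus average outcome without it."""
    return mean(changed_outcomes) - mean(untouched_outcomes)


# Hypothetical population: 1,000 numbered participants.
changed, untouched = split_at_random(range(1000), seed=1)
print(len(changed), len(untouched))
```

Because chance alone decides who lands in which group, the two groups should be alike in everything except the change, so any difference in outcomes points at the change itself.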
The first medical trial like this involved sailors who were ill with scurvy – a disease that killed more wartime sailors than enemy action in the 18th century. James Lind, a Scottish surgeon in the Royal Navy, waited until his ship had been at sea long enough for many sailors to get scurvy. He then split a dozen of them into 6 pairs: one pair had oranges and lemons on top of the normal food, and the others were given different alternatives like cider or vinegar instead. Within a week, the two eating fruit had virtually recovered. More to the point, there was no difference in any of the others apart from an improvement in the pair given cider. Eating fruit was clearly the right decision to cure scurvy. All new drugs are now tested in trials like this to find out if they really do make patients better or not. Because you know what happens to those not given the new treatment, you know that any improvement wouldn’t have happened anyway.
So how do computer scientists use this sort of trial? The way Marissa Mayer’s team did it is a classic example. One of Google’s designers suggested using a slightly different shade of blue for the links on ads in Google’s mail program. Rather than take his word that it was an improvement, they ran a trial. They created a version of the program that could show the links in any of several colours, each a different shade of blue. They then split all the users of the program into groups and gave each group a different shade of blue for their links, tracking the results. One particular shade led to more clicks on the ads than any other. That was the shade Marissa chose (and it wasn’t the shade the designer had suggested!).
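A trial with several shades can be sketched in a few lines of Python. This is an illustration only, not Google’s actual code: the shade values, the function names and the log format are all invented. It also assumes one common way of splitting users, hashing each user’s id so the same person always sees the same shade, which spreads users across the groups much as a random draw would.

```python
import hashlib
from collections import Counter

# Hypothetical shades of blue to compare (hex colour codes invented here).
SHADES = ["#1a0dab", "#2200cc", "#0044cc", "#3366ff"]


def assign_shade(user_id, shades=SHADES):
    """Pick a shade for a user, always the same one for the same user."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return shades[int(digest, 16) % len(shades)]


def best_shade(events):
    """Find the shade with the highest share of ads clicked.

    events is a list of (shade, clicked) pairs, one pair per ad shown.
    Returns the winning shade and the click rate for every shade seen.
    """
    shown = Counter()
    clicked = Counter()
    for shade, was_clicked in events:
        shown[shade] += 1
        if was_clicked:
            clicked[shade] += 1
    rates = {shade: clicked[shade] / shown[shade] for shade in shown}
    return max(rates, key=rates.get), rates


# A tiny made-up log: three ads shown in one shade, two in another.
log = [("#1a0dab", True), ("#1a0dab", False), ("#1a0dab", False),
       ("#2200cc", True), ("#2200cc", True)]
winner, rates = best_shade(log)
print(winner)
```

Comparing click rates rather than raw click counts matters, because the groups will rarely be exactly the same size.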
Software trials like this are called A/B Testing. They have become the mainstay of hi-tech companies wanting an edge. The approach actually leads to a new way of developing software. Rather than trying to get a perfect product at the outset, you quickly get something basic that works well enough. Then you set to work running trials on lots of small details, making what are called ‘marginal gains’, as soon as possible. One small detail may not make a big difference, but when you pile them up, each known to be a definite improvement because of the trial, then very quickly your software improves. Trials can give better results than intelligent design!
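A little arithmetic shows why piling up marginal gains works. The sketch below assumes, purely for illustration, that each small change multiplies whatever you are measuring (clicks, say) by a little more than 1, and that the changes do not interfere with each other. The figures of 1% and 50 changes are invented.

```python
def compounded_gain(gains):
    """Overall improvement from a series of small fractional gains.

    Each gain is a fraction, so 0.01 means a 1% improvement. Gains are
    assumed to multiply together, as interest on savings does.
    """
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Fifty changes, each shown by its own trial to be worth 1%.
fifty_small_wins = [0.01] * 50
print(f"{compounded_gain(fifty_small_wins):.1%}")  # prints 64.5%
```

No single change here is worth more than 1%, yet together they improve the result by nearly two-thirds.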
Does it make a difference? Well, the one decision about that shade of blue by Marissa’s team supposedly made Google $200 million a year, as a result of more people clicking on ads. Google now run tens of thousands of trials like this each year. Add the benefits of lots and lots of small improvements and you get one of the most powerful companies on the planet.
Little Gains in Life
The idea of developing software through marginal gains is actually based on the process used by nature: evolution by natural selection. Each species of animal seems perfectly designed for its environment, not because it was designed, but because only the fittest individuals survive to have babies. Any small improvement in a baby that gives it a better ability to survive means the genes responsible for that improvement are passed on. Over many generations the marginal gains add up to give creatures perfectly adapted to their environment.
This article was originally published on the CS4FN website and a copy can also be found on pages 22-23 of our free magazine celebrating women in computer science, which you can download as a PDF below along with all of our free material.
This blog is funded through EPSRC grant EP/W033615/1.