by Paul Curzon, Milan Verma and Hamit Soyel, Queen Mary University of London
Try our spot the difference puzzles set by an Artificial Intelligence …
NOTE: this page containsslowlyflashing images.
Queen Mary researcher, Milan Verma used his AI program that modelled the way human brains see the world (our ‘vision system’) to change the details of some pictures in places where the program predicted changes should be easy to spot. Other pictures were changed in places where the AI predicted we would struggle to see even when big areas were changed.
The images flash back and forth between the original and the changed version. How quickly can you see the difference between the two versions.
Spot the Difference: Challenge 1
As this image flashes, something changes. This one is predicted by our AI to be quite hard to see. How quickly can you see the difference between the two versions?
Once you spot it (or give up) see both the answer (linked below) and the ‘saliency maps’ which show where the model predicts where your attention will be drawn to and away from.
You can also try our second challenge that is predicted to be easier.
Spot the Difference: Challenge 2
As this image flashes, something changes. This one is predicted by our AI to be easier to see. How quickly can you see the difference between the two versions?
by Peter W McOwan, Queen Mary University of London
(updated from the archive)
Flies are small, fast and rather cunning. Try to swat one and you will see just how efficient their brain is, even though it has so few brain cells that each one of them can be counted and given a number. A fly’s brain is a wonderful proof that, if you know what you’re doing, you can efficiently perform clever calculations with a minimum of hardware. The average household fly’s ability to detect movement in the surrounding environment, whether it’s a fly swat or your hand, is due to some cunning wiring in their brain.
Movement is measured by detecting something changing position over time. The ratio distance/time gives us the speed, and flies have built in speed detectors. In the fly’s eye, a wonderful piece of optical engineering in itself with hundreds of lenses forming the mosaic of the compound eye, each lens looks at a different part of the surrounding world, and so each registers if something is at a particular position in space.
All the lenses are also linked by a series of nerve cells. These nerve cells each have a different delay. That means a signal takes longer to pass along one nerve than another. When a lens spots an object in its part of the world, say position A, this causes a signal to fire into the nerve cells, and these signals spread out with different delays to the other lenses’ positions.
The separation between the different areas that the lenses view (distance) and the delays in the connecting nerve cells (time) are such that a whole range of possible speeds are coded in the nerve cells. The fly’s brain just has to match the speed of the passing object with one of the speeds that are encoded in the nerve cells. When the object moves from A to B, the fly knows the correct speed if the first delayed signal from position A arrives at the same time as the new signal at position B. The arrival of the two signals is correlated. That means they are linked by a well-defined relation, in this case the speed they are representing.
Do locusts like Star Wars?
Understanding the way that insects see gives us clever new ways to build things, and can also lead to some bizarre experiments. Researchers in Newcastle showed locusts edited highlights from the original movie Star Wars. Why you might ask? Do locusts enjoy a good Science Fiction movie? It turns out that the researchers were looking to see if locusts could detect collisions. There are plenty of those in the battles between X-wing fighters and Tie fighters. They also wanted to know if this collision detecting ability could be turned into a design for a computer chip. The work, part-funded by car-maker Volvo, used such a strange way to examine locust’s vision that it won an Ig Nobel award in 2005. Ig Noble awards are presented each year for weird and wonderful scientific experiments, and have the motto ‘Research that makes people laugh then think’. You can find out more at http://improbable.com
Car crash: who is to blame?
So what happens if we start to use these insect ‘eye’ detectors in cars, building
We now have smart cars with the artificial intelligence (AI) taking over from the driver completely or just to avoid hitting other things. An interesting question arises. When an accident does happen, who is to blame? Is it the car driver: are they in charge of the vehicle? Is it the AI to blame? Who is responsible for that: the AI itself (if one day we give machines human-like rights), the car manufacturer? Is it the computer scientists who wrote the program? If we do build cars with fly or locust like intelligence, which avoid accidents like flies avoid swatting or can spot possible collisions like locusts, is it the insect whose brain was copied that is to blame!?!What will insurance companies decide? What about the courts?
As computer science makes new things possible, society quickly needs to decide how to deal with them. Unlike the smart cars, these decisions aren’t something we can avoid.
Look out the window at the human-made world. It’s full of hard, geometric shapes – our buildings, the roads, our cars. They are made of solid things like tarmac, brick and metal that are designed to be rigid and stay that way. The natural world is nothing like that though. Things bend, stretch and squish in response to the forces around them. That provides a whole bunch of fascinating problems for computer scientists like Lourdes Agapito of Queen Mary, University of London to solve.
Computer scientists interested in creating 3-dimensional models of the world have so far mainly concentrated on modelling the hard things. Why? Because they are easier! You can see the results in computer-animated films like Toy Story, and the 3D worlds like Second Life your avatar inhabits. Even the soft things tend to be rigid.
Lourdes works in this general area creating 3D computer models, but she wants to solve the problems of creating them automatically just from the flat images in videos and is specifically interested in things that deform – the squishy things.
Look out the window and watch the world go by. As you watch a woman walk past you have no problem knowing that you are looking at the same person as you were a second ago – even if she becomes partially hidden as she walks behind the post box and turns to post a letter. The sun goes behind a cloud and the scene is suddenly darker. It starts to rain and she opens an umbrella. You can still recognise her as the same object. Your brain is pulling some amazing tricks to make this seem so mundane. Essentially it is creating a model of the world – identifying all the 3-dimensional objects that you see and tracking them over time. If we can do it, why can’t a computer?
Unlike hard surfaces, deformable ones don’t look the same from one still to the next. You don’t have to just worry about changes in lighting, them being partially hidden, and that they appear different from a different angle. The object itself will be a different shape from one still to the next. That makes it far harder to work out which bits of one image are actually the same as the ones in the next. Lourdes has taken on a seriously hard problem.
Existing vision systems that create 3D objects have made things easier for themselves by using existing models. If a computer already has a model of a cube to compare what it sees with, then spotting a cube in the image stream is much easier than working it out from scratch. That doesn’t really generalise to deformable objects though because they vary too much. Another approach, used by the film industry, is to put highly visible markers on objects so that those markers can be tracked. That doesn’t help if you just want to point a camera out the window at whatever passes by though.
Lourdes aim is to be able to point a camera at a deformable object and have a computer vision system be able to create a 3D model simply by analysing the images. No markers, no existing models of what might be there, not even previous films to train it with, just the video itself. So far her team have created a system that can do this in some situations such as with faces as a person changes their expression. Their next goal is to be able to make their system work for a whole person as they are filmed doing arbitrary things. It’s the technical challenge that inspires Lourdes the most, though once the problems of deformable objects are solved there are applications of course. One immediately obvious area is in operating theatres. Keyhole surgery is now very common. It involves a surgeon operating remotely, seeing what they are doing by looking at flat video images from a fibre optic probe inside the body of the person being operated on. The image is flat but the inside of the person that the surgeon is trying to make cuts in is 3-dimensional. It would be far less error prone if what the surgeon was looking at was an accurate 3D model of the video feed rather than just a flat picture. Of course the inside of your body is made of exactly the kind of squishy deformable surfaces that Lourdes is interested in. Get the computer science right and technologies like this will save lives.
At the same time as tackling seriously hard if squishy computer science problems, Lourdes is also a mother of three. A major reason she can fit it all in, as she points out, is that she has a very supportive partner who shares in the childcare. Without him it would be impossible to balance all the work involved in leading a top European research team. It’s also important to get away from work sometimes. Running regularly helps Lourdes cope with the pressures and as we write she is about to run her first half marathon.
Lourdes may or may not be the person who turns her team’s solutions into the applications that in the future save lives in operating theatres, spot suspicious behaviour in CCTV footage or allow film-makers to quickly animate the actions of actors. Whoever does create the applications, we still need people like Lourdes who are just excited about solving the fundamental problems in the first place.
Some people have a neurological condition called face blindness (also known as ‘prosopagnosia’) which means that they are unable to recognise people, even those they know well – this can include their own face in the mirror! They only know who someone is once they start to speak but until then they can’t be sure who it is. They can certainly detect faces though, but they might struggle to classify them in terms of gender or ethnicity. In general though, most people actually have an exceptionally good ability to detect and recognise faces, so good in fact that we even detect faces when they’re not actually there – this is called pareidolia – perhaps you see a surprised face in this picture of USB sockets below.
What if facial recognition technology isn’t as good at recognising faces as it has sometimes been claimed to be? If the technology is being used in the criminal justice system, and gets the identification wrong, this can cause serious problems for people (see Robert Williams’ story in “Facing up to the problems of recognising faces“).
In 2018 Joy Buolamwini and Timnit Gebru shared the results of research they’d done, testing three different commercial facial recognition systems. They found that these systems were much more likely to wrongly classify darker-skinned female faces compared to lighter- or darker-skinned male faces. In other words, the systems were not reliable.
The Gender Shades Audit
Facial recognition systems are trained to detect, classify and even recognise faces using a bank of photographs of people. Joy and Timnit examined two banks of images used to train facial recognition systems and found that around 80 per cent of the photos used were of people with lighter coloured skin.
If the photographs aren’t fairly balanced in terms of having a range of people of different gender and ethnicity then the resulting technologies will inherit that bias too. Effectively the systems here were being trained to recognise light-skinned people.
The Pilot Parliaments Benchmark
They decided to create their own set of images and wanted to ensure that these covered a wide range of skin tones and had an equal mix of men and women (‘gender parity’). They did this by selecting photographs of members of various parliaments around the world which are known to have a reasonably equal mix of men and women, and selected parliaments from countries with predominantly darker skinned people (Rwanda, Senegal and South Africa) and from countries with predominantly lighter-skinned people (Iceland, Finland and Sweden).
They labelled all the photos according to gender (they did have to make some assumptions based on name and appearance if pronouns weren’t available) and used the Fitzpatrick scale (see Different shades, below) to classify skin tones. The result was a set of photographs labelled as dark male, dark female, light male, light female with a roughly equal mix across all four categories – this time, 53 per cent of the people were light-skinned (male and female).
The Fitzpatrick skin tone scale (top) is used by dermatologists (skin specialists) as a way of classifying how someone’s skin responds to ultraviolet light. There are six points on the scale with 1 being the lightest skin and 6 being the darkest. People whose skin tone has a lower Fitzpatrick score are more likely to burn in the sun and not tan, and are also at greater risk of melanoma (skin cancer). People with higher scores have darker skin which is less likely to burn and they have a lower risk of skin cancer.
Below it is a variation of the Fitzpatrick scale, with five points, which is used to create the skin tone emojis that you’ll find on most messaging apps in addition to the ‘default’ yellow.
Testing three face recognition systems
Joy and Timnit tested the three commercial face recognition systems against their new database of photographs – a fair test of a wide range of faces that a recognition system might come across – and this is where they found that the systems were less able to correctly identify particular groups of people. The systems were very good at spotting lighter-skinned men, and darker skinned men, but were less able to correctly identify darker-skinned women, and women overall.
These tools, trained on sets of data that had a bias built into them, inherited those biases and this affected how well they worked. Joy and Timnit published the results of their research and it was picked up and discussed in the news as people began to realise the extent of the problem, and what this might mean for the ways in which facial recognition tech is used.
There is some good news though. The three companies made changes to improve their facial recognition technology systems and several US cities have already banned the use of this tech in criminal investigations, and more cities are calling for it too. People around the world are becoming more aware of the limitations of this type of technology and the harms to which it may be (perhaps unintentionally) put and are calling for better regulation of these systems.
In 2009 Desi Cryer, who is Black, shared a light-hearted video with a serious message. He’d bought a new computer with a face tracking camera… which didn’t track his face, at all. It did track his White colleague Wanda’s face though. In the video (below) he asked her to go in front of the camera and move from side to side and the camera obediently tracked her face – wherever she moved the camera followed. When Desi moved back in front of the camera it stopped again. He wondered if the computer might be racist…
Another video (below), this time from 2017, showed a dark-skinned man failing to get a soap to dispenser to give him some soap. Nothing happened when he put his hand underneath the sensor but as soon as his lighter-skinned friend put his hand under it – out popped some soap! The only way the first man could get any soap dispensed was to put a white tissue on his hand first. He wondered if the soap dispenser might be racist…
What’s going on?
Probably no-one set out to maliciously design a racist device but designers might need to check that their products work with a range of different people before putting them on the market. This can save the company embarrassment as well as creating something that more people want to buy.
Sensors working overtime
Both devices use a sensor that is activated (or in these cases isn’t) by a signal. Soap dispensers shine a beam of light which bounces off a hand placed below it and some of that light is reflected back. Paler skin reflects more light (and so triggers the sensor) than darker skin. Next to the light is a sensor which responds to the reflected light – but if the device was only tested on White people then the sensor wasn’t adjusted for the full range of skin tones and so won’t respond appropriately. Similarly cameras have historically been designed for White skin tones meaning darker tones are not picked up as well.
Things can be improved!
It’s a good idea, when designing something that will be used by lots of different people, to make sure that it will work correctly with everyone. Having a diverse design team and, importantly, making sure that everyone feels empowered to contribute is a good way to start. Another is to test the design with different target audiences early in the design process so that changes can be made before it’s too late. How a company responds to feedback when they’ve made an oversight is also important. In the case of the computer company they acknowledged the problem and went to work to improve the camera’s sensitivity.
A problemwithpulse oximeters
During the coronavirus pandemic many people bought a ‘pulse oximeter’, a device which clips painlessly onto a finger and measures how much oxygen is circulating in your blood (and your pulse). If the oxygen reading became too low people were advised to go to hospital. Oximeters shine red and infrared light from the top clip through the finger and the light is absorbed diferently depending on how much oxygen is present in the blood. A sensor on the lower clip measures how much light has got through but the reading can be affected by skin colour (and coloured nail polish). People were concerned that pulse oximeters would overestimate the oxygen reading for someone with darker skin (that is, tell them they had more oxygen than they actually had) and that the devices might not detect a drop in oxygen quickly enough to warn them.
In response the UK Government announced in August 2022 that it would investigate this bias in a range of medical devices to ensure that future devices work effectively for everyone.
Computer science and artificial intelligence have provided a new way to do science: it was in fact one of the earliest uses of the computer. They are now giving new ways for scholars to do research in other disciplines such as ancient history, too. Artificial Intelligence has been used in a novel way to help understand how the Dead Sea Scrolls were written, and it turns out scribes in ancient Judea worked in teams.
The Dead Sea Scrolls are a collection of almost a thousand ancient documents written several thousand years ago that were found in caves near the Dead Sea. The collection includes the oldest known written version of the Bible.
Researchers from the University of Groningen used artificial intelligence techniques to analyse a digitised version of the longest scroll in the collection, known as the Great Isaiah Scroll. They picked one letter, aleph, that appears thousands of times through the document, and analysed it in detail.
Two kinds of artificial intelligence programs were used. The first, feature extraction, based on computer vision and image processing was needed to recognize features in the images. At one level this is the actual characters, but also more subtly here, the aim was that the features corresponded to ink traces based on the actual muscle movements of the scribes.
The second was machine learning. Machine Learning programs are good at spotting patterns in data – grouping the data into things that are similar and things that are different. A typical text book example would be giving the program images of cats and of dogs. It would spot the patterns of features that correspond to dogs, and the different pattern of features that corresponds to cats and group each image into one or the other pattern.
Here the data was all those alephs or more specifically the features extracted from them. Essentially the aim was to find patterns that were based on the muscle movements of the original scribe of each letter. To the human eye the writing throughout the document looks very, very uniform, suggesting a single scribe wrote the whole document. If that was the case, only one pattern would be found that all letters were part of with no clear way to split them. Despite this, the artificial intelligence evidence suggests there were actually two scribes involved. There were two patterns.
The research team found, by analysing the way the letters were written, that there were two clear groupings of letters. One group were written in one way and the other in a slightly different way. There were very subtle differences in the way strokes were written, such as in their thickness and the positions of the connections between strokes. This could just be down to variations in the way a single writer wrote letters at different times. However, the differences were not random, but very clearly split at a point halfway through the scroll. This suggests there were two writers who each worked on the different parts. Because the characters were otherwise so uniform, those two scribes must have been making an effort to carefully mirror each other’s writing style so the letters looked the same to the naked eye.
The research team have not only found out something interesting about the Dead Sea Scrolls, but also demonstrated a new way to study ancient hand writing. With a few exceptions, the scribes who wrote the ancient documents, like the Dead Sea Scrolls, that have survived to the modern day, are generally anonymous, but thanks to leading-edge Computer Science, we have a new way to find out more about them.
Artificial Intelligence software has shown that two different Manchester United gaffers got it right believing that kit and stadium seat colours matter if the team are going to win.
It is 1996. Sir Alex Ferguson’s Manchester United are doing the unthinkable. At half time they are losing 3-0 to lowly Southampton. Then the team return to the pitch for the second half and they’ve changed their kit. No longer are they wearing their normal grey away kit but are in blue and white, and their performance improves (if not enough to claw back such a big lead). The match becomes infamous for that kit change: the genius gaffer blaming the team’s poor performance on their kit seemed silly to most. Just play better football if you want to win!
Jump forward to 2021, and Manchester United Manager Ole Gunnar Solskjaer, who originally joined United as a player in that same year, 1996, tells a press conference that the club are changing the stadium seats to improve the team’s performance!
Is this all a repeat of previously successful mind games to deflect from poor performances? Or superstition, dressed up as canny management, perhaps. Actually, no. Both managers were following the science.
Ferguson wasn’t just following some gut instinct, he had been employing a vision scientist, Professor Gail Stephenson, who had been brought in to the club to help improve the players’ visual awareness, getting them to exercise the muscles in their eyes not just their legs! She had pointed out to Ferguson that the grey kit would make it harder for the players to pick each other out quickly. The Southampton match was presumably the final straw that gave him the excuse to follow her advice.
She was very definitely right, and modern vision Artificial Intelligence technology agrees with her! Colours do make it easier or harder to notice things and slows decision making in a way that matters on the pitch. 25 years ago the problem was grey kit merging into the grey background of the crowd. Now it is that red shirts merge into the background of an empty stadium of red seats.
It is all about how our brain processes the visual world and the saliency of objects. Saliency is just how much an object stands out and that depends on how our brain processes information. Objects are much easier to pick out if they have high contrast, for example, like a red shirt on a black background.
Peter McOwan and Hamit Soyel at Queen Mary combined vision research and computer science, creating an Artificial Intelligence (AI) that sees like humans in the sense that it predicts what will and won’t stand out to us, doing it in real time (see DragonflyAI: I see what you see). They used the program to analyse images from that infamous football match before and after the kit change and showed that the AI agreed with Gail Stephenson and Alex Ferguson. The players really were much easier for their team mates to see in the second half (see the DragonflyAI version of the scenes below).
Details matter and science can help teams that want to win in all sorts of ways. That includes computer scientists and Artificial Intelligence. So if you want an edge over the opposition, hire an AI to analyse the stadium scene at your next match. Changing the colour of the seats really could make a difference.
What use is a computer that sees like a human? Can’t computers do better than us? Well, such a computer can predict what we will and will not see, and there is BIG money to be gained doing that!
Peter McOwan’s team at Queen Mary spent 10 years doing exploratory research understanding the way our brains really see the world, exploring illusions, inventing games to test the ideas, and creating a computer model to test their understanding. Ultimately they created a program that sees like a human. But what practical use is a program that mirrors the oddities of the way we see the world? Surely a computer can do better than us: noticing all the things that we miss or misunderstand? Well, for starters the research opens up exciting possibilities for new applications, especially for marketeers.
A fruitful avenue to emerge is ‘visual analytics’ software: applications that predict what humans will and will not notice. Our world is full of competing demands, overloading us with information. All around us things vie to catch our attention, whether a shop window display, a road sign warning of danger or an advertising poster.
Imagine, a shop has a big new promotion designed to entice people in, but no more people enter than normal. No-one notices the display. Their attention is elsewhere. Another company runs a web ad campaign, but it has no effect, as people’s eyes are pulled elsewhere on the screen. A third company pays to have its products appear in a blockbuster film. Again, a waste of money. In surveys afterwards no one knew the products had been there. A town council puts up a new warning sign at a dangerous bend in the road but the crashes continue. These are examples of situations where predicting where people look in advance allows you to get it right. In the past this was either done by long and expensive user testing, perhaps using software that tracks where people look, or by having teams of ‘experts’ discuss what they think will happen. What if a program made the predictions in a fraction of a second beforehand? What if you could tweak things repeatedly until your important messages could not be missed.
Queen Mary’s Hamit Soyel turned the research models into a program called DragonflyAI, which does exactly that. The program analyses all kinds of imagery in real-time and predicts the places where people’s attention will, and will not, be drawn. It works whether the content is moving or not, and whether it is in the real world, completely virtual, or both. This then gives marketeers the power to predict and so influence human attention to see the things they want. The software quickly caught the attention of big, global companies like NBC Universal, GSK and Jaywing who now use the technology.