by Jo Brodie, Queen Mary University of London
Face recognition technology is used widely, for example at passport controls and by police forces. But what if it isn't as good at recognising faces as has been claimed? Joy Buolamwini and Timnit Gebru tested three commercial systems and found that they were much more likely to wrongly classify darker skinned female faces than male faces, whether those were lighter or darker skinned. The systems were not reliable.
Face recognition systems are trained to detect, classify and even recognise faces using a bank of photographs of people. Joy and Timnit examined two banks of images used to train such systems and found that around 80 per cent of the photos were of people with lighter coloured skin. If the photographs don't include a fair balance of people of different genders and skin tones, then the resulting technology will inherit that bias too. The systems examined were, in effect, being trained to recognise lighter skinned people.
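If you like to see things as code, here is a minimal Python sketch of the kind of check that reveals this sort of skew: counting how a set of training photos is split between lighter and darker skin tones. The labels and the 80/20 split below are made up for illustration; a real audit would use annotated datasets.

from collections import Counter

# Hypothetical skin tone labels for a training set ('lighter' or 'darker').
# An audit of a real dataset would take these from human annotations.
training_labels = ["lighter"] * 80 + ["darker"] * 20

counts = Counter(training_labels)
total = sum(counts.values())
for tone, n in counts.items():
    print(f"{tone}: {n / total:.0%} of training photos")

# With an 80/20 split the model gets far less practice on darker faces,
# which is one way a bias in the data becomes a bias in the system.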
The Pilot Parliaments Benchmark
Joy and Timnit decided to create their own set of images, making sure it covered a wide range of skin tones and had an equal mix of men and women ('gender parity'). They did this using photographs of members of parliament, choosing parliaments known to have a reasonably equal mix of men and women. They selected parliaments both from countries with mainly darker skinned populations (Rwanda, Senegal and South Africa) and from countries with mainly lighter skinned populations (Iceland, Finland and Sweden).
They labelled all the photos according to gender (making some assumptions based on name and appearance where pronouns weren't available) and used a special scale called the Fitzpatrick scale to classify skin tones (see Different Shades below). The result was a set of photographs labelled darker male, darker female, lighter male and lighter female, with a roughly even mix across the four categories: this time, 53 per cent of the people (male and female) were lighter skinned.
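In code, the labelling step might look something like this minimal Python sketch. The records here are invented, not the real benchmark data, and one common way to make the two broad tone groups is to count Fitzpatrick types 1 to 3 as lighter and 4 to 6 as darker.

from collections import Counter

# Invented records in the style of the Pilot Parliaments Benchmark:
# each face has a gender label and a Fitzpatrick type (1 = lightest, 6 = darkest).
faces = [
    {"gender": "female", "fitzpatrick": 2},
    {"gender": "male",   "fitzpatrick": 5},
    {"gender": "female", "fitzpatrick": 6},
    {"gender": "male",   "fitzpatrick": 1},
]

def tone_group(fitzpatrick_type):
    # Bin the six Fitzpatrick types into two broad groups.
    return "lighter" if fitzpatrick_type <= 3 else "darker"

groups = Counter(f"{tone_group(face['fitzpatrick'])} {face['gender']}"
                 for face in faces)
for group, n in groups.items():
    print(f"{group}: {n / len(faces):.0%}")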
Testing times
Joy and Timnit tested the three commercial face recognition systems against their new database of photographs, a fair test covering the wide range of faces a recognition system might come across, and this is where they found that the systems were less able to correctly identify particular groups of people. The systems were very good at spotting lighter skinned men and darker skinned men, but noticeably worse at correctly identifying women, and worst of all at darker skinned women. The tools, trained on data with a bias built in, inherited that bias, and it affected how well they worked.
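The heart of this kind of test is 'disaggregated' evaluation: instead of reporting one overall accuracy score, you score the system separately for each group, because a single number can hide a large gap. Here is a minimal Python sketch of the idea, using invented results rather than the real ones.

from collections import defaultdict

# Invented test results: (group, did the system label this face correctly?)
results = [
    ("lighter male",   True), ("lighter male",   True),
    ("darker male",    True), ("darker male",    True),
    ("lighter female", True), ("lighter female", True),
    ("darker female",  False), ("darker female", False),
]

per_group = defaultdict(list)
for group, correct in results:
    per_group[group].append(correct)

# One overall number (here 75%) can look acceptable...
overall = sum(correct for _, correct in results) / len(results)
print(f"overall accuracy: {overall:.0%}")

# ...but breaking it down by group reveals who the system fails.
for group, outcomes in per_group.items():
    print(f"{group}: {sum(outcomes) / len(outcomes):.0%}")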
As a result of Joy and Timnit's research there is now much more recognition of the problem, and of what it might mean for the ways in which face recognition technology is used. There is some good news, though. The three companies made changes to improve their systems, and several US cities have already banned the use of the technology in criminal investigations, with more likely to follow. People worldwide are more aware of the limitations of face recognition programs and the harms they may (perhaps unintentionally) cause, and there are calls for better regulation.
Different Shades
The Fitzpatrick skin tone scale is used by skin specialists to classify how someone’s skin responds to ultraviolet light. There are six points on the scale with 1 being the lightest skin and 6 being the darkest. People whose skin tone has a lower Fitzpatrick score are more likely to burn in the sun and are at greater risk of skin cancer. People with higher scores have darker skin which is less likely to burn and have a lower risk of skin cancer. A variation of the Fitzpatrick scale, with five points, is used to create the skin tone emojis that you’ll find on most messaging apps in addition to the ‘default’ yellow.
EPSRC supports this blog through research grant EP/W033615/1.