The Gender Shades audit

by Jo Brodie, Queen Mary University of London

Face recognition technology is used widely, such as at passport controls and by police forces. What if it isn’t as good at recognising faces as has been claimed? Joy Buolamwini and Timnit Gebru tested three different commercial systems and found that they were much more likely to wrongly classify darker-skinned female faces than lighter-skinned faces or darker-skinned male faces. The systems were not reliable.

Different skin tone cosmetics
Image by Stefan Schweihofer from Pixabay

Face recognition systems are trained to detect, classify and even recognise faces based on a bank of photographs of people. Joy and Timnit examined two banks of images used to train the systems and found that around 80 per cent of the photos were of people with lighter coloured skin. If the photographs aren’t fairly balanced in terms of gender and ethnicity, then the resulting technologies will inherit that bias too. In effect, the systems examined were being trained to recognise light skinned people.

The pilot parliaments benchmark

Joy and Timnit decided to create their own set of images, ensuring that it covered a wide range of skin tones and had an equal mix of men and women (‘gender parity’). They did this using photographs of members of national parliaments, which are known to have a reasonably equal mix of men and women. They selected parliaments both from countries with mainly darker skinned people (Rwanda, Senegal and South Africa) and from countries with mainly lighter skinned people (Iceland, Finland and Sweden).

They labelled all the photos according to gender (making some assumptions based on name and appearance where pronouns weren’t available) and used a special scale called the Fitzpatrick scale to classify skin tones (see Different Shades below). The result was a set of photographs labelled as dark male, dark female, light male and light female, with a roughly equal mix across all four categories: in this set, 53 per cent of the people were light skinned (male and female), a much fairer balance than the roughly 80 per cent in the original training data.
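A quick way to see how balanced a labelled set of photos is, is simply to count the share of each of the four categories. This is a minimal sketch of that idea, using invented counts (not the real Pilot Parliaments Benchmark numbers):

```python
from collections import Counter

# Hypothetical labels for a small photo set; the real benchmark
# contains over a thousand images across these four groups.
labels = (["dark male"] * 30 + ["dark female"] * 27 +
          ["light male"] * 33 + ["light female"] * 31)

counts = Counter(labels)
total = len(labels)

for group, n in sorted(counts.items()):
    print(f"{group:>12}: {n:3d} ({100 * n / total:.0f}%)")

# How much of the whole set is lighter skinned, across both genders?
light_share = (counts["light male"] + counts["light female"]) / total
print(f"lighter skinned overall: {100 * light_share:.0f}%")
```

With these made-up counts the lighter-skinned share works out at about 53 per cent, close to the balance Joy and Timnit achieved.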

Testing times

Joy and Timnit tested the three commercial face recognition systems against their new database of photographs (a fair test of the wide range of faces a recognition system might come across) and this is where they found that the systems were much less able to correctly classify particular groups of people. The systems were very good at spotting lighter skinned men and darker skinned men, but were far less able to correctly classify women, and darker skinned women in particular. The tools, trained on data sets with a bias built into them, inherited those biases, and this affected how well they worked.
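The heart of an audit like this is simple: instead of reporting one overall accuracy figure, compute accuracy separately for each group. Here is a minimal sketch of that idea, with invented predictions (not the real audit results):

```python
# Each record: (group, true gender, the system's predicted gender).
# These values are made up purely to illustrate the method.
results = [
    ("lighter male",   "male",   "male"),
    ("lighter male",   "male",   "male"),
    ("lighter female", "female", "female"),
    ("lighter female", "female", "female"),
    ("darker male",    "male",   "male"),
    ("darker male",    "male",   "male"),
    ("darker female",  "female", "male"),
    ("darker female",  "female", "male"),
]

# Tally (correct, total) for each group.
groups = {}
for group, truth, predicted in results:
    correct, total = groups.get(group, (0, 0))
    groups[group] = (correct + (truth == predicted), total + 1)

# Here the overall accuracy is 75%, which hides a 0% score
# for the darker-skinned female group.
for group, (correct, total) in groups.items():
    print(f"{group:>14}: {100 * correct / total:.0f}% correct")
```

Disaggregating the results this way is what revealed the gap that a single headline accuracy number had concealed.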

As a result of Joy and Timnit’s research there is now much more recognition of the problem, and of what it might mean for the ways in which face recognition technology is used. There is some good news, though. The three companies made changes to improve their systems, and several US cities have already banned the use of this technology in criminal investigations, with more likely to follow. People worldwide are more aware of the limitations of face recognition programs and of the harms they may (perhaps unintentionally) cause, and there are growing calls for better regulation.

Different Shades
The Fitzpatrick skin tone scale is used by skin specialists to classify how someone’s skin responds to ultraviolet light. There are six points on the scale, with 1 being the lightest skin and 6 being the darkest. People whose skin tone has a lower Fitzpatrick score are more likely to burn in the sun and are at greater risk of skin cancer. People with higher scores have darker skin, which is less likely to burn, and they have a lower risk of skin cancer. A variation of the Fitzpatrick scale, with five points, is used to create the skin tone emojis that you’ll find on most messaging apps in addition to the ‘default’ yellow.
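For the audit, the six Fitzpatrick types were grouped into two broad categories. A tiny sketch of that binning, assuming the split used in the Gender Shades paper (types 1–3 counted as ‘lighter’, types 4–6 as ‘darker’):

```python
def skin_tone_group(fitzpatrick_type: int) -> str:
    """Map a 6-point Fitzpatrick type to the two broad groups
    used in the Gender Shades audit (1-3 lighter, 4-6 darker)."""
    if not 1 <= fitzpatrick_type <= 6:
        raise ValueError("Fitzpatrick types run from 1 to 6")
    return "lighter" if fitzpatrick_type <= 3 else "darker"

for tone in range(1, 7):
    print(tone, skin_tone_group(tone))
```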

EPSRC supports this blog through research grant EP/W033615/1. 

Recognising (and addressing) bias in facial recognition tech #BlackHistoryMonth

By Jo Brodie and Paul Curzon, Queen Mary University of London

A unit containing four sockets: 2 USB and 2 for a microphone and speakers.
Happy, though surprised, sockets. Photo taken by Jo Brodie in 2016 at Gladesmore School in London.

Some people have a neurological condition called face blindness (also known as ‘prosopagnosia’) which means that they are unable to recognise people, even those they know well – this can include their own face in the mirror! They only know who someone is once that person starts to speak. They can certainly detect faces, though they might struggle to classify them in terms of gender or ethnicity. Most people, in contrast, have an exceptionally good ability to detect and recognise faces – so good, in fact, that we even detect faces where there aren’t any. This is called pareidolia: perhaps you see a surprised face in the picture of USB sockets above.

How about computers? There is a lot of hype about face recognition technology as a simple solution to help police forces prevent crime, spot terrorists and catch criminals. What could be bad about being able to pick out wanted people automatically from CCTV images and catch them quickly?

What if facial recognition technology isn’t as good at recognising faces as has sometimes been claimed, though? If the technology is used in the criminal justice system and gets an identification wrong, this can cause serious problems for people (see Robert Williams’ story in “Facing up to the problems of recognising faces”).

“An audit of commercial facial-analysis tools found that dark-skinned faces are misclassified at a much higher rate than are faces from any other group. Four years on, the study is shaping research, regulation and commercial practices.”

The unseen Black faces of AI algorithms (19 October 2022) Nature

In 2018 Joy Buolamwini and Timnit Gebru shared the results of research they’d done testing three different commercial facial recognition systems. They found that these systems were much more likely to wrongly classify darker-skinned female faces compared to lighter- or darker-skinned male faces. In other words, the systems were not reliable. (Read more about their research in “The Gender Shades audit”.)

“The findings raise questions about how today’s neural networks, which … (look for) patterns in huge data sets, are trained and evaluated.”

Study finds gender and skin-type bias in commercial artificial-intelligence systems (11 February 2018) MIT News

Their work has shown that face recognition systems do have biases and so are not currently fit for purpose. There is some good news, though. The three companies whose products they studied made changes to improve their facial recognition systems, and several US cities have already banned the use of this technology in criminal investigations. More cities are calling for bans too, and in Europe the EU is moving closer to banning the use of live face recognition technology in public places. Others, however, are still rolling it out. It is important not simply to believe the hype about new technologies, but to make sure we understand their limitations and risks.

Further reading

More technical articles

• Joy Buolamwini and Timnit Gebru (2018) Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, Proceedings of Machine Learning Research 81:1-15. [EXTERNAL]
• The unseen Black faces of AI algorithms (19 October 2022) Nature News & Views [EXTERNAL]


See more in ‘Celebrating Diversity in Computing’

We have free posters to download and some information about the different people who’ve helped make modern computing what it is today.

Screenshot showing the vibrant blue posters on the left and the muted sepia-toned posters on the right

Or click here: Celebrating diversity in computing

