Equality, diversity and inclusion in the R Project: collaborative community coding & curating with Dr Heather Turner

You might not think of a programming language like Python or Scratch as being an ‘ecosystem’, but each language has its own community of people who create and improve its code, flush out the bugs, introduce new features, document any changes and write the ‘how to’ guides for new users.

R is one such programming language. It’s named after its two co-inventors (Ross Ihaka and Robert Gentleman) and is used by around two million people worldwide. People working in all sorts of jobs and industries (for example finance, academic research, government and data journalism) use R to analyse their data. The software has useful tools to help people see patterns in their data and to make sense of that information.

It’s also open source, which means that anyone can use it and help to improve it, a bit like Wikipedia where anyone can edit an article or write a new one. That’s generally a good thing because it means everyone can contribute, but it can also bring problems. Imagine writing an essay about an event at your school and sharing it with your class. Then imagine your classmates adding paragraphs of their own about the event, or even about different events. Your essay could soon become rather messy and you’d need to re-order things, take bits out and make sure people hadn’t repeated something that someone else had already said (but in a slightly different way).

When changes are made to software, people also want to keep a note not just of the ‘words’ added (the code) but also of who added what and when. Keeping good records, also known as documentation, helps keep things tidy and gives the community confidence that the software is being properly looked after.

Code and documentation can easily become a bit chaotic when created by different people in the community, so there needs to be a core group of people keeping things in order. Fortunately there is: the ‘R Core Team’. But these days its membership doesn’t really reflect the community of R users around the world. R was first used in universities, particularly by more privileged statistics professors from European countries and North America (the Global North), and so R’s development tended to be more in line with their academic interests. R needs input and ideas from a more diverse group of active developers and decision-makers, in academia and beyond, to ensure that the voices of minoritised groups are included. It also needs the voices of younger people, particularly as many of the current core group are approaching retirement age.

Dr Heather Turner from the University of Warwick is helping to increase the diversity of those who develop and maintain the R programming language, and she’s been given funding by the EPSRC* to work on this. Her project is a nice example of someone bringing together two different areas in her work. She is mixing software development (tech skills) with community management (people skills) to support a range of colleagues who use R and might want to contribute to developing it in future, but perhaps don’t feel confident enough to do so yet.

Development can involve things like fixing bugs, helping to improve the behaviour or efficiency of programs, or translating error messages that currently appear on-screen in English into other languages. Heather and her colleagues are working with the R community to create a more welcoming environment for ‘newbies’, one that encourages participation, particularly from people who are in the community but are not represented, or are under-represented, in the core group. She’s working collaboratively with other community organisations such as R-Ladies, LatinR and RainbowR, and another task she’s involved in is producing an easier-to-follow ‘How to develop R’ guide.

There are also people who work in universities but who aren’t academics (they don’t teach or do research but do other important jobs that help keep things running well); some of them use R too and can contribute to its development. However, their contributions have been less likely to get proper recognition or career rewards compared with those made by academics, which is a little unfair. That’s largely because of the way the academic system is set up.

Generally it’s academics who apply for funding to do new research; they do the research and then publish papers in academic journals on the research they’ve done, and these publications are evidence of their work. But the important work that support staff do in maintaining the software isn’t classified as new research, so it doesn’t generally make it into the journals and their contribution can get left out. They also don’t necessarily get the same career support or mentoring for their development work. This can make people feel a bit sidelined or discouraged.

To try and fix this, and to make things fairer, the Society of Research Software Engineering was created to champion a new type of job in computing: the Research Software Engineer (RSE). These are people whose job is to develop and maintain (engineer) the software that is used by academic researchers (sometimes in R, sometimes in other languages). The society wants to raise awareness of the role and to build a community around it. You can find out what’s needed to become an RSE below.

Heather is in a great position to help here too, as she has a foot in each camp: she’s both an academic and a Research Software Engineer. She’s helping to establish RSEs as an important role in universities while also further expanding the diversity of people involved in developing R, for its long-term sustainability.

Further reading

*Find out more about Heather’s EPSRC-funded Fellowship: “Sustainability and EDI (Equality, Diversity, and Inclusion) in the R Project” https://gtr.ukri.org/projects?ref=EP%2FV052128%2F1 and https://society-rse.org/getting-to-know-your-2021-rse-fellows-heather-turner/ 

Find out more about the job of the Research Software Engineer and the Society of Research Software Engineering https://society-rse.org/about/ 

Related careers

QMUL

Below is an example of a Research Software Engineer role which was advertised at QMUL in April 2024 – you can read the original advert and see a copy of the job description / person specification information which is archived at the “Jobs in Computer Science” website. This advert was looking for an RSE to support a research project “at the intersection of Natural Language Processing (NLP) and multi-modal Machine Learning, with applications in mental health.”

QMUL also has a team of Research Software Engineers and you can read about what they’re working on and their careers here (there are also RSEs attached to different projects across the university, as above).

Archived job adverts from elsewhere

Below are some examples of RSE jobs (these particular vacancies have now closed but you can read about what they were looking for and see if that sort of thing might interest you in the future). The links will take you to a page with the original job advert plus any Job Description (JD: what the person would actually be doing) and possibly a Person Specification (PS: the type of person they’re looking for in terms of skills, qualifications and experience); collectively these are often known as ‘job packs’.

Note that these documents are written for quite a technical audience – the people who’d apply for the jobs will have studied computer science for many years and will be familiar with how computing skills can be applied to different subjects.

1. The Science and Technology Facilities Council (STFC) wanted four Research Software Engineers (who’d be working either in Warrington or Oxford) on a chemistry-related project (‘computational chemistry’ – “a branch of chemistry that uses computer simulation to assist in solving chemical problems”) 

2. The University of Cambridge was looking for a Research Software Engineer to work in the area of climate science – “Computational modelling is at the core of climate science, where complex models of earth systems are a routine part of the scientific process, but this comes with challenges…”

3. University College London (UCL) wanted a Research Software Engineer to work in the area of neuroscience (studying how the brain works, in this case by analysing the data from scientists using advanced microscopy).


EPSRC supports this blog through research grant EP/W033615/1.

The Gender Shades audit

by Jo Brodie, Queen Mary University of London

Face recognition technology is used widely, such as at passport controls and by police forces. What if it isn’t as good at recognising faces as it has been claimed to be? Joy Buolamwini and Timnit Gebru tested three different commercial systems and found that they were much more likely to wrongly classify darker skinned female faces compared to lighter or darker skinned male faces. The systems were not reliable.

Different skin tone cosmetics
Image by Stefan Schweihofer from Pixabay

Face recognition systems are trained to detect, classify and even recognise faces based on a bank of photographs of people. Joy and Timnit examined two banks of images used to train the systems and found that around 80 per cent of the photos used were of people with lighter coloured skin. If the photographs aren’t fairly balanced in terms of having a range of people of different gender and ethnicity then the resulting technologies will inherit that bias too. In effect, the systems examined were being trained mainly to recognise lighter skinned people.

The pilot parliaments benchmark

Joy and Timnit decided to create their own set of images and wanted to ensure that these covered a wide range of skin tones and had an equal mix of men and women (‘gender parity’). They did this using photographs of members of parliaments around the world which are known to have a reasonably equal mix of men and women. They selected parliaments both from countries with mainly darker skinned people (Rwanda, Senegal and South Africa) and from countries with mainly lighter skinned people (Iceland, Finland and Sweden).

They labelled all the photos according to gender (they had to make some assumptions based on name and appearance if pronouns weren’t available) and used a special scale called the Fitzpatrick scale to classify skin tones (see Different Shades below). The result was a set of photographs labelled as dark male, dark female, light male, light female, with a roughly equal mix across all four categories: this time, 53 per cent of the people were light skinned (male and female).
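
If you’re curious how that labelling step might look in code, here is a small Python sketch with made-up example data (not the real benchmark photos). It combines a gender label and a Fitzpatrick skin type into one of the four categories and then tallies how balanced the collection is; splitting types 1 to 3 as ‘light’ and 4 to 6 as ‘dark’ is one reasonable way to divide the six-point scale in half.

```python
from collections import Counter

# Made-up example records: (gender label, Fitzpatrick skin type 1 to 6).
photos = [("female", 2), ("male", 5), ("female", 6), ("male", 1),
          ("female", 4), ("male", 3), ("female", 1), ("male", 6)]

def category(gender, fitzpatrick):
    # Types 1-3 count as 'light', types 4-6 as 'dark' in this sketch.
    tone = "light" if fitzpatrick <= 3 else "dark"
    return f"{tone} {gender}"

# Tally how many photos end up in each of the four categories.
counts = Counter(category(g, f) for g, f in photos)
total = len(photos)
for group, n in counts.items():
    print(f"{group}: {n} photos ({100 * n / total:.0f}%)")
```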

Testing times

Joy and Timnit tested the three commercial face recognition systems against their new database of photographs (a fair test of a wide range of faces that a recognition system might come across) and this is where they found that the systems were less able to correctly identify particular groups of people. The systems were very good at spotting lighter skinned men and darker skinned men, but were less able to correctly identify women overall, and darker skinned women in particular. The tools, trained on sets of data that had a bias built into them, inherited those biases and this affected how well they worked.
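
Measuring performance group by group is, at heart, a simple calculation: compare a system’s answers with the true labels separately for each category. The Python sketch below uses invented numbers purely to show the idea (the real audit reported the commercial systems’ actual results); notice how a healthy-looking overall score can hide a much worse result for one group.

```python
# Invented example results for one system: (group, correct answers, photos tested).
# These numbers are made up purely to illustrate the calculation.
results = [
    ("lighter skinned male",   296, 300),
    ("darker skinned male",    282, 300),
    ("lighter skinned female", 278, 300),
    ("darker skinned female",  200, 300),
]

# Report accuracy separately for each group...
for group, correct, total in results:
    print(f"{group}: {100 * correct / total:.1f}% correct")

# ...because a single overall figure can hide a big gap between groups.
overall_correct = sum(correct for _, correct, _ in results)
overall_total = sum(total for _, _, total in results)
print(f"overall: {100 * overall_correct / overall_total:.1f}% correct")
```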

As a result of Joy and Timnit’s research there is now much more recognition of the problem, and what this might mean for the ways in which face recognition technology is used. There is some good news, though. The three companies made changes to improve their systems and several US cities have already banned the use of this technology in criminal investigations, with more likely to follow. People worldwide are more aware of the limitations of face recognition programs and the harms to which they may be (perhaps unintentionally) put, with calls for better regulation.

Different Shades
The Fitzpatrick skin tone scale is used by skin specialists to classify how someone’s skin responds to ultraviolet light. There are six points on the scale with 1 being the lightest skin and 6 being the darkest. People whose skin tone has a lower Fitzpatrick score are more likely to burn in the sun and are at greater risk of skin cancer. People with higher scores have darker skin which is less likely to burn and have a lower risk of skin cancer. A variation of the Fitzpatrick scale, with five points, is used to create the skin tone emojis that you’ll find on most messaging apps in addition to the ‘default’ yellow.
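
You can see the emoji connection directly in Unicode: there are five ‘emoji modifier’ characters, officially named after the Fitzpatrick types (types 1 and 2 share one modifier), and placing one straight after a base emoji changes its skin tone. A quick Python sketch:

```python
# The five Unicode emoji skin tone modifiers are named after the
# Fitzpatrick scale; types 1 and 2 share a single modifier.
modifiers = {
    "type 1-2": "\U0001F3FB",
    "type 3":   "\U0001F3FC",
    "type 4":   "\U0001F3FD",
    "type 5":   "\U0001F3FE",
    "type 6":   "\U0001F3FF",
}

waving_hand = "\U0001F44B"  # the 'default' yellow waving hand emoji

# Placing a modifier immediately after a base emoji changes its skin tone.
for name, modifier in modifiers.items():
    print(name, waving_hand + modifier)
```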




Reclaim your name

by Jo Brodie and Paul Curzon, Queen Mary University of London

Canadian Passport
Image by tookapic from Pixabay

In June 2021 the Canadian government announced that Indigenous people would be allowed to use their ancestral family names on government-issued identity and travel documents. This meant that, for the first time, they could use the names that are part of their heritage and culture rather than the westernised names that are often used instead. Because of computers, it wasn’t quite as easy as that though …

Some Indigenous people take on a Western name to make things easier: to simplify things for official forms, to save having to spell the name, even to avoid teasing. If it is a real choice then perhaps that is fine, though surely we should be able to make it easy for people to use their actual names. For many it was certainly not a choice; their Indigenous names were taken from them. From the 19th century, hundreds of thousands of Indigenous children in Canada were sent to Western schools and made to take on Western names as part of an attempt to force them to “assimilate” into Western society. Some were even beaten if they did not use their new name. Because their family names had been “officially” changed, they and their descendants had to use these new names on official documents. Names matter. Your name is your identity, and in some cultures family names are also sacred. Being able to use them matters.

The change to allow ancestral names to be used was part of a reconciliation process to correct this injustice. After the announcement, Ta7talíya Nahanee, an Indigenous woman from the Squamish community in Vancouver, was delighted to learn that she would be able to use her real name on her official documents, rather than ‘Michelle’ which she had previously used.

Unfortunately, she was frustrated to learn that travel documents could still only include the Latin alphabet (ABCDEFG etc) with French accents (À, Á, È, É etc). That excluded her name (pronounced Ta-taliya, the 7 is silent) as it contains a number and the letter í. Why? Because the computer said so!

Modern machine-readable passports have a specific area, called the Machine Readable Zone, which can be read by a computer scanner at immigration. It has a very limited number of permitted characters. Names which don’t fit need to be “transliterated”, so Å would be written as AA, Ü as UE and the German letter ß (which looks like a B but sounds like a double S) is transliterated as SS. Names are completely rewritten to fit, so Müller becomes MUELLER, Gößmann becomes GOESSMANN, and Hämäläinen becomes HAEMAELAEINEN. If you’ve spent your life having your name adapted to fit someone else’s system this is another reminder of that.
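
To give a flavour of what that rewriting involves, here is a hedged Python sketch that uses only the substitutions mentioned above; the real passport rules (the ICAO transliteration tables) cover many more characters and handle tricky cases more carefully.

```python
# Substitutions taken from the examples in the text; the real passport rules
# cover many more characters than this short table.
SUBSTITUTIONS = {
    "Å": "AA", "Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "SS",
    "À": "A", "Á": "A", "È": "E", "É": "E",
}

def to_mrz(name: str) -> str:
    """Rewrite a name using only the letters a Machine Readable Zone allows."""
    result = []
    for ch in name.upper():
        if ch in SUBSTITUTIONS:
            result.append(SUBSTITUTIONS[ch])
        elif "A" <= ch <= "Z":
            result.append(ch)
        # This simple sketch just drops anything else. The full rules would
        # turn an í into I, but a character like 7 has no place in the
        # machine-readable name field at all.
    return "".join(result)

print(to_mrz("Müller"))       # MUELLER
print(to_mrz("Gößmann"))      # GOESSMANN
print(to_mrz("Hämäläinen"))   # HAEMAELAEINEN
print(to_mrz("Ta7talíya"))    # TATALYA (the 7 and the í are lost)
```

Running it reproduces the rewritten names above, and shows why a name containing a 7 or an í simply cannot survive the trip through the Machine Readable Zone unchanged.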

While there are very sensible reasons for ensuring that a passport from one part of the world can be read by computers anywhere else, this choice of characters highlights that, in order to make things work, everyone else has been made to fall in line with the English-speaking population, another example of an unintentional bias. It isn’t, after all, remotely beyond our ability to design a system that meets the needs of everyone, it just needs the will. Designing computer systems isn’t just about machines. It’s about designing them for people.




Facing up to ALL faces

The problems of recognising faces

Wire frame face
Image by Gerd Altmann from Pixabay

How face recognition technology caused the wrong Black man to be arrested.

The police were waiting for Robert Williams when he returned home from work in Detroit, Michigan. They arrested him for robbery in front of his wife and terrified daughters, aged two and five, and took him to a detention centre where he was kept overnight. During his interview an officer showed him two grainy CCTV photos of a suspect alongside a photo of Williams from his driving licence. All the photos showed a large Black man, but that’s where the similarity ended: it wasn’t Williams on CCTV but a completely different man. Williams held the photos up to his face and said “I hope you don’t think all Black people look alike”. The officer replied that “the computer must have got it wrong”.

Williams’ problems began several months before his arrest when video clips and images of the robbery from the CCTV camera were run through face recognition software used by the Detroit Police Department. The system has access to the photos from everyone’s driving licence and can compare different faces until it finds a potential match, and in this case it falsely identified Robert Williams. No system is ever perfect, but studies have shown that face recognition technology is often better at correctly matching lighter skinned faces than darker skinned ones.

Check the signature

The way face recognition works is not actually by comparing pictures but by comparing data. When a picture of a face is added to the system, essentially lots of measurements are taken, such as how far apart the eyes are or what the shape of the nose is. This gives a signature for each face made up of all the numbers, and that signature is added to the database. When looking for a match from, say, a CCTV image, the signature of the new image is first determined. Then algorithms look for the signature in the database “nearest” to the new one. How well it works depends on the particular features chosen, amongst many other things. If the features chosen are a poor way to distinguish particular groups of people then there will be lots of bad matches.

But how does it decide what is “nearest” anyway, given that in essence it is just comparing groups of numbers? Many algorithms are based on machine learning. The system might be trained on lots of faces and told which match and which don’t, allowing it to look for patterns that are good ways to predict matches. If, however, it is trained on mainly light skinned faces it is likely to be bad at spotting matches for faces of other ethnic backgrounds. It may actually decide that “all Black people look alike”.
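
The ‘signature’ idea can be sketched very simply. In the illustrative Python example below each face is boiled down to just three made-up measurements (real systems use far more, usually learned automatically rather than hand-picked), and the “nearest” face is the one with the smallest straight-line (Euclidean) distance between signatures. All the names and numbers are invented.

```python
import math

# Made-up signatures: a few measurements per face. Real systems use many
# more numbers, usually learned automatically from training images.
database = {
    "person A": (63.0, 41.5, 108.2),   # e.g. eye spacing, nose width, face length
    "person B": (58.4, 44.0, 112.9),
    "person C": (66.1, 39.8, 105.5),
}

def distance(sig1, sig2):
    """Straight-line (Euclidean) distance between two signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig1, sig2)))

def nearest(new_signature, signatures):
    """Return the name whose stored signature is closest to the new one."""
    return min(signatures, key=lambda name: distance(new_signature, signatures[name]))

cctv_signature = (64.0, 41.0, 107.5)   # signature computed from a CCTV image
match = nearest(cctv_signature, database)
print(f"{match} is the nearest match, but nearest is not the same as correct!")
```

The key point is that this kind of matching always returns the closest signature it can find, whether or not the right person is in the database at all; how good that match is depends entirely on how well the chosen measurements separate different faces.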

Biasing the investigation

However, face recognition is only part of the story. A potential match is only a pointer towards someone who might be a suspect and it’s certainly not a ‘case closed’ conclusion: there’s still work to be done to check and confirm. But as Williams’ lawyer, Victoria Burton-Harris, pointed out, once the computer had suggested Williams as a suspect it “framed and informed everything that officers did subsequently”. The man in the CCTV image wore a red baseball cap. It was for a team that Williams didn’t support (he’s not even a baseball fan) but no-one asked him about it. They also didn’t ask if he was in the area at the time (he wasn’t) or had an alibi (he did). Instead the investigators asked a security guard at the shop where the theft took place to look at some photos of possible suspects and he picked Williams from the line-up of images. Unfortunately the guard hadn’t been on duty on the day of the theft and had only seen the CCTV footage.

Robert Williams spent 30 hours in custody for a crime he didn’t commit after his face was mistakenly selected from a database. He was eventually released and the case dropped, but his arrest is still on record along with his ‘mugshot’, fingerprints and a DNA sample. In other words, he was wrongly picked from one database and has now been unfairly added to another. The experience for his whole family has been very traumatic and sadly his children’s first encounter with the police has been a distressing rather than a helpful one.

Remove the links

The American Civil Liberties Union (ACLU) has filed a lawsuit against the Detroit Police Department on Williams’ behalf for his wrongful arrest. It is not known how many people have been arrested because of face recognition technology but, given how widely it is used, it’s likely that others will have been misidentified too. The ACLU and Williams have asked for a public apology, for his police record to be cleared and for his images to be removed from any face recognition database. They have also asked that the Detroit Police Department stop using face recognition in their investigations. If Robert Williams had lived in New Hampshire he’d never have been arrested, as there is a law there which prevents face recognition software from being linked with driving licence databases.

In June 2020 Amazon, Microsoft and IBM denied the police any further access to their face recognition technology and IBM has also said that it will no longer work in this area because of concerns about racial profiling (targeting a person based on assumptions about their race instead of their individual actions) and violation of privacy and human rights. Campaigners are asking for a new law that protects people if this technology is used in future. But the ACLU and Robert Williams are asking for people to just stop using it – “I don’t want my daughters’ faces to be part of some government database. I don’t want cops showing up at their door because they were recorded at a protest the government didn’t like.”

Technology is only as good as the data and the algorithms it is based on. However, that isn’t the whole story. Even if very accurate, it is only as good as the way it is used. If as a society we want to protect people from bad things happening, perhaps some technologies should not be used at all.

– Jo Brodie and Paul Curzon, Queen Mary University of London



An earlier version of this article was originally published on the Teaching London Computing website where you can find references and further reading.


See more in ‘Celebrating Diversity in Computing’

We have free posters to download and some information about the different people who’ve helped make modern computing what it is today.

Screenshot showing the vibrant blue posters on the left and the muted sepia-toned posters on the right


One of the aims of our Diversity in Computing posters is to help a classroom of young people see the range of computer scientists, which includes people who look like them and people who don’t look like them. You can download the posters for free from our Celebrating Diversity in Computing page.




This page is funded by EPSRC on research agreement EP/W033615/1.
