Google’s Fitbit is a smartwatch that doesn’t just tell you the time: it can also monitor your movements and your heartbeat. A particular time of day when your heartbeat slows and you move much less is at night, when you’re fast asleep in bed.
Not everyone sleeps well though. Some people struggle to get to sleep, wake up often during the night, and so feel tired during the day. Fitbit’s “Sleep Profiles”, an AI-supported sleep tracking tool (available to Premium subscribers), may be able to help them. If the sleeper regularly wears their watch in bed, it can monitor their sleep, build up a picture of how long it takes them to fall asleep and how often they wake up, and offer some suggestions on how to get a better night’s rest.
So far Google has analysed 22 billion hours of sleep data from Fitbit users (who all agree to share their information so that they, and everyone else, can benefit from that shared knowledge). The team used unsupervised machine learning to find out more about the data. This method gives an artificial intelligence lots of information but doesn’t tell it what to look for. Instead, the AI was asked to cluster similar data together for the scientists to analyse and interpret. The result was six clusters of data showing the most common different ways that people sleep.
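The clustering idea can be sketched in a few lines of code. This is a toy version only, with invented numbers (Fitbit’s actual data and algorithm aren’t described here): a minimal k-means clusterer that groups similar sleep records together without ever being told what the groups mean.

```python
# Toy sleep records: (minutes to fall asleep, number of night wakings).
# These values are made up for illustration, not real Fitbit data.
sleepers = [
    (10, 1), (12, 0), (9, 2),    # fall asleep fast, few wakings
    (45, 1), (50, 2), (40, 0),   # slow to fall asleep
    (15, 6), (12, 7), (18, 5),   # light sleepers who wake often
]

def kmeans(points, centres, steps=10):
    """Minimal k-means: repeatedly assign each point to its nearest
    centre, then move each centre to the middle of its cluster."""
    for _ in range(steps):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: (p[0] - centres[i][0]) ** 2
                                      + (p[1] - centres[i][1]) ** 2)
            clusters[nearest].append(p)
        centres = [(sum(p[0] for p in c) / len(c),
                    sum(p[1] for p in c) / len(c))
                   for c in clusters if c]
    return clusters

# Start from three spread-out records (real k-means picks random starts).
groups = kmeans(sleepers, [sleepers[0], sleepers[3], sleepers[6]])
for g in groups:
    print(g)
```

Run on the toy data, the three natural sleeping styles fall out as three clusters, even though the algorithm was never told there were three styles, or what any of them meant.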
To make it easy for users to understand what the data meant, and how closely their own sleep pattern matched one of the clusters, Fitbit named each cluster after an animal. They took a bit of care over selecting the animals, as they wanted people to have positive associations (no one wants to be called a sloth, for example!), and came up with bear 🐻, tortoise 🐢, dolphin 🐬, giraffe 🦒, parrot 🦜 and hedgehog 🦔. People’s ‘sleep animals’ don’t stay the same though (just like our sleep), and you might be a dolphin one month and a tortoise the next. Tortoise-sleepers spend longer in bed but also take longer to fall asleep, while dolphin-sleepers sleep very lightly and tend to spend more time awake in bed.
Elena Perez, one of the product managers for Fitbit, said that parents of little children had told her that their children had seen the sleep-animal icon appear on their parents’ watch and knew that it was time to go to bed. Sweet dreams…
Did you know?
Dolphins and many birds use ‘unihemispheric sleep’, which means that one half of their brain (their brains, like ours, are divided into two hemispheres) falls asleep while the other stays awake. Then the hemispheres swap over!
Polina Bayvel, Professor of Optical Communications at UCL, and her team have just set a new speed record for sending data over real-world optical cable. They managed to send about 10 times more data than the best commercial services. Remarkably, this was done without changing the cable or other core infrastructure: the record was set over existing fibre running through a city centre, with all the interference that causes, and with all the grime, wear and tear that comes with real use.
What was the secret? Commercial fibre optics typically use wavelengths of 850, 1300 and 1550 nanometres. That is infrared light, which some animals can see, including some snakes, fish and insects (and vampire bats). Humans, however, need special cameras that convert infrared to the visible range before we can see it. What we can do is create lasers that send pulses of infrared at these wavelengths, and design hardware that turns infrared pulses back into data. Polina’s team developed special hardware that could send data over a much larger range of light frequencies than existing commercial systems, using wavelengths between 1264 and 1618 nanometres. By combining these many wavelengths they could send more data at the same time – but that is only useful if the hardware at the far end can extract the separate signals back out of the combined mess. The test showed that their hardware could do that in real-world conditions, sending data from their lab in central London out to a data centre at Canary Wharf over the existing cables, and back: around 10 miles in total.
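The core trick – many signals sharing one medium at different frequencies, then separated again at the far end – can be illustrated with a toy simulation. This is only a loose analogy in ordinary Python, not the team’s actual optical hardware: two ‘channels’ each get their own frequency, the transmitter adds their waves together, and the receiver uses a direct Fourier transform to pull each channel’s bit back out of the mix.

```python
import math

N = 64                    # samples in one block of the shared signal
channel_freqs = [5, 12]   # cycles per block, one frequency per channel
bits = {5: 1, 12: 0}      # the bit each channel sends: 1 = tone on

# Transmitter: add together a sine wave for every channel sending a 1.
signal = [sum(math.sin(2 * math.pi * f * n / N)
              for f in channel_freqs if bits[f])
          for n in range(N)]

# Receiver: a direct DFT measures how much of each frequency is
# present, separating the channels back out of the combined signal.
def magnitude(signal, f):
    re = sum(s * math.cos(2 * math.pi * f * n / N)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * f * n / N)
             for n, s in enumerate(signal))
    return math.hypot(re, im) / N

decoded = {f: int(magnitude(signal, f) > 0.25) for f in channel_freqs}
print(decoded)   # each channel's bit is recovered: {5: 1, 12: 0}
```

Because sine waves at different whole-number frequencies don’t interfere with each other’s measurements over a full block, each channel’s bit comes out cleanly even though the waves were added into one signal. The real system faces a far messier version of this separation problem, over light, at enormous speed.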
It means that in future we will be able to send far more data over existing cable networks with no need to replace the cables, so avoiding the extra time and costs (never mind the road works). The speed of 450 terabits per second is enough to stream 50 million films at the same time. No one actually needs to do that, of course. However, our technologies do seem to voraciously use up whatever capacity we create, and with the ever-increasing use of AI tools and their need for masses of data, it may well be that this ability to send more data is needed sooner than we might think.
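That ‘50 million films’ figure implicitly assumes a film stream of about 9 megabits per second, a typical rate for HD video (our assumption, not a number from the researchers). The arithmetic checks out:

```python
link = 450e12    # the record: 450 terabits per second
film = 9e6       # an assumed ~9 megabit-per-second HD film stream

# How many such streams fit down the link at once?
print(int(link / film))   # → 50000000, i.e. 50 million films
```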
Tony Stockman, who was blind from birth, was a Senior Lecturer at QMUL until his retirement. A leading academic in the field of sonification of data, turning data into sound, he eventually became the President of the “International Community for Auditory Display”: the community of researchers working in this area.
Traditionally, we put a lot of effort into finding the best ways to visualise data so that people can easily see the patterns in it. This is an idea that Florence Nightingale, of ‘lady of the lamp’ fame, pioneered with Crimean War data about why soldiers were dying. Data visualisation is considered so important that it is taught in primary schools, where we all learn about pie charts, histograms and the like. You can make a career out of data visualisation, for example working in the media creating visualisations for news programmes and newspapers, and finding a good visualisation is massively important for a researcher trying to help people understand their results. In Big Data, a good visualisation can help you gain new insights into what is really happening in your data. Those who can come up with good visualisations can become stars, because they can make such a difference (like Florence Nightingale, in fact).
Many people, of course, Tony included, cannot see, or are partially sighted, so visualisation is not much help! Tony therefore worked on sonifying data instead, exploring how you can map data onto sounds rather than imagery in a way that does the same thing: makes the patterns obvious and understandable.
His work in this area started with his PhD, where he was exploring how breathing affects changes in heart rate. He first needed a way to check for noise in the recordings and then a way to present the results so that he could analyse and understand them. So he invented a simple way to turn data into sound, for example mapping frequencies in the data to sound frequencies. By listening he could find places in his data where interesting things were happening and then investigate the actual numbers. He did this out of necessity, just to make it possible to do research, but decades later discovered there was by then a whole research community working on uses of, and good ways to do, sonification.
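A minimal sonifier in that spirit might map each data value to the pitch of a short tone, so a spike in the data jumps out to the ear. This sketch (the mapping and the numbers are our invention, not Tony’s actual tool) writes the result to a WAV file using only Python’s standard library:

```python
import math
import struct
import wave

# Made-up readings: a steady baseline with one clear spike at index 4,
# which will sound as a sudden jump in pitch.
data = [1.0, 1.1, 0.9, 1.0, 3.5, 1.0, 0.95, 1.05]
RATE = 8000    # audio samples per second
TONE = 0.2     # seconds of sound per data point

frames = bytearray()
for value in data:
    freq = 200 + 200 * value   # map the data value to a pitch in Hz
    for n in range(int(RATE * TONE)):
        # 16-bit signed sample of a sine wave at that pitch
        sample = int(20000 * math.sin(2 * math.pi * freq * n / RATE))
        frames += struct.pack('<h', sample)

with wave.open('sonified.wav', 'wb') as w:
    w.setnchannels(1)    # mono
    w.setsampwidth(2)    # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(bytes(frames))
```

Play the resulting file and the spike in the data is unmissable, even though you never looked at a single number.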
He went on to explore how sonification could be used to give overviews of data for both sighted and non-sighted people. We are very good at spotting patterns in sound – that is all music is after all – and abnormalities from a pattern in sound can stand out even more than when visualised.
Another area of his sonification research involved developing auditory interfaces, for example to allow people to hear diagrams. One of the most famous, successful data visualisations is the London Tube Map, designed by Harry Beck, who is now famous as a result, because of the way it made the network so easy to understand using abstract nodes and lines that ignore distances. Tony’s team explored ways to present similar node-and-line diagrams, what computer scientists call graphs. After all, it is all well and good having screen readers to read text, but it’s not a lot of good if all the screen reader can tell you, reading the ALT text, is that you have the Tube Map in front of you. Graphs like this are used in all sorts of everyday situations, but are especially important if you want to get around on public transport.
There is still a lot more to be done before media that involves imagery as well as text is fully accessible, but Tony showed that it is definitely possible to do better. He also showed throughout his career that being blind did not have to hold him back from being an outstanding computer scientist as well as a leading researcher, even if he did have to innovate for himself from the start to make it possible.
Big companies know far more about you than you think. You have very little privacy from their all-seeing algorithms. They may even have worked out some very, very personal things about you, that even your parents don’t know…
An outraged father in Minneapolis stormed into a branch of a supermarket chain complaining that his school-aged daughter was being sent coupons for baby clothes. The shop manager apologised… but later it turned out there was no mistake in the tiny-tot offers. The teenager was expecting a baby but had not told her father. Her situation was revealed not by a crystal ball but by an algorithm. The shop was using Big Data processing algorithms that noticed patterns in her shopping that they had linked to “pregnant”. They had even worked out her likely delivery date. Her buying habits had triggered targeted marketing.
Algorithms linked her shopping patterns to “pregnant”
When we use a loyalty card or an online account our sales activity is recorded. This data is added to a big database, with our details, the time, date, location and products bought (or browsed). It is then analysed. Patterns in behaviour can be tracked; our habits, likes, dislikes and even changes in our personal situation deduced, based on those patterns. Sometimes this seems quite useful, other times a bit annoying. It can surprise us, and it can be wrong.
This kind of computing is not just used to sell products, it is also used to detect fraud and to predict where the next outbreak of flu will happen. Our banking behaviour is tracked to flag suspicious transactions and help stop theft and money laundering. When we search for ‘high temperature’ our activity might be added to the data used to predict flu trends. However, the models are not always right as there can be a lot of ‘noise’ in the data. Maybe we bought baby clothes as a present for our aunt, and were googling temperatures because we wanted to go somewhere hot for our holiday.
Whether the predictions are spot on or not is perhaps not the most important thing. Maybe we should be considering whether we want our data saved, mined and used in these ways. A predictive pregnancy algorithm seems like an invasion of privacy, even like spying, especially if we don’t know about it. Predictive analytics is big; big data is really big and big business wants our data to make big profits. Think before you click!
Jane Waite, Queen Mary University of London (now at Raspberry Pi)
In the 2025 RHS Chelsea Flower Show there was one garden that was about technology as well as plants: the Avanade Intelligent Garden, exploring how AI might be used to support plants. Each of the trees contained probes that sensed and recorded data about it, which could then be monitored through an app. This takes pioneering research from over two decades ago a step further, incorporating AI into the picture and making it mainstream. Back then a team led by Yvonne Rogers built an ambient wood aiming to add excitement to a walk in the woods...
Mark Weiser had a dream of ‘Calm Computing’ and while computing sometimes seems ever more frustrating to use, the ideas led to lots of exciting research that saw at least some computers disappearing into the background. His vision was driven by a desire to remove the frustration of using computers but also the realization that the most profound technologies are the ones that you just don’t notice. He wanted technology to actively remove frustrations from everyday life, not just the ones caused by computers. He wrote of wanting to “make using a computer as refreshing as taking a walk in the woods.”
Not calm, but engaging and exciting!
No one argues that computers should be frustrating to use, but Yvonne Rogers, then of the Open University, had a different idea of what the new vision could be. Not calm. Anything but calm in fact (though not frustrating either, of course). Not calm, but engaging and exciting!
Her vision of Weiser’s tranquil woods was not relaxing but provocative and playful. To prove the point her team turned some real woods in Sussex into an ‘Ambient Wood’. The ambient wood was an enhanced wood. When you entered it you took probes with you, that you could point and poke with. They allowed you to take readings of different kinds in easy ways. Time hopping ‘Periscopes’ placed around the woods allowed you to see those patches of woodland at other times of the year. There was also a special woodland den where you could then see the bigger picture of the woods as all your readings were pulled together using computer visualisations.
Not only was the Ambient Wood technology visible and in your face but it made the invisible side of the wood visible in a way that provoked questions about the wildlife. You noticed more. You saw more. You thought more. A walk in the woods was no longer a passive experience but an active, playful one. Woods became the exciting places of childhood stories again but now with even more things to explore.
The idea behind the Ambient Wood, and similar ideas like Bristol’s Savannah project, where playing fields are turned into African Savannah, was to revisit the original idea of computers but in a new context. Computers started as tools, and tools don’t disappear, they extend our abilities. Tools originally extended our physical abilities – a hammer allows us to hit things harder, a pulley to lift heavier things. They make us more effective and allow us to do things a mere human couldn’t do alone. Computer technology can do a similar thing but for the human intellect…if we design them well.
“The most important thing the participants gained was a sense of wonderment at finding out all sorts of things and making connections through discovering aspects of the physical woodland (e.g., squirrel’s droppings, blackberries, thistles)”
– Yvonne Rogers
The Weiser dream was that technology invisibly watches the world and removes the obstacles in the way before you even notice them. It’s a little like the way servants to the aristocracy were expected to always have everything just right, but at the same time were not to be noticed by those they served. The way this is achieved is to have technology constantly monitoring, understanding what is going on and how it might affect us, and then calmly fixing things. The problem at the time was that this needs really ‘smart’ technology – a high level of artificial intelligence – to achieve, and that proved more difficult than anyone imagined (though perhaps we are now much closer than we were). Our behaviour and desires are full of subtlety and much harder to read than was imagined. Even a super-intellect would probably keep getting it wrong.
There are also ethical problems. If we do ever achieve the dream of total calm we might not like it. It is very easy to be gung ho with technology and not realize the consequences. Calm computing needs monitors – the computer measuring everything it can so it has as much information as possible to make decisions from (see Big Sister is Watching You).
A classic example of how this can lead to people rejecting technology intended to help comes from a project to make a ‘smart’ residential home for the elderly. The idea was that by wiring up the house to track and monitor the residents, the nurses would be able to provide much better care, and relatives would be able to see how things were going. The place was filled with monitors. For example, sensors in the beds measured residents’ weight while they slept. Each night an occupant’s weight could invisibly be taken and the nurses alerted to worrying weight loss over time. The smart beds could also detect tossing and turning, so someone having bad nights could be helped. A smart house could use similar technology to help you or me have a good night’s sleep and help us diet.
The problem was the beds could tell other things too: things that the occupants preferred to keep to themselves. Nocturnal visitors also showed up in the records. That’s the problem if technology looks after us every second of the day, the records may give away to others far more than we are happy with.
Yvonne’s vision was different. It was not that the computers try to second-guess everything, but instead that they extend our abilities. It is quite easy for new technology to leave us intellectually poorer than we were. Calculators are a good example. Yes, we can do more complex sums quickly now, but at the same time, without a calculator, many people can’t do the sums at all. Our abilities have both improved and been damaged at the same time. Generative AI currently seems to be heading the same way. What the probes do, instead, is extend our abilities, not reduce them: allowing us to see the woods in a new way, but to use the information however we wish. The probes encourage imagination.
The alternative to the smart house (or calculator) that pampers you, allowing your brain to stay in neutral, or the residential home that monitors you for the sake of the nurses and your relatives, is one where the sensors are working for you. Where you are the one the bed reports to, helping you make decisions about your health, or where the monitors you wear are (only) part of a game that you play because it’s fun.
What next? Yvonne suggested the same ideas could be used to help learning and exploration in other ways, understanding our bodies: “I’d like to see kids discover new ways of probing their bodies to find out what makes them tick.”
So if Yvonne’s vision is ultimately the way things turn out, you won’t be heading for a soporific future while the computer deals with real life for you. Instead it will be a future where the computers are sparking your imagination, challenging you to think, filling you with delight…and where the woods come alive again just as they do in the storybooks (and in the intelligent garden).
– Paul Curzon, Queen Mary University of London
(adapted from the archive)
This blog is funded by EPSRC on research agreement EP/W033615/1.
Video games can be a very successful way to do citizen science, getting ordinary people involved in research. Sea Hero Quest is an extremely successful example. It involves a boy setting out on a sea quest to recover his father’s memories, lost to dementia. The hundreds of thousands of people joining the quest have helped researchers better understand our ability to navigate.
The Sea Hero Quest project was led by Deutsche Telekom, working with both universities and Alzheimer’s Research UK. The first mass-market game of its kind, it has allowed researchers to explore the navigation and related cognitive abilities of people throughout their lives. The game has 75 levels, each with different kinds of task in different environments, and has been played by millions of people around the world for over 100 years of combined game time. The amount of data collected is vast and would have taken researchers centuries to collect by traditional means, if it were possible at all.
For example, an international team including researchers from UCL, the University of Lyon and the University of Münster used the game to explore how the place people grew up affects their ability to navigate. As well as more general data from around 400,000 people across the world, they also used the data specifically from people who had completed all levels of the game. This amounted to around ten thousand adults of all ages.
They found that people are best at navigating in situations similar to where they grew up (where they lived at the time of playing the game had no effect). So, for example, people who grew up in an American grid-like city such as Chicago, were better at navigating in grid-based levels. Those who grew up in cities such as Prague in Europe, where the streets are more wiggly and chaotically laid out, were better at levels needing similar navigation skills. Throughout, the researchers found that those that grew up in the countryside were better at navigating overall as well as specifically in more unstructured environments.
Sea Hero Quest shows that games designers, if they can create fun but serious games, can help us all help researchers… It is often said that playing video games is bad for growing brains, but this research also shows that the way we design our cities affects the way we think, and can be bad for our brains!
Herman Hollerith (Image from wikimedia, Public Domain)
Herman Hollerith, the son of immigrants, struggled early on at school and then later in bookkeeping at college but it didn’t stop him inventing machines that used punch cards to store data. He founded a company to make and sell his machines. It turned into the company now called IBM, which of course helped propel us into the computer age.
Hollerith had worked as a census clerk for a while, and the experience led to his innovation. The United States has been running a national census every 10 years since the American Revolution, aiming to record the details of every person, for tax and national planning purposes. It is not just a count but has recorded information about each person such as male/female, married or not, ethnicity, whether they can read, disabilities, and so on.
As the population expanded, the census of course became harder to run. It was also made harder as more data about each person was collected over time. For the 1890 census, a competition was held to find better ways to compile the data collected. Herman Hollerith won it with his punch-card-based machine. It could process data up to twice as fast as its competitors, and with his system the data could be prepared 10 times faster.
To use the machine, the census information for each person was recorded by punching holes in special cards at specific positions. It was a binary system, with a hole essentially meaning the specific feature was present (e.g. they were married) and no hole meaning it wasn’t (e.g. they were single). Holes against numbers could also select one of several options.
Hollerith punched card (Image from wikimedia, Public Domain)
The machine could read the holes because they allowed a wire to make an electrical connection to a pool of mercury below: the holes just acted as switches. Data could therefore be counted automatically, with each hole adding one to a different counter. It was the first time that a system of machine-readable data had been used, and of course binary went on to be the way all computers store information. In processing the census his machines counted the data on around 100 million cards (an early example of Big Data processing!). This helped reduce the time it took to compile the data from the whole country by two years. It also saved about $5 million.
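Hollerith’s encode-and-count scheme is easy to mimic in software. In this sketch the column meanings are invented for illustration: each yes/no answer becomes a ‘hole’ (a 1) in a fixed column, and tabulating a stack of cards just totals each column, much as each hole advanced a counter dial on the real machine.

```python
# Each yes/no question gets a fixed column on the card.
# (These column meanings are made up for illustration.)
COLUMNS = ['female', 'married', 'can_read', 'has_disability']

def punch(person):
    """Encode a person's answers as holes (1s) on a card."""
    return [1 if person.get(col) else 0 for col in COLUMNS]

def tabulate(cards):
    """Count the holes in each column, as Hollerith's machine did:
    every hole closes a circuit and advances that column's counter."""
    return {col: sum(card[i] for card in cards)
            for i, col in enumerate(COLUMNS)}

cards = [punch({'female': True, 'married': True, 'can_read': True}),
         punch({'female': False, 'can_read': True}),
         punch({'female': True, 'has_disability': True})]
print(tabulate(cards))
# → {'female': 2, 'married': 1, 'can_read': 2, 'has_disability': 1}
```

The real machines did this electromechanically for around 100 million cards; the principle of fixed, machine-readable binary columns is the same.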
Hollerith patented the machine and was also awarded a PhD for his work on it. He set up a company to sell it, called the Tabulating Machine Company. Over time it merged with other companies until eventually, in 1924, the resulting company changed its name to International Business Machines or, as it is now known, IBM. It is of course one of the most important companies driving the computer age, building early mainframe computers the size of rooms that revolutionised business computing, but later also responsible for the personal computer, leading to the idea that everyone could own a computer.
Not a bad entrepreneurship legacy for someone who early on at school apparently struggled with, and certainly hated, spelling – he once jumped out of a window at school to avoid doing it. He also did badly at bookkeeping in college. He was undeterred by what he was poor at, though, and focussed on what he was good at. He was hard-working and developed his idea for a mechanical tabulating machine for 8 years before his first machine went to work. Patience and determination certainly paid off for him!
Biologists often analyse data about the cell biology of living animals to understand their development. A large part of this involves looking for patterns in the data to use to refine their understanding of what is going on. The trouble is that patterns can be hard to spot when hidden in the vast amount of data that is typically collected. Humans are very good at spotting patterns in sound though – after all that is all music is. So why not turn the data into sound to find these biological patterns?
In hospitals, the heartbeats of critically ill patients are monitored by turning the data from heart monitors into sounds. Under the sea, in (perhaps yellow) submarines, “golden ear” mariners use their listening talent to help with navigation and to detect potential danger for fish and the submarine. They do this by listening to soundscapes produced by sonar, built up from echoes from the objects round about. This way of using sounds to represent other kinds of data is called ‘sonification’. Perhaps similar ideas can help to find patterns in biological data?

An interdisciplinary team of researchers from Queen Mary, including biologist Rachel Ashworth, audio experts Mathieu Barthet and Katy Noland, and computer scientist William Marsh, tried the idea out on the zebrafish. Why zebrafish? Well, they are used a lot for the study of the development of vertebrates (animals with backbones). In fact the zebrafish is what is called a ‘model organism’: a creature that lots of people do research on as a way of building a really detailed understanding of its biology. The hope is that what you learn about zebrafish will help you understand the biology of other vertebrates too. Zebrafish make a good model organism because they mature very quickly. Their embryos are also transparent. That is really useful when doing experiments because it means you can directly see what is going on inside their bodies using special kinds of microscopes.
The particular aspect of zebrafish biology the Queen Mary team has been investigating is the way calcium signals are used by the body. Changes in the concentration of calcium ions are important as they are used inside a cell to regulate its behaviour. These changes can be tracked in zebrafish by injecting fluorescent dyes into cells. Because the zebrafish embryos are transparent whatever has been fluorescently labelled can then be observed.
Calcium ions are used inside a cell to regulate its behaviour
The Queen Mary team developed software that detects calcium changes by automatically spotting the peaks of activity over time. They relied on a technique used in music signal processing to detect the start of notes in musical sequences. Finding the peaks in a zebrafish calcium signal and the notes of the Beatles’ Day Tripper riff may seem light years apart, but from a signal processing point of view the problems are similar. Both involve detecting sudden bursts of energy in a signal. Once the positions of the calcium peaks have been found, they can then be monitored by sonifying the data.
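The idea of spotting sudden bursts can be sketched with a very simple peak detector (real onset detectors in music signal processing are considerably more sophisticated than this): flag any point that rises above a threshold and above both of its neighbours.

```python
def find_peaks(signal, threshold):
    """Return the indices of points that stand above the threshold
    and above their immediate neighbours - a crude 'burst' detector."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if (signal[i] > threshold
                and signal[i] > signal[i - 1]
                and signal[i] >= signal[i + 1]):
            peaks.append(i)
    return peaks

# A made-up "calcium activity" trace: quiet baseline with two bursts.
trace = [0.1, 0.2, 0.1, 1.5, 0.3, 0.1, 0.2, 2.1, 0.4, 0.1]
print(find_peaks(trace, threshold=1.0))   # → [3, 7]
```

Once you have the peak positions, each could trigger a sound, which is exactly the point where detection hands over to sonification.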
What the team found using this approach is that the calcium activity in the muscle cells of zebrafish varies a lot between early developmental stages of the embryo and the late ones. You can have a go at hearing the difference yourself – listen to the sonified versions of the data.
Train timetables are complex. When designing a timetable for railways you have to think about the physical capabilities of the actual train, what stops it needs to make, whether it is carrying passengers or freight, the number of platforms at a station, the gradient of the track, and the placement of passing loops on single-track sections, amongst many other things. Data visualisation can help with timetabling and make sure our railways continue to run on track!
Data visualisation is an important area in computer science. If you had a huge amount of complex data in a spreadsheet, your first thought wouldn’t be to sit down with a cup of tea and spend hours reading through it – instead you might graph it or create an infographic to get a better picture. Humans are very bad at understanding and processing raw data, so we speed up the process by converting it to something easier to understand.
Timetabling is like this – we need to consider the arrival and departure times from all stations for each train. You might have used a (perhaps now) old fashioned paper timetable, with each train as a column, and the times at each station along the rows, like the one below. This is great if you’re a passenger… you can see clearly when your train leaves, and when it gets to your desired destination. If you’re unlucky enough to miss a train, you can also easily scan along to find the next one.
Image by Daniel Gill for CS4FN
Unfortunately, this kind of presentation might be more challenging for timetable designers. In this timetable, there’s a mix of stopping and fast services. You can see which of them are fast based on the number of stations they skip (marked with a vertical line), but, because they travel at different speeds it’s difficult to imagine where they are on the railway line at any one time.
One of the main challenges in railway timetabling, and perhaps the most obvious, is that trains can’t easily overtake slower ones in front of them. It’s this quirk that causes lots of problems. So, if you needed to insert another train into this timetable, you would need to consider all the departure times of the trains around it to make sure there are no conflicts – this is a lot of data to juggle.
But there’s an easier way to visualise these timetables: introducing Marey charts! They represent a railway on a graph, with stations listed vertically, time along the top, and each train represented by a single (bumpy) line. If we take our original timetable from above and convert it to a Marey chart, we get something that looks like this:
Image by Daniel Gill for CS4FN
Though thought to have been invented by a lesser-known railway engineer called Charles Ibry, these charts were popularised by Étienne-Jules Marey, and (perhaps unfairly) take his name.
How does it work?
There are a few things that you might notice immediately from this diagram. The stations along the side aren’t equally spaced, as you might expect from other types of graph; instead they are spaced relative to the distances between the stations on the actual railway. This means we can estimate when a fast train will pass each of the stations. This is an estimation, of course, because the train won’t be travelling at a constant speed throughout – but it’s better than our table from before, which is no help at all!
Given this relative spacing, we can also estimate how fast a train is going. The steepness of the line, in this diagram, directly reflects the speed of the train*. Look at the dark blue and purple trains – they both leave Coventry really close together, but the purple train is a bit slower, so the gap widens near Birmingham International. We can also see that trains that do lots of stopping (when the line is horizontal) travel at a much slower average speed than the fast ones: though that shouldn’t be a surprise!
*There’s a fun reason that this is the case. The gradient (the steepness of the line) is calculated as the change in y divided by the change in x. In this case, the change in the y dimension is the distance the train has travelled, and the change in x is the time it has taken. If you have studied physics, you might immediately recognise that distance divided by time is speed (or velocity). Therefore, the steepness in a Marey chart is proportional to the speed of the train.
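The footnote’s calculation is easy to turn into code. The distances and times here are invented for illustration:

```python
def average_speed(dist_km_a, time_min_a, dist_km_b, time_min_b):
    """Average speed in km/h between two points on a Marey chart line:
    the gradient, change in y (distance) over change in x (time)."""
    dy = dist_km_b - dist_km_a            # distance travelled, in km
    dx = (time_min_b - time_min_a) / 60   # time taken, in hours
    return dy / dx

# A train passes the 10 km mark at minute 12
# and the 30 km mark at minute 27.
print(average_speed(10, 12, 30, 27))   # → 80.0 km/h
```

A horizontal stretch of line (distance unchanged while time passes) gives a gradient of zero: the train is stopped at a station, just as the chart shows.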
We can also see that the lines don’t intersect at all. This is good, because, quite famously, trains can’t really overtake. If there was an intersection it would mean that at some point, two trains would need to be at the same location at the same time. Unless you’ve invented some amazing quantum train (more about the weirdness of quantum technology in this CS4FN article), this isn’t possible!
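Conflict-checking can also be done in code rather than by eye. A simplified sketch: describe each train’s Marey line as (time, position) points, sample both trains’ positions over time, and report a conflict if one train ever overtakes the other, because on a single line their Marey lines would have to cross.

```python
def position_at(schedule, t):
    """Linearly interpolate a train's position (km) at time t (min)
    from its (time, position) points - its Marey chart line."""
    for (t0, p0), (t1, p1) in zip(schedule, schedule[1:]):
        if t0 <= t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
    return None   # the train isn't on this stretch at time t

def conflicts(train_a, train_b, start, end):
    """True if the trains' Marey lines cross during [start, end]."""
    last_sign = None
    for t in range(start, end + 1):
        pa, pb = position_at(train_a, t), position_at(train_b, t)
        if pa is None or pb is None:
            continue
        sign = (pa - pb) > 0
        if last_sign is not None and sign != last_sign:
            return True   # one train has "passed through" the other
        last_sign = sign
    return False

slow = [(0, 0), (60, 30)]             # 30 km in an hour
fast = [(10, 0), (40, 30)]            # the same stretch in half the time
print(conflicts(slow, fast, 0, 60))   # → True: their lines cross
```

A real timetabling tool would need to handle stations, minimum headways between trains and passing loops, but the core test (do any two lines on the chart cross?) is the same.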
Putting it to the Test
Put yourself in the shoes of a railway timetable designer! We have just heard that there is a freight train that needs to run through our little section of railway. The driver needs to head through sometime between 10:45 and 12:15 – how very convenient: we’ve already graphed that period of time.
The difficulty is, though, that their freight train is going to take a very slow 45 minutes to go through our section of railway – how are we going to make it fit? Let’s use the Marey chart to solve this problem visually. Firstly, we’ll put a line on that matches the requirements of the freight train:
Image by Daniel Gill for CS4FN
And then let’s re-enable all the other services.
Image by Daniel Gill for CS4FN
Well, that’s not going to work. We can see from this, though, how slow this freight train actually is, especially compared to the express trains it overlaps with. So, to fix this, we can shift it over. We want to aim for a placement where there are no overlaps at all.
Image by Daniel Gill for CS4FN
Perfect: now it can make the journey without interfering with our other services at all.
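That "shift it over" step can even be automated: try each candidate departure minute in turn and keep the first one whose freight line crosses no other line. A minimal Python sketch – the three services and the 19-mile section below are made up, not the real Coventry–Birmingham data:

```python
# Sliding the freight train's line along the chart until it crosses
# nothing. Each train is (start_min, end_min, start_mile, end_mile);
# all of the timetable data below is invented for illustration.

def pos(train, t):
    """Position (miles) of a constant-speed train at time t (minutes)."""
    t0, t1, d0, d1 = train
    return d0 + (t - t0) * (d1 - d0) / (t1 - t0)

def crosses(a, b):
    """True if the two trains' Marey lines meet while both are running."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo >= hi:
        return False                # never on the section together
    g0 = pos(a, lo) - pos(b, lo)    # gap at the start of the shared window
    g1 = pos(a, hi) - pos(b, hi)    # gap at the end of the shared window
    return g0 * g1 <= 0            # sign change (or a touch) = conflict

# Three existing services over a 19-mile section:
services = [(0, 20, 0, 19), (15, 45, 0, 19), (50, 70, 0, 19)]

def first_free_slot(earliest, latest, journey_mins=45):
    """Earliest departure minute whose freight line crosses no service."""
    for t in range(earliest, latest + 1):
        freight = (t, t + journey_mins, 0, 19)
        if not any(crosses(freight, s) for s in services):
            return t
    return None

print(first_free_slot(0, 90))  # -> 16 with this made-up timetable
```

This brute-force search works fine for one train on a short section; real timetable planners face the same puzzle across thousands of trains at once, which is why they lean so heavily on visual tools like the Marey chart.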
Solving Problems
When we’re given a difficult problem, it’s often a good idea to find a way to visualise it (or as my A-Level physics teacher often reminded me: “draw a diagram!”). This kind of visualisation is used regularly in computer science. From students learning the craft, all the way to programmers and academics at the top of their field – they all use diagrams to help understand a problem.
You might not think of a programming language like Python or Scratch as being an ‘ecosystem’ but each language has its own community of people who create and improve its code (compilers, library code,…), flush out the bugs, introduce new features, document any changes and write the ‘how to’ guides for new users.
R is one such programming language. It’s named after its two co-inventors (Ross Ihaka and Robert Gentleman) and is used by around two million people around the world. People working in all sorts of jobs and industries (for example finance, academic research, government, data journalists) use R to analyse their data. The software has useful tools to help people see patterns in their data and to make sense of that information.
It’s also open source which means that anyone can use it and help to improve it, a bit like Wikipedia where anyone can edit an article or write a new one. That’s generally a good thing because it means everyone can contribute but it can also bring problems. Imagine writing an essay about an event at your school and sharing it with your class. Then imagine your classmates adding paragraphs of their own about the event, or even about different events. Your essay could soon become rather messy and you’d need to re-order things, take bits out and make sure people hadn’t repeated something that someone had already said (but in a slightly different way).
When changes are made to software people also want to keep a note not just of the ‘words’ added (the code) but also to make a note of who added what and when. Keeping good records, also known as documentation, helps keep things tidy and gives the community confidence that the software is being properly looked after.
Code and documentation can easily become a bit chaotic when created by different people in the community so there needs to be a core group of people keeping things in order. Fortunately there is – the ‘R Core Team’ – but these days its membership doesn’t really reflect the community of R users around the world. R was first used in universities, particularly by more privileged statistics professors from European countries and North America (the Global North), and so R’s development tended to be more in line with their academic interests. R needs input and ideas from a more diverse group of active developers and decision-makers, in academia and beyond, to ensure that the voices of minoritised groups are included. Younger voices are needed too, particularly as many of the current core group are approaching retirement age.
Dr Heather Turner from the University of Warwick is helping to increase the diversity of those who develop and maintain the R programming language and she’s been given funding by the EPSRC* to work on this. Her project is a nice example of someone who is bringing together two different areas in her work. She is mixing software development (tech skills) with community management (people skills) to support a range of colleagues who use R and might want to contribute to developing it in future, but perhaps don’t feel confident to do so yet.
Development can involve things like fixing bugs, helping to improve the behaviour or efficiency of programs, or translating error messages that currently appear on-screen in English into other languages. Heather and her colleagues are working with the R community to create a more welcoming environment for ‘newbies’ that encourages participation, particularly from people who are in the community but are currently unrepresented, or under-represented, in the core group. She’s working collaboratively with other community organisations such as R-Ladies, LatinR and RainbowR, and another task she’s involved in is producing an easier-to-follow ‘How to develop R’ guide.
There are also people who work in universities but who aren’t academics (they don’t teach or do research but do other important jobs that help keep things running well) and some of them use R too and can contribute to its development. However, their contributions have been less likely to get proper recognition or career rewards compared with those made by academics, which is a little unfair. That’s largely because of the way the academic system is set up.
Generally it’s academics who apply for funding to do new research; they do the research and then publish papers on it in academic journals, and these publications are evidence of their work. But the important work that support staff do in maintaining the software isn’t classified as new research, so it doesn’t generally make it into the journals and their contribution can get left out. They also don’t necessarily get the same career support or mentoring for their development work. This can make people feel a bit sidelined or discouraged.
To try and fix this and to make things fairer the Society of Research Software Engineering was created to champion a new type of job in computing – the Research Software Engineer (RSE). These are people whose job is to develop and maintain (engineer) the software that is used by academic researchers (sometimes in R, sometimes in other languages). The society wants to raise awareness of the role and to build a community around it. You can find out what’s needed to become an RSE below.
Heather is in a great position to help here too, as she has a foot in each camp – she’s both an academic and a Research Software Engineer. She’s helping to establish the RSE as an important role in universities while also broadening the diversity of people involved in developing R, supporting its long-term sustainability.
Below is an example of a Research Software Engineer role which was advertised at QMUL in April 2024 – you can read the original advert and see a copy of the job description / person specification information which is archived at the “Jobs in Computer Science” website. This advert was looking for an RSE to support a research project “at the intersection of Natural Language Processing (NLP) and multi-modal Machine Learning, with applications in mental health.”
QMUL also has a team of Research Software Engineers and you can read about what they’re working on and their careers here (there are also RSEs attached to different projects across the university, as above).
Below are some examples of RSE jobs (these particular vacancies have now closed but you can read about what they were looking for and see if that sort of thing might interest you in the future). The links will take you to a page with the original job advert + any Job Description (JD – what the person would actually be doing) and might also include a Person Specification (PS – the type of person they’re looking for in terms of skills, qualifications and experience) – collectively these are often known as ‘job packs’.
Note that these documents are written for quite a technical audience – the people who’d apply for the jobs will have studied computer science for many years and will be familiar with how computing skills can be applied to different subjects.
1. The Science and Technology Facilities Council (STFC) wanted four Research Software Engineers (who’d be working either in Warrington or Oxford) on a chemistry-related project (‘computational chemistry’ – “a branch of chemistry that uses computer simulation to assist in solving chemical problems”)
2. The University of Cambridge was looking for a Research Software Engineer to work in the area of climate science – “Computational modelling is at the core of climate science, where complex models of earth systems are a routine part of the scientific process, but this comes with challenges…”
3. University College London (UCL) wanted a Research Software Engineer to work in the area of neuroscience (studying how the brain works, in this case by analysing the data from scientists using advanced microscopy).
This blog is funded by EPSRC on research agreement EP/W033615/1.