Tony Stockman: Sonification

Two different coloured wave patterns superimposed on one another on a black background with random dots like a starscape.
Image by Gerd Altmann from Pixabay

Tony Stockman, who was blind from birth, was a Senior Lecturer at QMUL until his retirement. A leading academic in the field of sonification of data, turning data into sound, he eventually became the President of the “International Community for Auditory Display”: the community of researchers working in this area.

Traditionally, we put a lot of effort into finding the best ways to visualise data so that people can easily see the patterns in it. This is an idea that Florence Nightingale, of 'lady of the lamp' fame, pioneered with Crimean War data about why soldiers were dying. Data visualisation is considered so important it is taught in primary schools, where we all learn about pie charts and histograms and the like. You can make a career out of data visualisation, working in the media creating visualisations for news programmes and newspapers, for example, and as a researcher finding a good visualisation is massively important to help people understand your results. In Big Data a good visualisation can help you gain new insights into what is really happening in your data. Those who can come up with good visualisations can become stars, because they can make such a difference (like Florence Nightingale, in fact).

Many people, of course, Tony included, cannot see or are partially sighted, so visualisation is not much help! Tony therefore worked on sonifying data instead, exploring how you can map data onto sounds rather than imagery in a way that does the same thing: makes the patterns obvious and understandable.

His work in this area started with his PhD, where he was exploring how breathing affects changes in heart rate. He first needed a way to check for noise in the recordings and also a way to present the results so that he could analyse and so understand them. So he invented a simple way to turn data into sound, for example mapping frequencies in the data onto sound frequencies. By listening he could find places in his data where interesting things were happening and then investigate the actual numbers. He did this out of necessity, just to make it possible to do research, but decades later discovered there was by then a whole research community working on uses of, and good ways to do, sonification.
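To get a flavour of the idea, here is a minimal Python sketch (our own illustration, not Tony's actual software) that maps each value in a series of data onto a pitch and writes the result out as a sound file, so that a sudden jump in the data becomes an audible jump in pitch.

```python
# A minimal sketch of the idea (our own illustration, not Tony's actual
# software): map each data value to a pitch, so bigger values become higher
# notes, and write the result to a WAV file using only the standard library.
import math
import struct
import wave

def sonify(values, filename="data.wav", rate=44100, note_len=0.2,
           low=220.0, high=880.0):
    """Play each data value as a short tone: bigger value = higher pitch."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    samples = []
    for v in values:
        freq = low + (v - lo) / span * (high - low)   # linear map to Hz
        for n in range(int(rate * note_len)):
            samples.append(int(16000 * math.sin(2 * math.pi * freq * n / rate)))
    with wave.open(filename, "w") as f:
        f.setnchannels(1)
        f.setsampwidth(2)          # 16-bit samples
        f.setframerate(rate)
        f.writeframes(b"".join(struct.pack("<h", s) for s in samples))

# A heart-rate-like series: listen for where the pattern suddenly changes
sonify([72, 74, 71, 73, 95, 96, 94, 72, 73, 71])
```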

He went on to explore how sonification could be used to give overviews of data for both sighted and non-sighted people. We are very good at spotting patterns in sound – that is all music is after all – and abnormalities from a pattern in sound can stand out even more than when visualised.

Another area of his sonification research involved developing auditory interfaces, for example to allow people to hear diagrams. One of the most famous and successful data visualisations is the London Tube Map, designed by Harry Beck, who is now famous as a result: it made the network so easy to understand by using abstract nodes and lines that ignored real distances. Tony's team explored ways to present similar node and line diagrams, what computer scientists call graphs. After all, it is all well and good having screen readers to read text, but it's not a lot of good if all a screen reader can tell you, reading the ALT text, is that you have the Tube Map in front of you. This kind of graph is used in all sorts of everyday situations, but graphs like the Tube Map are especially important if you want to get around on public transport.

There is still a lot more to be done before media that involves imagery as well as text is fully accessible, but Tony showed that it is definitely possible to do better. He also showed throughout his career that being blind did not have to hold him back from being an outstanding computer scientist as well as a leading researcher, even if he did have to innovate for himself from the start to make it possible.


Crystal ball coupons – what your data might be giving away

Big companies know far more about you than you think. You have very little privacy from their all-seeing algorithms. They may even have worked out some very, very personal things about you, that even your parents don’t know…

An outraged father in Minneapolis stormed into a supermarket chain complaining that his school-aged daughter was being sent coupons for baby clothes. The shop manager apologised … but later they found there was no mistake in the tiny tot offers. The teenager was expecting a baby but had not told her father. Her situation was revealed not by a crystal ball but by an algorithm. The shop was using Big Data processing algorithms that noticed patterns in her shopping that they had linked to “pregnant”. They had even worked out her likely delivery date. Her buying habits had triggered targeted marketing.

Algorithms linked her shopping patterns to “pregnant”

When we use a loyalty card or an online account our sales activity is recorded. This data is added to a big database, with our details, the time, date, location and products bought (or browsed). It is then analysed. Patterns in behaviour can be tracked, our habits, likes, dislikes and even changes in our personal situation deduced, based on those patterns. Sometimes this seems quite useful, other times a bit annoying, it can surprise us, and it can be wrong.

This kind of computing is not just used to sell products, it is also used to detect fraud and to predict where the next outbreak of flu will happen. Our banking behaviour is tracked to flag suspicious transactions and help stop theft and money laundering. When we search for ‘high temperature’ our activity might be added to the data used to predict flu trends. However, the models are not always right as there can be a lot of ‘noise’ in the data. Maybe we bought baby clothes as a present for our aunt, and were googling temperatures because we wanted to go somewhere hot for our holiday.

Whether the predictions are spot on or not is perhaps not the most important thing. Maybe we should be considering whether we want our data saved, mined and used in these ways. A predictive pregnancy algorithm seems like an invasion of privacy, even like spying, especially if we don’t know about it. Predictive analytics is big; big data is really big and big business wants our data to make big profits. Think before you click!

Jane Waite, Queen Mary University of London (now at Raspberry Pi)


If you go down to the woods today…

A girl walking through a meadow full of flowers within woods
Image by Jill Wellington from Pixabay

In the 2025 RHS Chelsea Flower Show there was one garden that was about technology as well as plants: The Avanade Intelligent Garden, exploring how AI might be used to support plants. Each of the trees contained probes that sensed and recorded data about them, which could then be monitored through an app. This takes pioneering research from over two decades ago a step further, incorporating AI into the picture and making it mainstream. Back then a team led by Yvonne Rogers built an 'Ambient Wood' aiming to add excitement to a walk in the woods...

Mark Weiser had a dream of ‘Calm Computing’ and while computing sometimes seems ever more frustrating to use, the ideas led to lots of exciting research that saw at least some computers disappearing into the background. His vision was driven by a desire to remove the frustration of using computers but also the realization that the most profound technologies are the ones that you just don’t notice. He wanted technology to actively remove frustrations from everyday life, not just the ones caused by computers. He wrote of wanting to “make using a computer as refreshing as taking a walk in the woods.”

Not calm, but engaging and exciting!

No one argues that computers should be frustrating to use, but Yvonne Rogers, then of the Open University, had a different idea of what the new vision could be. Not calm. Anything but calm in fact (though not frustrating either, of course). Not calm, but engaging and exciting!

Her vision of Weiser's tranquil woods was not relaxing but provocative and playful. To prove the point her team turned some real woods in Sussex into an 'Ambient Wood'. The Ambient Wood was an enhanced wood. When you entered it you took probes with you that you could point and poke with. They allowed you to take readings of different kinds in easy ways. Time-hopping 'periscopes' placed around the woods allowed you to see those patches of woodland at other times of the year. There was also a special woodland den where you could then see the bigger picture of the woods as all your readings were pulled together using computer visualisations.

Not only was the Ambient Wood technology visible and in your face but it made the invisible side of the wood visible in a way that provoked questions about the wildlife. You noticed more. You saw more. You thought more. A walk in the woods was no longer a passive experience but an active, playful one. Woods became the exciting places of childhood stories again but now with even more things to explore.

The idea behind the Ambient Wood, and similar ideas like Bristol’s Savannah project, where playing fields are turned into African Savannah, was to revisit the original idea of computers but in a new context. Computers started as tools, and tools don’t disappear, they extend our abilities. Tools originally extended our physical abilities – a hammer allows us to hit things harder, a pulley to lift heavier things. They make us more effective and allow us to do things a mere human couldn’t do alone. Computer technology can do a similar thing but for the human intellect…if we design them well.

“The most important thing the participants gained was a sense of wonderment at finding out all sorts of things and making connections through discovering aspects of the physical woodland (e.g., squirrel’s droppings, blackberries, thistles)”

– Yvonne Rogers

The Weiser dream was that technology invisibly watches the world and removes the obstacles in the way before you even notice them. It's a little like the way servants to the aristocracy were expected to always have everything just right but at the same time were not to be noticed by those they served. The way this is achieved is to have technology constantly monitoring, understanding what is going on and how it might affect us and then calmly fixing things. The problem at the time was that it needs really 'smart' technology, a high level of Artificial Intelligence, to achieve, and that proved more difficult than anyone imagined (though perhaps we are now much closer than we were). Our behaviour and desires, however, are full of subtlety and much harder to read than was imagined. Even a super-intellect would probably keep getting it wrong.

There are also ethical problems. If we do ever achieve the dream of total calm we might not like it. It is very easy to be gung ho with technology and not realize the consequences. Calm computing needs monitors – the computer measuring everything it can so it has as much information as possible to make decisions from (see Big Sister is Watching You).

A classic example of how this can lead to people rejecting technology intended to help is in a project to make a 'smart' residential home for the elderly. The idea was that by wiring up the house to track the residents and monitor them the nurses would be able to provide much better care, and relatives be able to see how things were going. The place was filled with monitors. For example, sensors in the beds measured residents' weight while they slept. Each night the occupant's weight could invisibly be taken and the nurses alerted of worrying weight loss over time. The smart beds could also detect tossing and turning so someone having bad nights could be helped. A smart house could use similar technology to help you or me have a good night's sleep and help us diet.

The problem was the beds could tell other things too: things that the occupants preferred to keep to themselves. Nocturnal visitors also showed up in the records. That's the problem if technology looks after us every second of the day: the records may give away to others far more than we are happy with.

Yvonne's vision was different. It was not that the computers try to second-guess everything but instead extend our abilities. It is quite easy for new technology to lead to our being poorer intellectually than we were. Calculators are a good example. Yes, we can do more complex sums quickly now, but at the same time without a calculator many people can't do the sums at all. Our abilities have both improved and been damaged at the same time. Generative AI seems to be currently heading the same way. What the probes do, instead, is extend our abilities, not reduce them: allowing us to see the woods in a new way, but to use the information however we wish. The probes encourage imagination.

The alternative to the smart house (or calculator) that pampers you, allowing your brain to stay in neutral, or the residential home that monitors you for the sake of the nurses and your relatives, is one where the sensors are working for you: where you are the one the bed reports to, helping you then make decisions about your health, or where the monitors you wear are (only) part of a game that you play because it's fun.

What next? Yvonne suggested the same ideas could be used to help learning and exploration in other ways, understanding our bodies: “I’d like to see kids discover new ways of probing their bodies to find out what makes them tick.”

So if Yvonne’s vision is ultimately the way things turn out, you won’t be heading for a soporific future while the computer deals with real life for you. Instead it will be a future where the computers are sparking your imagination, challenging you to think, filling you with delight…and where the woods come alive again just as they do in the storybooks (and in the intelligent garden).

Paul Curzon, Queen Mary University of London

(adapted from the archive)


A Sea Hero Quest to understand our navigation skills

A lego minifigure hiking with map and compass
Image by Andrew Martin from Pixabay

Video games can be a very successful way to do citizen science, getting ordinary people involved in research. Sea Hero Quest is an extremely successful example. It involves a boy setting out on a sea quest to recover his father’s memories, lost when he suffers from dementia. The hundreds of thousands of people joining the quest have helped researchers better understand our ability to navigate.

The Sea Hero Quest project was led by Deutsche Telekom, working with both universities and Alzheimer's Research UK. The first mass-market game of its kind, it has allowed researchers to explore navigation and related cognitive abilities of people throughout their lives. The game has 75 levels, each with different kinds of task in different environments, and has been played by millions of people around the world for over 100 years of combined game time. The amount of data collected is vast and would have taken researchers centuries to collect by traditional means, if possible at all.

For example, an international team including researchers from UCL, the University of Lyon and the University of Münster used the game to explore how the place people grew up affects their ability to navigate. As well as more general data from around 400,000 people across the world, they also used the data specifically from people who had completed all levels of the game. This amounted to around ten thousand adults of all ages.

They found that people are best at navigating in situations similar to where they grew up (where they lived at the time of playing the game had no effect). So, for example, people who grew up in an American grid-like city such as Chicago were better at navigating in grid-based levels. Those who grew up in cities such as Prague in Europe, where the streets are more wiggly and chaotically laid out, were better at levels needing similar navigation skills. Throughout, the researchers found that those who grew up in the countryside were better at navigating overall, as well as specifically in more unstructured environments.

Sea Hero Quest shows that games designers, if they can create fun but serious games, can help us all help researchers… It is often said that playing video games is bad for growing brains, but this research also shows that the way we design our cities affects the way we think and can be bad for our brains!


Herman Hollerith: from punch cards to a special company

Herman Hollerith
Herman Hollerith (Image from wikimedia, Public Domain)

Herman Hollerith, the son of immigrants, struggled early on at school and then later in bookkeeping at college but it didn’t stop him inventing machines that used punch cards to store data. He founded a company to make and sell his machines. It turned into the company now called IBM, which of course helped propel us into the computer age.

Hollerith had worked as a census clerk for a while, and the experience led to his innovation. The United States has been running a national census every 10 years since the American Revolution, aiming to record the details of every person, for tax and national planning purposes. It is not just a count but has recorded information about each person such as male/female, married or not, ethnicity, whether they can read, disabilities, and so on.

As the population expanded it of course became harder to do. It was also made harder as more data about each person was being collected over time. For the 1890 census a competition was held to try and find better ways to compile the data collected. Herman Hollerith won it with his punch card based machine. It could process data up to twice as fast as his competitors and with his system data could be prepared 10 times faster.

To use the machine, the census information for each person was recorded by punching holes in special cards at specific positions. It was a binary system with a hole essentially meaning the specific feature was present (eg they were married) and no hole meaning it wasn’t (eg they were single). Holes against numbers could also mean one of several options.

Hollerith punched card from wikimedia
Hollerith punched card (Image from wikimedia, Public Domain)

The machine could read the holes because they allowed a wire to make an electrical connection to a pool of mercury below, so the holes just acted as switches. Data could therefore be counted automatically, with each hole adding one to a different counter. It was the first time that a system of machine-readable data had been used, and of course binary went on to be the way all computers store information. In processing the census his machines counted the data on around 100 million cards (an early example of Big Data processing!). This contributed to reducing the time it took to compile the data from the whole country by two years. It also saved about $5 million.
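To get a feel for the idea, here is a tiny Python sketch (the fields and hole positions are invented for illustration, not Hollerith's real card layout): each yes/no answer either punches a hole at a fixed position or leaves it blank, and counting the holes position by position tabulates a whole stack of cards.

```python
# Illustrative sketch only: the fields and positions below are invented,
# not Hollerith's real card layout. Each yes/no answer either punches a
# hole at a fixed position or leaves it blank - a binary encoding.
FIELDS = ["male", "married", "can_read", "born_abroad"]

def punch_card(person):
    """Return a card as the set of punched hole positions."""
    return {i for i, field in enumerate(FIELDS) if person.get(field)}

def tabulate(cards):
    """Count the holes at each position, one counter per position,
    like the machine's electromechanical counters."""
    counts = [0] * len(FIELDS)
    for card in cards:
        for pos in card:
            counts[pos] += 1
    return dict(zip(FIELDS, counts))

cards = [punch_card({"male": True, "married": False, "can_read": True}),
         punch_card({"male": False, "married": True, "can_read": True})]
print(tabulate(cards))   # {'male': 1, 'married': 1, 'can_read': 2, 'born_abroad': 0}
```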

Hollerith patented the machine and was also awarded a PhD for his work on it. He set up a company to sell it called the Tabulating Machine Company. Over time it merged with other companies until eventually in 1924 the resulting company changed its name to International Business Machines or, as it is now known, IBM. It is of course one of the most important companies driving the computer age, building early mainframe computers the size of rooms that revolutionised business computing, but later also responsible for the personal computer, leading to the idea that everyone could own a computer.

Not a bad entrepreneurship legacy for someone who early on at school apparently struggled with, and certainly hated, spelling – he jumped out of a window at school to avoid doing it. He also did badly at bookkeeping in college. He was undeterred by what he was poor at though, and focussed on what he was good at. He was hard working and developed his idea for a mechanical tabulating machine for 8 years before his first machine went to work. Patience and determination were certainly strengths that paid off for him!


Sonifying zebrafish biology

by the CS4FN team (from the archive)

Zebrafish with the appearance of red white and blue stripes
Image by Petr Kuznetsov from Pixabay

Biologists often analyse data about the cell biology of living animals to understand their development. A large part of this involves looking for patterns in the data to use to refine their understanding of what is going on. The trouble is that patterns can be hard to spot when hidden in the vast amount of data that is typically collected. Humans are very good at spotting patterns in sound though – after all that is all music is. So why not turn the data into sound to find these biological patterns?

In hospitals, the heartbeats of critically ill patients are monitored by turning the data from heart monitors into sounds. Under the sea, in (perhaps yellow) submarines, "golden ear" mariners use their listening talent to help with navigation and detect potential danger for fish and the submarine. They do this by listening to the soundscapes produced by sonar, built up from echoes from the objects round about. This way of using sounds to represent other kinds of data is called 'sonification'. Perhaps similar ideas can help to find patterns in biological data?

An interdisciplinary team of researchers from Queen Mary including biologist Rachel Ashworth, audio experts Mathieu Barthet and Katy Noland, and computer scientist William Marsh tried the idea out on the zebrafish. Why zebrafish? Well, they are used lots for the study of the development of vertebrates (animals with backbones). In fact it is what is called a 'model organism': a creature that lots of people do research on as a way of building a really detailed understanding of its biology. The hope is that what you learn about zebrafish will help you understand the biology of other vertebrates too. Zebrafish make a good model organism because they mature very quickly. Their embryos are also transparent. That is really useful when doing experiments because it means you can directly see what is going on inside their bodies using special kinds of microscopes.

The particular aspect of zebrafish biology the Queen Mary team has been investigating is the way calcium signals are used by the body. Changes in the concentration of calcium ions are important as they are used inside a cell to regulate its behaviour. These changes can be tracked in zebrafish by injecting fluorescent dyes into cells. Because the zebrafish embryos are transparent whatever has been fluorescently labelled can then be observed.

Calcium ions are used inside a cell to regulate its behaviour

The Queen Mary team developed software that detects calcium changes by automatically spotting the peaks of activity over time. They relied on a technique that is used in music signal processing to detect the start of notes in musical sequences. Finding the peaks in a zebrafish calcium signal and the notes from the Beatles' Day Tripper riff may seem to be light years apart, but from a signal processing point of view, the problems are similar. Both involve detecting sudden bursts of energy in the signals. Once the positions of the calcium peaks have been found they can then be monitored by sonifying the data.
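As a much simpler illustration of the general idea (not the team's actual onset detector), a few lines of Python can pick out the sudden peaks in a calcium-like signal:

```python
# A much simpler illustration of the general idea (not the team's actual
# onset detector): find sudden peaks of activity in a calcium-like signal.
import numpy as np
from scipy.signal import find_peaks

signal = np.array([0.1, 0.1, 0.2, 1.5, 0.3, 0.2, 0.1, 1.8, 0.4, 0.2, 0.1])

# Keep only peaks that stand well clear of the background activity
peaks, _ = find_peaks(signal, height=1.0, distance=2)
print("Calcium peaks at samples:", peaks.tolist())   # [3, 7]
```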

What the team found using this approach is that the calcium activity in the muscle cells of zebrafish varies a lot between early developmental stages of the embryo and the late ones. You can have a go at hearing the difference yourself – listen to the sonified versions of the data.


Solving Railway Timetabling Problems with Data Visualisation

by Daniel Gill, Queen Mary University of London

Steam train on a bridge, looking back down the side of the carriages
Image by Laurent from Pixabay

Train timetables are complex. When designing a timetable for railways you have to think about the physical capabilities of the actual train, what stops it needs to make, whether it is carrying passengers or freight, the number of platforms at a station, the gradient of the track, and the placement of passing loops on single-track sections, amongst many other things. Data visualisation can help with timetabling and make sure our railways continue to run on track!

Data visualisation is an important area in computer science. If you had a huge amount of complex data in a spreadsheet, your first thought wouldn’t be to sit down with a cup of tea and spend hours reading through it – instead you might graph it or create an infographic to get a better picture. Humans are very bad at understanding and processing raw data, so we speed up the process by converting it to something easier to understand.

Timetabling is like this – we need to consider the arrival and departure times from all stations for each train. You might have used a (perhaps now) old fashioned paper timetable, with each train as a column, and the times at each station along the rows, like the one below. This is great if you’re a passenger… you can see clearly when your train leaves, and when it gets to your desired destination. If you’re unlucky enough to miss a train, you can also easily scan along to find the next one.

A traditional timetable with stopping times of different trains in columns and rows for each station
Image by Daniel Gill for CS4FN

Unfortunately, this kind of presentation might be more challenging for timetable designers. In this timetable, there's a mix of stopping and fast services. You can see which of them are fast based on the number of stations they skip (marked with a vertical line), but, because they travel at different speeds, it's difficult to imagine where they are on the railway line at any one time.

One of the main challenges in railway timetabling, and perhaps the most obvious, is that trains can't easily overtake slower ones in front of them. It's this quirk that causes lots of problems. So, if you needed to insert another train into this timetable you would need to consider all the departure times of the trains around it, to make sure there are no conflicts – this is a lot of data to juggle.

 But there’s an easier way to visualise these timetables: introducing Marey charts! They represent a railway on a graph, with stations listed vertically, time along the top, and each train represented by a single (bumpy) line. If we take our original timetable from above and convert it to a Marey chart, we get something that looks like this:

A Marey chart of the same timetable now with lines showing the path of the train through time (which is now the x-axis)
Image by Daniel Gill for CS4FN


Though thought to have been invented by a lesser-known railway engineer called Charles Ibry, these charts were popularised by Étienne-Jules Marey, and (perhaps unfairly) take his name. 

How does it work?

There are a few things that you might notice immediately from this diagram. The stations along the side aren't equally spaced, like you might expect from other types of graph; instead they are spaced relative to the distance between the stations on the actual railway. This means we can estimate when a fast train will pass each of the stations. This is an estimation, of course, because the train won't be travelling at a constant speed throughout – but it's better than our table from before which is no help at all!

Given this relative spacing, we can also estimate how fast a train is going. The steepness of the line, in this diagram, directly reflects the speed of the train*. Look at the dark blue and purple trains – they both leave Coventry really close together, but the purple train is a bit slower, so the gap widens near Birmingham International. We can also see that trains that do lots of stopping (when the line is horizontal) travel at a much slower average speed than the fast ones: though that shouldn’t be a surprise! 

*There’s a fun reason that this is the case. The gradient (the steepness of the line) is calculated as the change in y divided by the change in x. In this case, the change in the y dimension is the distance the train has travelled, and the change in x is the time it has taken. If you have studied physics, you might immediately recognise that distance divided by time is speed (or velocity). Therefore, the steepness in a Marey chart is proportional to the speed of the train. 
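If you want to draw one yourself, here is a minimal Python sketch using matplotlib (the stations, distances and times are made up for illustration): time runs along the x-axis, distance along the line runs up the y-axis, and each train is one line.

```python
# A minimal sketch of drawing a Marey chart with matplotlib. The stations,
# distances and times here are made up for illustration, not real data.
import matplotlib.pyplot as plt

stations = {"Coventry": 0, "Canley": 2, "Tile Hill": 4,
            "Berkswell": 7, "Birmingham International": 13}   # miles along the line

# Each train is a list of (minutes past 10:00, station) calling points
trains = {
    "stopper": [(0, "Coventry"), (4, "Canley"), (8, "Tile Hill"),
                (14, "Berkswell"), (26, "Birmingham International")],
    "express": [(28, "Coventry"), (40, "Birmingham International")],
}

for name, calls in trains.items():
    times = [t for t, _ in calls]
    dists = [stations[s] for _, s in calls]
    # The gradient of each line is distance divided by time, i.e. its speed
    plt.plot(times, dists, marker="o", label=name)

# Space the station labels by real distance, not equally
plt.yticks(list(stations.values()), list(stations.keys()))
plt.xlabel("Minutes past 10:00")
plt.ylabel("Distance along the line")
plt.legend()
plt.show()
```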

We can also see that the lines don’t intersect at all. This is good, because, quite famously, trains can’t really overtake. If there was an intersection it would mean that at some point, two trains would need to be at the same location at the same time. Unless you’ve invented some amazing quantum train (more about the weirdness of quantum technology in this CS4FN article), this isn’t possible!

Putting it to the Test

Put yourself in the shoes of a railway timetable designer! We have just heard that there is a freight train that needs to run through our little section of railway. The driver needs to head through sometime between 10:45 and 12:15 – how very convenient: we’ve already graphed that period of time.

The difficulty is, though, that their freight train is going to take a very slow 45 minutes to go through our section of railway – how are we going to make it fit? Let’s use the Marey chart to solve this problem visually. Firstly, we’ll put a line on that matches the requirements of the freight train:

A Marey chart showing the freight train as a single line passing through time and stations
Image by Daniel Gill for CS4FN

And then let’s re-enable all the other services.

With all the other trains included the new train crosses their paths - it would be stuck behind them, or vice versa
Image by Daniel Gill for CS4FN

Well, that's not going to work. We can see from this, though, how slow this freight train actually is, especially compared to the express trains it overlaps with. So, to fix this, we can shift it over. We want to aim for a placement where there are no overlaps at all.

A Marey chart showing a position where the new train does not clash
Image by Daniel Gill for CS4FN

Perfect, now it's going to be able to make the journey without interfering with our other services at all.
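A program can do the same slot-hunting for us. Here is a rough sketch with made-up times (and ignoring real-world details like minimum gaps between trains): on a stretch where no overtaking is possible, two trains' lines cross exactly when one enters the section earlier but leaves it later, so we simply try freight departure times until nothing crosses.

```python
# Rough sketch of slot-hunting (made-up times; real timetabling also needs
# minimum gaps between trains, platforms, and much more).
def conflicts(a, b):
    """On a stretch with no overtaking, two same-direction trains clash if
    their order changes between entering and leaving the section (their
    Marey-chart lines cross), or they reach either end at the same moment."""
    (enter_a, exit_a), (enter_b, exit_b) = a, b
    return (enter_a - enter_b) * (exit_a - exit_b) <= 0

# Existing trains as (enter, exit) times in minutes past 10:00
existing = [(0, 26), (28, 40), (110, 136)]

def find_slot(duration, earliest, latest, step=5):
    """Try freight departure times until one fits without crossing anyone."""
    for start in range(earliest, latest + 1, step):
        freight = (start, start + duration)
        if not any(conflicts(freight, other) for other in existing):
            return freight
    return None

# The freight takes 45 minutes and must leave between 10:45 and 12:15
print(find_slot(duration=45, earliest=45, latest=135))   # e.g. (45, 90)
```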

Solving Problems

When we’re given a difficult problem, it’s often a good idea to find a way to visualise it (or as my A-Level physics teacher often reminded me: “draw a diagram!”). This kind of visualisation is used regularly in computer science. From students learning the craft, all the way to programmers and academics at the top of their field – they all use diagrams to help understand a problem.


Equality, diversity and inclusion in the R Project: collaborative community coding & curating with Dr Heather Turner

You might not think of a programming language like Python or Scratch as being an ‘ecosystem’ but each language has its own community of people who create and improve its code, flush out the bugs, introduce new features, document any changes and write the ‘how to’ guides for new users. 

R is one such programming language. It’s named after its two co-inventors (Ross Ihaka and Robert Gentleman) and is used by around two million people around the world. People working in all sorts of jobs and industries (for example finance, academic research, government, data journalists) use R to analyse their data. The software has useful tools to help people see patterns in their data and to make sense of that information. 

It’s also open source which means that anyone can use it and help to improve it, a bit like Wikipedia where anyone can edit an article or write a new one. That’s generally a good thing because it means everyone can contribute but it can also bring problems. Imagine writing an essay about an event at your school and sharing it with your class. Then imagine your classmates adding paragraphs of their own about the event, or even about different events. Your essay could soon become rather messy and you’d need to re-order things, take bits out and make sure people hadn’t repeated something that someone had already said (but in a slightly different way). 

When changes are made to software people also want to keep a note not just of the ‘words’ added (the code) but also to make a note of who added what and when. Keeping good records, also known as documentation, helps keep things tidy and gives the community confidence that the software is being properly looked after.

Code and documentation can easily become a bit chaotic when created by different people in the community so there needs to be a core group of people keeping things in order. Fortunately there is – the 'R Core Team', but these days its membership doesn't really reflect the community of R users around the world. R was first used in universities, particularly by more privileged statistics professors from European countries and North America (the Global North), and so R's development tended to be more in line with their academic interests. R needs input and ideas from a more diverse group of active developers and decision-makers, in academia and beyond, to ensure that the voices of minoritised groups are included. It also needs the voices of younger people, particularly as many of the current core group are approaching retirement age.

Dr Heather Turner from the University of Warwick is helping to increase the diversity of those who develop and maintain the R programming language and she's been given funding by the EPSRC* to work on this. Her project is a nice example of someone who is bringing together two different areas in her work. She is mixing software development (tech skills) with community management (people skills) to support a range of colleagues who use R and might want to contribute to developing it in future, but perhaps don't feel confident to do so yet.

Development can involve things like fixing bugs, helping to improve the behaviour or efficiency of programs, or translating error messages that currently appear on-screen in English into different languages. Heather and her colleagues are working with the R community to create a more welcoming environment for 'newbies' that encourages participation, particularly from people who are in the community but are not currently represented, or are under-represented, in the core group, and she's working collaboratively with other community organisations such as R-Ladies, LatinR and RainbowR. Another task she's involved in is producing an easier-to-follow 'How to develop R' guide.

There are also people who work in universities but who aren’t academics (they don’t teach or do research but do other important jobs that help keep things running well) and some of them use R too and can contribute to its development. However their contributions have been less likely to get the proper recognition or career rewards compared with those made by academics, which is a little unfair. That’s largely because of the way the academic system is set up. 

Generally it’s academics who apply for funding to do new research, they do the research and then publish papers in academic journals on the research that they’ve done and these publications are evidence of their work. But the important work that supporting staff do in maintaining the software isn’t classified as new research so doesn’t generally make it into the journals, so their contribution can get left out. They also don’t necessarily get the same career support or mentoring for their development work. This can make people feel a bit sidelined or discouraged. 

To try and fix this and to make things fairer the Society of Research Software Engineering was created to champion a new type of job in computing – the Research Software Engineer (RSE). These are people whose job is to develop and maintain (engineer) the software that is used by academic researchers (sometimes in R, sometimes in other languages). The society wants to raise awareness of the role and to build a community around it. You can find out what’s needed to become an RSE below. 

Heather is in a great position to help here too, as she has a foot in each camp – she’s both an Academic and a Research Software Engineer. She’s helping to establish RSEs as an important role in universities while also expanding the diversity of people involved in developing R further, for its long-term sustainability.

Further reading

*Find out more about Heather’s EPSRC-funded Fellowship: “Sustainability and EDI (Equality, Diversity, and Inclusion) in the R Project” https://gtr.ukri.org/projects?ref=EP%2FV052128%2F1 and https://society-rse.org/getting-to-know-your-2021-rse-fellows-heather-turner/ 

Find out more about the job of the Research Software Engineer and the Society of Research Software Engineering https://society-rse.org/about/ 

Related careers

QMUL

Below is an example of a Research Software Engineer role which was advertised at QMUL in April 2024 – you can read the original advert and see a copy of the job description / person specification information which is archived at the “Jobs in Computer Science” website. This advert was looking for an RSE to support a research project “at the intersection of Natural Language Processing (NLP) and multi-modal Machine Learning, with applications in mental health.”

QMUL also has a team of Research Software Engineers and you can read about what they’re working on and their career here (there are also RSEs attached to different projects across the university, as above).

Archived job adverts from elsewhere

Below are some examples of RSE jobs (these particular vacancies have now closed but you can read about what they were looking for and see if that sort of thing might interest you in the future). The links will take you to a page with the original job advert + any Job Description (JD – what the person would actually be doing) and might also include a Person Specification (PS – the type of person they’re looking for in terms of skills, qualifications and experience) – collectively these are often known as ‘job packs’.

Note that these documents are written for quite a technical audience – the people who’d apply for the jobs will have studied computer science for many years and will be familiar with how computing skills can be applied to different subjects.

1. The Science and Technology Facilities Council (STFC) wanted four Research Software Engineers (who’d be working either in Warrington or Oxford) on a chemistry-related project (‘computational chemistry’ – “a branch of chemistry that uses computer simulation to assist in solving chemical problems”) 

2. The University of Cambridge was looking for a Research Software Engineer to work in the area of climate science – “Computational modelling is at the core of climate science, where complex models of earth systems are a routine part of the scientific process, but this comes with challenges…”

3. University College London (UCL) wanted a Research Software Engineer to work in the area of neuroscience (studying how the brain works, in this case by analysing the data from scientists using advanced microscopy).



Hallucinating chatbots

Why can’t you trust what an AI says?

by Paul Curzon, Queen Mary University of London

postcards of cuba in a rack
Image by Victoria_Regen from Pixabay

Chatbots that can answer questions and write things for you are in the news at the moment. These Artificial Intelligence (AI) programs are very good now at writing about all sorts of things from composing songs and stories to answering exam questions. They write very convincingly in a human-like way. However, one of the things about them is that they often get things wrong. Apparently, they make “facts” up or as some have described it “hallucinate”. Why should a computer lie or hallucinate? What is going on? Writing postcards will help us see.

Write a postcard

We can get an idea of what is going on if we go back to one of the very first computer programs that generated writing. It was written in the 1950s by Christopher Strachey, a school teacher turned early programmer. He wrote a love letter writing program but we will look at a similar idea: a postcard writing program.

Postcards typically might have lots of similar sentences, like "Wish you were here" or "The weather is lovely", "We went to the beach" or "I had my face painted with butterflies". Another time you might write things like: "The weather is beautiful", "We went to the funfair" or "I had my face painted with rainbows". Christopher Strachey's idea was to write a program with template sentences that could be filled in by different words: "The weather is …", "We went to the …", "I had my face painted with …". Then the program picks some sentence templates at random, and then picks words at random to go in their slots. In this way, applied to postcard writing, it can write millions of unique postcards. It might generate something like the following, for example (where I've bolded the words it filled in):

Dear Gran,

I'm on holiday in Skegness. I've had a wonderful time. The weather is sunny. We went to the beach. I had my face painted with rainbows. I've eaten lots of strawberry ice cream. Wish you were here!

Lots of love from Mo

but the next time you ask it to, it will generate something completely different.
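Here is a small Python version of the same idea (the templates and word lists are our own, not Strachey's originals): pick some template sentences at random, then fill each gap with a randomly chosen word of the right kind.

```python
# A small version of the idea in Python. The templates and word lists are
# our own inventions, not Strachey's originals.
import random

templates = [
    "The weather is {adjective}.",
    "We went to the {place}.",
    "I had my face painted with {design}.",
    "I've eaten lots of {flavour} ice cream.",
]

words = {
    "adjective": ["lovely", "sunny", "beautiful"],
    "place": ["beach", "funfair", "pier"],
    "design": ["butterflies", "rainbows"],
    "flavour": ["strawberry", "chocolate", "vanilla"],
}

def postcard(to="Gran", sender="Mo", sentences=3):
    body = []
    for template in random.sample(templates, sentences):
        # Fill every {slot} with a randomly chosen word of the right kind
        slots = {kind: random.choice(options) for kind, options in words.items()}
        body.append(template.format(**slots))
    return ("Dear " + to + ",\n\n" + " ".join(body) +
            " Wish you were here!\n\nLots of love from " + sender)

print(postcard())
```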

Do it yourself

You can do the same thing yourself. Write lots of sentences on strips of card, leaving gaps for words. Give each gap a number label and note whether it is an adjective (like ‘lovely’ or ‘beautiful’) or a noun (like ‘beach’ or ‘funfair’, ‘butterflies’ or ‘rainbows’). You could also have gaps for verbs or adverbs too. Now create separate piles of cards to fit in each gap. Write the number that labels the gap on one side and different possible words of the right kind for that gap on the other side of the cards. Then keep them in numbered piles.

To generate a postcard (the algorithm or steps for you to follow), shuffle the sentence strips and pick three or four at random. Put them on the table in front of you to spell out a message. Next, go to the numbered pile for each gap in turn, shuffle the cards in that pile and then take one at random. Place it in the gap to complete the sentence. Do this for each gap until you have generated a new postcard message. Add who it is to and from at the start and end. You have just followed the steps (the algorithm) that our simple AI program is following.

Making things up

When you write a postcard by following the steps of our AI algorithm, you create sentences for the postcard partly at random. It is not totally random though, because of the templates and because you chose words to write on cards for each pile that make sense there. The words and sentences are about things you could have done – they are possible – but that does not mean you did do them!

The AI makes things up that are untrue but sound convincing because even though it is choosing words at random, they are appropriate and it is fitting them into sentences about things that do happen on holiday. People talk of chatbots ‘hallucinating’ or ‘dreaming’ or ‘lying’ but actually, as here, they are always just making the whole thing up just as we are when following our postcard algorithm. They are just being a little more sophisticated in the way that they invent their reality!

Our simple way of generating postcards is far simpler than modern AIs, but it highlights some of the features of how AIs are built. There are two basic parts to our AI. The template sentences ensure that what is produced is grammatical. They provide a simple 'language model': rules of how to create correct sentences in English that sound like a human would write. It doesn't write like Yoda:

“Truly wonderful, the beach is.”

though it could with different templates.

The second part is the sets of cards that fit the gaps. They have to fit the holes left in the templates – only nouns in the noun gaps, adjectives in the adjective gaps. They also have to fit the kind of thing the sentence is about.

Given a set of template sentences about what you might do on holiday, the cards provide data to train the AI to say appropriate things. The cards for the face painting noun slot need to be things that might be painted on your face. By providing different cards you would change the possible sentences. The more cards the more variety in the sentences it writes.

AIs also have a language model: the rules of the language and which words go sensibly in which places in a sentence. However, they are also trained on data that gives the possibilities of what is actually written. Rather than a person writing templates and thinking up words, they are based on training data such as social media posts or other writing on the Internet, and what is being learnt from this data is the likelihood of what words come next, rather than just filling in holes in a template. The language model used by AIs is also actually just based on the likelihood of words appearing in sentences (not actual grammar rules).

What’s the chances of that?

So, the chatbots are based on the likelihood of words appearing, and that is based on statistics. What do we mean by that? We can add a simple version of it to our Postcard AI, but first we would need to collect data. How often is each face paint design chosen at seaside resorts? How often do people go to funfairs when on holiday? We need statistics about these things.

As it stands any word we add to the stack of cards is just as likely to be used. If we add the card maggots to the face painting pile (perhaps because the face painter does gruesome designs at Halloween) then the chatbot could write

“I had my face painted with maggots”.

and that is just as likely as it writing

“I had my face painted with butterflies”.

If the word maggots is not written on a card it will never write it. Either it is possible or it isn't. We could make the chatbot write things that are more realistic, however, by adding more cards of words that are about things that are more popular. So, if out of every 100 people having their face painted, almost a third (30 people) choose to have butterflies painted on their face, then we create 30 cards out of 100 in the pack with the word BUTTERFLY on (instead of just 1 card). If 5 in 100 people choose the rainbow pattern then we add five RAINBOW cards, and so on. Perhaps we would still have one maggot card, as every so often someone who likes grossing people out picks it even on holiday. Then, over all the many postcards written this way by our algorithm, the claims will match statistically the reality of what humans would write overall if they did it themselves.
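In a program we don't even need 100 physical cards: we can just give each word a weight saying how many cards it would have and draw words using those weights, as in this little sketch (the numbers are made up):

```python
# Sketch: instead of 100 physical cards, give each word a weight saying how
# many cards it would have, then draw with those weights (numbers made up).
import random

face_paint_cards = {"butterflies": 30, "flags": 29, "tiger": 20,
                    "daisies": 15, "rainbows": 5, "maggots": 1}

designs = list(face_paint_cards)
weights = list(face_paint_cards.values())

# Over many postcards the choices now match the statistics...
picks = random.choices(designs, weights=weights, k=10000)
print(picks.count("butterflies"))   # roughly 3000 out of 10000

# ...but any single postcard is still made up, and may be wrong about you
print("I had my face painted with", random.choices(designs, weights=weights)[0])
```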

As a result, when you draw a card for a sentence you are now more likely to get a sentence that is true for you. However, it is still more likely to be wrong about you personally than right (you may have had your face painted with butterflies but 70 of the 100 cards still say something else). It is still being chosen by chance and it is only the overall statistics for all people who have their face painted that matches reality not the individual case of what is likely true for you.

Make it personal

How could we make it more likely to be right about you? You need to personalise it. Collect and give it (ie train it on) more information about you personally. Perhaps you usually have a daisy painted on your face because you like daisies (you personally choose a daisy pattern 70% of the time). Sometimes you have rainbows (20% of the time). You might then on a whim choose each of 10 other designs, including the butterfly, maybe 1 in a hundred times. So you make a pile of 70 DAISY cards, 20 RAINBOW cards and 1 card for each of the other designs. Now its choices, statistically at least, will match yours. You have trained it about yourself, so it now has a model of you.

You can similarly teach it more about yourself generally, such as your likely activities, by adding more cards about the things you enjoy – if you usually choose chocolate or vanilla ice cream then add lots of cards for CHOCOLATE and lots for VANILLA, and so on. The more cards the postcard generator has of a word, the more likely it is to use that word. By giving it more information about yourself, it is more likely to be able to get things about you right. However, it is of course still making it up so, while it is being realistic, on any given occasion it may or may not match reality that time.

Perfect personalisation

You could go a step further and train it on what you actually did do while on this holiday, so that the only cards in the packs are the ones you did actually do on this holiday. (You ate hotdogs and ice cream and chips and … so there are cards for HOTDOG, ICE CREAM, CHIPS …). You had one vanilla ice cream, two chocolate and one strawberry so have that number of each ice cream card. If it knows everything about you then it will be able to write a postcard that is true! That is why companies behind AIs want to collect every detail of your life. The more they know about you the more they get things right about you and so predict what you will do in future too.

Probabilities from the Internet

The modern chatbots work by choosing words at random based on how likely they are in a similar way to our personalised postcard writer. They pick the most likely words to write next based on probabilities of those words coming next in the data they have been trained on. Their training data is often conversations from the Internet. If the word is most likely to come next in all that training data, then the chatbot is more likely to use that word next. However, that doesn’t make the sentence it comes up with definitely true any more than with our postcard AI.
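Real chatbots use enormous neural network language models trained on vast amounts of text, but a toy version built from simple counts gives the flavour: count which word follows which in some training text, then generate by repeatedly picking one of the likely next words.

```python
# A toy next-word predictor: count which word follows which in some training
# text, then generate by repeatedly picking one of the likely next words.
# Real chatbots use vastly bigger neural language models, but the flavour is
# the same: likely words, not guaranteed truths.
import random
from collections import defaultdict

training_text = ("we went to the beach . we went to the funfair . "
                 "the weather is lovely . the weather is sunny .")

follows = defaultdict(list)              # word -> list of words seen after it
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current].append(nxt)

def generate(start="we", length=8):
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))   # frequent followers are more likely
    return " ".join(out)

print(generate())   # e.g. "we went to the funfair . the weather is"
```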

You can personalise the modern AIs too, by giving them more accurate information about yourself, and then they are more likely to get what they write about you right. There is still always a chance of them picking the wrong words if a wrong word is there as a possibility, though, as they are still just choosing to some extent at random.

Never trust a chatbot

Artificial Intelligences that generate writing do not hallucinate just some of the time. They hallucinate all of the time, just with a big probability of getting it right. They make everything up. When they get things right it is just because the statistics of the data they were trained on made those words the most likely ones to be picked to follow what went before. Just as the Internet is full of false things, an Artificial Intelligence can get things wrong too.

If you use them for anything that matters, always double check that they are telling you the truth.


Protecting your fridge

by Jo Brodie and Paul Curzon, Queen Mary University of London

Ever been spammed by your fridge? It has happened, but Queen Mary’s Gokop Goteng and Hadeel Alrubayyi aim to make it less likely…

Image by Gerd Altmann from Pixabay

Gokop has a longstanding interest in improving computing networks and did his PhD on cloud computing (at the time known as grid computing), exploring how computing could be treated more like gas and electricity utilities where you only pay for what you use. His current research is about improving the safety and efficiency of the cloud in handling the vast amounts of data, or ‘Big Data’, used in providing Internet services. Recently he has turned his attention to the Internet of Things.

The Internet of Things is a network of connected devices, some of which you might have in your home or school, such as smart fridges, baby monitors, door locks, lighting and heating that can be switched on and off with a smartphone. These devices contain a small computer that can receive and send data when connected to the Internet, which is how your smartphone controls them. However, it brings new problems: any device that's connected to the Internet has the potential to be hacked, which can be very harmful. For example, in 2013 a domestic fridge was hacked and included in a 'botnet' of devices which sent thousands of spam emails before it was shut down (can you imagine getting spam email from your fridge?!)

A domestic fridge was hacked
and included in a ‘botnet’ of devices
which sent thousands of spam emails
before it was shut down.

The computers in these devices don't usually have much processing power: they're smart, but not that smart. This is perfectly fine for normal use, but it becomes a problem when they need to run software to keep out hackers while getting on with the actual job they are supposed to be doing, like running a fridge. It's important to prevent devices from being infected with malware (bad programs that hackers use to, e.g., take over a computer) and work done by Gokop and others has helped develop better malware-detecting security algorithms which take account of the smaller processing capacity of these devices.

One approach he has been exploring with PhD student Hadeel Alrubayyi is to draw inspiration from the human immune system: building artificial immune systems to detect malware. Your immune system is very versatile and able to quickly defend you against new bugs that you haven’t encountered before. It protects you from new illnesses, not just illnesses you have previously fought off. How? Using special blood cells, such as T-Cells, which are able to detect and attack rogue cells invading the body. They can spot patterns that tell the difference between the person’s own healthy cells and rogue or foreign cells. Hadeel and Gokop have shown that applying similar techniques to Internet of Things software can outperform other techniques for spotting new malware, detecting more problems while needing less computing resources.
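The details of Hadeel and Gokop's algorithms are beyond a short article, but one classic artificial-immune-system idea, called negative selection, can be sketched in a few lines of Python (a generic illustration only, not their method): generate random 'detectors', throw away any that match normal behaviour, and treat anything the surviving detectors match as suspicious.

```python
# A generic illustration of one artificial-immune-system idea, negative
# selection - NOT Hadeel and Gokop's actual algorithm. Generate random
# "detectors", discard any that match normal behaviour, then flag anything
# the surviving detectors match as suspicious.
import random

NORMAL = {"0101", "0110", "0111"}        # patterns of normal device behaviour

def matches(detector, pattern, threshold=3):
    """A detector 'matches' a pattern if enough bits agree."""
    return sum(a == b for a, b in zip(detector, pattern)) >= threshold

def train_detectors(n=20):
    detectors = []
    while len(detectors) < n:
        candidate = "".join(random.choice("01") for _ in range(4))
        # Keep only detectors that do NOT match any normal pattern
        if not any(matches(candidate, normal) for normal in NORMAL):
            detectors.append(candidate)
    return detectors

detectors = train_detectors()
for observed in ["0110", "1000"]:
    flagged = any(matches(d, observed) for d in detectors)
    print(observed, "-> suspicious" if flagged else "-> looks normal")
```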

Gokop is also using his skills in cloud computing and data science to enhance student employability and explore how Queen Mary can be a better place for everyone to do well. Whether a person, organisation or smart fridge Gokop aims to help you reach your full potential!

EPSRC supports this blog through research grant EP/W033615/1.