Synthetic Speech

Robot on phone
Image by OpenClipart-Vectors from Pixabay

Computer-generated voices are encountered all the time now in everyday life, not only in automated call centres, but also in satellite navigation systems and home appliances.

Although synthetic speech has improved greatly, early systems were much harder to understand than human speech, and many people still don’t like synthetic speech at all. Maria Klara Wolters of Edinburgh University decided to find out why. In particular, she wanted to discover what makes synthetic speech difficult for older people to understand, so that the next generation of talking computers would speak more clearly.

She asked a range of people to try out a state-of-the-art speech synthesis system of the time, tested their hearing and asked for their thoughts about the voices. She found that older people had more difficulty understanding computer-generated voices, even when they were assessed as having healthy hearing. She also discovered that messages about times and people were well understood, but that young and old alike struggled with complicated words, such as the names of medications, when pronounced by a computer.

More surprisingly, she found that her volunteers’ ability to remember speech correctly didn’t depend so much on their memory as on their ability to hear particular frequencies (between 1 and 3 kHz). These frequencies are in the lower part of the middle range of frequencies that the ear can hear, and they carry a large amount of information about the identity of speech sounds. Another result of the experiments was that the processing of sounds by the brain, so-called ‘central auditory processing’, appeared to play a more important role in understanding natural speech, while peripheral auditory processing (the processing of sounds in the ear) appeared to be more important for synthetic speech.

As a result of the experiments, Maria drew up a list of design guidelines for the next generation of talking computers: make pauses around important words, slow down, and use simpler forms of expression (e.g. “the blue pill” is much easier to understand and remember than a complicated medical name). She suggested that such simple changes to robot voices could make an immense difference to the lives of many older people. They also make services that use computer-generated voices easier for everyone to use. This kind of inclusive design benefits everybody, as it allows people from all walks of life to use the same technology. Maybe Maria’s rules would work for people you know too. Try them out next time grandpa asks you to repeat what you just said!
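Modern speech synthesisers let developers control pacing directly, so guidelines like Maria’s can be baked into the messages a system speaks. Below is a minimal sketch, assuming the synthesiser accepts SSML (Speech Synthesis Markup Language), the widely used text-to-speech markup; exact tag support varies between systems, so treat the details as illustrative rather than a recipe for any particular product.

```python
# Illustrative sketch: expressing Maria's guidelines in SSML
# (Speech Synthesis Markup Language). Which tags a given
# text-to-speech system supports varies, so this is an assumption,
# not a guaranteed recipe.

def medication_reminder(simple_name: str) -> str:
    """Build an SSML prompt that slows down and pauses around the key phrase."""
    return (
        '<speak>'
        '<prosody rate="slow">'     # guideline: slow down
        'Please take '
        '<break time="400ms"/>'     # guideline: pause before the important words
        f'{simple_name}'            # guideline: prefer a simple expression
        '<break time="400ms"/>'     # guideline: pause after them too
        ' with your breakfast.'
        '</prosody>'
        '</speak>'
    )

# "the blue pill" is far easier to remember than a complicated medical name
print(medication_reminder("the blue pill"))
```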

by the CS4FN team


Sounding out a Sensory Garden

A girl in a garden holding an orange flower
Image by Joel santana Joelfotos from Pixabay

When construction of the Norman Jackson Children’s Centre in London started, the local council commissioned artists to design a sensory garden full of wonderful sights and sounds so the 3- to 5-year-old children using the centre could have fun playing there. A sand pit, water feature, metal tree and willow pods all seemed pretty easy to install and wouldn’t take much looking after, but what about sound? How do you bring interesting sound to an outdoor space and make it fun for young children? Nela Brown from Queen Mary was given the job.

After thinking about the problem for a while she came up with an idea for an interactive sound installation. She wanted to entertain any children visiting the centre, but she especially wanted it to benefit children with poor language skills. She wanted it to be informal but have educational and social value, even though it was outside.

You name it, they press it!

Somewhere around the age of 18 months, children become fascinated with pressing buttons. Toys, TV remotes, light switches, phones: you name it, they want to press it. Given the chance to press all the buttons at the same time in quick succession, that is exactly what young children will do. They will also get bored pretty quickly and move on to something else if their toy just makes lots of noise with little variety or interest.

Nela had to use her experience and understanding of the way children play and learn to work out a suitable ‘user interface’ for the installation. That is, she had to design how the children would interact with it and experience its effects. The user interface had to look interesting enough to get the attention of the children playing in the garden in the first place. It also obviously had to be easy to use. As part of her preparation, Nela watched children playing, both to get ideas and to get a feel for how they learn and play.

Sit on it!

She decided to build into a seat a panel of buttons that triggered sounds. One important way to make any gadget easier to use is for it to give ‘real-time feedback’. That is, it should do something, like play a sound or change colour, as soon as you press any button, so you know immediately that the button press did do something. To achieve this, and to make them even more interesting, her buttons would both change colour and play a sound when they were pressed. She also decided the panel would need to be programmed so children wouldn’t do what they usually do: press all of the buttons at once, get bored and walk away.

Nela recorded traditional stories, poems and nursery rhymes with parents and children from the local area, and composed music to fit around the stories. She also searched different online sound libraries to find interesting sound effects and soundscapes. Of the three buttons, one played the soundscapes, another played the sound effects and the last played a mixture of stories, poems and nursery rhymes. Nela hoped the variety would make it all more interesting for the children and so keep their attention longer, and that by including stories and nursery rhymes she would be helping with language skills.
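The real installation was programmed with Max/MSP and Arduino hardware (described below), but the flavour of the control logic can be sketched in a few lines. The sketch that follows is hypothetical, not Nela’s actual program: each button cycles through its own playlist for variety, and one simple way to stop button-mashing is to ignore presses while a sound is still playing.

```python
import time

# Hypothetical sketch of the seat's control logic (the real installation
# used Max/MSP and an Arduino, not this code). Each button has its own
# playlist; new presses are ignored while a sound is still playing, so
# pressing all the buttons at once doesn't create a jumble of noise.

PLAYLISTS = {
    "button_1": ["soundscape_forest.wav", "soundscape_sea.wav"],
    "button_2": ["effect_birds.wav", "effect_rain.wav"],
    "button_3": ["story_1.wav", "nursery_rhyme_1.wav", "poem_1.wav"],
}

next_index = {name: 0 for name in PLAYLISTS}
busy_until = 0.0  # the time at which the current sound will finish


def play_file(filename: str) -> float:
    """Stand-in for real audio playback; returns the clip length in seconds."""
    print(f"Lighting the button and playing {filename}")
    return 10.0


def on_button_press(name: str) -> None:
    global busy_until
    if time.time() < busy_until:          # a sound is still playing: ignore the press
        return
    playlist = PLAYLISTS[name]
    clip = playlist[next_index[name]]
    next_index[name] = (next_index[name] + 1) % len(playlist)  # cycle for variety
    busy_until = time.time() + play_file(clip)


on_button_press("button_3")   # plays the first story straight away
on_button_press("button_1")   # ignored: the story is still playing
```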

Can we build it?

Coming up with the ideas was only part of the problem. It then had to be built. It had to be weatherproof, vandal-proof and allow easy access to any parts that might need replacing. As the installation had to avoid disturbing people in the rest of the garden, furniture designer Joe Mellows made two enclosed seats out of cedar wood cladding, each big enough for two children, which could house the installation and keep the sound where only the children playing with it would hear it. A speaker was built into the ceiling and two control panels made of aluminium were built into the side. The bottom panel had a special sensor which could ‘sense’ when a child was sitting in (or standing on) the seat. It was an ultrasonic range finder: a bit like a bat’s senses, it uses echoes from high-frequency sounds humans can’t hear to work out where objects are. The sensor had to be covered with stainless steel mesh, so the children couldn’t poke their fingers through it and injure themselves or break the sensor. The top panel had three buttons that changed colour and played sound files when pressed.
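The range finder itself relies on a simple bit of arithmetic: sound travels at roughly 343 metres per second in air, and the echo has to go to the object and back, so the distance is half of speed times echo time. A tiny illustrative calculation follows (the numbers are assumptions, not the installation’s actual settings).

```python
SPEED_OF_SOUND = 343.0  # metres per second in air, roughly, at room temperature


def distance_from_echo(echo_time_seconds: float) -> float:
    """The sound travels to the object and back, so halve the round trip."""
    return SPEED_OF_SOUND * echo_time_seconds / 2


# An echo returning after 3 milliseconds means something (a child!) is
# sitting about half a metre from the sensor.
print(round(distance_from_echo(0.003), 2), "metres")
```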

Interaction designer Gabriel Scapusio did the wiring and the programming. Data from the sensors and buttons was sent via a cable, along with speaker cables, through a pipe underground to a computer and amplifier housed in the Children’s Centre. The computer controlling the music and colour changes was programmed using a special interactive visual programming environment for music, audio, and media called Max/MSP that has been in use for years by a wide range of people: performers, composers, artists, scientists, teachers, and students.

The panels in each seat were connected to an Arduino, an open-source electronics prototyping platform. It is intended for artists, designers, hobbyists and anyone interested in creating interactive objects or environments, so is based on flexible, easy-to-use hardware and software.

The next job was to make sure it really did work as planned. The volume from the speakers was tested and adjusted according to the approximate head position of young children so it was audible enough for comfortable listening without interfering with the children playing in the rest of the garden. Finally it was crunch time. Would the children actually like it and play with it?

The sensory garden did make a difference – the children had lots of fun playing in it and within a few days of the opening one boy with poor language skills was not just seen playing with the installation but listening to lots of stories he wouldn’t otherwise have heard. Nela’s installation has lots of potential to help children like this by provoking and then rewarding their curiosity with something interesting that also has a useful purpose. It is a great example of how, by combining creative and technical skills, projects like these can really make a difference to a child’s life.

the CS4FN team (from the archive)


Tony Stockman: Sonification

Two different coloured wave patterns superimposed on one another on a black background with random dots like a starscape.
Image by Gerd Altmann from Pixabay

Tony Stockman, who was blind from birth, was a Senior Lecturer at QMUL until his retirement. A leading academic in the field of data sonification (turning data into sound), he eventually became President of the International Community for Auditory Display, the community of researchers working in this area.

Traditionally, we put a lot of effort into finding the best ways to visualise data so that people can easily see the patterns in it. This is an idea that Florence Nightingale, of lady of the lamp fame, pioneered with Crimean War data about why soldiers were dying. Data visualisation is considered so important that it is taught in primary schools, where we all learn about pie charts, histograms and the like. You can make a career out of data visualisation, working in the media creating visualisations for news programmes and newspapers, for example. Finding a good visualisation is also massively important for researchers who want to help people understand their results, and in Big Data a good visualisation can help you gain new insights into what is really happening in your data. Those who can come up with good visualisations can become stars, because they can make such a difference (like Florence Nightingale, in fact).

Many people, of course, Tony included, cannot see, or are partially sighted, so visualisation is not much help! Tony therefore worked on sonifying data instead, exploring how you can map data onto sounds rather than imagery in a way that does the same thing: makes the patterns obvious and understandable.

His work in this area started with his PhD, where he was exploring how breathing affects changes in heart rate. He needed a way both to check for noise in his recordings and to present the results so that he could analyse and so understand them. So he invented a simple way to turn data into sound, mapping, for example, frequencies in the data onto sound frequencies. By listening he could find places in his data where interesting things were happening and then investigate the actual numbers. He did this out of necessity, just to make it possible to do research, but decades later discovered there was by then a whole research community working on uses of, and good ways to do, sonification.
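We don’t have the details of Tony’s original program, but the basic idea of a simple sonification, mapping each data value onto the pitch of a short tone so that peaks sound high and troughs sound low, can be sketched using only Python’s standard library. The heart-rate-like numbers at the end are made up purely for illustration.

```python
import math
import struct
import wave

# Illustrative sonification sketch (not Tony's actual method): map each data
# value onto the pitch of a short tone and write the result as a WAV file,
# so you can listen for the places where something interesting happens.

SAMPLE_RATE = 44100
TONE_SECONDS = 0.2


def sonify(data, filename="sonified.wav", low_hz=220.0, high_hz=880.0):
    lo, hi = min(data), max(data)
    span = (hi - lo) or 1.0
    samples = []
    for value in data:
        freq = low_hz + (value - lo) / span * (high_hz - low_hz)  # value -> pitch
        for n in range(int(SAMPLE_RATE * TONE_SECONDS)):
            samples.append(0.5 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    with wave.open(filename, "w") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


# Made-up heart-rate-like data: listen for where the pattern suddenly jumps.
sonify([72, 74, 73, 75, 90, 95, 92, 76, 74, 73])
```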

He went on to explore how sonification could be used to give overviews of data for both sighted and non-sighted people. We are very good at spotting patterns in sound – that is all music is after all – and abnormalities from a pattern in sound can stand out even more than when visualised.

Another area of his sonification research involved developing auditory interfaces, for example to allow people to hear diagrams. One of the most famous and successful data visualisations is the London Tube Map, designed by Harry Beck, who became famous for the way it made the network so easy to understand using abstract nodes and lines that ignored real distances. Tony’s team explored ways to present similar node and line diagrams, what computer scientists call graphs. After all, it is all well and good having screen readers to read text, but it’s not much use if all the ALT text tells you is that you have the Tube Map in front of you. This kind of graph is used in all sorts of everyday situations, and is especially important if you want to get around on public transport.
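A graph in this sense is just a set of nodes (stations, say) and the lines connecting them. The sketch below is a hypothetical illustration of one non-visual way to explore such a diagram: store it as an adjacency list and, for whichever node is selected, announce its connections. Here they are printed; in a real auditory interface each announcement would be spoken or given its own sound, and the station list is just an example.

```python
# Hypothetical sketch: a node-and-line diagram (a graph) stored as an
# adjacency list and explored one node at a time. In an auditory interface
# the announcements would be spoken or mapped to sounds, not printed.

EXAMPLE_MAP = {
    "Mile End": ["Stepney Green", "Bow Road", "Stratford"],
    "Stepney Green": ["Whitechapel", "Mile End"],
    "Stratford": ["Mile End", "West Ham"],
}


def announce(node: str) -> None:
    neighbours = EXAMPLE_MAP.get(node, [])
    print(f"{node}: {len(neighbours)} connections")
    for i, neighbour in enumerate(neighbours, start=1):
        print(f"  {i}. connects to {neighbour}")


announce("Mile End")   # the listener hears where they could go next
```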

There is still a lot more to be done before media that involves imagery as well as text is fully accessible, but Tony showed that it is definitely possible to do better. He also showed throughout his career that being blind did not have to hold him back from being an outstanding computer scientist as well as a leading researcher, even if he did have to innovate from the start to make it possible.


The first Internet concert

Severe Tire Damage. Image by Strubin, CC BY-SA 4.0 via Wikimedia Commons

Which band was the first to stream a concert live over the Internet? The Rolling Stones decided, in 1994, that it should be them. After all, they were one of the greatest, most innovative rock bands of all time. A concert from their tour of that year, in Dallas, was therefore broadcast live. Mick Jagger addressed not just the 50,000 packed into the stadium but the whole world, welcoming them with: “I wanna say a special welcome to everyone that’s, climbed into the Internet tonight and, uh, has got into the MBone. And I hope it doesn’t all collapse.” Unknown to them when planning this publicity coup, another band had got there first: a band of computer scientists from Xerox PARC, DEC and Apple, the research centres responsible for many innovations, including many of the ideas behind graphical user interfaces, networks and Internet multimedia, had played live on the Internet the year before!

The band which actually went down in history was called Severe Tire Damage. Its members were Russ Haines and Mark Manasse (from DEC), Steven Rubin (a computer-aided design expert from Apple) and Mark Weiser (famous for the ideas behind calm computing, from Xerox PARC). They were playing a concert at Xerox PARC on June 24, 1993. At the time researchers there were working on a system called MBone, which provided a way to do multimedia over the Internet for the first time. Now we take that for granted (just about everyone with a computer or phone doing Zoom and Teams calls, for example) but then the Internet was only set up for exchanging text and images from one person to another. MBone, short for multicast backbone, allowed packets of data of any kind (so including video data) from one source to be sent to multiple Internet addresses rather than just to one address. Sites that joined the MBone could send and receive multimedia data, including video, live to all the others in one broadcast. This meant that, for the first time, video calls between multiple people over the Internet were possible. They needed to test the system, of course, so they set up a camera in front of Severe Tire Damage and live-streamed their performance to other researchers on the nascent MBone around the world (research can be fun at the same time as being serious!). Possibly there was only a single Australian researcher watching at the time, but it is the principle that counts!
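The key idea behind the MBone, multicast, is still part of the Internet today. The sketch below shows the core of it using standard IP multicast sockets in Python: one sender transmits a packet to a group address, and every receiver that has joined that group gets a copy. It is only a minimal illustration of the principle; the real MBone did much more, including tunnelling multicast traffic across parts of the Internet that didn’t yet support it.

```python
import socket
import struct

# Minimal sketch of IP multicast, the idea behind the MBone: one packet sent
# to a group address is delivered to every receiver that has joined the group.
# Run receiver() on several machines on a local network, then run sender().

GROUP, PORT = "224.1.1.1", 5007


def sender(message: bytes) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.sendto(message, (GROUP, PORT))   # one send, many possible receivers


def receiver() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # Ask the operating system to join the multicast group.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    data, sender_address = sock.recvfrom(1024)
    print(f"Received {data!r} from {sender_address}")


if __name__ == "__main__":
    sender(b"Severe Tire Damage, live on the MBone")
```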

On hearing about the publicity around the Rolling Stones concert, and understanding the technology of course, they decided it was time for one more live Internet gig to secure their place in history. Immediately before the Rolling Stones started their gig, Severe Tire Damage broadcast their own live concert over the MBone to all those (including journalists) waiting for the main act to arrive online. In effect they had set themselves up as an unbilled Internet opening act for the Stones, even though they were nowhere near Dallas. Of course, that is partly the point: you no longer had to all be in one place to be part of the same concert. So the Rolling Stones, sadly for them, weren’t even the first to play live over the Internet on that particular day, never mind ever!

– Paul Curzon, Queen Mary University of London


Film Futures: Brassed Off

The pit head of a colliery at sunset with a vivid red sky behind the setting sun
Image from Pixabay

Computer Scientists and digital artists are behind the fabulous special effects and computer-generated imagery we see in today’s movies, but for a bit of fun, in this series we look at how movie plots could change if they involved Computer Scientists. Here we look at an alternative version of the film Brassed Off.

***SPOILER ALERT***

Brassed Off, starring Pete Postlethwaite, Tara Fitzgerald and Ewan McGregor, is set at a time when the UK coal and steel industries were being closed down with terrible effects on local communities across the North of England and Wales. It tells the story of the closing of the fictional Grimley Pit (based on the real mining village of Grimethorpe), from the point of view of the members of the colliery brass band and their families. The whole village relies on the pit for their livelihoods.

Danny, the band’s conductor, is passionate about the band and wants to keep it going, even if the pit closes. Many of the other band members are totally despondent and just want to take the money that is on offer if they agree to the closure without a fight. They feel they have no future, and have given up hope over both the pit and the band (why have a colliery band if there is no colliery?).

Gloria, a company manager who grew up in the village, arrives to conduct a feasibility study for the company, determining whether the pit is profitable as justification for keeping it open or closing it down. A wonderful musician, she joins the band but doesn’t tell them that she is now management (including not telling her childhood boyfriend, and band member, Andy).

The story follows the battle to keep the pit open, and the effects on the community if it closes, through the eyes of the band members as they take part in what is likely to be their final ever brass band competition…

Brassed Off: with computer science

In our computer science film future version, the pit is still closing and Gloria is still management, but with a Computer Science PhD in digital music, she has built a flugelhorn-playing robot with a creative AI brain. It can not only play brass band instruments but arrange and compose too. On arriving at Grimley she asks if her robot can join the band. Initially, everyone is against the idea, but on hearing how good it is, and how it will help them do well in the national brass band competition, they relent. The band, with robot, go all the way to the finals and ultimately win…

The pit, however, closes and there are no jobs at all, not even low-quality work in local supermarkets (automatic tills and robot shelf-stackers have replaced humans) or call centres (now replaced by chatbots). Gloria also loses her job in a shake-out of middle managers as the AIs take over the knowledge economy jobs. Luckily, she is OK: with university friends she starts a company building robot musicians, which is an amazing success. The band never make the finals again, as bands full of Gloria’s flugelhorn- and cornet-playing robots take over (also taking the last of the band’s self-esteem). In future years, all the brass bands in the competition are robot bands because, with all the pits closing, the communities around them collapse. The world’s last ever flugelhorn player is a robot. Gloria and Andy never do get to kiss…

In real life…

Could a robot play a musical instrument? One existed centuries before the computer age. In 1737 Jacques de Vaucanson revealed his flute-playing automaton to the public. A figure of roughly human height, it played a real flute, which could be replaced to prove the machine could really play a real instrument. Robots have since played various instruments, including drums, and a cello-playing robot has performed with an orchestra in Malmö. While robot orchestras and bands are likely, it seems less likely that humans would stop playing as a result.

Can an AI compose music? The Victorian Ada Lovelace predicted computers one day would, a century before the first computer was ever built. She realised that this would be the case just from thinking about the machines that Charles Babbage was trying to build. Her prediction eventually came true. Now, of course, generative AI is being used to compose music, and can do so in any style, whether classical or pop. How good, or creative, it is may be debated, but it won’t be long before such systems have super-human music composition powers.

So, a flugelhorn playing robot, that also composes music, is not a pipe dream!

What about the social costs that are the real theme of the film, though? When the UK pits and steelworks closed, whole communities were destroyed with great, and long-lasting, social cost. It was all well and good for politicians to say that new jobs were being created by the new service and knowledge economy, but that was no help when no thought or money had actually been put into helping communities make the transition. “Get on your bike” was their famous, if ineffective, solution. For example, if the new jobs were to be in technology as suggested, then massive technology training programmes were needed for those put out of work, along with financial support in the meantime. Instead, whole communities were effectively left to rot and inequality increased massively. Areas in the North of England and Wales that had been the backbone of the UK economy still haven’t really recovered 40 years later.

Are we about to make the same mistakes again? We are certainly arriving at a similar point, but now it is those knowledge economy jobs that were supposed to be the saviours 40 years ago that are under threat from AI. There may well be new jobs as old ones disappear… but even if there are, will the people who lose their jobs be in a position to take the new ones, or are we heading towards a whole new lost generation? As back then, without serious planning and support, including successful efforts to reduce inequality in society, the changes coming could again cause devastation, this time much more widespread. As it stands, technology is increasing, not decreasing, inequality. We need to start now, including coming up with a new economic model of how the world will work that actively reduces inequality in society. Many science fiction writers have written of utopian futures where people only work for fun (e.g. Arthur C. Clarke’s classic “Childhood’s End” is one I’m reading at the moment), but that only happens if wealth is not sucked up by the lucky few. (In “Childhood’s End” it takes alien invaders to force out inequality.)

We can avoid a dystopian future, but only if we try…really hard.


A sound social venture: recognising birds

Dan Stowell was a researcher at Queen Mary University of London when he founded an early version of what is now known as a Social Venture: a company created to do social good. With Florence Wilkinson, he turned birdsong into a tech-based social good.

A Eurasian Wren singing on the end of a branch
A Eurasian Wren: Image by Siegfried Poepperl from Pixabay

His research is about designing methods that computers can use to make sense of bird sounds. One day he met Florence Wilkinson, who works with businesses and young people, and they discovered they both had the same idea: “What if we could make an app that recognises bird sounds?” They decided to create a startup company, Warblr, to make it happen. However, unlike many research-driven startups, its main aim was not to make money but to do social good. Dan and Florence built this into their company mission statement:

…to reconnect people with the natural world through technology. We want to get as many people outdoors as possible, learning about the wildlife on their doorstep and how to protect it.

Dan brought the technical computer science skills needed to create the app, and Florence brought the marketing and communication skills needed to ensure people would hear about it. Together, they persuaded Queen Mary University of London’s innovation unit to give them a start-up grant. As a result their app Warblr exists and even gained some press coverage.

It can help people connect with nature by helping them recognise birds – after all, one of the problems with bird watching is that birds are so damned hard to spot, and lots that flit by just look like little brown things! However, they are far easier to hear. Once you know what is out there, you have more incentive to try to actually spot it. The app has another purpose too: it collects data about the birds identified, recording the species and where and when each was heard, with that data then made freely available to researchers.
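Warblr’s own recognition methods aren’t reproduced here, but a typical machine-listening pipeline of the kind Dan researches looks something like the sketch below: turn each recording into a spectrogram-based ‘fingerprint’ of which frequencies are present, then train a classifier on labelled examples. It assumes the librosa and scikit-learn libraries, and the recording filenames are placeholders you would have to supply.

```python
import numpy as np
import librosa                                    # audio analysis library
from sklearn.ensemble import RandomForestClassifier

# Illustrative bird-sound recognition pipeline, not Warblr's actual algorithm:
# summarise each recording as an averaged mel spectrogram (a fingerprint of
# which frequencies are present) and train a classifier on labelled examples.


def fingerprint(path: str) -> np.ndarray:
    audio, sample_rate = librosa.load(path, sr=22050)
    mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=64)
    return librosa.power_to_db(mel).mean(axis=1)  # one number per frequency band


# Placeholder labelled recordings: you would need to supply real ones.
training_files = ["wren_01.wav", "wren_02.wav", "robin_01.wav", "robin_02.wav"]
training_labels = ["wren", "wren", "robin", "robin"]

features = np.array([fingerprint(f) for f in training_files])
model = RandomForestClassifier(n_estimators=100).fit(features, training_labels)

print(model.predict([fingerprint("mystery_bird.wav")]))  # best guess at the species
```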

Social ventures are a relatively new idea that universities are now supporting to help their researchers do social good that is sustainable, not just something that lasts until the grants run out. As Dan and Florence showed, though, as a researcher you do not need to do everything yourself. To be a successful innovator you need more than technical skills: you need the ability to be part of a great team and to recognise a sound deal!

Updated from the archive, written by Paul Curzon, Queen Mary University of London.


Nemisindo: breaking the sound barrier

Women’s feet walking on a path
Image by ashokorg0 from Pixabay

Games are becoming ever more realistic. Now, thanks to the work of Joshua Reiss’s research team and their spinout company, Nemisindo, it’s not just the graphics that are amazing, the sound effects can be too.

There has been a massive focus over the years on improving the graphics in games. We’ve come a long way from Pong and its square ball and rectangular paddles. Year after year, decade after decade, new algorithms, new chips and new techniques have been invented that, combined with the capabilities of ever faster computers, mean we now have games with realistic, real-time graphics immersing us in the action as we play. And yet games are a multimedia experience, and realistic sounds matter too if the worlds are to be truly immersive. For decades film crews have included whole teams of Foley editors whose job is to create realistic everyday sounds (check out the credits next time you watch a film!). Whether the sound is of someone walking on a wooden floor in bare feet, walking on a crunchy path, opening thick, plush curtains, or an armoured knight clanging their way down a bare, black cliff, lots of effort goes into getting the sound just right.

Game sound effects are currently often based on choosing sounds from a sound library, but games, unlike films, are increasingly open. Just about anything can happen and make a unique noise while doing so. The chances of the sound library having all the right sounds get slimmer and slimmer.

Suppose a knight character in a game drops a shield. What should it sound like? Well, it depends on whether it is a wooden shield or a metal one. Did it land on its edge or fall horizontally, and was it curved so it rang like a bell? Is the floor mud or did it hit a stone path? Did it bounce or roll? Is the knight in an echoey hall, on a vast plain or clambering down those clanging cliffs…

All of this is virtually impossible to get exactly right if you’re relying on a library of sound samples. Instead of providing pre-recorded sounds as sound libraries do, the software of Josh and his team’s company, Nemisindo (which is the Zulu word for ‘sound effects’), creates new sounds from scratch exactly when they are needed, in real time as a game is played. This approach is called “procedural audio technology”. It allows the action in the game itself to determine the sounds precisely: rather than selecting a specific pre-recorded sound, the game sets the options of a sound model linked to the action scenario, and the sound is generated from those settings. Aside from the flexibility it gives, this way of doing sound effects has big advantages in terms of memory too: because sounds are created on the fly, large libraries of sounds no longer need to be stored with the program.
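Nemisindo’s sound models are far more sophisticated, but the flavour of procedural audio can be shown with a toy example: instead of loading a recording of a shield being dropped, synthesise an impact from parameters (how metallic the shield is, how hard the surface) at the moment it is needed. The sketch below, using numpy and the standard wave module, is purely illustrative and has nothing to do with Nemisindo’s actual software.

```python
import wave
import numpy as np

# Toy procedural sound effect (illustrative only): an "impact" generated from
# parameters rather than chosen from a library. A metal shield gets a ringing
# component and a long decay; a wooden one is mostly a short, dull thud.

SAMPLE_RATE = 44100


def impact(metallic: float, hardness: float, seconds: float = 1.0) -> np.ndarray:
    t = np.linspace(0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    decay = np.exp(-t * (8 - 6 * metallic))                      # metal rings longer
    thud = np.random.uniform(-1, 1, t.size) * np.exp(-t * 40)    # initial noise burst
    ring = np.sin(2 * np.pi * 800 * t) * metallic                # bell-like component
    return decay * (hardness * thud + ring)


def save(samples: np.ndarray, filename: str) -> None:
    samples = samples / np.max(np.abs(samples))
    with wave.open(filename, "w") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes((samples * 32767).astype(np.int16).tobytes())


# The game chooses the parameters at the moment the shield actually lands...
save(impact(metallic=0.9, hardness=0.7), "metal_shield_drop.wav")
save(impact(metallic=0.1, hardness=0.9), "wooden_shield_drop.wav")
```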

Nemisindo’s new software provides procedurally generated sounds for the Unreal game engine, allowing anyone building games using the engine to program a variety of action scenarios with realistic sounds tuned to the situation in their game as it happens…

In future, if that knight steps off the stone path just as she drops her shield, the sound generated will take the surface it actually lands on into account…

Procedural sound is the future of sound effects, so just as games are now stunning visually, expect them in future to become ever more stunning to listen to as well. As they do, the whole experience will become ever more immersive… and what works for games works for other virtual environments too. All kinds of virtual worlds just became a lot more realistic. Getting the sound exactly right is no longer a barrier to a perfect experience.

Nemisindo has support from Innovate UK.

– Paul Curzon, Queen Mary University of London



This page is funded by EPSRC on research agreement EP/W033615/1.
