A Wookiee for three minutes please

How Foley artists can manipulate natural and synthesised sounds for film, TV and radio

Black and white photo of a walrus being offered a fish, with one already in its mouth
“Are you sure that’s a microphone?”
Image by Kabomani-Tapir from Pixabay

Theatre producers, radio directors and film-makers have been trying to create realistic versions of natural sounds for years. Special effects teams break frozen celery stalks to mimic breaking bones, smack coconut shells on hard-packed sand to conjure galloping horses, and rustle cellophane for crackling fire. Famously, in the first Star Wars movie each Wookiee sound was created by combining up to six animal clips, including a walrus! Sometimes the special effects people even record the real thing and play it at the right time! (Not a good idea for the breaking bones though!) The person using props to create sounds for radio and film is called a Foley artist, named after the work of Jack Donovan Foley in the 1920s. Now the Foley artist is drawing on digital technology to get the job done.

Designing sounds

Sound designers have a hard job finding the right sounds. So how about creating sound automatically using algorithms? Synthetic sound! Research into sound creation is a hot topic, not just for special effects but also to help understand how people hear, and for use in many other sound-based systems. We can create simple sounds fairly easily using musical instruments and synthesisers, but creating sounds from nature, animal sounds and speech is much more complicated.

The approaches used to recognise sounds can be the basis of generating sounds too. You can either try to hand-craft a set of rules that describe what makes the sound sound the way it does, or you can write algorithms that work it out for themselves.

Paying patterns attention

One method, developed as a way to automatically generate synthetic sound, is based on looking for patterns in the sounds. Computer scientists often create mathematical models to better understand things, as well as to recognise and generate computer versions of them. The idea is to look at (or here, listen to) lots of examples of the thing being studied. As patterns become obvious, the scientists also start to identify elements that don't have much impact. Those features are ignored so the focus stays on the most important parts. In doing this they build up a general model, or view, that describes all possible examples. This skill of ignoring unimportant detail is called abstraction, and creating a general view, a model of something, is called generalisation: both important parts of computational thinking. The result is a hand-crafted model for generating that sound.

That’s pretty difficult to do though, so instead computer scientists write algorithms to do it for them. Now, rather than a person trying to work out what is, or is not important, training algorithms work it out using statistical rules. The more data they see, the stronger the pattern that emerges, which is why these approaches are often referred to as ‘Big Data’. They rely on number crunching vast data sets. The learnt pattern is then matched against new data, looking for examples, or as the basis of creating new examples that match the pattern.

The rain in train(ing)

Number crunching based on Big Data isn't the only way though: sometimes general patterns can be identified from knowledge of the thing being investigated. For example, rain isn't one sound but is made up of lots of raindrops all doing a similar thing. Natural sounds often have that kind of property. So knowledge of a phenomenon can be used to create a basic model to build a generator around. This is an approach Richard Turner, now at Cambridge University, has pioneered, analysing the statistical properties of natural sounds. By creating a basic model and then gradually tweaking it to match the sound quality of lots of different natural sounds, his algorithms can learn what natural sounds are like in general. Then, given a specific natural 'training' sound, they can generate synthetic versions of that sound by choosing settings that match its features. You could give them a recorded sample of real rain, for example. His sound processing algorithms then apply a bunch of maths that pulls out the important features of that particular sound based on the statistical models. With the critical features identified, and plugged into his general model, a new sound of any length can be generated that still matches the statistical pattern of, and so sounds like, the original. Using the model you can create lots of different versions of rain that all still sound like rain, lots of different campfires, lots of different streams, and so on.
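To get a feel for the idea, here is a much-simplified sketch (not Richard's actual algorithms): measure one simple statistic of a recording, its average frequency content, then shape white noise to match it. The file name rain.wav is a stand-in for whatever recording you use.

```python
# A toy texture generator: measure the average spectrum of a recording,
# then shape white noise with it to make new sound of any length.
# (A big simplification of the statistical-model idea, not Turner's method.)
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

rate, recording = wavfile.read("rain.wav")    # hypothetical input file
recording = recording.astype(float)
if recording.ndim > 1:                        # mix down to mono
    recording = recording.mean(axis=1)

# Feature extraction: the average magnitude spectrum of the recording.
_, _, spectrum = stft(recording, fs=rate, nperseg=1024)
average_magnitude = np.abs(spectrum).mean(axis=1)

# Generation: give fresh white noise the same average spectral shape.
seconds = 180                                 # "a Wookiee for three minutes"
noise = np.random.randn(seconds * rate)
_, _, noise_spec = stft(noise, fs=rate, nperseg=1024)
shaped = noise_spec * average_magnitude[:, None]
_, synthetic = istft(shaped, fs=rate, nperseg=1024)

synthetic /= np.abs(synthetic).max()          # normalise to [-1, 1]
wavfile.write("synthetic_rain.wav", rate, (synthetic * 32767).astype(np.int16))
```

Rain-like textures survive this crude treatment because, as the article says, they are statistically uniform; a birdsong or a voice, with structure over time, needs a much richer model.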

For now, the celery stalks are still in use, as are the walrus clips, but it may not be long before film studios completely replace their Foley bag of tricks with computerised solutions like Richard's. One Wookiee for three minutes and a dawn chorus for five, please.


Become a Foley Artist with Sonic Pi

You can have a go at being a Foley artist yourself. Sonic Pi is a free live-coding synth for music creation that is powerful enough for professional musicians, yet designed to get beginners into live coding: combining programming with composing to make live music.

It was designed for use with a Raspberry Pi computer, which is a cheap way to get started, though it works with other computers too. It's also a great, fun way to start to learn to program.

Play with anything, and everything, you find around the house, junk or otherwise. See what sounds it makes. Record it, and then see what it makes you think of out of context. Build up your own library of sounds, labelling them with things they sound like. Take clips of films, mute the sound and create your own soundscape for them. Store the sound clips and then manipulate them in Sonic Pi, and see if you can use them as the basis of different sounds.
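Sonic Pi has its own language for this, but the classic manipulations are simple ideas: change the playback rate, reverse the clip, layer copies. Here is a rough Python sketch of two of them, assuming a recorded clip called door_creak.wav (Sonic Pi's sample command offers the same tricks via its rate: option):

```python
# Two classic Foley manipulations: slow a clip down (lower, longer, spookier)
# and play it backwards. Sonic Pi's `sample` command can do both with `rate:`.
from scipy.io import wavfile

rate, clip = wavfile.read("door_creak.wav")   # hypothetical recorded clip

# Halve the playback rate: keep the samples, tell the player to go slower.
wavfile.write("creak_slow.wav", rate // 2, clip)

# Reverse: play the samples back to front.
wavfile.write("creak_reversed.wav", rate, clip[::-1].copy())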

Listen to the example sound clips made with Sonic Pi on their website, then start adapting them to create your own sounds, your own music. What is the most ‘natural sound’ you can find or create using Sonic Pi?

 

– Jane Waite and Paul Curzon, Queen Mary University of London.
Originally published on CS4FN and in an issue of the magazine (see below).


 

Front cover of CS4FN Issue 21 – Computing Sounds Wild

This article was also originally published in issue 21 of the CS4FN magazine ‘Computing Sounds Wild’ on p16. You can download a PDF copy of Issue 21, as well as all of our previous published material, free, at the CS4FN downloads site.

Computing Sounds Wild explores the work of scientists and engineers who are using computers to understand, identify and recreate wild sounds, especially those of birds. We see how sophisticated algorithms that allow machines to learn can help recognise birds even when they can't be seen, helping conservation efforts. We see how computer models help biologists understand animal behaviour, and we look at how electronic and computer-generated sounds, having changed music, are now set to change the soundscapes of films. Making electronic sounds is also a great, fun way to become a computer scientist and learn to program.

QMUL CS4FN EPSRC logos

 

 

Stopping sounds getting left behind: the Bela computer

Clock submerged under blue ripples of sound
Clock Waves Image by Gerd Altmann from Pixabay

Computer-based musical instruments are wonderfully flexible and becoming ever more popular. They have had one disadvantage though: the sound could drag behind the musician in a way that made some digital instruments seem unplayable. Thanks to a new computer called Bela, that problem may now be a thing of the past.

If you pluck a guitar string or thwack a drum the sound you hear is instantaneous. Well, nearly. There's a tiny delay. The sound still has to leave the instrument and travel to your ear. The vibration of the string or drum skin pushes the air back and forth, and vibrating air is all a sound is. Your ear receives the sound as soon as that vibrating air gets to you. Then your brain has to recognise it as a sound (and tell you what kind of sound it is, which direction it came from, which instrument produced it and so on!). The time it takes for the sound and then your brain to do all that is measured in tens of milliseconds – thousandths of a second. It is called 'latency', not because the delay makes it 'late' (though it does!), but from the Latin word latens, which means hidden or concealed, because the time between the signal being created and it being received is hidden from us.

Digital instruments take slightly longer than physical instruments, however, because electronic circuitry and computer processing is involved. It’s not just the sound going through air to ear but a digital signal whizzing through a circuit, or being processed by a computer, first to generate the sound which then goes through air to ear.

Your ear (actually your brain) will detect two sounds as being separate if there’s a gap of around 30 milliseconds between them. Drop that gap down to around 10 milliseconds between the sounds and you’ll hear them as a single sound. If that circuit-whizzing adds 10-20 milliseconds then you’re going to notice that the instrument is lagging behind you, making it feel unplayable. Reducing a digital instrument’s latency is therefore a very important part of improving the experience for the musician.
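Where does that circuit-whizzing delay come from? A big part is buffering: audio software processes samples in blocks, and a block must be complete before it can be heard. A quick back-of-the-envelope sketch in Python (typical numbers, not Bela's exact figures):

```python
# Buffer latency: a block of samples must be filled before it can play.
def buffer_latency_ms(block_size, sample_rate):
    return 1000 * block_size / sample_rate

print(buffer_latency_ms(1024, 44100))  # ~23 ms: noticeably laggy
print(buffer_latency_ms(256, 44100))   # ~5.8 ms: much better
print(buffer_latency_ms(16, 44100))    # ~0.4 ms: tiny, Bela-style blocks
```

Smaller blocks mean less waiting, but they also mean the computer must wake up and do its processing far more often, which is exactly where Bela's design pays off.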

In 2014 Andrew McPherson and colleagues at Queen Mary University of London aimed to solve this problem. They developed Bela, a tiny computer, similar in size to a Raspberry Pi or Arduino, that can be used in a variety of digital instruments but which is special because it has an ultra-low latency of only around 2 milliseconds – super fast.

How does it do it? A computer can seem to run slowly if it is trying to do lots of things at the same time (e.g. lots of apps running or too many windows open at once). That is when the experience for the user can be a bit glitchy. Bela works by prioritising the audio signal above ALL other activities to ensure that, no matter what else the computer is doing, the gap between input (pressing a key) and output (hearing a sound) is barely noticeable. The small size of Bela also makes it completely portable and so easy to use in musical performances, without the performer being tethered to a large computer.
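The pattern Bela prioritises is the audio callback: the sound system demands each block of samples on a hard deadline, and your code must deliver before that deadline, whatever else the machine is doing. Here is an illustrative sketch using the python-sounddevice library (Bela itself runs C++ code under a real-time scheduler, so this only shows the shape of the idea):

```python
# The audio-callback pattern: the sound system asks for each block on a
# hard deadline; miss it and you hear a glitch. Bela's trick is making
# sure this callback always wins over everything else on the machine.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100
phase = 0

def callback(outdata, frames, time, status):
    global phase
    if status:
        print(status)  # e.g. an underrun: the deadline was missed
    t = (phase + np.arange(frames)) / SAMPLE_RATE
    outdata[:, 0] = 0.2 * np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone
    phase += frames

with sd.OutputStream(channels=1, samplerate=SAMPLE_RATE,
                     blocksize=256, callback=callback):
    sd.sleep(2000)  # keep playing for two seconds
```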

There is definitely a demand for such a computer amongst musicians. Andrew and the team wanted to make Bela widely available, so they began fundraising through Kickstarter to create more kits. Their fundraiser reached £5,000 within four hours, and within a month they'd raised £54,000, so production could begin and they launched a company, Augmented Instruments Ltd, to sell the Bela hardware kits.

Bela allows musicians to stop worrying about the sounds getting left behind. Instead, they can just get on with playing and creating amazing sounds.

– Jo Brodie and Paul Curzon, Queen Mary University of London

See Bela in action on YouTube. Follow them on Twitter.

QMUL CS4FN EPSRC logos

 

 

Die Another Day? Or How Madonna crashed the Internet

From the cs4fn archive …

When pop star Madonna took to the stage at Brixton Academy in 2000 for a rare appearance she made Internet history and caused more than a little Internet misery. Her concert performance was webcast; that is, it was broadcast in real time over the Internet. A record-breaking audience of 9 million tuned in, and that's where the trouble started…

A lone mike in front of stage lights
Image by Pexels from Pixabay

The Internet’s early career

The Internet started its career as a way of sending text messages between military bases. What was important was that the message got through, even if parts of the network were damaged, say during a war. The vision was to build a communications system that could not fail: even if individual computers did, the Internet would never crash. The text messages were split up into tiny packets of information, and each was sent over the wire with an address and its position in the message. Passing through a series of computer links, a message reached its destination a bit like someone sending a car home bit by bit through the post and then rebuilding it. Because it's split up, the different bits can go by different routes.
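A toy version of that idea in Python: chop a message into numbered packets, shuffle them (as taking different routes might), then reassemble them using the position numbers.

```python
# Packets carry their position, so the message survives arriving out of order.
import random

message = "LIKE A PRAYER, EVERY PACKET HERE IS DELIVERED HOME"
PACKET_SIZE = 8

packets = [(i, message[i:i + PACKET_SIZE])
           for i in range(0, len(message), PACKET_SIZE)]

random.shuffle(packets)   # different routes, different arrival order

rebuilt = "".join(text for _, text in sorted(packets))
assert rebuilt == message
print(rebuilt)
```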

Express yourself (but be polite please)

To send all these bits of information, a set of protocols (ways of communicating between the computers making up the Internet) was devised. When passing on a packet of information, the sending machine first asks the receiving machine if it is both there and ready. If it replies yes, the packet is sent. Then, being a polite protocol, the sender asks the receiver if the packets all arrived safely. This way, with the right address, the packets can find the best way to go from A to B. If on the way some of the links in the chain are damaged and don't reply, the messages can be sent by a different route. Similarly, if some of the packets get lost in transit between links and need to be resent, or are delayed because they have to go by a roundabout route, the protocol can work around it. It's just a matter of time before all the packets arrive at the final destination and can be put back in order. With text, the time taken to get there doesn't really matter that much.
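In the spirit of that polite protocol, here is a tiny simulation (a sketch of the ask-and-check idea, not real Internet code):

```python
# A polite sender: send a packet, wait for an acknowledgement, resend if lost.
import random

def unreliable_send(packet):
    """Pretend network: loses packets 30% of the time."""
    return random.random() > 0.3   # True means an acknowledgement came back

def polite_send(packets):
    for number, data in packets:
        while not unreliable_send((number, data)):
            print(f"packet {number} lost, resending...")
        print(f"packet {number} acknowledged")

polite_send(list(enumerate(["Hello", "how", "are", "you?"])))
```

All that checking and resending is fine for text, but it takes time, which is exactly the problem the next section runs into.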

The Internet gets into the groove

The problem with live pop videos, like a Madonna concert, is that it's no use if the last part of the song arrives first, or you have to wait half an hour for the middle chorus to turn up, or the last word in a sentence vanishes. It all needs to arrive in real time. After all, that is how it's being sung. So to make webcasting work there needs to be something different: a new way of sending the packets. It needs to be fast, and it needs to deal with lots more packets, as video images carry a gigantic amount of data. The solution is to add something new to the Internet, called an overlay network. This sits on top of the normal wiring but behaves very differently.

The Internet turns rock and roll rebel

So the new real-time transmission protocol gets a bit rock and roll, and stops being quite so polite. It takes the packets and throws them quickly onto the Internet. If the receiver catches them, fine. If it doesn't, then so what? The sender is too busy to check like in the old days. It has to keep up with the music! If the packets are kept small, an odd one lost won't be missed. This overlay network, called the Mbone, lets people tune in to the transmissions like a TV station. All these packets are being thrown around, and if you want to you can join in and pick them up.
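That fire-and-forget style is essentially how UDP, the transport protocol usually underneath real-time streaming, behaves. A minimal sketch using Python's standard socket library (the receiver address is hypothetical):

```python
# Fire-and-forget: throw packets at the receiver and never wait for a reply.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # DGRAM = UDP

# Hypothetical receiver; in a webcast there would be many listeners.
RECEIVER = ("127.0.0.1", 9999)

for number, chunk in enumerate([b"verse", b"chorus", b"verse", b"outro"]):
    packet = number.to_bytes(4, "big") + chunk  # tag with its position
    sock.sendto(packet, RECEIVER)               # no ack, no resend, no waiting
sock.close()
```

Compare this with the polite sender above: there is no waiting for acknowledgements at all, which is what keeps the stream moving at the speed of the music.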

Crazy for you

The Madonna webcast was one of the first real tests of this new type of approach. She had millions of eager fans, but it was early days for the technology. Most people watching had slow dial-up modems rather than broadband. Also the computers making up the links in the Internet were fewer and less powerful than today's. As more and more people tuned in to watch, more and more packets needed to be sent and more and more of the links started to clog up. Like dozens of cars all racing to get through a tunnel, there were traffic jams. Packets that couldn't get through tried to find other routes to their destination … which also ended up blocked. If they did finally arrive they couldn't get through onto the viewer's PC, as the connection was slow, and if they did, very many were too late to be of any use. It was Internet gridlock.

Who’s that girl?

Viewers suffered as the pictures and sound cut in and out. Pictures froze, then jumped. Packets arrived well after their use-by date, meaning earlier images had been shown with bits missing, looking fuzzy. You couldn't even recognise Madonna on stage. Some researchers found that packets had, for example, passed over seven different networks to reach a PC in a hotel just four miles away. The packets had taken the scenic route round the world, and arrived too late for the party. It wasn't only the Madonna fans who suffered. The broadcast made use of the underlying wiring of the Internet, and that had filled up with millions of frantic Madonna packets. Anyone else trying to use the Internet at the time discovered that it had virtually ground to a halt and was useless. Madonna's fans had effectively crashed the Internet!

Webcasts in Vogue

Today’s webcasts have moved on tremendously using the lessons learned from the early days of the Madonna Internet crash. Today video is very much a part of the Internet’s day-to-day duties: the speed of the computer links of the Internet and their processing power has increased massively; more homes have broadband so the packets can get to your PC faster; satellite uplinks now allow the network to identify where the traffic jams are and route the data up and over them; extra links are put into the Internet to switch on at busy times; there are now techniques to unnoticeably compress videos down to small numbers of packets, and intelligent algorithms have been developed to reroute data effectively round blocks. We can also now combine the information flowing to the viewers with information coming back from them so allowing interactive webcasts. With the advent of digital television this service is now in our homes and not just on our PC’s.

Living in a material world

It’s because of thousands of scientists working on new and improved technology and software that we can now watch as the housemate’s antics stream live from the Big Brother house, vote from our armchair for our favourite talent show contestant or ‘press red’ and listen to the director’s commentary as we watch our favourite TV show. Like water and electricity the Internet is now an accepted part of our lives. However, as we come up with even more popular TV shows and concerts, strive to improve the quality of sound and pictures, more people upgrade to broadband and more and more video information floods the Internet … will the Internet Die another Day?

– Peter W. McOwan and Paul Curzon, Queen Mary University of London, 2006

More on …

Read more about women in computing in the cs4fn special issue “The Women Are Here”.

Punk robots learn to pogo

A Punk with a rainbow mohican
Image by Ivana Tomášková from Pixabay

It’s the second of three punk gigs in a row for Neurotic and the PVCs, and tonight they’re sounding good. The audience seem to be enjoying it too. All around the room the people are clapping and cheering, and in the middle of the mosh pit the three robots are dancing. They’re jumping up and down in the style of the classic punk pogo, and they’ve been doing it all night whenever they like the music most. Since Neurotic came on the robots can hardly keep still. In fact Neurotic and the PVCs might be the best, most perfect band for these three robots to listen to, since their frontman, Fiddian, made sure they learned to like the same music he does.

Programming punks

It’s a tough task to get a robot to learn what punk music sounds like, but there are lots of hints lurking in our own brains. Inside your brain are billions of connected cells called neurons that can send messages to one another. When and where the messages get sent depends on how strong each connection is, and we forge new connections whenever we learn something.

What the robots’ programmers did was to wire up a network of computerised connections like the ones in a real brain. Then they let the robots sample lots of different kinds of music and told them what it was, like reggae, pop, and of course, Fiddian’s collection of classic punk. That way the connections in the neural network got stronger and stronger – the more music the robots listened to, the easier it got for them to recognise what kind of stuff it was. When they recognised a style they’d been told to look out for, they would dance, firing a cylinder of compressed air to make them jump up and down.

The robots’ first gig

The last step was to tell the robots to go out and enjoy some punk. The programmers turned off the robots’ neural connections to other kinds of music, so no Kylie or Bob Marley would satisfy them. They would only dance to the angry, churning sound of punk guitars. The robots got dressed up in spray-painted leather, studded belts and safety pins, so with their blob-like bodies they looked like extra-tough boxing gloves on sticks. Then the three two-metre-tall troublemakers went to their first gig.

Whenever a band begins to play, the robots’ computer system analyses the sound coming from the stage. If the patterns in it look the same as the idea of punk music they’ve learned, the robots begin to dance. If the pattern isn’t quite right, they stand still. For lots of songs they hardly dance at all, which might seem weird since all the bands that are playing the gig call themselves punk bands. Except there are many different styles of punk music, and the robots have been brought up listening to Fiddian’s favourites. The other styles aren’t close enough to the robots’ idea of punk – they’ve developed taste, and it’s the same as Fiddian’s. Which is why the robots go crazy for Neurotic and the PVCs. Fiddian’s songs are influenced by classic punk like the Clash, the Sex Pistols and Siouxsie & the Banshees, which is exactly the music he’s taught the robots to love. As the robots jump wildly up and down, it’s clear that Neurotic and the PVCs now have three tall, tough, computerised superfans.