What’s that bird? Ask your phone – birdsong-recognition apps


by Dan Stowell, Queen Mary University of London

Could your smartphone automatically tell you what species of bird is singing outside your window? If so how?

Mobile phones contain microphones to pick up your voice. That means they should be able to pick up the sound of birds singing too, right? And maybe even decide which bird is which?

Smartphone apps exist that promise to do just this. They record a sound, analyse it, and tell you which species of bird they think it is most likely to be. But a smartphone doesn’t have the sophisticated brain that we have, evolved over millions of years to understand the world around us. A smartphone has to be programmed by someone to do everything it does. So if you had to program an app to recognise bird sounds, how would you do it? There are two very different ways computer scientists have devised to do this kind of decision making and they are used by researchers for all sorts of applications from diagnosing medical problems to recognising suspicious behaviour in CCTV images. Both ways are used by phone apps to recognise bird song that you can already buy.

Robin image by Darren Coleshill from Pixabay
The sound of the European robin (Erithacus rubecula) better known as robin redbreast, from Wikipedia.

Write down all the rules

If you ask a birdwatcher how to identify a blackbird’s sound, they will tell you specific rules. “It’s high-pitched, not low-pitched.” “It lasts a few seconds and then there’s a silent gap before it does it again.” “It’s twittery and complex, not just a single note.” So if we wrote down all those rules in a recipe for the machine to follow, each rule a little program that could say “Yes, I’m true for that sound”, an app combining them could decide when a sound matches all the rules and when it doesn’t.

Young blackbird in Oxfordshire, from Wikipedia
The sound of a European blackbird (Turdus merula) singing merrily in Finland, from Wikipedia (song 1).

This is called an ‘expert system’ approach. One difficulty is that it can take a lot of time and effort to actually write down enough rules for enough birds: there are hundreds of bird species in the UK alone! Each would need lots of rules to be hand crafted. It also needs lots of input from bird experts to get the rules exactly right. Even then it’s not always possible for people to put into words what makes a sound special. Could you write down exactly what makes you recognise your friends’ voices, and what makes them different from everyone else’s? Probably not! However, this approach can be good because you know exactly what reasons the computer is using when it makes decisions.

This is very different from the other approach which is…

Show it lots of examples

A lot of modern systems use the idea of ‘machine learning’, which means that instead of writing rules down, we create a system that can somehow ‘learn’ what the correct answer should be. We just give it lots of different examples to learn from, telling it what each one is. Once it has seen enough examples to get it right often enough, we let it loose on things we don’t know in advance. This approach is inspired by how the brain works. We know that brains are good at learning, so why not do what they do!

One difficulty with this is that you can’t always be sure how the machine comes up with its decisions. Often the software is a ‘black box’ that gives you an answer but doesn’t tell you what justifies that answer. Is it really listening to the same aspects of the sound as we do? How would we know?

On the other hand, perhaps that’s the great thing about this approach: a computer might be able to give you the right answer without you having to tell it exactly how to do that!

It means we don’t need to write down a ‘recipe’ for every sound we want to detect. If it can learn from examples, and get the answer right when it hears new examples, isn’t that all we need?

Which way is best?

There are hundreds of bird species that you might hear in the UK alone, and many more in tropical countries. Human experts take many years to learn which sound means which bird. It’s a difficult thing to do!

So which approach should your smartphone use if you want it to help identify birds around you? You can find phone apps that use one approach or another. It’s very hard to measure exactly which approach is best, because the conditions change so much. Which one works best when there’s noisy traffic in the background? Which one works best when lots of birds sing together? Which one works best if the bird is singing in a different ‘dialect’ from the examples we used when we created the system?

One way to answer the question is to provide phone apps to people and to see which apps they find most useful. So companies and researchers are creating apps using the ways they hope will work best. The market may well then make the decision. How would you decide?


This article was originally published on the CS4FN website and can also be found on pages 10 and 11 of Issue 21 of the CS4FN magazine ‘Computing sounds wild’. You can download a free PDF copy of the magazine (below), or any of our other free material at our downloads site.


Further bird- (& computing-) themed reading
🐦🐤🦜🦉


Related Magazine …


This blog is funded through EPSRC grant EP/W033615/1.

A Wookie for three minutes please – how Foley artists can manipulate natural and synthesised sounds for film, TV and radio

by Jane Waite and Paul Curzon, Queen Mary University of London.
This story was originally published on CS4FN and in an issue of the magazine (see below).

Theatre producers, radio directors and film-makers have been trying to create realistic versions of natural sounds for years. Special effects teams break frozen celery stalks to mimic breaking bones, smack coconut shells on hard packed sand to hear horses gallop, rustle cellophane for crackling fire. Famously, in the first Star Wars movie the Wookie sounds are each made up of up to six animal clips combined, including a walrus! Sometimes the special effect people even record the real thing and play it at the right time! (Not a good idea for the breaking bones though!) The person using props to create sounds for radio and film is called a Foley artist, named after the work of Jack Donovan Foley in the 1920’s. Now the Foley artist is drawing on digital technology to get the job done.

Black and white photo of a walrus being offered a fish, with one already in its mouth
“Are you sure that’s a microphone?” Walrus photo by Kabomani-Tapir from Pixabay

Designing sounds

Sound designers have a hard job finding the right sounds. So how about creating sound automatically using algorithms? Synthetic sound! Research into sound creation is a hot topic, not just for special effects but also to help understand how people hear and for use in many other sound based systems. We can create simple sounds fairly easily using musical instruments and synthesisers, but creating sounds from nature, animal sounds and speech is much more complicated.

The approaches used to recognize sounds can be the basis of generating sounds too. You can either try and hand craft a set of rules that describe what makes the sound sound the way it does, or you can write algorithms that work it out for themselves.

Paying patterns attention

One method, developed as a way to automatically generate synthetic sound, is based on looking for patterns in the sounds. Computer scientists often create mathematical models to better understand things, as well as to recognize and generate computer versions of them. The idea is to look at (or here listen to) lots of examples of the thing being studied. As patterns become obvious they also start to identify elements that don’t have much impact. Those features are ignored so the focus stays on the most important parts. In doing this they build up a general model, or view, that describes all possible examples. This skill of ignoring unimportant detail is called abstraction, and if you create a general view, a model of something, this is called generalisation: both important parts of computational thinking. The result is a hand-crafted model for generating that sound.

That’s pretty difficult to do though, so instead computer scientists write algorithms to do it for them. Now, rather than a person trying to work out what is, or is not important, training algorithms work it out using statistical rules. The more data they see, the stronger the pattern that emerges, which is why these approaches are often referred to as ‘Big Data’. They rely on number crunching vast data sets. The learnt pattern is then matched against new data, looking for examples, or as the basis of creating new examples that match the pattern.

The rain in train(ing)

Number crunching based on Big Data isn’t the only way though, sometimes general patterns can be identified from knowledge of the thing being investigated. For example, rain isn’t one sound but is made up of lots of rain drops all doing a similar thing. Natural sounds often have that kind of property. So knowledge of a phenomenon can be used to create a basic model to build a generator around. This is an approach Richard Turner, now at Cambridge University, has pioneered, analysing the statistical properties of natural sounds. By creating a basic model and then gradually tweaking it to match the sound-quality of lots of different natural sounds, his algorithms can learn what natural sounds are like in general. Then, given a specific natural ‘training’ sound, it can generate synthetic versions of that sound by choosing settings that match its features. You could give it a recorded sample of real rain, for example. Then his sound processing algorithms apply a bunch of maths that pull out the important features of that particular sound based on the statistical models. With the critical features identified, and plugged in to his general model, a new sound of any length can then be generated that still matches the statistical pattern of, and so sounds like, the original. Using the model you can create lots of different versions of rain, that all still sound like rain, lots of different campfires, lots of different streams, and so-on.

For now, the celery stalks are still in use, as are the walrus clippings, but it may not be long before film studios completely replace their Foley bag of tricks with computerised solutions like Richard’s. One wookie for 3 minutes and a dawn chorus for 5 please.

 


Become a Foley Artist with Sonic Pi

You can have a go at being a Foley artist yourself. Sonic Pi is a free live-coding synth for music creation that is both powerful enough for professional musicians, but intended to get beginners into live coding: combining programming with composing to make live music.

It was designed for use with a Raspberry Pi computer, which is a cheap way to get started, though works with other computers too. Its also a great, fun way to start to learn to program.

Play with anything, and everything, you find around the house, junk or otherwise. See what sounds it makes. Record it, and then see what it makes you think of out of context. Build up your own library of sounds, labelling them with things they sound like. Take clips of films, mute the sound and create your own soundscape for them. Store the sound clips and then manipulate them in Sonic Pi, and see if you can use them as the basis of different sounds.

Listen to the example sound clips made with Sonic Pi on their website, then start adapting them to create your own sounds, your own music. What is the most ‘natural sound’ you can find or create using Sonic Pi?

 


 

This article was also originally published in issue 21 of the CS4FN magazine ‘Computing Sounds Wild’ on p16. You can download a PDF copy of Issue 21, as well as all of our previous published material, free, at the CS4FN downloads site.

Computing Sounds Wild explores the work of scientists and engineers who are using computers to understand, identify and recreate wild sounds, especially those of birds. We see how sophisticated algorithms that allow machines to learn, can help recognize birds even when they can’t be seen, so helping conservation efforts. We see how computer models help biologists understand animal behaviour, and we look at how electronic and computer generated sounds, having changed music, are now set to change the soundscapes of films. Making electronic sounds is also a great, fun way to become a computer scientist and learn to program.

Front cover of CS4FN Issue 21 – Computing sounds wild

 

 

CS4FN Advent – Day 7 – Computing for the birds: dawn chorus, birds as data carriers and a Google April Fool (plus a puzzle!)

Welcome to Day 7 of our advent calendar. Yesterday’s post was about Printed Circuit Birds Boards, today’s theme is the Christmas robin redbreast which features on lots of Christmas cards and today is making a special appearance on our CS4FN Computing advent calendar.

A little robin redbreast.

In this longer post we’ll focus on the ways computer scientists are learning about our feathered friends and we’ll also make room for some of the bird-brained April Fools jokes in computing too.

We hope you enjoy it, and there’s also a puzzle at the end.

1. Computing Sounds Wild – bird is the word

Our free CS4FN magazine, Computing Sounds Wild (you can download a copy here), features the word ”bird” 60 times so it’s definitely very bird-themed.

An interest in nature and an interest in computers don’t obviously go well together. For a band of computer scientists interested in sound they very much do, though. In this issue we explore the work of scientists and engineers using computers to understand, identify and recreate wild sounds, especially those of birds. We see how sophisticated algorithms that allow machines to learn, can help recognize birds even when they can’t be seen, so helping conservation efforts. We see how computer models help biologists understand animal behaviour, and we look at how electronic and computer-generated sounds, having changed music, are now set to change the soundscapes of films. Making electronic sounds is also a great, fun way to become a computer scientist and learn to program.”

2. Singing bird – a human choir singing birdsong

by Jane Waite, QMUL
This article was originally published on the CS4FN website and can also be found on page 15 in the magazine linked above.

“I’m in a choir”. “Really, what do you sing?” “I did a blackbird last week, but I think I’m going to be woodpecker today, I do like a robin though!”

This is no joke! Marcus Coates a British artist, got up very early, and working with a wildlife sound recordist, Geoff Sample, he used 14 microphones to record the dawn chorus over lots of chilly mornings. They slowed the sounds down and matched up each species of bird with different types of human voices. Next they created a film of 19 people making bird song, each person sang a different bird, in their own habitats, a car, a shed even a lady in the bath! The 19 tracks are played together to make the dawn chorus. See it on YouTube below.

Marcus didn’t stop there, he wrote a new bird song score. Yes, for people to sing a new top ten bird hit, but they have to do it very slowly. People sing ‘bird’ about 20 times slower than birds sing ‘bird’ ‘whooooooop’, ‘whooooooop’, ‘tweeeeet’. For a special performance, a choir learned the new song, a new dawn chorus, they sang the slowed down version live, which was recorded, speeded back up and played to the audience, I was there! It was amazing! A human performance, became a minute of tweeting joy. Close your eyes and ‘whoop’ you were in the woods, at the crack of dawn!

Computationally thinking a performance

Computational thinking is at the heart of the way computer scientists solve problems. Marcus Coates, doesn’t claim to be a computer scientist, he is an artist who looks for ways to see how people are like other animals. But we can get an idea of what computational thinking is all about by looking at how he created his sounds. Firstly, he and wildlife sound recordist, Geoff Sample, had to focus on the individual bird sounds in the original recordings, ignore detail they didn’t need, doing abstraction, listening for each bird, working out what aspects of bird sound was important. They looked for patterns isolating each voice, sometimes the bird’s performance was messy and they could not hear particular species clearly, so they were constantly checking for quality. For each bird, they listened and listened until they found just the right ‘slow it down’ speed. Different birds needed different speeds for people to be able to mimic and different kinds of human voices suited each bird type: attention to detail mattered enormously. They had to check the results carefully, evaluating, making sure each really did sound like the appropriate bird and all fitted together into the Dawn Chorus soundscape. They also had to create a bird language, another abstraction, a score as track notes, and that is just an algorithm for making sounds!

3. Sophisticated songbird singing – how do they do it?

by Dan Stowell, QMUL
This article was originally published on the CS4FN website and can also be found on page 14 in the magazine linked above.

How do songbirds make such complex sounds? The answer is on a different branch of the tree of evolution…
We humans have a set of vocal folds (or vocal cords) in our throats, and they vibrate when we speak to make the pitched sound. Air from your lungs passes over them and they chop up the column of air letting more or less through and so making sound waves. This vocal ‘equipment’ is similar in mammals like monkeys and dogs, our evolutionary neighbours. But songbirds are not so similar to us. They make sounds too, but they evolved this skill separately, and so their ‘equipment’ is different: they actually have two sets of vocal folds, one for each lung.

Image by Dieter_G from Pixabay

Sometimes if you hear an impressive, complex sound from a bird, it’s because the bird is actually using the two sides of their voice-box together to make what seems like a single extra-long or extra-fancy sound. Songbirds also have very strong muscles in their throat that help them change the sound extremely quickly. Biologists believe that these skills evolved so that the birds could tell potential mates and rivals how healthy and skillful they were.

So if you ever wondered why you can’t quite sing like a blackbird, now you have a good excuse!

4. Data transmitted on the wing

Computers are great ways of moving data from one place to another and the internet can let you download or share a file very quickly. Before I had the internet at home if I wanted to work on a file on my home computer I had to save a copy from my work computer onto a memory stick and plug it in to my laptop at home. Once I ‘got connected’ at home I was then able to email myself with an attachment and use my home broadband to pick up file. Now I don’t even need to do that. I can save a file on my work computer, it synchronises with the ‘cloud’ and when I get home I can pick up where I left off. When I was using the memory stick my rate of data transfer was entirely down to the speed of road traffic as I sat on the bus on the way to work. Fairly slow, but the data definitely arrived in one piece.

In 1990 a joke memo was published for April Fool’s Day which suggested the use of homing pigeons as a form of internet, in which the birds might carry small packets of data. The memo, called ‘IP over Avian Carriers’ (that is, a bird-based internet), was written in a mock-serious tone (you can read it here) but although it was written for fun the idea has actually been used in real life too. Photographers in remote areas with minimal internet signal have used homing pigeons to send their pictures back.

The beautiful (and quite possibly wi-fi ready, with those antennas) Victoria Crowned Pigeon. Not a carrier pigeon admittedly, but much more photogenic.  Image by Foto-Rabe from Pixabay

A company in the US which offers adventure holidays including rafting used homing pigeons to return rolls of films (before digital film took over) back to the company’s base. The guides and their guests would take loads of photos while having fun rafting on the river and the birds would speed the photos back to the base, where they could be developed, so that when the adventurous guests arrived later their photos were ready for them.

Further reading

Pigeons keep quirky Poudre River rafting tradition afloat (17 July 2017) Coloradoan.

5. Serious fun with pigeons

On April Fool’s Day in 2002 Google ‘admitted’ to its users that the reason their web search results appeared so quickly and were so accurate was because, rather than using automated processes to grab the best result, Google was actually using a bank of pigeons to select the best results. Millions of pigeons viewing web pages and pecking picking the best one for you when you type in your search question. Pretty unlikely, right?

In a rather surprising non-April Fool twist some researchers decided to test out how well pigeons can distinguish different types of information in hospital photographs. They trained pigeons by getting them to view medical pictures of tissue samples taken from healthy people as well as pictures taken from people who were ill. The pigeons had to peck one of two coloured buttons and in doing so learned which pictures were of healthy tissue and which were diseased. If they pecked the correct button they got an extra food reward.

Pigeon, possibly pondering people’s photographs. Image by Davgood Kirshot from Pixabay

The researchers then tested the pigeons with a fresh set of pictures, to see if they could apply their learning to pictures they’d not seen before. Incredibly the pigeons were pretty good at separating the pictures into healthy and unhealthy, with an 80 per cent hit rate.

Further reading

Principle behind Google’s April Fools’ pigeon prank proves more than a joke (27 March 2019) The Conversation.

6. Today’s puzzle

You can download this as a PDF to PRINT or as an editable PDF that you can fill in on a COMPUTER.

You might wonder “What do these kriss-kross puzzles have to do with computing?” Well, you need to use a bit of logical thinking to fill one in and come up with a strategy. If there’s only one word of a particular length then it has to go in that space and can’t fit anywhere else. You’re then using pattern matching to decide which other words can fit in the spaces around it and which match the letters where they overlap. Younger children might just enjoy counting the letters and writing them out, or practising phonics or spelling.

We’ll post the answer tomorrow.

7. Answer to yesterday’s puzzle

The creation of this post was funded by UKRI, through grant EP/K040251/2 held by Professor Ursula Martin, and forms part of a broader project on the development and impact of computing.

Previous Advent Calendar posts

What are birds actually saying?

by Dan Stowell and Paul Curzon, Queen Mary University of London

Birds make so much noise, and it’s very complex. Is it just babble, or are they saying complicated things to each other? If so, could we work out what they are saying, what it means? Could we learn their language and speak to the birds?

Gull image by Thanasis Papazacharias from Pixabay

We know that bird communication is not as complicated as the words and sentences in human speech. So far, no one has been able to find grammatical patterns like those we find in human language. There apparently aren’t rules for birds like the ones we have about verbs and nouns. Birds don’t have to learn grammar! Exactly how complex bird languages are is still hotly debated, though.

Sometimes they’re passing on information about predators, or food, or sometimes just advertising their own fitness – showing off to get a mate (a bit like karaoke nights). Scientists have proved that such specific kinds of information are in the sounds birds make by observing bird behaviour. By playing recordings of birds and seeing how other birds react, they can see what information was communicated by a particular sound. If you play a ‘predator near’ call, for example, then other birds flee, but they stay put if you play other calls. They get the message.

Birds are definitely passing on specific information when they sing.

It turns out some birds have even learnt the languages of other animals and use it both to help those other animals and to support a life of crime. Many animals listen for the alarm calls of the animals around them, and so flee when others see a problem. Birds called Drongos, for example, act as lookouts for Meerkats, giving warning calls when they see Meerkat predators, allowing them to return to the safety of their burrows. However, the Drongos also sound false alarms every so often. They do it when they see a Meerkat with some juicy morcel. As the Meerkats run, the Drongo swoops in to steal the abandoned food.

Unfortunately for the Drongo, Meerkats are quite clever and get wise to the con. Eventually, they start to ignore the Drongo and only listen for their own Meerkat sentry’s call. The Drongo has another trick though. They are really good at mimicking sounds they hear, just like parrots. They have learnt to speak Meerkat just like the scientists do in experiments. So when the Meerkats stop reacting, the Drongos just switch tactics and start making perfect Meerkat language alarm calls instead. Once again the food is theirs.

Drongos give false alarms so they can steal food.

While most of us can’t reproduce bird sounds ourselves, and so talk directly to animals, we can certainly write programs to do it. In Star Wars, C3PO is a master of languages, speaking millions. Real robots of the near future will be able to mimic the sounds of whatever animals they wish and communicate with them in at least the simple ways that animals of different species listen and talk to each other. Perhaps something like this might be used to help protect endangered species from their predators, for example, watching for hawks and issuing timely warnings. We just have to hope they don’t turn to the Dark Side, like the Drongos, and use these skills to support a life of crime.

 


This article was originally published on CS4FN and in issue 21 of the CS4FN magazine ‘Computing Sounds Wild’ on p3. You can download a PDF copy of Issue 21, as well as all of our previous published material, free, at the CS4FN downloads site.

Computing Sounds Wild explores the work of scientists and engineers who are using computers to understand, identify and recreate wild sounds, especially those of birds. We see how sophisticated algorithms that allow machines to learn, can help recognize birds even when they can’t be seen, so helping conservation efforts. We see how computer models help biologists understand animal behaviour, and we look at how electronic and computer generated sounds, having changed music, are now set to change the soundscapes of films. Making electronic sounds is also a great, fun way to become a computer scientist and learn to program.

Front cover of CS4FN Issue 21 – Computing sounds wild