Are you there yet?

Plenty of people love the Weasley family’s clock from the Harry Potter books and films. It shows where members of the family are at any given time. Instead of numbers giving the time, the clock face has locations where someone might be (home, school, shopping) and the many hands on the clock show the family members. The wizarding world uses magic to make their whereabouts clock work, but muggles (and squibs) can use mobile network data to build a simple version, and use Bayesian networks to improve it.

A cell phone tower, seen from inside looking up to a blue sky

Your mobile phone is in contact with several cell towers in the mobile provider’s network. When you want to send a message, it goes first to the nearest cell tower before passing through the network, finally reaching your friend’s phone. As you move around, from home to school, for example, you will pass several towers. The closer you are to a tower the stronger the signal there, and the phone network uses this to estimate where you are, based on signal strength from several towers. This means that, as long as your phone is with you, it can act as a sensor for your location and track you, just like the Weasleys’ whereabouts clock.

You could also have a similar system at home that monitors your location, so that it switches on the lights and heating as you get closer to home to welcome you back. On a typical day you might head home somewhere between 3 and 6pm (depending on after-school events) and as you leave school the connection to your phone from the tower nearest the school will weaken, but connections will strengthen with the other cell towers on your route home. But what if you appear to be heading home at 11 in the morning? Perhaps you are, or maybe the signal has simply dropped from the tower nearest the school, so a tower nearer your home is now picking up the strongest signal!

A system using Bayesian logic to determine ‘near home’ or ‘not near home’ can be trained to put things into context. Unless you are ill, it’s unlikely that you’d be heading home before the afternoon, so these expected timings can be used to give a likelihood score for an event (such as you heading home). A Bayesian network takes a piece of information (‘person might be nearby’) and considers it in the context of previous knowledge (‘and that’s expected at this time of day, so probably true’ or ‘but they are unlikely to be nearby now, so more information is needed’). Unlike machine learning, which just looks for any patterns in the data, a Bayesian network builds in from the outset the way one thing does or does not cause another. Here it builds in the different possible causes of the signal dropping at a cell tower.
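To get a feel for how this might work, here is a minimal Python sketch of the ‘near home’ idea. All of the numbers (the prior for each time of day, and how reliable the tower reading is) are made up for illustration; a real system would learn them from data.

    # A hypothetical 'heading home?' estimator using Bayes' theorem.
    # All probabilities below are invented for illustration.

    def p_heading_home(hour, signal_near_home):
        """Probability of heading home, given a cell tower reading."""
        # Prior: heading home is likely mid-afternoon, unlikely mid-morning.
        prior = 0.6 if 15 <= hour <= 18 else 0.05

        # Likelihoods (assumed): the home tower usually reads strongest if
        # you really are heading home, but readings are wrong 10% of the time.
        p_signal_if_home = 0.9
        p_signal_if_not = 0.1
        if not signal_near_home:
            p_signal_if_home = 1 - p_signal_if_home
            p_signal_if_not = 1 - p_signal_if_not

        # Bayes' theorem: P(home | signal) = P(signal | home) x P(home) / P(signal)
        evidence = p_signal_if_home * prior + p_signal_if_not * (1 - prior)
        return p_signal_if_home * prior / evidence

    print(p_heading_home(16, True))  # about 0.93: expected at this time, so probably true
    print(p_heading_home(11, True))  # about 0.32: same reading, but an unlikely hour

The same tower reading produces very different conclusions depending on the time of day, which is exactly the context the Bayesian approach adds.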

You could also set up a similar system in a home using wifi access points to predict where you are, and so what you are doing. Information like that could then feed into a personalised artificial intelligence looking after you. Not all magic has to be run by magic!

– Jo Brodie, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

This article was inspired by the blog post Presence Detection Part 1: Home Assistant & Bayesian Probability, and a previous cs4fn article on making a Whereabouts Clock.

So, so tired…

Fatigue is a problem that people with a variety of long-term diseases can suffer from.

A man, hands over face, very, very tired.
Image by Małgorzata Tomczak from Pixabay

This isn’t just normal tiredness, but something much, much worse: so bad that it is a struggle to do anything at all, destroying any chance of a normal life. Doctors can often do little to help beyond managing the underlying disease, then hoping the fatigue sorts itself out. Sometimes the fatigue can stay with the person long, long after the disease itself has gone. Maha Albarrak, for her PhD, is exploring how computer technology might help people cope. Her first step is to interview those suffering to find out what kind of help they really need. Then she will work closely with volunteers to come up with solutions that solve the problems that matter.

– Paul Curzon, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

Is your healthcare algorithm racist?

Algorithms are taking over decision making, and this is especially so in healthcare. But could the algorithms be making biased decisions? Could their decisions be racist? Yes, and such algorithms are already being used.

A medical operation showing an anaesthetist and the head of the patient
Image by David Mark from Pixabay

There is now big money to be made from healthcare software. One of the biggest areas is intelligent algorithms that help healthcare workers make decisions; some even completely take over the decision making. In the US, software is widely used, for example, to predict who will most benefit from interventions. The more you help a patient, the more it costs. Some people may just get better without extra help, but for others intervention can mean the difference between avoiding a disability and not, or even between life and death. How do you tell? It matters, as money is limited, so someone has to choose. You need to be able to predict outcomes with or without potential treatments. That is the kind of thing that machine learning technology is generally good at. By looking at the history of lots and lots of past patients, their treatments and what happened to them, these artificial intelligence programs can spot patterns in the data and then make predictions about new patients.

This is what current commercial software does. Ziad Obermeyer, from UC Berkeley, decided to investigate how well such systems make those decisions. Working with a team combining academics and clinicians, he looked specifically at the differences between black and white patients in one widely used system, which made decisions about whether to put patients on more expensive treatment programmes. What the team found was that the system had a big racial bias in the decisions it made: for patients who were equally ill, it was much more likely to recommend white patients for treatment programmes.

One of the problems with machine learning approaches is it is hard to see why they make the decisions they do. They just look for patterns in data, and who knows what patterns they find to base their decisions on? The team had access to the data of a vast number of patients the algorithms had made recommendations about, the decisions made about them and the outcomes. This meant they could evaluate whether patients were treated fairly.

The data given to the algorithm specifically excluded race, supposedly to stop it making decisions on colour of skin. However, despite not having that information, that was ultimately what it was doing. How?

The team found that the system’s decision-making was based on predicting healthcare costs rather than how ill people actually were. The greater the cost saving of putting a person on a treatment programme, the more likely it was to recommend them. At first sight this seems reasonable, given the aim is to make the best use of a limited budget, and the system was totally fair in allocating treatment based on cost. However, when the team looked at how ill people were, black people had to be much sicker before they would be recommended for help. There are lots of reasons more money might be spent on white people, skewing the system. For example, they may be more likely to seek treatment earlier or more often. Being poor can make it harder to seek healthcare, due to difficulties getting to hospital, difficulties taking time off work, and so on. If more of the black people in the data used to train the system were poor, they would have sought help less often, so less would have been spent on them. The system had spotted patterns like this, and that was how it was making decisions. Even though it wasn’t told who was black and who was white, it had learnt to be biased.

There is an easy way to fix the system. Instead of including data about costs and having it use that as the basis of decision making, you can use direct measures of how ill a person is: for example, using the number of different conditions the patient is suffering from and the rule of thumb that the more complications you have, the more you will benefit from treatment. The researchers showed that if the system was trained this way instead, the racial bias disappeared. Access to healthcare became much fairer.
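As a rough illustration of why the choice of prediction target matters, here is a tiny Python sketch with invented toy data: ranking patients by past cost picks different people than ranking them by how ill they are, whenever one group has spent less for the same level of illness.

    # Toy data, invented for illustration:
    # (patient, number of chronic conditions, past yearly cost)
    patients = [
        ("A", 5, 4000),   # very ill, but little was spent on them
        ("B", 2, 9000),   # less ill, but high past spending
        ("C", 4, 8500),
        ("D", 1, 3000),
    ]

    # Suppose the programme can only take the top two patients.
    by_cost = sorted(patients, key=lambda p: p[2], reverse=True)[:2]
    by_illness = sorted(patients, key=lambda p: p[1], reverse=True)[:2]

    print([p[0] for p in by_cost])     # ['B', 'C']: the cost-based target misses A
    print([p[0] for p in by_illness])  # ['A', 'C']: the illness-based target helps A

Patient A, the sickest but cheapest, only gets help when the system is trained to predict illness rather than cost.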

If we are going to allow machines to take healthcare decisions for us based on their predictions, we have to make sure we know how they make those predictions, and make sure they are fair. You should not lose the chance of the help you need just because of your ethnicity, or because you are poor. We must take care not to build racist algorithms. Just because computers aren’t human doesn’t mean they can’t be humane.

– Paul Curzon, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

Solving real problems with Bayesian networks

Bayesian networks give a foundation for tools that support decision making based on evidence collected and the probabilities of one thing causing another (see “What are the chances of that?”).

COVID virus with DNA strand in background.
Image by Pete Linforth from Pixabay

The first algorithms that enabled Bayesian network models to be calculated on a computer were discovered separately by two different research groups in the late 1980s. Since then, a series of easy-to-use software packages have been developed that implement these algorithms, so that people without any knowledge of computing or statistics can easily build and run their own models.

These algorithms do ‘exact’ computations and can handle Bayesian networks for many different types of problems, but they can run into a barrier: when run on Bayesian networks beyond a certain size or complexity, they take too long to compute even on the world’s fastest computers. However, newer algorithms – which provide good approximate calculations rather than exact ones – have made it possible to deal with much larger problems, and this is a really exciting ongoing research area.
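To get a flavour of the approximate approach, here is a minimal Python sketch of one of the simplest approximation methods, rejection sampling, applied to the little virus example used later in this issue (a 1-in-200 prior, a test that always detects the virus, and a 2 per cent false positive rate; all made-up numbers). Real tools use far more sophisticated algorithms than this.

    import random

    def p_virus_given_positive_test(n=1_000_000):
        """Estimate P(virus | positive test) by sampling random 'people'."""
        with_virus = positives = 0
        for _ in range(n):
            virus = random.random() < 1 / 200
            positive = random.random() < (1.0 if virus else 0.02)
            if positive:            # keep only the samples matching the evidence
                positives += 1
                with_virus += virus
        return with_virus / positives

    print(p_virus_given_positive_test())  # roughly 0.2: close to the exact answer

The answer is only approximately right, but the method keeps working on networks far too big for the exact algorithms.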

– Norman Fenton, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

Diagnose? Delay delivery? Decisions, decisions. Decisions about diabetes in pregnancy

In the film Minority Report, a team of psychics – who can see into the future – predict who might cause harm, allowing the police to intervene before the harm happens. That is science fiction, but smart technology may genuinely be able to see into the future: warning, months in advance, that a mother’s body might be about to harm her unborn baby, so the harm can be prevented before it even happens.

Baby holding feet with feet in foreground.
Image by Daniel Nebreda from Pixabay

Gestational diabetes (or GDM) is a type of diabetes that appears only during pregnancy. Once the baby is born it usually disappears. Although it doesn’t tend to produce many symptoms it can increase the risk of complications in pregnancy so pregnant women are tested for it to avoid problems. Women who’ve had GDM are also at greater risk of developing Type 2 diabetes later on, joining an estimated 4 million people who have the condition in the UK.

Diabetes happens either when someone’s pancreas is unable to produce enough of a chemical called insulin, or because the body stops responding to the insulin that is produced. We need insulin to help us make use of glucose: a kind of sugar in our food that gives us energy. In Type 1 diabetes (commonly diagnosed in young people) the pancreas pretty much stops producing any insulin. In Type 2 diabetes (more commonly diagnosed in older people) the problem isn’t so much the pancreas (in fact in many cases it produces even more insulin), it’s that the person has become resistant to insulin. The result from either ‘not enough insulin’ or ‘plenty of insulin but can’t use it properly’ is that glucose isn’t able to get into our cells to fuel them. It’s a bit like being unable to open the fuel cap on a car, so the driver can’t fill it with petrol. This means higher levels of glucose circulate in the bloodstream and, unfortunately, high glucose can cause lots of damage to blood vessels.

During a normal pregnancy, women often become a little more insulin-resistant than usual anyway. This is an effect of pregnancy hormones from the placenta. From the point of view of the developing foetus, which is sharing a blood supply with mum, this is mostly good news as the blood arriving in the placenta is full of glucose to help the baby grow. That sounds great, but if the woman becomes too insulin-resistant and there’s too much glucose in her blood, it can lead to accelerated growth (a very large baby) and increase the risk of complications during pregnancy and at birth. Not great for mum or baby. Doctors regularly monitor the blood glucose levels in a GDM pregnancy to keep both mother and baby in good health. Once taught, anyone can measure their own blood glucose levels using a finger-prick test, and people with diabetes do this several times a day.

In-depth screening of every pregnant woman, to see if she has, or is at risk of, GDM, costs money and is time-consuming, and most pregnant women will not develop GDM anyway. PAMBAYESIAN researchers at Queen Mary have developed a prototype intelligent decision-making tool, both to help doctors decide who needs further investigation, and to help the women decide when they need additional support from their healthcare team. As well as saving money, this should be much more flexible for the mothers.

The team of computer scientists and maternity experts developed a Bayesian network with information based on expert knowledge about GDM, then trained it on real (anonymised) patient data. They are now evaluating its performance and refining it. There are different decision points throughout a GDM pregnancy. First, does the person have GDM, or are they at increased risk (perhaps because of a family history)? If ‘yes’, then the next decision is how best to care for them, and whether to begin medical treatment or just give diet and lifestyle support. Later in the pregnancy, the woman and her doctor must consider when it is best for her to deliver her baby, and afterwards she needs ongoing support to prevent her GDM from leading to Type 2 diabetes. The work is still at an early stage, but it is hoped that, given blood glucose readings, the GDM Bayesian network will ultimately be able to take account of the woman’s risk factors (like age, ethnicity and previous GDM), use that information to predict how likely she is to develop the condition in this pregnancy, and suggest what should happen next.

Systems like this mean that one day your smartphone may be smart enough to help protect you and your unborn baby from future harm.

– Jo Brodie, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

Bayes’ theorem as an algorithm

Thomas Bayes is famous for the theorem named after him: Bayes’ theorem (see What are the chances of that?). It can be used in any situation where we want to calculate a more accurate probability of something, given extra evidence. We will look at a version for our virus problem from there. For a graphical version of what this algorithm is doing, see A graphical explanation of Bayes’ theorem.

We want to know the probability that you have the virus (called the “posterior probability”), given that you have just tested positive. In that case Bayes’ theorem becomes:
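In symbols, writing P(A | B) for the probability of A given B, that is:

\[
P(\text{virus} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{virus}) \times P(\text{virus})}{P(\text{positive})}
\]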

The theorem tells us that the chance that a person who tests positive actually has the virus is just the number of people with the virus who test positive divided by the total number of people (with or without the virus) who test positive.

The theorem can be used as the basis of an algorithm to compute the new, more accurate probability that we are after. We will assume, to make things easier to follow, that we are considering a population of a thousand people. We get the following algorithm:

To calculate the accurate probability that you have the virus after testing positive:

  • STEP 1: Calculate how many people BOTH have the virus AND test positive.
  • STEP 2: Calculate the number of people who will test positive (whether they have the virus or not).
  • STEP 3: Divide ANSWER 1 by ANSWER 2 to give the final answer: the probability you have the virus after testing positive.
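Written as a minimal Python sketch, with the three probabilities it needs supplied as inputs (they correspond to Steps 1a, 1b and 2bi below), the algorithm is:

    def prob_virus_given_positive(p_virus, p_pos_if_virus, p_pos_if_no_virus,
                                  population=1000):
        # STEP 1: people who BOTH have the virus AND test positive
        with_virus_and_positive = p_pos_if_virus * p_virus * population

        # STEP 2: everyone who tests positive, with or without the virus
        without_virus_but_positive = p_pos_if_no_virus * (1 - p_virus) * population
        all_positive = with_virus_and_positive + without_virus_but_positive

        # STEP 3: divide ANSWER 1 by ANSWER 2
        return with_virus_and_positive / all_positive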

Let’s work through it with the numbers from our example. Stay calm! This is going to get hairy if you are not a computer!

What do we know? Well, actually we need another little algorithm to do Step 1:

To calculate how many people BOTH have the virus AND test positive (answer to step 1):

  • STEP 1a: Calculate the probability that you will test positive if you do have the virus.
  • STEP 1b: Calculate the probability you have the virus BEFORE knowing the test result.
  • STEP 1c: Multiply ANSWER 1a by ANSWER 1b by 1000 (our population).
A question mark in a globe network on a digital background
Image by Gerd Altmann from Pixabay

This calculates the answer to Step 1 for us. We have said we have a test that is always positive if you do have the virus (in reality tests do get it wrong this way too but, to keep things simple, we will ignore that here). That means the answer needed for Step 1a is a probability of 1 (meaning it is 100 per cent certain that it gets the answer right if you have the virus).

What about Step 1b? That is the country-wide probability of having the virus that we are starting with. Knowing nothing else about an individual, we have said 1 in 200 people have the virus. That makes the answer needed for this step 1/200: a probability of 0.005.

We can now calculate Step 1c: We just multiply those two numbers 1 x 0.005 and multiply that by the total number of people: 1000. This gives the answer that five people out of the 1000 have the virus and test positive.

Step 2 is a bit more tricky: it is the number of people out of our 1000 who test positive. That includes all those with the virus but ALSO those that the test wrongly says have the virus when they don’t. We need to add the numbers for these two groups: those with the virus and those without.

To calculate the number of people who test positive (answer to step 2):

  • STEP 2a: Calculate the number of people who have the virus AND who test positive (This is just the answer from Step 1.)
  • STEP 2b: Calculate the number of people who do NOT have the virus AND who test positive.
  • STEP 2c: Add ANSWER 2a and ANSWER 2b together.

We have already worked out the first part (Step 2a). It is just the answer from Step 1, so we already know it is five people. Step 2b is calculated in a similar way to Step 1 as follows:

To calculate the number of people who do not have the virus AND who test positive (answer to step 2b):

  • STEP 2bi: Calculate the probability that you will test positive if you do NOT have the virus.
  • STEP 2bii: Calculate the probability you do not have the virus.
  • STEP 2biii: Multiply ANSWER 2bi by ANSWER 2bii and then by 1000 to give the number of people who do not have the virus but test positive.

We know the answer to Step 2bi, as we said there was a two per cent chance of the test telling you that you had the virus when you didn’t. That means the answer to this step is 2 / 100 = 0.02.

For Step 2bii, the probability a person does NOT have the virus, we just need to calculate the rest of the population excluding those with the virus. We said one in every 200 people have the virus. That means 199 in 200 do not have it. The answer to this step is therefore 199 / 200 = 0.995.

So, to work out Step 2biii and find the number of people who do not have the virus but test positive, we multiply our two answers above, 0.02 x 0.995, then multiply this by 1000. This gives the answer 19.9: so about 20 out of the 1000 people are incorrectly told they have the virus.

We can now go back to Step 2c and add the answer from Step 2a (of those correctly told they have the virus) to that from Step 2b (those told they have the virus when they do not). This is 5 + 20, so 25 people in total are given a positive result. This is the answer to Step 2.

Finally, we can work out the overall, more accurate probability (Step 3). Divide the answer from Step 1, (five people), by the answer to Step 2 (25 people), to give the final probability as 5 / 25 = 0.2 or a 20 per cent chance you actually have the virus after testing positive.
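A computer can, of course, do all of that for us. Plugging the example numbers into the Python sketch above gives the same answer:

    print(prob_virus_given_positive(p_virus=1/200,
                                    p_pos_if_virus=1.0,
                                    p_pos_if_no_virus=0.02))  # 0.2008…, about 20 per cent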

Don’t forget we have just made up the numbers here to show the maths. Although no test is 100 per cent accurate, current COVID tests can be confirmed with an additional test to give further evidence.

– Norman Fenton and Paul Curzon, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

Cloudy with a chance of pain

One day you may have personalised pain forecasts…

A red sky full of pink and grey clouds
Image by Cindy Eyler from Pixabay

Very many people suffering from diseases like arthritis think that the weather affects the pain they feel. Many dread the coming of autumn, for example, as they know their lives will get worse in the cold and wet weather. Others have found that trips to warmer countries have helped reduce their suffering from pain. Doctors have long been sceptical of these claims as there has been little evidence to support them, but then there have only been a few small-scale studies investigating it. It also doesn’t help that different people believe different weather affects them, and in different ways: some like it hot, some like it cold, some feel they suffer most when the rain comes…

A team from the University of Manchester realised that they could use crowd-sourced science, where members of the public collect data on their phones, to do a massive experiment to find out the truth. 13,000 people suffering from long-term pain took part, recording how much pain they were in every day for over a year. Their phones recorded their location and linked it to the local weather at the time. This gave the researchers millions of reports of pain to analyse against the specific weather that person was actually experiencing at the time.

So who was right: the doctors or the patients? Well, actually many of the patients were right as the weather did affect the amount of pain they personally felt. Especially problematic were days when the humidity was high, the air pressure was low, or the wind was very strong (in that order).

These results mean clinicians can now start to take the weather seriously. It may also be possible to create pain forecast programs for people affected, based on their local weather reports. It also opens up new areas to study to understand the causes of pain and so find new ways to alleviate it.

So, if you are unlucky enough to suffer from chronic pain, it could be your phone that, one day, might warn you that tomorrow will be cloudy with a chance of pain.

Visit the Cloudy with a Chance of Pain project website

– Paul Curzon, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

A simple Bayesian network for having a virus

Computer tools based on what are called “Bayesian networks” give accurate ways to determine how likely things are. For example, they give a good way, based on evidence, to determine how likely it is that a given person has COVID. As you collect more evidence, the probability the network gives becomes more accurate.

A Bayesian network for having a virus

How likely is it that you have COVID? There is lots of evidence you might collect to decide whether or not you do. COVID causes many (but not all) people to cough, so if you do have a cough that is useful evidence. Other things, like flu, however, also cause people to cough. Catching COVID is also known to be caused by breathing the same air as infected people: the more socialising you have done, the higher the chance you have caught it, and the more people in your area with the disease, the more likely it is that you have caught it by socialising. You can also take a test: having COVID will cause a positive result. However, tests are not fully accurate, so even with a positive test you may or may not have COVID…

Deciding how likely it is you have COVID relies on knowing lots of facts about the causes of COVID and about the symptoms it causes. It also relies on knowing the probabilities of things such as how likely it is that COVID causes a cough. Finally it relies on knowing lots of facts about you such as whether you have had a positive test result or not.

A Bayesian network is just a way of drawing a diagram that collects all this information in one place. Once created it can be used to determine how likely things like whether you have COVID are to be true based on known facts, known causes, and the chances of one thing causing another. It gives a powerful way to reason about these facts and probabilities based on “causal relationships”. That reasoning allows accurate probabilities to be calculated about the things you are interested in knowing. Given I have a cough and no other symptoms, have had a negative test but have recently socialised outside my family, am I 80 per cent certain I have COVID or is the chance I have it only 2 per cent?

We can take all the evidence for and against our having the virus and draw a Bayesian network as shown in the diagram. For each bubble the percentages show the chance that for a random person in the population this thing is currently true. Arrows show which things can cause others. So, in the diagram, this means that 0.5 per cent of the population currently have the virus (as 1 in 200 have the virus, so probability 0.005, and to turn a probability into a percentage you just multiply by 100); 0.4 per cent of the population have been in recent contact with an infected person; 10 per cent have a cough; 2 per cent have flu, and so on. This is all general evidence we can collect about the country as a whole. (Note we have made up these numbers for the example as they may change over time, but they are the kinds of data scientists collect to help policy makers make decisions.)

The model also includes probabilities not shown, like the chance of a person getting the virus if they have been in recent contact with an infected person and the probability of a positive test depending on whether they do, or do not, have the virus. Once a particular Bayesian network like this has been created it can form the basis of a decision making tool that does all the calculations.

We then want to know about you. Do you have a cough, have you lost your sense of taste or smell, what was the result of your test, and have you been in contact with an infected relative? From this information, we can update the probabilities in the Bayesian network (using a theorem called Bayes’ theorem) to give a new probability for how likely it is that you have the virus. Computer software can do this for us, though the more complicated the Bayesian network, the longer it takes to do all the calculations.
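To get a feel for what the software is doing, here is a minimal Python sketch of a cut-down version of the network: contact causes the virus, the virus causes a positive test, and the virus or flu cause a cough. Only the headline figures (0.5 per cent with the virus, 0.4 per cent in recent contact, 2 per cent with flu, 2 per cent false positives) come from the article; every conditional probability below, including a 95 per cent chance that the test catches a real infection, is invented for illustration. It answers a question like the one above by brute force: add up the probability of every possible combination of the unknowns that matches the evidence.

    # All conditional probabilities are assumed, for illustration only.
    P_CONTACT = 0.004                      # 0.4% recently contacted an infected person
    P_FLU = 0.02                           # 2% have flu
    P_VIRUS = {True: 0.30, False: 0.004}   # P(virus | contact?): overall about 0.5%
    P_POS = {True: 0.95, False: 0.02}      # P(positive test | virus?)
    P_COUGH = {(True, True): 0.9, (True, False): 0.6,    # P(cough | virus?, flu?)
               (False, True): 0.5, (False, False): 0.08}

    def joint(contact, flu, virus, cough, positive):
        """Probability of one complete 'world', multiplying along the arrows."""
        p = P_CONTACT if contact else 1 - P_CONTACT
        p *= P_FLU if flu else 1 - P_FLU
        p *= P_VIRUS[contact] if virus else 1 - P_VIRUS[contact]
        p *= P_COUGH[(virus, flu)] if cough else 1 - P_COUGH[(virus, flu)]
        p *= P_POS[virus] if positive else 1 - P_POS[virus]
        return p

    def p_virus_given(contact, cough, positive):
        """P(virus | the evidence), summing out the unknown flu variable."""
        num = sum(joint(contact, flu, True, cough, positive) for flu in (True, False))
        other = sum(joint(contact, flu, False, cough, positive) for flu in (True, False))
        return num / (num + other)

    # A cough, a negative test, but recent contact with an infected person:
    print(p_virus_given(contact=True, cough=True, positive=False))  # about 0.13

With these made-up numbers, the cough and the contact push the probability up from 0.5 per cent, while the negative test pulls it back down: the network weighs all the evidence at once.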

The result, though, is that the computer can give you a personalised risk assessment of how likely it is that you have the virus based on the specific evidence about you. You can find such a comprehensive personal COVID risk calculator, based on a Bayesian network with much more data, at covid19.apps.agenarisk.com/

– Norman Fenton and Paul Curzon, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

Here

A mannequin in shadow, arms spread wide, waiting to be dressed
Image by Zaccaria Boschetti from Pixabay

Amy Dowse wondered if an app might help people suffering with anxiety. One way to overcome panic attacks is a mindfulness technique where you focus on the here and now – your surroundings rather than your internal feelings. For her university MSc project, she created an app to help people do this, called Here. It prompts you to look for coloured objects in the real world then use them to build a picture in the app. For example, you look at the colour of the clothes that people around you are wearing and try to fully dress a figure on the app using what you see.

– Paul Curzon, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.

A graphical explanation of Bayes’ theorem

A diagrammatic proof

If you take a test, how do you work out how likely it is that you have the virus? Bayesian reasoning is one way (see “What are the chances of that?”). Here is a graphical version of what that kind of reasoning is actually about.

If recent data shows that the virus currently affects one in 200 of the population, then it is reasonable to start with the assumption that the probability YOU have the virus is one in 200 (we call this the ‘prior probability’). Another way of saying that is that the prior probability is 0.5 per cent.

A better estimate

So the probability a random person has the virus is 1 in 200, or 0.5 per cent, and with no other evidence your best guess that you have the virus is also 0.5 per cent. But you have also taken a test, and it was positive. The test is not perfect, though: for every 100 people who do NOT have the virus, it will wrongly say that 2 of them do. The false positive rate is 2 per cent.
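Picture 1,000 people taking the test:

    1,000 people
    ├─ 5 have the virus (1 in 200)
    │    └─ all 5 test positive
    └─ 995 do not have the virus
         ├─ about 20 test positive anyway (2 per cent of 995)
         └─ about 975 test negative

So 25 people in total get a positive result, but only 5 of them really have the virus.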

How do we calculate this in general? Bayes worked out a general equation for calculating this new, more accurate probability, called the ‘posterior’ probability (see Bayes’ theorem as an algorithm, above). It is based, here, on the probability of having the virus before testing (the original, prior probability) and on the new evidence, which here is the test result.

A surprising result

How likely is it that you have the virus now? With only this evidence, the probability you have the virus is still only 20 per cent: of the 25 people in every 1,000 who test positive, just 5 actually have it.

– Norman Fenton, Queen Mary University of London, Spring 2021

Download Issue 27 of the cs4fn magazine on Smart Health here.

This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.