Thomas Bayes is famous for the theorem named after him: Bayes’ theorem. (See What are the chances of that?) It can be used in any situation where we want to calculate a more accurate probability of something given extra evidence. We will look at a version for the virus problem from that article. For a graphical version of what this algorithm is doing, see A graphical explanation of Bayes’ theorem.
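We want to know the probability that you have the virus (called the “posterior probability”), given that you have just tested positive. In that case Bayes’ theorem becomes:

$$P(\text{virus} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{virus}) \times P(\text{virus})}{P(\text{positive})}$$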
The theorem tells us that the chance that a person who tests positive actually has the virus is just the number of people with the virus who test positive divided by the total number of people (with or without the virus) who test positive.
The theorem can be used as the basis of an algorithm to compute the new, more accurate probability that we are after. We will assume, to make things easier to follow, that we are considering a population of a thousand people. We get the following algorithm:
To calculate a more accurate probability that you have the virus after testing positive:
- STEP 1: Calculate how many people BOTH have the virus AND test positive.
- STEP 2: Calculate the number of people who will test positive (whether they have the virus or not).
- STEP 3: Divide ANSWER 1 by ANSWER 2 to give the final answer: the probability that you have the virus after testing positive.
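The three steps can be sketched in Python. This is only an illustration: the function name `probability_after_positive` and its parameter names are invented here, and the counts 3 and 12 in the usage line are made-up values, not the numbers from our example.

```python
def probability_after_positive(with_virus_and_positive, all_positive):
    """Step 3: divide the answer to Step 1 by the answer to Step 2.

    with_virus_and_positive -- answer to Step 1 (a number of people)
    all_positive            -- answer to Step 2 (a number of people)
    """
    if all_positive <= 0:
        # Nobody tested positive, so the question makes no sense.
        raise ValueError("at least one person must test positive")
    return with_virus_and_positive / all_positive


# Made-up counts: 3 people have the virus and test positive,
# out of 12 people who test positive in total.
print(probability_after_positive(3, 12))  # 0.25
```

The hard work is in finding the two counts to pass in, which is what the rest of the algorithm does.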
Let’s work through it with the numbers from our example. Stay calm! This is going to get hairy if you are not a computer!
What do we know? Well, actually we need another little algorithm to do Step 1:
To calculate how many people BOTH have the virus AND test positive (answer to step 1):
- STEP 1a: Calculate the probability that you will test positive if you do have the virus.
- STEP 1b: Calculate the probability you have the virus BEFORE knowing the test result.
- STEP 1c: Multiply ANSWER 1a by ANSWER 1b by 1000 (our population).
This calculates the answer to Step 1 for us. We have said we have a test that is always positive if you do have the virus (in reality tests do get it wrong this way too but, to keep things simple, we will ignore that here). That means the answer needed for Step 1a is a probability of 1 (meaning it is 100 per cent certain that it gets the answer right if you have the virus).
What about Step 1b? That is the country-wide probability of having the virus that we are starting with. Knowing nothing else about an individual, we have said 1 in 200 people have the virus. That makes the answer needed for this step 1 / 200, which is a probability of 0.005.
We can now calculate Step 1c: We just multiply those two numbers 1 x 0.005 and multiply that by the total number of people: 1000. This gives the answer that five people out of the 1000 have the virus and test positive.
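Here is a minimal Python sketch of Steps 1a to 1c using the numbers above. The function and variable names are invented for this illustration.

```python
def people_with_virus_who_test_positive(p_positive_if_virus, p_virus, population):
    """Step 1c: multiply the answers to Steps 1a and 1b by the population."""
    return p_positive_if_virus * p_virus * population


step_1a = 1.0      # the test is always positive if you have the virus
step_1b = 1 / 200  # 1 in 200 people have the virus
step_1 = people_with_virus_who_test_positive(step_1a, step_1b, 1000)

# Rounded only to hide tiny floating-point errors.
print(round(step_1, 6))  # 5.0
```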
Step 2 is a bit more tricky: it is the number of people out of our 1000 who test positive. That includes all those with the virus but ALSO those that the test wrongly says have the virus when they don’t. We need to add the numbers for these two groups: those with the virus and those without.
To calculate the number of people who test positive (answer to step 2):
- STEP 2a: Calculate the number of people who have the virus AND who test positive. (This is just the answer from Step 1.)
- STEP 2b: Calculate the number of people who do NOT have the virus AND who test positive.
- STEP 2c: Add ANSWER 2a and ANSWER 2b together.
We have already worked out the first part (Step 2a). It is just the answer from Step 1, so we already know it is five people. Step 2b is calculated in a similar way to Step 1 as follows:
To calculate the number of people who do not have the virus AND who test positive (answer to step 2b):
- STEP 2bi: Calculate the probability that you will test positive if you do NOT have the virus.
- STEP 2bii: Calculate the probability you do not have the virus.
- STEP 2biii: Multiply ANSWER 2bi by ANSWER 2bii and then by 1000 to give the number of people who do not have the virus but test positive.
We know the answer to Step 2bi, as we said there was a two per cent chance of the test telling you that you had the virus when you didn’t. That means the answer to this step is 2 / 100 = 0.02.
For Step 2bii, the probability a person does NOT have the virus, we just need to calculate the rest of the population excluding those with the virus. We said one in every 200 people have the virus. That means 199 in 200 do not have it. The answer to this step is therefore 199 / 200 = 0.995.
So, to work out Step 2biii, the number of people who do not have the virus but test positive, we multiply the two answers above, 0.02 x 0.995, and then multiply by 1000. This gives 19.9, so about 20 out of the 1000 people are incorrectly told they have the virus.
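The same sum can be checked with a short Python sketch. As before, the names are invented for this illustration and the numbers are the made-up ones from our example.

```python
def people_without_virus_who_test_positive(p_positive_if_no_virus, p_no_virus, population):
    """Step 2biii: multiply the answers to Steps 2bi and 2bii by the population."""
    return p_positive_if_no_virus * p_no_virus * population


step_2bi = 2 / 100     # the test wrongly says "positive" 2 per cent of the time
step_2bii = 199 / 200  # 199 in 200 people do not have the virus
step_2b = people_without_virus_who_test_positive(step_2bi, step_2bii, 1000)

print(round(step_2b, 1))  # 19.9
print(round(step_2b))     # 20 (to the nearest whole person)
```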
We can now go back to Step 2c and add the answer from Step 2a (of those correctly told they have the virus) to that from Step 2b (those told they have the virus when they do not). This is 5 + 20, so 25 people in total are given a positive result. This is the answer to Step 2.
Finally, we can work out the overall, more accurate probability (Step 3). Divide the answer from Step 1 (five people) by the answer to Step 2 (25 people) to give the final probability: 5 / 25 = 0.2, or a 20 per cent chance that you actually have the virus after testing positive.
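As a cross-check, the whole calculation can be sketched in a few lines of Python. The function and parameter names below are invented for this illustration. It also shows the effect of rounding 19.9 up to 20: without that rounding the answer is 5 / 24.9, which is very slightly more than 20 per cent.

```python
def chance_of_virus_after_positive(p_virus, p_pos_if_virus, p_pos_if_no_virus,
                                   population=1000):
    """Follow Steps 1 to 3 without rounding any of the intermediate answers."""
    # Step 1: people who have the virus AND test positive.
    step_1 = p_pos_if_virus * p_virus * population
    # Step 2b: people who do NOT have the virus AND test positive.
    step_2b = p_pos_if_no_virus * (1 - p_virus) * population
    # Step 2c: everyone who tests positive.
    step_2 = step_1 + step_2b
    # Step 3: divide the answer to Step 1 by the answer to Step 2.
    return step_1 / step_2


unrounded = chance_of_virus_after_positive(1 / 200, 1.0, 0.02)
rounded = 5 / 25  # using "about 20" false positives, as in the text

print(round(unrounded, 4))  # 0.2008
print(rounded)              # 0.2
```

Because the population size is multiplied into both the top and the bottom of the final division, it cancels out, so choosing 1000 people only makes the numbers easier to picture.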
Don’t forget we have just made up the numbers here to show the maths. Although no test is 100 per cent accurate, the current Covid tests can be confirmed with an additional test to give further evidence.
– Norman Fenton and Paul Curzon, Queen Mary University of London, Spring 2021
This post and issue 27 of the cs4fn magazine have been funded by EPSRC as part of the PAMBAYESIAN project.