Another Covid-19 blog post for the non-expert, this time on what’s meant by test accuracy.
Before we get started, a question to stimulate your curiosity. The percentage of the population infected with Covid-19 is uncertain, but let’s assume it’s very low at 1 in a 1,000 people (yes, I know it’s likely higher than that, but for the sake of argument). Let’s also assume a test for Covid-19 has an accuracy of 99%. If someone is tested and found positive, what’s the probability they have the disease? Obviously it’s 99%, right? Wrong: it’s actually just 9%. The reason why gets us into statistics, but I have tried to keep the post as non-mathematical as possible. For the mathphiles, however, I’ve included the equation at the end of the post. If you’re a mathphobe, please feel free to ignore it. Either way, some of the outcomes might surprise you.
There’s been a lot of discussion recently about new widespread testing of the population for Covid-19 using the so-called antibody test. Current data on Covid-19 infection comes from a test to determine whether you have the virus now. The antibody test, which we hope is introduced soon, looks to see whether the virus has infected you in the past (the ideal time for the antibody test is around 28 days after infection). These antibody tests are currently being assessed for accuracy, and we’ve heard echoes of Brexit: “no test is better than a bad test”. So what constitutes a good or a bad test? To answer that question, let’s get back to the opening paragraph. Assume Covid-19 affects 1 in a 1,000 people and a test is 99% accurate. If someone is tested and found positive, what’s the probability they’ve got the disease? If you said 99%, then you fell into what’s known as the Bayesian trap. It’s named after the 18th-century statistician Thomas Bayes, who at the time believed his findings were nothing special. When you’ve read this post you might disagree.
Although not intuitively obvious, the probability of having Covid-19 following a positive test depends on your prior chances of having the disease. In our example, 1 in a 1,000 people in the population have Covid-19, so the prior chance is 1 in a 1,000. Assume you are one person in a random sample of 1,000 people taken from the population and you test positive. If the test is 99% accurate, then one test in a hundred will give a false positive. This means that if the other 999 people, who do not have Covid-19, were tested, 1% would give a false positive: that’s 9.99 (call it 10) people. You are therefore one of a group of 11 people, all testing positive. You are the only true positive; the others are false positives, and 1 in 11 is 9%. I’ve shown the equation below and, if you’re really bored in lockdown, you can experiment with the calculations yourself. Those more trusting can take my word for it.
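If you’d rather experiment in code than on paper, the head-count argument above can be checked with a few lines of Python (a sketch; the 1,000-person sample and the 99% accuracy figure come straight from the example):

```python
# Worked example: 1,000 people, 1 truly infected, test 99% accurate.
sample_size = 1_000
true_positives = 1                      # the 1-in-a-1,000 person with Covid-19
healthy = sample_size - true_positives  # 999 people without the disease

false_positive_rate = 0.01              # a 99% accurate test is wrong 1% of the time
false_positives = healthy * false_positive_rate  # 9.99, call it 10 people

# You are one of ~11 people testing positive, but only 1 truly has the disease.
p_disease_given_positive = true_positives / (true_positives + false_positives)
print(round(p_disease_given_positive, 2))  # prints 0.09, i.e. 9%
```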
I must emphasise I’m talking about testing randomly across the population. This is not being done in the UK for Covid-19 currently; only those with symptoms in hospital, and health workers, are being tested. That is not a random sample, so different statistical calculations apply. If random antibody testing across the population is introduced, however, watch out for the Bayesian trap.
The biggest issue with calculating these Bayesian statistics is knowing the occurrence of the disease in a population. In the example above the occurrence was one in a thousand, but in practice that number can be hard to measure. This is the case generally, and even more so for Covid-19, because it’s such a new disease and the infection rate is constantly changing. At the time of writing there are 61,000 positive tests in the UK, with a population of 66 million. That’s an occurrence of 0.1% (1 in a 1,000), and if that’s the true occurrence, then assuming an antibody test is 99% accurate, a positive test means the probability you’ve truly had the disease is just 9%. That’s worryingly low. But the 61,000 positives aren’t randomly spread; they come from hospital cases, so perhaps the percentage of those tested who were found positive is a better estimate. Rounding the numbers, the 61,000 positives came from a total of 230,000 tests, so that’s 26%; let’s call it 30% for the sake of argument. With that infection rate, if we conducted random testing using a 99% accurate test and you were positive, then the probability that’s a correct result is 97%, which is very good. The problem is that, like the 0.1% figure, the 30% infection rate is unreliable because it comes from hospital cases, which is not a random sample. The best estimates we currently have for the world-wide rate of infection are between 1.88% and 11.43%. Taking the higher value (11.43%), and assuming a 99% accurate test, the probability of you having had Covid-19 if you test positive is 92.7%. Taking the lower value (1.88%), the probability is 65.5%.
The antibody tests are being assessed for accuracy currently, but what effect does the test accuracy have on the probability of any one test being correct? If the test was 90% accurate instead of 99% and the infection rate was 0.1%, the position is even worse: a positive test now means you have only about a 1% probability of having the disease. If the infection rate is 30%, the probability of having Covid-19 following a positive test is 79%, lower than the 97% achieved with the 99% accurate test, but still not bad.
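All the figures in the last two paragraphs can be reproduced from Bayes’ theorem. The sketch below assumes, as the rest of the post does, a single accuracy figure covering both false positives and false negatives:

```python
def p_disease_given_positive(prevalence, accuracy):
    """Bayes' theorem: the probability of truly having the disease after
    a positive test, using one accuracy value for both error types."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

# Prevalence and accuracy combinations discussed above:
cases = [
    (0.001, 0.99),   # 1 in a 1,000 infected, 99% accurate -> ~9%
    (0.30, 0.99),    # hospital-based 30% estimate         -> ~97%
    (0.1143, 0.99),  # upper world-wide estimate           -> ~93%
    (0.0188, 0.99),  # lower world-wide estimate           -> ~65%
    (0.001, 0.90),   # 90% accurate test, 0.1% infected    -> ~1%
    (0.30, 0.90),    # 90% accurate test, 30% infected     -> ~79%
]
for prevalence, accuracy in cases:
    print(f"{prevalence:7.2%} infected, {accuracy:.0%} accurate: "
          f"{p_disease_given_positive(prevalence, accuracy):.1%}")
```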
So what do all these figures really mean? A good way of looking at this is to plot the data on a graph, shown in the Figure. The graph plots the proportion of the population infected with Covid-19 against the probability that any one test will give a true positive result. I plotted four lines, showing 99%, 95%, 90% and 80% accuracy in the antibody test in a random population. We can see that, as the infection rate in the population falls, so does the probability of any single test giving a correct answer. The shape of the curves is also important. Take the 99% accuracy curve, for example. The curve rises rapidly up to an infection rate of about 10%, then flattens off. Without knowing the correct infection rate in a population, you need that flat part of the curve to extend across as wide a range of infection rates as possible. It’s only when the infection rate falls below about 10% that the probability of any one test giving a correct answer falls off. The 80% accuracy curve, on the other hand, falls away very quickly. A high test accuracy is therefore necessary, because the probability of a true result then remains acceptable across a wider range of infection rates. This, at least to some extent, offsets the lack of knowledge about the percentage of the population infected with the virus.
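The curves in the Figure can be reproduced numerically. This sketch (a plotting library would draw the actual graph) samples each accuracy curve at a few infection rates, showing how the 99% curve stays high until the infection rate drops below about 10%, while the 80% curve falls away almost immediately:

```python
def p_true_positive(infection_rate, accuracy):
    """Probability a positive result is a true positive (Bayes' theorem,
    one accuracy figure for both error types)."""
    true_pos = infection_rate * accuracy
    false_pos = (1 - infection_rate) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

# Sample each of the four curves in the Figure at a few infection rates.
rates = (0.001, 0.01, 0.05, 0.10, 0.30)
for accuracy in (0.99, 0.95, 0.90, 0.80):
    row = ", ".join(f"{p_true_positive(r, accuracy):4.0%}" for r in rates)
    print(f"{accuracy:.0%} accurate: {row}")
```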
Returning to the example of 1 in a 1,000 and a 99% test accuracy, we have already established that a positive test equates to a 9% probability of having Covid-19. But what if you did a second, confirmatory test; would that double your chances to 18%? Actually no, because with a second test your prior chance of having Covid-19 is not 0.1% but 9%, and if you plug that number into the equation below you get a probability of 91%. This assumes (1) the first and second tests are independent of each other and (2) both tests were positive, but all this gets a bit too complicated for a simple blog post.
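The effect of that second test can be sketched by feeding the first result back in as the new prior (again assuming a single 99% accuracy figure and independent tests):

```python
def update(prior, accuracy=0.99):
    """One application of Bayes' theorem for a positive test result."""
    true_pos = prior * accuracy
    false_pos = (1 - prior) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

first = update(0.001)   # prior of 1 in a 1,000 -> about 9%
second = update(first)  # the 9% becomes the new prior -> about 91%
print(f"after one positive test: {first:.0%}, after two: {second:.0%}")
```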
Epidemiologists are interested in the infection rate across the entire population, and the antibody test is important in this respect. Individuals, however, are more concerned with whether or not they have had Covid-19, and the problem is that, without knowing the epidemiological statistics on the infection rate, you don’t know the probability of any individual test being true. One set of figures feeds into the other, and over time, with more data, the statistics get progressively better. In fact this is just what Bayesian theory says: the more data you have, the more accurate your picture of the world becomes.
Running down to the chemist, or buying a test on-line, suddenly doesn’t seem so attractive, does it? But perhaps the slogan “test, test, test” now makes sense, because we need to build data across the total population. In this respect Mr Spock was right when he said, “logic clearly dictates that the needs of the many outweigh the needs of the few.” Things are not always as simple as they might seem.
For the mathphiles
Keeping to the example of 1 in a 1,000 people in a population having the disease and a test accuracy of 99%, the probability PD of a positive test being true is given by Bayes’ theorem:

PD = (A × P) / (A × P + (1 − A) × (1 − P))

where P is the prior probability of having the disease (here 0.001) and A is the test accuracy (here 0.99). Plugging in the numbers:

PD = (0.99 × 0.001) / (0.99 × 0.001 + 0.01 × 0.999) = 0.09, or 9%
Note that tests often quote one accuracy for false positives and another for false negatives. I have simplified the calculations here by using a single accuracy figure for both. In practice the two values, positives and negatives, are usually close.