Test characteristics

Revision as of 22:14, 15 January 2024 by Nikolas

Any single test or investigation has certain characteristics that are important to know about.

True and false positive and negative

When researching a certain test's characteristics and usefulness in diagnosing a certain disorder, one would perform the test on both healthy people and people with the disorder. A good test will be positive in most people with the disorder and negative in most people without the disorder. However, because no test is perfect, the test will be negative in some people with the disorder, and it will be positive in some people without the disorder.

After performing the test on a number of subjects, each subject will be put into one of four categories:

  • True positives (TP) - those who have the disorder and tested positive with the test
  • False positives (FP) - those without the disorder but who tested positive anyway
  • True negatives (TN) - those without the disorder and who tested negative
  • False negatives (FN) - those with the disorder but who tested negative anyway

The best tests have as few false positives and false negatives as possible.
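The categorisation above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the list of subjects are made up for the example, and each subject is represented as a pair (has the disorder, tested positive).

```python
from collections import Counter


def classify(has_disorder: bool, test_positive: bool) -> str:
    """Return the category (TP, FP, TN or FN) for one subject."""
    if test_positive:
        return "TP" if has_disorder else "FP"
    return "FN" if has_disorder else "TN"


# Hypothetical subjects: (has_disorder, test_positive)
subjects = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]

# Count how many subjects end up in each category
counts = Counter(classify(disorder, positive) for disorder, positive in subjects)
print(dict(counts))
```

Every subject lands in exactly one category, so the four counts always add up to the number of subjects tested.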

Sensitivity

For a given test and illness, sensitivity refers to the proportion of sick people who are tested that produce a positive test result. In terms of the categories above, sensitivity is the number of true positives divided by the number of people who have the disorder: TP / (TP + FN).

A perfectly sensitive test is 100% sensitive, meaning that 100% of tested sick people will test positive and no sick people will test negative. Few tests are 100% sensitive, and in real life, most tests which are regarded as highly sensitive have sensitivities around 95%.

An example of a highly sensitive test is measuring D-dimer in suspected venous thromboembolism, which has a sensitivity of 95%.

An example of a test which is less sensitive is chest radiography in suspected rib fracture, which has a sensitivity of approximately 60%.

The sensitivity of a test is not affected by the prevalence of the disease in the tested population.
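A small sketch may help show why prevalence does not matter here. The numbers are hypothetical (100 people with the disorder, 95 of whom test positive), and the function names are chosen for the example. Only the number of healthy people changes between the two populations, and healthy people do not appear in the sensitivity formula.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of people with the disorder who test positive."""
    return tp / (tp + fn)


def prevalence(tp: int, fn: int, healthy: int) -> float:
    """Proportion of the whole tested population that has the disorder."""
    return (tp + fn) / (tp + fn + healthy)


# Hypothetical study: 100 people with the disorder, 95 of whom test positive
tp, fn = 95, 5

# Same sick people, two different numbers of healthy people
for healthy in (9900, 100):
    print(
        f"prevalence {prevalence(tp, fn, healthy):.0%}, "
        f"sensitivity {sensitivity(tp, fn):.0%}"
    )
```

The prevalence changes from 1% to 50%, while the sensitivity stays at 95%.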

Specificity

For a given test and illness, specificity refers to the proportion of healthy people who are tested that produce a negative test result. In terms of the categories above, specificity is the number of true negatives divided by the number of people who do not have the disorder: TN / (TN + FP).

A perfectly specific test is 100% specific, meaning that 100% of tested healthy people will test negative and no healthy people will test positive. Few tests are 100% specific, and in real life, most tests which are regarded as highly specific have specificities around 95%.

An example of a highly specific test is measuring anti-tissue transglutaminase antibodies in suspected coeliac disease, which has a specificity of 95%.

An example of a test with low specificity is measuring PSA in suspected prostate cancer, which has a specificity of 20%.

The specificity of a test is not affected by the prevalence of the disease in the tested population.

Unfortunately, when designing a test, there is usually a trade-off between sensitivity and specificity. Making a test more sensitive, for example by lowering the cutoff for a positive result, tends to produce more false positives and therefore a lower specificity, and vice versa.
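The trade-off can be illustrated with a test that measures some marker and calls the result positive when the value is at or above a cutoff. The marker values below are invented for the example, and it is an assumption of the sketch that higher values suggest disease.

```python
# Hypothetical marker values; higher values are assumed to suggest disease
sick = [4.0, 5.5, 6.0, 7.5, 8.0]
healthy = [2.0, 3.0, 3.5, 5.0, 6.5]


def sens_spec(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when 'value >= cutoff' counts as positive."""
    tp = sum(value >= cutoff for value in sick)
    tn = sum(value < cutoff for value in healthy)
    return tp / len(sick), tn / len(healthy)


for cutoff in (3.0, 5.0, 7.0):
    sens, spec = sens_spec(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these values, the lowest cutoff catches every sick person but mislabels most healthy people, while the highest cutoff does the opposite.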

Receiver operating characteristic curve (ROC curve)

A receiver operating characteristic curve (ROC curve) is formed by plotting a test's sensitivity on the y axis against 1 - specificity (the false positive rate) on the x axis, for each possible cutoff of the test. A ROC curve allows for visualisation of both sensitivity and specificity simultaneously. The area under the ROC curve is a good estimate of the test's performance. A perfect test has an area under the ROC curve of 1, while a test that performs no better than chance has an area of 0.5. Most real-life tests lie somewhere in between, but the closer to 1, the better.
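As a rough sketch of how such a curve and its area can be computed, the example below tries every observed value as a cutoff, records the false positive rate (1 - specificity) and the sensitivity at each, and sums the area with the trapezoid rule. The marker values are invented, higher values are assumed to suggest disease, and the helper names are chosen for the example.

```python
# Hypothetical marker values; higher values are assumed to suggest disease
sick = [4.0, 5.5, 6.0, 7.5, 8.0]
healthy = [2.0, 3.0, 3.5, 5.0, 6.5]


def roc_points(sick, healthy):
    """Return (false positive rate, sensitivity) pairs, one per cutoff."""
    points = [(0.0, 0.0)]  # a cutoff above every value: nobody tests positive
    for cutoff in sorted(set(sick + healthy), reverse=True):
        sens = sum(v >= cutoff for v in sick) / len(sick)
        fpr = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((fpr, sens))
    return points


def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(sick, healthy)
print(f"AUC = {area_under_curve(points):.2f}")
```

For real data one would normally use an established statistics package rather than a hand-written loop; the sketch is only meant to show what the curve consists of.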

Pre-test and post-test probability

The pre-test probability refers to the probability that a patient with a certain symptom or clinical finding has a certain condition before performing a test or investigation. For an asymptomatic person, the pre-test probability is equal to the prevalence of the disease in the general population. For a symptomatic person, the pre-test probability is equal to the prevalence of the condition among people with that symptom or finding. As such, if the patient has symptoms or clinical findings, they have a higher probability of having the disease than the general population, and so the pre-test probability is higher than the prevalence.

As an example, the prevalence of urinary tract infection in the general population is 11%, meaning that, if you pick a random person in the world, there is an 11% chance that they have a UTI, regardless of whether they have symptoms. However, among all people with typical urinary tract infection symptoms, approximately 80% of them have urinary tract infection. As such, if a person has typical urinary tract infection symptoms, the pre-test probability of them having UTI is 80%.

Likewise, the post-test probability refers to the probability that a patient with a certain symptom or clinical finding has a certain condition after performing the test or investigation. If a test's characteristics do not allow one to be more certain of the diagnosis after performing the test, the test is quite useless. Ideally, a positive result should make the post-test probability much higher than the pre-test probability, and a negative result should make it much lower.
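For a single test with a simple positive or negative result, the post-test probability can be worked out from the pre-test probability together with the test's sensitivity and specificity, using Bayes' theorem. The sketch below assumes exactly that simplified situation. The function name and the 90% sensitivity and specificity are hypothetical and do not describe any particular real test.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Probability of disease after a positive or negative result (Bayes' theorem)."""
    if positive:
        true_pos = sensitivity * pre_test
        false_pos = (1 - specificity) * (1 - pre_test)
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * pre_test
    true_neg = specificity * (1 - pre_test)
    return false_neg / (false_neg + true_neg)


# Hypothetical test with 90% sensitivity and 90% specificity
for pre_test in (0.5, 0.8):
    after_pos = post_test_probability(pre_test, 0.9, 0.9, positive=True)
    after_neg = post_test_probability(pre_test, 0.9, 0.9, positive=False)
    print(f"pre-test {pre_test:.0%}: positive -> {after_pos:.0%}, negative -> {after_neg:.0%}")
```

Note that the same test moves the probability by different amounts depending on where it starts.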

Urine analysis example

The pre-test probability of urinary tract infection in an adult, non-elderly woman with typical UTI symptoms is approximately 80%. On the other hand, if the woman has symptoms which are less typical for UTI, the pre-test probability is 50%.

In the woman with typical UTI symptoms and therefore a pre-test probability of 80%, performing a urine dipstick test can yield a post-test probability of anywhere from 40% to 95%, depending on the result. If the dipstick is negative for nitrite and leukocytes, the post-test probability is 40%, and if positive for both, 95%. If positive for only one of the two, the post-test probability is still 90%.

In the woman with atypical UTI symptoms and therefore a pre-test probability of 50%, performing the urine dipstick analysis can yield a post-test probability of anywhere from 10% to 90%, depending on the result. If the dipstick is negative for nitrite and leukocytes, the post-test probability is 10%, and if positive for both, 90%. If positive for only one of the two, the post-test probability is still 75%.

Between the two patients, the second patient is actually the one who will benefit more from the urine dipstick analysis. This is because, in a patient with typical UTI symptoms, the post-test probability is still relatively high despite a negative dipstick (40%), so one would usually prescribe antibiotics regardless of the result. And in most people with typical UTI symptoms, the dipstick is positive anyway. Because of this, many say it's not necessary to perform a dipstick test in an adult, non-elderly woman with typical UTI symptoms, especially if they've had the same symptoms before.

However, for the other patient, the dipstick provides much value. If the urine is negative for nitrite and leukocytes, the post-test probability is only 10%, and so antibiotics are very unlikely to be of benefit. As such, in this patient, performing the test can help decide whether antibiotics should be administered or not, and whether to evaluate for other causes of the symptoms.
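The reasoning above can be sketched as a question: does any possible dipstick result change the decision to treat? The post-test probabilities below are the ones given in the example. The 30% treatment threshold is purely hypothetical, chosen only to make the comparison concrete, and the function name is made up.

```python
# Hypothetical threshold: treat if the probability of UTI is at least this high
TREATMENT_THRESHOLD = 0.30

# Post-test probabilities from the urine analysis example
post_test = {
    "typical symptoms (pre-test 80%)": {
        "both negative": 0.40, "one positive": 0.90, "both positive": 0.95,
    },
    "atypical symptoms (pre-test 50%)": {
        "both negative": 0.10, "one positive": 0.75, "both positive": 0.90,
    },
}


def dipstick_changes_management(results: dict, threshold: float) -> bool:
    """True if some results lead to treatment and others do not."""
    decisions = {probability >= threshold for probability in results.values()}
    return len(decisions) > 1


for patient, results in post_test.items():
    print(patient, "->", dipstick_changes_management(results, TREATMENT_THRESHOLD))
```

Under this assumed threshold, every result leads to treatment in the patient with typical symptoms, so the dipstick changes nothing, whereas in the patient with atypical symptoms a negative dipstick changes the plan.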

Positive predictive value

For a given test and illness, the positive predictive value (PPV) of a test refers to the probability that a patient has the illness if they have tested positive. Intuitively, it can be difficult to understand the difference between specificity and PPV. The difference is the starting point: specificity starts from people who are known to be healthy and asks how many of them test negative, while PPV starts from people who have tested positive and asks how many of them actually have the illness. The positive predictive value of a test is perhaps more important for us than sensitivity or specificity, as in practice we know the test result but not whether the patient has the illness.

In a certain population, positive predictive value is the number of true positives divided by the number of people who tested positive: TP / (TP + FP).

In many cases, tests with high specificity have high positive predictive value as well. I can't think of any specific examples.

However, and this is important to know, the positive predictive value of a test depends not only on the test's characteristics but also on the pre-test probability of the disorder, which in turn is equal to the prevalence of the disorder (if there are no symptoms). When the pre-test probability increases, the PPV increases as well, and vice versa. As such, even if the test is excellent and has a high specificity and sensitivity, it may still have a low positive predictive value if the prevalence is low (the disease is rare).
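The effect of prevalence on PPV is easiest to see with numbers. The sketch below assumes a hypothetical test with 95% sensitivity and 95% specificity and computes the PPV at different prevalences. The function name `ppv` is chosen for the example.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability of disease given a positive test: TP / (TP + FP)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test: 95% sensitive and 95% specific
for prevalence in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence {prevalence:.1%}: PPV {ppv(0.95, 0.95, prevalence):.1%}")
```

When the disease is rare, the few true positives are outnumbered by false positives from the large healthy group, so most positive results are wrong even though the test itself is good.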

Negative predictive value

For a given test and illness, the negative predictive value (NPV) of a test refers to the probability that a patient does not have the illness if they have tested negative. Like positive predictive value, the negative predictive value is important as it tells us the probability that the patient does not have the disease if they test negative.

In a certain population, negative predictive value is the number of true negatives divided by the number of people who tested negative: TN / (TN + FN).

In many cases, tests with high sensitivity have high negative predictive value as well. For example, unless the pre-test probability is high, a negative D-dimer has a close to 100% negative predictive value for venous thromboembolism. As such, patients with a low or medium pre-test probability for VTE who test negative for D-dimer have VTE ruled out.

As with PPV, the negative predictive value of a test depends not only on the test's characteristics but also on the pre-test probability of the disorder. However, in contrast to PPV, NPV decreases as the prevalence increases. As such, even if the test is really accurate and has a high specificity and sensitivity, it may still have a low negative predictive value if the prevalence is high (the disease is common).
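The same kind of calculation shows NPV moving in the opposite direction. Again, the test with 95% sensitivity and 95% specificity is hypothetical, and the function name `npv` is chosen for the example.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability of no disease given a negative test: TN / (TN + FN)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test: 95% sensitive and 95% specific
for prevalence in (0.1, 0.5, 0.9):
    print(f"prevalence {prevalence:.0%}: NPV {npv(0.95, 0.95, prevalence):.1%}")
```

When the disease is common, even a small proportion of missed cases makes up a large share of all the negative results.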

Precision and accuracy

The precision of a test or investigation refers to the reproducibility or consistency of the result. When repeating a test on the same sample, the machine should ideally produce the same result every time. However, because machines and chemistry are impossible to predict exactly, there will be some variance in the results. When a test has a high precision, the results are very close to each other. Let's say that we have a blood glucose sample, and we use two separate methods to measure the blood glucose, one precise and one not, and we repeat the measurement five times for each. The results may look like this:

  • For the low-precision test: 6.5, 3.2, 4.9, 7.5, 3.1
  • For the high-precision test: 4.5, 4.3, 4.6, 4.5, 4.4

It's obvious that the high-precision test is more valuable for us, but it's important to know that test accuracy is unrelated to precision.

The accuracy of a test refers to how close the result is to the actual value in the sample, i.e. how well it represents the truth. For example, a test which measures the CRP as 9 mg/L when the CRP in the sample is actually 4 mg/L is not accurate.

A test can be precise without being accurate. For example, in the aforementioned blood glucose example, if the blood glucose level in the sample was actually 4.5, the high-precision test was both precise and accurate. However, a high-precision low-accuracy test could produce the same results if the blood glucose level in the sample was 7.6, for example.
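One way to put numbers on this is to describe precision by the spread of repeated results (here the standard deviation) and accuracy by how far their mean lies from the true value. That choice of measures is an assumption of this sketch. The results are the ones from the blood glucose example above, and the two "true" values are the ones used in the text.

```python
from statistics import mean, stdev

# Repeated measurements from the blood glucose example
low_precision = [6.5, 3.2, 4.9, 7.5, 3.1]
high_precision = [4.5, 4.3, 4.6, 4.5, 4.4]


def spread_and_bias(results: list[float], true_value: float) -> tuple[float, float]:
    """Return (standard deviation, mean minus true value) for repeated results."""
    return stdev(results), mean(results) - true_value


# Small spread means precise; small bias means accurate
for true_value in (4.5, 7.6):
    spread, bias = spread_and_bias(high_precision, true_value)
    print(f"true value {true_value}: spread {spread:.2f}, bias {bias:+.2f}")

spread_low, _ = spread_and_bias(low_precision, 4.5)
print(f"low-precision test: spread {spread_low:.2f}")
```

The spread of the high-precision results is the same whatever the true value is. Only the bias tells us whether the test is also accurate.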

Analytical variation

When performing any test, there will be some random variation in the analysis. This can be because of temperature changes in the laboratory, differences in reagents, differences in pipetted volume of sample, etc.

For example, when measuring leukocytes, there is a 2% analytical variation in the measurement when the true value is around 7 × 10⁹/L. As such, the result can vary by 0.14 × 10⁹/L just due to analytical variation. A measurement of 7.0 one day and 6.9 the next day therefore does not necessarily reflect an actual decrease in the leukocytes in the sample; it could just be due to analytical variation.
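The leukocyte reasoning can be written as a small check. This is a deliberate simplification that mirrors the example: it treats the stated percentage as the largest change that analytical variation alone could explain, relative to the first result. The function name is made up for the example.

```python
def within_analytical_variation(first: float, second: float, variation: float) -> bool:
    """True if the change between two results could be analytical variation alone."""
    allowed_change = first * variation
    return abs(second - first) <= allowed_change


# Leukocytes (in units of 10^9/L) with 2% analytical variation
print(within_analytical_variation(7.0, 6.9, 0.02))  # change of 0.1, allowed 0.14
print(within_analytical_variation(7.0, 6.5, 0.02))  # change of 0.5, allowed 0.14
```

A change that falls within the allowed range should not, on its own, be interpreted as a real change in the patient.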

Analytical variation varies from laboratory to laboratory and from test to test. It can usually be looked up in your local laboratory handbook.

Other examples of analytical variation (in my local laboratory):

  • <3% for values around 25 mg/L for CRP
  • 7% for values between 15 - 700 µg/L for ferritin
  • 9% for values around 20 ng/L for troponin I

Biological variation

The human body is tightly regulated by homeostasis, but no compound in the blood stays at the exact same level over time. The concentrations of compounds in the blood change with age, time of day, food intake,

Reference range

When designing a quantitative test, one must determine a range of values which are regarded as "healthy". This is done by using the test to make many measurements on a large, healthy population and plotting those results. For many analytes, the results approximate a normal distribution curve. Then, the reference range is chosen so that 95% of the measurements from the large, healthy population end up within that range (each end of the range is approximately two standard deviations from the mean). This means that 5% of healthy people have measurements that end up outside the reference range, which is important to know! As such, values which are slightly outside the reference range may be normal. However, the farther away from the reference range the value is, the higher the probability of pathology.
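The procedure can be sketched with simulated data. The "healthy population" below is generated at random from a normal distribution with an arbitrary mean of 5.0 and standard deviation of 0.5, so all numbers are hypothetical, and the function name is chosen for the example. The reference range is taken as the mean plus or minus two standard deviations, as described above.

```python
import random
from statistics import mean, stdev


def reference_range(healthy_values: list[float]) -> tuple[float, float]:
    """Return (lower, upper) limits as the mean minus and plus two standard deviations."""
    centre = mean(healthy_values)
    sd = stdev(healthy_values)
    return centre - 2 * sd, centre + 2 * sd


# Hypothetical healthy population: normally distributed, mean 5.0, SD 0.5
rng = random.Random(0)  # fixed seed so the example is repeatable
healthy = [rng.gauss(5.0, 0.5) for _ in range(10_000)]

low, high = reference_range(healthy)
inside = sum(low <= value <= high for value in healthy) / len(healthy)
print(f"reference range {low:.2f} to {high:.2f}, {inside:.1%} of healthy values inside")
```

Roughly 95% of the simulated healthy values fall inside the range, which leaves about 5% of perfectly healthy measurements outside it.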

The obvious next question is why we choose to define the reference range so that 5% of healthy people are outside the range. Why not define it so that 100% of the measurements from the healthy population end up in the range? The reason for this is threefold:

  1. Because it affects the test's sensitivity
    • A reference range defined by 99% instead of 95% of the healthy population would be wider, giving the test a lower sensitivity but a higher specificity. We usually prefer a higher sensitivity.
  2. It eliminates any outliers which would mess up the interval
  3. Those with measurements outside the 95% range may have subclinical disease
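The first point can be illustrated by comparing a 95% and a 99% range for the same healthy population. The sketch assumes the healthy values are normally distributed with a hypothetical mean of 5.0 and standard deviation of 0.5, and the value 6.1 is an invented, mildly abnormal result.

```python
from statistics import NormalDist


def central_range(mean: float, sd: float, coverage: float) -> tuple[float, float]:
    """Return the limits containing the central 'coverage' share of a normal distribution."""
    dist = NormalDist(mean, sd)
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical healthy population: mean 5.0, SD 0.5
range_95 = central_range(5.0, 0.5, 0.95)
range_99 = central_range(5.0, 0.5, 0.99)

value = 6.1  # hypothetical, mildly abnormal result
for label, (low, high) in (("95%", range_95), ("99%", range_99)):
    flagged = not (low <= value <= high)
    print(f"{label} range {low:.2f} to {high:.2f}: value {value} flagged = {flagged}")
```

The wider 99% range no longer flags the mildly abnormal value, which is how widening the range costs sensitivity in exchange for fewer healthy people being flagged.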