We evaluate the validity of a study before examining its results because it is generally inappropriate to apply the results of a biased study to our patients. If we cannot trust that the results reflect a reasonable estimate of the truth we seek, how can we use those results to guide patient care? However, if we are satisfied with a study’s validity, we need to know what the results mean and what to do with them.
In this segment of the evidence-based medicine series, we discuss several commonly reported study measures and how we can ultimately apply study findings for the good of patients. This is, after all, why we ask clinical questions in the first place.
Measures of Treatment Effect
For many types of clinical questions, the proportion of patients in each group experiencing an outcome is the most commonly reported result. This can be presented in several ways, each conveying a subtly different message.
For example, suppose a hypothetical trial of perioperative beta-blockade finds a postoperative mortality of 5% in the treatment group and 15% in the control group. In this study, the absolute risk reduction (ARR) is 0.15 - 0.05 = 0.10, and the relative risk (RR) of death is 0.05/0.15 = 0.33. In other words, the risk of death in the treatment group is one-third the risk of death in the control group, whereas the absolute difference in risk between treated and untreated patients is 0.10, or 10 percentage points. The relative risk reduction (RRR) is (1 - RR) × 100% = 67%, meaning that perioperative beta-blockers reduce the risk of death by 67% relative to the control group.
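The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `treatment_effects` and its parameter names are our own choices, and the inputs are the hypothetical mortality figures from the beta-blockade example.

```python
def treatment_effects(risk_treated: float, risk_control: float) -> dict:
    """Return ARR, RR, and RRR (as proportions) for an adverse outcome.

    Both arguments are event proportions between 0 and 1.
    The function name and signature are illustrative only.
    """
    arr = risk_control - risk_treated   # absolute risk reduction
    rr = risk_treated / risk_control    # relative risk
    rrr = 1 - rr                        # relative risk reduction
    return {"ARR": arr, "RR": rr, "RRR": rrr}


# Hypothetical trial from the text: 5% mortality treated, 15% control.
effects = treatment_effects(risk_treated=0.05, risk_control=0.15)
for name, value in effects.items():
    print(f"{name}: {value:.2f}")
# ARR: 0.10
# RR: 0.33
# RRR: 0.67
```

All three numbers come from the same two proportions, which is why reporting only one of them can hide part of the picture.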
Although these numbers all seem quite different from one another, they are derived from the same study results: a difference in the proportion of deaths between the intervention groups. However, taken together they provide far more information than any individual result.
To illustrate this, suppose you knew the relative risk of death found in Study A was 10%, meaning the relative risk reduction was 90%. This may sound quite striking, until you later learn that the risk in the treatment group was 0.0001 and the risk in the control group was 0.001. This is quite different from Study B, in which the risk of death in the treatment group was 10% and the risk in the control group was 100%, even though the RR was still 10%. This difference is captured in the ARR. For the first study, the ARR was 0.0009 (or 0.09%), whereas in the second study the ARR was 0.90 (or 90%).
It can be difficult to communicate these differences clearly using terms such as ARR, but the number needed to treat (NNT) provides a more accessible means of reporting effects. The NNT is the number of patients you would need to treat to prevent one adverse event or to achieve one more successful outcome, and it is calculated as 1/ARR.
For Study A the NNT is 1,111, meaning we would need to treat more than 1,000 patients to prevent a single death. For many treatments, this would prove prohibitively costly and perhaps even dangerous, depending on the frequency and severity of side effects. Study B, on the other hand, has an NNT of just over 1, meaning that nearly every treated case represents an averted death. Even though the relative risks are identical, the full meaning of the results is drastically different.
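The contrast between Study A and Study B can be reproduced with a short Python sketch. The function name `number_needed_to_treat` is our own, and the values are left unrounded so they can be compared directly with the figures quoted above.

```python
def number_needed_to_treat(risk_treated: float, risk_control: float) -> float:
    """Return NNT = 1/ARR for an adverse outcome (illustrative helper)."""
    arr = risk_control - risk_treated
    if arr <= 0:
        # With no absolute benefit, an NNT is not meaningful.
        raise ValueError("Treatment shows no absolute risk reduction.")
    return 1 / arr


# (risk in treatment group, risk in control group) from the text.
studies = {
    "Study A": (0.0001, 0.001),
    "Study B": (0.10, 1.00),
}

for name, (risk_treated, risk_control) in studies.items():
    rr = risk_treated / risk_control
    arr = risk_control - risk_treated
    nnt = number_needed_to_treat(risk_treated, risk_control)
    print(f"{name}: RR = {rr:.2f}, ARR = {arr:.4f}, NNT = {nnt:.1f}")
# Study A: RR = 0.10, ARR = 0.0009, NNT = 1111.1
# Study B: RR = 0.10, ARR = 0.9000, NNT = 1.1
```

The relative risk is the same in both rows, but the ARR and NNT differ by three orders of magnitude.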
Other measures of treatment effect include odds ratios, commonly reported in case–control studies but actually appropriate in any comparative study, and hazard ratios, commonly reported in survival studies. We do not address these measures in more detail here, but loosely speaking the same principles discussed for relative risks apply.
Measures from Studies of Diagnostic Tests
When we order a diagnostic study, we are trying to gain information about the patient’s underlying probability of a disorder. That is, the diagnostic test moves us from a pre-test probability to a post-test probability. Historically, terms such as sensitivity and specificity have been used to describe the properties of a diagnostic test. But these terms have significant limitations, one of which is that they do not consider the pre-test probability at all.
Likelihood ratios overcome this limitation. Basically, a likelihood ratio (LR) converts pre-test odds to post-test odds. Because we think in terms of probabilities rather than odds, we can either use a nomogram to make the conversion for us or recall that for a probability p, odds = p/(1 - p) and p = odds/(1 + odds).
For example, suppose we suspect that a patient may have iron-deficiency anemia and quantify this suspicion with a pre-test probability of 25%. If the ferritin is 8 mcg/L, we can apply the likelihood ratio of 55 found from a literature search locating Guyatt et al. (1992). The pre-test odds are 0.25/0.75, or one-third, which when multiplied by the LR of 55 yields post-test odds of 18.3. This can then be converted back to a post-test probability of 18.3/19.3, or 95%. Alternatively, the widely available nomograms give the same result.
Clearly, this diagnostic test has drastically affected our sense of whether the patient has iron-deficiency anemia. Likelihood ratios for many common problems may be found in the recommended readings.
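For readers without a nomogram at hand, the conversion in the ferritin example can be sketched in Python. The function names are our own; the formulas are the odds and probability conversions given above, and the inputs are the 25% pre-test probability and the LR of 55 from the example.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability to odds: odds = p / (1 - p)."""
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability."""
    pre_test_odds = probability_to_odds(pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return odds_to_probability(post_test_odds)


# Ferritin example from the text: pre-test probability 25%, LR = 55.
pre_odds = probability_to_odds(0.25)
post_odds = pre_odds * 55
post_prob = post_test_probability(0.25, 55)
print(f"Pre-test odds: {pre_odds:.2f}")            # Pre-test odds: 0.33
print(f"Post-test odds: {post_odds:.1f}")          # Post-test odds: 18.3
print(f"Post-test probability: {post_prob:.0%}")   # Post-test probability: 95%
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any such calculation.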
Perhaps the greatest stumbling block to the use of likelihood ratios is how to determine pre-test probabilities. This really should not be a major worry because it is our business to estimate probabilities of disease every time we see a patient. However, this estimation can be strengthened by using evidence-based principles to find literature to support your chosen pre-test probabilities. This further emphasizes that EBM affects all aspects of clinical decision-making.
Measures of Precision
Each of the measures discussed thus far is a point estimate of the true effect based on the study data. Because the true effect for all humans can never be known, we need some way of describing how precise our point estimates are. Statistically, confidence intervals (CIs) provide this information. An accurate definition of this measure of precision is not intuitive, but in practice the CI can provide answers to two key questions. First, does the CI cross the point of no effect (e.g., a relative risk of 1 or an absolute risk reduction of 0)? Second, how wide is the CI?
If the answer to the first question is yes, we cannot state with any certainty that there really is an effect of the treatment: a finding of “no effect” is considered plausible, because it is contained within the CI. If the CI is very wide, the true effect could be any value across a wide range of possibilities. This makes decision making problematic, unless the entire range of the CI represents a clinically important effect.
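The two questions can be made concrete with a rough Python sketch. Several assumptions here are ours and not from the trial example: the sample sizes are invented, and the interval is the simple normal-approximation (Wald) 95% CI for a difference in proportions, which is only one of several available methods.

```python
import math


def arr_confidence_interval(events_treated: int, n_treated: int,
                            events_control: int, n_control: int,
                            z: float = 1.96) -> tuple:
    """Return (ARR, lower, upper) using a Wald interval.

    The Wald method and z = 1.96 (about 95%) are assumptions made for
    illustration; other interval methods may be preferable in practice.
    """
    p_t = events_treated / n_treated
    p_c = events_control / n_control
    arr = p_c - p_t
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    return arr, arr - z * se, arr + z * se


def describe(label: str, arr: float, lower: float, upper: float) -> None:
    """Answer the two questions: does the CI cross 0, and how wide is it?"""
    crosses_no_effect = lower <= 0 <= upper
    print(f"{label}: ARR = {arr:.2f}, 95% CI {lower:.3f} to {upper:.3f}, "
          f"width = {upper - lower:.3f}, "
          f"crosses no effect: {crosses_no_effect}")


# Same 5% vs. 15% mortality, with two hypothetical trial sizes.
describe("100 per group", *arr_confidence_interval(5, 100, 15, 100))
describe("40 per group", *arr_confidence_interval(2, 40, 6, 40))
```

Under these assumptions, the same point estimate of 0.10 excludes zero in the larger hypothetical trial but not in the smaller one, and the smaller trial's interval is noticeably wider.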
We will talk in more detail about CIs in a later segment, but the important message here is that a point estimate requires a CI before meaningful conclusions affecting patient care may be reached.
Applying Results to Patient Care
Once validity issues have been addressed and results have been processed, the key determinants of whether a study’s results can be applied to your patient are whether the study population was reasonably similar to your patient and whether the study setting was reasonably similar to your own. This need not be exact, but if a study enrolled only men, application of the results to women may not be supported.
On the other hand, if a study excluded individuals younger than 60 and your patient is 59, you may still feel comfortable applying the findings of this study to your patient’s care. The application of study results to individual patients is often not a simple decision. A general recommendation is to carefully determine whether there is a compelling reason to suggest that the study results might not apply to your patient. If not, generalizing the results is likely reasonable.
Additional considerations include the balance between benefits and risks, costs, and, of course, patient and provider values. If a treatment promotes survival but may have a negative impact on quality of life (for a recent example, see the MADIT II trial of AICD implantation in patients with prior MI and heart failure), patients and providers must carefully evaluate their priorities in determining the best course of action. Also, a costly treatment having a small but significant benefit may not be justified in an era of limited resources. These issues are at the heart of medicine and are best addressed by collaborative decision-making among patients, care providers, insurers, policy makers, and all other members of our healthcare system.
The results of a study can be reported in many ways, with different measures fitting different clinical questions. The keys to look for are a point estimate and a measure of the precision of that estimate. Applying results to patient care requires complex decisions that go well beyond the numbers from any study. In the upcoming segments of this series, we will focus more attention on how results are evaluated statistically. This will provide additional depth to the discussion of study results and how they inform our clinical decisions.
Dr. West practices in the Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minn.