The previous installments in this series have discussed how to ask answerable clinical questions and then search for the best evidence addressing those questions. Not all evidence is of high enough quality to provide meaningful information for patient care, however, and it is important to evaluate all studies with a critical eye toward study design and analysis.
A study can be flawed in many ways, and while many flaws still allow us to apply study results to patients, we need to understand these limitations. It is also insufficient to trust factors such as a medical journal’s impact factor or prestige: Many examples of suboptimal evidence come from higher-tier journals, and it has been estimated that even in the top internal medicine journals up to 50% of papers contain significant design and analysis errors.
While the growth of EBM has directed increasing attention to these issues, the onus remains on the literature consumer to critically appraise the evidence in order to make treatment decisions in as informed a manner as possible.
Results from a valid study can be expected to be unbiased. In other words, these results should portray the true underlying effect of interest. There are many threats to a study’s validity. Such factors must be evaluated to ensure that they do not systematically affect results and therefore alter the correct interpretation of study findings.
The primary goal of any unbiased study design is to make the comparison groups as similar as possible for all factors potentially affecting the outcome of interest—except for the intervention or exposure of interest. If the only difference between groups’ histories, comorbidities, study experiences, and so on is the intervention or exposure, we can be more confident that any observed outcome differences are due to the exposure rather than other confounding variables.
For example, consider a trial of treatment options for esophageal cancer in which twice as many control group patients smoked as in the intervention group. If the intervention group had better outcomes, we would not know whether this was due to the intervention or to the lower smoking rates in the treatment arm of the study. A well-designed, valid study will make every effort to minimize such problems. This principle applies to all study designs, including observational designs such as case-control and cohort studies, and experimental designs such as the classic randomized controlled trial. We will briefly present a few of the key threats to study validity in this segment of the series. We will focus on clinical trial designs, but the same principles apply to observational designs as well.
Minimize Bias and Protect Study Validity
Randomization: If we wish to make study groups similar on all variables other than the exposure of interest, and we can assign interventions such as in a clinical trial, we can maximize validity by appropriately randomizing patients to intervention groups. Randomization has the effect of balancing comparison groups with respect to both recognized and unrecognized factors that may affect outcomes.
A key feature to look for in a randomization procedure is that the allocation is in fact truly random. It should be impossible to predict the group to which any given study subject will be randomized. Therefore, for example, procedures systematically alternating subject assignments among groups (A-B-A-B- … ) are not truly random and do not confer the validity benefits of true randomization. It is also important that the randomization process be separate from all other aspects of the study, so that no other factors may influence group assignment. This is closely related to the concept of blinding.
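The contrast between alternating and random allocation can be illustrated with a short Python sketch. The arm labels, function names, and seed are assumptions made for illustration only. An actual trial would typically rely on a validated, concealed allocation system rather than a script like this.

```python
import random

ARMS = ("A", "B")  # hypothetical arm labels


def alternating_assignment(n_subjects):
    """Systematic A-B-A-B allocation: every assignment is known in advance."""
    return [ARMS[i % 2] for i in range(n_subjects)]


def simple_randomization(n_subjects, seed=None):
    """Assign each subject independently, with equal probability per arm.

    The seed is used here only so the illustration is reproducible.
    """
    rng = random.Random(seed)
    return [rng.choice(ARMS) for _ in range(n_subjects)]


def next_is_predictable(assignments):
    """True if each assignment is simply the opposite of the previous one."""
    return all(a != b for a, b in zip(assignments, assignments[1:]))


alternating = alternating_assignment(10)
randomized = simple_randomization(10, seed=2024)

print("Alternating:", "".join(alternating))
print("Randomized: ", "".join(randomized))
print("Alternating sequence predictable:", next_is_predictable(alternating))
```

Anyone enrolling patients under the alternating scheme knows the next assignment before the patient is seen, which is exactly what true randomization is meant to prevent.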
Blinding: If patients, providers, or anybody else involved in a research study is aware of treatment assignments, conscious or subconscious differences in the experience of study participants can be introduced. Blinding therefore matters at all stages of a study, from randomization as described previously through to data analysis at the conclusion of the study, and for everyone involved in it. Practically speaking, it may not be possible to blind everybody involved in a study to the assigned treatment group (consider a study of surgical versus medical therapy, where a sham incision may not be desirable or ethical). However, blinding of patients and outcome assessors is desirable whenever feasible. Again, the goal is to treat all study subjects the same way throughout the study, so that the only difference between groups is the intervention of interest.
Intention-to-treat analysis: An intention-to-treat analysis attributes all patients to the group to which they were originally randomized, regardless of whether they completed or even received the assigned treatment. This preserves the balance created by randomization and further ensures that we are measuring the effect of the intervention of interest rather than imbalances across other factors that might affect whether patients complete the intended treatment program. This has become a well-accepted procedure in clinical trial practice.
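To see why the choice of analysis population matters, consider the following minimal Python sketch. The eight patients, the field names, and the outcomes are entirely hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str    # arm assigned at randomization
    completed: bool  # whether the patient completed the assigned program
    recovered: bool  # outcome of interest


def recovery_rate(patients):
    """Proportion of the given patients who recovered."""
    return sum(p.recovered for p in patients) / len(patients)


def intention_to_treat(patients, arm):
    """Analyze everyone in the arm to which they were randomized."""
    return recovery_rate([p for p in patients if p.assigned == arm])


def completers_only(patients, arm):
    """Analyze only those who completed the assigned program."""
    return recovery_rate(
        [p for p in patients if p.assigned == arm and p.completed]
    )


# Hypothetical cohort: two treatment-arm patients drop out and do poorly.
cohort = [
    Patient("treatment", True, True),
    Patient("treatment", True, True),
    Patient("treatment", False, False),
    Patient("treatment", False, False),
    Patient("control", True, True),
    Patient("control", True, False),
    Patient("control", True, False),
    Patient("control", True, True),
]

for arm in ("treatment", "control"):
    print(arm,
          "ITT:", intention_to_treat(cohort, arm),
          "completers only:", completers_only(cohort, arm))
```

In this made-up cohort the intention-to-treat recovery rate is 50% in both arms, while the completers-only analysis shows 100% recovery with treatment. The apparent benefit comes entirely from excluding the patients who dropped out.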
Complete follow-up: Loss to follow-up and missing data in general can lead to bias if patients with missing data systematically differ from study completers. No statistical technique can fully compensate for missing data, and there are no general rules regarding acceptable amounts of missing data.
Unfortunately, it is essentially impossible to entirely eliminate missing data, but sensitivity analyses can be helpful in judging whether the degree of missing data is likely to change study findings. In these analyses, study outcomes for different possible missing data results are reviewed. If the conclusions of the study are consistent across the range of possible missing data points, we have good evidence that the amount of missing data is unlikely to be a major limitation of the study.
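One simple form of sensitivity analysis for a yes/no outcome fills in the missing results under the most and least favorable assumptions. The Python sketch below assumes an adverse outcome (so a negative risk difference favors treatment). The counts are hypothetical, and the function names are illustrative rather than drawn from any particular statistical package.

```python
def risk(events, n):
    """Proportion of patients experiencing the adverse outcome."""
    return events / n


def risk_difference_scenarios(t_events, t_observed, t_missing,
                              c_events, c_observed, c_missing):
    """Risk difference (treatment minus control) under three assumptions."""
    t_total = t_observed + t_missing
    c_total = c_observed + c_missing
    return {
        # Ignore patients with missing outcomes altogether.
        "complete_case": risk(t_events, t_observed) - risk(c_events, c_observed),
        # Best case for treatment: missing treated patients had no events,
        # missing control patients all had events.
        "best_case": risk(t_events, t_total) - risk(c_events + c_missing, c_total),
        # Worst case for treatment: missing treated patients all had events,
        # missing control patients had none.
        "worst_case": risk(t_events + t_missing, t_total) - risk(c_events, c_total),
    }


def favors_treatment_throughout(scenarios):
    """True only if treatment lowers risk under every scenario."""
    return all(diff < 0 for diff in scenarios.values())


# Hypothetical trial: 100 patients randomized to each arm.
scenarios = risk_difference_scenarios(
    t_events=20, t_observed=90, t_missing=10,
    c_events=30, c_observed=92, c_missing=8,
)

for name, diff in scenarios.items():
    print(f"{name}: {diff:+.3f}")
print("Robust to missing data:", favors_treatment_throughout(scenarios))
```

With these invented numbers the apparent benefit disappears under the worst-case assumption, so the missing data would have to be treated as a meaningful limitation. Had the benefit persisted across all three scenarios, the missing data would be much less worrisome.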
Validity for Observational Study Designs
The biases to which case-control and cohort studies are prone differ from those of prospective clinical trials, but identical general principles apply. We will not review these biases in detail. The important point is that the goal remains to keep the groups similar on all variables apart from the explanatory variable of interest.
For example, recall bias, in which cases are often more likely than controls to recall an exposure, can result in associations between exposure and outcome that may be due either to the exposure itself or to the likelihood of recalling an exposure. This can be a serious validity concern for case-control studies, or any design requiring a retrospective recollection of past experiences. Additional information on many other common biases may be found in the recommended reading sources.
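The effect of recall bias on a case-control study can be worked through with simple arithmetic. In the hypothetical Python sketch below, cases and controls have identical true exposure rates, but cases recall 90% of their exposures and controls only 60%. All counts and recall percentages are assumptions for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )


def recalled(exposed, recall_percent):
    """Number of truly exposed subjects who report their exposure."""
    return exposed * recall_percent // 100


GROUP_SIZE = 100     # hypothetical number of cases and of controls
TRULY_EXPOSED = 30   # same true exposure count in both groups

# True association: exposure is unrelated to the outcome.
true_or = odds_ratio(TRULY_EXPOSED, GROUP_SIZE - TRULY_EXPOSED,
                     TRULY_EXPOSED, GROUP_SIZE - TRULY_EXPOSED)

# Reported exposure: cases recall 90%, controls recall 60%.
cases_reporting = recalled(TRULY_EXPOSED, 90)     # 27
controls_reporting = recalled(TRULY_EXPOSED, 60)  # 18
observed_or = odds_ratio(cases_reporting, GROUP_SIZE - cases_reporting,
                         controls_reporting, GROUP_SIZE - controls_reporting)

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

Although the exposure has no true relationship to the outcome in this example, differential recall alone produces an observed odds ratio of roughly 1.7.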
Once an article addressing your clinical question has been identified, the quality of the evidence must be critically appraised. The first central feature of this appraisal is an evaluation of the validity, or lack of bias, of the reported results. Only a valid unbiased study can be trusted to accurately represent a true underlying effect. The goal of techniques to protect validity is to isolate the intervention or exposure of interest as the only varying factor, so that any observed findings can be attributed to the exposure rather than explained by other variables. Once we have reassured ourselves that a study is reasonably valid, we need to be able to interpret the results and determine whether we can apply the results to the care of our patients. We will address these aspects of critical appraisal in the next installment of this series.
Dr. West practices in the Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minn.