Reading and conducting instrumental variable studies: guide, glossary, and checklist

Venexia Walker; Eleanor Sanderson; Michael G Levin; Scott M Damrauer; Timothy Feeney; Neil M Davies

doi:10.1136/bmj-2023-078093

Research Methods & Reporting

BMJ 2024; 387 doi: https://doi.org/10.1136/bmj-2023-078093 (Published 14 October 2024) Cite this as: BMJ 2024;387:e078093

Venexia Walker, senior research fellow¹ ²,
Eleanor Sanderson, lecturer in medical statistics¹ ²,
Michael G Levin, cardiologist³ ⁴ ⁵,
Scott M Damrauer, associate professor of surgery⁴ ⁵,
Timothy Feeney, research editor⁶,
Neil M Davies, professor of medical statistics⁷ ⁸ ⁹

¹Medical Research Council Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
²Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
³Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
⁴Department of Surgery and Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
⁵Corporal Michael J Crescenz VA Medical Center, Philadelphia, PA, USA
⁶Department of Epidemiology, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
⁷Division of Psychiatry, University College London, London W1T 7NF, UK
⁸Department of Statistical Science, University College London, London, UK
⁹K G Jebsen Centre for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway

Correspondence to: N M Davies neil.m.davies{at}ucl.ac.uk (or @nm_davies on X)

Accepted 15 May 2024

Instrumental variable analysis uses naturally occurring variation to estimate the causal effects of treatments, interventions, and risk factors on outcomes in the population from observational data. Under specific assumptions, instrumental variable methods can provide unbiased estimates of causal effects. This article explains these assumptions and the information and tests typically reported in instrumental variable studies, which readers can use to assess the credibility of a study's findings.

In clinical practice, establishing causation is crucial for informed decision making in patient care. Instrumental variable analysis is increasingly used to provide evidence about causal effects in clinical research (see box 1 for glossary). Instruments are variables that are associated with the intervention, have no uncontrolled common causes with the outcome, and only affect the outcome via the intervention. They can be used to overcome measured and unmeasured confounding of intervention-outcome associations and provide unbiased estimates of the causal effects of an intervention on an outcome using observational data (fig 1). Instrumental variables are defined by three assumptions (box 2).

Box 1

Glossary of terms used in instrumental variable studies

Concepts

Natural experiment: A source of variation in the likelihood of receiving an intervention in the real world that can be used to investigate the causal impact of an intervention.
Instrumental variable: A specific variable in a dataset that is (1) associated with an intervention, (2) has no uncontrolled common causes with the outcome, and (3) only affects the outcome via the intervention.
Fourth point identifying assumption: The assumption used to estimate the mean effect of the intervention on the outcome, without which it is only possible to estimate bounds for the effect of the intervention on the outcome.
Local average treatment effect or complier average causal effect: The effect of an intervention on individuals whose intervention status is affected by the instrument.
Counterfactual values: The patients’ outcomes had they been allocated to the other strategy (ie, the patients’ outcomes following intervention if they were assigned to control or the patients’ outcomes following control if they were assigned to intervention).

Statistical methods

Reduced form: The instrument-outcome association, which, if the instrumental variable assumptions hold, is a valid test of the null hypothesis that the intervention does not affect the outcome.
Wald estimator: The ratio of the instrument-outcome and instrument-intervention associations.1
Two-stage least squares: An instrumental variable estimator. The first stage estimates the instrument(s)-intervention association(s) and uses these associations to predict the intervention values.2 The second stage uses the predicted interventions in a regression to estimate the effect of the intervention(s) on the outcome.

Fig 1

Similarities and differences between instrumental variable analysis and randomised controlled trials

Box 2

Key assumptions that define instrumental variables3

Relevance (IV1): The instrument must be associated with the intervention.
Independence (IV2): The instrument and the outcome must have no uncontrolled common causes.
The exclusion restriction (IV3): The instrument must only affect the outcome through the intervention.

Instrumental variable analysis has a long history (supplementary box 1), with applications in many fields, including healthcare and economics. The approach has increased in popularity owing to the availability of larger datasets, the recognition of the need to obtain reliable estimates when key covariates are not measured, and the use of different analytical assumptions.4 5

Researchers increasingly use instrumental variable analyses to inform a wide range of clinical questions. For example, institutional variation in testing or treatment practices has been used as an instrumental variable to estimate the effects of perioperative testing for coronary heart disease on postoperative mortality rates,6 the relative safety of robotic versus laparoscopic surgery for cholecystectomy,7 and the effect of the length of storage of red blood cells on patient survival.8 Physicians’ preferences for treatments have been used to investigate the effects of cyclo-oxygenase-2 (COX-2) versus non-selective, non-steroidal anti-inflammatory drugs (NSAIDs) on gastric complications,9 10 and the effects of conventional versus atypical antipsychotic drug treatments on mortality in elderly patients.11 Allocation to treatment in randomised controlled trials with non-compliance is an instrumental variable previously used to investigate the effects of flexible duty hour conditions for surgeons on patient outcomes and surgeons’ training and wellbeing12 and the effects of reducing amyloid levels on cognition.13 Distance from or time to admission to a particular type of hospital has been used as an instrument for receiving a specific treatment.14 15

One of the most common applications of instrumental variables is mendelian randomisation, which uses genetic variants as instrumental variables. The core principles of instrumental variable analysis still apply to mendelian randomisation, which has been covered in detail elsewhere and will not be discussed here.16 17

Here, we provide a practical guide for researchers to read, interpret, and conduct instrumental variable studies using non-genetic observational data. In this article, we discuss why a study should use instruments, key concepts and assumptions, how to assess the validity of instrumental variable assumptions, and how to interpret results.

Summary points

Instrumental variable analysis is a research method that uses naturally occurring variation (ie, variation not controlled by researchers), such as policy decisions, clinical preferences, distance, or time, to provide evidence about the causal effects of interventions on outcomes from observational data
Instrumental variables can provide credible evidence about the causal effects even if other observational techniques have residual confounding, reverse causation, or other forms of bias
This article demonstrates how to perform an instrumental variable analysis using commonly available packages
In common with all empirical research methods, instrumental variable analysis depends on assumptions that readers and reviewers must assess
Many sources of evidence, using a range of assumptions, can help inform clinical decisions
A critical appraisal checklist is provided to help assess and interpret instrumental variable studies

Clinical and public health implications

Researchers increasingly use large datasets of electronic medical records, registries, or administrative claims data to provide evidence about the effects of interventions on patient outcomes. An important limitation of these datasets is that while the large sample size allows for very precise results, they frequently have inadequate measures of critical confounders. Confounders are variables that affect the likelihood of receiving the intervention and that also affect the outcome (eg, previous neuropsychiatric diagnoses and the likelihood of being prescribed varenicline rather than nicotine replacement therapy for smoking cessation). Patients rarely receive interventions entirely randomly. Key confounders, such as morbidity and other indications for intervention, are often challenging or impossible to measure with sufficient accuracy from diagnosis or billing codes or are unmeasured or unmeasurable. Thus, matching individuals receiving the intervention with sufficiently comparable controls can be difficult or impossible. As a result, observational analysis of large scale databases could provide unreliable evidence about interventions’ comparative effectiveness and safety. This issue is challenging for clinicians and patients because they need reliable evidence of the causal effects of different interventions to make well informed decisions. Instrumental variables can provide an alternative source of evidence about the effects of different interventions and, while less precise than other approaches, might be less affected by individual level biases such as confounding by indication, where the indications for intervention also affect the likelihood of an outcome.

Why use an instrument?

Most observational methods, such as multivariable adjusted regression or propensity score analysis, assume that it is possible to measure a sufficient set of confounders to account for all differences in the outcome between individuals given the intervention and control, except for those caused by the intervention.18 19 However, the correct set of confounders is not always known, and even if they have been identified, measuring and accounting for baseline differences is extremely difficult, which can result in multivariable-adjusted and propensity score analyses having serious biases and providing misleading results.

For example, COX-2 inhibitors were developed to cause fewer gastrointestinal complications than traditional NSAIDs and marketed to patients and physicians accordingly. As a result, patients prescribed these drug treatments typically were at higher risk of gastrointestinal complications at baseline. Thus, in observational datasets, patients prescribed COX-2 inhibitors tended to have higher rates of gastrointestinal complications than patients prescribed NSAIDs, a difference that was not fully attenuated after adjustment for measured confounders. This result arises because the pre-existing differences in the risk of gastrointestinal complications are very challenging to measure sufficiently, especially in electronic medical records, resulting in residual confounding by indication.

Alternatively, patients prescribed nicotine replacement therapy for smoking cessation differ from those prescribed drugs such as varenicline: patients prescribed nicotine replacement therapy tend to be more unwell, be older, and have poorer mental health.20 However, electronic medical records or other datasets often do not record these differences. For example, patients might discuss smoking cessation with their general practitioner when they have preclinical symptoms of heart disease; these symptoms might not be perfectly captured in medical records.

Instrumental variable analysis offers an approach to deal with these problems. It relies on a distinct set of assumptions from other methods, which do not require measuring or knowing all the potential confounders of the intervention and outcome.

What is an instrumental variable?

The following three assumptions define instrumental variables. Firstly, the instrument is associated with the intervention of interest (relevance); secondly, it shares no uncontrolled common cause with the outcome (independence); and thirdly, it only affects the outcome through the intervention (exclusion restriction). Instruments only need to be associated with the likelihood of receiving the intervention; they do not necessarily need to cause it.3 Instrumental variable analyses exploit naturally occurring variation (the instrument) to estimate the impact of the intervention on an outcome. This variation can be due to clinical or policy decisions unrelated to unmeasured confounders. Box 2 defines these assumptions, and figure 2 uses a directed acyclic graph to represent these assumptions. Assessing the plausibility of the assumptions is critical to determining whether a proposed instrumental variable is valid and is discussed in detail below.

Fig 2

Assumptions of multivariable adjusted and instrumental variable studies. Physicians’ prescribing preferences are typically unmeasured; thus, normally, studies use prescriptions issued to the physicians’ previous patients to proxy for their preferences. Multivariable adjustment (MV1) assumes that a sufficient set of confounders can be measured to control for all open paths between the intervention and the outcome. In contrast, instrumental variable analysis assumes that there is an instrument that associates with the intervention (relevance; IV1), has no uncontrolled common cause with the outcome (independence; IV2), and only affects the outcome through the intervention (exclusion restriction; IV3)

These assumptions can be defined unconditionally, or more often conditionally, on other important covariates in a dataset; for example, physicians’ prescribing preferences are usually conditioned on a patient’s age. If these assumptions are violated, for example, by residual confounding of the instrument-outcome association, then the results of an instrumental variable analysis can be more biased than other approaches, such as multivariable adjustment and propensity score. Thus, a key challenge for authors and readers of instrumental variable studies is determining whether the assumptions are plausible for the research question.

Types of instruments

Numerous natural experiments have been proposed and assessed as potential instruments. These commonly include physician preference (eg, preference to prescribe one intervention versus another for a given diagnosis), access to intervention (eg, distance to a hospital with specific speciality staff or equipment), or randomisation (eg, in the context of a randomised controlled trial with non-compliance). Examples of these instruments are given below. Other sources of variation have also been used and are covered elsewhere.21

Physician preference

Clinicians have preferences for many clinical decisions, such as testing, treatments, or diagnoses. These pre-existing preferences could be independent of the subsequent patients they see. For example, a physician might prefer prescribing nicotine replacement therapy over drug treatments such as varenicline.20 Studies generally cannot measure physicians’ preferences for one intervention or another, so they measure preferences in other ways. For instance, physicians’ prescribing preferences might be captured by looking at previous prescriptions for the interventions under consideration or, more rarely, surveys used to elicit preferences. Physicians’ prescriptions to their previous patients are often associated with the prescriptions they issue to their future patients. If this preference occurs in a way that is unrelated to the patient level confounders of their current patients, the independence assumption could hold. Physicians’ previously demonstrated preferences are consistently associated with their prescriptions to their current patients.9 10 A potential weakness of physicians’ prescribing preferences as an instrument is that they might not be specific to the treatment of interest and could be associated with broader differences in care.

Access

Access instruments include distance to hospitals,14 travel times to the hospital as a proxy for quicker treatment,22 the raising of the school leaving age as a proxy for education,23 and date of treatment as a proxy for choice of treatment.24 Here, for the instrumental variable assumptions to hold, access must associate with the likelihood of receiving the intervention but not directly affect the outcome or share any unmeasured confounders with the outcome. A potential weakness of studies using access based instruments is that geographical location and distance to healthcare facilities are often highly non-random and are related to important unmeasured confounders such as socioeconomic position.

Random assignment in the presence of non-compliance

Treatment assignment in a randomised controlled trial with non-compliance or an encouragement design can be an instrumental variable.25 26 27 By design, random assignment should balance confounders between individuals assigned to the intervention and those assigned to the control. Conventional analyses of randomised trials report the intention-to-treat estimate, which is the difference in outcomes between participants assigned to the intervention and participants assigned to the control. However, if some trial participants do not comply with their allocation, the intention-to-treat estimate will underestimate the effects of taking the intervention because it will also reflect the effects of compliance.

Instrumental variable analysis can be used to estimate the effects of taking the intervention by assuming that the treatment assignment affects the likelihood of receiving the intervention in the same direction for all individuals (ie, the instrument has a monotonic effect: if it increases the likelihood of the intervention for some individuals, it does not decrease it for others). Under the monotonicity assumption, the instrumental variable estimate will reflect the complier or local average treatment effects (see box 3 for definitions). This parameter is the effect of the intervention on individuals whose treatment status was affected by the instrument. A limitation of random assignment is that assignment might alter behaviour in other ways, leading to violations of the exclusion restriction (eg, if individuals assigned to control in an unblinded trial seek treatment via other means). Examples of using allocation to treatment as an instrument include a cluster randomised trial of vitamin A supplementation with non-compliance.25 Treatment allocation can be used to estimate the effects of an underlying continuous risk factor, for example, the effects of reducing amyloid levels on cognition rather than the effect of being allocated to amyloid-lowering drug treatment.13 If the risk factor is continuous, then it is more challenging to interpret under monotonicity, and studies might make other assumptions (eg, assuming a constant effect of the risk factor).
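
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from any of the cited trials: the function name `complier_average_causal_effect` and the 16 participant dataset are hypothetical, and the example assumes a binary assignment, a binary intervention, and that the core instrumental variable assumptions and monotonicity hold.

```python
import numpy as np


def complier_average_causal_effect(assigned, treated, outcome):
    """Return (ITT effect on outcome, ITT effect on uptake, their ratio).

    Under the instrumental variable assumptions plus monotonicity, the ratio
    can be read as the effect of taking the intervention among compliers.
    """
    assigned = np.asarray(assigned, dtype=bool)
    treated = np.asarray(treated, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    # Intention-to-treat contrast: outcome by arm, ignoring what was taken.
    itt_outcome = outcome[assigned].mean() - outcome[~assigned].mean()
    # Effect of assignment on the proportion who took the intervention.
    itt_uptake = treated[assigned].mean() - treated[~assigned].mean()
    if itt_uptake == 0:
        raise ValueError("Assignment did not change uptake; the ratio is undefined.")
    return itt_outcome, itt_uptake, itt_outcome / itt_uptake


# Hypothetical trial with made-up numbers: 8 participants per arm.
# In the intervention arm, 6 of 8 take the treatment; nobody in control does.
assigned = np.array([1] * 8 + [0] * 8)
treated = np.array([1, 1, 1, 1, 1, 1, 0, 0] + [0] * 8)
outcome = np.array([3, 3, 3, 3, 3, 3, 1, 1] + [1] * 8)

itt, uptake, cace = complier_average_causal_effect(assigned, treated, outcome)
print(itt, uptake, cace)
```

In this toy dataset, the intention-to-treat difference is 1.5, assignment raises uptake by 0.75, and the ratio is 2.0, which is the effect of actually taking the intervention that was built into the made-up outcomes. The intention-to-treat estimate is smaller because a quarter of the intervention arm never took the treatment.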

Box 3

Point identifying assumptions and interpretation

The three core assumptions for instrumental variable analysis are only sufficient to estimate the bounds of a causal effect, which are the largest and smallest values consistent with the observed data. However, instrumental variable bounds are typically very wide, so most instrumental variable studies require a further fourth, point identifying assumption. Options for the fourth assumption include the constant treatment effect (IV4h), no effect modification (IV4n), no simultaneous heterogeneity (NOSH; IV4nosh), and monotonicity (IV4m).2 28 29

The constant treatment effect assumption requires that the effect of the intervention on the outcome is the same for all individuals. For example, if the intervention of interest was an anti-hypertensive drug treatment such as angiotensin-converting enzyme (ACE) inhibitors, these inhibitors should give the same reduction in systolic blood pressure for all participants, regardless of any other characteristics.
The no effect modification assumption requires that the intervention has the same effect on the outcome irrespective of the instrument’s value. For example, this assumption would hold if the effects of ACE inhibitors are the same irrespective of physicians’ preference.
The NOSH assumption requires that any heterogeneity in the effects of the instrument on the intervention is independent of heterogeneity in the effects of the intervention on the outcome. This assumption would hold if the variation in the effect of physician preferences on prescribing were not related to the treatment’s expected efficacy (ie, the instrument implicitly samples a representative sample of causal effects from the population).
The monotonicity assumption requires that the effect of the instrument on the likelihood of receiving the intervention is always in the same direction (eg, the instrument only increases or decreases the likelihood of receiving the intervention). For example, a patient with a physician who prefers to prescribe ACE inhibitors will be more likely to receive an ACE inhibitor than a patient who attends a physician who prefers another anti-hypertensive drug.

Assessment of point identifying assumptions

These point identifying assumptions are untestable but falsifiable. The constant treatment effect assumption is potentially falsifiable by checking for differences in the implied effects of the intervention across covariates. For binary interventions with causal binary instruments and binary outcomes, monotonicity inequalities can falsify the monotonicity assumption.30 Cumulative distribution graphs for continuous interventions can assess this assumption.2 If the proposed instrument is a preference, assessing the plausibility of the monotonicity assumption is possible by conducting a preference survey.31 These surveys suggest that a strict definition of monotonicity is unlikely to be plausible, as there is substantial heterogeneity in clinical treatment decisions. However, Small and colleagues in 2017 proposed a more plausible assumption: stochastic monotonicity, which requires that the effect of the instrument on the exposure is monotonic conditional on a set of covariates.32

Interpretation of instrumental variable estimates

Instrumental variable estimates can be interpreted as the average treatment effect under the constant treatment effect, no effect modification, or NOSH assumptions. The constant treatment effect assumption identifies the average treatment effect by assuming the intervention has the same effect for all individuals. This assumption is most commonly used to identify the effects on continuous outcomes. However, this assumption can be implausible. For example, an intervention could only have a constant effect on a binary outcome if it entirely determined the outcome or did not affect it. In the example of ACE inhibitor use, it is implausible to assume that ACE inhibitors have the same effect on every individual in the population.

The no effect modification assumption identifies the intervention’s effect on those participants who receive the intervention by assuming that the effect of the intervention is independent of the instrument’s value. For example, in a randomised controlled trial with an encouragement design where the intervention is an encouragement to take a treatment, allocation to the intervention or control arm does not change the effect of the treatment. This assumption can identify interventions’ effects on binary outcomes and estimate causal risk and odds ratios. Finally, the NOSH assumption requires that heterogeneity in the effects of the instrument on the likelihood of receiving the intervention must be independent of heterogeneity in the effect of the intervention on the outcome to be interpreted as the average treatment effect.29

Instrumental variable estimates can be interpreted as reflecting a local average treatment effect using the monotonicity assumption. The monotonicity assumption identifies the effects of the intervention on those individuals whose intervention status was affected by the instrument. This assumption is typically, but not exclusively, applied to binary instruments and interventions.33 Individuals who either always take the intervention or never take the intervention, regardless of whether they were assigned to it, will not be affected by the instrument. Two groups of individuals remain: those who only take the intervention when they are assigned to it (known as compliers), and those who only take the intervention when they are not assigned to it (known as defiers). The monotonicity assumption assumes that there are no defiers in the sample. For example, physicians’ prescribing preferences could have a monotonic effect if patients prescribed nicotine replacement therapy who attended a physician who previously prescribed varenicline would also have been prescribed nicotine replacement therapy by a physician who previously prescribed nicotine replacement therapy (and vice versa).

The instrumental variable assumptions need to be assessed and considered for each application (box 4). Just because the assumptions are plausible for one treatment or population does not mean that they will be valid in another.

Box 4

Critical appraisal checklist for evaluating instrumental variable studies

Readers of instrumental variable studies could consider the following questions:

Core instrumental variable assumptions

Is there evidence that the instruments are associated with the intervention of interest? Does the study report a first stage partial F statistic?
Are the instruments associated with measured potential confounders of the intervention and outcome?
Are there likely to be different confounders of the instrument-outcome association than the intervention-outcome association?
Is the proposed instrument likely to affect the outcome via mechanisms other than the intervention of interest?
Do the authors use negative control outcomes to investigate the plausibility of the instrumental variable assumptions?

Fourth instrumental variable assumption

Do the authors report the fourth instrumental variable assumption?
Do the authors describe their estimand and how it relates to clinical practice?

Methods

Does the study clearly state the instrumental variable estimator used in the analysis?
For two-stage least squares, are the same covariates included in both stages of the analysis?

Data presentation

Do the authors present the instrument-outcome association, an instrumental variable estimate, or both?
If they provide an instrumental variable estimate, do they compare it with the multivariable-adjusted estimate?
Was the definition of the instrument prespecified, or was the definition of the instrument chosen based on the data under analysis?
Do the authors provide the code they used to allow researchers to reproduce their findings?

Interpretation

If the instrumental variable estimate is similar to the multivariable adjusted estimate and provides evidence consistent with a causal effect, could it be due to weak instrument bias in a single study or confounding of the instrument-outcome association?
If the instrumental variable estimate differs from the multivariable adjusted estimate and provides little evidence of a causal effect, could this be due to weak instrument bias or confounding?
Are the 95% confidence intervals of the estimate sufficiently precise to test for differences with the multivariable adjusted estimate and detect a clinically meaningful difference?

Clinical implications

Do the results triangulate with other forms of evidence?
If a randomised clinical trial is not feasible or is unlikely to be conducted in the short term, and evidence from multiple instrumental variable studies and other robust study designs converges on consistent results, this information may help guide patient care (eg, by informing clinical guidelines or regulatory decisions).

Assessment of instrumental variable assumptions

Directed acyclic graphs provide a convenient and transparent way to depict and explain the assumptions required for an applied instrumental variable analysis.34 35 36 Researchers can adapt the structure used in figure 2 for specific research questions. Studies can then use empirical data to assess whether the three core assumptions for instrumental variables hold.

The first instrumental variable assumption (relevance) states that the instrument must be strongly associated with the likelihood of taking the intervention. The strength of the instrument-intervention association is easily testable. For example, in the study of drug treatments for smoking cessation, we found that physicians who had previously prescribed varenicline were 24 percentage points (95% confidence interval 23 to 25) more likely to prescribe varenicline to their subsequent patients than physicians who had previously prescribed nicotine replacement therapy. However, a difference in treatment rates across instrument values is insufficient to measure instrument strength because it does not reflect the sample size. In a small study of a few hundred patients, even a very large difference in treatment rates across the instrument’s value will provide very little information about the effects of treatment.

In contrast, the first stage partial F statistic of the regression of the intervention on the instrument indicates both the strength of the association and the total sample size. The first stage partial F statistic in an instrumental variable analysis is analogous to the sample size in a randomised controlled trial. Most instrumental variable estimation packages in Stata and R (such as ivreg2 or AER, respectively)37 38 will report this F statistic by default. A value above 10 is considered strong and unlikely to lead to weak instrument bias.39 However, an F statistic above 10 does not guarantee that an instrumental variable study will have sufficient statistical power to detect an effect size of interest.
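
To make the link between effect size, sample size, and the F statistic concrete, the sketch below computes a first stage partial F statistic by comparing a regression of the intervention on covariates alone with one that also includes the instrument. It is an illustration only: the function `first_stage_partial_f`, the simulated data, and the 30% baseline prescribing rate are assumptions made for this example (only the 24 percentage point difference echoes the text), and in practice the packages named above report this statistic directly.

```python
import numpy as np


def first_stage_partial_f(intervention, instruments, covariates=None):
    """Partial F statistic for the instruments in the first stage regression."""
    x = np.asarray(intervention, dtype=float)
    z = np.asarray(instruments, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    n = x.shape[0]

    # Restricted model: intercept plus any covariates, without the instrument.
    restricted = np.ones((n, 1))
    if covariates is not None:
        w = np.asarray(covariates, dtype=float)
        if w.ndim == 1:
            w = w[:, None]
        restricted = np.hstack([restricted, w])
    # Full model: the restricted model plus the instrument(s).
    full = np.hstack([restricted, z])

    def residual_sum_of_squares(design):
        beta, *_ = np.linalg.lstsq(design, x, rcond=None)
        residuals = x - design @ beta
        return float(residuals @ residuals)

    rss_restricted = residual_sum_of_squares(restricted)
    rss_full = residual_sum_of_squares(full)
    n_instruments = z.shape[1]
    residual_df = n - full.shape[1]
    return ((rss_restricted - rss_full) / n_instruments) / (rss_full / residual_df)


# Simulated, hypothetical data: the instrument shifts prescribing by
# 24 percentage points from an assumed 30% baseline.
rng = np.random.default_rng(2024)
n = 2000
age = rng.normal(60, 10, n)
previous_varenicline = rng.integers(0, 2, n)
prescribed = (rng.random(n) < 0.30 + 0.24 * previous_varenicline).astype(float)

f_large = first_stage_partial_f(prescribed, previous_varenicline, covariates=age)
f_small = first_stage_partial_f(
    prescribed[:100], previous_varenicline[:100], covariates=age[:100]
)
print(f"F with n=2000: {f_large:.1f}; F with n=100: {f_small:.1f}")
```

The same 24 percentage point difference produces a much smaller F statistic in the first 100 patients than in the full simulated sample, which is the sense in which the statistic reflects sample size as well as the strength of association.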

The remaining assumptions are untestable, so they cannot be proven to hold, but they are falsifiable.31 40 An assumption is falsifiable if it is possible to use empirical data to disprove it. The independence assumption can be falsified by testing the instrument-covariate associations using covariate balance and bias component plots,41 42 or randomisation tests.43 If the instrumental variable assumptions hold, no associations between the instrument and alternative pathways or other covariates that predict the outcome should be detected.44 The exclusion restriction is falsifiable by demonstrating that other variables are affected by the instrument, which also affects the outcome. For example, in a study of angiotensin-converting enzyme (ACE) inhibitors for cardiovascular disease, if physicians who are more likely to prescribe these inhibitors are also more likely to prescribe statins, which also affect cardiovascular disease, the exclusion restriction assumption would be violated.
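
A simple numerical stand-in for a covariate balance check is sketched below. It compares the standardised mean difference of a measured covariate across levels of the instrument with the difference across levels of the intervention. The data are simulated and the variable names (`frailty`, `instrument`, `intervention`) are hypothetical; the covariate balance and bias component plots cited above are more informative than this single summary, and the sketch is not a substitute for them.

```python
import numpy as np


def standardised_mean_difference(covariate, group):
    """Difference in covariate means between two groups, in pooled SD units."""
    covariate = np.asarray(covariate, dtype=float)
    group = np.asarray(group, dtype=bool)
    in_group, out_group = covariate[group], covariate[~group]
    pooled_sd = np.sqrt((in_group.var(ddof=1) + out_group.var(ddof=1)) / 2)
    return (in_group.mean() - out_group.mean()) / pooled_sd


# Simulated, hypothetical data: frailty influences who receives the
# intervention but is unrelated to the instrument by construction.
rng = np.random.default_rng(7)
n = 20000
frailty = rng.normal(size=n)
instrument = rng.integers(0, 2, n).astype(bool)
probability = np.clip(0.30 + 0.24 * instrument + 0.15 * frailty, 0, 1)
intervention = rng.random(n) < probability

smd_by_instrument = standardised_mean_difference(frailty, instrument)
smd_by_intervention = standardised_mean_difference(frailty, intervention)
print(f"Across instrument levels:   {smd_by_instrument:+.3f}")
print(f"Across intervention levels: {smd_by_intervention:+.3f}")
```

In this simulation the covariate is balanced across instrument levels but clearly imbalanced across intervention levels. A measured covariate that was imbalanced across instrument levels would be evidence against the independence assumption; balance on measured covariates, however, says nothing directly about unmeasured ones.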

Another way to falsify the independence and exclusion restriction assumptions is using negative controls to investigate whether the instrument predicts the outcome in subgroups of the population for which the instrument does not affect the likelihood of receiving the intervention (eg, patients who do not have hypertension, such as children, who were treated for other indications by physicians who preferred ACE inhibitors). If evidence indicates that the instrument affects the outcome, even in subgroups where the instrument does not affect the likelihood of receiving the intervention, the instrumental variable assumptions are unlikely to be plausible. Falsification tests are useful indicators of how plausible the assumptions are likely to be; however, failure to falsify an assumption does not prove that it holds. For example, if the instrument was associated with an unmeasured confounder of the intervention and the outcome, this association would not be evident in a covariate plot that only included measured covariates.

A further way to assess the plausibility of assumptions is to investigate any differences (heterogeneity) in the effect sizes implied by different instruments. This approach requires more than one instrument (a model with more instruments than interventions is technically known as over-identified). If more than one instrument affects the likelihood of receiving the intervention (eg, physicians’ preferences and distance to the healthcare facility), heterogeneity in the effects of the intervention implied by each instrument could indicate violations of the instrumental variable assumptions. Bonet’s instrumental variable inequality tests can also falsify the exclusion restriction and independence assumptions for binary interventions.45
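The heterogeneity check can be sketched by computing a separate ratio estimate for each instrument and comparing them. The Python example below is illustrative only: the data are simulated, the function names and parameter values are assumptions, and a formal over-identification test would also need standard errors. Two instruments are simulated, and in the second scenario one of them also affects the outcome directly, violating the exclusion restriction.

```python
import numpy as np


def ratio_estimate(instrument, intervention, outcome):
    """Instrument-outcome covariance divided by instrument-intervention covariance."""
    cov_zy = np.cov(instrument, outcome)[0, 1]
    cov_zx = np.cov(instrument, intervention)[0, 1]
    return cov_zy / cov_zx


def estimates_by_instrument(direct_effect_of_z2, n=50_000, seed=1):
    """Simulate two instruments and return the estimate implied by each.
    The true effect of the intervention on the outcome is 1.0."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)   # eg, physician preference (hypothetical)
    z2 = rng.normal(size=n)   # eg, distance to facility (hypothetical)
    u = rng.normal(size=n)    # unmeasured confounder
    x = 0.5 * z1 + 0.5 * z2 + u + rng.normal(size=n)
    y = 1.0 * x + 2.0 * u + direct_effect_of_z2 * z2 + rng.normal(size=n)
    return ratio_estimate(z1, x, y), ratio_estimate(z2, x, y)


valid = estimates_by_instrument(direct_effect_of_z2=0.0)
invalid = estimates_by_instrument(direct_effect_of_z2=0.5)
print("both instruments valid:       ", [round(e, 2) for e in valid])
print("second instrument is invalid: ", [round(e, 2) for e in invalid])
```

When both instruments are valid, the two estimates should agree up to sampling error. A clear disagreement suggests that at least one instrument violates the assumptions, although it does not identify which one.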

How to generate instrumental variable estimates

Instrumental variables can test whether an intervention affects an outcome and estimate the magnitude of that effect. The simplest estimator is the instrument-outcome association (reduced form; box 1), which can be estimated using regression methods (eg, linear or logistic regression methods). This estimator does not estimate the magnitude of the effect of the intervention on the outcome. However, under the instrumental variable assumptions, it is a valid test of the null hypothesis that the intervention does not affect the outcome. An advantage of this test is that it is simple, requires the fewest and weakest assumptions, and can test for the existence of an effect. A disadvantage is that it does not provide a scale for the effect of the intervention on the outcome, limiting the interpretation of the results. Ideally, we want to know the average effect of the intervention (also known as the average treatment effect), and not just the effect of the instrument. For example, researchers and readers might be more interested in the effect of prescribing varenicline or nicotine replacement therapy (the intervention) on their current patient than the effect of physicians’ previous prescriptions for smoking cessation treatment (the instrument) on smoking cessation rates (the outcome).
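As a rough illustration of the reduced form test, the Python sketch below compares mean outcomes across the two levels of a binary instrument and reports a z statistic. The data are simulated and every name and parameter value is an assumption made for the example. A normal approximation is used for the P value.

```python
import math

import numpy as np


def reduced_form_test(instrument, outcome):
    """Difference in mean outcome between instrument == 1 and instrument == 0,
    with a z statistic and two sided P value (normal approximation)."""
    instrument = np.asarray(instrument).astype(bool)
    outcome = np.asarray(outcome, dtype=float)
    y1, y0 = outcome[instrument], outcome[~instrument]
    difference = y1.mean() - y0.mean()
    se = math.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    z_stat = difference / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return difference, z_stat, p_value


def simulate(effect, n=40_000, seed=7):
    """Binary instrument z, confounded intervention x, outcome y."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)
    u = rng.normal(size=n)                       # unmeasured confounder
    x = 0.5 * z + u + rng.normal(size=n)         # instrument shifts the intervention
    y = effect * x + 2.0 * u + rng.normal(size=n)
    return z, y


diff_effect, z_effect, p_effect = reduced_form_test(*simulate(effect=1.0))
diff_null, z_null, p_null = reduced_form_test(*simulate(effect=0.0))
print(f"intervention has an effect: difference={diff_effect:.3f}, z={z_effect:.1f}")
print(f"intervention has no effect: difference={diff_null:.3f}, z={z_null:.1f}")
```

The difference estimated here is on the scale of the instrument, not the intervention, which is why it tests for an effect without quantifying it.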

Several instrumental variable estimators can estimate the average treatment effect. Some of the most commonly used instrumental variable estimators are covered below. However, these methods were largely developed to estimate average treatment effects for normally distributed instruments, exposures, and outcomes, assuming linear mechanisms. In practice, these methods are also widely used for binary outcomes or non-linear mechanisms (sometimes under the same or a different name), but the interpretation can be difficult and more advanced methods might be required.

If only one instrument is available, then the average effect of the intervention on an outcome can be estimated using instrumental variable estimators, such as the Wald estimator, which is the instrument-outcome association divided by the instrument-intervention association. This estimator rescales the instrument-outcome association to the intervention scale and indicates the effect of a unit change in the intervention on the outcome. For example, if patients prescribed smoking cessation treatments by physicians who previously prescribed varenicline were 1 percentage point more likely to cease smoking (the instrument-outcome association) and 10 percentage points more likely to be prescribed varenicline (the instrument-intervention association), then the Wald estimate would be 0.01 ÷ 0.1 = 0.1. This estimate would imply that prescribing varenicline increases the absolute probability of stopping smoking by 10 percentage points.
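The arithmetic of the varenicline example can be written as a short Python function. This is a minimal sketch: the function name `wald_estimate` is illustrative, and both associations are assumed to be expressed as risk differences on the probability scale.

```python
def wald_estimate(instrument_outcome, instrument_intervention):
    """Ratio of the instrument-outcome association to the
    instrument-intervention association."""
    if instrument_intervention == 0:
        raise ValueError("the instrument must be associated with the intervention")
    return instrument_outcome / instrument_intervention


# 1 percentage point more likely to stop smoking,
# 10 percentage points more likely to be prescribed varenicline
estimate = wald_estimate(instrument_outcome=0.01, instrument_intervention=0.10)
print(f"Wald estimate: {estimate:.2f}")
```

Because the denominator is the instrument-intervention association, a weak instrument (a denominator close to zero) makes the ratio unstable.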

When a study has one or more instruments available, for example, if a study used physicians’ preferences and distance to healthcare facility as instruments, then the effects of the intervention on the outcome can be estimated using a two-stage least squares estimator. This estimator comprises two regressions or stages. The first stage is a regression of the intervention on the instruments, which is used to predict the intervention value from the instrument values. The second stage is a regression of the outcome on the predicted intervention status. The estimated coefficient on the predicted value is the instrumental variable estimate of the effect of the intervention on the outcome. A simulated example and the formulas are provided in the supplementary materials. It is usually essential that both stages of instrumental variable analysis contain the same covariates.46 However, running the two regressions separately will not account for the estimation error from the first stage and is likely to give incorrect standard errors and confidence intervals. Typically, analyses use a package such as ivreg2 in Stata or the AER or ivreg packages in R,37 38 which compute the instrumental variable estimates in one step and integrate the estimation errors from both stages.
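The two stages can be sketched in Python with NumPy. This is an illustrative implementation on simulated data, not the code from the supplementary materials: the function name, the simulation parameters, and the absence of covariates are all assumptions. It also shows how the standard error obtained by running the second stage by hand differs from one computed with residuals based on the observed intervention.

```python
import numpy as np


def two_stage_least_squares(outcome, intervention, instruments):
    """Return (estimate, standard error, naive second-stage standard error)."""
    n = len(outcome)
    Z = np.column_stack([np.ones(n), instruments])   # intercept + instruments
    X = np.column_stack([np.ones(n), intervention])  # intercept + intervention
    # Stage 1: regress the intervention on the instruments, keep fitted values
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Stage 2: regress the outcome on the predicted intervention
    beta, *_ = np.linalg.lstsq(X_hat, outcome, rcond=None)
    xtx_inv = np.linalg.inv(X_hat.T @ X_hat)
    dof = n - X.shape[1]
    # Residuals for the standard error use the observed intervention
    resid = outcome - X @ beta
    se = np.sqrt(resid @ resid / dof * xtx_inv[1, 1])
    # A manual second stage would instead use the predicted intervention
    resid_naive = outcome - X_hat @ beta
    se_naive = np.sqrt(resid_naive @ resid_naive / dof * xtx_inv[1, 1])
    return beta[1], se, se_naive


# Simulated data (hypothetical): true effect of x on y is 1.0
rng = np.random.default_rng(46)
n = 20_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments
u = rng.normal(size=n)                            # unmeasured confounder
x = 0.5 * z1 + 0.5 * z2 + u + rng.normal(size=n)
y = 1.0 * x + 2.0 * u + rng.normal(size=n)

iv_estimate, se_correct, se_naive = two_stage_least_squares(
    y, x, np.column_stack([z1, z2])
)
ols_estimate = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # confounded comparison
print(f"conventional regression: {ols_estimate:.2f}")
print(f"two-stage least squares: {iv_estimate:.2f} (SE {se_correct:.3f})")
print(f"SE from a manual second stage: {se_naive:.3f}")
```

In practice, a dedicated package such as those named above is preferable to hand-written code, because it also handles covariates, robust standard errors, and diagnostics.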

Different types of outcomes require different instrumental variable estimators, which rely on logic similar to the two-stage least squares estimator. Commonly used estimators include:

Continuous outcomes: Mean differences (eg, effects of smoking cessation treatment on body mass index using physicians’ prescribing preferences20) can be estimated using additive structural mean models.2
Binary outcomes: Causal risk differences, odds ratios, and risk ratios (eg, estimating the effects of coronary bypass surgery on mortality14) can be estimated using additive, logistic, and multiplicative structural mean models and control function approaches.33 47 48
Survival outcomes: Methods using instrumental variables with survival outcomes, which adopt a similar approach to two-stage least squares, or the control function approach,49 have been developed to allow for covariate and outcome dependent censoring.50 For example, estimating the effects of screening frequency on colorectal cancer diagnoses using international differences in screening policies.51
Instrumental variable quantile regression: Non-linear effects of the intervention can be estimated using instrumental variable quantile regression.52 53 54 For example, investigating whether the effects of a unit increase in body mass index on healthcare costs differ between underweight and overweight individuals.55

Methods for instrumental variable estimation are an area of active methodological development, spanning statistics, econometrics, and computer science. Examples include estimators combining instrumental variable analysis and matching56 and estimators using machine learning.57 58 59

Data for instrumental variable studies

Instrumental variable studies typically require measures of the instrument, the intervention, and the outcome for individual level data analysis using the same sample of people. This straightforward approach allows the most flexibility to test and evaluate the instrumental variable assumptions. However, integrating additional external datasets can improve the power and precision of instrumental variable analyses using an approach known as two-sample instrumental variable analysis.60 This approach estimates the instrument-intervention association in one sample and the instrument-outcome association in another, from which the Wald estimator can be calculated. For example, a study could estimate the effects of policy reform on educational attainment using census data from the entire population but estimate the effects on health outcomes in a cohort study subsampled from the same underlying population.61 A two-sample instrumental variable analysis does not need measures of the intervention or the outcome in all samples, which can increase power considerably, particularly when the outcome is rare or difficult to measure.
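A two-sample analysis of this kind can be sketched in Python as follows. The example is a simplified illustration on simulated data: the function names, sample sizes, and effect sizes are assumptions, both samples are drawn from the same simulated population, and standard errors are omitted.

```python
import numpy as np


def slope(instrument, variable):
    """Slope from a simple linear regression of `variable` on `instrument`."""
    return np.cov(instrument, variable)[0, 1] / np.var(instrument, ddof=1)


def draw_sample(n, rng, effect=1.0):
    """Draw instrument z, intervention x, and outcome y from one population."""
    z = rng.normal(size=n)
    u = rng.normal(size=n)                 # unmeasured confounder
    x = 0.5 * z + u + rng.normal(size=n)
    y = effect * x + u + rng.normal(size=n)
    return z, x, y


rng = np.random.default_rng(61)
z_a, x_a, _ = draw_sample(50_000, rng)   # sample A: outcome not measured
z_b, _, y_b = draw_sample(20_000, rng)   # sample B: intervention not measured

first_stage = slope(z_a, x_a)      # instrument-intervention association
reduced_form = slope(z_b, y_b)     # instrument-outcome association
two_sample_estimate = reduced_form / first_stage
print(f"instrument-intervention association (sample A): {first_stage:.3f}")
print(f"instrument-outcome association (sample B):      {reduced_form:.3f}")
print(f"two-sample Wald estimate:                        {two_sample_estimate:.3f}")
```

The sketch relies on the two samples representing the same underlying population, so that the instrument-intervention association estimated in one sample applies to the other.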

Summary

Instrumental variable analysis can provide reliable evidence about the causal effects of an intervention, even if the intervention-outcome association is affected by unmeasured confounding. Key to conducting and reading instrumental variable studies is assessing the plausibility of the three core assumptions on instrumental variables. Does the instrument strongly associate with the intervention? Is there a rationale for why the instrument-outcome association is less likely to have confounding than the intervention-outcome association? Is there evidence that measured covariates are less strongly associated with the instrument than the intervention? Are alternative pathways available that could mediate the effects of the instrument?

Instrumental variable analysis can provide a valuable complement to other forms of observational analysis. Unlike other approaches, instrumental variables have distinct assumptions and can strengthen inferences when combined with other sources of evidence. The increasing amount of data available for clinical research means that there is a growing opportunity to use these methods to improve patient care.

Acknowledgments

We thank Brian Lee, Luke Keele, Ting Ye, and Robert Platt, for their extremely helpful comments; and Christopher Worsham and Tarjei Widding-Havneraas for reviewing the manuscript.

Footnotes

Contributors: VW, ES, and NMD conceived the paper and wrote the first draft. All other authors revised the manuscript and provided critical feedback. All authors act as guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: VW (MC_UU_00032/03) and ES (MC_UU_00032/01) work in the MRC Integrative Epidemiology Unit, which receives funding from the UK Medical Research Council. TF has received funding from Pfizer, Takeda, Acadia, and iHeed for unrelated consulting work; and receives funds from The BMJ for editorial work. MGL received support from the Institute for Translational Medicine and Therapeutics of the Perelman School of Medicine at the University of Pennsylvania, NIH/NHLBI National Research Service Award postdoctoral fellowship (T32HL007843), and Measey Foundation. SMD receives research support from RenalytixAI and Novo Nordisk, outside the scope of the current research. NMD is supported by the Norwegian Research Council via grant number 295989 and partly by grant HL105756 from the National Heart, Lung, and Blood Institute (NHLBI). NMD receives funds from The BMJ for editorial work. The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from the UK Medical Research Council, Norwegian Research Council, NIH/NHLBI, Measey Foundation, and Doris Duke Foundation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work; TF has received funding from Pfizer, Takeda, Acadia, and iHeed for unrelated consulting work, and receives funds from The BMJ for editorial work; NMD receives funds from The BMJ for editorial work; SMD receives research support from RenalytixAI and Novo Nordisk, outside the scope of the current research.
Provenance and peer review: Not commissioned; externally peer reviewed.


This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.

References

1. Wald A. Note on the Consistency of the Maximum Likelihood Estimate. Ann Math Stat 1949;20:595-601. doi:10.1214/aoms/1177729952
2. Angrist JD, Imbens GW. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J Am Stat Assoc 1995;90:431-42. doi:10.1080/01621459.1995.10476535
3. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology 2006;17:360-72. doi:10.1097/01.ede.0000222409.00878.37 pmid:16755261
4. Jena AB, Worsham C. Random acts of medicine: the hidden forces that sway doctors, impact patients, and shape our health. Doubleday, 2023.
5. Khullar D, Jena AB. “Natural Experiments” in Health Care Research. JAMA Health Forum 2021;2:e210290. doi:10.1001/jamahealthforum.2021.0290 pmid:36218753
6. Cheng XS, Liu S, Han J, et al. Association of Pretransplant Coronary Heart Disease Testing With Early Kidney Transplant Outcomes. JAMA Intern Med 2023;183:134-41. doi:10.1001/jamainternmed.2022.6069 pmid:36595271
7. Kalata S, Thumma JR, Norton EC, Dimick JB, Sheetz KH. Comparative Safety of Robotic-Assisted vs Laparoscopic Cholecystectomy. JAMA Surg 2023;158:1303-10. doi:10.1001/jamasurg.2023.4389 pmid:37728932
8. Halmin M, Rostgaard K, Lee BK, et al. Length of Storage of Red Blood Cells and Patient Survival After Blood Transfusion: A Binational Cohort Study. Ann Intern Med 2017;166:248-56. doi:10.7326/M16-1415 pmid:27992899
9. Brookhart MA, Wang PS, Solomon DH, Schneeweiss S. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology 2006;17:268-75. doi:10.1097/01.ede.0000193606.58671.c5 pmid:16617275
10. Davies NM, Smith GD, Windmeijer F, Martin RM. COX-2 selective nonsteroidal anti-inflammatory drugs and risk of gastrointestinal tract complications and myocardial infarction: an instrumental variable analysis. Epidemiology 2013;24:352-62. doi:10.1097/EDE.0b013e318289e024 pmid:23532054
11. Wang PS, Schneeweiss S, Avorn J, et al. Risk of death in elderly users of conventional vs. atypical antipsychotic medications. N Engl J Med 2005;353:2335-41. doi:10.1056/NEJMoa052827 pmid:16319382
12. Bilimoria KY, Chung JW, Hedges LV, et al. National Cluster-Randomized Trial of Duty-Hour Flexibility in Surgical Training. N Engl J Med 2016;374:713-27. doi:10.1056/NEJMoa1515724 pmid:26836220
13. Ackley SF, Zimmerman SC, Brenowitz WD, et al. Effect of reductions in amyloid levels on cognitive change in randomized trials: instrumental variable meta-analysis. BMJ 2021;372:n156. doi:10.1136/bmj.n156 pmid:33632704
14. McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA 1994;272:859-66. doi:10.1001/jama.1994.03520110039026 pmid:8078163
15. Svedahl ER, Pape K, Austad B, et al. Impact of altering referral threshold from out-of-hours primary care to hospital on patient safety and further health service use: a cohort study. BMJ Qual Saf 2023;32:330-40. doi:10.1136/bmjqs-2022-014944 pmid:36522178
16. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ 2018;362:k601. doi:10.1136/bmj.k601 pmid:30002074
17. Sanderson E, Glymour MM, Holmes MV, et al. Mendelian randomization. Nat Rev Methods Primers 2022;2:6. doi:10.1038/s43586-021-00092-5 pmid:37325194
18. VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol 2019;34:211-9. doi:10.1007/s10654-019-00494-6 pmid:30840181
19. Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 1983;70:41. doi:10.1093/biomet/70.1.41
20. Thomas KH, Martin RM, Davies NM, Metcalfe C, Windmeijer F, Gunnell D. Smoking cessation treatment and risk of depression, suicide, and self harm in the Clinical Practice Research Datalink: prospective cohort study. BMJ 2013;347:f5704. doi:10.1136/bmj.f5704 pmid:24124105
21. Widding-Havneraas T, Chaulagain A, Lyhmann I, et al. Preference-based instrumental variables in health research rely on important and underreported assumptions: a systematic review. J Clin Epidemiol 2021;139:269-78. doi:10.1016/j.jclinepi.2021.06.006 pmid:34126207
22. Guo Z, Cheng J, Lorch SA, Small DS. Using an instrumental variable to test for unmeasured confounding. Stat Med 2014;33:3528-46. doi:10.1002/sim.6227 pmid:24930696
23. Davies NM, Dickson M, Davey Smith G, van den Berg GJ, Windmeijer F. The causal effects of education on health outcomes in the UK Biobank. Nat Hum Behav 2018;2:117-25. doi:10.1038/s41562-017-0279-y pmid:30406209
24. Gokhale M, Buse JB, DeFilippo Mack C, et al. Calendar time as an instrumental variable in assessing the risk of heart failure with antihyperglycemic drugs. Pharmacoepidemiol Drug Saf 2018;27:857-66. doi:10.1002/pds.4578 pmid:29943442
25. Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol 2000;29:722-9. doi:10.1093/ije/29.4.722 pmid:10922351
26. Hirano K, Imbens GW, Rubin DB, Zhou XH. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics 2000;1:69-88. doi:10.1093/biostatistics/1.1.69 pmid:12933526
27. Li S-M, Ran A-R, Kang M-T, et al., Anyang Childhood Eye Study Group. Effect of Text Messaging Parents of School-Aged Children on Outdoor Time to Control Myopia: A Randomized Clinical Trial. JAMA Pediatr 2022;176:1077-83. doi:10.1001/jamapediatrics.2022.3542 pmid:36155742
28. Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC, 2020.
29. Hartwig FP, Wang L, Davey Smith G, Davies NM. Average Causal Effect Estimation Via Instrumental Variables: the No Simultaneous Heterogeneity Assumption. Epidemiology 2023;34:325-32. doi:10.1097/EDE.0000000000001596 pmid:36709456
30. Swanson SA, Miller M, Robins JM, Hernán MA. Definition and evaluation of the monotonicity condition for preference-based instruments. Epidemiology 2015;26:414-20. doi:10.1097/EDE.0000000000000279 pmid:25782755
31. Labrecque J, Swanson SA. Understanding the Assumptions Underlying Instrumental Variable Analyses: a Brief Review of Falsification Strategies and Related Tools. Curr Epidemiol Rep 2018;5:214-20. doi:10.1007/s40471-018-0152-1 pmid:30148040
32. Small DS, Tan Z, Ramsahai RR, et al. Instrumental Variable Estimation with a Stochastic Monotonicity Assumption. Stat Sci 2017;32. doi:10.1214/17-STS623
33. Clarke PS, Windmeijer F. Instrumental Variable Estimators for Binary Outcomes. J Am Stat Assoc 2012;107:1638-52. doi:10.1080/01621459.2012.734171
34. Pearl J. Causality: models, reasoning, and inference. Cambridge University Press, 2000.
35. Feeney T, Hartwig FP, Davies N. How to use directed acyclic graphs: guide for clinical researchers. BMJ 2024;387:e078226. doi:10.1136/bmj-2023-078226
36. Steiner PM, Kim Y, Hall CE, Su D. Graphical Models for Quasi-experimental Designs. Sociol Methods Res 2017;46:155-88. doi:10.1177/0049124115582272 pmid:30174355
37. Kleiber C, Zeileis A. Applied Econometrics with R. Springer, 2008. doi:10.1007/978-0-387-77318-6
38. Baum CF, Schaffer ME, Stillman S. IVREG2: Stata module for extended instrumental variables/2SLS and GMM estimation. 2013. https://EconPapers.repec.org/RePEc:boc:bocode:s425401
39. Stock J, Yogo M. Testing for weak instruments in linear IV regression. National Bureau of Economic Research Technical Working Paper Series. 2002;284.
40. Keele L, Zhao Q, Kelz RR, Small D. Falsification Tests for Instrumental Variable Designs With an Application to Tendency to Operate. Med Care 2019;57:167-71. doi:10.1097/MLR.0000000000001040 pmid:30520835
41. Jackson JW, Swanson SA. Toward a clearer portrayal of confounding bias in instrumental variable applications. Epidemiology 2015;26:498-504. doi:10.1097/EDE.0000000000000287 pmid:25978796
42. Davies NM, Thomas KH, Taylor AE, et al. How to compare instrumental variable and conventional regression analyses using negative controls and bias plots. Int J Epidemiol 2017;46:2067-77. doi:10.1093/ije/dyx014 pmid:28398582
43. Branson Z, Keele L. Evaluating a Key Instrumental Variable Assumption Using Randomization Tests. Am J Epidemiol 2020;189:1412-20. doi:10.1093/aje/kwaa089 pmid:32432319
44. Lipsitch M, Tchetgen Tchetgen E, Cohen T. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 2010;21:383-8. doi:10.1097/EDE.0b013e3181d61eeb pmid:20335814
45. Balke A, Pearl J. Bounds on Treatment Effects from Studies with Imperfect Compliance. J Am Stat Assoc 1997;92:1171-6. doi:10.1080/01621459.1997.10474074
46. Wooldridge J. Econometric analysis of cross section and panel data. MIT Press, 2002.
47. Clarke PS, Windmeijer F. Identification of causal effects on binary outcomes using structural mean models. Biostatistics 2010;11:756-70. doi:10.1093/biostatistics/kxq024 pmid:20522728
48. Newey WK. Nonparametric Instrumental Variables Estimation. Am Econ Rev 2013;103:550-6. doi:10.1257/aer.103.3.550
49. Tchetgen Tchetgen EJ, Walter S, Vansteelandt S, Martinussen T, Glymour M. Instrumental variable estimation in a survival context. Epidemiology 2015;26:402-10. doi:10.1097/EDE.0000000000000262 pmid:25692223
50. Lee Y, Kennedy EH, Mitra N. Doubly robust nonparametric instrumental variable estimators for survival outcomes. Biostatistics 2023;24:518-37. doi:10.1093/biostatistics/kxab036 pmid:34676400
51. Engel C, Vasen HF, Seppälä T, et al., German HNPCC Consortium, the Dutch Lynch Syndrome Collaborative Group, and the Finnish Lynch Syndrome Registry. No Difference in Colorectal Cancer Incidence or Stage at Detection by Colonoscopy Among 3 Countries With Different Lynch Syndrome Surveillance Policies. Gastroenterology 2018;155:1400-1409.e2. doi:10.1053/j.gastro.2018.07.030 pmid:30063918
52. Chernozhukov V, Hansen C. An IV Model of Quantile Treatment Effects. Econometrica 2005;73:245-61. doi:10.1111/j.1468-0262.2005.00570.x
53. Chernozhukov V, Fernandez-Val I, Han S, et al. CQIV: Stata module to perform censored quantile instrumental variables regression. 2012. https://ideas.repec.org/c/boc/bocode/s457478.html
54. Chernozhukov V, Fernández-Val I, Han S, et al. Censored quantile instrumental-variable estimation with Stata. Stata J 2019;19:768-81. doi:10.1177/1536867X19893615
55. Cawley J, Meyerhoefer C. The medical care costs of obesity: an instrumental variables approach. J Health Econ 2012;31:219-30. doi:10.1016/j.jhealeco.2011.10.003 pmid:22094013
56. Kang H, Kreuels B, May J, et al. Full matching approach to instrumental variables estimation with application to the effect of malaria on stunting. Ann Appl Stat 2016;10. doi:10.1214/15-AOAS894
57. Chen Y, Xu L, Gulcehre C, et al. On Instrumental Variable Regression for Deep Offline Policy Evaluation. J Mach Learn Res 2022;23:1-41.
58. Kreif N, DiazOrdaz K. Machine learning in policy evaluation: new tools for causal inference. arXiv 2019;1903.00402. doi:10.48550/ARXIV.1903.00402
59. Takatsu K, Levis AW, Kennedy E, et al. Doubly robust machine learning for an instrumental variable study of surgical care for cholecystitis. arXiv 2023;2307.06269. doi:10.48550/ARXIV.2307.06269
60. Inoue A, Solon G. Two-Sample Instrumental Variables Estimators. Rev Econ Stat 2010;92:557-61. doi:10.1162/REST_a_00011
61. Zhao Q, Wang J, Spiller W, et al. Two-Sample Instrumental Variable Analyses Using Heterogeneous Samples. Stat Sci 2019;34. doi:10.1214/18-STS692

[1] ↵
Wald A
. Note on the Consistency of the Maximum Likelihood Estimate. Ann Math Stat1949;20:595-601doi:10.1214/aoms/1177729952.
OpenUrl CrossRef

[2] Wald A

[3] ↵
Angrist JD,
Imbens GW
. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J Am Stat Assoc1995;90:431-42. doi:10.1080/01621459.1995.10476535.
OpenUrl CrossRef

[4] Angrist JD,

[5] Imbens GW

[6] ↵
Hernán MA,
Robins JM
. Instruments for causal inference: an epidemiologist’s dream?Epidemiology2006;17:360-72. doi:10.1097/01.ede.0000222409.00878.37 pmid:16755261
OpenUrl CrossRef PubMed Web of Science

[7] Hernán MA,

[8] Robins JM

[9] ↵
Jena AB,
Worsham C
. Random acts of medicine: the hidden forces that sway doctors, impact patients, and shape our health.Doubleday, 2023.

[10] Jena AB,

[11] Worsham C

[12] ↵
Khullar D,
Jena AB
. “Natural Experiments” in Health Care Research. JAMA Health Forum2021;2:e210290. doi:10.1001/jamahealthforum.2021.0290 pmid:36218753
OpenUrl CrossRef PubMed

[13] Khullar D,

[14] Jena AB

[15] ↵
Cheng XS,
Liu S,
Han J,
et al
. Association of Pretransplant Coronary Heart Disease Testing With Early Kidney Transplant Outcomes. JAMA Intern Med2023;183:134-41. doi:10.1001/jamainternmed.2022.6069 pmid:36595271
OpenUrl CrossRef PubMed

[16] Cheng XS,

[17] Liu S,

[18] Han J,

[19] et al

[20] ↵
Kalata S,
Thumma JR,
Norton EC,
Dimick JB,
Sheetz KH
. Comparative Safety of Robotic-Assisted vs Laparoscopic Cholecystectomy. JAMA Surg2023;158:1303-10. doi:10.1001/jamasurg.2023.4389. pmid:37728932
OpenUrl CrossRef PubMed

[21] Kalata S,

[22] Thumma JR,

[23] Norton EC,

[24] Dimick JB,

[25] Sheetz KH

[26] ↵
Halmin M,
Rostgaard K,
Lee BK,
et al
. Length of Storage of Red Blood Cells and Patient Survival After Blood Transfusion: A Binational Cohort Study. Ann Intern Med2017;166:248-56. doi:10.7326/M16-1415 pmid:27992899
OpenUrl CrossRef PubMed

[27] Halmin M,

[28] Rostgaard K,

[29] Lee BK,

[30] et al

[31] ↵
Brookhart MA,
Wang PS,
Solomon DH,
Schneeweiss S
. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology2006;17:268-75. doi:10.1097/01.ede.0000193606.58671.c5 pmid:16617275
OpenUrl CrossRef PubMed Web of Science

[32] Brookhart MA,

[33] Wang PS,

[34] Solomon DH,

[35] Schneeweiss S

[36] ↵
Davies NM,
Smith GD,
Windmeijer F,
Martin RM
. COX-2 selective nonsteroidal anti-inflammatory drugs and risk of gastrointestinal tract complications and myocardial infarction: an instrumental variable analysis. Epidemiology2013;24:352-62. doi:10.1097/EDE.0b013e318289e024 pmid:23532054
OpenUrl CrossRef PubMed

[37] Davies NM,

[38] Smith GD,

[39] Windmeijer F,

[40] Martin RM

[41] ↵
Wang PS,
Schneeweiss S,
Avorn J,
et al
. Risk of death in elderly users of conventional vs. atypical antipsychotic medications. N Engl J Med2005;353:2335-41. doi:10.1056/NEJMoa052827 pmid:16319382
OpenUrl CrossRef PubMed Web of Science

[42] Wang PS,

[43] Schneeweiss S,

[44] Avorn J,

[45] et al

[46] ↵
Bilimoria KY,
Chung JW,
Hedges LV,
et al
. National Cluster-Randomized Trial of Duty-Hour Flexibility in Surgical Training. N Engl J Med2016;374:713-27. doi:10.1056/NEJMoa1515724 pmid:26836220
OpenUrl CrossRef PubMed

[47] Bilimoria KY,

[48] Chung JW,

[49] Hedges LV,

[50] et al

[51] ↵
Ackley SF,
Zimmerman SC,
Brenowitz WD,
et al
. Effect of reductions in amyloid levels on cognitive change in randomized trials: instrumental variable meta-analysis. BMJ2021;372:n156. doi:10.1136/bmj.n156 pmid:33632704
OpenUrl Abstract/FREE Full Text

[52] Ackley SF,

[53] Zimmerman SC,

[54] Brenowitz WD,

[55] et al

[56] ↵
McClellan M,
McNeil BJ,
Newhouse JP
. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA1994;272:859-66. doi:10.1001/jama.1994.03520110039026 pmid:8078163
OpenUrl CrossRef PubMed Web of Science

[57] McClellan M,

[58] McNeil BJ,

[59] Newhouse JP

[60] ↵
Svedahl ER,
Pape K,
Austad B,
et al
. Impact of altering referral threshold from out-of-hours primary care to hospital on patient safety and further health service use: a cohort study. BMJ Qual Saf2023;32:330-40. doi:10.1136/bmjqs-2022-014944 pmid:36522178
OpenUrl Abstract/FREE Full Text

[61] Svedahl ER,

[62] Pape K,

[63] Austad B,

[64] et al

[65] ↵
Davies NM,
Holmes MV,
Davey Smith G
. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ2018;362:k601. doi:10.1136/bmj.k601 pmid:30002074
OpenUrl FREE Full Text

[66] Davies NM,

[67] Holmes MV,

[68] Davey Smith G

[69] ↵
Sanderson E,
Glymour MM,
Holmes MV,
et al
. Mendelian randomization. Nat Rev Methods Primers2022;2:6. doi:10.1038/s43586-021-00092-5 pmid:37325194
OpenUrl CrossRef PubMed

[70] Sanderson E,

[71] Glymour MM,

[72] Holmes MV,

[73] et al

[74] ↵
VanderWeele TJ
. Principles of confounder selection. Eur J Epidemiol2019;34:211-9. doi:10.1007/s10654-019-00494-6 pmid:30840181
OpenUrl CrossRef PubMed

[75] VanderWeele TJ

[76] ↵
Rosenbaum PR,
Rubin DB
. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika1983;70:41. doi:10.1093/biomet/70.1.41.
OpenUrl CrossRef Web of Science

[77] Rosenbaum PR,

[78] Rubin DB

[79] ↵
Thomas KH,
Martin RM,
Davies NM,
Metcalfe C,
Windmeijer F,
Gunnell D
. Smoking cessation treatment and risk of depression, suicide, and self harm in the Clinical Practice Research Datalink: prospective cohort study. BMJ2013;347:f5704. doi:10.1136/bmj.f5704 pmid:24124105
OpenUrl Abstract/FREE Full Text

[80] Thomas KH,

[81] Martin RM,

[82] Davies NM,

[83] Metcalfe C,

[84] Windmeijer F,

[85] Gunnell D

[86] ↵
Widding-Havneraas T,
Chaulagain A,
Lyhmann I,
et al
. Preference-based instrumental variables in health research rely on important and underreported assumptions: a systematic review. J Clin Epidemiol2021;139:269-78. doi:10.1016/j.jclinepi.2021.06.006 pmid:34126207
OpenUrl CrossRef PubMed

[87] Widding-Havneraas T,

[88] Chaulagain A,

[89] Lyhmann I,

[90] et al

[91] ↵
Guo Z,
Cheng J,
Lorch SA,
Small DS
. Using an instrumental variable to test for unmeasured confounding. Stat Med2014;33:3528-46. doi:10.1002/sim.6227 pmid:24930696
OpenUrl CrossRef PubMed

[92] Guo Z,

[93] Cheng J,

[94] Lorch SA,

[95] Small DS

[96] ↵
Davies NM,
Dickson M,
Davey Smith G,
van den Berg GJ,
Windmeijer F
. The causal effects of education on health outcomes in the UK Biobank. Nat Hum Behav2018;2:117-25. doi:10.1038/s41562-017-0279-y pmid:30406209
OpenUrl CrossRef PubMed

[97] Davies NM,

[98] Dickson M,

[99] Davey Smith G,

[100] van den Berg GJ,

[101] Windmeijer F

[102] ↵
Gokhale M,
Buse JB,
DeFilippo Mack C,
et al
. Calendar time as an instrumental variable in assessing the risk of heart failure with antihyperglycemic drugs. Pharmacoepidemiol Drug Saf2018;27:857-66. doi:10.1002/pds.4578 pmid:29943442
OpenUrl CrossRef PubMed

[103] Gokhale M,

[104] Buse JB,

[105] DeFilippo Mack C,

[106] et al

[107] Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol 2000;29:722-9. doi:10.1093/ije/29.4.722 pmid:10922351

[109] Hirano K, Imbens GW, Rubin DB, Zhou XH. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics 2000;1:69-88. doi:10.1093/biostatistics/1.1.69 pmid:12933526

[114] Li S-M, Ran A-R, Kang M-T, et al., Anyang Childhood Eye Study Group. Effect of Text Messaging Parents of School-Aged Children on Outdoor Time to Control Myopia: A Randomized Clinical Trial. JAMA Pediatr 2022;176:1077-83. doi:10.1001/jamapediatrics.2022.3542 pmid:36155742

[120] Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC, 2020.

[123] Hartwig FP, Wang L, Davey Smith G, Davies NM. Average Causal Effect Estimation Via Instrumental Variables: the No Simultaneous Heterogeneity Assumption. Epidemiology 2023;34:325-32. doi:10.1097/EDE.0000000000001596 pmid:36709456

[128] Swanson SA, Miller M, Robins JM, Hernán MA. Definition and evaluation of the monotonicity condition for preference-based instruments. Epidemiology 2015;26:414-20. doi:10.1097/EDE.0000000000000279 pmid:25782755

[133] Labrecque J, Swanson SA. Understanding the Assumptions Underlying Instrumental Variable Analyses: a Brief Review of Falsification Strategies and Related Tools. Curr Epidemiol Rep 2018;5:214-20. doi:10.1007/s40471-018-0152-1 pmid:30148040

[136] Small DS, Tan Z, Ramsahai RR, et al. Instrumental Variable Estimation with a Stochastic Monotonicity Assumption. Stat Sci 2017;32. doi:10.1214/17-STS623.

[141] Clarke PS, Windmeijer F. Instrumental Variable Estimators for Binary Outcomes. J Am Stat Assoc 2012;107:1638-52. doi:10.1080/01621459.2012.734171.

[144] Pearl J. Causality: models, reasoning, and inference. Cambridge University Press, 2000.

[146] Feeney T, Hartwig FP, Davies N. How to use directed acyclic graphs: guide for clinical researchers. BMJ 2024;387:e078226. doi:10.1136/bmj-2023-078226.

[150] Steiner PM, Kim Y, Hall CE, Su D. Graphical Models for Quasi-experimental Designs. Sociol Methods Res 2017;46:155-88. doi:10.1177/0049124115582272 pmid:30174355

[155] Kleiber C, Zeileis A. Applied Econometrics with R. Springer, 2008. doi:10.1007/978-0-387-77318-6.

[158] Baum CF, Schaffer ME, Stillman S. IVREG2: Stata module for extended instrumental variables/2SLS and GMM estimation. 2013. https://EconPapers.repec.org/RePEc:boc:bocode:s425401

[159] Stock J, Yogo M. Testing for weak instruments in linear IV regression. National Bureau of Economic Research Technical Working Paper Series. 2002;284.

[160] Keele L, Zhao Q, Kelz RR, Small D. Falsification Tests for Instrumental Variable Designs With an Application to Tendency to Operate. Med Care 2019;57:167-71. doi:10.1097/MLR.0000000000001040 pmid:30520835

[165] Jackson JW, Swanson SA. Toward a clearer portrayal of confounding bias in instrumental variable applications. Epidemiology 2015;26:498-504. doi:10.1097/EDE.0000000000000287 pmid:25978796

[168] Davies NM, Thomas KH, Taylor AE, et al. How to compare instrumental variable and conventional regression analyses using negative controls and bias plots. Int J Epidemiol 2017;46:2067-77. doi:10.1093/ije/dyx014 pmid:28398582

[173] Branson Z, Keele L. Evaluating a Key Instrumental Variable Assumption Using Randomization Tests. Am J Epidemiol 2020;189:1412-20. doi:10.1093/aje/kwaa089 pmid:32432319

[176] Lipsitch M, Tchetgen Tchetgen E, Cohen T. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology 2010;21:383-8. doi:10.1097/EDE.0b013e3181d61eeb pmid:20335814

[180] Balke A, Pearl J. Bounds on Treatment Effects from Studies with Imperfect Compliance. J Am Stat Assoc 1997;92:1171-6. doi:10.1080/01621459.1997.10474074.

[183] Wooldridge J. Econometric analysis of cross section and panel data. MIT Press, 2002.

[185] Clarke PS, Windmeijer F. Identification of causal effects on binary outcomes using structural mean models. Biostatistics 2010;11:756-70. doi:10.1093/biostatistics/kxq024 pmid:20522728

[188] Newey WK. Nonparametric Instrumental Variables Estimation. Am Econ Rev 2013;103:550-6. doi:10.1257/aer.103.3.550.

[190] Tchetgen Tchetgen EJ, Walter S, Vansteelandt S, Martinussen T, Glymour M. Instrumental variable estimation in a survival context. Epidemiology 2015;26:402-10. doi:10.1097/EDE.0000000000000262 pmid:25692223

[196] Lee Y, Kennedy EH, Mitra N. Doubly robust nonparametric instrumental variable estimators for survival outcomes. Biostatistics 2023;24:518-37. doi:10.1093/biostatistics/kxab036 pmid:34676400

[200] Engel C, Vasen HF, Seppälä T, et al., German HNPCC Consortium, the Dutch Lynch Syndrome Collaborative Group, and the Finnish Lynch Syndrome Registry. No Difference in Colorectal Cancer Incidence or Stage at Detection by Colonoscopy Among 3 Countries With Different Lynch Syndrome Surveillance Policies. Gastroenterology 2018;155:1400-1409.e2. doi:10.1053/j.gastro.2018.07.030 pmid:30063918

[206] Chernozhukov V, Hansen C. An IV Model of Quantile Treatment Effects. Econometrica 2005;73:245-61. doi:10.1111/j.1468-0262.2005.00570.x.

[210] Chernozhukov V, Fernandez-Val I, Han S, et al. CQIV: Stata module to perform censored quantile instrumental variables regression. 2012. https://ideas.repec.org/c/boc/bocode/s457478.html

[211] Chernozhukov V, Fernández-Val I, Han S, et al. Censored quantile instrumental-variable estimation with Stata. Stata J 2019;19:768-81. doi:10.1177/1536867X19893615.

[216] Cawley J, Meyerhoefer C. The medical care costs of obesity: an instrumental variables approach. J Health Econ 2012;31:219-30. doi:10.1016/j.jhealeco.2011.10.003 pmid:22094013

[219] Kang H, Kreuels B, May J, et al. Full matching approach to instrumental variables estimation with application to the effect of malaria on stunting. Ann Appl Stat 2016;10. doi:10.1214/15-AOAS894.

[224] Chen Y, Xu L, Gulcehre C, et al. On Instrumental Variable Regression for Deep Offline Policy Evaluation. J Mach Learn Res 2022;23:1-41.

[229] Kreif N, DiazOrdaz K. Machine learning in policy evaluation: new tools for causal inference. arXiv 2019;1903.00402. doi:10.48550/ARXIV.1903.00402

[232] Takatsu K, Levis AW, Kennedy E, et al. Doubly robust machine learning for an instrumental variable study of surgical care for cholecystitis. arXiv 2023;2307.06269. doi:10.48550/ARXIV.2307.06269

[237] Inoue A, Solon G. Two-Sample Instrumental Variables Estimators. Rev Econ Stat 2010;92:557-61. doi:10.1162/REST_a_00011.

[240] Zhao Q, Wang J, Spiller W, et al. Two-Sample Instrumental Variable Analyses Using Heterogeneous Samples. Stat Sci 2019;34. doi:10.1214/18-STS692.
