A step-by-step approach for selecting an optimal minimal important difference
BMJ 2023;381 doi: https://doi.org/10.1136/bmj-2022-073822 (Published 26 May 2023). Cite this as: BMJ 2023;381:e073822
- Yuting Wang, doctoral candidate1,
- Tahira Devji, methodologist and medical student1 2,
- Alonso Carrasco-Labra, associate professor3,
- Madeleine T King, professor4,
- Berend Terluin, visiting fellow5 7,
- Caroline B Terwee, full professor6 7,
- Michael Walsh, associate professor1 8 9,
- Toshi A Furukawa, professor10,
- Gordon H Guyatt, distinguished professor1
- 1Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON L8S 4L8, Canada
- 2Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- 3Department of Preventive and Restorative Sciences; Center for Integrative Global Oral Health, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, USA
- 4School of Psychology, University of Sydney, Sydney, NSW, Australia
- 5Vrije Universiteit Medical Centre Amsterdam, Department of General Practice, Amsterdam, Netherlands
- 6Vrije Universiteit Medical Centre Amsterdam, Department of Epidemiology and Data Science, Amsterdam, Netherlands
- 7Amsterdam Public Health Research Institute, Methodology, Amsterdam, Netherlands
- 8Department of Medicine, McMaster University, Hamilton, ON, Canada
- 9Population Health Research Institute, Hamilton Health Sciences/McMaster University, Hamilton, ON, Canada
- 10Department of Health Promotion and Human Behavior, Kyoto University Graduate School of Medicine/School of Public Health, Kyoto, Japan
- Correspondence to: G H Guyatt guyatt{at}mcmaster.ca
- Accepted 27 March 2023
When measuring a health state that patients experience and know best, physiological measures and clinicians’ estimates have serious limitations; direct measurement of patients’ perspectives thus represents the only satisfactory approach. Clinicians and researchers can measure patient experience—including symptom status, physical function, mental health, social function, wellbeing, and quality of life—using patient reported outcome measures (PROMs). PROMs enhance understanding of how interventions designed to alter disease status and course affect patients’ lives.123 Authorities thus advocate for using PROMs as endpoint measures in clinical trials examining treatment effects.45678
PROM results are, however, challenging to interpret. Given a large enough sample size, small differences in PROM scores within or between groups that might not be important to patients could achieve statistical significance.9 Researchers have proposed that the minimal important difference (MID), the smallest change or difference that patients perceive as important (on average), could aid interpretation of PROM scores.1011 The MID has clear implications for interpreting differences between groups in randomised trials: the smaller the mean difference in relation to the MID, the less likely that difference represents an important effect; the larger, the more likely. Thus, when presenting results as mean differences in scores between groups, MIDs inform judgments as to whether mean treatment effects are trivial, small but important, moderate, or large.12 MIDs can also serve as thresholds for responder analyses, in which trialists estimate the proportion of patients who have achieved an important treatment benefit.13
Researchers can choose between an anchor based or distribution based approach to estimate an MID. Anchor based methods examine the relation between a PROM of interest and an anchor that is itself easily interpretable (box 1).10 Distribution based methods use the statistical characteristics of PROM scores to estimate MIDs, thus providing no clear relation to the importance of the change in PROM scores to patients.14 Anchor based MIDs therefore represent a far better approach to aid interpretation of the magnitude of treatment effects.141516
Glossary terms regarding minimal important differences (MIDs)
Anchor based MID
The MID is the smallest difference in score in the outcome of interest that informed patients or informed proxies perceive as important; the difference can reflect benefit or harm. Anchor based MIDs relate a difference in the target PROM (patient reported outcome measure) to an independent measure (ie, the anchor) that is itself interpretable. Investigators can use a transition rating scale (ie, a global rating of change) as the anchor. For example, “Since last month when we started the new treatment, are you feeling better or worse and, if so, to what extent?” would have the following responses: “very much better,” “much better,” “moderately better,” “slightly better,” “about the same,” “slightly worse,” “moderately worse,” “much worse,” and “very much worse.” Investigators can then establish the anchor based MID for a PROM by estimating the average change in the PROM score in the group reporting a small but important improvement on the anchor (ie, the MID group; eg, the “slightly better” group); changes in the PROM score should correlate at least moderately (ie, correlation coefficient ≥0.5) with the transition rating scale.
Most credible MID estimates
These estimates refer to the available MID estimates that receive the highest ratings across the five core credibility criteria (ie, the highest credibility rank, see appendix 3).
Consistency among MID estimates
Consistency is defined as 80% of the MID estimates lying within an absolute value of 10% of the PROM score above or below the median (ie, within a band spanning 20% of the PROM score, centred on the median).
Near the median of the whole distribution
When the absolute difference between the median of the selected MID estimates and the median of the whole distribution is less than an absolute value of 10% of the PROM score.
Enough MID estimates
Analogous to subgroup analysis, we use the same threshold for enough MID estimates (ie, at least three estimates per contextualised factor).
Explained MID variability
For binary contextualised factors, we categorise the MID estimates into two groups (eg, surgical group v non-surgical group). To test whether variability is explained by the factor, we compare the medians of the two groups using the Wilcoxon rank sum test with a threshold P value of 0.10.
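The consistency and near-the-median definitions above lend themselves to simple computational checks. The sketch below is our illustration, not part of the published approach: the function names are ours, and `prom_range` stands for the span of the PROM scale (how “10% of the PROM score” is operationalised here is our assumption).

```python
from statistics import median

def is_consistent(mids, prom_range):
    """Box 1 consistency: at least 80% of the MID estimates lie within
    10% of the PROM score above or below their own median."""
    m = median(mids)
    tol = 0.10 * prom_range
    within = sum(1 for x in mids if abs(x - m) <= tol)
    return within / len(mids) >= 0.80

def near_whole_median(selected_mids, all_mids, prom_range):
    """Box 1: the median of the selected MIDs is 'near' the median of
    the whole distribution if they differ by <10% of the PROM score."""
    return abs(median(selected_mids) - median(all_mids)) < 0.10 * prom_range
```

Both checks return a plain yes/no judgment, mirroring the binary branches in fig 1 (steps 5 and 9).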
Conducting a clinical trial or systematic review using PROMs requires a predefined MID at the stage of developing the protocol.567 For a given PROM, however, multiple anchor based MIDs differing substantially from one another are often available.171819 Currently, researchers might assume all MID estimates from published studies that have undergone peer review are equally trustworthy, and randomly choose one for their use.202122 Such a practice risks choosing a misleading MID estimate, and thus misinterpretation of results.
In response to this problem, the MID research community has sought approaches to selecting optimal MIDs. Suggestions have included use of MIDs meeting specific methodological requirements,23 as well as use of MIDs established in contexts similar to the trial.152425 When variability exists among available MID estimates, a consensus process could inform MID selection,2627 or different MIDs might be triangulated to generate one MID.282930 These suggestions all have limitations including narrowness of perspective, lack of specification and, until recently, lack of criteria to define methodology that will produce a trustworthy MID.
Our team developed an instrument to evaluate the credibility of anchor based MIDs.31 The instrument and its recent extension32 provide a systematic approach to the methodological assessment of anchor based MIDs. We have also developed a living, anchor based MID inventory (Patient Reported Outcome Minimal Important Difference (PROMID) Database) that includes all available estimates for known PROMs from the literature, along with a credibility assessment of each MID estimate (https://www.promid.org/).33
The development of the inventory and assessment of credibility did not, however, solve the problem of choosing an optimal MID estimate: when several widely varying MID estimates are available, credibility might be similar across estimates. Thus, a systematic approach to selecting the optimal anchor based MID has remained unavailable. We therefore developed an approach to resolve this deficiency; here, we describe the methods of development, the rationale for the approach, and the steps to select the optimal MID from the available estimates.
Summary points
A systematic step-by-step approach has been developed to select an optimal, anchor based, minimal important difference (MID) from various MID estimates of a given patient reported outcomes measure (PROM)
This approach relies on information on credibility and contextualised factors of all available anchor based MIDs of a target PROM to select the optimal anchor based MID
The approach resolves the difficulties in choosing an optimal MID among multiple MIDs to interpret PROM results, which will prove helpful for clinical trials, systematic reviews, and clinical practice guidelines that use PROMs
An optimal MID should be credible and should, as far as possible, match the intended application contexts
Methods
After formation of a steering committee, the development of the selection approach consisted of four stages: conducting a systematic survey to identify issues related to selecting MIDs; gathering expert views on the general selection framework; reaching consensus on the details under the selection framework; and formulating a systematic, step-by-step selection approach.
Steering committee
Before the start of the project, we established a steering committee including clinicians, health research methodologists, and clinical epidemiologists (AC-L, BT, CBT, GHG, MTK, MW, TD, TAF, and YW), of whom several have expertise in health status measurement and MID research. The steering committee regularly attended virtual meetings to discuss outstanding issues and decide the next steps of the development. We recorded the meetings and circulated summaries of discussions.
Systematic survey identified candidate items related to MID selection
We conducted a systematic survey, searching up to March 2020, to qualitatively summarise items that researchers and methodologists have offered to select optimal MID estimates. We have previously reported a detailed description about the systematic survey.34 Briefly, the survey identified 29 items that constituted candidate criteria for selecting an optimal MID. They covered MID methodology issues (including statistical issues related to anchors, PROMs, and MIDs), as well as factors affecting the generalisability of MID estimates, including the contexts in which the estimates were developed.34
Expert views on the selection framework and decisions on the candidate items
In parallel with the systematic survey, we collected the committee’s views on the selection framework. Through discussion, the committee agreed on two key broad criteria: methodological rigor and generalisability. That is, the most crucial and first criterion for selecting an optimal MID is the methodological rigor of the MID’s development; secondarily, the optimal MID should, as far as possible, match the intended application contexts.
We then reached consensus on the items identified from the survey34 to be included in the selection framework, which informed the subsequent development of the systematic selection approach. A core team (AC-L, GHG, TD, and YW) first conducted intensive discussions on the candidate items identified from the survey and made preliminary decisions, then circulated the suggested items to the committee. The steering committee reached agreement on the relevant items (see appendix 1 for details).
Development of the step-by-step selection approach
Guided by the consensus on the selection framework and corresponding items, the core team developed a tentative step-by-step selection approach and tested it using the data of the Western Ontario and McMaster Universities Arthritis Index (WOMAC) obtained from the MID inventory database (up to 2018).33 The entire committee then, in a series of meetings, provided feedback to refine the selection approach. When the committee identified concerns and provided suggestions, we tested the revised version with several MID estimates in the inventory (eg, pain visual analogue scale (VAS), knee injury and osteoarthritis outcome score (KOOS)).33 This iterative process continued through six committee meetings and concluded with the committee members agreeing on a definitive process of optimal MID selection. During the example based refinement, the committee reached agreements on issues related to relative and absolute MIDs, and how to deal with the MIDs for the same PROM (or subdomain) that used different scales (see appendix 1 for details).
Results
Rationale of the selection approach
The committee prioritised methodological rigor as the most important aspect of selecting an anchor based MID. Thus, we first apply the credibility assessment to all available MID estimates3132 and choose the most credible MIDs for a given PROM (or subdomain) (first stage in fig 1 (steps 1 to 30); appendix 2 presents the criteria for choosing the most credible estimates). The median of the most credible MIDs for that PROM (or subdomain) constitutes the initial best guess as to an optimal MID.13
Fig 1 Selection process for an optimal, anchor based, minimal important difference (MID). The process has two stages: choosing the most credible MIDs by credibility (steps 1 to 30), and then exploring contextualised factors (from step 30 onwards). *Step 1 aims to find the MID estimates with five ratings of “definitely yes/yes/definitely closely related” across the five core credibility criteria. The criteria questions are: (Q1) is the patient or necessary proxy responding directly to both the patient reported outcome measure (PROM) and the anchor; (Q2) is the anchor easily understandable and relevant for patients or necessary proxy; (Q3) has the anchor shown good correlation with the PROM (or, if the correlation is not reported, (Q3.1) to what extent is the construct of the anchor closely related to the construct of the PROM); (Q4) is the MID precise; and (Q5) does the threshold or difference between groups on the anchor used to estimate the MID reflect a small but important difference. Response options for Q1 are yes or no; all other criteria are rated on a 5 point adjectival scale with response options “definitely yes” (“definitely closely related” for Q3.1), “to a great extent,” “not so much,” “definitely no” (“definitely not related” for Q3.1), and “impossible to tell” (appendix 2). †Step 4: the most credible estimates are the available MID estimates that receive the highest ratings across the five core credibility criteria.
To arrive at the most credible estimates, the criteria are progressively relaxed, as follows. Firstly, choose MID estimates with five “definitely yes/yes/definitely closely related” ratings across the five core credibility criteria (Q1-5, using Q3.1 instead of Q3 if the correlation coefficient is not reported). If no estimates meet this definition, relax it to include MIDs with five “definitely yes/yes/definitely closely related” or “to a great extent” ratings across the core criteria. If no MIDs meet this relaxed definition, relax the criteria further to select estimates of lower credibility rank (see appendix 3 for the credibility ranks; higher rank means higher credibility)—ie, estimates with four “definitely yes/yes/definitely closely related” ratings across the core criteria; if none are available, select MIDs with four “definitely yes/yes/definitely closely related” or “to a great extent” ratings, and so on (appendix 3). When MIDs of higher rank are available, do not go down the ranking system to select MIDs of lower rank (eg, if one estimate sits at rank 1, that estimate is the only one used). ‡Steps 5, 15, or 16: consistency is defined as 80% of the MID estimates lying within an absolute value of 10% of the PROM score above or below the median (ie, within a band spanning 20% of the PROM score, centred on the median). §Steps 9, 22, or 23: near the median of the whole distribution means that the absolute difference between the median of the selected MID estimates and the median of the whole distribution is less than an absolute value of 10% of the PROM score.
¶Step 14: only applicable when, among the most credible MID estimates, some are anchored to transition ratings; if no estimates are anchored to transition ratings, or all such estimates have a long recall period (>4 weeks), skip this step and its downstream steps (ie, steps 14, 16, 20, 21, 23, 26, 27, and 29). **Steps 31 or 35: at least three most credible estimates per contextualised factor. ††Steps 34 or 41: to test whether variability is explained by contextualised factors, compare the distributions; for binary factors, use the Wilcoxon rank sum test with a threshold of P=0.10
The committee recognised that optimal MIDs might vary by context.34 If such differences exist, the median of the most credible MIDs will not apply to all contexts. Evidence suggesting that contextual differences might exist includes the following: the most credible MIDs are inconsistent; or the most credible MIDs are consistent with one another but their median is not near the median of all MIDs. Either finding requires that investigators seek contextualised factors that can explain the variability among MID estimates (second stage in fig 1 (starting from step 30)).
If investigators identify such contextualised factors, they will select the median of the most credible MIDs under each context as the optimal MIDs. If, however, they fail to identify contextualised factors that explain MID variability, they choose the median of the most credible MIDs as the optimal MID.
MID selection approach in detail
Figure 1 presents the complete processes of the selection approach and appendix 2 provides a more detailed narrative description for the selection process. We summarise the process below.
The process begins with identifying the most credible MIDs. Investigators should apply the credibility instrument,3132 count the number of the five core credibility criteria met, and select the MID estimates with the highest count (fig 1 (steps 1-4); appendix 2). The most credible MIDs are those with the highest ratings across the five core credibility criteria (ie, the highest credibility rank, appendix 3).
Appendix 2 elaborates on how investigators can assess each credibility criterion and select the most credible MID. Briefly, criterion 1 is rated yes or no. All other criteria are rated on a 5 point adjectival scale with response options of “definitely yes” (or “definitely closely related”) (highest credibility); “to a great extent”; “not so much”; “definitely no” (or “definitely not related”); and “impossible to tell” (lowest credibility). The best MIDs are those with five “definitely yes/yes/definitely closely related” ratings for the five core criteria (fig 1 (step 1)). To identify the most credible MIDs, however, investigators can progressively relax the criteria (fig 1 (step 4); appendix 2) until they find available MIDs at the highest credibility rank (higher rank means higher credibility, appendix 3). For example, no MIDs for KOOS-quality of life (Qol) met five “definitely yes/yes/definitely closely related” ratings (ie, rank 1). We thus relaxed our criteria and found two MIDs at the next highest credibility rank, those meeting five “definitely yes/yes/definitely closely related” or “to a great extent” ratings (ie, rank 2; appendices 3-4), and referred to them as the most credible MIDs.
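The progressive relaxation can be read as a ranking rule. The sketch below is our simplified encoding of the appendix 3 ranks (rank 1 = highest credibility); the rating labels “DY” (definitely yes/yes/definitely closely related) and “GE” (to a great extent) are our shorthand, not the instrument’s.

```python
def credibility_rank(ratings):
    """Rank an estimate from its five core-criterion ratings.
    Rank 1: five 'DY'; rank 2: five 'DY' or 'GE'; rank 3: four 'DY';
    rank 4: four 'DY' or 'GE'; and so on (lower number = more credible)."""
    dy = sum(r == "DY" for r in ratings)
    top = sum(r in ("DY", "GE") for r in ratings)
    for n in range(5, 0, -1):
        if dy >= n:
            return 2 * (5 - n) + 1
        if top >= n:
            return 2 * (5 - n) + 2
    return 11  # no 'DY' or 'GE' ratings at all

def most_credible(estimates):
    """Keep only (mid, ratings) pairs at the best available rank; never
    mix ranks (if one estimate sits at rank 1, it is used alone)."""
    best = min(credibility_rank(r) for _, r in estimates)
    return [mid for mid, r in estimates if credibility_rank(r) == best]
```

In the KOOS-Qol example, no estimate reaches rank 1, so the two rank 2 estimates would be returned as the most credible set.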
Investigators will then check the consistency of the selected MID estimates (fig 1 (step 5)) and compare their distribution with the distribution of all available MIDs (fig 1 (step 9)). The committee suggested that MIDs be considered consistent with one another if 80% of the estimates lie within an absolute value of 10% of the PROM score above or below their median (ie, within a band spanning 20% of the PROM score, centred on the median; box 1). When the most credible MIDs are consistent (fig 1 (step 7)) and their median is near the median of the whole distribution (fig 1 (step 13); that is, the absolute difference between the two medians is less than an absolute value of 10% of the PROM score (box 1)), investigators can be confident that the median of the most credible MIDs represents the optimal MID (fig 1 (step 19); appendix 2). For example, we identified the most credible MIDs for the pain visual analogue scale (VAS-pain; 0-100) to be 17, 13.5, 12, 15, 13, 16, 14, 17, and 20.4, with a median of 15. By definition, they were consistent. Because their median was near the median of the whole distribution (15.7), with an absolute difference within 10, we selected the median of the most credible MIDs, 15, as the optimal MID (appendix 4).
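The VAS-pain arithmetic can be checked directly. This is a sketch using the figures reported above; the whole-distribution median of 15.7 is taken from the text, not recomputed from the full inventory.

```python
from statistics import median

# Most credible MID estimates for VAS-pain (0-100), as reported above
mids = [17, 13.5, 12, 15, 13, 16, 14, 17, 20.4]
m = median(mids)                    # candidate optimal MID (15)
tol = 0.10 * 100                    # 10% of the 0-100 PROM score
consistent = sum(abs(x - m) <= tol for x in mids) / len(mids) >= 0.80
near_whole = abs(m - 15.7) < tol    # 15.7: median of all available MIDs
```

All nine estimates fall within 15 ± 10, so the set is consistent and the median, 15, stands as the optimal MID.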
The most credible estimates could, however, be inconsistent, leaving excessive variability. If so, investigators should try to explain the variability by further consideration of the credibility criteria. The most important credibility criterion concerns anchor validity: the correlation between the anchor and the PROM of interest.3134 Therefore, when the most credible estimates are inconsistent (fig 1 (step 6)), investigators will prioritise the correlation criterion to explain the remaining variability and, among the most credible estimates, select MIDs with high correlations with the anchor (r≥0.5)31 (fig 1 (step 8); appendix 2).
The credibility assessment instrument has four additional extension criteria that focus on the validity of transition rating anchors (appendix 2).3135 If, after restricting to MIDs with high correlations with the anchor (r≥0.5),31 substantial variability remains among the estimates (fig 1 (step 17)), or if no MIDs with high correlations are available (fig 1 (step 10)), investigators could further consider these additional criteria31; MIDs are often estimated using transition rating anchors (more than half in the MID inventory).33 We suggest assessing the recall period of the transition rating anchors because, apart from follow-up length, the relevant data for the other three additional criteria are rarely reported.3136 Investigators could remove the estimates anchored to transition ratings with a long recall period (>4 weeks)31 (fig 1 (step 14)) and determine whether the remaining most credible estimates are consistent (fig 1 (step 16); appendix 2). If no selected estimates are anchored to transition ratings, or all such estimates have a long recall period (>4 weeks), investigators would skip these assessments (ie, skip steps 14, 16, 20, 21, 23, 26, 27, and 29 in fig 1).
This further consideration of the anchor validity criteria might, however, not be necessary: few estimates may achieve a correlation of 0.5 between the PROM and the anchor, all the estimates using transition anchors may have long recall periods, or, most often, the most credible estimates may already prove consistent. For example, in our worked examples (appendices 4-5) we did not need to consider the correlation and recall period criteria further: after applying the five core credibility assessments, the most credible MIDs were consistent (appendices 4-5).
At this stage, investigators will face one of three situations:
The (newly) identified most credible MIDs are consistent, and their median is near the median of all MIDs (fig 1 (steps 13, 25, or 27); appendix 2). If this is the case (referring to the VAS-pain example above), the median of these (newly) identified most credible MIDs represents the optimal MID, applicable to all contexts (fig 1 (steps 19, 28, or 29); appendix 2).
The (newly) identified most credible MIDs are consistent, but their median differs considerably from the median of the whole distribution (fig 1 (steps 12, 24, or 26); appendix 2).
The (newly) identified most credible MIDs are inconsistent (fig 1 (steps 6, 17, or 20); appendix 2).
Intuitively, in the second situation, the substantial difference between the two medians might be attributed to MID credibility alone. Contextualised factors, however, could have a role: all the most credible MIDs might have been established in the same context (see the worked example below), and a divergent estimate from another context could reflect either its lower credibility or the different context. Because either the second or third situation suggests that MIDs might be context dependent, further exploration seeking contextualisation as an explanation of variability is required (fig 1 (step 30); appendix 2).
Investigators will use the most credible MID estimates for exploring variability (fig 1 (step 31)). When, however, the number of credible estimates is insufficient for this exploration (fig 1 (step 33))—that is, fewer than three estimates per contextualised factor (box 1)—investigators will then use all available estimates for the exploration (fig 1 (step 35)). If the search for contextualised factors yields an explanation for the variability, the optimal MID will be context dependent: one optimal MID for each context (fig 1 (steps 40 or 44); appendix 2). If no contextual explanation is found, investigators will select the median of the most credible estimates among all available MIDs as the optimal MID (fig 1 (step 45); appendix 2).
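The second-stage logic above can be sketched as a small driver. This is our simplification of fig 1 (steps 30-45): `estimates` are (value, context) pairs for one binary contextualised factor, and `test` is any function returning a P value, such as a Wilcoxon rank sum test.

```python
from statistics import median

def optimal_mids(credible, all_estimates, test, alpha=0.10):
    """Explore one binary contextualised factor. Returns one median per
    context if the factor explains MID variability, else one overall
    median of the most credible estimates."""
    def split(estimates):
        groups = {}
        for value, context in estimates:
            groups.setdefault(context, []).append(value)
        return groups

    groups = split(credible)
    # Fewer than three credible estimates per context: fall back to all
    if len(groups) < 2 or any(len(v) < 3 for v in groups.values()):
        groups = split(all_estimates)
    a, b = list(groups.values())
    if test(a, b) < alpha:          # the factor explains the variability
        return {c: median(v) for c, v in groups.items()}
    return {"all contexts": median([v for v, _ in credible])}
```

Passing a significant P value yields context dependent MIDs; a non-significant one yields a single optimal MID for all contexts.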
Worked example of the selection approach
To further illustrate the selection approach, we describe a worked example here, using the pain subdomain of the WOMAC. Appendix 4 presents more worked examples, including VAS-pain, KOOS-Qol, and the 36 item short form survey-mental component summary (SF-36-MCS).
For WOMAC-pain, we obtained the data from the PROMID inventory (up to 2018).33 We describe the detailed process for generating the database elsewhere.33 Briefly, we searched all relevant databases to summarise all available anchor based MIDs of PROMs from primary studies.33 We identified 13 studies (up to 2018) estimating MIDs for WOMAC-pain. Because individual studies used different thresholds on the anchor, different anchors, more than one set of participants with different conditions, or different analytical anchor based methods, the 13 studies generated 67 estimates: 45 MID estimates expressed in absolute terms and 22 expressed relative to baseline scores (appendix 6).33 Our approach suggests selecting the optimal MIDs from absolute MIDs (appendix 1); we thus conducted the selection for the 45 absolute MIDs. Appendix 5 presents the entire selection process and appendix 7 provides the relevant data for conducting the selection. We describe the selection process below.
The WOMAC pain subdomain typically has five items, each assessed on a 0-4 scale, giving a total score ranging from 0 to 20. Although authors used the same instrument, they applied different scoring systems or converted the scores, and thus expressed results on different scales (ranging from 0 to 10, 0 to 20, 0 to 50, and 0 to 100; appendix 6). We therefore transformed each estimate onto a 0-100 scale; after this transformation, the MIDs ranged from 0 to 35 (appendix 7).
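Because an MID is a difference score, converting between WOMAC scoring variants needs only the ratio of the scale ranges. This sketch assumes, as appendix 6 implies, that the variants are linear rescalings of the same 0 based scale.

```python
def rescale_mid(mid, old_range, new_range=100.0):
    """Express an MID (a change score) on a common 0-100 scale: for a
    difference, only the ratio of the scale ranges matters."""
    return mid * new_range / old_range

# A 2 point MID on the raw 0-20 WOMAC-pain scale becomes 10 on 0-100
rescale_mid(2.0, old_range=20)
```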
After assessing MID credibility (fig 1 (step 1)), no estimates met five “definitely yes/yes/definitely closely related” ratings across the five core credibility criteria (appendices 5 and 7). By relaxing the criteria (fig 1 (step 4)), we found five estimates meeting five “definitely yes/yes/definitely closely related” or “to a great extent” ratings (appendices 5 and 7). Table 1 presents the absolute value of these most credible MID estimates. Using our definition for consistency (box 1), these most credible MID estimates were consistent (fig 1 (step 7); appendix 5). Because the median of the most credible estimates (28.1) was not near the median of the whole distribution (12.5) (box 1; fig 1 (step 12); appendix 5), we postulated that the MIDs for WOMAC-pain could be context dependent. We therefore explored the possibility that contextualised factors could explain MID variability.
Optimal anchor based MIDs expressed in absolute terms for WOMAC-pain (up to 2018)
Because the number of most credible MID estimates (n=5) was not sufficient for the exploration (fig 1 (step 33); appendix 5), we instead used all available estimates (fig 1 (step 35); appendix 5). We explored the impact of patient condition (knee v hip complaints) on the variability of the MID estimates by comparing the medians of the two groups using the Wilcoxon rank sum test. We found that the patients’ condition did not explain the variability. We then explored whether the intervention (surgical v non-surgical intervention) explained the variability and found a significant difference between MIDs generated in surgical versus non-surgical intervention settings (P=0.009, table 1; appendices 5 and 7).
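The contextualised-factor test can be reproduced with any standard Wilcoxon rank sum implementation; a hand-rolled normal-approximation version is sketched below (in practice one would use a statistics package). The P value here is computed on the table 1 subset only and is illustrative; it is not the article’s P=0.009, which used all available estimates.

```python
from math import erf, sqrt

def ranksum_p(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation
    (midranks for ties; no tie correction in the variance)."""
    pooled = x + y
    def midrank(v):
        less = sum(w < v for w in pooled)
        equal = sum(w == v for w in pooled)
        return less + (equal + 1) / 2
    r1 = sum(midrank(v) for v in x)              # rank sum of group x
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2                  # expected rank sum under H0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided P

# Table 1 most credible estimates (illustrative subset, not all 45 MIDs)
surgical = [29.26, 29.9, 20.5, 28.1, 23.5]
non_surgical = [11.8, 12.9, 6.4, 8.3, 13.51, 8.74, 15, 8.74, 7.09, 4.1]
explained = ranksum_p(surgical, non_surgical) < 0.10   # threshold P=0.10
```

The two groups do not overlap at all here, so even this small subset yields a P value well below the 0.10 threshold.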
Therefore, the selection and application of the optimal MID for WOMAC-pain was context dependent—the optimal MID differed depending on whether the patients were undergoing surgical or non-surgical intervention (fig 1 (step 44); appendix 5). In other words, we found two optimal MIDs: one selected from the MIDs estimated under the context of surgical intervention that should be exclusively used in the context of surgical intervention; and the other selected from the MIDs estimated under the context of non-surgical intervention that should be exclusively used in the context of non-surgical intervention.
Under the context of surgical intervention, the most credible estimates were those meeting five “definitely yes/yes/definitely closely related” or “to a great extent” ratings across the five core credibility criteria: 29.26, 29.9, 20.5, 28.1, and 23.5, with a median of 28.1 (table 1). The most credible estimates for non-surgical intervention were those meeting four such ratings across the five core credibility criteria: 11.8, 12.9, 6.4, 8.3, 13.51, 8.74, 15, 8.74, 7.09, and 4.1, with a median of 8.7 (table 1). The optimal MID was therefore 28.1 in the context of surgical intervention and 8.7 in the context of non-surgical intervention.
Discussion
Based on expert experience, a systematic survey, and example based refinement, we have developed a systematic step-by-step approach for selecting an optimal anchor based MID from various MID estimates of a given PROM. We have successfully applied the approach to several PROMs in the MID inventory33 (appendices 4-5).
This selection approach is based on explaining the variability of all available MID estimates for the PROM of interest. We prioritise the methodological rigor of MID estimation and, through credibility assessment,3132 select the most credible MID estimates (fig 1 (step 1)). If the most credible MIDs fall in a relatively narrow range, investigators should choose the median as the optimal MID.
If, however, the most credible MIDs are consistent but their median differs substantially from the median of all the MIDs, or the most credible MIDs are inconsistent, credibility alone cannot explain the variation and contextualised factors influencing MID estimates could exist.34 Our approach mandates further exploration of contextualised factors to explain the variability among all the MIDs (fig 1 (step 30)). Potential contextualised factors that deserve consideration come from the suggestions of previous researchers and include intervention (eg, surgical v conservative treatments), patient condition (eg, knee v hip osteoarthritis), baseline disease severity, patient age, follow-up duration, socioeconomic status, geography, and sex.34
When the process identifies contextualised factors, the median of the most credible MIDs under a specific context represents the optimal MID, resulting in context dependent MIDs. If the process fails to identify contextualised factors that explain MID estimate variability, investigators should still select the median of MIDs with the highest credibility as the optimal MID and apply it to all contexts.
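The selection logic described above can be sketched in code. In this minimal sketch the function name, the boolean encoding of the credibility assessment, the 10% narrow-range threshold (box 1), and the dictionary data structure are all illustrative assumptions, not part of any published software:

```python
from statistics import median

def select_optimal_mid(estimates, scale_range, context_key=None):
    """Sketch of the step-by-step selection approach.

    estimates: list of dicts with 'value', 'credible' (bool, from the
    credibility assessment), and optional context labels.
    scale_range: upper minus lower limit of the PROM scale.
    context_key: a hypothetical contextualised factor, eg 'intervention'.
    """
    # Step 1: keep only the most credible estimates.
    credible = [e for e in estimates if e["credible"]]
    values = [e["value"] for e in credible]

    # "Relatively narrow range": spread below 10% of the scale range (box 1).
    if max(values) - min(values) < 0.10 * scale_range:
        return median(values)  # one optimal MID, applied to all contexts

    # Step 2: explore a contextualised factor; if one is identified,
    # return one context dependent MID (a median) per subgroup.
    if context_key is not None:
        groups = {}
        for e in credible:
            groups.setdefault(e[context_key], []).append(e["value"])
        return {ctx: median(vals) for ctx, vals in groups.items()}

    # If no contextualised factor explains the variability, still select
    # the median of the most credible estimates.
    return median(values)
```

Applied to the WOMAC-pain worked example with `"intervention"` as the contextualised factor, the sketch returns one median per intervention context, mirroring the context dependent MIDs reported above.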
Strengths and limitations
Strengths of this study included the range and depth of expertise of the study team, the systematic survey that informed the process,34 iterative modification of the selection approach based on expert feedback and the application of the approach to PROMs in our MID inventory,33 and the resulting transparent workable process. The approach worked well in selecting optimal MIDs for common PROMs in the inventory33 (appendices 4-5).
Among the limitations, the selection process is complex and can be burdensome. Before navigating the selection process, users should collect all available MID estimates for the PROM of interest. Our MID inventory, PROMID (https://www.promid.org), can provide the necessary material.
Suboptimal reporting of MID estimation studies (eg, failure to report the upper and lower limits of the PROM scale or all the relevant information about MID credibility) can make the selection difficult. Because the data for exploring contextualised factors might be limited, some important factors not measured or reported by authors could be responsible for differences between MIDs. If such unmeasured or unreported differences exist and are substantial, they would limit the applicability of the optimal MID selected by our approach. The selection might also not work well when only a few yet divergent MID estimates are available. Having established an optimal MID, investigators might need to revisit the process when new estimates emerge. The selection approach includes thresholds that are arbitrary. For instance, we considered a difference of less than 10% of the range of the PROM scale as a relatively narrow range (ie, the definition of “consistency” and “near the median”; box 1).
Our steering committee, a small group of experts, might not be representative. The committee members were, however, diverse in geography and sex, and several (GHG, MTK, CBT, TAF, BT) have extensive expertise in health measurement research. Finally, because no consensus exists on the standard use of an optimal MID, future insights on anchor based MIDs could require modification of the selection approach.
Implications
In recent years, PROMs have become increasingly popular in clinical practice and clinical trials.373839404142 PROMs provide crucial information regarding treatment efficacy from patients’ perspectives that cannot be captured by other outcomes. Along with the use of PROMs, the number of anchor based MID estimates, as well as the demand for a suitable MID to aid the interpretation of treatment effects, has increased considerably.20212241 Failure to identify an optimal MID might result in serious misinterpretations of PROM results. Further, available MID estimates often vary widely,424344 presenting a dilemma for those conducting clinical trials, authors of systematic reviews, guideline developers, clinicians, funders, and policy makers in deciding how to choose the best MID. The selection approach described here resolves this problem by providing a logical and systematic way to select an optimal MID for a PROM when multiple discrepant MIDs exist.
The selection process is geared towards explaining the variability among available MID estimates and, where appropriate, providing one optimal MID—ie, the median of the selected estimates falling in a relatively narrow range.13 Two stages—methodological rigour and generalisability (ie, factors influencing the application of the MID)—frame the selection process. The selection covers the important factors in the estimation and application of anchor based MIDs but does not address the analytical methods used to estimate anchor based MIDs (eg, the mean change method, the receiver operating characteristic method).34
We could have used a formal Delphi process to choose the candidate items informing the first draft of the selection approach, which subsequently underwent iterative example based refinement. The choice of candidate items was straightforward, however, and a Delphi panel would have been unlikely to produce appreciably different results (appendix 1, eTable 1).
Typically, if an intervention carries more associated burdens and adverse effects, people will require a larger effect or improvement and thus a larger MID. The WOMAC-pain data demonstrated this scenario: the MIDs for surgery were larger than those for non-surgery. Thus, when using responder analysis to examine the benefits of interventions, researchers should dichotomise participants using the MIDs specific to the intervention they received. When researchers use an MID to choose a target difference for sample size calculation4546 or to interpret results measured by mean difference in clinical trials,46 they should take the interventions into account. For example, in a trial measuring the effects of surgery versus non-surgery on WOMAC-pain, because the treatment difference between the interventions would reflect the improvement demanded of surgery, using the optimal MID for surgical intervention would be more appropriate, with the sample size calculated accordingly.
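A responder analysis with intervention-specific MIDs, as suggested above, might look like the following minimal sketch. The MID values are the optimal MIDs from the WOMAC-pain worked example; the function name, data structure, and change scores are hypothetical:

```python
# Optimal context dependent MIDs for WOMAC-pain from the worked example
MIDS = {"surgical": 28.1, "non_surgical": 8.7}

def is_responder(change_score, intervention):
    """Classify a participant as a responder if their improvement meets
    the MID specific to the intervention they received."""
    return change_score >= MIDS[intervention]

# Each participant is dichotomised against their own arm's MID,
# not against a single MID applied to both arms:
participants = [
    {"change": 30.0, "arm": "surgical"},      # responder (30.0 >= 28.1)
    {"change": 15.0, "arm": "surgical"},      # non-responder (15.0 < 28.1)
    {"change": 10.0, "arm": "non_surgical"},  # responder (10.0 >= 8.7)
]
responders = [p for p in participants if is_responder(p["change"], p["arm"])]
print(len(responders))  # 2
```

Note that the hypothetical change of 15.0 would count as a response against the non-surgical MID but not against the surgical one, which is why applying a single MID across arms can misclassify participants.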
In our worked examples, only a small proportion of MIDs had high credibility (eg, of 45 published estimates for WOMAC-pain, only five were highly credible). This finding highlights the need for more high quality studies establishing new credible MID estimates and for better reporting of MID studies.36 The criteria in our credibility tool31 set out the key elements of methodological rigour for developing trustworthy anchor based MIDs.
Ideally, after selection, the optimal MID would not change as new evidence emerges and would thus become a unique standard to aid interpretation of a given PROM. Generating new estimates for such a PROM would therefore be a poor use of limited research resources. The larger the number of high credibility, consistent MIDs, the more compelling the case that a definitive optimal MID has been established. What threshold one should use for this conclusion remains, however, open to debate and is therefore another potential area for subsequent research.
Acknowledgments
We thank Mark Phillips, Bradley C Johnston, Dena Zeraatkar, Meha Bhatt, Xuejing Jin, Romina Brignardello-Petersen, Olivia Urquhart, Farid Foroutan, Stefan Schandelmaier, Hector Pardo-Hernandez, Robin WM Vernooij, Hsiaomin Huang, Linan Zeng, Yamna Rizwan, Reed Siemieniuk, Lyubov Lytvyn, Zhikang Ye, Liam Yao, Vanessa Wong, Donald L Patrick, Shanil Ebrahim, Gihad Nesrallah, Holger J Schunemann, Mohit Bhandari, and Lehana Thabane for their contributions on the MID inventory and credibility instrument projects; and Anila Qasim for her contributions to maintain the Patient Reported Outcome Minimal Important Difference (PROMID) Database, an MID inventory online platform (https://www.promid.org).
The Credibility instrument for judging the trustworthiness of minimal important difference estimates, authored by Devji et al, and the Minimal Important Difference Inventory, authored by Carrasco-Labra et al, are the copyright of McMaster University (copyright 2018, McMaster University, Hamilton, ON, Canada). The instrument and inventory have been provided under license from McMaster University and must not be copied, distributed, or used in any way without the prior written consent of McMaster University. Contact the McMaster Industry Liaison Office at McMaster University (milo{at}mcmaster.ca) for licensing details.
Footnotes
Contributors: GHG, TD, and YW initiated the project. AC-L, BT, CBT, GHG, MTK, MW, TD, TAF, and YW provided insights on the selection framework and contributed to the consensus on the items used to develop the selection approach. AC-L, TD, YW, and GHG drafted the details of the selection approach. All authors provided feedback for the revision of the selection approach. YW was responsible for revising the selection approach until all authors reached agreement. YW wrote the initial draft of the manuscript and all other authors reviewed and revised the manuscript draft. All authors approved the final version of the manuscript. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication. YW and GHG are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: No specific funding was given to this study.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support for the submitted work. AC-L, GHG, and TD have a patent issued for the Credibility instrument for judging the minimal important difference and the Patient Reported Outcome Minimal Important Difference (PROMID) Database (https://www.promid.org). AC-L, CBT, GHG, MW, TAF, and TD report grants, pending patents, personal fees, roles in advisory board or leadership in the committees outside the submitted work; no other relationships or activities that could appear to have influenced the submitted work.
Provenance and peer review: Not commissioned, externally peer reviewed.