FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare
BMJ 2025; 388 doi: https://doi.org/10.1136/bmj-2024-081554 (Published 05 February 2025) Cite this as: BMJ 2025;388:e081554
All rapid responses
Dear Editor
The article "FUTURE-AI: International Consensus Guideline for Trustworthy and Deployable Artificial Intelligence in Healthcare" offers an invaluable contribution to the establishment of a robust framework for proper AI deployment in healthcare. The guideline significantly contributes to the ongoing discourse on AI adoption in healthcare by advocating the FAIR principles (1). However, while the guideline offers valuable recommendations, a few critical aspects warrant scrutiny.
Firstly, the article offers little empirical validation of the proposed guidelines. Although expert consensus is important, the lack of real-world implementation data constrains the guideline's applicability. Pilot implementations ought therefore to have been incorporated to demonstrate the practical feasibility and value of the proposed framework (2).
Secondly, the generalisability of the guidelines remains questionable, especially regarding African healthcare settings. AI applications vary markedly across healthcare systems, socioeconomic contexts, and clinical specialties, and low- and middle-income countries still face numerous barriers, such as inadequate digital infrastructure, underdeveloped regulatory frameworks, and data scarcity (3). Addressing these concerns would strengthen the guidelines’ global applicability.
Moreover, the article does not sufficiently emphasize the dynamic nature of AI models. Model behavior changes as systems are exposed to new data, so mechanisms for the continuous validation and updating of AI algorithms are needed to ensure their sustainability.
Lastly, the guideline inadequately addresses the risk of algorithmic bias. Despite the general acknowledgment of bias as a concern, the article does not propose concrete mitigation solutions. Bias in AI often stems from imbalanced training datasets, variations in the epidemiological characteristics of diseases, and differences in clinical practice patterns (4). Clearer recommendations are needed on ensuring diversity in training datasets, conducting effective bias audits, and providing regulatory oversight to avert health disparities, especially in underrepresented populations.
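One way to make the call for effective bias audits concrete is a simple subgroup performance comparison. The following is a minimal Python sketch under assumed column names, synthetic data, and an arbitrary 5-percentage-point alert threshold; it illustrates the idea rather than a method proposed in the guideline or in this response.

```python
# Minimal sketch of a subgroup bias audit: compare a model's sensitivity and
# specificity across demographic groups and flag large gaps.
# Column names and the 0.05 alert threshold are illustrative assumptions.
import pandas as pd
from sklearn.metrics import recall_score

def subgroup_audit(df, group_col="ethnicity", y_true="label", y_pred="prediction"):
    rows = []
    for group, sub in df.groupby(group_col):
        sensitivity = recall_score(sub[y_true], sub[y_pred], pos_label=1)
        specificity = recall_score(sub[y_true], sub[y_pred], pos_label=0)
        rows.append({"group": group, "n": len(sub),
                     "sensitivity": sensitivity, "specificity": specificity})
    report = pd.DataFrame(rows)
    report["sensitivity_gap"] = report["sensitivity"].max() - report["sensitivity"]
    report["flagged"] = report["sensitivity_gap"] > 0.05  # illustrative threshold
    return report

# Synthetic example; a real audit would use held-out clinical data
df = pd.DataFrame({
    "ethnicity":  ["A"] * 6 + ["B"] * 6,
    "label":      [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "prediction": [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
})
print(subgroup_audit(df))
```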
In conclusion, the FUTURE-AI guideline is a valuable contribution to the discourse on AI in healthcare and sets a promising trajectory for AI research and development in this field. However, future work should address real-world validation, algorithmic bias, and the continuous monitoring and updating of AI models.
References
1. Lekadir K, Frangi AF, Porras AR, Glocker B, Cintas C, Langlotz CP, et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 2025;388:e081554. doi:10.1136/bmj-2024-081554.
2. Pearson N, Naylor PJ, Ashe MC, Fernandez M, Yoong SL, Wolfenden L. Guidance for conducting feasibility and pilot studies for implementation trials. Pilot and Feasibility Studies [Internet]. 2020 Oct 31;6(1). Available from: https://pilotfeasibilitystudies.biomedcentral.com/articles/10.1186/s4081...
3. Kaboré SS, Ngangue P, Soubeiga D, Barro A, Pilabré AH, Bationo N, et al. Barriers and facilitators for the sustainability of digital health interventions in low and middle-income countries: A systematic review. Frontiers in Digital Health. 2022 Nov 28;4.
4. Min A. Artificial Intelligence and Bias: Challenges, Implications, and Remedies. Journal of Social Research [Internet]. 2023 Oct 5;2(11):3808–17. Available from: https://ijsr.internationaljournallabs.com/index.php/ijsr/article/view/1477
Competing interests: No competing interests
Dear Editor,
I commend Lekadir et al. for developing the FUTURE-AI framework, which provides essential guidance for artificial intelligence (AI) implementation in healthcare. The framework's six principles (Fairness, Universality, Traceability, Usability, Robustness, and Explainability) establish fundamental criteria for responsible AI development and deployment. However, current generative AI systems present challenges in meeting these guidelines, particularly regarding proprietary models prevalent in the technology sector.
Contemporary AI systems, specifically large language models (LLMs) and generative models, demonstrate inherent limitations in achieving full compliance with FUTURE-AI recommendations. These limitations are most evident in meeting Traceability and Explainability requirements, such as comprehensive documentation of model properties and systematic auditing processes. For example, DeepSeek R1, despite its open architecture, maintains limited transparency regarding training data and methodological processes.
The current development paradigm of generative AI involves continuous model updates and data integration, necessitating substantial financial resources. This iterative approach, while essential for maintaining optimal performance, presents challenges to traditional research standards emphasizing complete transparency. I propose that demanding absolute transparency may impede innovation while limiting access to advanced computational tools.
Future research priorities should include the development of evaluation frameworks specifically designed for proprietary models. These frameworks should emphasize output assessment and behavioral analysis rather than internal architecture examination (a minimal sketch of such output-only testing follows the list below). Key components should include:
1. Standardized external validation protocols, including systematic stress testing
2. Clinical outcome evaluation through rigorous trials
3. Comprehensive bias detection methodologies across demographic groups
4. Systematic audit procedures for maintaining accountability
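As an illustration of output assessment that does not require access to a model's internal architecture, the sketch below perturbs inputs and measures how often a black-box classifier's predictions flip, using only its prediction interface. The scikit-learn model, noise scale, and synthetic data are stand-in assumptions, not a proposed standard.

```python
# Sketch of an output-only stress test for a black-box model: add small
# perturbations to inputs and measure how often predictions change.
# Only model.predict() is used; no access to internal architecture is needed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # stand-in for a proprietary model

def prediction_flip_rate(model, X, noise_scale=0.05, n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    baseline = model.predict(X)
    flips = []
    for _ in range(n_trials):
        X_perturbed = X + rng.normal(0.0, noise_scale, size=X.shape)
        flips.append(np.mean(model.predict(X_perturbed) != baseline))
    return float(np.mean(flips))

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
print(f"Mean prediction flip rate under perturbation: {prediction_flip_rate(model, X):.3f}")
```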
This approach enables responsible AI implementation while acknowledging practical constraints in healthcare settings. It provides a bridge between transparency requirements and technological advancement, ensuring maintenance of safety, efficacy, and fairness standards.
Sincerely,
Competing interests: No competing interests
Dear Editor
The FUTURE-AI international consensus guideline is a welcome contribution to the critical discussion surrounding trustworthy AI in healthcare. Its broad scope and collaborative nature are commendable. However, several key limitations warrant attention to maximize its impact and practical utility.
Patient Engagement: Moving Beyond Tokenism
While FUTURE-AI emphasizes stakeholder engagement, its approach to patient involvement is superficial. Simply including patients in surveys or focus groups, as suggested in Table 3, is insufficient. Meaningful engagement requires actively soliciting patient values, preferences, and concerns about AI's role in their care. Specific methodologies, such as semi-structured interviews and patient-led scenario testing, are essential to understand deeply how patients envision AI within their healthcare journeys. Without such depth, there is a risk of developing technically sound tools that fail to resonate with the very people they are intended to serve.
Explainability: Demystifying the Black Box
Explainability is paramount for building trust in AI-driven healthcare. Although FUTURE-AI mentions visual explanations, it lacks concrete guidance on specific explainability methods. Clinicians require a clear understanding of why an AI arrives at a particular conclusion. The omission of techniques such as LIME, SHAP, and counterfactual explanations hinders practical application and perpetuates the "black box" problem, undermining clinician confidence and potentially leading to misuse.
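For illustration, one of the techniques named above, a counterfactual explanation, can be sketched as a naive single-feature search that asks what minimal change would flip the model's decision. The model, synthetic data, and search grid below are assumptions for the sketch, not a clinically validated method.

```python
# Naive counterfactual explanation sketch: for one patient, search each feature
# for the smallest change that flips the model's prediction ("what would need
# to differ for the model to decide otherwise?"). Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
model = LogisticRegression().fit(X, y)

def one_feature_counterfactual(model, x, step=0.1, max_steps=100):
    original = model.predict(x.reshape(1, -1))[0]
    best = None  # (feature index, delta) with the smallest |delta| that flips the prediction
    for j in range(x.size):
        for direction in (+1, -1):
            for k in range(1, max_steps + 1):
                x_cf = x.copy()
                x_cf[j] += direction * step * k
                if model.predict(x_cf.reshape(1, -1))[0] != original:
                    if best is None or abs(step * k) < abs(best[1]):
                        best = (j, direction * step * k)
                    break
    return original, best

pred, cf = one_feature_counterfactual(model, X[0])
print(f"Prediction: {pred}; smallest single-feature change that flips it: {cf}")
```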
Data Governance and Quality: The Foundation of Trust
FUTURE-AI’s treatment of data governance and quality, while touching on privacy and security, is surprisingly thin. The reliability of AI models depends critically on the quality and integrity of the training data. Explicit recommendations for data profiling, outlier detection, version control, and lineage documentation are essential. Without a robust data governance framework, the risk of deploying biased or flawed AI models remains unacceptably high, raising serious ethical and clinical concerns.
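A first pass at the data profiling and outlier detection recommended here can be lightweight. The sketch below reports missingness and flags interquartile-range outliers; the column names, synthetic values, and 1.5*IQR rule are illustrative assumptions rather than prescriptions.

```python
# Lightweight data-profiling sketch: missingness and IQR-based outlier flags
# for numeric columns. Column names and the 1.5*IQR rule are illustrative.
import pandas as pd

def profile(df):
    report = {"n_rows": len(df), "missing_fraction": df.isna().mean().to_dict()}
    outliers = {}
    for col in df.select_dtypes("number"):
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())
    report["iqr_outliers"] = outliers
    return report

# Synthetic example: one implausible systolic blood pressure value
df = pd.DataFrame({"age": [34, 61, 47, None, 70],
                   "systolic_bp": [118, 134, 910, 127, 141]})
print(profile(df))
```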
Continuous Monitoring and Adaptation: Ensuring Long-Term Validity
Healthcare is a dynamic environment, and AI models must adapt accordingly. While FUTURE-AI acknowledges post-market follow-up, it inadequately addresses the challenges of model drift and the need for adaptive learning. Specific strategies for detecting performance degradation due to evolving data or clinical practices are crucial. The guideline should advocate for adaptive learning techniques that allow models to evolve and maintain their accuracy over time.
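One widely used way to detect the drift described here is the population stability index (PSI), which compares the distribution of a monitored feature or of model output scores at deployment against a training-time reference. The binning scheme, synthetic data, and 0.2 alert threshold in the sketch below are conventional but illustrative choices.

```python
# Sketch of drift detection with the population stability index (PSI):
# compare the deployment-time distribution of scores against the training
# reference. Quantile bins and the ~0.2 threshold are illustrative conventions.
import numpy as np

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    # Clip both samples into the reference range so extreme values fall in the end bins
    ref_counts = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)[0]
    cur_counts = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)[0]
    ref_frac = ref_counts / len(reference) + eps
    cur_frac = cur_counts / len(current) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
training_scores = rng.normal(0.40, 0.10, 5_000)    # reference distribution
deployment_scores = rng.normal(0.55, 0.12, 5_000)  # shifted distribution
psi = population_stability_index(training_scores, deployment_scores)
print(f"PSI = {psi:.2f} (values above ~0.2 are commonly treated as meaningful drift)")
```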
Contextual Adaptability: One Size Does Not Fit All
FUTURE-AI’s generic approach is designed for broad applicability; however, it overlooks the critical need for contextual adaptation. The relevance of specific recommendations can vary significantly across medical use cases. The framework should therefore emphasize tailoring to specific contexts, including the unique types of bias, clinical settings, and explainability requirements of each use case. Without such customization, there is a risk of endorsing a "one-size-fits-all" approach that ultimately falls short of addressing real-world challenges.
FUTURE-AI provides a valuable starting point for advancing trustworthy AI in healthcare, but its limited treatment of patient engagement, explainability, data governance, continuous monitoring, and contextual adaptability must be addressed.
Competing interests: No competing interests
Dear Editor,
The FUTURE-AI framework represents a significant step toward establishing principles for trustworthy artificial intelligence in healthcare. However, its heavy emphasis on committee-based validation and oversight inadvertently creates barriers for smaller healthcare practices, potentially exacerbating existing healthcare disparities.
The framework's reliance on extensive human committees for validation, while well-intentioned, assumes institutional resources typically found only in large academic medical centers. This oversight model becomes impractical for community hospitals, rural clinics, and independent practices that often serve underserved populations. Paradoxically, this could create a two-tiered system in which advanced AI capabilities remain concentrated in well-resourced academic centers, while smaller practices, which could benefit most from AI assistance, face insurmountable implementation barriers.
Modern AI systems offer an alternative approach: automated validation pipelines that can continuously monitor model performance, detect biases, generate documentation, and ensure regulatory compliance. Such systems can provide more comprehensive and consistent oversight than human committees while making trustworthy AI implementation feasible across diverse healthcare settings. This automation-first approach could help realize FUTURE-AI's principles of Fairness, Universality, Traceability, Usability, Robustness, and Explainability more consistently and at scale.
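In its simplest form, the automated oversight described above could be a scheduled job that recomputes overall and per-site performance on recent labeled cases and writes a machine-readable report for audit. The metrics, thresholds, field names, and synthetic data in the sketch below are assumptions for illustration, not a validated compliance tool.

```python
# Minimal sketch of an automated validation job: recompute overall and per-site
# discrimination on recent labeled cases and emit a JSON report for the audit
# trail. Thresholds, metrics, and field names are illustrative assumptions.
import json
from datetime import datetime, timezone

import pandas as pd
from sklearn.metrics import roc_auc_score

def validation_report(df, min_auc=0.75, group_col="site"):
    report = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "n_cases": len(df),
        "overall_auc": float(roc_auc_score(df["label"], df["score"])),
        "per_group_auc": {},
        "alerts": [],
    }
    for group, sub in df.groupby(group_col):
        auc = float(roc_auc_score(sub["label"], sub["score"]))
        report["per_group_auc"][str(group)] = auc
        if auc < min_auc:
            report["alerts"].append(f"AUC below {min_auc} for {group_col}={group}")
    return report

# Synthetic recent cases from two sites (a real pipeline would pull these from the EHR)
df = pd.DataFrame({
    "site":  ["A"] * 8 + ["B"] * 8,
    "label": [1, 1, 1, 1, 0, 0, 0, 0] * 2,
    "score": [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1,   # good separation at site A
              0.6, 0.4, 0.5, 0.3, 0.7, 0.6, 0.5, 0.4],  # degraded at site B
})
print(json.dumps(validation_report(df), indent=2))
```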
As we advance toward AI-enabled healthcare, we must ensure our implementation frameworks promote rather than hinder accessibility. Incorporating automated validation alongside traditional oversight could help democratize access to trustworthy AI across the full spectrum of healthcare settings.
Rahul Chaudhary, MD, MBA, MSCS candidate
Cardiologist, VA Pittsburgh Health System and University of Pittsburgh Medical Center, Pittsburgh, PA, USA
Physician-scientist, University of Pittsburgh, Pittsburgh, PA, USA
Director, AI-HEART Lab, Pittsburgh, PA, USA
Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Competing interests: No competing interests
FUTURE-AI requires integration with AI regulations in clinical practice
Dear Editor,
We read with great interest the manuscript of Lekadir et al. [1]. The authors have provided a comprehensive framework based on six guiding principles—Fairness, Universality, Traceability, Usability, Robustness, and Explainability—for trustworthy AI systems in healthcare. The structured methodology and interdisciplinary collaboration are creditable.
Upon closer examination, we identified several subgroups considered in the guideline's discussion of trustworthy and deployable AI. The authors highlighted biases concerning demographic attributes, such as age, gender, and ethnicity, and technical attributes, such as site and machine variations. These efforts align with the core principles of fairness and robustness.
Nevertheless, a more extensive examination of pertinent ISO/IEC standards, such as ISO/IEC TR 24027:2021 [2], ISO/IEC TR 24028:2020 [3], ISO/IEC TR 24029-1:2021 [4], ISO/IEC DIS 24029-2:2022 [5], and ISO/IEC 23894:2023 [6], uncovers various supplementary aspects that are not comprehensively addressed in the BMJ article. These standards cover multiple elements of AI trustworthiness beyond bias, such as robustness, traceability, and risk management. Significantly, the following items were either underdeveloped or missing in the proposed framework:
Fairness
Automation Bias: The propensity for human decision-makers to favor automated suggestions, even when erroneous, was not explicitly addressed under traceability or explainability (ISO/IEC TR 24027:2021 [2]).
In-group and Out-group Homogeneity Bias: These biases can significantly impact AI fairness by influencing how model generalizations are applied across demographic groups (ISO/IEC TR 24027:2021 [2]).
Universality
Non-representative Sampling: ISO/IEC TR 24027:2021 [2] highlights the risk of non-representative datasets, which may hinder model applicability across diverse populations.
Societal Bias: The paper overlooks societal biases that may limit the generalizability of AI models.
Traceability
Incomplete Documentation: ISO/IEC TR 24028:2020 [3] emphasizes the importance of traceability through clear documentation and data lineage, which was not extensively covered.
Lack of Formal Robustness Metrics: As outlined in ISO/IEC TR 24029-1:2021 [4], traceability requires formal documentation of performance, which was not discussed.
Absence of Stability and Sensitivity Analysis: ISO/IEC DIS 24029-2:2022 [5] introduces stability, sensitivity, relevance, and reachability measures for robustness traceability that were not mentioned.
Usability
Human-Computer Interaction (HCI) Factors: As outlined in ISO/IEC TR 24028:2020 [3], the usability dimension lacks discussion on how interface design and explainability impact end-user adoption.
Robustness
ISO/IEC TR 24029-1:2021 [4] and ISO/IEC DIS 24029-2:2022 [5] define multiple subgroups of robustness that were either missing or underexplored:
Statistical Methods: Including interpolation performance measures, contrastive robustness metrics, and performance under perturbations.
Formal Methods: Encompassing model checking, abstract interpretation, and solver-based validation to mathematically ensure system reliability.
Empirical Methods: Field trials and a posteriori testing for real-world robustness validation were not explicitly discussed.
Explainability
Model Transparency: ISO/IEC TR 24028:2020 [3] identifies the need for both ex-ante and ex-post explanations to build trust, which the BMJ paper did not sufficiently explore.
Moreover, when examining the ISO/IEC 23894:2023 [6] recommendations for AI risk management, along with ISO/IEC TR 24029-1:2021 [4] and ISO/IEC DIS 24029-2:2022 [5] related to neural network robustness and ISO/IEC TR 24028:2020 [3] concerning AI trust, additional deficiencies surface. These standards address the need for formal techniques in robustness evaluation, early bias identification during the AI lifecycle, and the incorporation of transparency and privacy factors during the design phase.
We suggest that upcoming versions of the FUTURE-AI framework include these ISO-standardized categories and address the associated standards for risk management, robustness, and trust. A comprehensive methodology employing ISO/IEC 23894:2023 [6], ISO/IEC TR 24029-1:2021 [4], ISO/IEC DIS 24029-2:2022 [5], and ISO/IEC TR 24028:2020 [3] will improve the thoroughness of bias detection and mitigation techniques throughout every stage of the AI lifecycle, thereby supporting better application of AI models in clinical practice.
References:
[1] Lekadir K, Frangi AF, Porras AR, et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 2025;388:e081554. doi:10.1136/bmj-2024-081554.
[2] ISO/IEC TR 24027:2021. Information technology—Artificial intelligence (AI)—Bias in AI systems and AI-aided decision making. Geneva: International Organization for Standardization; 2021.
[3] ISO/IEC TR 24028:2020. Information technology—Artificial intelligence—Overview of trustworthiness in artificial intelligence. Geneva: International Organization for Standardization; 2020.
[4] ISO/IEC TR 24029-1:2021. Artificial intelligence (AI)—Assessment of the robustness of neural networks—Part 1: Overview. Geneva: International Organization for Standardization; 2021.
[5] ISO/IEC DIS 24029-2:2022. Artificial intelligence (AI)—Assessment of the robustness of neural networks—Part 2: Methodology for the use of formal methods. Geneva: International Organization for Standardization; 2022.
[6] ISO/IEC 23894:2023. Information technology—Artificial intelligence—Guidance on risk management. Geneva: International Organization for Standardization; 2023.
Funding Support: This research was funded by the Beatriu de Pinós postdoctoral program from the Office of the Secretary of Universities and Research from the Ministry of Business and Knowledge of the Government of Catalonia program: 2020 BP 00261, and by the LLavor A grant, funded by the Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR), Generalitat de Catalunya, under the Programa Indústria del Coneixement: PREPARE (2024 LLAV 00083).
Competing interests: No competing interests