Assurance Methods

Introduction

The Coalition for Health AI (CHAI) will unveil its Testing and Evaluation (T&E) Framework in greater detail at its Global Summit on October 19, 2024, held during the HLTH event. This presents an opportune moment to examine the programs that influenced the T&E Framework and to compare them with MITRE’s systematic approach to validating AI-enabled systems. Analyzing these frameworks contextualizes CHAI’s T&E Framework and helps developers select a framework that aligns with the ONC’s HTI-1 § 170.315(b)(11) requirements for decision support interventions.

In December 2023, CHAI published a call-to-action paper in JAMA titled “A Nationwide Network of Health AI Assurance Laboratories.” While that article provided some insight into the CHAI framework’s objectives and structure, a deeper exploration of Stanford University’s Fair, Useful, and Reliable AI Models (FURM) assessment and Duke University Health System’s Algorithm-Based Clinical Decision Support (ABCDS) framework is needed to understand the initial phase of CHAI’s T&E Framework.

Each of these frameworks offers a unique approach to evaluating, implementing, and governing AI/ML tools in healthcare. While they share common goals of risk management and quality assurance, they differ in their specific methodologies, focus areas, and organizational structures.

This comparative analysis aims to provide healthcare professionals, policymakers, and AI developers with a comprehensive overview of current best practices in AI assurance. By understanding the strengths and unique features of each framework, stakeholders can make informed decisions about which approach might best suit their specific needs and contexts.

The following sections will delve into the key components, processes, and principles of each framework, highlighting their contributions to the evolving field of AI governance in healthcare.

Stanford University's Fair, Useful, and Reliable AI Models (FURM)

The main points of the framework for evaluating Fair, Useful, and Reliable AI Models (FURM) in healthcare systems are:

  1. Components of the FURM assessment:

    • Problem, need, and use case definition
    • Usefulness estimates by simulation
    • Financial projections
    • Ethical considerations
  2. The process uses simulations to estimate the achievable utility of proposed AI model-guided workflows (see the sketch at the end of this section), considering factors like:

    • Technical model performance
    • Capacity constraints of the deployment setting
    • Potential utility of the workflow
  3. Financial projections assess sustainability, looking at revenue drivers, cost drivers, and sensitivities

  4. Ethical considerations examine issues such as responsibility, equity, traceability, reliability, governance, non-maleficence, and autonomy

  5. The framework emphasizes the importance of iterating on the assessment process until desired assurance levels are achieved

  6. It produces an assurance plan that documents the entire process and guides ongoing management of the AI system

  7. The approach aims to bridge the gap between AI model development and achievable real-world benefit in healthcare settings

In summary, the FURM framework provides a structured approach to comprehensively evaluate AI models for healthcare applications across technical, financial, ethical and practical dimensions before deployment. It emphasizes iterative assessment and ongoing management throughout the system lifecycle.
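
To make the usefulness-by-simulation step concrete, the following minimal Python sketch estimates the utility of a model-guided workflow under a capacity constraint, in the spirit of FURM’s approach. Every parameter value, and the simplified alert model itself, is an illustrative assumption rather than a figure from the FURM publication.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inputs -- all values are illustrative assumptions.
DAILY_PATIENTS = 500       # patients scored per day
EVENT_RATE = 0.05          # prevalence of the predicted adverse event
MODEL_SENSITIVITY = 0.70   # fraction of true events the model flags
MODEL_PPV = 0.30           # fraction of alerts that are true events
TEAM_CAPACITY = 20         # alerts the care team can act on per day

def simulate_day() -> float:
    """Estimate treated true positives for one day of the workflow."""
    true_events = rng.binomial(DAILY_PATIENTS, EVENT_RATE)
    flagged_true = rng.binomial(true_events, MODEL_SENSITIVITY)
    # Total alerts implied by the model's precision (true + false positives).
    total_alerts = round(flagged_true / MODEL_PPV) if flagged_true else 0
    # Capacity constraint: the team can only work a subset of alerts.
    worked_alerts = min(total_alerts, TEAM_CAPACITY)
    # Alerts are worked in arbitrary order, so the share of worked alerts
    # that are true positives matches the overall precision.
    return worked_alerts * MODEL_PPV

utilities = [simulate_day() for _ in range(1_000)]
print(f"Mean true positives treated per day: {np.mean(utilities):.1f} "
      f"(2.5th-97.5th percentile: {np.percentile(utilities, 2.5):.1f}"
      f"-{np.percentile(utilities, 97.5):.1f})")
```

Sweeping parameters such as TEAM_CAPACITY or MODEL_PPV across plausible ranges turns the same skeleton into the kind of sensitivity analysis the financial projections call for.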

Duke University's Algorithm-Based Clinical Decision Support (ABCDS) Framework

The ABCDS governance framework for AI/ML tools developed and deployed at Duke University Health System has several key components:

  1. Framework Overview:

    • Combines regulatory best practices with an integrated view of the model lifecycle
    • Engages development teams that locally deploy AI/ML models addressing varying clinical conditions and risk levels
  2. ABCDS Lifecycle Phases:

    • Model Development
    • Silent Evaluation
    • Effectiveness Evaluation
    • General Deployment

  3. Checkpoints:

    • Rigorous governance structure with expert multidisciplinary subcommittee reviews at three checkpoints between phases
    • Checkpoints between phases (G0, G1, G2), plus an ongoing monitoring checkpoint (Gm), ensure readiness for progression to the next lifecycle phase (see the sketch at the end of this section)
  4. Model Registration and Triage:

    • All algorithms used for patient care are registered and triaged
    • Categorization based on knowledge transparency: Standard of Care Models, Knowledge-Based Models, and Data-Driven Models
    • Risk-based triage approach determines the level of review required
  5. ABCDS Oversight Committee:

    • Executive-level committee providing institution-wide oversight and governance
    • Defines ABCDS lifecycle, checkpoints, and manages model deployments
    • Includes representatives from various departments and subcommittees
  6. Subcommittees:

    • Evaluation Subcommittee
    • Implementation and Monitoring Subcommittee
    • Regulatory Subcommittee
  7. Key Principles:

    • Focus on quality (by design, validation, and control through monitoring)
    • Emphasis on clinical impact and patient-focused outcomes
    • Consideration of health equity and algorithmic fairness
  8. Roles and Responsibilities:

    • Clearly defined roles for various stakeholders including development teams, clinical owners, business owners, and committee members
  9. Implementation:

    • In effect since January 2021
    • As of October 2021, 52 models were in the portfolio across various clinical service lines and ABCDS lifecycle stages

This framework aims to ensure the safe, effective, and ethical deployment of AI/ML tools in the healthcare system by providing a structured approach to governance throughout the lifecycle of these tools. It emphasizes the importance of continuous evaluation, risk management, and alignment with clinical needs and outcomes.
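
The lifecycle-plus-checkpoints structure can be pictured as a small state machine. Below is a minimal Python sketch under that reading; the exact mapping of checkpoint labels to phase transitions is inferred from the framework’s description, and names such as RegisteredModel and the example model are hypothetical, not Duke’s implementation.

```python
from dataclasses import dataclass, field
from enum import Enum

class Phase(Enum):
    MODEL_DEVELOPMENT = "Model Development"
    SILENT_EVALUATION = "Silent Evaluation"
    EFFECTIVENESS_EVALUATION = "Effectiveness Evaluation"
    GENERAL_DEPLOYMENT = "General Deployment"

# Assumed mapping: the checkpoint that must be passed to leave each phase
# (Gm covering ongoing monitoring once a model is generally deployed).
CHECKPOINT_AFTER = {
    Phase.MODEL_DEVELOPMENT: "G0",
    Phase.SILENT_EVALUATION: "G1",
    Phase.EFFECTIVENESS_EVALUATION: "G2",
    Phase.GENERAL_DEPLOYMENT: "Gm",
}

PHASE_ORDER = list(Phase)

@dataclass
class RegisteredModel:
    name: str
    category: str        # e.g. "Data-Driven Model" (knowledge transparency)
    risk_level: str      # outcome of risk-based triage
    phase: Phase = Phase.MODEL_DEVELOPMENT
    approvals: list = field(default_factory=list)

    def pass_checkpoint(self, approved_by: str) -> None:
        """Record a subcommittee approval and advance the lifecycle."""
        checkpoint = CHECKPOINT_AFTER[self.phase]
        self.approvals.append((checkpoint, approved_by))
        idx = PHASE_ORDER.index(self.phase)
        if idx < len(PHASE_ORDER) - 1:
            self.phase = PHASE_ORDER[idx + 1]

model = RegisteredModel("sepsis-risk", "Data-Driven Model", "high")
model.pass_checkpoint("Evaluation Subcommittee")      # G0 -> Silent Evaluation
model.pass_checkpoint("Implementation Subcommittee")  # G1 -> Effectiveness Evaluation
print(model.phase.value, model.approvals)
```

The design point the sketch captures is that no model advances between phases without a recorded checkpoint approval, which is the essence of the ABCDS governance structure.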

MITRE's Repeatable Process for Assuring AI-enabled Systems

MITRE’s repeatable process for assuring AI-enabled systems consists of several key components:

  1. Definition of AI Assurance:

    • A process for discovering, assessing, and managing risk throughout the life cycle of an AI-enabled system
    • Aims to ensure the system operates effectively to benefit stakeholders
  2. AI Assurance Process Steps (see the sketch at the end of this section):

    • Discover Assurance Needs
    • Characterize and Prioritize Risks
    • Evaluate Risks
    • Manage Risks

  3. AI Assurance Plan:

    • A comprehensive artifact that codifies all information generated during the assurance process
    • Includes Assurance Process Management, System Characterization, and Life Cycle Assurance Implementation components
  4. Supporting Laboratory Infrastructure:

    • MITRE’s AI Assurance and Discovery Lab integrates capabilities to support the process
    • Includes tools like AI Assurance Needs Discovery Protocol, AI Assurance Knowledge Base, LLM Secure Integrated Research Environment, and more
  5. Stakeholder Roles:

    • Defines roles for AI Developers, End Users/Operators, Program Offices, Standards Bodies, Testers, Regulators, and Monitors
    • Outlines responsibilities for each role in different application focuses (Development, Acquisition, Certification, Deployment)
  6. Application Focus Areas:

    • Development
    • Acquisition
    • Certification
    • Deployment
  7. Key Principles:

    • Emphasis on mission context and sector-specific considerations
    • Recognition of the nascent state of AI assurance science and engineering
    • Advocacy for significant government and industry investments and public-private partnerships
  8. Outputs of the AI Assurance Process:

    • An evaluated AI-enabled system with managed risks
    • An assurance plan for maintaining assurance over the system’s life cycle
    • Specification of assurance levels for system functions and capabilities
    • Reusable knowledge for future assurance analyses
  9. Iterative Nature:

    • The process can be reinvoked during any phase of the system’s life cycle as needed
  10. Flexibility:

    • The process can be tailored to different stages of AI development and different assurance goals

This repeatable process aims to provide a structured, comprehensive approach to assuring AI-enabled systems across their entire lifecycle, adaptable to various sectors and applications while emphasizing continuous risk management and stakeholder involvement.
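
The four process steps and the assurance plan they feed can be pictured as a simple data flow. Below is a minimal Python sketch; the class and field names (AssurancePlan, Risk, run_assurance_cycle) are invented for illustration and do not reflect any actual MITRE tooling or schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Risk:
    description: str
    priority: int                    # 1 = highest priority
    evaluated: bool = False
    mitigation: Optional[str] = None

@dataclass
class AssurancePlan:
    """Simplified container for what the assurance process produces."""
    system_name: str
    assurance_needs: list = field(default_factory=list)
    risk_register: list = field(default_factory=list)

def run_assurance_cycle(plan, raw_needs, candidate_risks):
    """One pass through the four steps; re-invoke whenever the system changes."""
    # 1. Discover assurance needs for this mission context.
    plan.assurance_needs.extend(raw_needs)
    # 2. Characterize and prioritize risks.
    plan.risk_register.extend(sorted(candidate_risks, key=lambda r: r.priority))
    # 3. Evaluate risks (stubbed; real evaluation would involve testing).
    for risk in plan.risk_register:
        risk.evaluated = True
    # 4. Manage risks: every evaluated risk needs a mitigation owner.
    for risk in plan.risk_register:
        if risk.mitigation is None:
            risk.mitigation = "assign owner and mitigation strategy"
    return plan

plan = run_assurance_cycle(
    AssurancePlan("triage-advisor"),
    raw_needs=["clinically safe recommendations"],
    candidate_risks=[Risk("performance drift after deployment", priority=1)],
)
print(plan)
```

The loop structure mirrors the process’s iterative nature: re-running run_assurance_cycle with newly discovered needs or risks updates the same plan rather than starting over, which is how the plan remains the living artifact the process describes.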

Conclusion

The emergence of AI in healthcare has necessitated robust frameworks for evaluation, implementation, and governance. This analysis has explored three significant approaches: Stanford’s FURM, Duke’s ABCDS, and MITRE’s AI Assurance Process. Each framework offers unique strengths in addressing the complex challenges of integrating AI into healthcare systems.

Stanford’s FURM framework provides a comprehensive approach that balances technical performance with real-world utility, financial sustainability, and ethical considerations. Its emphasis on simulation and iterative assessment offers a pragmatic path to bridging the gap between AI development and practical healthcare application.

Duke’s ABCDS framework stands out for its structured lifecycle approach and rigorous governance model. By incorporating multiple checkpoints and expert reviews, it ensures thorough vetting of AI tools throughout their development and deployment. The framework’s risk-based triage system and clear delineation of roles and responsibilities provide a solid foundation for institutional AI governance.

MITRE’s AI Assurance Process offers a broader, more adaptable framework applicable beyond healthcare. Its focus on continuous risk management and stakeholder involvement throughout the AI system’s lifecycle aligns well with the dynamic nature of AI development and deployment. The emphasis on creating a comprehensive AI Assurance Plan and supporting laboratory infrastructure demonstrates a forward-thinking approach to long-term AI governance.

As AI continues to transform healthcare, these frameworks will play a vital role in shaping a future where AI-enabled systems are not only powerful and efficient but also trustworthy, equitable, and aligned with the core values of healthcare delivery. Upcoming posts will show how developers can implement the MITRE process to support the ONC HTI-1 regulatory requirements for predictive decision support interventions.