Ensuring Trustworthy AI in Healthcare
Key governance structures that countries should implement to facilitate and regulate AI adoption
AI development has raised concerns about the amplification of bias, loss of privacy, social harms, disinformation, and harmful changes to the quality and availability of care and employment. In response, companies, governments, and potential adopters of AI technologies have described principles they believe should be followed to make a product safe and trustworthy; one example is the NHS’s “guide to good practice for digital and data-driven health technologies”. However, high-level principles alone are not enough: we need systems in place that ensure AI is actually developed responsibly.
Governance is the process by which the stakeholders in a collective problem deliberate and decide how to create and sustain social norms and established practices. It provides a framework that guides people in managing risk and behaving ethically.
A good example of effective governance, described in Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims, is the airline industry:
People who get on airplanes don’t trust an airline manufacturer because of its PR campaigns about the importance of safety — they trust it because of the accompanying infrastructure of technologies, norms, laws, and institutions for ensuring airline safety.
— Brundage et al., http://arxiv.org/abs/2004.07213
Aircraft manufacturers around the world follow a common, agreed-upon set of guidelines and regulations to make sure that all planes meet a set of safety standards.
Given the many public examples of AI behaving badly to date, members of the public are concerned about the risks of AI development and do not trust organizations to govern themselves effectively. Defining standards is key to effective governance and trust. Current AI development guidelines usually agree on some general principles but disagree over the details of what should be done in practice. For example, transparency is important, but what does it look like, and how is it achieved? Is it through open data, open code, explainable predictions…?
To lay out effective governance mechanisms, you first need to define what is important when developing and evaluating a system. The Centre for the Fourth Industrial Revolution at the World Economic Forum put together a set of principles for developing chatbots for health, which provides an excellent example of a well-defined and comprehensive set of principles (summarized below):
- Safety: The device should not cause harm to patients
- Efficacy: The device should be tailored to users and provide a proven benefit
- Data protection: Data should be collected with consent, safeguarded, and disposed of properly
- Human agency: The device should allow for oversight and preserve freedom of choice for both patient and practitioner
- Accountability: Device behavior should be auditable, and a specific entity should be responsible for the algorithm’s behavior
- Transparency: Humans must always be aware if they are interacting with an AI, and its limitations should be made clear
- Fairness: Training data should be representative of the population, and device behavior should not be prejudiced against any group
- Explainability: Decisions must be explained in an understandable way to intended users
- Integrity: Decisions should be based only on reliable, high-quality evidence and on data that is ethically sourced and collected for a clearly defined purpose
- Inclusiveness: The device should be accessible to all intended users, with particular consideration of excluded/vulnerable groups
Principles require actions in order to be implemented. Going beyond this set of principles (known as RESET), the guidelines then outline a set of actions around each principle.
The actions are broken down by:
- Which principle is being considered
- Who is responsible (developers, providers, and regulators)
- What phase they should be done in (development, deployment, and scaling)
- What type of device is being developed (stratified by the potential risk of mistakes)
- Whether they are optional, suggested, or required
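A breakdown like this is essentially a structured matrix, and it can help to see it as one. Below is a minimal sketch of how such an action matrix might be represented in code so that each stakeholder can filter for what applies to them; the specific actions, roles, and tiers are illustrative placeholders, not the actual RESET content.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """One entry in a hypothetical RESET-style action matrix."""
    principle: str      # e.g. "Transparency"
    description: str
    responsible: str    # "developer", "provider", or "regulator"
    phase: str          # "development", "deployment", or "scaling"
    risk_tier: str      # potential risk of mistakes: "low", "medium", "high"
    requirement: str    # "optional", "suggested", or "required"

ACTIONS = [
    Action("Transparency", "Disclose that users are talking to an AI",
           "developer", "development", "high", "required"),
    Action("Fairness", "Report performance broken down by demographic group",
           "developer", "deployment", "high", "suggested"),
    Action("Accountability", "Keep an audit log of model versions in use",
           "provider", "deployment", "medium", "required"),
]

def actions_for(role: str, phase: str, actions=ACTIONS):
    """Filter the matrix down to what a given stakeholder should consider now."""
    return [a for a in actions if a.responsible == role and a.phase == phase]

for a in actions_for("provider", "deployment"):
    print(f"[{a.requirement}] {a.principle}: {a.description}")
```

Encoding guidelines this way makes them queryable rather than a wall of text, which is one small step from principles toward enforceable process.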
Different tools can be used to ensure AI is developed responsibly, each targeting different stakeholders and carrying a different level of power, from norms to laws. The approaches below adapt and expand the mechanisms described in the Brundage et al. paper:
Institutional mechanisms
These are mechanisms and practices adopted by the organizations that develop or provide AI as a medical device. They aim to influence people’s behavior by centering values, incentives, and accountability in the development and deployment processes.
- Guides to good practice: produced by developers, providers, or regulators to establish actions developers can take to build trustworthy AI. Often, however, these guides remain focused on high-level values and are open to interpretation in terms of execution
- Algorithmic risk/impact assessments: these are designed to assess the possible societal impacts of an algorithmic system before or after the system is in use
- Third-party auditing: a structured process by which an organization’s behavior is assessed for consistency with expected or required behavior in that industry. This can also involve an objective assessment of the AI’s performance against standard metrics
- Red-teaming: often performed by dedicated “red teams” that make an effort to find flaws and vulnerabilities in a plan, organization, or system. In the case of medical AI, this could involve adversarial attacks, or exploring case studies that may reveal bias
- Bias and safety bounties: these give people outside the organization a method and incentives for raising concerns about specific AI systems in a formalized way
- Sharing of AI incidents: currently, this is seen mainly in the form of investigative journalism but could be practiced more widely by developers and providers themselves, to improve societal understanding of how AI can behave in unexpected or undesired ways
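The “objective assessment against standard metrics” part of third-party auditing can be quite concrete. As a minimal sketch, an auditor might recompute standard diagnostic metrics from a model’s confusion matrix and check them against a vendor’s claim; the counts and the 90% sensitivity claim below are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard metrics an auditor might check against a claimed threshold."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}

# Hypothetical audit: the vendor claims >= 90% sensitivity for a triage model.
observed = diagnostic_metrics(tp=88, fp=20, tn=380, fn=12)
claim_holds = observed["sensitivity"] >= 0.90
print(observed, "claim holds:", claim_holds)
```

In this toy example the observed sensitivity is 0.88, so the claim fails the audit; running the same check per demographic subgroup would connect the audit to the fairness principle as well.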
Software mechanisms
These describe specific efforts to hard-code good practices into AI development.
- Audit trails: creating a traceable log of the steps taken in the development and testing of an AI system, and of its behavior after deployment
- Interpretability: engineering an explanation of the AI’s decision-making process into the reporting process, which aids in understanding and scrutiny of the AI system’s characteristics
- Privacy-preserving ML: software features that protect the security of input data, model output, and the model itself
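To make the audit-trail idea concrete: a log is only useful for accountability if it cannot be quietly rewritten after the fact. A common technique is to chain each entry to the hash of the previous one, so tampering anywhere breaks the chain. This is a minimal sketch, not a production logging system; the logged events are invented examples.

```python
import hashlib
import json

def append_entry(log, event: dict) -> None:
    """Append an event, chaining it to the previous entry's hash so that
    any later tampering with the log is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log) -> bool:
    """Recompute every hash and check the chain is intact."""
    prev = "0" * 64
    for entry in log:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        if entry["prev_hash"] != prev:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"step": "training", "dataset": "v3", "auc": 0.91})
append_entry(log, {"step": "deployment", "model": "triage-1.2"})
print(verify(log))             # intact chain
log[0]["event"]["auc"] = 0.99  # retroactive tampering...
print(verify(log))             # ...is detected
```

In practice the final hash would also be published or escrowed with a third party, so the organization cannot simply rebuild the whole chain.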
Hardware mechanisms
These describe physical computing resources and their properties.
- Secure hardware: hardware that, through its design, can provide an assurance of data security and privacy
- High-precision compute measurement: tracking power usage during development and deployment to make claims about AI’s environmental and sustainability impacts more precise and comparable
- Compute support for academia: providing widely available, powerful computing resources to improve the ability of academics to evaluate claims about large-scale AI systems from big companies
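The arithmetic behind compute measurement is simple; what makes claims comparable is stating the assumptions. A minimal sketch, with entirely hypothetical figures for GPU count, average power draw, runtime, and grid carbon intensity:

```python
def energy_kwh(avg_power_watts: float, hours: float) -> float:
    """Energy consumed: watts x hours / 1000 gives kilowatt-hours."""
    return avg_power_watts * hours / 1000

def co2_kg(kwh: float, grid_intensity_kg_per_kwh: float) -> float:
    """Emissions estimate; grid intensity varies widely by region and year."""
    return kwh * grid_intensity_kg_per_kwh

# Hypothetical training run: 8 GPUs averaging 300 W each, for 72 hours.
kwh = energy_kwh(avg_power_watts=8 * 300, hours=72)
print(f"{kwh:.1f} kWh, ~{co2_kg(kwh, 0.4):.1f} kg CO2 at 0.4 kg/kWh")
```

Note the measured quantity here is average power draw, which is itself an estimate; high-precision measurement means sampling actual power at fine granularity across all hardware involved, not back-of-envelope figures like these.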
Open questions
- Do we need to be more deliberately pursuing AI literacy? With whom? (regulators, decision-makers, the public)
- What legal powers may be missing to enable regulatory inspection of algorithmic systems?
- How do we include marginalized and minority groups who may be most impacted by negative effects in the conversation?
References
- A guide to good practice for digital and data-driven health technologies (NHS)
- Evaluating digital health products
- Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims (Brundage et al., arXiv:2004.07213)