How should artificial intelligence be used in Australian health care? Recommendations from a citizens’ jury

Objective: To support a diverse sample of Australians to make recommendations about the use of artificial intelligence (AI) technology in health care.


In January 2024, the Australian government published its interim response to a consultation on "safe and responsible" artificial intelligence (AI) in Australia.1 The aim of the consultation was to determine how to govern this transformational technology in a manner that preserves public trust, mitigates risk, and supports safe and responsible practices. In clinical care, AI could bring great benefits and serious risks.2 Australia currently lags behind other countries in health care AI development, deployment, and governance,3 and health care-specific strategies are needed,4,5 as recognised by the Australian Medical Association.6

Governance of rapidly emerging health technologies such as AI is at a crossroads.7 Traditional governance is slow, while the speed and global diffusion of technological development are continuously increasing. Traditional governance paradigms focus on individual risk, but novel technologies can pose significant societal risks (eg, exacerbating inequality, workforce disruption). Traditional governance strategies also exclude many of the people affected, including technology users and communities.7 New approaches are needed to complement existing governance strategies.
Deliberative democratic methods, such as citizens' juries, enable community members to influence health policy making.8 These robust methods share certain characteristics: participants are selected to reflect population diversity; they are asked to make recommendations regarding a specific question; they are provided high quality, relevant information and have extensive opportunities to ask questions; and they work together to reach recommendations that take trade-offs between competing advantages, disadvantages, and values into consideration.8 Until 2023, no deliberative process with national representation had considered how AI should be used in health care. We therefore convened a national citizens' jury to discuss the use of artificial intelligence in Australian health care.

Methods
We convened a national citizens' jury to discuss the question: "Under what circumstances, if any, should artificial intelligence be used in Australian health systems to detect or diagnose disease?" (Supporting Information, part 1). The aim of deliberative democratic methods, developed in political science and government, is to enhance democracy by involving communities in developing the laws or policies that affect them. Deliberative recruitment and sampling methods have a political rather than an epidemiological logic; the aims are to provide all members of a community equal opportunity to participate and to reflect community diversity. These aims are typically achieved by random ballot invitation followed by stratified selection according to demographic criteria to select a mini-public, or diverse small group, that is asked to make decisions on behalf of the broader public (Supporting Information, part 2).

Juror recruitment
The independent, not-for-profit deliberative democracy recruitment agency Sortition Foundation (https://www.sortitionfoundation.org) recruited thirty Australian residents for this jury. To ensure that each Australian resident had an equal chance of being invited, Sortition Foundation mailed invitations to 6000 households randomly selected from the Australia Post database in February 2023. The invitation described the topic with a brief explanatory background, details about what would be required of participants, information about the nature of community juries, and a detailed participant information statement (Supporting Information, part 3). The number of invitations sent to each state and territory was proportional to its population size. One adult (18 years or older) from each invited household was eligible for participation.
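The paper does not specify how the 6000 invitations were apportioned beyond proportionality to population. As a minimal sketch, assuming largest-remainder rounding and approximate, purely illustrative population figures (not data from the study), the allocation could be computed as follows:

```python
# A sketch, not the study's method: divide 6000 invitations among states and
# territories in proportion to population, with largest-remainder rounding.
# Population figures (millions) are approximate and for illustration only.
populations = {"NSW": 8.2, "VIC": 6.7, "QLD": 5.4, "WA": 2.8,
               "SA": 1.8, "TAS": 0.57, "ACT": 0.46, "NT": 0.25}

TOTAL_INVITES = 6000
total_pop = sum(populations.values())
exact = {s: TOTAL_INVITES * p / total_pop for s, p in populations.items()}

# Take the integer part of each quota, then give the leftover invitations
# to the jurisdictions with the largest fractional remainders.
counts = {s: int(q) for s, q in exact.items()}
remaining = TOTAL_INVITES - sum(counts.values())
for s in sorted(exact, key=lambda s: exact[s] - counts[s], reverse=True)[:remaining]:
    counts[s] += 1

assert sum(counts.values()) == TOTAL_INVITES
print(counts)
```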
People with direct involvement in AI development or implementation, people in clinical occupations, and people unable to speak English in a group setting were excluded from selection. From the 109 unique eligible respondents (response rate, 1.8%), Sortition Foundation used an algorithm9 for the stratified random selection of 31 participants according to gender, age, ancestry, highest level of education, and location of residence (state/territory; urban, regional, rural). After selection, two jurors opted not to participate; one replacement person was invited, for a total of 30 jurors.
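The cited selection algorithm9 is not described here. As a minimal sketch of the general idea only (stratified random selection that gives every respondent a chance while meeting demographic quotas), a naive quota-based rejection sampler is shown below; the attributes, quota values, and sampling strategy are all assumptions for illustration, not the method actually used by Sortition Foundation.

```python
import random
from collections import Counter

def stratified_select(pool, quotas, k, max_tries=100_000, seed=2023):
    """Quota-based rejection sampling: draw random panels of size k and accept
    the first one meeting every minimum quota. The real selection algorithm
    is more sophisticated; this sketch only conveys the core idea."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        panel = rng.sample(pool, k)
        tally = Counter((attr, val) for person in panel for attr, val in person.items())
        if all(tally[key] >= minimum for key, minimum in quotas.items()):
            return panel
    raise RuntimeError("no panel met the quotas; relax quotas or enlarge the pool")

# Hypothetical pool of 109 eligible respondents; attributes are invented.
rng = random.Random(0)
pool = [{"gender": rng.choice(["woman", "man"]),
         "location": rng.choice(["urban", "regional", "rural"])}
        for _ in range(109)]

# Hypothetical minimum quotas for a 31-person panel.
quotas = {("gender", "woman"): 14, ("gender", "man"): 14,
          ("location", "regional"): 6, ("location", "rural"): 6}
panel = stratified_select(pool, quotas, k=31)
```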
Each juror received $1015 as compensation for their participation, and we booked and paid for travel, accommodation, and all meals for the face-to-face meeting. Extensive efforts were made to enable participation, including lending computer devices, Zoom training, assisting with logistics, and providing funding for special travel needs.

Jury planning and procedure
The entire jury process took 18 days (16 March to 2 April 2023): fifteen days online and three days face-to-face in Sydney (Box 1).10 We shared video and documents via the secure VisionsLive bulletin board platform (https://visionslive.com/online-bulletin-boards), and jurors interacted via message boards. Synchronous online discussions were undertaken via Zoom. Facilitation was led by author SMC (an experienced deliberation facilitator); CD (experienced in deliberation) and LC, YSJA, EF, and BF (qualitative researchers with deliberation knowledge) also acted as facilitators.

The procedure followed seven core steps for deliberative processes: understanding purpose, relationship building, skill development, information inputs, group dialogue and deliberation, group decision making, and closing.11 Some activities focused on process, such as structured greeting or reflection activities, learning about cognitive bias, and how to ask critical questions.11 Plenary and small group activities alternated with one another; small groups were frequently randomly re-allocated for cross-fertilisation of perspectives.
Each juror spent at least six hours on jury-related activity across the fifteen-day online period; most contributed much more. Online activity included watching information videos, asking the experts questions, receiving answers, and interacting with other jurors in three 90-minute online meetings and on the dedicated private bulletin board. Materials generated for and by the process are available online10 and in the Supporting Information, parts 4 to 8, including a participant booklet sent to participants before the jury process (background information and four diagnostic or screening case studies) and four 10-15-minute online video presentations by four content experts (authors FM, KJLB, IAS, WAR), the draft of each of which was reviewed by the three other content experts (Box 1). All jurors watched all four videos. Questions for the experts were developed by jurors online and answered by the experts online. After ten days, jurors identified remaining knowledge gaps, and the research team located appropriate resources for closing them (eg, systematic reviews, websites).
During the three-day face-to-face meeting, jurors met for about 18 hours in total (Box 1). Observers from several organisations with a professional interest in AI or consumer engagement were present throughout; a formal protocol and agreement minimised their influence on deliberations.
During face-to-face small group discussions, jurors recorded their deliberations in templates. The four speakers directly answered final questions at the end of the first face-to-face day.
On the second day, a world café-style session12 helped jurors discuss and record their insights about the benefits, harms, and bias and fairness of AI in health care.13 Jurors then developed a list of questions that might require recommendations, which the research team sorted into draft categories; the entire jury finalised the category list together.
The jury then drafted recommendations in their own words in each of the revised categories, working in self-selected working groups (four to seven people) and drawing on written records of their earlier discussions.All jurors provided input through iterative cycles of plenary feedback, re-drafting, and voting.
A recommendation was included in the report if at least 24 jurors supported it.A subgroup of jurors presented the final recommendations to the observers and experts in a closing ceremony.

Analysis
Recommendations and reasons were transcribed and are reported as supplied by the jury; we have added minor edits in square brackets to ease reading. Data recorded in templates during the world café conversations were transcribed into Excel by author LC; SMC and YSJA applied inductive qualitative analysis to independently develop, name, and apportion data to clusters of the jurors' main concerns, resolving differences via consensus. Our report complies with CJCheck guidelines.14

Results
The demographic characteristics of the jury were similar to those of the Australian population (Box 2). Two jurors participated online but could not attend the face-to-face meetings because of acute illness; 28 jurors participated in the final deliberations.
Jurors hoped that AI might make health care more efficient and improve systems performance and outcomes (Box 3). The jury understood that health care AI could cause harm, but was not prohibitionist, instead asserting the right of all Australians to access AI (recommendation 4) and proposing conditions for its legitimate use, including the need to balance harms and benefits (recommendation 3) (Box 4).
Each recommendation achieved support from at least 24 jurors; all but recommendations 4 and 11 achieved unanimous support. Two jurors expressed concern about extending rights beyond Australian citizens and residents (recommendation 4); one juror objected to making heterogeneous datasets mandatory (recommendation 11) because specialised datasets could be appropriate for people from minority groups. This latter disagreement reflected a shared commitment to promoting health equity, but different views on how it should be achieved.

Discussion
We report the first nationally representative deliberative democratic process for developing general recommendations about the use of AI in health care. The recommendations provide decision makers with a clear indication of the values and priorities of a well informed and diverse Australian mini-public. Our study illustrates the feasibility of robust public engagement and deliberation for guiding AI development and implementation.
Health care decision makers and clinicians should pay attention to the core features of the recommendations and the reasons advanced for them, particularly the two most frequent concerns: evaluation, integrity, and transparency; and fairness. Jurors called for mandatory reporting of unfavourable outcomes, performance, misuse, and benefits; robust data and evidence; and ongoing evaluation to guarantee safety, effectiveness, appropriate scope of application, and training data selection, and to ensure that benefits outweigh harms and that health system performance is preserved (recommendations 3, 6-10, 12-14).
Jurors emphasised that all people, including people from minority backgrounds, should benefit from AI, that exacerbation of inequity should be avoided, that diverse values should be respected, and that training data should be representative (recommendations 1-5, 11, 13, 15).
Five further principles informed several recommendations: making decision makers accountable for the performance of AI health care systems (recommendations 7-9, 15); supporting community understanding of and involvement in the governance of health care AI (2, 12, 15); preserving choice, rights, and autonomy in health care systems (3-5); managing conflicts of interest and ensuring independence in health care AI governance and implementation (2, 12); and supporting and training clinicians in the use of AI (3, 6).
The few previous studies similar to ours were all undertaken in the United Kingdom. In 2019, two five-day, 18-person citizens' juries in Manchester and Coventry discussed the question, "Should AI systems give an explanation, even if that results in less accurate decisions?"; jurors expressed a preference for accuracy only in health scenarios.19 In 2018, a four-day, 29-person citizens' jury from England and Wales deliberated the question, "Under what conditions, if any, is it appropriate to use automated decision systems?";20 in 2020, a 50-person Citizens' Biometrics Council from Bristol and Manchester discussed (for 60 hours over nine months) "What is or isn't OK when it comes to the use of biometric technologies?"21 Jurors in the latter two discussions emphasised the need to avert bias, and called for robust frameworks for responsibility, oversight, and accountability; independent evaluation, monitoring, and auditing; and consent (eg, the option of declining the use of biometric technologies).20,21 Although these processes were not focused on health, their recommendations resonate with those of our jury.
The most fundamental recommendation in our study was the call for a health AI charter and an independent decision-making body. This is more ambitious than a framework or code of conduct, and would provide AI-specific oversight in health. There are other examples of AI-specific legislation or regulation, most notably the European Union AI Act.22 Implementing this recommendation would require identifying potential system barriers and developing an operational plan and supportive policy. Some elements recommended by the jury (eg, evaluation of training data) are currently undertaken within the "software as a medical device" approach to AI regulation of the Therapeutic Goods Administration.23 However, other elements, such as examining the effects of AI systems on patient outcomes, clinicians, and health systems, should be incorporated into health care quality and safety processes and governance processes.2

Our jury proposed responsibilities for people across the health care system, including:
• individual clinicians: understanding and evaluating AI as used in health care, including its shortcomings, and ensuring that training data are relevant to local people;
• clinical training and accreditation bodies: ensuring that clinicians are knowledgeable about the use and limits of AI systems;
• patients' representatives: advocating patients' rights, the provision of quality information to patients, and standards for AI use, as well as holding decision makers to account;
• health care organisations and service providers: auditing AI systems for integrity, performance, and bias in local populations before procurement, managing conflicts of interest, considering the use of open source software, and ensuring the ongoing monitoring of overall health system performance;
• researchers and evaluators: auditing datasets for representativeness, rigorously and independently evaluating AI system performance in clinical care, and embedding ongoing monitoring and feedback; and
• health departments and agencies: building public understanding of health care AI and incorporating public voices into decision making about AI in health care.
The jurors emphasised collective concerns related to system integrity, fairness, accountability, and community involvement, reinforcing the need for governance that considers societal aspects beyond risks to individuals.7 They also emphasised rigorous evaluation and fairness, aspects that may be neglected by commercial producers of health care AI. Reported breakthroughs in health care machine learning have often not been supported by more methodologically rigorous scrutiny,19 and evaluations of health care AI have often focused on overall accuracy rather than bias or fairness.13 The jury's recommendations suggest that a well informed public might reject these approaches as unjustifiable.

Limitations
Best practice methods were applied to recruiting and selecting jurors (invitation by random ballot, stratified sampling according to selected population demographic characteristics). However, as deliberative democratic processes require substantial interest and commitment from participants, selection bias was inevitable; people who agreed to participate may have been more civic-minded and more interested in the discussion topic than Australians in general.

Category/recommendation, with reasons
Technical governance and standards
9. Upon submission to the regulator, an AI system must provide information on its intended purpose and efficacy, its training dataset, flaws and limitations of use.
• To make clear to all involved on what AI does and doesn't do.
10. For AI systems to be approved in Australia, they must perform equal to or better than current standard health care practice.
• To ensure accuracy and specificity of the detection performance of an AI system. Any approved system needs to meet a high standard and threshold.
• This provides measurable standards which can be applied across all future and proposed AI systems entering clinical settings. This standard must be maintained and re-evaluated at regular intervals once in use.

Data governance and use
11. It is important that AI training datasets must strive to be adequately representative and inclusive to capture Australia's multiculturalism and diversity.
• Australia has a wide variety of cultural, gender, and ethnic groups. The representation of these groups should be captured in these datasets to train and set up parameters of an AI.
Open source software
12. Encourage and consider having AI software in health be free and open source software to ensure transparency, public ownership, financial integrity, collaboration, security, privacy and trust.
• Transparency and quality control: The technology should be transparent in its inner workings, flaws and limitations, changes over time now and in the future.
• Public ownership/intellectual property: The technology should be owned by the public, not private companies.We should avoid creating and supporting monopolies.
• Financial integrity: The technology should avoid relying too much on companies to maintain financial integrity, to avoid being dictated by financial motivations.
• Collaboration: The technology should benefit from the improvements, reduced cost and reduction of bias that collaboration can provide.
• Security, privacy and trust: free and open source software is known to be highly secure when implemented properly and helps privacy/trust.

Evaluation and assessment
13. We recommend that research used to underpin the use of AI in health care must be peer-assessed in an unbiased, independent, and robust manner. Australian data, with a sample representative of the population, should be used, but overseas data can be used when justified.
• To confirm [or] verify developers' claims of AI system performance.
• To maintain a standard of quality for healthcare in Australia.
14. Research assessing the performance of AI screening tools should reflect real world clinical practice and follow standardised procedures in trial design. Data analysis and reporting should be transparent, and conclusions should reflect system performance.
• The evaluation process should be transparent to ensure validity

Education and communication
15. We recommend that there is a comprehensive and fully funded community education program. This will ensure that the community is brought along with developments in and the application of AI in health. This should be located within a broader program of general digital health literacy that recognises particular community needs such as age, gender, ethnicity etc.
• To ensure the community is informed and educated on current AI developments. Also, children are exposed to AI and digital health through school-based learning programs. This will aid understanding of future development of AI in health, ensuring greater participation.
• Community education and awareness raising will ensure community can hold authorities such as regulators to account, as they will be knowledgeable about raising these concerns and reporting instances of non-compliance.
• A community that is educated about AI in health might have less fear and be able [to] represent their individuals more effectively.
* Recommendations and reasons were transcribed and are reported as supplied by the jury; we have added minor edits in square brackets to ease reading.◆

However, all jurors actively participated, and the diversity of views expressed reflected the diversity of the jury. Our jury size was adequate for effective deliberation; 20 to 50 people is typical for this type of study.24 While larger juries can seem more representative, they require more resources, individual jurors may be less active because they assume others will represent their views, blocs can form, and effectively including everyone in deliberations becomes more difficult20 (Supporting Information, part 2).
The focus of the study question was screening and diagnosis, but the jurors expressed final recommendations regarding AI in health care generally, although the evidence they were provided was more limited. Jurors considered four case studies about how AI might be used in medical practice; their judgements may have differed had they been presented with different cases. The jurors' recommendations are reported verbatim, and reflect the limited time available to prepare their wording.

Conclusion
A nationally representative citizens' jury can express informed community views about how AI in health care should be developed, used, and governed.Few deliberative democratic processes have considered such questions, but these methods could guide clinicians, policy makers, AI researchers and developers, and health service users to develop approaches that can support the trustworthiness of this technology.

1 "Under what circumstances, if any, should artificial intelligence be used in Australian health systems to detect or diagnose disease?": jury schedule Time Activities Core steps Week 1
Group dialogue and deliberation, group decision making, presentation, closing This project was approved by the University of Wollongong Health and Medical Human Research Ethics Committee (2022/314).
14ta recorded in templates during the world café conversations were transcribed into Excel by author LC; SMC and YSJA applied inductive qualitative analysis to independently develop, name, and apportion data to clusters of the jurors' main concerns, resolving differences via consensus.Our report complies with CJCheck guidelines.14Ethicsapproval*Population data is for sex.† Population data proportions are for people aged 18 years or older.‡ Census has "Australian" as a response option (30% of respondents); we assumed that this category included people with British or Irish ancestry and multiple ancestry.◆

3 Summary of conversations about the benefits, harms, and fairness of AI in health care that underpinned recommendation development

How important are the potential benefits of using AI for screening and diagnosis? What benefits are most important? Why are those benefits important?

Cluster 1: Increased access, greater productivity, and reduced costs
Greater productivity through streamlined workflows and automation. Reduced pressure on health services, better allocation of clinician time for delivering higher quality care, increased access to care (including in rural communities, after infrastructure is established), reduced costs, less invasive testing, enabling more testing, easier access to necessary tests.

Cluster 2: Improved clinician performance and care outcomes, increasing confidence in health care
More timely and accurate diagnosis and better prevention and cure by AI-enabled systems. AI could mitigate human bias. Improved clinician performance would improve patient care and caregiver experience, build confidence, and support greater trust in health care and in AI itself. Data-rich health services could promote a culture of data sharing, support information sharing and knowledge, identify new causes of disease, and better direct resources and research.

How important are the potential harms or dangers of using AI for screening and diagnosis? What harms are most important? Why are those harms important?

Cluster 4: Alienation, dehumanisation, and distrust
Reduced human contact and empathy and inability to replicate complex human responses in health care, seeding patient distrust in health care. Population distrust of AI systems, reduced confidence, effects on doctor-patient relationships, and flow-on effects of distrust on others. Patients may miss out on beneficial AI-supported health care because of mistrust.

Cluster 5: Governance, commercial, and systems risks
Lack of transparency and review, commercial ownership restricting access to information and reducing public control, unclear lines of responsibility, greater dependence on data accuracy, insurance risks (eg, premium increases), increased costs, more brittle health systems, broader social harms.

Cluster 6: Human costs of poor AI performance
AI errors resulting in psychological and physical harm to patients because of deficiencies in training data, failure to communicate decisions probabilistically, and false screening results (eg, false positive results leading to unnecessary alerts or recalls).

How can we respond to the potential for bias or unfair outcomes from AI for screening and diagnosis? What principles should guide our responses?
Cluster 14: Data quality and management
Managing the quality of data used, maintaining data sources, and ensuring data are appropriate for the question being asked.

Cluster 15: Principles, solutions, and need for guidance
Other possible actions/principles in response to bias, dominated by the need for strong and proactive governance (prior to implementation) and accountability. Other principles included the need for AI-supported systems to perform at least as well as humans do now, effective advocacy and inclusion of patient perspectives, complete separation from the insurance industry, safeguards against commercial-in-confidence algorithmic systems, and ensuring that misuse of private data is prosecuted.