Systemic Bias in AI Models May Undermine Diagnostic Accuracy

Heidi Splete


Systematically biased artificial intelligence (AI) models reduced clinicians' accuracy in diagnosing hospitalized patients, and AI explanations did little to offset the decline, based on data from 457 clinicians.

"Artificial Intelligence (AI) could support clinicians in their diagnostic decisions of hospitalized patients but could also be biased and cause potential harm," said Sarah Jabbour, MSE, a PhD candidate in computer science and engineering at the University of Michigan, Ann Arbor, in an interview.

"Regulatory guidance has suggested that the use of AI explanations could mitigate these harms, but the effectiveness of using AI explanations has not been established," she said.

To examine whether AI explanations can mitigate the potential harms of systematic bias in AI models, Jabbour and colleagues conducted a randomized clinical vignette survey study. The survey was administered between April 2022 and January 2023 across 13 states, and the study population included hospitalist physicians, nurse practitioners, and physician assistants. The results were published in JAMA.

Participants were randomized to AI predictions with AI explanations (226 clinicians) or without AI explanations (231 clinicians).

The primary outcome was diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease, defined as the number of correct diagnoses over the total number of assessments, the researchers wrote.

The clinicians viewed nine clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. They made three assessments in each vignette, one for each diagnosis. The vignettes were presented in a fixed order: two without AI predictions (to establish baseline diagnostic accuracy), six with AI predictions, and one with a clinical consultation by a hypothetical colleague. The AI vignettes included predictions from both standard and systematically biased AI models.

Baseline diagnostic accuracy across the three diagnoses of pneumonia, heart failure, and chronic obstructive pulmonary disease was 73%. Compared with baseline, clinicians' accuracy increased by 2.9 percentage points when they viewed standard AI model predictions without explanations and by 4.4 percentage points when those predictions were accompanied by AI explanations.

However, clinicians' accuracy decreased by 11.3 percentage points from baseline after they viewed systematically biased AI model predictions without explanations, and by 9.1 percentage points when the biased predictions were accompanied by explanations.

The decrease in accuracy with systematically biased AI predictions without explanations was mainly attributable to a decrease in participants' diagnostic specificity, and adding explanations did little to improve it, the researchers noted.

Potentially Useful but Still Imperfect

The findings were limited by several factors, including the use of a web-based survey, which differs from a real clinical setting, the researchers wrote. Other limitations included the younger-than-average study population and the focus on clinicians who make treatment decisions rather than other clinicians who might better understand the AI explanations.

"In our study, explanations were presented in a way that were considered to be obvious, where the AI model was completely focused on areas of the chest X-rays unrelated to the clinical condition," Jabbour told Medscape Medical News. "We hypothesized that if presented with such explanations, the participants in our study would notice that the model was behaving incorrectly and not rely on its predictions. This was surprisingly not the case, and the explanations when presented alongside biased AI predictions had seemingly no effect in mitigating clinicians' overreliance on biased AI," she said.

"AI is being developed at an extraordinary rate, and our study shows that it has the potential to improve clinical decision-making. At the same time, it could harm clinical decision-making when biased," Jabbour told Medscape Medical News. "We must be thoughtful about how to carefully integrate AI into clinical workflows, with the goal of improving clinical care while not introducing systematic errors or harming patients," she added.

Looking ahead, "There are several potential research areas that could be explored," said Jabbour. "Researchers should focus on careful validation of AI models to identify biased model behavior prior to deployment. AI researchers should also continue including and communicating with clinicians during the development of AI tools to better understand clinicians' needs and how they interact with AI," she said. "This is not an exhaustive list of research directions, and it will take much discussion between experts across disciplines such as AI, human computer interaction, and medicine to ultimately deploy AI safely into clinical care."

Don't Overestimate AI

"With the increasing use of artificial intelligence and machine learning in other spheres, there has been an increase in interest in exploring how they can be utilized to improve clinical outcomes," said Suman Pal, MD, assistant professor in the division of hospital medicine at the University of New Mexico, Albuquerque, in an interview. "However, concerns remain regarding the possible harms and ways to mitigate them," said Pal, who was not involved in the current study.

In the current study, "It was interesting to note that explanations did not significantly mitigate the decrease in clinician accuracy from systematically biased AI model predictions," Pal said.

"For the clinician, the findings of this study caution against overreliance on AI in clinical decision-making, especially because of the risk of exacerbating existing health disparities due to systemic inequities in existing literature," Pal told Medscape Medical News.

"Additional research is needed to explore how clinicians can be better trained in identifying both the utility and the limitations of AI and into methods of validation and continuous quality checks with integration of AI into clinical workflows," he noted.

The study was funded by the National Heart, Lung, and Blood Institute. The researchers had no financial conflicts to disclose. Pal had no financial conflicts to disclose.

Heidi Splete is a freelance medical journalist with 20 years of experience.
