ChatGPT Gives Incorrect Answers About Eosinophilic Esophagitis

Carolyn Crist

November 16, 2023

ChatGPT, an artificial intelligence (AI) tool, appears to provide a mix of accurate and inaccurate information in response to common questions about eosinophilic esophagitis (EoE), according to a new study.

In particular, the chatbot reported an incorrect association between EoE and cancer. In addition, the automated responses had low readability and high complexity, which could pose a health literacy barrier for users.

"These findings have implications for both clinicians and patients, as this technology is not adequately trained to answer clinical questions about EoE, and use of ChatGPT in clinical practice or patient education could result in misinformation," Corey Ketchem, MD, a gastroenterology fellow at the University of Pennsylvania Perelman School of Medicine, Philadelphia, and colleagues write.

"We currently cannot rely on AI chatbots to educate patients or providers, with limitations remaining about its current clinical applicability," they write.

The study was published online in Clinical Gastroenterology and Hepatology.

Testing ChatGPT on EoE Questions

Since ChatGPT’s release in November 2022, researchers have expressed interest in potential AI applications in medicine, including colonoscopy preparation, gastroesophageal reflux disease management, and gastrointestinal knowledge assessment.

In the same vein, Ketchem and colleagues evaluated the ChatGPT tool and its responses to EoE-related questions from the patient perspective. The researchers developed 40 common questions about EoE in three categories: general topics, complications, and therapeutics.

Because ChatGPT is marketed as a conversational chatbot that uses context, the research team posed questions in two ways — individually, with a new session for each question, and sequentially, with all questions asked in a single session. The researchers also asked for references and evaluated readability with the Flesch-Kincaid reading ease and grade level scores.
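The Flesch Reading Ease and Flesch-Kincaid Grade Level measures the researchers applied are standard, published readability formulas. The short Python sketch below is not from the study; the word, sentence, and syllable counts are hypothetical and would in practice come from a text-analysis tool.

```python
# Standard Flesch readability formulas (a minimal sketch; not the study's code).
# Word, sentence, and syllable counts are assumed to be supplied upstream.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier-to-read text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

if __name__ == "__main__":
    w, s, syl = 180, 8, 320  # hypothetical counts for one chatbot response
    print(round(flesch_reading_ease(w, s, syl), 1))   # lower = harder to read
    print(round(flesch_kincaid_grade(w, s, syl), 1))  # >12 implies post-high-school reading level
```

With these example counts, the reading ease score lands in the "difficult" range and the grade level above 12, which is the kind of result the authors interpreted as a potential health literacy barrier.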

Based on established EoE literature and guidelines, the research team scored the responses on a 4-point scale for scientific correctness and patient educational value: 0 for a completely incorrect or inadequate response, 1 for a mix of correct and incorrect information, 2 for a correct but inadequate response, and 3 for a correct and comprehensive response, defined as one comparable to a reply from an experienced esophagologist.

Use With Caution

Among the individual responses, 54% received a low score of 1 for scientific correctness, and 44% received a 1 for educational value. Similarly, among the sequential responses, 49% received a 1 for scientific correctness, and 40% received a 1 for educational value.

Interestingly, the individual responses related to EoE general topics had the most correct and comprehensive scores, with 25% of chatbot answers receiving the top score of 3 for scientific correctness and 38% receiving a 3 for educational value. Among the sequential responses, the complications category had the highest percentage of 3 scores — with 29% for scientific correctness and 57% for educational value.

However, ChatGPT's responses to a question about whether EoE causes cancer falsely suggested an increased risk of esophageal adenocarcinoma, the authors write. In addition, when asked whether someone could die from EoE, ChatGPT incorrectly suggested a correlation between EoE and Barrett's esophagus.

Beyond that, the research team evaluated the cited references and found that every response included at least one incorrect reference, whether in the author list, the title, the numerical identifiers, or an inactive link.

The scores for reading ease and grade level also suggested high complexity — with readers needing some level of post-high school education to understand the responses.

For now, the authors conclude, ChatGPT requires clinical oversight in the case of EoE.

"Since ChatGPT utilizes existing text to formulate its responses, it is possible that rare diseases or those with less robust literature are more subject to its previously observed limitations," they write. "Additionally, ChatGPT’s application to medical education has shown shortcomings and in conjunction with our observed scientific inaccuracies, this tool should be used with caution by physicians when informing care of EoE patients."

The study received no specific funding, but several authors are supported by National Institutes of Health grants. The authors report no relevant financial relationships.
