Is It Possible to Predict the Risk for Pancreatic Cancer?

Between 1990 and 2018, the incidence rate of pancreatic cancer in France increased by an average of 2.7% per year for men and, more steeply, by 3.8% per year for women. Its prognosis remains very poor, with a net 5-year survival rate of under 10%. The classic risk factors (diabetes, smoking, obesity, alcohol) are well known. Likewise, a family history of the disease increases the risk for developing pancreatic cancer ninefold. It is estimated that 90% of the predisposing genes involved in the development of pancreatic cancer have yet to be identified.

In this context, any new approach for diagnosing pancreatic cancer at an early stage is welcome. A US-based team set out to develop and validate a model allowing clinicians to predict the risk for pancreatic cancer.

Data and AI

Using electronic health record (EHR) data from a multi-institutional federated network combining 55 hospitals over 13 years, a team of researchers from the Massachusetts Institute of Technology in Cambridge developed a neural network (PrismNN) and a logistic regression model (PrismLR) to predict the risk for pancreatic ductal adenocarcinoma (PDAC) 6-18 months before diagnosis in patients aged 40 years or older.

With 35,387 PDAC cases, 1,500,081 controls, and 87 features per patient, PrismNN obtained a test area under the curve (AUC) of 0.826 (PrismLR: 0.800). PrismNN's average internal-external validation AUCs were 0.740 for locations, 0.828 for races, and 0.789 for time. PrismNN's sensitivity was just 35.9%, and its specificity was 95.3%.
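To make the sensitivity and specificity figures concrete, the sketch below converts them into approximate counts and a positive predictive value (PPV). It rests on two assumptions that the article does not state: that the reported rates can be applied to the full case and control counts (they were measured on a test set), and that the study's case-to-control ratio is a meaningful prevalence. The function names and the 0.1% prevalence are hypothetical, chosen only for illustration.

```python
def screening_counts(cases, controls, sensitivity, specificity):
    """Expected confusion-matrix counts for a given sensitivity and specificity."""
    true_pos = cases * sensitivity        # cases the model flags
    false_neg = cases - true_pos          # cases the model misses
    true_neg = controls * specificity     # controls correctly left unflagged
    false_pos = controls - true_neg       # controls flagged in error
    return {"true_pos": true_pos, "false_neg": false_neg,
            "true_neg": true_neg, "false_pos": false_pos}


def ppv_at_prevalence(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' rule for an assumed prevalence."""
    flagged_cases = sensitivity * prevalence
    flagged_controls = (1 - specificity) * (1 - prevalence)
    return flagged_cases / (flagged_cases + flagged_controls)


SENSITIVITY = 0.359   # reported for PrismNN
SPECIFICITY = 0.953   # reported for PrismNN

# Assumption: apply the reported rates to the full cohort sizes.
counts = screening_counts(35_387, 1_500_081, SENSITIVITY, SPECIFICITY)
study_ppv = counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])

# Positive likelihood ratio: how much a positive flag raises the odds of disease.
positive_lr = SENSITIVITY / (1 - SPECIFICITY)

# Hypothetical prevalence of 0.1%, for illustration only.
low_prevalence_ppv = ppv_at_prevalence(0.001, SENSITIVITY, SPECIFICITY)

print(f"PPV at the study's case mix: {study_ppv:.3f}")
print(f"Positive likelihood ratio:   {positive_lr:.2f}")
print(f"PPV at 0.1% prevalence:      {low_prevalence_ppv:.4f}")
```

Under these assumptions, most flagged patients would not have PDAC, and the PPV falls further as prevalence drops, which is one reason a high specificity alone does not settle whether individual screening is practical.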

Patients with a family history of or genetic predisposition to PDAC make up approximately 2%-10% of all cases in the literature. High-throughput sequencing methods are expected to help identify the genes involved in the development of this cancer. In the meantime, MRI and endoscopic ultrasound are used to monitor high-risk patients with precancerous lesions (intraductal papillary mucinous neoplasms, mucinous cystadenomas). In the general population, however, no biological test is available to screen for a small de novo neoplasm (< 1 cm).

Prospective Validation Needed

The Prism deep learning algorithm uses 87 features drawn from EHRs, comprising diagnoses, medications, disparate laboratory data, and demographic data, from a population that is ethnically and geographically diverse. This model maintained its high specificity and accuracy throughout the internal-external validation.
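As a rough illustration of how internal-external validation results are summarized, the sketch below computes an AUC separately for each held-out group and then averages them. It is not the study's code: the group names, labels, and scores are invented toy data, and the scores are assumed to come from a model trained without the group being evaluated. The AUC is computed directly from its definition, the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_group_auc(records):
    """Compute one AUC per held-out group from (group, label, score) records."""
    labels_by_group = defaultdict(list)
    scores_by_group = defaultdict(list)
    for group, label, score in records:
        labels_by_group[group].append(label)
        scores_by_group[group].append(score)
    return {g: auc(labels_by_group[g], scores_by_group[g])
            for g in labels_by_group}


# Hypothetical toy data: (held-out site, 1 = PDAC case, model risk score).
records = [
    ("site_A", 1, 0.9), ("site_A", 0, 0.2), ("site_A", 1, 0.4), ("site_A", 0, 0.5),
    ("site_B", 1, 0.8), ("site_B", 0, 0.3), ("site_B", 0, 0.1), ("site_B", 1, 0.7),
]

group_aucs = per_group_auc(records)
mean_auc = sum(group_aucs.values()) / len(group_aucs)
print(group_aucs, mean_auc)
```

Reporting the per-group values alongside the average shows whether performance holds up in every location, race, or time period, or only on aggregate.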

This novel method detected 3.5-fold more patients than the current criteria used to identify patients for PDAC screening programs, at similar risk levels and 6-18 months before onset. Nevertheless, this is a retrospective study (development and validation) in which certain ethnic groups and persons from disadvantaged socioeconomic backgrounds were underrepresented and, most of all, in which the biological parameters used were heterogeneous (from blood electrolytes to glucose levels). The study also included disparate data, with family history of the disease given without any genetic data, and the respective weight of each item in the final positive prediction was not reported.

Validation in a prospective cohort is essential, as the sensitivity of this deep learning–based approach (35.9%) remains insufficient for individual application without an appropriate genetic biomarker. For the time being, this makes individual screening impractical outside of institutions affiliated with a network using this technology. Moreover, the interpretability of the model remains unclear without knowing how many successive layers the neural network uses.

We are, nonetheless, on the cusp of a new era in which artificial intelligence is set to transform the structure of EHRs and the integration of relevant biological and genetic data into algorithms capable of predicting the risk for certain occult cancers that remain symptomless in their early stages.

To conclude, this deep learning model has provided a specific basis for detecting individuals at high risk for PDAC, although the sensitivity is low. The 87 data items entered into the digital model are highly heterogeneous. Artificial intelligence needs to advance further to translate algorithmic advances into robust biomedical breakthroughs.

This article was translated from JIM, which is part of the Medscape professional network.
