Researchers from Harvard Medical School, the University of Copenhagen, VA Boston Healthcare System, Dana-Farber Cancer Institute, and the Harvard T.H. Chan School of Public Health have made a breakthrough in pancreatic cancer detection. Their study, published in Nature Medicine, reveals that an artificial intelligence (AI) tool can identify individuals at the highest risk of pancreatic cancer up to three years before diagnosis, using only their medical records. This development could revolutionize population screening for pancreatic cancer, which currently lacks effective methods for early detection.
Pancreatic cancer is a particularly deadly form of cancer, and its incidence is expected to rise. While individuals with a family history or specific genetic mutations are currently screened for pancreatic cancer, this targeted approach may miss other cases that do not fall into these categories.
The AI tool developed by the researchers aims to address this gap. By analyzing medical records, it can identify individuals who are at a significantly higher risk of developing pancreatic cancer. This information can guide clinicians in determining who should undergo further testing, including potentially invasive and expensive procedures. The tool has the potential to improve clinical decision-making and expedite the detection of pancreatic cancer, allowing for earlier treatment and better patient outcomes.
If implemented on a larger scale, this AI-based approach could have a significant impact on the detection and treatment of pancreatic cancer, ultimately extending patients’ lifespans. The study highlights the potential of AI in population-based screening and emphasizes the importance of identifying high-risk individuals who would benefit most from additional testing.
“Patients, families, and the healthcare system as a whole bear a significant burden from various types of cancer, particularly those that are difficult to detect and treat in the early stages,” stated Søren Brunak, co-senior investigator of the study and director of research at the Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen. “AI-based screening presents an opportunity to change the course of pancreatic cancer, an aggressive disease known for its challenges in early diagnosis and prompt treatment when chances of success are highest.”
In this recent study, the AI algorithm underwent training using two distinct datasets comprising a total of 9 million patient records from Denmark and the United States. The researchers tasked the AI model with identifying subtle indicators based on the information contained in these records. By analyzing combinations of disease codes and their timing, the model successfully predicted which patients were likely to develop pancreatic cancer in the future. Notably, many of the symptoms and disease codes analyzed were not directly linked to the pancreas.
To assess the AI models’ efficacy in detecting individuals at high risk of developing the disease, the researchers tested different versions of the algorithm across various time frames: six months, one year, two years, and three years.
Overall, each version of the AI algorithm outperformed predictions based on current population-wide estimates of pancreatic cancer incidence, a measure of how often the condition develops in a given population over a specific period. The researchers expressed confidence that the model’s predictive accuracy is at least as good as that of existing genetic sequencing tests, which are typically available for only a small subset of patients in the datasets.
The ‘angry organ’
Unlike certain common cancers such as breast, cervical, and prostate cancers that benefit from straightforward and effective screening techniques like mammograms, Pap smears, and blood tests, pancreatic cancer poses challenges in terms of screening and testing. Currently, physicians primarily rely on family history and genetic mutations as indicators of future risk, but these methods often miss a significant number of patients. The AI tool developed in this study has a key advantage—it can be applied to any patient with accessible health records and medical history, regardless of their known family history or genetic predisposition.
This is particularly important because many individuals at high risk for pancreatic cancer may not be aware of their genetic predisposition or family history. Without symptoms or clear indications of elevated risk, clinicians may hesitate to recommend more advanced and expensive tests like CT scans, MRI, or endoscopic ultrasound. These tests often require invasive procedures to obtain a biopsy. The pancreas, which sits deep inside the abdomen, is difficult to access and easily becomes inflamed, earning it the nickname “the angry organ.”
By identifying individuals at the highest risk for pancreatic cancer, an AI tool would enable clinicians to target the appropriate population for testing while sparing others from unnecessary procedures and tests. Currently, only a small percentage of pancreatic cancer cases are diagnosed in the early stages, leading to low survival rates. Approximately 44 percent of individuals diagnosed early survive five years, while the rate drops to 2 to 9 percent for those with advanced-stage tumors that have spread beyond their origin site.
Despite advancements in surgical techniques, chemotherapy, and immunotherapy, the survival rate for pancreatic cancer remains low. Therefore, in addition to sophisticated treatments, there is a critical need for improved screening, more precise testing, and early diagnosis. The AI-based approach presented in this study serves as a vital initial step in addressing these needs and improving outcomes for pancreatic cancer patients.
Previous diagnoses portend future risk
The researchers conducted the current study by creating multiple versions of the AI model and training them on the health records of 6.2 million patients from Denmark’s national health system. These records spanned a period of 41 years, and among the patients, 23,985 eventually developed pancreatic cancer. During the training process, the algorithm identified patterns that indicated future risk of pancreatic cancer based on the progression of diseases and conditions over time.
For example, the presence of conditions like gallstones, anemia, type 2 diabetes, and other gastrointestinal issues suggested a higher risk of developing pancreatic cancer within three years. Additionally, inflammation of the pancreas was a strong predictor of future pancreatic cancer within a shorter timeframe of two years.
The researchers emphasize that none of these individual diagnoses should be considered definitive or causative of pancreatic cancer on their own. However, the sequence and pattern in which they occur over time can provide valuable clues for an AI-based surveillance model, enabling physicians to monitor high-risk individuals more closely or conduct appropriate testing.
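The idea that the order and timing of diagnoses carry the signal, rather than any single diagnosis, can be illustrated with a small sketch. This is not the researchers' model or data format. The `Diagnosis` class, the `trajectory` function, the ICD-10-style codes, and the dates are all hypothetical and chosen only to show how a patient record might be turned into a time-ordered sequence that a model could read.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    code: str       # illustrative ICD-10-style code (an assumption, not from the study)
    recorded: date  # date the diagnosis entered the record


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def trajectory(diagnoses, as_of: date):
    """Return (code, months before `as_of`) pairs, oldest first.

    Diagnoses recorded after `as_of` are dropped, so the sequence only
    contains what a clinician could have known on that date.
    """
    past = sorted(
        (d for d in diagnoses if d.recorded <= as_of),
        key=lambda d: d.recorded,
    )
    return [(d.code, months_between(d.recorded, as_of)) for d in past]


# Hypothetical patient record, deliberately out of order.
records = [
    Diagnosis("E11", date(2018, 3, 1)),   # type 2 diabetes
    Diagnosis("K80", date(2016, 7, 15)),  # gallstones
    Diagnosis("K85", date(2020, 1, 10)),  # acute pancreatitis
    Diagnosis("D64", date(2021, 6, 1)),   # anemia, recorded after the cutoff
]

print(trajectory(records, date(2020, 6, 30)))
# [('K80', 47), ('E11', 27), ('K85', 5)]
```

A sequence like this keeps both which conditions appeared and how long ago, which is the kind of information the article describes the model as using. How the actual model encodes records is not specified here.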
To validate the AI model’s performance, the researchers tested it on a completely new set of patient records from the U.S. Veterans Health Administration. This dataset comprised nearly 3 million records spanning 21 years and included 3,864 individuals diagnosed with pancreatic cancer. The tool demonstrated slightly lower predictive accuracy when applied to the U.S. dataset.
This discrepancy can be attributed to the shorter duration of data collection and the demographic differences between the two datasets. The Danish dataset represented the entire population of Denmark, while the U.S. dataset focused on current and former military personnel within the Veterans Affairs system. However, when the algorithm was trained again from scratch using the U.S. dataset, its predictive accuracy improved.
This highlights two key factors: the importance of training AI models with high-quality and comprehensive data, and the necessity of accessing large representative datasets of clinical records at a national and international level. To ensure globally applicable models, AI algorithms should be trained on local health data to capture the specific characteristics of different populations.
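For a rough sense of how different the two datasets are, the figures reported above can be set side by side with simple arithmetic. The sketch below uses only numbers from this article, with one assumption: "nearly 3 million" U.S. records is rounded to 3,000,000. The results are crude shares of patients who developed pancreatic cancer over each dataset's full time span, not incidence rates, and the `case_share` helper is illustrative.

```python
# Figures as reported in the article; the U.S. patient count is an
# approximation of "nearly 3 million" (an assumption for illustration).
datasets = {
    "Denmark": {"patients": 6_200_000, "cases": 23_985, "years": 41},
    "U.S. Veterans": {"patients": 3_000_000, "cases": 3_864, "years": 21},
}


def case_share(patients: int, cases: int) -> float:
    """Percentage of patients in a dataset who developed pancreatic cancer."""
    return 100 * cases / patients


for name, d in datasets.items():
    share = case_share(d["patients"], d["cases"])
    print(f"{name}: {share:.2f}% of patients over {d['years']} years")
# Denmark: 0.39% of patients over 41 years
# U.S. Veterans: 0.13% of patients over 21 years
```

The gap reflects the different time spans and populations as much as anything else, which is consistent with the researchers' point that a model trained on one dataset may not transfer cleanly to another.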
Source: Harvard Medical School