AI Cancer Diagnosis Shows Bias, But New Framework Offers Hope for Equitable Care
Table of Contents
- AI Cancer Diagnosis Shows Bias, But New Framework Offers Hope for Equitable Care
- The Foundation of Cancer Diagnosis: Pathology’s Role
- When AI Sees More Than Expected: Uncovering Hidden Bias
- FAIR-Path: A Framework for Fairness
- Putting Cancer AI to the Test: Identifying Performance Gaps
- Why Bias Appears in Pathology AI: Three Key Contributors
- A New Approach to Reducing Bias: The Power of FAIR-Path
A new study reveals that artificial intelligence systems used to diagnose cancer from pathology slides exhibit significant bias, with diagnostic accuracy varying across different demographic groups. Researchers have identified key reasons for this disparity and developed a novel approach, called FAIR-Path, that substantially reduces these differences, emphasizing the critical need for routine bias evaluation in medical AI to ensure fair and reliable cancer care for all.
The Foundation of Cancer Diagnosis: Pathology’s Role
For decades, pathology has been a cornerstone of cancer diagnosis and treatment. Pathologists meticulously examine extremely thin slices of human tissue under a microscope, searching for visual indicators of cancer – its presence, type, and stage. To a trained specialist, analyzing a tissue sample – often a swirling pattern of pink and purple cells – is akin to grading an anonymous exam; the slide provides crucial disease information without revealing the patient’s identity.
This assumption of objectivity is challenged by the increasing integration of artificial intelligence into pathology labs. A recent study led by researchers at Harvard Medical School demonstrates that pathology AI models can inadvertently infer demographic details directly from tissue slides. This unexpected capability introduces the potential for bias in cancer diagnosis across diverse patient populations.
After evaluating several widely used AI models designed for cancer identification, researchers found that diagnostic accuracy was not consistent across all patients. Performance varied based on self-reported race, gender, and age. The team then investigated the underlying causes of these disparities.
FAIR-Path: A Framework for Fairness
To address this critical issue, the researchers developed FAIR-Path, a framework that significantly reduced bias in the tested AI models. “Reading demographics from a pathology slide is thought of as a ‘mission impossible’ for a human pathologist, so the bias in pathology AI was a surprise to us,” explained a senior author of the study, Kun-Hsing Yu, associate professor of biomedical informatics at the Blavatnik Institute at HMS and HMS assistant professor of pathology at Brigham and Women’s Hospital.
Yu stressed that recognizing and correcting bias in medical AI is paramount, as it directly impacts diagnostic accuracy and patient outcomes. The success of FAIR-Path suggests that improving fairness in cancer pathology AI – and potentially other medical AI tools – may not necessitate extensive system overhauls. The research was published December 16 in Cell Reports Medicine.
Putting Cancer AI to the Test: Identifying Performance Gaps
Yu and his colleagues examined bias in four commonly used pathology AI models currently under development for cancer diagnosis. These deep-learning systems were trained on extensive collections of labeled pathology slides, enabling them to learn biological patterns and apply that knowledge to new samples.
The team assessed the models using a large, multi-institutional dataset encompassing pathology slides from 20 different types of cancer. Across all four models, consistent performance gaps emerged. The AI systems demonstrated lower accuracy for specific demographic groups defined by race, gender, and age. For instance, the models struggled to differentiate lung cancer subtypes in African American patients and male patients. Reduced accuracy was also observed when classifying breast cancer subtypes in younger patients, and in detecting breast, renal, thyroid, and stomach cancers in certain demographic groups. Overall, these disparities appeared in approximately 29 percent of the diagnostic tasks analyzed.
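The kind of audit described above boils down to computing accuracy separately for each demographic group and checking the spread. Here is a minimal, hypothetical sketch of such a per-group audit; the function name, group labels, and toy data are illustrative and not taken from the study:

```python
# Hypothetical sketch of a per-group bias audit for a diagnostic model.
# All names and data are illustrative, not from the study.
from collections import defaultdict

def group_accuracy(predictions, labels, groups):
    """Accuracy per demographic group, plus the largest gap between groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += (pred == label)
    acc = {g: correct[g] / total[g] for g in total}
    gap = max(acc.values()) - min(acc.values())
    return acc, gap

# Toy data: model predictions vs. true lung-cancer subtype labels,
# tagged by demographic group.
preds  = ["LUAD", "LUSC", "LUAD", "LUSC", "LUAD", "LUAD"]
labels = ["LUAD", "LUSC", "LUAD", "LUAD", "LUSC", "LUAD"]
groups = ["A",    "A",    "A",    "B",    "B",    "B"]

acc, gap = group_accuracy(preds, labels, groups)
# Group A is 3/3 correct, group B only 1/3 — a gap of roughly 0.67,
# the kind of disparity such an audit is meant to surface.
```

Running an audit like this across many diagnostic tasks is what lets researchers quantify how often disparities appear, as in the 29 percent figure above.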
According to Yu, these errors stem from the AI systems extracting demographic information from the tissue images and then relying on patterns associated with those demographics when making diagnostic decisions. The findings were unexpected, "because we would expect pathology evaluation to be objective," Yu said. "When evaluating images, we don't necessarily need to know a patient's demographics to make a diagnosis."
This observation prompted a crucial question: Why was pathology AI failing to uphold the same standard of objectivity?
Why Bias Appears in Pathology AI: Three Key Contributors
The research team identified three primary factors contributing to the observed bias. First, training data are often imbalanced, with tissue samples being more readily obtainable from certain demographic groups than others. This imbalance hinders the AI models’ ability to accurately diagnose cancers in underrepresented groups, including populations defined by race, age, or gender.
However, Yu noted that “the problem turned out to be much deeper than that.” In several instances, the models performed worse for specific demographic groups even when sample sizes were comparable.
Further analysis revealed differences in disease incidence. Some cancers are more prevalent in specific populations, allowing AI models to achieve higher accuracy for those groups. Consequently, the same models may struggle to diagnose cancers in populations where those diseases are less common.
The researchers also discovered that AI models can detect subtle molecular differences across demographic groups. For example, the systems might identify mutations in cancer driver genes and utilize them as shortcuts for cancer type classification – a strategy that can reduce accuracy in populations where those mutations are less frequent.
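The shortcut problem described above can be made concrete with a toy example (purely illustrative, not from the study): a classifier that keys on a mutation-correlated feature looks accurate in a population where the mutation tracks the cancer subtype, and fails in a population where it does not.

```python
# Hypothetical toy illustration of shortcut learning; the "mutation" feature
# and subtype names are invented for this sketch.
def shortcut_classifier(has_mutation):
    # Learned "shortcut": predict the subtype from the mutation status alone.
    return "subtype_X" if has_mutation else "subtype_Y"

# Population 1: the mutation is common in subtype_X, so the shortcut looks accurate.
pop1 = [(True, "subtype_X"), (True, "subtype_X"), (False, "subtype_Y")]
# Population 2: same disease, but the mutation is rare, so the shortcut fails.
pop2 = [(False, "subtype_X"), (False, "subtype_X"), (False, "subtype_Y")]

acc1 = sum(shortcut_classifier(m) == y for m, y in pop1) / len(pop1)  # 1.0
acc2 = sum(shortcut_classifier(m) == y for m, y in pop2) / len(pop2)  # ~0.33
```

The overall accuracy can look strong while one population quietly bears most of the errors, which is why per-group evaluation matters.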
“We found that because AI is so powerful, it can differentiate many obscure biological signals that cannot be detected by standard human evaluation,” Yu explained. Over time, this can lead AI models to prioritize signals more closely linked to demographics than to the disease itself, thereby diminishing diagnostic performance across diverse patient groups.
Collectively, Yu stated, these findings demonstrate that bias in pathology AI is influenced not only by the quality and balance of training data but also by how the models are trained to interpret visual information.
A New Approach to Reducing Bias: The Power of FAIR-Path
Following the identification of the bias sources, the researchers focused on developing corrective measures. They created FAIR-Path, a framework built upon contrastive learning, a machine-learning technique. This approach modifies AI training to emphasize critical distinctions – such as differences between cancer types – while minimizing attention to less relevant differences, including demographic characteristics.
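One way to realize that idea, sketched here under stated assumptions rather than as the paper's actual implementation, is a contrastive loss in which positive pairs share a cancer label but come from different demographic groups. The model is then rewarded for features that track the cancer, not the demographics. All function names, labels, and embeddings below are illustrative:

```python
# Minimal, hypothetical sketch of fairness-aware contrastive learning.
# FAIR-Path's actual formulation is in the paper; this only illustrates the idea.
import numpy as np

def fairness_contrastive_loss(embeddings, cancer_labels, demo_groups, temperature=0.5):
    # Cosine similarities between L2-normalized embeddings, scaled by temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(cancer_labels)
    losses = []
    for i in range(n):
        # Positives: same cancer type but a *different* demographic group,
        # so demographic features cannot serve as a shortcut to agreement.
        pos = [j for j in range(n) if j != i
               and cancer_labels[j] == cancer_labels[i]
               and demo_groups[j] != demo_groups[i]]
        if not pos:
            continue
        # InfoNCE-style normalization over all other samples.
        others = [j for j in range(n) if j != i]
        denom = np.sum(np.exp(sim[i, others]))
        for j in pos:
            losses.append(-np.log(np.exp(sim[i, j]) / denom))
    return float(np.mean(losses))

# Toy check: embeddings that cluster by cancer type should score a lower
# loss than embeddings that cluster by demographic group.
cancer = ["lung", "lung", "breast", "breast"]
demo = ["A", "B", "A", "B"]
by_cancer = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
by_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
loss_good = fairness_contrastive_loss(by_cancer, cancer, demo)
loss_bad = fairness_contrastive_loss(by_demo, cancer, demo)
```

Because the loss penalizes embeddings that group samples by demographics rather than by disease, training against it pushes the model toward the cancer-relevant signal.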
When FAIR-Path was applied to the tested models, diagnostic disparities decreased by approximately 88 percent. “We show that by making this small adjustment, the models can learn robust features that make them more generalizable and fairer across different populations,” Yu said.
This result is encouraging, he added, because it suggests that significant reductions in bias are achievable even without perfectly balanced or fully representative training datasets.
Looking ahead, Yu and his team are collaborating with institutions globally to investigate pathology AI bias in regions with varying demographics, clinical practices, and laboratory settings. They are also exploring the adaptability of FAIR-Path to situations with limited data. Another area of interest is understanding how AI-driven bias contributes to broader disparities in healthcare and patient outcomes.
Ultimately, Yu concluded, the goal is to develop pathology AI systems that assist human experts by delivering fast, accurate, and equitable diagnoses for all patients. “I think there’s hope that if we are more aware of and careful about how we design AI systems, we can build models that perform well in every population,” he said.
