Machine Learning Accurately Predicts Liver Cancer Risk with Routine Tests

by Grace Chen

A new machine learning model shows promise in predicting an individual’s risk of developing hepatocellular carcinoma (HCC), the most common form of liver cancer, with a high degree of accuracy. The model, detailed recently in the journal Cancer Discovery, analyzes readily available patient data – demographics, electronic health records, and routine blood test results – to identify those at increased risk, potentially enabling earlier detection and improved outcomes.

Currently, liver cancer screening guidelines primarily focus on individuals with confirmed cirrhosis or severe liver disease. However, many cases occur in people without these diagnosed conditions, leaving a significant portion of the at-risk population unscreened. The new tool aims to widen the net, identifying individuals who might benefit from earlier intervention. Because it predicts risk from easily accessible data, it could be implemented in a variety of healthcare settings, even those with limited resources.

The research was led by Carolin Schneider, MD, an assistant professor at RWTH Aachen University in Germany, and Jakob Kather, MD, MSc, a professor of clinical artificial intelligence at the Technical University of Dresden, Germany. The team used data from the UK Biobank, a large-scale biomedical database and research resource that contains health information on over 500,000 individuals in the United Kingdom, including 538 cases of HCC. Notably, 69% of these HCC cases were identified in patients without prior diagnoses of cirrhosis, viral hepatitis, or other chronic liver diseases, highlighting the potential of the model to identify previously undetected risk.

Identifying Risk Factors Beyond Traditional Guidelines

Liver cancer risk is multifaceted, extending beyond established factors like cirrhosis and hepatitis. Male sex, a history of smoking, and heavy alcohol consumption are all known contributors, according to Jan Clusmann, MD, the first author of the study and a clinician-scientist at the Technical University of Dresden. “With so many factors impacting risk, there is an urgent need for effective tools to help clinicians identify high-risk patients,” he explained. Machine learning, with its ability to process and analyze complex datasets, offers a potential solution to this challenge.

The researchers trained their model using a “random forest architecture,” a technique that combines hundreds of decision trees to arrive at a prediction. Each “tree” assesses a series of variables from patient data, and the final risk assessment is based on the collective results. This approach enhances the model’s robustness and interpretability. The team developed separate models for different data types – demographics, electronic health records, blood tests, genomics, and metabolomics – and then combined them in stages, assessing performance at each step.
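To make the voting idea concrete, here is a minimal sketch of a random forest in plain Python. It is not the study's implementation: the trees are simplified to single-split "stumps," and the two features (age and a liver enzyme value), the toy data, and all function names are hypothetical, chosen only for illustration.

```python
import random

def train_stump(rows, labels, feature_indices):
    """Find the single (feature, threshold) split with the fewest errors.

    A stump predicts 1 (high risk) when the feature value exceeds the threshold.
    """
    best = None
    for f in feature_indices:
        for threshold in sorted({r[f] for r in rows}):
            errors = sum((r[f] > threshold) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    _, f, threshold = best
    return f, threshold

def train_forest(rows, labels, n_trees=100, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(n_features), k=max(1, n_features // 2))
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx],
                                  feats))
    return forest

def predict_risk(forest, row):
    """Return the fraction of trees that vote 'high risk' for this patient."""
    votes = sum(row[f] > threshold for f, threshold in forest)
    return votes / len(forest)

# Hypothetical toy data: (age in years, enzyme level); label 1 = developed HCC.
rows = [(40, 20), (45, 25), (50, 22), (65, 80), (70, 95), (72, 88)]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels)
```

Calling `predict_risk(forest, (75, 100))` returns a higher value than `predict_risk(forest, (35, 15))`, because more trees vote "high risk" for the first patient. A real model would use deeper trees and far more variables.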

Model Performance and Comparison to Existing Tools

The most accurate model, designated “Model C,” combined demographic information, electronic health record data, and routine blood test results, achieving an area under the receiver operating characteristic curve (AUROC) of 0.88. The AUROC measures the model’s ability to distinguish between individuals with and without HCC: a score of 1 represents perfect discrimination, while 0.5 is no better than chance. Importantly, adding more complex and expensive data like genomics and metabolomics did not significantly improve the model’s performance, suggesting that readily available clinical data may be sufficient for accurate risk prediction.
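One way to read an AUROC is as the probability that a randomly chosen patient with HCC receives a higher risk score than a randomly chosen patient without it. The sketch below computes it that way in plain Python; the function name and the example scores are illustrative assumptions, not data from the study.

```python
def auroc(labels, scores):
    """Compute AUROC as the share of (case, non-case) pairs ranked correctly.

    labels: 1 for patients with the disease, 0 for those without.
    scores: the model's predicted risk for each patient.
    Tied scores count as half a correct ranking.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one non-case")

    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))

# Made-up example: two non-cases followed by two cases.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

In this example one case (score 0.35) is ranked below one non-case (score 0.4), so three of the four pairs are ordered correctly and the AUROC is 0.75.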

To benchmark their model, the researchers compared its performance to existing liver cancer risk assessment tools, including the FIB-4, APRI, and NFS scores (used to assess liver fibrosis) and the aMAP score. They found that their model consistently outperformed these existing methods, identifying more true cases of HCC while minimizing false positives. In an “ablation experiment,” in which input features are progressively removed to see how much each contributes, the researchers reduced the number of clinical features to just 15, yet the simplified model still outperformed the established scores.
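For context, two of the comparison scores are simple arithmetic on routine blood values. The sketch below uses the commonly published formulas for FIB-4 and APRI; these formulas do not come from the article, and the function names, the default upper limit of normal for AST, and the example values are assumptions for illustration. NFS and aMAP use more inputs and are omitted.

```python
import math

def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 index: (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))

def apri(ast_u_per_l, platelets_1e9_per_l, ast_upper_limit=40.0):
    """APRI: (AST / upper limit of normal AST) x 100 / platelet count.

    The default upper limit of 40 U/L is an assumption; laboratories differ.
    """
    return (ast_u_per_l / ast_upper_limit) * 100.0 / platelets_1e9_per_l

# Hypothetical patient: age 60, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
fib4_value = fib4(60, 40, 36, 200)  # (60 * 40) / (200 * 6)
apri_value = apri(40, 200)          # (40 / 40) * 100 / 200
```

Both scores rely on three or four inputs, whereas the machine learning model described here combines many variables at once.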

Generalizability and Future Implications

The study also addressed the critical issue of generalizability. The model was initially trained on data primarily from individuals of white European ancestry in the UK Biobank. To assess its applicability to more diverse populations, the researchers validated it using data from the All of Us research program in the United States, which includes substantial representation from historically underrepresented groups. The model maintained robust performance even within the non-white subgroup of the All of Us cohort, suggesting broad applicability across different ethnicities.

“Our study highlights the potential of a simple, easily utilized machine learning model to improve risk stratification for HCC using only routinely collected clinical data,” said Dr. Schneider. “If validated in additional populations, our model would enable primary care physicians to efficiently identify at-risk patients and refer them to liver cancer screening. This could enable earlier detection and improved outcomes for patients with this aggressive disease.”

The researchers acknowledge limitations, including the retrospective nature of the study and the relatively low proportion of patients with viral hepatitis in the training data. Further prospective validation studies are needed to confirm the model’s performance in diverse clinical settings and to refine its accuracy. However, the findings represent a significant step forward in the development of more effective strategies for liver cancer prevention and early detection.

Disclaimer: This article provides information for general knowledge and informational purposes only, and does not constitute medical advice. It is essential to consult with a qualified healthcare professional for any health concerns or before making any decisions related to your health or treatment.

The next step for this research involves prospective validation studies in larger, more diverse patient populations. Researchers will continue to refine the model and explore its potential integration into clinical workflows. Readers interested in learning more about liver cancer and risk factors can visit the American Cancer Society website.
