Machine Learning Framework Improves High-Precision Serum Tumor Biomarker Detection

TIME: 2026-06-02

Recently, a research team from the Hefei Institutes of Physical Science, Chinese Academy of Sciences, in collaboration with Hefei Cancer Hospital, developed an interpretable stacked ensemble learning framework for detecting serum tumor biomarkers. By combining the framework with surface-enhanced Raman spectroscopy (SERS), the team achieved high-precision quantitative analysis of up to 12 biomarkers in serum.

The findings were published in Analytical Chemistry.

Accurate detection of serum tumor biomarkers is important for early cancer screening. However, serum is a highly complex biological system, where signals from different molecules often overlap and interfere with each other, making accurate quantification difficult. In addition, many existing machine learning methods function as “black boxes,” making their prediction process difficult to interpret in clinical applications.

To address these challenges, the team designed the framework specifically for SERS data analysis. Beyond predicting biomarker concentrations, it indicates which spectral features contribute to each result. Its base layer consists of three machine learning models: support vector regression (SVR), extreme gradient boosting (XGBoost), and partial least squares regression (PLSR). An elastic-net-based meta-model then combines their predictions to improve stability and accuracy. The researchers also introduced a LASSO-based feature selection method that reduced data dimensionality by 75.3%, improving computational efficiency.

The framework showed strong performance in quantifying 12 tumor biomarkers, including AFP, CEA, CA19-9, and CA125. The coefficient of determination (R²) exceeded 0.9 for all 12 biomarkers, reaching 0.981 for ferritin and 0.988 for SCCA.
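For reference, the standard definition of R² (not specific to this study) makes the reported values concrete, with y_i the true concentration of sample i, ŷ_i the predicted concentration, and ȳ the mean of the true values:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

An R² of 0.988, as reported for SCCA, therefore means the residual sum of squares is 1.2% of the total variation of the true concentrations around their mean; an R² above 0.9 means it is below 10%.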

To improve interpretability, the researchers further applied Shapley Additive Explanations (SHAP) to link key Raman spectral peaks to the molecular vibrations that produce them. The analysis helped reveal how factors such as glycosylation, matrix interference, and spectral overlap affect prediction accuracy. “It’s like opening the ‘black box’ of the framework, making the prediction process easier to interpret,” said Wu Boyu, a member of the team.
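SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. The sketch below computes those values exactly by brute force for a toy model; it illustrates the idea under assumptions and is neither the team's code nor the `shap` library's API. The model `toy_model`, its three "peak intensity" inputs, and the all-zero baseline are hypothetical, and "absent" features are filled in from a single baseline spectrum, which is only one of several possible conventions. Exact enumeration needs on the order of 2^n model evaluations, which is why practical tools approximate these values for spectra with hundreds of points.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with 'absent' features set to baseline."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their observed value; others use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical model: two linear peak terms plus one peak interaction."""
    return 2.0 * z[0] + 0.5 * z[1] + z[0] * z[2]


if __name__ == "__main__":
    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = exact_shapley(toy_model, x, base)
    print(phi)
    print(sum(phi), toy_model(x) - toy_model(base))
```

For this toy model the interaction term z[0]·z[2] is split equally between features 0 and 2, so the attributions come out as roughly 3.5, 1.0 and 1.5, and they sum to f(x) minus f(baseline), here 6. That additivity is what lets SHAP decompose a prediction into per-feature (per-peak) contributions.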

This study provides a general and interpretable framework for high-precision multi-biomarker detection in complex biological samples, with potential applications in early cancer screening and precision medicine, according to the team.

Article link: http://pubs.acs.org/doi/abs/10.1021/acs.analchem.5c04589

Figure 1. Workflow of SERS spectral acquisition, data processing, and quantitative analysis of serum tumor biomarkers based on the interpretable stacked ensemble model (ISEM).

Figure 2. Quantitative regression performance of the interpretable stacked ensemble model for 12 serum tumor biomarkers.