Studia Medyczne

Abstract

4/2025 vol. 41
Original paper

Predictors of distant metastasis in colorectal cancer: a multimodal statistical and machine‑learning analysis

1 Collegium Medicum, Jan Kochanowski University, Kielce, Poland
2 Meduniv Sp. z o.o., Kielce, Poland
Medical Studies 2025; 41 (4): 344–349
Online publish date: 2025/12/30

Introduction

Distant metastasis (M1) is the principal determinant of outcome in colorectal cancer (CRC). Identifying robust molecular and clinical predictors at diagnosis may improve risk stratification.

Aim of the research

To develop and compare complementary statistical and machine‑learning models for predicting metastatic status and to synthesize convergent predictors.

Material and methods

In a single-centre cohort (N = 54; M1 = 26, M0 = 28), we modelled a binary outcome, the presence of any CRC metastasis (M_any), using multivariable logistic regression (LR), Random Forest (RF), LASSO-penalized logistic regression, and a Support Vector Machine (SVM). In a secondary analysis, we fitted a multinomial logistic regression with three outcome categories: Early_No_Mets (pT1-2M0), Advanced_No_Mets (pT3-4M0), and Metastatic (M1). Discrimination was summarized as the area under the ROC curve (AUC); classification metrics were obtained under each method's native validation scheme (LR in-sample; RF out-of-bag [OOB]; SVM 10-fold cross-validation [CV]). Reporting follows TRIPOD guidance.
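
As an illustration of the discrimination summary used here, the following minimal sketch computes AUC as the probability that a randomly chosen metastatic case receives a higher predicted score than a randomly chosen non-metastatic case, with ties counted as one half. The function name `auc_rank` and the toy scores are hypothetical and are not taken from the study; the abstract does not state which software computed the reported AUC values.

```python
def auc_rank(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: predicted risk per patient (higher = more likely M1)
    labels: 1 for metastatic (M1), 0 for non-metastatic (M0)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT study data: three M1 and three M0 patients.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_rank(toy_scores, toy_labels)  # 7 of 9 pairs ranked correctly
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. Note that an in-sample AUC, as used for LR here, tends to be optimistic relative to OOB or cross-validated estimates.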

Results

In LR, TP53 mutation was the strongest predictor, although the association did not reach the conventional 5% significance level (OR = 3.47; 95% CI: 0.92–14.5; p = 0.073; AUC = 0.713). RF achieved an OOB error of 40.74% (accuracy = 59.3%); the top-ranked features were NRAS, TP53_pathway, TP53, WNT_pathway, and n_genes. LASSO (10-fold CV) retained NRAS, TP53, and age (coefficients +1.15, +0.85, and −0.02, respectively). SVM yielded an AUC of 0.678 and an accuracy of 55.6%. The multinomial model revealed complete separation for NRAS in the Advanced_No_Mets group, precluding standard maximum-likelihood (MLE) inference.
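
To illustrate why complete separation precludes standard maximum-likelihood inference, the sketch below uses a hypothetical binary toy dataset (not the study data, and simpler than the multinomial model reported here) in which a single predictor perfectly splits the outcome. Under that assumption, the log-likelihood keeps increasing as the coefficient grows, so no finite maximum-likelihood estimate exists. The function `log_likelihood` and the no-intercept model are illustrative choices only.

```python
import math


def log_likelihood(beta, xs, ys):
    """Log-likelihood of a no-intercept logistic model, logit(p) = beta * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        # log(sigmoid(z)) when y == 1, log(sigmoid(-z)) when y == 0
        signed = z if y == 1 else -z
        total += -math.log1p(math.exp(-signed))
    return total


# Hypothetical toy data: x = -1 always pairs with y = 0, x = +1 with y = 1.
xs = [-1, -1, -1, 1, 1, 1]
ys = [0, 0, 0, 1, 1, 1]

betas = [1, 2, 5, 10, 20]
lls = [log_likelihood(b, xs, ys) for b in betas]
for b, ll in zip(betas, lls):
    print(f"beta = {b:>2}: log-likelihood = {ll:.6g}")
```

The log-likelihood approaches zero from below but never attains it, which is why the estimated coefficient and its standard error diverge in fitting software when separation is present.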

Conclusions

TP53 (gene/pathway) is a consistent risk signal across methods; NRAS carries high importance in the ensemble (RF) and regularized (LASSO) models. Overall discrimination is modest, consistent with the small sample size; the findings are hypothesis-generating and warrant validation.
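
A rough sense of how much uncertainty a sample of N = 54 leaves around the reported accuracies can be obtained with a Wilson score interval. The sketch below is a back-of-the-envelope check, not part of the reported analysis: the counts of correct classifications are inferred from the rounded percentages in the Results, and the calculation treats each prediction as an independent Bernoulli trial, which is only an approximation for OOB and cross-validated predictions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


N = 54
# Assumption: counts recovered from the rounded accuracies in the abstract.
rf_correct = round(0.593 * N)   # RF OOB accuracy 59.3%
svm_correct = round(0.556 * N)  # SVM 10-fold CV accuracy 55.6%

rf_lo, rf_hi = wilson_interval(rf_correct, N)
svm_lo, svm_hi = wilson_interval(svm_correct, N)
print(f"RF : {rf_correct}/{N}, approx. 95% CI {rf_lo:.3f} to {rf_hi:.3f}")
print(f"SVM: {svm_correct}/{N}, approx. 95% CI {svm_lo:.3f} to {svm_hi:.3f}")
```

Under these assumptions both intervals include 0.5, which is in line with the description of the overall discrimination as modest.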
