Medical Studies
eISSN: 2300-6722
ISSN: 1899-1874
Medical Studies/Studia Medyczne
4/2025
vol. 41
 
abstract:
Original paper

Predictors of distant metastasis in colorectal cancer: a multimodal statistical and machine‑learning analysis

Wojciech Lewitowicz¹, Monika Kozlowska-Geller¹, Monika Wawszczak-Kasza¹, Agnieszka Plusa¹, Marcin Zaremba¹, Karol Romaszko¹, Piotr Lewitowicz¹

¹ Collegium Medicum, Jan Kochanowski University, Kielce, Poland
² Meduniv Sp. z o.o., Kielce, Poland
Medical Studies 2025; 41 (4): 344–349
Online publish date: 2025/12/30
Introduction
Distant metastasis (M1) is the principal determinant of outcome in colorectal cancer (CRC). Identifying robust molecular and clinical predictors at diagnosis may improve risk stratification.

Aim of the research
To develop and compare complementary statistical and machine‑learning models for predicting metastatic status and to synthesize convergent predictors.

Material and methods
In a single-centre cohort (N = 54; M1 = 26, M0 = 28), we modelled a binary outcome, any type of CRC metastasis (M_any), using multivariable logistic regression (LR), Random Forest (RF), LASSO-penalized logistic regression, and Support Vector Machine (SVM). In a secondary analysis, we fitted a multinomial logistic regression with three outcome categories: Early_No_Mets (pT1-2M0), Advanced_No_Mets (pT3-4M0), and Metastatic (M1). Discrimination was summarized as the area under the ROC curve (AUC); classification metrics were obtained with each method's native validation scheme (LR in-sample; RF out-of-bag [OOB]; SVM 10-fold cross-validation [CV]), so they are not directly comparable across models. Reporting follows TRIPOD guidance.
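The two outcome codings described above can be illustrated with a minimal sketch. The function names `m_any` and `three_level`, and the assumption that stages arrive as strings such as "pT3" and "M1a", are hypothetical and are not taken from the paper; only the category labels and their stage definitions come from the abstract.

```python
import re


def m_any(m_stage: str) -> int:
    """Binary outcome: 1 for any distant metastasis (M1, M1a, ...), else 0.

    Assumes the M stage is written as "M0" or "M1..." (hypothetical format).
    """
    return int(m_stage.strip().upper().startswith("M1"))


def three_level(pt_stage: str, m_stage: str) -> str:
    """Three-category outcome used in the secondary multinomial analysis."""
    if m_any(m_stage):
        return "Metastatic"  # M1, regardless of pT
    match = re.search(r"T([1-4])", pt_stage.upper())
    if match is None:
        # Stages outside pT1-4 (e.g. pTis) are not covered by the categories.
        raise ValueError(f"unsupported pT stage: {pt_stage!r}")
    depth = int(match.group(1))
    return "Early_No_Mets" if depth <= 2 else "Advanced_No_Mets"


if __name__ == "__main__":
    for pt, m in [("pT2", "M0"), ("pT4a", "M0"), ("pT3", "M1")]:
        print(pt, m, m_any(m), three_level(pt, m))
```

Under this coding, every M1 case falls into the Metastatic category whatever its pT stage, so the three-level outcome refines only the M0 group.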

Results
LR identified TP53 mutation as the strongest predictor, although the association did not reach conventional statistical significance (OR = 3.47; 95% CI: 0.92–14.5; p = 0.073; AUC = 0.713). RF achieved an OOB error of 40.74% (accuracy = 59.3%); the top-ranked features were NRAS, TP53_pathway, TP53, WNT_pathway, and n_genes. LASSO (10‑fold CV) retained NRAS, TP53, and age (coefficients +1.15, +0.85, and −0.02, respectively). SVM yielded AUC = 0.678 and accuracy = 55.6%. The multinomial model revealed complete separation for NRAS in the Advanced_No_Mets group, precluding standard maximum-likelihood (MLE) inference.
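The reported percentages can be related to whole-case counts, and the LASSO coefficients to odds ratios, with a short back-of-the-envelope sketch. It rests on assumptions not stated in the abstract: that each accuracy figure is computed over all 54 cases (for the SVM, pooled across CV folds), and that the LASSO coefficients are on the log-odds scale, with age presumably per year. The variable and function names are illustrative.

```python
import math

N = 54  # cohort size reported in the abstract


def implied_count(proportion: float, n: int = N) -> int:
    """Nearest whole number of cases matching a reported proportion."""
    return round(proportion * n)


# Random Forest: OOB error 40.74% -> implied misclassified / correct cases
rf_errors = implied_count(0.4074)
rf_correct = N - rf_errors
rf_accuracy_pct = round(100 * rf_correct / N, 1)

# SVM: accuracy 55.6% -> implied correct cases (assumes pooling over all N)
svm_correct = implied_count(0.556)
svm_accuracy_pct = round(100 * svm_correct / N, 1)

# Logistic regression: odds ratio -> coefficient on the log-odds scale
tp53_log_odds = math.log(3.47)

# LASSO: coefficients -> shrunken odds ratios (assumes log-odds scale)
lasso_coefs = {"NRAS": 1.15, "TP53": 0.85, "age": -0.02}
lasso_odds_ratios = {name: math.exp(b) for name, b in lasso_coefs.items()}

if __name__ == "__main__":
    print("RF:", rf_errors, "errors,", rf_correct, "correct,", rf_accuracy_pct, "%")
    print("SVM:", svm_correct, "correct,", svm_accuracy_pct, "%")
    print("TP53 log-odds (LR):", round(tp53_log_odds, 2))
    for name, odds_ratio in lasso_odds_ratios.items():
        print(name, "LASSO OR:", round(odds_ratio, 2))
```

On these assumptions, the RF and SVM accuracies differ by only two correctly classified cases, which helps put the modest discrimination and the small sample size in perspective.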

Conclusions
TP53 (gene/pathway) is a consistent risk signal across methods; NRAS carries high importance in ensemble/regularized models. Overall discrimination is modest, consistent with a small sample size; findings are hypothesis‑generating and warrant validation.

keywords:

colorectal cancer, metastasis, TP53, NRAS, logistic regression, random forest, LASSO, SVM, TRIPOD

© 2026 Termedia Sp. z o.o.