AI Model for Forecasting Early Biochemical Recurrence of Prostate Cancer Following Robotic-Assisted Radical Prostatectomy

August 22, 2025 Alex Parker

The European Association of Urology advises adjuvant radiation therapy post-RP for patients exhibiting adverse pathology⁷. However, a systematic review regarding risk stratification in patients with recurrence after RP has shown no substantial evidence that the proposed risk factors are suitable for tailoring treatment in clinical practice. Additionally, after RP, the only factors with moderate evidence were pT3 stage, Gleason grade 4, surgical margins, and pre-salvage PSA levels over 0.5 ng/ml¹⁸.

There is limited evidence regarding the application of machine learning (ML) tools in evaluating patients with relapses after RP¹⁹. Accurately classifying patients who experience biochemical recurrence (BCR) is crucial, as adjuvant therapies can have adverse effects. Unfortunately, the current prognostic tools exhibit low discriminative power²⁰,²¹.

Wong analyzed 19 variables from 338 patients who underwent RALP, with 25 experiencing BCR over a 12-month follow-up. This analysis resulted in area under the curve (AUC) values of 0.903, 0.924, and 0.940 using the k-nearest neighbors (kNN), random forest (RF), and logistic regression (LR) algorithms, respectively. However, these models were derived from a small database, raising concerns about potential overfitting of the ML algorithms employed²². A similar study by Eski looked at 37 variables in 368 RALP patients, with 73 experiencing BCR during a 35-month follow-up. The authors reported AUCs of 0.93 for kNN, 0.95 for RF, and 0.93 for LR. However, weaknesses in the study design limit the reliability of these results²³.
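To make the AUC figures quoted above concrete, the following is a minimal sketch of how the metric can be computed from predicted scores. It uses the pairwise definition: the AUC is the probability that a randomly chosen patient with BCR receives a higher score than a randomly chosen patient without BCR. The function name `roc_auc` and the toy data are illustrative assumptions, not taken from any of the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for BCR, 0 for no BCR. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 patients with BCR, 4 without.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

With only a few dozen recurrences, each positive case contributes a large share of these pairs, which is one reason AUCs from small cohorts can look optimistic.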

Tan examined 18 variables in 1,130 patients who underwent RALP, with 176 experiencing BCR across a 70-month follow-up. Three ML models were employed to forecast BCR, utilizing data validation and a 70/30 split strategy. The following outcomes were noted: 0.823 precision and an AUC of 0.894 with naïve Bayes, 0.838 precision and an AUC of 0.887 using RF, and 0.810 precision and an AUC of 0.852 with SVM at 60 months post-BCR. Tan also compared statistical risk models with the ML models and found that the ML models performed comparably to Kattan’s nomogram²⁴.

Lastly, Lee conducted an ML study involving 13 variables from 5,114 patients who underwent RP, 1,207 of whom experienced BCR over a 60-month follow-up. The study employed an 80/20 split strategy for the data, producing the following results: 0.719 precision and an AUC of 0.805 using RF, 0.705 precision with an AUC of 0.796 via ANN, and 0.740 precision with an AUC of 0.803 utilizing logistic regression (LR)²⁵.
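The 70/30 and 80/20 strategies mentioned above can be sketched as follows. This example assumes a stratified split, meaning the BCR rate is kept roughly equal in the training and test sets; the cited studies do not state whether their splits were stratified, so that choice, the function name `stratified_split`, and the seed are all assumptions. The cohort sizes mirror the 5,114 patients and 1,207 recurrences reported by Lee, but the labels themselves are synthetic.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # shuffle within each class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels with the same class counts as the cohort described above.
cohort_labels = [1] * 1207 + [0] * (5114 - 1207)
train_idx, test_idx = stratified_split(cohort_labels, test_fraction=0.2)

test_events = sum(cohort_labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_events)
```

Stratifying matters most when events are rare, because a purely random split can leave the test set with too few recurrences to estimate performance reliably.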

The studies mentioned above generally indicate that tree-based algorithms offer solid predictive performance. In this investigation, using RF as the primary model yielded consistent outcomes. Switching to the XGBoost ensemble method improved the performance metrics, making it the best-performing model in this context. XGBoost also outperformed the deep neural network (DNN), likely because of its effective regularization, early stopping, and suitability for structured tabular data with moderate sample sizes.

The capacity to visualize variable importance, as demonstrated in the analysis of the XGBoost-based model, highlights the ability of these algorithms to pinpoint the features that matter most for BCR prediction in prostate cancer patients. It is vital for healthcare professionals to comprehend and trust the model’s decisions in a clinical context. Additionally, the model’s robust performance when validated on a new patient dataset supports its generalizability, making it potentially useful in dynamic clinical settings.

Previous oncology research has underscored the proficiency of tree-based algorithms with ensemble methods in managing intricate datasets and unveiling nonlinear patterns, thereby enhancing the prediction of clinically significant events²⁶,²⁷,²⁸. This could lead to a more precise assessment of the recurrence risk in post-RALP patients with prostate cancer.

In this project, the analysis of decision curves demonstrates the superiority of the AI model built on XGBoost compared to the CAPRA-S model in predicting BCR. While CAPRA-S provides moderate performance, the net clinical advantage of the XGBoost model, as evidenced in the decision curve, indicates that utilizing this model could prevent both overtreatment and undertreatment of patients. These results align with earlier studies showcasing that ML models can exceed traditional models like CAPRA in complex clinical scenarios²⁴. The model’s outputs were categorized into clinically actionable risk groups: > 80% as high risk (necessitating intensified PSA monitoring and early salvage evaluation), < 20% as low risk (standard surveillance), and 20–80% as intermediate risk (individualized follow-up). These thresholds correspond to the net benefit range observed in the decision curve analysis (Fig. 1), where the XGBoost model notably outshone CAPRA-S, especially for patients with recurrence probabilities between 30% and 70%. In this intermediate-risk category, where clinical choices can often be ambiguous, the model offers enhanced stratification, promoting more targeted interventions and potentially mitigating overtreatment.
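For readers unfamiliar with decision curve analysis, the quantity plotted on such a curve is the net benefit at a given threshold probability: the true-positive rate per patient minus the false-positive rate per patient, weighted by the odds of the threshold. The sketch below shows that standard calculation alongside the "treat all" reference strategy. The function names and the ten-patient toy data are illustrative assumptions and do not reproduce the study's data or code.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical toy cohort: 1 = BCR, 0 = no BCR.
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
toy_probs = [0.90, 0.70, 0.60, 0.20, 0.55, 0.10, 0.40, 0.30, 0.35, 0.05]

print(f"{net_benefit(toy_labels, toy_probs, 0.5):.3f}")      # 0.200
print(f"{net_benefit_treat_all(toy_labels, 0.5):.3f}")       # -0.200
```

A decision curve repeats this calculation across a range of thresholds for each model; a model is preferable over the range where its curve lies above both the competing model and the treat-all and treat-none (net benefit of zero) references.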

The primary limitation of this study lies in its retrospective, single-center design, which could constrain the generalizability of the findings to other populations and clinical contexts. Additional challenges include possible over-regularization arising from class sub-sampling during model training, dependence on the accuracy and completeness of electronic health record coding, and the lack of genomic variables and imaging data, which could enhance the model’s predictive capabilities. Nevertheless, these findings reinforce the importance of considering advanced ML strategies for assessing and predicting clinical outcomes in post-RALP patients with BCR, offering a personalized perspective for treatment and counseling. Future research should aim to optimize these models and clarify their limitations in clinical practice. Next steps will involve external validation using public datasets such as TCGA-PRAD, along with implementing transfer learning techniques to analyze model adaptability when integrating molecular or imaging biomarkers. Ongoing collaboration between clinical specialists and data scientists is crucial for achieving effective medical decision-making.

While the model encompassed patients aged 18 to 80 years, certain subgroups, such as older adults, may be underrepresented. This presents potential issues of algorithmic bias, as unequal distributions of demographic or clinical characteristics might impact prediction reliability across varied populations. Future research should prioritize diverse and representative samples to ensure equitable clinical applicability of AI-based models.

The clinical relevance of our model for predicting BCR following RP is enhanced by incorporating postoperative Gleason scores, lymphovascular invasion, and tumor percentage, all of which have shown strong connections to prostate cancer prognosis. These predictors not only boost the model’s interpretability but also align with prevailing clinical assessment protocols, making the model’s results readily actionable for healthcare providers. The XGBoost model generates individualized probabilities for BCR in the first 24 months post-surgery. In practice, this probability can categorize patients into higher and lower BCR risk groups. A higher predicted risk advocates for more proactive management, including earlier consideration for salvage radiotherapy before significant PSA increases, quarterly PSA monitoring instead of bi-annual, and shared-decision counseling concerning adjuvant androgen deprivation therapy or clinical trial enrollment. Conversely, a lower predicted risk warrants a more conservative approach, scheduling less frequent imaging and laboratory follow-ups to lessen patient burden and circumvent unnecessary interventions. As the score is derived from clinicopathological variables that are already present in the electronic health record, it can be automatically generated at discharge and shown alongside established tools like CAPRA-S, providing clinicians with additional data-driven support for personalizing postoperative care.
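As an illustration of how a predicted probability could be translated into the risk groups described in the decision curve discussion (> 80% high, < 20% low, 20–80% intermediate), the following minimal sketch maps a probability to a group and its associated follow-up. The function name `risk_group` and the dictionary are hypothetical; the thresholds and follow-up descriptions are the ones stated in this article, and boundary values of exactly 20% and 80% are treated as intermediate, which is our reading of the stated ranges.

```python
FOLLOW_UP = {
    "high": "intensified PSA monitoring and early salvage evaluation",
    "intermediate": "individualized follow-up",
    "low": "standard surveillance",
}


def risk_group(prob):
    """Map a predicted BCR probability (0 to 1) to a risk group."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if prob > 0.80:
        return "high"
    if prob < 0.20:
        return "low"
    return "intermediate"


for p in (0.05, 0.45, 0.92):
    group = risk_group(p)
    print(f"{p:.2f} -> {group}: {FOLLOW_UP[group]}")
```

Any such mapping is a decision aid only; the final management plan remains a clinical judgment.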

To promote real-world implementation, we have developed a prototype web-based risk calculator utilizing Python and Streamlit (Online Appendix 3, Fig. 1). This tool is actively used internally by physicians in our Urology Department for prospective validation in new post-RARP patients. This pilot application facilitates real-time risk estimation at the point of care and enables comparisons between model predictions and actual outcomes to iteratively evaluate clinical utility.
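One simple way to compare predictions with observed outcomes during prospective validation is the Brier score, the mean squared difference between the predicted probability and the actual outcome (lower is better). The article does not specify which metric the pilot uses, so the choice of the Brier score, the function name, and the four example records below are assumptions for illustration only.

```python
def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# Hypothetical prospective log: predicted BCR probability vs. observed BCR.
logged_predictions = [0.90, 0.20, 0.60, 0.10]
logged_outcomes = [1, 0, 0, 0]

print(f"{brier_score(logged_predictions, logged_outcomes):.3f}")  # 0.105
```

Recomputing such a score as new outcomes accrue gives a running check on whether the calculator's probabilities remain well calibrated in practice.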

Further research across varied populations and clinical environments is critical to authenticate the model’s robustness and broaden its applicability in diverse clinical contexts.
