Development and External Validation of a Machine Learning–Based Risk Score for Stent Outcomes in Post–Bariatric Leak Management: The “Alexandria-Bari-Stent” Tool – HRI

No prediction models exist for stent outcomes in leaks after metabolic and bariatric surgery (MBS). This study developed an artificial intelligence–based model to predict stent failure in post-MBS leaks.

A prospectively maintained database of patients with post-MBS leaks was used for model development (Center I, N = 250); external validation used patients from another hospital (Center II, N = 150). The outcome was failure of the first (primary/initial) stent implantation to resolve the leak, i.e., lack of primary closure. Variables were ranked, 11 machine learning algorithms were tested, and the best-performing model was selected. A point-based stent failure risk scoring system was then derived and assessed through external validation, calibration, and decision curve analysis.
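The calibration and decision curve steps named above rest on two standard quantities: the Brier score and net benefit. The following is a minimal sketch of how they are commonly computed, not the authors' code; the function names and the five-patient toy data are hypothetical and are not taken from the study.

```python
# Illustrative sketch only: standard definitions of the Brier score and of
# net benefit as used in decision curve analysis. All names and the toy data
# below are hypothetical; none of it comes from the study.

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' reference strategy ('treat none' is 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = stent failure, 0 = success.
y_true = [1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1]

print(f"Brier score: {brier_score(y_true, y_prob):.3f}")
for pt in (0.1, 0.3, 0.5, 0.8):
    model = net_benefit(y_true, y_prob, pt)
    treat_all = net_benefit_treat_all(y_true, pt)
    print(f"threshold {pt:.2f}: model {model:.3f}, treat all {treat_all:.3f}, treat none 0.000")
```

A decision curve plots the model's net benefit against the "treat all" and "treat none" strategies over a range of threshold probabilities; the model is useful at a given threshold when its curve lies above both.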

The development cohort (training sample, Center I) had 27.6% failed stents and 72.4% successes; the external validation cohort (Center II) had 30% failures and 70% successes. The Lasso logistic regression model exhibited the best performance. Eight variables contributed to the model’s predictive performance (obstructive sleep apnea, hypertension, diabetes, hepatomegaly, hyperlipidemia, body mass index, Niti-S18 stent, gastrojejunal anastomosis leak), and nine others had varying contributions (revisional surgery, Niti-S23 stent, time to stent implantation, leak size > 1 cm, age, Roux-en-Y gastric bypass surgery, esophagogastric junction leak, Hanaro 21 stent, male sex). In the clinical point-based stent failure risk system, scores ≤ 7 indicated very low failure risk (<1%), scores 8–47 low risk (1–5%), scores 48–77 moderate risk (5.1–15%), and scores 78–117 high risk (15.1–50%); scores ≥ 198 were associated with extremely high failure risk (>96%). On external validation, the model showed excellent discriminatory power between patients with and without the outcome, with an area under the ROC curve of 0.85 (95% CI: 0.76–0.93), 80% sensitivity (95% CI: 65.4–90.4%), 82.9% specificity (95% CI: 74.3–89.5%), and 66.7% positive predictive value (95% CI: 52.4–79.0%). The negative predictive value was 90.6% (95% CI: 82.9–95.6%), indicating that the model was particularly effective at identifying patients unlikely to fail. The area under the precision-recall curve was 0.81 (95% CI: 0.70–0.89), indicating strong performance in identifying true positives while minimizing false positives. Calibration was acceptable (Brier score = 0.15). Decision curve analysis showed that using the model in clinical decision-making gave higher net benefit than treating all patients or treating none across a broad range of threshold probabilities (0.10–0.80).
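The external validation figures above can be cross-checked with simple arithmetic. The sketch below rebuilds a 2×2 table from the reported cohort size, failure rate, sensitivity, and specificity, then recomputes the predictive values. The cell counts are inferred by rounding and are an assumption; the abstract does not report them. The `risk_band` helper is likewise hypothetical and assumes integer scores: it encodes only the score bands listed above and returns `None` for scores of 118–197, for which no band is stated here.

```python
# Illustrative consistency check. The 2x2 counts are inferred by rounding
# from the reported percentages; they are NOT reported in the abstract.

N = 150               # external validation cohort (Center II)
FAILURE_RATE = 0.30   # reported proportion of failed stents
SENSITIVITY = 0.80    # reported
SPECIFICITY = 0.829   # reported

failures = round(N * FAILURE_RATE)       # patients whose first stent failed
successes = N - failures
tp = round(SENSITIVITY * failures)       # failures flagged by the model
fn = failures - tp
tn = round(SPECIFICITY * successes)      # successes correctly not flagged
fp = successes - tn

ppv = tp / (tp + fp)   # positive predictive value
npv = tn / (tn + fn)   # negative predictive value
print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"PPV={ppv:.3f} NPV={npv:.3f}")

# Score bands exactly as listed above: (lowest score, highest score, label).
RISK_BANDS = [
    (None, 7, "very low"),
    (8, 47, "low"),
    (48, 77, "moderate"),
    (78, 117, "high"),
    (198, None, "extremely high"),
]


def risk_band(score):
    """Return the stated risk band for an integer score, or None if no band is stated."""
    for low, high, label in RISK_BANDS:
        if (low is None or score >= low) and (high is None or score <= high):
            return label
    return None  # e.g. scores 118-197: band not given in this abstract


print(risk_band(60))
print(risk_band(150))
```

Under these inferred counts, the recomputed predictive values round to the reported 66.7% and 90.6%, so the published metrics are internally consistent with a 30% failure rate among 150 patients.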