Expert Surgeons and Deep Learning Models Can Predict the Outcome of Surgical Hemorrhage from One Minute of Video

Dhiraj J Pangal, Guillaume Kugener, Yichao Zhu, Aditya Sinha, Vyom Unadkat, David J Cote, Ben Strickland, Martin Rutkowski, Andrew Hung, Animashree Anandkumar, X.Y. Han, Vardan Papyan, Bozena Wrobel , Gabriel Zada, Daniel A Donoho

Preprint in medRxiv
DOI: https://doi.org/10.1101/2022.01.22.22269640

Abstract

Background: Major vascular injury resulting in uncontrolled bleeding is a catastrophic and often fatal complication of minimally invasive surgery. At the outset of these events, surgeons do not know how much blood will be lost or whether they will successfully control the hemorrhage (achieve hemostasis). We evaluate the ability of a deep learning neural network (DNN) to predict hemostasis control ability using the first minute of surgical video and compare model performance with human experts viewing the same video.

Methods: The publicly available SOCAL dataset contains 147 videos of attending and resident surgeons managing hemorrhage in a validated, high-fidelity cadaveric simulator. Videos are labeled with outcome and blood loss (mL). The first minute of 20 videos was shown to four, blinded, fellowship trained skull-base neurosurgery instructors, and to SOCALNet (a DNN trained on SOCAL videos). SOCALNet architecture included a convolutional network (ResNet) identifying spatial features and a recurrent network identifying temporal features (LSTM). Experts independently assessed surgeon skill, predicted outcome and blood loss (mL). Outcome and blood loss predictions were compared with SOCALNet.

Results: Expert inter-rater reliability was 0.95. Experts correctly predicted 14/20 trials (Sensitivity: 82%, Specificity: 55%, Positive Predictive Value (PPV): 69%, Negative Predictive Value (NPV): 71%). SOCALNet correctly predicted 17/20 trials (Sensitivity 100%, Specificity 66%, PPV 79%, NPV 100%) and correctly identified all successful attempts.

Expert predictions of the highest and lowest skill surgeons and expert predictions reported with maximum confidence were more accurate. Experts systematically underestimated blood loss (mean error −131 mL, RMSE 350 mL, R² 0.70) and fewer than half of expert predictions identified blood loss > 500mL (47.5%, 19/40). SOCALNet had superior performance (mean error −57 mL, RMSE 295mL, R² 0.74) and detected most episodes of blood loss > 500mL (80%, 8/10).

In validation experiments, SOCALNet evaluation of a critical on-screen surgical maneuver and high/low-skill composite videos were concordant with expert evaluation.

Conclusion: Using only the first minute of video, experts and SOCALNet can predict outcome and blood loss during surgical hemorrhage. Experts systematically underestimated blood loss, and SOCALNet had no false negatives. DNNs can provide accurate, meaningful assessments of surgical video. We call for the creation of datasets of surgical adverse events for quality improvement research.