Purpose: Modeling and recognition of surgical activities pose an interesting research problem. Although a number of recent works have studied automatic recognition of surgical activities, the generalizability of these works across different tasks and datasets remains a challenge. We introduce a modality that is robust to scene variation, based on spatial temporal graph representations of surgical tools in videos, for surgical activity recognition. Methods: To show its effectiveness, we model and recognize surgical gestures with the proposed modality. We construct spatial graphs connecting the joint pose estimations of surgical tools. Then, we connect each joint to the corresponding joint in the consecutive frames, forming inter-frame edges that represent the trajectory of the joint over time. We then learn hierarchical spatial temporal graph representations using Spatial Temporal Graph Convolutional Networks (ST-GCN). Results: Our experimental results show that learned spatial temporal graph representations of surgical videos perform well in surgical gesture recognition even when used individually. We experiment with the Suturing task of the JIGSAWS dataset, where the chance baseline for gesture recognition is 10%. Our results demonstrate 68% average accuracy, which suggests a significant improvement over this baseline. Conclusions: Our experimental results show that our model learns meaningful representations. These learned representations can be used individually, in cascades, or as a complementary modality in surgical activity recognition, and therefore provide a benchmark. To our knowledge, our paper is the first to use spatial temporal graph representations based on pose estimations of surgical tools in surgical activity recognition.
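The graph construction described in the Methods above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `build_st_graph_edges`, the `(frame, joint)` node encoding, and the four-joint tool skeleton are all assumptions made for the example, since the abstract does not specify the joint layout. The sketch covers only the edge construction, not the ST-GCN model that learns from the graph.

```python
from typing import List, Tuple

Node = Tuple[int, int]          # (frame index, joint index); hypothetical encoding
Edge = Tuple[Node, Node]


def build_st_graph_edges(
    num_frames: int,
    num_joints: int,
    skeleton: List[Tuple[int, int]],
) -> Tuple[List[Edge], List[Edge]]:
    """Return (spatial_edges, temporal_edges) for a spatial temporal graph.

    Spatial edges connect joints of the tool within one frame, following
    `skeleton`. Temporal (inter-frame) edges connect each joint to the same
    joint in the next frame, tracing that joint's trajectory over time.
    """
    for i, j in skeleton:
        if not (0 <= i < num_joints and 0 <= j < num_joints):
            raise ValueError(f"skeleton edge ({i}, {j}) is out of range")

    spatial: List[Edge] = []
    temporal: List[Edge] = []
    for t in range(num_frames):
        # Intra-frame edges: one copy of the skeleton per frame.
        for i, j in skeleton:
            spatial.append(((t, i), (t, j)))
        # Inter-frame edges: same joint, consecutive frames.
        if t + 1 < num_frames:
            for j in range(num_joints):
                temporal.append(((t, j), (t + 1, j)))
    return spatial, temporal


# Assumed toy skeleton: joint 1 is a hinge linked to a shaft and two jaw tips.
TOY_SKELETON = [(0, 1), (1, 2), (1, 3)]
spatial_edges, temporal_edges = build_st_graph_edges(3, 4, TOY_SKELETON)
```

With 3 frames and 4 joints, this yields one copy of the skeleton per frame and one inter-frame edge per joint between each pair of consecutive frames. A graph convolutional model would typically consume these edges as an adjacency structure over the pose estimates.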
The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and the quality (control), the Biomedical Image Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include in their submission when submitting a paper on a challenge for review. The purpose of the checklist is to standardize and facilitate the review process and to raise the interpretability and reproducibility of challenge results by making relevant information explicit.
Objective: A median of 14.4% of patients undergo at least one adverse event during surgery, and a third of these events are preventable. The occurrence of adverse events forces surgeons to implement corrective strategies and, thus, deviate from the standard surgical process. Therefore, the automatic identification of adverse events is a major challenge for patient safety. In this paper, we propose a method for identifying such deviations. We focus on identifying surgeons' deviations from standard surgical processes due to surgical events rather than anatomic specificities. This is particularly challenging, given the high variability in typical surgical procedure workflows. Methods: We introduce a new approach designed to automatically detect and distinguish surgical process deviations based on multi-dimensional non-linear temporal scaling with a hidden semi-Markov model, using manual annotation of surgical processes. The approach was then evaluated using cross-validation. Results: The best results exceed 90% accuracy. Recall and precision were above 70%. We provide a detailed analysis of the incorrectly detected observations. Conclusion: Multi-dimensional non-linear temporal scaling with a hidden semi-Markov model provides promising results for detecting deviations. Our error analysis of the incorrectly detected observations offers several leads for further improving our method. Significance: Our method demonstrates the feasibility of automatically detecting surgical deviations, which could be applied both to skill analysis and to developing situation awareness-based computer-assisted surgical systems.
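The Results above report accuracy, recall, and precision. As a reminder of how these three figures relate for a binary "deviation / no deviation" decision, here is a minimal sketch. The function name `deviation_metrics` and the toy label sequences are hypothetical and are not data from the study. The abstract does not say how the metrics were aggregated across cross-validation folds or deviation types, so this shows only the basic per-observation definitions.

```python
from typing import Dict, Sequence


def deviation_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels (1 = deviation)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("label sequences must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # deviations found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed deviations
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # standard steps kept

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Conventionally reported as 0.0 when the denominator is empty.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative toy labels only (not results from the paper).
toy_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
toy_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
toy_scores = deviation_metrics(toy_true, toy_pred)
```

Because deviations are usually rarer than standard steps, accuracy can sit well above precision and recall, which is consistent with the gap between the reported figures.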
In this paper, we address the open research problem of surgical gesture recognition using motion cues from video data only. We adapt the Optical flow ConvNets initially proposed by Simonyan et al. While Simonyan et al. use both RGB frames and dense optical flow, we use only dense optical flow representations as input to emphasize the role of motion in surgical gesture recognition, and present it as a robust alternative to kinematic data. We also overcome one of the limitations of Optical flow ConvNets by initializing our model with cross-modality pre-training. A large number of promising studies that address surgical gesture recognition rely heavily on kinematic data, which requires additional recording devices. To our knowledge, this is the first paper that addresses surgical gesture recognition using dense optical flow information only. We achieve competitive results on the JIGSAWS dataset. Moreover, our model achieves more robust results with a lower standard deviation, which suggests that optical flow information can be used as an alternative to kinematic data for the recognition of surgical gestures.
Lack of training data hinders the automatic recognition and prediction of surgical activities necessary for situation-aware operating rooms. We propose using knowledge transfer to compensate for the data deficit and improve prediction. We used two approaches to extract and transfer surgical process knowledge. First, we encoded semantic information about surgical terms using word embeddings, which boosted the learning process. Second, we passed knowledge between different clinical datasets of neurosurgical procedures using transfer learning. Transfer learning was shown to be more effective than a simple combination of data, especially for less similar procedures. The combination of the two methods provided a 22% improvement in activity prediction. We also made several pertinent observations about surgical practices.