Abstract:Understanding medical ultrasound imaging remains a long-standing challenge due to significant visual variability caused by differences in imaging and acquisition parameters. Recent advancements in large language models (LLMs) have been used to automatically generate terminology-rich summaries orientated to clinicians with sufficient physiological knowledge. Nevertheless, the increasing demand for improved ultrasound interpretability and basic scanning guidance among non-expert users, e.g., in point-of-care settings, has not yet been explored. In this study, we first introduce the scene graph (SG) for ultrasound images to explain image content to ordinary and provide guidance for ultrasound scanning. The ultrasound SG is first computed using a transformer-based one-stage method, eliminating the need for explicit object detection. To generate a graspable image explanation for ordinary, the user query is then used to further refine the abstract SG representation through LLMs. Additionally, the predicted SG is explored for its potential in guiding ultrasound scanning toward missing anatomies within the current imaging view, assisting ordinary users in achieving more standardized and complete anatomical exploration. The effectiveness of this SG-based image explanation and scanning guidance has been validated on images from the left and right neck regions, including the carotid and thyroid, across five volunteers. The results demonstrate the potential of the method to maximally democratize ultrasound by enhancing its interpretability and usability for ordinaries.
Abstract:Due to its non-invasive and painless characteristics, wireless capsule endoscopy has become the new gold standard for assessing gastrointestinal disorders. Omissions, however, could occur throughout the examination since controlling capsule endoscope can be challenging. In this work, we control the magnetic capsule endoscope for the coverage scanning task in the stomach based on reinforcement learning so that the capsule can comprehensively scan every corner of the stomach. We apply a well-made virtual platform named VR-Caps to simulate the process of stomach coverage scanning with a capsule endoscope model. We utilize and compare two deep reinforcement learning algorithms, the Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) algorithms, to train the permanent magnetic agent, which actuates the capsule endoscope directly via magnetic fields and then optimizes the scanning efficiency of stomach coverage. We analyze the pros and cons of the two algorithms with different hyperparameters and achieve a coverage rate of 98.04% of the stomach area within 150.37 seconds.