Helen Mei-Ling Meng

Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion

Jun 05, 2025

Hongyu Wang, Yonghao Long, Yueyao Chen, Hon-Chi Yip, Markus Scheppach, Philip Wai-Yan Chiu, Yeung Yam, Helen Mei-Ling Meng, Qi Dou

Abstract: Endoscopic Submucosal Dissection (ESD) is a well-established technique for removing epithelial lesions. Predicting dissection trajectories in ESD videos offers significant potential for enhancing surgical skill training and simplifying the learning process, yet this area remains underexplored. While imitation learning has shown promise in acquiring skills from expert demonstrations, challenges persist in handling uncertain future movements, learning geometric symmetries, and generalizing to diverse surgical scenarios. To address these, we introduce a novel approach: Implicit Diffusion Policy with Equivariant Representations for Imitation Learning (iDPOE). Our method models expert behavior through a joint state-action distribution, capturing the stochastic nature of dissection trajectories and enabling robust visual representation learning across various endoscopic views. By incorporating a diffusion model into policy learning, iDPOE ensures efficient training and sampling, leading to more accurate predictions and better generalization. Additionally, we enhance the model's ability to generalize to geometric symmetries by embedding equivariance into the learning process. To address state mismatches, we develop a forward-process guided action inference strategy for conditional sampling. Using an ESD video dataset of nearly 2000 clips, experimental results show that our approach surpasses state-of-the-art methods, both explicit and implicit, in trajectory prediction. To the best of our knowledge, this is the first application of imitation learning to surgical skill development for dissection trajectory prediction.
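
The abstract refers to a diffusion model and a forward process without giving formulas. As a rough illustration only, the sketch below shows the standard DDPM-style forward noising step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a trajectory of 2D waypoints. It is not the iDPOE method: the linear beta schedule, the waypoint representation, and all function names are assumptions made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_trajectory(trajectory, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a list of (x, y) waypoints (hypothetical format)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        (signal_scale * x + noise_scale * rng.gauss(0.0, 1.0),
         signal_scale * y + noise_scale * rng.gauss(0.0, 1.0))
        for x, y in trajectory
    ]


# Usage: noise a short made-up trajectory at an intermediate step.
alpha_bars = cumulative_alphas(linear_beta_schedule(100))
clean = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0)]
noisy = noise_trajectory(clean, t=50, alpha_bars=alpha_bars, rng=random.Random(0))
```

As t grows, alpha_bar_t shrinks toward zero, so the noised trajectory carries less of the original signal. A diffusion policy would learn to reverse this corruption, conditioned on the visual state.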

Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space

Nov 17, 2022

Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng

Abstract: The challenges in applying contrastive learning to speaker verification (SV) are that the softmax-based contrastive loss lacks discriminative power and that hard negative pairs can easily influence learning. To overcome these challenges, we propose a contrastive learning SV framework incorporating an additive angular margin into the supervised contrastive loss. The margin improves the speaker representation's discrimination ability. We introduce a class-aware attention mechanism through which hard negative samples contribute less significantly to the supervised contrastive loss. We also employ a gradient-based multi-objective optimization approach to balance the classification and contrastive losses. Experimental results on CN-Celeb and VoxCeleb1 show that this new learning objective leads the encoder to find an embedding space that exhibits strong speaker discrimination across languages.

* Submitted to ICASSP 2023, 5 pages, 2 figures
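
The abstract describes adding an additive angular margin to a supervised contrastive loss but does not give the exact formulation. The sketch below shows one plausible reading, for illustration only: for each anchor and each same-speaker positive, the angle between them is increased by a margin before the softmax, which makes positives harder to satisfy. The function names, the margin and temperature values, and the clamping at pi are assumptions. The class-aware attention and the multi-objective optimization mentioned in the abstract are not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def aam_supcon_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Supervised contrastive loss with an additive angular margin on positives.

    Hypothetical formulation: the paper's exact loss may differ.
    """
    n = len(embeddings)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        anchor_loss = 0.0
        for p in positives:
            cos_ip = max(-1.0, min(1.0, cosine(embeddings[i], embeddings[p])))
            # Add the margin to the angle; clamp at pi so cos stays monotonic.
            theta = min(math.acos(cos_ip) + margin, math.pi)
            pos_logit = math.cos(theta) / temperature
            other_logits = [
                cosine(embeddings[i], embeddings[a]) / temperature
                for a in range(n) if a != i and a != p
            ]
            anchor_loss += logsumexp([pos_logit] + other_logits) - pos_logit
        total += anchor_loss / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Usage: four toy 2D "speaker embeddings" from two speakers.
toy_embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
toy_labels = [0, 0, 1, 1]
loss_with_margin = aam_supcon_loss(toy_embeddings, toy_labels, margin=0.2)
loss_without_margin = aam_supcon_loss(toy_embeddings, toy_labels, margin=0.0)
```

With the margin set to zero this reduces to a plain supervised contrastive loss over cosine similarities. A positive margin lowers each positive logit, so the same embeddings incur a higher loss and the encoder is pushed toward tighter same-speaker clusters.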
