Zekai Liang

Real-time Rendering-based Surgical Instrument Tracking via Evolutionary Optimization

Mar 12, 2026

Hanyang Hu, Zekai Liang, Florian Richter, Michael C. Yip

Abstract: Accurate and efficient tracking of surgical instruments is fundamental for Robot-Assisted Minimally Invasive Surgery. Although vision-based robot pose estimation has enabled markerless calibration without tedious physical setups, reliable tool tracking for surgical robots remains challenging due to partial visibility and the specialized articulation design of surgical instruments. Previous works are prone to unreliable feature detections under degraded visual quality and data scarcity, whereas rendering-based methods often struggle with computational cost and suboptimal convergence. In this work, we incorporate CMA-ES, an evolutionary optimization strategy, into a versatile tracking pipeline that jointly estimates surgical instrument pose and joint configurations. By using batch rendering to evaluate multiple pose candidates in parallel, the method significantly reduces inference time and improves convergence robustness. The proposed framework further generalizes to joint angle-free and bi-manual tracking settings, making it suitable for both vision feedback control and online surgery video calibration. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method significantly outperforms prior approaches in both accuracy and runtime.
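The abstract describes scoring a population of pose candidates per generation and updating a search distribution from the best ones. The sketch below is only an illustration of that loop under strong simplifications: it uses a cross-entropy-style update with a diagonal Gaussian rather than CMA-ES's full covariance and step-size adaptation, and the "renderer" is a hypothetical planar two-link tool, not the paper's batch renderer. All names (`render_points`, `batch_loss`, `evolve`) and parameter values are assumptions made for the example.

```python
import math
import random
import statistics


def render_points(p):
    """Toy stand-in for a renderer: p = (x, y, theta, joint) -> three 2D points."""
    x, y, theta, joint = p
    elbow = (x + math.cos(theta), y + math.sin(theta))
    tip = (elbow[0] + 0.5 * math.cos(theta + joint),
           elbow[1] + 0.5 * math.sin(theta + joint))
    return [(x, y), elbow, tip]


def loss(p, observed):
    """Mean squared distance between rendered and observed points."""
    pts = render_points(p)
    return sum((a - c) ** 2 + (b - d) ** 2
               for (a, b), (c, d) in zip(pts, observed)) / len(observed)


def batch_loss(candidates, observed):
    # Assumption: a real pipeline would score every candidate in one
    # batched render call; here a plain loop plays that role.
    return [loss(c, observed) for c in candidates]


def evolve(observed, init, sigma=0.5, pop=64, elite=16, iters=80, seed=0):
    """Simplified evolution strategy (not full CMA-ES): sample, rank, refit."""
    rng = random.Random(seed)
    mean = list(init)
    std = [sigma] * len(mean)
    for _ in range(iters):
        # Sample a population around the current mean.
        cands = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                 for _ in range(pop)]
        scores = batch_loss(cands, observed)
        # Keep the lowest-loss candidates and refit mean and per-axis spread.
        ranked = sorted(zip(scores, cands), key=lambda sc: sc[0])
        best = [c for _, c in ranked[:elite]]
        cols = list(zip(*best))
        mean = [statistics.fmean(col) for col in cols]
        std = [statistics.pstdev(col) + 1e-6 for col in cols]
    return mean, loss(mean, observed)


# Hypothetical ground-truth pose and joint angle used to create observations.
true_params = (0.3, -0.2, 0.7, 0.4)
observed = render_points(true_params)
estimate, final_loss = evolve(observed, init=(0.0, 0.0, 0.0, 0.0))
print(estimate, final_loss)
```

Because every candidate in a generation is scored independently, the scoring step is the natural place to batch work, which is the property the abstract attributes to batch rendering.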

Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments

Mar 07, 2025

Zekai Liang, Zih-Yun Chiu, Florian Richter, Michael C. Yip

Abstract: Robot pose estimation is a challenging and crucial task for vision-based surgical robotic automation. Typical robotic calibration approaches, however, are not applicable to surgical robots, such as the da Vinci Research Kit (dVRK), due to joint angle measurement errors from cable drives and the partially visible kinematic chain. Hence, previous works in surgical robotic automation used tracking algorithms to estimate the pose of the surgical tool in real time and compensate for the joint angle errors. However, a major limitation of these tracking works is the initialization step, which relied only on keypoints and SolvePnP. In this work, we use differentiable rendering to fully explore the potential of geometric primitives beyond keypoints, such as cylinders, and construct a versatile pose matching pipeline in a novel pose hypothesis space. We demonstrate the state-of-the-art performance of our single-shot calibration method in terms of both calibration consistency and real surgical tasks. As a result, this markerless calibration approach proves to be a robust and generalizable initialization step for surgical tool tracking.
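The abstract mentions initialization from keypoints and SolvePnP. As a rough illustration only (not the authors' code), a PnP solver looks for the pose that minimizes the reprojection error between known 3D keypoints and their detected 2D pixels. The sketch below computes that error for a given pose under a distortion-free pinhole model; the function names, the intrinsics, and the keypoint coordinates are all hypothetical values chosen for the example.

```python
import math


def project(point, R, t, fx, fy, cx, cy):
    """Project one 3D point (object frame) to pixels with a pinhole camera."""
    # Transform into the camera frame: X_cam = R @ X_obj + t.
    X, Y, Z = (sum(R[i][j] * point[j] for j in range(3)) + t[i]
               for i in range(3))
    return (fx * X / Z + cx, fy * Y / Z + cy)


def reprojection_rmse(points, pixels, R, t, fx, fy, cx, cy):
    """Root-mean-square pixel distance between projections and detections."""
    sq = []
    for p, (pu, pv) in zip(points, pixels):
        u, v = project(p, R, t, fx, fy, cx, cy)
        sq.append((u - pu) ** 2 + (v - pv) ** 2)
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical setup: four coplanar keypoints, identity rotation, 1 m depth.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
keypoints = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
true_t = (0.0, 0.0, 1.0)
detections = [project(p, IDENTITY, true_t, **INTRINSICS) for p in keypoints]

# A pose hypothesis offset by 1 cm along x scores worse than the true pose.
print(reprojection_rmse(keypoints, detections, IDENTITY, true_t, **INTRINSICS))
print(reprojection_rmse(keypoints, detections, IDENTITY, (0.01, 0.0, 1.0),
                        **INTRINSICS))
```

With only a handful of keypoints, a few bad detections can dominate this error, which is consistent with the abstract's motivation for using richer geometric primitives.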

CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera

Sep 16, 2024

Jingpei Lu, Zekai Liang, Tristin Xie, Florian Richter, Shan Lin, Sainan Liu, Michael C. Yip

Abstract: Camera-to-robot calibration is crucial for vision-based robot control, and making it accurate requires effort. Recent advances in markerless pose estimation have eliminated the need for time-consuming physical setups for camera-to-robot calibration. While existing markerless pose estimation methods have demonstrated impressive accuracy without cumbersome setups, they assume that all robot joints are visible within the camera's field of view. In practice, however, robots usually move in and out of view, and some portion of the robot may stay out of frame during the whole manipulation task due to real-world constraints, leading to a lack of sufficient visual features and the subsequent failure of these approaches. To address this challenge and enhance applicability to vision-based robot control, we propose a novel framework capable of estimating the robot pose for partially visible robot manipulators. Our approach leverages Vision-Language Models for fine-grained robot component detection and integrates them into a keypoint-based pose estimation network, which enables more robust performance in varied operational conditions. The framework is evaluated on both public robot datasets and self-collected partial-view datasets to demonstrate its robustness and generalizability. As a result, this method is effective for robot pose estimation in a wider range of real-world manipulation scenarios.

* 7 pages, 5 figures, project website: https://sites.google.com/ucsd.edu/ctrnet-x
