Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

John S. Zelek

University of Waterloo

DreamPose3D: Hallucinative Diffusion with Prompt Learning for 3D Human Pose Estimation

Nov 12, 2025

Jerrin Bright, Yuhao Chen, John S. Zelek

Abstract:Accurate 3D human pose estimation remains a critical yet unresolved challenge, requiring both temporal coherence across frames and fine-grained modeling of joint relationships. However, most existing methods rely solely on geometric cues and predict each 3D pose independently, which limits their ability to resolve ambiguous motions and generalize to real-world scenarios. Inspired by how humans understand and anticipate motion, we introduce DreamPose3D, a diffusion-based framework that combines action-aware reasoning with temporal imagination for 3D pose estimation. DreamPose3D dynamically conditions the denoising process using task-relevant action prompts extracted from 2D pose sequences, capturing high-level intent. To model the structural relationships between joints effectively, we introduce a representation encoder that incorporates kinematic joint affinity into the attention mechanism. Finally, a hallucinative pose decoder predicts temporally coherent 3D pose sequences during training, simulating how humans mentally reconstruct motion trajectories to resolve ambiguity in perception. Extensive experiments on benchmarked Human3.6M and MPI-3DHP datasets demonstrate state-of-the-art performance across all metrics. To further validate DreamPose3D's robustness, we tested it on a broadcast baseball dataset, where it demonstrated strong performance despite ambiguous and noisy 2D inputs, effectively handling temporal consistency and intent-driven motion variations.

Via

Access Paper or Ask Questions

SAGOnline: Segment Any Gaussians Online

Aug 11, 2025

Wentao Sun, Quanyun Wu, Hanqing Xu, Kyle Gao, Zhengsen Xu, Yiping Chen, Dedong Zhang, Lingfei Ma, John S. Zelek, Jonathan Li

Abstract:3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for explicit 3D scene representation, yet achieving efficient and consistent 3D segmentation remains challenging. Current methods suffer from prohibitive computational costs, limited 3D spatial reasoning, and an inability to track multiple objects simultaneously. We present Segment Any Gaussians Online (SAGOnline), a lightweight and zero-shot framework for real-time 3D segmentation in Gaussian scenes that addresses these limitations through two key innovations: (1) a decoupled strategy that integrates video foundation models (e.g., SAM2) for view-consistent 2D mask propagation across synthesized views; and (2) a GPU-accelerated 3D mask generation and Gaussian-level instance labeling algorithm that assigns unique identifiers to 3D primitives, enabling lossless multi-object tracking and segmentation across views. SAGOnline achieves state-of-the-art performance on NVOS (92.7% mIoU) and Spin-NeRF (95.2% mIoU) benchmarks, outperforming Feature3DGS, OmniSeg3D-gs, and SA3D by 15--1500 times in inference speed (27 ms/frame). Qualitative results demonstrate robust multi-object segmentation and tracking in complex scenes. Our contributions include: (i) a lightweight and zero-shot framework for 3D segmentation in Gaussian scenes, (ii) explicit labeling of Gaussian primitives enabling simultaneous segmentation and tracking, and (iii) the effective adaptation of 2D video foundation models to the 3D domain. This work allows real-time rendering and 3D scene understanding, paving the way for practical AR/VR and robotic applications.

* 19 pages, 10 figures

Via

Access Paper or Ask Questions

PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting

Aug 01, 2025

Wentao Sun, Hanqing Xu, Quanyun Wu, Dedong Zhang, Yiping Chen, Lingfei Ma, John S. Zelek, Jonathan Li

Figure 1 for PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting

Figure 2 for PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting

Figure 3 for PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting

Figure 4 for PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting

Abstract:We introduce PointGauss, a novel point cloud-guided framework for real-time multi-object segmentation in Gaussian Splatting representations. Unlike existing methods that suffer from prolonged initialization and limited multi-view consistency, our approach achieves efficient 3D segmentation by directly parsing Gaussian primitives through a point cloud segmentation-driven pipeline. The key innovation lies in two aspects: (1) a point cloud-based Gaussian primitive decoder that generates 3D instance masks within 1 minute, and (2) a GPU-accelerated 2D mask rendering system that ensures multi-view consistency. Extensive experiments demonstrate significant improvements over previous state-of-the-art methods, achieving performance gains of 1.89 to 31.78% in multi-view mIoU, while maintaining superior computational efficiency. To address the limitations of current benchmarks (single-object focus, inconsistent 3D evaluation, small scale, and partial coverage), we present DesktopObjects-360, a novel comprehensive dataset for 3D segmentation in radiance fields, featuring: (1) complex multi-object scenes, (2) globally consistent 2D annotations, (3) large-scale training data (over 27 thousand 2D masks), (4) full 360{\deg} coverage, and (5) 3D evaluation masks.

* 22 pages, 9 figures

Via

Access Paper or Ask Questions

Complete Autonomous Robotic Nasopharyngeal Swab System with Evaluation on a Stochastically Moving Phantom Head

Aug 23, 2024

Peter Q. Lee, John S. Zelek, Katja Mombaur

Figure 1 for Complete Autonomous Robotic Nasopharyngeal Swab System with Evaluation on a Stochastically Moving Phantom Head

Figure 2 for Complete Autonomous Robotic Nasopharyngeal Swab System with Evaluation on a Stochastically Moving Phantom Head

Figure 3 for Complete Autonomous Robotic Nasopharyngeal Swab System with Evaluation on a Stochastically Moving Phantom Head

Figure 4 for Complete Autonomous Robotic Nasopharyngeal Swab System with Evaluation on a Stochastically Moving Phantom Head

Abstract:The application of autonomous robotics to close-contact healthcare tasks has a clear role for the future due to its potential to reduce infection risks to staff and improve clinical efficiency. Nasopharyngeal (NP) swab sample collection for diagnosing upper-respiratory illnesses is one type of close contact task that is interesting for robotics due to the dexterity requirements and the unobservability of the nasal cavity. We propose a control system that performs the test using a collaborative manipulator arm with an instrumented end-effector to take visual and force measurements, under the scenario that the patient is unrestrained and the tools are general enough to be applied to other close contact tasks. The system employs a visual servo controller to align the swab with the nostrils. A compliant joint velocity controller inserts the swab along a trajectory optimized through a simulation environment, that also reacts to measured forces applied to the swab. Additional subsystems include a fuzzy logic system for detecting when the swab reaches the nasopharynx and a method for detaching the swab and aborting the procedure if safety criteria is violated. The system is evaluated using a second robotic arm that holds a nasal cavity phantom and simulates the natural head motions that could occur during the procedure. Through extensive experiments, we identify controller configurations capable of effectively performing the NP swab test even with significant head motion, which demonstrates the safety and reliability of the system.

* 18 pages, 26 figures. Supplementary files may be found at https://uwaterloo.ca/scholar/pqjlee/supplementary-files-complete-autonomous-robotic-nasopharyngeal-swab-system-evaluation

Via

Access Paper or Ask Questions

Robotic Eye-in-hand Visual Servo Axially Aligning Nasopharyngeal Swabs with the Nasal Cavity

Aug 22, 2024

Peter Q. Lee, John S. Zelek, Katja Mombaur

Figure 1 for Robotic Eye-in-hand Visual Servo Axially Aligning Nasopharyngeal Swabs with the Nasal Cavity

Figure 2 for Robotic Eye-in-hand Visual Servo Axially Aligning Nasopharyngeal Swabs with the Nasal Cavity

Figure 3 for Robotic Eye-in-hand Visual Servo Axially Aligning Nasopharyngeal Swabs with the Nasal Cavity

Figure 4 for Robotic Eye-in-hand Visual Servo Axially Aligning Nasopharyngeal Swabs with the Nasal Cavity

Abstract:The nasopharyngeal (NP) swab test is a method for collecting cultures to diagnose for different types of respiratory illnesses, including COVID-19. Delegating this task to robots would be beneficial in terms of reducing infection risks and bolstering the healthcare system, but a critical component of the NP swab test is having the swab aligned properly with the nasal cavity so that it does not cause excessive discomfort or injury by traveling down the wrong passage. Existing research towards robotic NP swabbing typically assumes the patient's head is held within a fixture. This simplifies the alignment problem, but is also dissimilar to clinical scenarios where patients are typically free-standing. Consequently, our work creates a vision-guided pipeline to allow an instrumented robot arm to properly position and orient NP swabs with respect to the nostrils of free-standing patients. The first component of the pipeline is a precomputed joint lookup table to allow the arm to meet the patient's arbitrary position in the designated workspace, while avoiding joint limits. Our pipeline leverages semantic face models from computer vision to estimate the Euclidean pose of the face with respect to a monocular RGB-D camera placed on the end-effector. These estimates are passed into an unscented Kalman filter on manifolds state estimator and a pose based visual servo control loop to move the swab to the designated pose in front of the nostril. Our pipeline was validated with human trials, featuring a cohort of 25 participants. The system is effective, reaching the nostril for 84% of participants, and our statistical analysis did not find significant demographic biases within the cohort.

* 12 pages, 13 figures

Via

Access Paper or Ask Questions

Collaborative Robot Arm Inserting Nasopharyngeal Swabs with Admittance Control

Aug 21, 2024

Peter Q. Lee, John S. Zelek, Katja Mombaur

Figure 1 for Collaborative Robot Arm Inserting Nasopharyngeal Swabs with Admittance Control

Figure 2 for Collaborative Robot Arm Inserting Nasopharyngeal Swabs with Admittance Control

Figure 3 for Collaborative Robot Arm Inserting Nasopharyngeal Swabs with Admittance Control

Figure 4 for Collaborative Robot Arm Inserting Nasopharyngeal Swabs with Admittance Control

Abstract:The nasopharyngeal (NP) swab sample test, commonly used to detect COVID-19 and other respiratory illnesses, involves moving a swab through the nasal cavity to collect samples from the nasopharynx. While typically this is done by human healthcare workers, there is a significant societal interest to enable robots to do this test to reduce exposure to patients and to free up human resources. The task is challenging from the robotics perspective because of the dexterity and safety requirements. While other works have implemented specific hardware solutions, our research differentiates itself by using a ubiquitous rigid robotic arm. This work presents a case study where we investigate the strengths and challenges using compliant control system to accomplish NP swab tests with such a robotic configuration. To accomplish this, we designed a force sensing end-effector that integrates with the proposed torque controlled compliant control loop. We then conducted experiments where the robot inserted NP swabs into a 3D printed nasal cavity phantom. Ultimately, we found that the compliant control system outperformed a basic position controller and shows promise for human use. However, further efforts are needed to ensure the initial alignment with the nostril and to address head motion.

* 13 pages, 9 figures. See https://uwaterloo.ca/scholar/pqjlee/collaborative-robot-arm-inserting-nasopharyngeal-swabs-admittance-control for supplementary data

Via

Access Paper or Ask Questions

POPCat: Propagation of particles for complex annotation tasks

Jun 24, 2024

Adam Srebrnjak Yang, Dheeraj Khanna, John S. Zelek

Figure 1 for POPCat: Propagation of particles for complex annotation tasks

Figure 2 for POPCat: Propagation of particles for complex annotation tasks

Figure 3 for POPCat: Propagation of particles for complex annotation tasks

Figure 4 for POPCat: Propagation of particles for complex annotation tasks

Abstract:Novel dataset creation for all multi-object tracking, crowd-counting, and industrial-based videos is arduous and time-consuming when faced with a unique class that densely populates a video sequence. We propose a time efficient method called POPCat that exploits the multi-target and temporal features of video data to produce a semi-supervised pipeline for segmentation or box-based video annotation. The method retains the accuracy level associated with human level annotation while generating a large volume of semi-supervised annotations for greater generalization. The method capitalizes on temporal features through the use of a particle tracker to expand the domain of human-provided target points. This is done through the use of a particle tracker to reassociate the initial points to a set of images that follow the labeled frame. A YOLO model is then trained with this generated data, and then rapidly infers on the target video. Evaluations are conducted on GMOT-40, AnimalTrack, and Visdrone-2019 benchmarks. These multi-target video tracking/detection sets contain multiple similar-looking targets, camera movements, and other features that would commonly be seen in "wild" situations. We specifically choose these difficult datasets to demonstrate the efficacy of the pipeline and for comparison purposes. The method applied on GMOT-40, AnimalTrack, and Visdrone shows a margin of improvement on recall/mAP50/mAP over the best results by a value of 24.5%/9.6%/4.8%, -/43.1%/27.8%, and 7.5%/9.4%/7.5% where metrics were collected.

* 10 pages, 5 figures, Accepted in "Conference on Robots and Vision 2024"

Via

Access Paper or Ask Questions

Multi Player Tracking in Ice Hockey with Homographic Projections

May 22, 2024

Harish Prakash, Jia Cheng Shang, Ken M. Nsiempba, Yuhao Chen, David A. Clausi, John S. Zelek

Abstract:Multi Object Tracking (MOT) in ice hockey pursues the combined task of localizing and associating players across a given sequence to maintain their identities. Tracking players from monocular broadcast feeds is an important computer vision problem offering various downstream analytics and enhanced viewership experience. However, existing trackers encounter significant difficulties in dealing with occlusions, blurs, and agile player movements prevalent in telecast feeds. In this work, we propose a novel tracking approach by formulating MOT as a bipartite graph matching problem infused with homography. We disentangle the positional representations of occluded and overlapping players in broadcast view, by mapping their foot keypoints to an overhead rink template, and encode these projected positions into the graph network. This ensures reliable spatial context for consistent player tracking and unfragmented tracklet prediction. Our results show considerable improvements in both the IDsw and IDF1 metrics on the two available broadcast ice hockey datasets.

* Accepted at the Conference on Robots and Vision (CRV), 2024

Via

Access Paper or Ask Questions

Evaluating deep tracking models for player tracking in broadcast ice hockey video

May 22, 2022

Kanav Vats, Mehrnaz Fani, David A. Clausi, John S. Zelek

Figure 1 for Evaluating deep tracking models for player tracking in broadcast ice hockey video

Figure 2 for Evaluating deep tracking models for player tracking in broadcast ice hockey video

Figure 3 for Evaluating deep tracking models for player tracking in broadcast ice hockey video

Figure 4 for Evaluating deep tracking models for player tracking in broadcast ice hockey video

Abstract:Tracking and identifying players is an important problem in computer vision based ice hockey analytics. Player tracking is a challenging problem since the motion of players in hockey is fast-paced and non-linear. There is also significant player-player and player-board occlusion, camera panning and zooming in hockey broadcast video. Prior published research perform player tracking with the help of handcrafted features for player detection and re-identification. Although commercial solutions for hockey player tracking exist, to the best of our knowledge, no network architectures used, training data or performance metrics are publicly reported. There is currently no published work for hockey player tracking making use of the recent advancements in deep learning while also reporting the current accuracy metrics used in literature. Therefore, in this paper, we compare and contrast several state-of-the-art tracking algorithms and analyze their performance and failure modes in ice hockey.

* Accepted to Link\"oping Hockey Analytics Conference (LINHAC). arXiv admin note: substantial text overlap with arXiv:2110.03090

Via

Access Paper or Ask Questions

Causal Discovery from Sparse Time-Series Data Using Echo State Network

Jan 09, 2022

Haonan Chen, Bo Yuan Chang, Mohamed A. Naiel1, Georges Younes, Steven Wardell, Stan Kleinikkink, John S. Zelek

Figure 1 for Causal Discovery from Sparse Time-Series Data Using Echo State Network

Figure 2 for Causal Discovery from Sparse Time-Series Data Using Echo State Network

Figure 3 for Causal Discovery from Sparse Time-Series Data Using Echo State Network

Figure 4 for Causal Discovery from Sparse Time-Series Data Using Echo State Network

Abstract:Causal discovery between collections of time-series data can help diagnose causes of symptoms and hopefully prevent faults before they occur. However, reliable causal discovery can be very challenging, especially when the data acquisition rate varies (i.e., non-uniform data sampling), or in the presence of missing data points (e.g., sparse data sampling). To address these issues, we proposed a new system comprised of two parts, the first part fills missing data with a Gaussian Process Regression, and the second part leverages an Echo State Network, which is a type of reservoir computer (i.e., used for chaotic system modeling) for Causal discovery. We evaluate the performance of our proposed system against three other off-the-shelf causal discovery algorithms, namely, structural expectation-maximization, sub-sampled linear auto-regression absolute coefficients, and multivariate Granger Causality with vector auto-regressive using the Tennessee Eastman chemical dataset; we report on their corresponding Matthews Correlation Coefficient(MCC) and Receiver Operating Characteristic curves (ROC) and show that the proposed system outperforms existing algorithms, demonstrating the viability of our approach to discover causal relationships in a complex system with missing entries.

Via

Access Paper or Ask Questions