Tomoyasu Shimada

Clip-level Uncertainty and Temporal-aware Active Learning for End-to-End Multi-Object Tracking

May 11, 2026

Riku Inoue, Shogo Sato, Kazuhiko Murasaki, Tomoyasu Shimada, Toshihiko Nishimura, Ryuichi Tanida

Abstract: Multi-Object Tracking (MOT) in dynamic environments relies on robust temporal reasoning to maintain consistent object identities over time. Transformer-based end-to-end MOT models achieve strong performance by explicitly modeling temporal dependencies, yet training them requires extensive bounding-box and identity annotations. Given the high labeling cost and strong redundancy in videos, Active Learning (AL) is an effective approach to improve annotation efficiency. However, existing AL methods for MOT primarily operate at the frame level, which is structurally misaligned with modern end-to-end trackers whose inference and training rely on multi-frame clips. To bridge this gap, we formulate clip-level active learning and propose Clip-level Uncertainty and Temporal-aware Active Learning (CUTAL). In contrast to frame-based approaches, CUTAL scores each clip using uncertainty metrics derived from multi-frame predictions to capture inter-frame correspondence ambiguities, while enforcing temporal diversity to select an informative and non-redundant subset. Experiments show that CUTAL achieves stronger overall performance than baselines at the same label budgets across MeMOTR and SambaMOTR. Notably, CUTAL achieves performance comparable to full supervision for MeMOTR on both datasets using only 50% of the labeled training data.

* Accepted to 2026 IEEE International Conference on Image Processing (ICIP). Copyright 2026 IEEE. Published in 2026 IEEE International Conference on Image Processing (ICIP), scheduled for 13-17 September 2026 in Tampere, Finland
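
The clip-level selection idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' CUTAL implementation: the function names `clip_uncertainty` and `select_clips`, the use of mean binary entropy over detection confidences as the uncertainty metric, and the `min_gap` spacing rule for temporal diversity are all assumptions made for illustration.

```python
import numpy as np

def clip_uncertainty(confidences):
    """Hypothetical clip score: mean binary entropy of all detection
    confidences gathered from every frame of one clip."""
    p = np.clip(np.asarray(confidences, dtype=float), 1e-6, 1 - 1e-6)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return float(entropy.mean())

def select_clips(scores, start_frames, budget, min_gap):
    """Greedy selection: visit clips from most to least uncertain and skip
    any clip that starts within `min_gap` frames of an already chosen clip
    (an assumed stand-in for a temporal-diversity constraint)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    chosen = []
    for idx in order:
        if len(chosen) == budget:
            break
        if all(abs(start_frames[idx] - start_frames[j]) >= min_gap for j in chosen):
            chosen.append(int(idx))
    return chosen

# Clips 0 and 1 are both uncertain but nearly adjacent in time,
# so the second pick falls to the more distant clip 2.
picked = select_clips([0.9, 0.8, 0.7, 0.1], [0, 5, 40, 80], budget=2, min_gap=10)
```

The greedy loop trades a little uncertainty for coverage: a slightly less uncertain clip from a different part of the video is preferred over a near-duplicate of one already selected.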

IPCD: Intrinsic Point-Cloud Decomposition

Nov 13, 2025

Shogo Sato, Takuhiro Kaneko, Shoichiro Takeda, Tomoyasu Shimada, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura

Abstract: Point clouds are widely used in various fields, including augmented reality (AR) and robotics, where relighting and texture editing are crucial for realistic visualization. Achieving these tasks requires accurately separating albedo from shade. However, performing this separation on point clouds presents two key challenges: (1) the non-grid structure of point clouds makes conventional image-based decomposition models ineffective, and (2) point-cloud models designed for other tasks do not explicitly consider global-light direction, resulting in inaccurate shade. In this paper, we introduce Intrinsic Point-Cloud Decomposition (IPCD), which extends image decomposition to the direct decomposition of colored point clouds into albedo and shade. To overcome challenge (1), we propose IPCD-Net, which extends an image-based model with point-wise feature aggregation for non-grid data processing. For challenge (2), we introduce Projection-based Luminance Distribution (PLD) with a hierarchical feature refinement, capturing global-light cues via multi-view projection. For comprehensive evaluation, we create a synthetic outdoor-scene dataset. Experimental results demonstrate that IPCD-Net reduces cast shadows in albedo and enhances color accuracy in shade. Furthermore, we showcase its applications in texture editing, relighting, and point-cloud registration under varying illumination. Finally, we verify the real-world applicability of IPCD-Net.

* Accepted in WACV2026
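
To make concrete why separating albedo from shade enables relighting, here is a small Python sketch. It assumes the multiplicative model commonly used in intrinsic decomposition (per-point color = albedo × shade), which the abstract does not state explicitly. It does not reproduce IPCD-Net or PLD; `albedo_from_shade` and `relight` are hypothetical helper names.

```python
import numpy as np

def albedo_from_shade(color, shade, eps=1e-6):
    """Recover per-point albedo under the assumed model color = albedo * shade.
    `eps` guards against division by zero in fully shadowed points."""
    return np.clip(color / np.maximum(shade, eps), 0.0, 1.0)

def relight(albedo, new_shade):
    """Recompose per-point colors with a different shade estimate."""
    return np.clip(albedo * new_shade, 0.0, 1.0)

# Two colored points (N x 3 RGB in [0, 1]) observed under uniform half-intensity shade.
albedo = np.array([[0.5, 0.4, 0.2], [0.8, 0.8, 0.8]])
shade = np.full_like(albedo, 0.5)
color = albedo * shade

recovered = albedo_from_shade(color, shade)
relit = relight(recovered, np.ones_like(albedo))  # full-intensity lighting
```

In practice the shade is unknown and must be estimated, which is the hard part the paper addresses; the sketch only shows the recomposition step once a decomposition is available.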

Memory-Efficient Point Cloud Registration via Overlapping Region Sampling

Oct 29, 2024

Tomoyasu Shimada, Kazuhiko Murasaki, Shogo Sato, Toshihiko Nishimura, Taiga Yoshida, Ryuichi Tanida

Abstract: Recent advances in deep learning have improved 3D point cloud registration but increased graphics processing unit (GPU) memory usage, often requiring preliminary sampling that reduces accuracy. We propose an overlapping region sampling method to reduce memory usage while maintaining accuracy. Our approach estimates the overlapping region and intensively samples from it, using a k-nearest-neighbor (kNN)-based point compression mechanism with multilayer perceptron (MLP) and transformer architectures. Evaluations on the 3DMatch and 3DLoMatch datasets show our method outperforms other sampling methods in registration recall, especially at lower GPU memory levels. For 3DMatch, we achieve 94% recall with 33% reduced memory usage, with greater advantages in 3DLoMatch. Our method enables efficient large-scale point cloud registration in resource-constrained environments, maintaining high accuracy while significantly reducing memory requirements.

* Accepted for IEEE International Conference on Visual Communications and Image Processing 2024 (VCIP2024)
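
The idea of sampling intensively from the estimated overlapping region can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-point overlap probabilities are taken as a given input (the paper estimates the overlap with its own kNN-based compression network, which is not reproduced here), and `overlap_biased_sample`, `overlap_fraction`, and `threshold` are hypothetical names and parameters.

```python
import numpy as np

def overlap_biased_sample(points, overlap_prob, n_samples,
                          overlap_fraction=0.8, threshold=0.5, seed=0):
    """Spend most of a fixed point budget inside the estimated overlap.

    points:       (N, 3) array of coordinates
    overlap_prob: (N,) assumed per-point overlap scores in [0, 1]
    Returns the sampled points and their indices into `points`.
    """
    rng = np.random.default_rng(seed)
    inside = np.flatnonzero(overlap_prob >= threshold)
    outside = np.flatnonzero(overlap_prob < threshold)

    # Cap each share by how many points are actually available.
    n_in = min(len(inside), int(round(n_samples * overlap_fraction)))
    n_out = min(len(outside), n_samples - n_in)

    picked = np.concatenate([
        rng.choice(inside, size=n_in, replace=False),
        rng.choice(outside, size=n_out, replace=False),
    ])
    return points[picked], picked

# 100 points, of which the first 30 are assumed to lie in the overlap.
points = np.arange(300, dtype=float).reshape(100, 3)
overlap_prob = np.where(np.arange(100) < 30, 0.9, 0.1)
sampled, idx = overlap_biased_sample(points, overlap_prob, n_samples=20)
```

Keeping a small share of points from outside the overlap is a design choice of this sketch, intended to retain some global context; whether the paper does the same is not stated in the abstract.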
