Abstract:Multi-camera tracking with overlapping fields of view typically relies on centralized fusion, which creates computational bottlenecks that prevent deployment at scale. We present MV3DT, a fully distributed framework for real-time multi-view 3D tracking that achieves accurate identity propagation and occlusion recovery through peer-to-peer coordination, eliminating the need for central aggregation. Each camera node executes a lightweight modular pipeline comprising monocular 3D perception, distributed multi-view association, and collaborative fusion via lightweight messaging. MV3DT achieves 94.3% IDF1 and 93.3% MOTA on WILDTRACK, competitive with state-of-the-art centralized methods, while demonstrating superior scalability by sustaining 30 FPS on 100 cameras with less than 10 ms inter-camera latency and only 2.2% communication overhead. MV3DT operates in a zero-shot regime given camera calibrations, requiring no scene-specific learning and making it directly deployable in new environments. These results establish MV3DT as a practical solution for real-time multi-view tracking in large-scale overlapping camera networks.




Abstract:Multi-camera Association (MCA) is the task of identifying objects and individuals across camera views and is an active research topic, given its numerous applications across robotics, surveillance, and agriculture. We investigate a novel multi-camera multi-target association algorithm based on dense pixel correspondence estimation with a Transformer-based architecture and underlying detection-based masking. After the algorithm generates a set of corresponding keypoints and their respective confidence levels between every pair of detections in the camera views are computed, an affinity matrix is determined containing the probabilities of matches between each pair. Finally, the Hungarian algorithm is applied to generate an optimal assignment matrix with all the predicted associations between the camera views. Our method is evaluated on the WILDTRACK Seven-Camera HD Dataset, a high-resolution dataset containing footage of walking pedestrians as well as precise annotations and camera calibrations. Our results conclude that the algorithm performs exceptionally well associating pedestrians on camera pairs that are positioned close to each other and observe the scene from similar perspectives. On camera pairs with orientations that are drastically different in distance or angle, there is still significant room for improvement.