Zero Shot Video Object Detection


FMVP: Masked Flow Matching for Adversarial Video Purification

Jan 05, 2026
Via arXiv

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Dec 22, 2025
Via arXiv

EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos

Aug 17, 2025
Via arXiv

OpenNav: Open-World Navigation with Multimodal Large Language Models

Jul 24, 2025
Via arXiv

Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety

Apr 18, 2025
Via arXiv

Context in object detection: a systematic literature review

Mar 29, 2025
Via arXiv

Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness

May 13, 2025
Via arXiv

Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline

Apr 16, 2025
Via arXiv

Perception Encoder: The best visual embeddings are not at the output of the network

Apr 17, 2025
Via arXiv

IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval

Apr 01, 2025
Via arXiv