Title: F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

Sep 30, 2022

Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova

Abstract: We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models. F-VLM simplifies the current multi-stage training pipeline by eliminating the need for knowledge distillation or detection-tailored pretraining. Surprisingly, we observe that a frozen VLM: 1) retains the locality-sensitive features necessary for detection, and 2) is a strong region classifier. We finetune only the detector head and combine the detector and VLM outputs for each region at inference time. F-VLM shows compelling scaling behavior and achieves a +6.5 mask AP improvement over the previous state of the art on novel categories of the LVIS open-vocabulary detection benchmark. We also demonstrate very competitive results on the COCO open-vocabulary detection benchmark and on cross-dataset transfer detection, along with significant training speed-up and compute savings. Code will be released.

* 19 pages, 6 figures
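
The abstract says the detector and VLM outputs are combined for each region at inference time, but it does not state the combination rule. As a rough illustration only, the sketch below assumes a weighted geometric mean of the two per-category probabilities. The function name `combine_scores` and the weight `alpha` are hypothetical and are not taken from the abstract.

```python
def combine_scores(detector_probs, vlm_probs, alpha=0.5):
    """Fuse two per-category probability lists for a single region.

    Assumption: a weighted geometric mean, where `alpha` is the weight
    given to the frozen VLM score (0.0 = detector only, 1.0 = VLM only).
    The abstract does not specify this rule; it is illustrative.
    """
    if len(detector_probs) != len(vlm_probs):
        raise ValueError("score lists must cover the same categories")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        d ** (1.0 - alpha) * v ** alpha
        for d, v in zip(detector_probs, vlm_probs)
    ]


# Hypothetical scores for one region over two categories.
detector_probs = [0.9, 0.1]  # from the finetuned detector head
vlm_probs = [0.4, 0.6]       # from the frozen VLM region classifier
fused = combine_scores(detector_probs, vlm_probs, alpha=0.5)
best_category = max(range(len(fused)), key=fused.__getitem__)
```

A geometric mean keeps a category's fused score low unless both models assign it reasonable probability. Other rules, such as an arithmetic mean, would fit the abstract's description equally well.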
