Jiale Cao

Multi-Granularity Language-Guided Multi-Object Tracking

Jun 07, 2024

VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection

Apr 15, 2024

Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

Apr 11, 2024

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

Mar 29, 2024

CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

Mar 19, 2024

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Nov 27, 2023

CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

Sep 22, 2023

Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects

Sep 22, 2023

A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

Sep 09, 2023

DFormer: Diffusion-guided Transformer for Universal Image Segmentation

Jun 08, 2023