Jiale Cao

SNNSIR: A Simple Spiking Neural Network for Stereo Image Restoration

Aug 17, 2025
Via arXiv

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models

Aug 12, 2025
Via arXiv

OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning

May 22, 2025
Via arXiv

SSLFusion: Scale & Space Aligned Latent Fusion Model for Multimodal 3D Object Detection

Apr 07, 2025
Via arXiv

CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation

Nov 21, 2024
Via arXiv

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Nov 07, 2024
Via arXiv

DB-SAM: Delving into High Quality Universal Medical Image Segmentation

Oct 05, 2024
Via arXiv

iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

Sep 05, 2024
Via arXiv

Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

Jul 24, 2024
Via arXiv

Multi-Granularity Language-Guided Multi-Object Tracking

Jun 07, 2024
Via arXiv