Yazhou Yao

Taming SAM3 in the Wild: A Concept Bank for Open-Vocabulary Segmentation

Feb 06, 2026
Via arXiv

Combating Noisy Labels through Fostering Self- and Neighbor-Consistency

Jan 19, 2026
Via arXiv

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

Jan 06, 2026
Via arXiv

Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Apr 14, 2025
Via arXiv

Efficient Token Compression for Vision Transformer with Spatial Information Preserved

Mar 30, 2025
Via arXiv

Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels

Feb 27, 2025
Via arXiv

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Dec 25, 2024
Via arXiv

The Key of Understanding Vision Tasks: Explanatory Instructions

Dec 24, 2024
Via arXiv

FTMoMamba: Motion Generation with Frequency and Text State Space Models

Nov 26, 2024
Via arXiv

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

Nov 25, 2024
Via arXiv