Yazhou Yao

Combating Noisy Labels through Fostering Self- and Neighbor-Consistency

Jan 19, 2026

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

Jan 06, 2026

Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Apr 14, 2025

Efficient Token Compression for Vision Transformer with Spatial Information Preserved

Mar 30, 2025

Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels

Feb 27, 2025

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Dec 25, 2024

The Key of Understanding Vision Tasks: Explanatory Instructions

Dec 24, 2024

FTMoMamba: Motion Generation with Frequency and Text State Space Models

Nov 26, 2024

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

Nov 25, 2024

COMOGen: A Controllable Text-to-3D Multi-object Generation Framework

Sep 01, 2024