Yingjian Zhu

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering

Apr 07, 2026
Via arXiv

SeaVIS: Sound-Enhanced Association for Online Audio-Visual Instance Segmentation

Mar 02, 2026
Via arXiv

Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles

May 22, 2025
Via arXiv