Yu Qiao

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

An Empirical Study of Federated Prompt Learning for Vision Language Model

May 29, 2025
Via arXiv

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

May 29, 2025
Via arXiv

O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering

May 22, 2025
Via arXiv

Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence

May 11, 2025
Via arXiv

Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining

May 10, 2025
Via arXiv

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

Apr 30, 2025
Via arXiv

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Apr 22, 2025
Via arXiv

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

Apr 16, 2025
Via arXiv

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025
Via arXiv

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Apr 10, 2025
Via arXiv