
Bei Chen

A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning

Oct 09, 2025

Understanding DeepResearch via Reports

Oct 09, 2025

Generative Frame Sampler for Long Video Understanding

Mar 12, 2025

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

Mar 10, 2025

Aria-UI: Visual Grounding for GUI Instructions

Dec 20, 2024

Yi-Lightning Technical Report

Dec 03, 2024

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

Oct 16, 2024

Aria: An Open Multimodal Native Mixture-of-Experts Model

Oct 08, 2024

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Jul 22, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Jun 20, 2024