
Bei Chen

A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning

Oct 09, 2025

Understanding DeepResearch via Reports

Oct 09, 2025

Generative Frame Sampler for Long Video Understanding

Mar 12, 2025

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

Mar 10, 2025

Aria-UI: Visual Grounding for GUI Instructions

Dec 20, 2024

Yi-Lightning Technical Report

Dec 03, 2024

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

Oct 16, 2024

Aria: An Open Multimodal Native Mixture-of-Experts Model

Oct 08, 2024

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Jul 22, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Jun 20, 2024