Yuchong Sun

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

May 28, 2026
Via arXiv

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

May 26, 2026
Via arXiv

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Mar 18, 2026
Via arXiv

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Dec 28, 2025
Via arXiv

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation

Dec 14, 2025
Via arXiv

BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain

Sep 30, 2024
Via arXiv

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions

Oct 11, 2023
Via arXiv

ViCo: Engaging Video Comment Generation with Human Preference Rewards

Aug 22, 2023
Via arXiv

Translating Text Synopses to Video Storyboards

Dec 31, 2022
Via arXiv

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022
Via arXiv