Picture for Bin Wen

Bin Wen

RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation

Add code
Jun 07, 2025
Viaarxiv icon

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Add code
May 27, 2025
Viaarxiv icon

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Add code
May 05, 2025
Viaarxiv icon

VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform

Add code
Apr 21, 2025
Viaarxiv icon

InstructEngine: Instruction-driven Text-to-Image Alignment

Add code
Apr 14, 2025
Viaarxiv icon

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Add code
Apr 09, 2025
Viaarxiv icon

Wan: Open and Advanced Large-Scale Video Generative Models

Add code
Mar 26, 2025
Viaarxiv icon

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs

Add code
Mar 13, 2025
Viaarxiv icon

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding

Add code
Mar 12, 2025
Viaarxiv icon

RecipeGen: A Benchmark for Real-World Recipe Image Generation

Add code
Mar 07, 2025
Viaarxiv icon