Yuchen Duan

Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Mar 12, 2026

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Mar 10, 2026

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025

Needle In A Multimodal Haystack

Jun 11, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Mar 07, 2024

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

Jun 22, 2023

Vision Transformer Adapter for Dense Predictions

May 18, 2022