Zhengyuan Yang

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation

May 29, 2025

Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

May 26, 2025

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

May 13, 2025

SITE: towards Spatial Intelligence Thorough Evaluation

May 08, 2025

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

Apr 24, 2025

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Apr 10, 2025

V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models

Apr 08, 2025

Measurement of LLM's Philosophies of Human Nature

Apr 03, 2025

Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models

Mar 26, 2025

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

Mar 26, 2025