Yangyi Chen

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Dec 15, 2025
Via arXiv

Perception-Aware Policy Optimization for Multimodal Reasoning

Jul 08, 2025
Via arXiv

Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

May 13, 2025
Via arXiv

SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering

Feb 10, 2025
Via arXiv

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Jan 08, 2025
Via arXiv

Scaling Laws for Predicting Downstream Performance in LLMs

Oct 11, 2024
Via arXiv

A Single Transformer for Scalable Vision-Language Modeling

Jul 08, 2024
Via arXiv

SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

May 31, 2024
Via arXiv

Executable Code Actions Elicit Better LLM Agents

Feb 01, 2024
Via arXiv

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation

Nov 22, 2023
Via arXiv