Yuxuan Wang

Sherman

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

May 19, 2025

JAEGER: Dual-Level Humanoid Whole-Body Controller

May 10, 2025

Probing and Inducing Combinational Creativity in Vision-Language Models

Apr 17, 2025

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Apr 11, 2025

OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training

Mar 31, 2025

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Mar 29, 2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions

Mar 26, 2025

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context

Mar 19, 2025

A Parallel Hybrid Action Space Reinforcement Learning Model for Real-world Adaptive Traffic Signal Control

Mar 18, 2025

PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Mar 14, 2025