Yu-Chiang Frank Wang

Autoregressive Universal Video Segmentation Model

Aug 26, 2025
Via arXiv

Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations

Aug 25, 2025
Via arXiv

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Jul 22, 2025
Via arXiv

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Jul 03, 2025
Via arXiv

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Jun 18, 2025
Via arXiv

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Jun 13, 2025
Via arXiv

Universal Speech Enhancement with Regression and Generative Mamba

May 27, 2025
Via arXiv

UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing

May 14, 2025
Via arXiv

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Mar 27, 2025
Via arXiv

Segment Anything, Even Occluded

Mar 08, 2025
Via arXiv