Yifan Yang

SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations

Oct 29, 2025
Via arXiv

Towards Responsible Evaluation for Text-to-Speech

Oct 08, 2025
Via arXiv

Diffusion^2: Turning 3D Environments into Radio Frequency Heatmaps

Oct 02, 2025
Via arXiv

VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

Oct 02, 2025
Via arXiv

InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios

Sep 26, 2025
Via arXiv

Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications

Sep 10, 2025
Via arXiv

FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction

Sep 04, 2025
Via arXiv

P/D-Device: Disaggregated Large Language Model between Cloud and Devices

Aug 12, 2025
Via arXiv

Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos

Aug 12, 2025
Via arXiv

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Jul 31, 2025
Via arXiv