Junjie Zheng

Towards Scalable Lightweight GUI Agents via Multi-role Orchestration

Apr 15, 2026

YingMusic-Singer: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Mar 25, 2026

R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion

Oct 23, 2025

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing

May 22, 2025

Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

Apr 30, 2025

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

Mar 31, 2025

DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Mar 28, 2025

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Dec 12, 2024

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

Aug 01, 2024

A Novel Approach for Stable Selection of Informative Redundant Features from High Dimensional fMRI Data

May 25, 2016