Sung-Feng Huang

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

Apr 19, 2026
Via arXiv

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

Apr 06, 2026
Via arXiv

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Mar 19, 2026
Via arXiv

How Does Instrumental Music Help SingFake Detection?

Sep 18, 2025
Via arXiv

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Jul 03, 2025
Via arXiv

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Jan 07, 2025
Via arXiv

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Sep 25, 2024
Via arXiv

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

Jan 23, 2024
Via arXiv

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Mar 21, 2023
Via arXiv

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

Jun 27, 2022
Via arXiv