Alert button
Picture for Changhe Song

Changhe Song

Alert button

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Add code
Bookmark button
Alert button
Mar 29, 2024
Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

Figure 1 for 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Figure 2 for 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Figure 3 for 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Figure 4 for 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Viaarxiv icon

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Add code
Bookmark button
Alert button
Sep 04, 2023
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Figure 1 for Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Figure 2 for Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Figure 3 for Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Figure 4 for Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Viaarxiv icon

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Add code
Bookmark button
Alert button
Sep 04, 2023
Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng

Figure 1 for SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Figure 2 for SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Figure 3 for SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Figure 4 for SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Viaarxiv icon

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

Add code
Bookmark button
Alert button
Aug 31, 2023
Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

Figure 1 for Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
Figure 2 for Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
Figure 3 for Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
Figure 4 for Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
Viaarxiv icon

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Add code
Bookmark button
Alert button
Aug 19, 2022
Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng

Figure 1 for Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Figure 2 for Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Figure 3 for Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Figure 4 for Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Viaarxiv icon

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

Add code
Bookmark button
Alert button
Apr 03, 2022
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng

Figure 1 for Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Figure 2 for Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Figure 3 for Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Figure 4 for Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Viaarxiv icon

An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

Add code
Bookmark button
Alert button
Mar 31, 2022
Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng

Figure 1 for An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer
Figure 2 for An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer
Figure 3 for An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer
Figure 4 for An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer
Viaarxiv icon

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

Add code
Bookmark button
Alert button
Mar 31, 2022
Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng

Figure 1 for A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
Figure 2 for A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
Figure 3 for A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
Figure 4 for A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
Viaarxiv icon

Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

Add code
Bookmark button
Alert button
Mar 24, 2022
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng

Figure 1 for Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Figure 2 for Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Figure 3 for Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Figure 4 for Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Viaarxiv icon