Haizhou Li

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
Sep 05, 2022
Jiadong Wang, Xinyuan Qian, Haizhou Li

Speech Synthesis with Mixed Emotions
Aug 11, 2022
Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li

PoLyScribers: Joint Training of Vocal Extractor and Lyrics Transcriber for Polyphonic Music
Jul 15, 2022
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Jun 15, 2022
Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao, Haizhou Li

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
May 09, 2022
Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li

Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
Apr 07, 2022
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Music-robust Automatic Lyrics Transcription of Polyphonic Music
Apr 07, 2022
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
Mar 31, 2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
Mar 31, 2022
Zexu Pan, Meng Ge, Haizhou Li

Speaker Extraction with Co-Speech Gestures Cue
Mar 31, 2022
Zexu Pan, Xinyuan Qian, Haizhou Li