Xiaohai Tian

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions

Mar 26, 2025

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context

Mar 19, 2025

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

Nov 27, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Sep 25, 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Jun 19, 2024

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

Jan 22, 2024

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

May 19, 2023

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

Mar 13, 2023

An ASR-free Fluency Scoring Approach with Self-Supervised Learning

Mar 13, 2023

TTS-Guided Training for Accent Conversion Without Parallel Data

Dec 20, 2022