speech


Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics

Sep 26, 2025

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Sep 26, 2025

Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?

Sep 26, 2025

FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction

Sep 26, 2025

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Sep 26, 2025

Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization

Sep 26, 2025

EgoInstruct: An Egocentric Video Dataset of Face-to-face Instructional Interactions with Multi-modal LLM Benchmarking

Sep 26, 2025

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Sep 26, 2025

MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark

Sep 26, 2025

SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation

Sep 26, 2025