Xiaofei Wang

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025
Via arXiv

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Jun 04, 2025
Via arXiv

Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms

May 29, 2025
Via arXiv

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

May 26, 2025
Via arXiv

Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling

May 15, 2025
Via arXiv

GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection

Mar 26, 2025
Via arXiv

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

Mar 26, 2025
Via arXiv

Joint Modelling Histology and Molecular Markers for Cancer Classification

Feb 11, 2025
Via arXiv

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

Feb 04, 2025
Via arXiv

Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings

Jan 28, 2025
Via arXiv