Bingxin Li

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025
Via arXiv

SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering

Apr 01, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv