Mingrui Chen

Step-Audio 2 Technical Report

Jul 24, 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025

HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling

May 27, 2025

Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning

May 19, 2025

The Binary and Ternary Quantization Can Improve Feature Discrimination

Apr 18, 2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025

Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer

May 22, 2024

Vision Transformer with Sparse Scan Prior

May 22, 2024

RMT: Retentive Networks Meet Vision Transformers

Sep 20, 2023

Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions

Aug 14, 2023