Zixin Zhang

HKUST

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

Oct 29, 2025
Via arXiv

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Oct 10, 2025
Via arXiv

Sample-Efficient Online Control Policy Learning with Real-Time Recursive Model Updates

Sep 10, 2025
Via arXiv

Step-Audio 2 Technical Report

Jul 24, 2025
Via arXiv

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Jun 10, 2025
Via arXiv

ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

May 23, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv

Comateformer: Combined Attention Transformer for Semantic Sentence Matching

Dec 10, 2024
Via arXiv

Robots with Attitude: Singularity-Free Quaternion-Based Model-Predictive Control for Agile Legged Robots

Sep 17, 2024
Via arXiv

PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs

Aug 10, 2023
Via arXiv