Shiliang Zhang

Efficient Multi-modal Long Context Learning for Training-free Adaptation

May 26, 2025
Via arXiv

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

May 23, 2025
Via arXiv

Adaptive Fault-tolerant Control of Underwater Vehicles with Thruster Failures

Apr 22, 2025
Via arXiv

OmniAudio: Generating Spatial Audio from 360-Degree Video

Apr 21, 2025
Via arXiv

Evolved Hierarchical Masking for Self-Supervised Learning

Apr 12, 2025
Via arXiv

Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval

Apr 10, 2025
Via arXiv

Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation

Feb 07, 2025
Via arXiv

Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation

Jan 11, 2025
Via arXiv

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Jan 10, 2025
Via arXiv

Hardware-in-the-loop Simulation Testbed for Geomagnetic Navigation

Dec 16, 2024
Via arXiv