
Yan Zhou

Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

Can Multimodal Large Language Models Understand Spatial Relations?

May 25, 2025

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

May 05, 2025

SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain

Mar 26, 2025

Accessing the Effect of Phyllotaxy and Planting Density on Light Use Efficiency in Field-Grown Maize using 3D Reconstructions

Mar 10, 2025

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis

Mar 09, 2025

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment

Nov 25, 2024

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding

Oct 29, 2024

Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification

Oct 22, 2024

LTPNet Integration of Deep Learning and Environmental Decision Support Systems for Renewable Energy Demand Forecasting

Oct 20, 2024

Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction

Sep 25, 2024