Xiaoyu Li

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Jan 08, 2026
Via arXiv

UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale?

Dec 30, 2025
Via arXiv

Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception

Dec 29, 2025
Via arXiv

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

Oct 30, 2025
Via arXiv

ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Aug 14, 2025
Via arXiv

Channel-Independent Federated Traffic Prediction

Aug 06, 2025
Via arXiv

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Jul 30, 2025
Via arXiv

IC-Custom: Diverse Image Customization via In-Context Learning

Jul 02, 2025
Via arXiv

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Jun 11, 2025
Via arXiv

Proactive Guidance of Multi-Turn Conversation in Industrial Search

May 30, 2025
Via arXiv