
Xiyang Wu

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Mar 10, 2026

First Frame Is the Place to Go for Video Content Customization

Nov 19, 2025

Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

Jun 18, 2025

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos

May 02, 2025

Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey

Jan 04, 2025

SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining

Sep 26, 2024

AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Jun 16, 2024

AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

Apr 04, 2024

On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities

Feb 24, 2024

LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments

Sep 30, 2023