Yue Liao

EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
May 14, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Apr 22, 2025

Vision-to-Music Generation: A Survey
Mar 27, 2025

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Mar 26, 2025

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Mar 14, 2025

Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression
Dec 22, 2024

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
Dec 12, 2024

TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Nov 25, 2024

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Nov 22, 2024

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Oct 10, 2024