Minghan Li

TrackVLA: Embodied Visual Tracking in the Wild

May 29, 2025
Via arXiv

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Mar 18, 2025
Via arXiv

FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Mar 17, 2025
Via arXiv

Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models

Jan 28, 2025
Via arXiv

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Dec 09, 2024
Via arXiv

KeyB2: Selecting Key Blocks is Also Important for Long Document Ranking with Large Language Models

Nov 09, 2024
Via arXiv

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Oct 19, 2024
Via arXiv

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

Aug 21, 2024
Via arXiv

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Jun 27, 2024
Via arXiv

Unifying Multimodal Retrieval via Document Screenshot Embedding

Jun 17, 2024
Via arXiv