Liuyi Wang

VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation

Dec 22, 2025

CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

Dec 11, 2025

Temporal-Guided Visual Foundation Models for Event-Based Vision

Nov 09, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Jul 17, 2025

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Feb 03, 2025

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Jan 09, 2025

MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

Jun 25, 2024

Vision-and-Language Navigation via Causal Learning

Apr 16, 2024

Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation

Mar 06, 2024

PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation

May 19, 2023