Liuyi Wang

TACO: Towards Task-Consistent Open-Vocabulary Adaptation in Video Recognition

Jun 24, 2026

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

May 19, 2026

GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding

Apr 02, 2026

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

Mar 15, 2026

RynnBrain: Open Embodied Foundation Models

Feb 13, 2026

VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation

Dec 22, 2025

CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

Dec 11, 2025

Temporal-Guided Visual Foundation Models for Event-Based Vision

Nov 09, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Jul 17, 2025

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Feb 03, 2025