Fei Xia

Google DeepMind

Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization

Jun 06, 2025

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

May 15, 2025

Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models

Apr 17, 2025

A Scoping Review of Natural Language Processing in Addressing Medically Inaccurate Information: Errors, Misinformation, and Hallucination

Apr 16, 2025

Gemini Robotics: Bringing AI into the Physical World

Mar 25, 2025

MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes

Dec 26, 2024

Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting

Dec 10, 2024

SIESEF-FusionNet: Spatial Inter-correlation Enhancement and Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic Segmentation

Nov 11, 2024

Vision Language Models are In-Context Value Learners

Nov 07, 2024

AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool

Nov 06, 2024