Chenyi Zhou

Embodied Science: Closing the Discovery Loop with Agentic Embodied AI

Mar 20, 2026
Via arXiv

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Mar 18, 2026
Via arXiv

IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

Mar 17, 2026
Via arXiv

MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

Apr 18, 2024
Via arXiv

Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models

Apr 06, 2024
Via arXiv