Hongxu Yin

Celine

Grounded 3D-Aware Spatial Vision-Language Modeling

May 28, 2026

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

May 26, 2026

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Apr 27, 2026

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

Apr 23, 2026

UAV-DETR: DETR for Anti-Drone Target Detection

Mar 24, 2026

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

Mar 12, 2026

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

Dec 11, 2025

NVIDIA Nemotron Nano V2 VL

Nov 07, 2025

3D Aware Region Prompted Vision Language Model

Sep 16, 2025

Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations

Aug 25, 2025