Picture for Zhiding Yu

Zhiding Yu

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Add code
May 27, 2026
Viaarxiv icon

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Add code
Apr 27, 2026
Viaarxiv icon

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

Add code
Mar 19, 2026
Viaarxiv icon

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Add code
Mar 05, 2026
Viaarxiv icon

Stateful Token Reduction for Long-Video Hybrid VLMs

Add code
Feb 27, 2026
Viaarxiv icon

PhyCritic: Multimodal Critic Models for Physical AI

Add code
Feb 11, 2026
Viaarxiv icon

Nemotron ColEmbed V2: Top-Performing Late Interaction embedding models for Visual Document Retrieval

Add code
Feb 03, 2026
Viaarxiv icon

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Add code
Jan 21, 2026
Viaarxiv icon

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Add code
Jan 14, 2026
Viaarxiv icon

NVIDIA Nemotron Nano V2 VL

Add code
Nov 07, 2025
Viaarxiv icon