Picture for Zhiding Yu

Zhiding Yu

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding

Add code
Jul 17, 2025
Viaarxiv icon

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Add code
May 29, 2025
Viaarxiv icon

Nemotron-Research-Tool-N1: Tool-Using Language Models with Reinforced Reasoning

Add code
Apr 25, 2025
Viaarxiv icon

FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding

Add code
Apr 24, 2025
Viaarxiv icon

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Add code
Apr 21, 2025
Viaarxiv icon

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Add code
Apr 10, 2025
Viaarxiv icon

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

Add code
Apr 06, 2025
Figure 1 for OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Figure 2 for OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Figure 3 for OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Figure 4 for OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Viaarxiv icon

Slow-Fast Architecture for Video Multi-Modal Large Language Models

Add code
Apr 02, 2025
Viaarxiv icon

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Add code
Mar 18, 2025
Viaarxiv icon

Hydra-MDP++: Advancing End-to-End Driving via Expert-Guided Hydra-Distillation

Add code
Mar 17, 2025
Viaarxiv icon