Vibhav Vineet

CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

Dec 11, 2025
Via arXiv

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Oct 02, 2025
Via arXiv

What MLLMs Learn about When they Learn about Multimodal Reasoning: Perception, Reasoning, or their Integration?

Oct 02, 2025
Via arXiv

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames

May 30, 2025
Via arXiv

Grounding Task Assistance with Multimodal Cues from a Single Demonstration

May 02, 2025
Via arXiv

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

May 02, 2025
Via arXiv

Phi-4-reasoning Technical Report

Apr 30, 2025
Via arXiv

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning

Apr 08, 2025
Via arXiv

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Mar 31, 2025
Via arXiv

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Mar 11, 2025
Via arXiv