Koustuv Sinha

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Jun 11, 2025

CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models

Jun 11, 2025

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

Jun 11, 2025

Multi-Modal Language Models as Text-to-Image Model Evaluators

May 01, 2025

Scaling Language-Free Visual Representation Learning

Apr 01, 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

Oct 04, 2024

Efficient Tool Use with Chain-of-Abstraction Reasoning

Jan 30, 2024

The ART of LLM Refinement: Ask, Refine, and Trust

Nov 14, 2023

Language model acceptability judgements are not always robust to context

Dec 18, 2022