Debaditya Roy

Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation

Jun 09, 2026

Improving Temporal Action Segmentation via Constraint-Aware Decoding

May 11, 2026

Instruction-Evidence Contrastive Dual-Stream Decoding for Grounded Vision-Language Reasoning

Apr 28, 2026

Generating Key Postures of Bharatanatyam Adavus with Pose Estimation

Mar 31, 2026

Learning to Generate Long-term Future Narrations Describing Activities of Daily Living

Mar 03, 2025

Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios

Nov 20, 2024

Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos

Jul 30, 2024

ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition

Jul 02, 2023

Modelling Spatio-Temporal Interactions for Compositional Action Recognition

May 04, 2023

Interaction Visual Transformer for Egocentric Action Anticipation

Nov 25, 2022