Anshul Gupta

Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models

May 19, 2026
Via arXiv

Towards Benchmarking Foundation Models for Tabular Data With Text

Jul 10, 2025
Via arXiv

Robi Butler: Remote Multimodal Interactions with Household Robot Assistant

Sep 30, 2024
Via arXiv

Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following

Jun 06, 2024
Via arXiv

A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction

Mar 15, 2024
Via arXiv

Sharingan: A Transformer-based Architecture for Gaze Following

Oct 01, 2023
Via arXiv

A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings

Jul 11, 2023
Via arXiv

ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour

Jul 04, 2023
Via arXiv

End-to-End Differentiable 6DoF Object Pose Estimation with Local and Global Constraints

Nov 22, 2020
Via arXiv

Font Identification in Historical Documents Using Active Learning

Jan 27, 2016
Via arXiv