Zsolt Kira

Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

Jun 09, 2025

FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering

May 27, 2025

Barrier Function Overrides For Non-Convex Fixed Wing Flight Control and Self-Driving Cars

May 08, 2025

Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning

Apr 02, 2025

When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach

Mar 21, 2025

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

Feb 21, 2025

Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding

Jan 28, 2025

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Dec 11, 2024

Grounding Descriptions in Images informs Zero-Shot Visual Recognition

Dec 05, 2024

Adversarial Attacks Using Differentiable Rendering: A Survey

Nov 14, 2024