Zsolt Kira

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

Jul 15, 2025
Via arXiv

EscherNet++: Simultaneous Amodal Completion and Scalable View Synthesis through Masked Fine-Tuning and Enhanced Feed-Forward 3D Reconstruction

Jul 10, 2025
Via arXiv

FindingDory: A Benchmark to Evaluate Memory in Embodied Agents

Jun 18, 2025
Via arXiv

MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding

Jun 11, 2025
Via arXiv

Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

Jun 09, 2025
Via arXiv

FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering

May 27, 2025
Via arXiv

Barrier Function Overrides For Non-Convex Fixed Wing Flight Control and Self-Driving Cars

May 08, 2025
Via arXiv

Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning

Apr 02, 2025
Via arXiv

When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach

Mar 21, 2025
Via arXiv

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

Feb 21, 2025
Via arXiv