Kushal Kafle

Seeing Through Words: Controlling Visual Retrieval Quality with Language Models

Feb 24, 2026
Via arXiv

RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward

Feb 19, 2026
Via arXiv

More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

Dec 13, 2025
Via arXiv

Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles

Sep 10, 2025
Via arXiv

Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models

Sep 04, 2025
Via arXiv

MS4UI: A Dataset for Multi-modal Summarization of User Interface Instructional Videos

Jun 14, 2025
Via arXiv

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

Jan 15, 2025
Via arXiv

Revisiting Multi-Modal LLM Evaluation

Aug 09, 2024
Via arXiv

They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

Jun 17, 2024
Via arXiv

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

Apr 24, 2024
Via arXiv