Diji Yang

GenIR: Generative Visual Feedback for Mental Image Retrieval

Jun 06, 2025
Via arXiv

GRIT: Teaching MLLMs to Think with Images

May 21, 2025
Via arXiv

Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing

May 05, 2025
Via arXiv

Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals

Feb 22, 2025
Via arXiv

Reinforcing Thinking through Reasoning-Enhanced Reward Models

Dec 31, 2024
Via arXiv

Right this way: Can VLMs Guide Us to See More to Answer Questions?

Nov 01, 2024
Via arXiv

Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

Oct 16, 2024
Via arXiv

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

May 15, 2024
Via arXiv

Tackling Vision Language Tasks Through Learning Inner Monologues

Aug 19, 2023
Via arXiv

CPL: Counterfactual Prompt Learning for Vision and Language Models

Oct 19, 2022
Via arXiv