Picture for Jiayi Ji

Jiayi Ji

MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation

Add code
Jan 13, 2026
Viaarxiv icon

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval

Add code
Jan 07, 2026
Viaarxiv icon

Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

Add code
Dec 31, 2025
Viaarxiv icon

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Add code
Dec 28, 2025
Viaarxiv icon

Understanding What Is Not Said:Referring Remote Sensing Image Segmentation with Scarce Expressions

Add code
Oct 26, 2025
Viaarxiv icon

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning

Add code
Oct 09, 2025
Figure 1 for CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
Figure 2 for CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
Figure 3 for CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
Figure 4 for CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
Viaarxiv icon

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

Add code
Aug 01, 2025
Viaarxiv icon

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Add code
Jul 03, 2025
Figure 1 for AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Figure 2 for AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Figure 3 for AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Figure 4 for AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Viaarxiv icon

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

Add code
Jun 09, 2025
Viaarxiv icon

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

Add code
May 23, 2025
Viaarxiv icon