Ruiyi Zhang

VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

Aug 10, 2025
Via arXiv

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm

Aug 05, 2025
Via arXiv

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025
Via arXiv

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

May 26, 2025
Via arXiv

A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

May 20, 2025
Via arXiv

CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks

Apr 29, 2025
Via arXiv

Defense against Prompt Injection Attacks via Mixture of Encodings

Apr 10, 2025
Via arXiv

Towards Visual Text Grounding of Multimodal Large Language Model

Apr 07, 2025
Via arXiv

Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models

Mar 20, 2025
Via arXiv

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Mar 18, 2025
Via arXiv