
Xiyu Ren

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

May 31, 2026

Can Retrieval Heads See Images? Multimodal Retrieval Heads in Long-Context Vision-Language Models

May 26, 2026

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

May 14, 2026

Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction

Feb 19, 2026

MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

May 15, 2025

ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty

Dec 28, 2024