Yixin Cao

Com$^2$: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models

Jun 08, 2025
Via arXiv

Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings

Jun 05, 2025
Via arXiv

Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models

Jun 04, 2025
Via arXiv

Disentangling Language and Culture for Evaluating Multilingual Large Language Models

May 30, 2025
Via arXiv

FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities

May 19, 2025
Via arXiv

Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches

May 18, 2025
Via arXiv

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

Apr 26, 2025
Via arXiv

Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law

Apr 10, 2025
Via arXiv

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

Mar 03, 2025
Via arXiv

Long Context vs. RAG for LLMs: An Evaluation and Revisits

Dec 27, 2024
Via arXiv