Lemao Liu

XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics

Apr 16, 2026

Judge Like Human Examiners: A Weighted Importance Multi-Point Evaluation Framework for Generative Tasks with Long-form Answers

Apr 14, 2026

A Decomposition Perspective to Long-context Reasoning for LLMs

Apr 09, 2026

Towards Privacy-Preserving Machine Translation at the Inference Stage: A New Task and Benchmark

Mar 16, 2026

VC-Bench: Pioneering the Video Connecting Benchmark with a Dataset and Evaluation Metrics

Jan 27, 2026

DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models

Sep 19, 2025

DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance

Feb 24, 2025

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Feb 13, 2025

Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task

Feb 11, 2025

Large Language Models Can Self-Improve in Long-context Reasoning

Nov 12, 2024