Steffen Eger

LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models

May 25, 2025
Via arXiv

CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks

May 16, 2025
Via arXiv

LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering

May 09, 2025
Via arXiv

TransProQA: an LLM-based literary Translation evaluation metric with Professional Question Answering

May 08, 2025
Via arXiv

DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?

Apr 10, 2025
Via arXiv

ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation

Apr 02, 2025
Via arXiv

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

Mar 14, 2025
Via arXiv

BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression

Mar 04, 2025
Via arXiv

Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation

Feb 07, 2025
Via arXiv

PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics

Dec 20, 2024
Via arXiv