NLG Evaluation


Towards Reward Modeling for AI Tutors in Math Mistake Remediation

Mar 25, 2026

MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages

Mar 21, 2026

LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation

Mar 10, 2026

PET-F2I: A Comprehensive Benchmark and Parameter-Efficient Fine-Tuning of LLMs for PET/CT Report Impression Generation

Mar 11, 2026

Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy

Mar 04, 2026

Who can we trust? LLM-as-a-jury for Comparative Assessment

Feb 18, 2026

Online Domain-aware LLM Decoding for Continual Domain Evolution

Feb 08, 2026

Order in the Evaluation Court: A Critical Analysis of NLG Evaluation Trends

Jan 12, 2026

From NLG Evaluation to Modern Student Assessment in the Era of ChatGPT: The Great Misalignment Problem and Pedagogical Multi-Factor Assessment (P-MFA)

Dec 17, 2025

CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation

Jan 16, 2026