Philippe Laban

EvalAgent: Discovering Implicit Evaluation Criteria from the Web

Apr 21, 2025
Via arXiv

Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing

Apr 11, 2025
Via arXiv

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation

Apr 10, 2025
Via arXiv

BingoGuard: LLM Content Moderation Tools with Risk Levels

Mar 09, 2025
Via arXiv

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding

Feb 17, 2025
Via arXiv

SummExecEdit: A Factual Consistency Benchmark in Summarization with Executable Edits

Dec 17, 2024
Via arXiv

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

Nov 04, 2024
Via arXiv

Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage

Oct 20, 2024
Via arXiv

Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits

Sep 26, 2024
Via arXiv

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Jul 01, 2024
Via arXiv