Avi Caciularu

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

Feb 18, 2026
Via arXiv

Latent Reasoning with Supervised Thinking States

Feb 09, 2026
Via arXiv

DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs

Jun 10, 2025
Via arXiv

MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs

May 30, 2025
Via arXiv

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

May 28, 2025
Via arXiv

MDCure: A Scalable Pipeline for Multi-Document Instruction-Following

Oct 30, 2024
Via arXiv

CoverBench: A Challenging Benchmark for Complex Claim Verification

Aug 06, 2024
Via arXiv

SEAM: A Stochastic Benchmark for Multi-Document Tasks

Jun 23, 2024
Via arXiv

Identifying User Goals from UI Trajectories

Jun 20, 2024
Via arXiv

Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations

Jun 19, 2024
Via arXiv