Lucy Lu Wang

Illusions of the Gold Standard: A Large-scale Analysis of Human Evaluation Protocols for Long-form Text Generation

Jun 09, 2026

STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

May 13, 2026

ReFinE: Streamlining UI Mockup Iteration with Research Findings

Apr 06, 2026

Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification

Jan 23, 2026

Leveraging Hierarchical Organization for Medical Multi-document Summarization

Oct 27, 2025

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

May 23, 2025

FACTS&EVIDENCE: An Interactive Tool for Transparent Fine-Grained Factual Verification of Machine-Generated Text

Mar 19, 2025

Explainable AI for Clinical Outcome Prediction: A Survey of Clinician Perceptions and Preferences

Feb 27, 2025

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Oct 14, 2024

Know Your Limits: A Survey of Abstention in Large Language Models

Aug 08, 2024