Adina Williams

Meta AI

Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages

Dec 27, 2025

Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation

Dec 23, 2025

What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes

Nov 05, 2025

FACTORY: A Challenging Human-Verified Prompt Set for Long-Form Factuality

Jul 31, 2025

IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments

Jun 11, 2025

Arbiters of Ambivalence: Challenges of Using LLMs in No-Consensus Tasks

May 28, 2025

Do different prompting methods yield a common task representation in language models?

May 17, 2025

Domain Regeneration: How well do LLMs match syntactic properties of text domains?

May 12, 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Apr 10, 2025

Chained Tuning Leads to Biased Forgetting

Dec 21, 2024