Katherine M. Collins

Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models

Feb 26, 2026
Via arXiv

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Feb 19, 2026
Via arXiv

A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

Nov 11, 2025
Via arXiv

What's in the Box? Reasoning about Unseen Objects from Multimodal Cues

Jun 17, 2025
Via arXiv

Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships

May 22, 2025
Via arXiv

When Should We Orchestrate Multiple Agents?

Mar 17, 2025
Via arXiv

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Mar 09, 2025
Via arXiv

On Benchmarking Human-Like Intelligence in Machines

Feb 27, 2025
Via arXiv

Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning

Dec 19, 2024
Via arXiv

Can Large Language Models Understand Symbolic Graphics Programs?

Aug 15, 2024
Via arXiv