Nick Merrill

Symmetry Defeats Auditing

May 27, 2026

Nick Merrill, Zeke Medley

Abstract: We demonstrate an attack on Introspection Adapters (Shenoy et al., 2026).

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

May 21, 2026

Nick Merrill, Jaeho Lee, Ezra Karger

Abstract: We document inverse scaling in LLMs on forecasting problems whose underlying time series exhibit superlinear growth and tail risk of regime change, a structure common in finance and epidemiology. On these tasks, more capable models produce worse distributional forecasts. The pattern appears on ForecastBench-Sim (FBSim), a contamination-free, simulated-world benchmark we release; in forecasts of synthetic SIR epidemics with a matched linear control; and in real-world datasets on COVID-19, measles, housing markets, and hyperinflation. A per-quantile decomposition shows that the failure concentrates at the upper tail, which more capable models shift upward to track aggressive extrapolations of growth, while the lower tail stays put. A within-family study of Llama-3.1 shows that model scale and post-training each contribute independently to this effect. Domain knowledge does not reliably rescue calibration. The inverse scaling does not appear on the single-threshold metrics common in LLM forecasting benchmarks: single-threshold scoring at conventional cutoffs misses the upper-tail cost, whereas tail-inclusive scoring reverses the sign of the capability–accuracy relationship on the same outputs. We recommend that LLM forecasting evaluations use continuous (and unbounded) measures of accuracy alongside bounded binary threshold metrics.
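The contrast between single-threshold and tail-inclusive scoring can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names, the three quantile levels, the threshold of 95, and all numbers below are hypothetical, chosen only to show how two forecasts can tie on a binary threshold score while differing sharply on a per-quantile (pinball) loss at the upper tail.

```python
import numpy as np


def pinball_loss(y_true, q_pred, tau):
    """Pinball (quantile) loss for one predicted tau-quantile; unbounded above."""
    diff = y_true - q_pred
    return float(max(tau * diff, (tau - 1.0) * diff))


def threshold_brier(y_true, quantiles, taus, threshold):
    """Brier score for the binary event y > threshold.

    P(y <= threshold) is linearly interpolated from the forecast quantiles,
    which is an assumption of this sketch, not a method from the abstract.
    """
    cdf_at_threshold = np.interp(threshold, quantiles, taus, left=0.0, right=1.0)
    p_exceed = 1.0 - cdf_at_threshold
    outcome = 1.0 if y_true > threshold else 0.0
    return float((p_exceed - outcome) ** 2)


# Hypothetical quantile forecasts at the 10%, 50%, and 90% levels.
taus = [0.1, 0.5, 0.9]
forecast_a = [90.0, 100.0, 120.0]   # moderate upper tail
forecast_b = [90.0, 100.0, 400.0]   # aggressively extrapolated upper tail
y_true = 105.0                      # hypothetical realized value
threshold = 95.0                    # hypothetical conventional cutoff

brier_a = threshold_brier(y_true, forecast_a, taus, threshold)
brier_b = threshold_brier(y_true, forecast_b, taus, threshold)

per_quantile_a = [pinball_loss(y_true, q, t) for q, t in zip(forecast_a, taus)]
per_quantile_b = [pinball_loss(y_true, q, t) for q, t in zip(forecast_b, taus)]

print("threshold Brier:", brier_a, brier_b)
print("pinball per quantile, A:", per_quantile_a)
print("pinball per quantile, B:", per_quantile_b)
```

In this toy case the two forecasts receive the same threshold score because they agree below the cutoff, while the per-quantile losses differ only at the 90% level, where forecast B's inflated upper tail is penalized.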

The Entoptic Field Camera as Metaphor-Driven Research-through-Design with AI Technologies

Jan 23, 2023

Jesse Josua Benjamin, Heidi Biggs, Arne Berger, Julija Rukanskaitė, Michael Heidt, Nick Merrill, James Pierce, Joseph Lindley

Abstract: Artificial intelligence (AI) technologies are widely deployed in smartphone photography, and prompt-based image synthesis models have rapidly become commonplace. In this paper, we describe a Research-through-Design (RtD) project that explores this shift in the means and modes of image production via the creation and use of the Entoptic Field Camera. Entoptic phenomena usually refer to perceptions of floaters or bright blue dots stemming from the physiological interplay of the eye and brain. We use the term entoptic as a metaphor to investigate how the material interplay of data and models in AI technologies shapes human experiences of reality. Through our case study using first-person design and a field study, we offer implications for critical, reflective, more-than-human and ludic design to engage AI technologies; a conceptualisation of an RtD research space that contributes to AI literacy discourses; and a research trajectory concerning the materiality and design affordances of AI technologies.

* To be published in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023, Hamburg, Germany

Seeing Like a Toolkit: How Toolkits Envision the Work of AI Ethics

Feb 17, 2022

Richmond Y. Wong, Michael A. Madaio, Nick Merrill

Abstract: Numerous toolkits have been developed to support ethical AI development. However, toolkits, like all tools, encode assumptions in their design about what work should be done and how. In this paper, we conduct a qualitative analysis of 27 AI ethics toolkits to critically examine how the work of ethics is imagined and how it is supported by these toolkits. Specifically, we examine the discourses toolkits rely on when talking about ethical issues, who they imagine should do the work of ethics, and how they envision the work practices involved in addressing ethics. We find that AI ethics toolkits largely frame the work of AI ethics as technical work for individual technical practitioners, despite calls for engaging broader sets of stakeholders in grappling with social aspects of AI ethics, and without contending with the organizational and political implications of AI ethics work in practice. Among all toolkits, we identify a mismatch between the imagined work of ethics and the support the toolkits provide for doing that work. We also identify a lack of guidance on how to navigate organizational power dynamics as they relate to performing ethical work. We use these omissions to chart future work for researchers and designers of AI ethics toolkits.

* Pre-print manuscript

Machine Learning Uncertainty as a Design Material: A Post-Phenomenological Inquiry

Jan 11, 2021

Jesse Josua Benjamin, Arne Berger, Nick Merrill, James Pierce

Abstract: Design research is important for understanding and interrogating how emerging technologies shape human experience. However, design research with Machine Learning (ML) is relatively underdeveloped. Crucially, designers have not yet come to grips with ML uncertainty as a design opportunity rather than an obstacle. The technical literature points to data and model uncertainties as two main properties of ML. Through post-phenomenology, we position uncertainty as one defining material attribute of ML processes which mediate human experience. To understand ML uncertainty as a design material, we investigate four design research case studies involving ML. We derive three provocative concepts: thingly uncertainty (ML-driven artefacts have uncertain, variable relations to their environments); pattern leakage (ML uncertainty can lead to patterns shaping the world they are meant to represent); and futures creep (ML technologies texture human relations to time with uncertainty). Finally, we outline design research trajectories and sketch a post-phenomenological approach to human-ML relations.

* Accepted to ACM 2021 CHI Conference on Human Factors in Computing Systems (CHI 2021)
