
Josefa Lia Stoisser

Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents

May 10, 2026 (via arXiv)

Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization

May 7, 2026 (via arXiv)

MechPert: Mechanistic Consensus as an Inductive Bias for Unseen Perturbation Prediction

Feb 14, 2026 (via arXiv)

Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning

Apr 23, 2025 (via arXiv)