Karthik Narasimhan

Princeton University

$τ$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

Mar 14, 2026
Via arXiv

$τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Mar 04, 2026
Via arXiv

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

Oct 21, 2025
Via arXiv

$τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Jun 09, 2025
Via arXiv

Contextual Experience Replay for Self-Improvement of Language Agents

Jun 07, 2025
Via arXiv

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration

Jun 05, 2025
Via arXiv

IMPersona: Evaluating Individual Level LM Impersonation

Apr 08, 2025
Via arXiv

ShieldGemma 2: Robust and Tractable Image Content Moderation

Apr 01, 2025
Via arXiv

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Oct 16, 2024
Via arXiv

An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them

Oct 14, 2024
Via arXiv