Karthik Narasimhan

Princeton University

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Oct 10, 2023
Via arXiv

FireAct: Toward Language Agent Fine-tuning

Oct 09, 2023
Via arXiv

Cognitive Architectures for Language Agents

Sep 05, 2023
Via arXiv

Scaling Laws for Imitation Learning in NetHack

Jul 18, 2023
Via arXiv

COLLIE: Systematic Construction of Constrained Text Generation Tasks

Jul 17, 2023
Via arXiv

InstructEval: Systematic Evaluation of Instruction Selection Methods

Jul 16, 2023
Via arXiv

InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback

Jun 27, 2023
Via arXiv

Anthropomorphization of AI: Opportunities and Risks

May 24, 2023
Via arXiv

CSTS: Conditional Semantic Textual Similarity

May 24, 2023
Via arXiv

PruMUX: Augmenting Data Multiplexing with Model Compression

May 24, 2023
Via arXiv