Kaiser Sun

How to Interpret Agent Behavior

May 13, 2026
Via arXiv

What do Language Models Learn and When? The Implicit Curriculum Hypothesis

Apr 09, 2026
Via arXiv

Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

Mar 10, 2026
Via arXiv

FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights

Feb 02, 2026
Via arXiv

What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models

Jun 06, 2025
Via arXiv

CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?

Mar 27, 2025
Via arXiv

Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

Aug 14, 2024
Via arXiv

The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks

Oct 26, 2023
Via arXiv

Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks

Dec 19, 2022
Via arXiv

State-of-the-art generalisation research in NLP: a taxonomy and review

Oct 10, 2022
Via arXiv