Jun Suzuki

Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning

Mar 17, 2026

Enhancing Persuasive Dialogue Agents by Synthesizing Cross-Disciplinary Communication Strategies

Feb 26, 2026

TimeMachine-bench: A Benchmark for Evaluating Model Capabilities in Repository-Level Migration Tasks

Jan 30, 2026

Relaxing Positional Alignment in Masked Diffusion Language Models

Jan 30, 2026

Suppressing Final Layer Hidden State Jumps in Transformer Pretraining

Jan 26, 2026

Instruction-Following Evaluation of Large Vision-Language Models

Dec 29, 2025

An Open and Reproducible Deep Research Agent for Long-Form Question Answering

Dec 15, 2025

Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders

Oct 25, 2025

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Aug 26, 2025

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Aug 25, 2025