Picture for Keigo Shibata

Keigo Shibata

Suppressing Final Layer Hidden State Jumps in Transformer Pretraining

Add code
Jan 26, 2026
Viaarxiv icon

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Add code
Aug 25, 2025
Viaarxiv icon