
Sho Takase

Natural Fingerprints of Large Language Models

Apr 21, 2025

Efficient Construction of Model Family through Progressive Training Using Model Expansion

Apr 01, 2025

Scaling Laws for Upcycling Mixture-of-Experts Language Models

Feb 05, 2025

Self-Translate-Train: A Simple but Strong Baseline for Cross-lingual Transfer of Large Language Models

Jun 29, 2024

Large Vocabulary Size Improves Large Language Models

Jun 24, 2024

Spike No More: Stabilizing the Pre-training of Large Language Models

Dec 28, 2023

Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods

May 29, 2023

Nearest Neighbor Non-autoregressive Text Generation

Aug 26, 2022

Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention

Jul 27, 2022

On Layer Normalizations and Residual Connections in Transformers

Jun 01, 2022