Shun Kiyono

Self-Translate-Train: A Simple but Strong Baseline for Cross-lingual Transfer of Large Language Models

Jun 29, 2024
Via arXiv

Large Vocabulary Size Improves Large Language Models

Jun 24, 2024
Via arXiv

Spike No More: Stabilizing the Pre-training of Large Language Models

Dec 28, 2023
Via arXiv

On Layer Normalizations and Residual Connections in Transformers

Jun 01, 2022
Via arXiv

Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model

May 24, 2022
Via arXiv

SHAPE: Shifted Absolute Position Embedding for Transformers

Sep 13, 2021
Via arXiv

Pseudo Zero Pronoun Resolution Improves Zero Anaphora Resolution

Apr 15, 2021
Via arXiv

Lessons on Parameter Sharing across Layers in Transformers

Apr 13, 2021
Via arXiv

Rethinking Perturbations in Encoder-Decoders for Fast Training

Apr 05, 2021
Via arXiv

An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution

Nov 04, 2020
Via arXiv