
Sosuke Kobayashi

Spike No More: Stabilizing the Pre-training of Large Language Models

Dec 28, 2023

On Layer Normalizations and Residual Connections in Transformers

Jun 01, 2022

Decomposing NeRF for Editing via Feature Field Distillation

May 31, 2022

Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model

May 24, 2022

Instance-Based Neural Dependency Parsing

Sep 28, 2021

SHAPE: Shifted Absolute Position Embedding for Transformers

Sep 13, 2021

Efficient Estimation of Influence of a Training Instance

Dec 08, 2020

All Word Embeddings from One Embedding

May 25, 2020

Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition

Apr 29, 2020

Data Interpolating Prediction: Alternative Interpretation of Mixup

Jun 20, 2019