Sukjin Hong

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Aug 13, 2023
Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective

Feb 03, 2023
Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn, Du-Seong Chang, Se-Young Yun

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

Nov 20, 2022
Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi
