Shicheng Tan

Why are hyperbolic neural networks effective? A study on hierarchical representation capability

Feb 04, 2024

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Jun 11, 2023

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Jun 11, 2023

Coherence-Based Distributed Document Representation Learning for Scientific Documents

Jan 08, 2022