Li Dong

Knowledge Distillation of Large Language Models
Jun 14, 2023

Augmenting Language Models with Long-Term Memory
Jun 12, 2023

Pre-Training to Learn in Context
May 16, 2023

Language Is Not All You Need: Aligning Perception with Language Models
Mar 01, 2023

Generic-to-Specific Distillation of Masked Autoencoders
Feb 28, 2023

Semi-Supervised Learning with Pseudo-Negative Labels for Image Classification
Jan 10, 2023

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Dec 21, 2022

Language Models as Inductive Reasoners
Dec 21, 2022

A Length-Extrapolatable Transformer
Dec 20, 2022

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
Dec 20, 2022