
Damai Dai

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Jan 05, 2024

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Dec 28, 2023

Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning
Oct 12, 2023

Denoising Bottleneck with Mutual Information Maximization for Video Multimodal Fusion
May 25, 2023

Bi-Drop: Generalizable Fine-tuning for Pre-trained Language Models via Adaptive Subnetwork Optimization
May 24, 2023

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
May 23, 2023

A Survey for In-context Learning
Dec 31, 2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Dec 21, 2022

Calibrating Factual Knowledge in Pretrained Language Models
Oct 07, 2022

Neural Knowledge Bank for Pretrained Transformers
Aug 16, 2022