Qihuang Zhong

Revisiting Token Dropping Strategy in Efficient BERT Pretraining

May 24, 2023

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

May 22, 2023

Towards Making the Most of ChatGPT for Machine Translation

Mar 24, 2023

Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT

Mar 02, 2023

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

Mar 01, 2023

Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

Feb 18, 2023

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

Dec 04, 2022

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

Oct 11, 2022

PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation

Aug 22, 2022

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

May 30, 2022