Nan Du

Are Large Language Models Good Prompt Optimizers?

Feb 03, 2024
Ruotian Ma, Xiaolei Wang, Xin Zhou, Jian Li, Nan Du, Tao Gui, Qi Zhang, Xuanjing Huang

On Diversified Preferences of Large Language Model Alignment

Dec 25, 2023
Dun Zeng, Yong Dai, Pengyu Cheng, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu

Learning to Skip for Language Modeling

Nov 26, 2023
Dewen Zeng, Nan Du, Tao Wang, Yuanzhong Xu, Tao Lei, Zhifeng Chen, Claire Cui

Adversarial Preference Optimization

Nov 14, 2023
Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Nan Du

Everyone Deserves A Reward: Learning Customized Human Preferences

Sep 15, 2023
Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Aug 25, 2023
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du

Brainformers: Trading Simplicity for Efficiency

May 29, 2023
Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

May 24, 2023
Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts

May 24, 2023
Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
