Fuzhao Xue

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Jan 29, 2024
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis

May 22, 2023
Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

May 22, 2023
Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention

Apr 29, 2023
Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You

Adaptive Computation with Elastic Input Sequence

Jan 30, 2023
Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

Deeper vs Wider: A Revisit of Transformer Configuration

May 24, 2022
Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Xin Jiang, Yang You

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU

Apr 22, 2022
Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qing, Youlong Cheng, Yang You

Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation

Apr 06, 2022
Wangbo Zhao, Kai Wang, Xiangxiang Chu, Fuzhao Xue, Xinchao Wang, Yang You

One Student Knows All Experts Know: From Sparse to Dense

Jan 26, 2022
Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You
