Ze-Feng Gao

AI-accelerated Discovery of Altermagnetic Materials

Nov 13, 2023
Ze-Feng Gao, Shuai Qu, Bocheng Zeng, Yang Liu, Ji-Rong Wen, Hao Sun, Peng-Jie Guo, Zhong-Yi Lu

Altermagnetism, a new magnetic phase, has been theoretically proposed and experimentally verified to be distinct from ferromagnetism and antiferromagnetism. Although altermagnets have been found to possess many exotic physical properties, the very limited number of known altermagnetic materials (e.g., 14 confirmed materials) hinders the study of such properties. Hence, discovering more altermagnetic materials is crucial for a comprehensive understanding of altermagnetism and for enabling new applications in next-generation information technologies, e.g., storage devices and high-sensitivity sensors. Here, we report 25 new altermagnetic materials, covering metals, semiconductors, and insulators, discovered by an AI search engine that unifies symmetry analysis, graph neural network pre-training, optimal transport theory, and first-principles electronic structure calculations. The wide range of electronic structures reveals that these newly discovered altermagnetic materials exhibit various novel physical properties, e.g., the anomalous Hall effect, the anomalous Kerr effect, and topological properties. Notably, we discovered 8 i-wave altermagnetic materials for the first time. Overall, the AI search engine performs much better than human experts and suggests a set of new altermagnetic materials with unique properties, highlighting its potential for the accelerated discovery of materials with targeted properties.
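
The abstract describes a multi-stage screening workflow rather than a single algorithm. The Python sketch below only illustrates how such a funnel might be organized: symmetry filtering, GNN scoring, selection of a shortlist, and first-principles verification. Every function name here is a hypothetical placeholder, not the authors' code.

```python
# Hypothetical sketch of the screening funnel described above: symmetry filtering,
# GNN scoring, optimal-transport-style candidate selection, and DFT verification.
# Every function is a placeholder stub, not the authors' implementation.

def passes_symmetry_screen(material):
    # Placeholder: keep only candidates whose magnetic space group is compatible
    # with altermagnetism.
    return material.get("symmetry_ok", False)

def gnn_score(material):
    # Placeholder: score from a pre-trained crystal graph neural network.
    return material.get("score", 0.0)

def select_candidates(scored, budget):
    # Placeholder for the optimal-transport-based selection; here simply top-k.
    return sorted(scored, key=lambda item: item[1], reverse=True)[:budget]

def dft_verify(material):
    # Placeholder: first-principles check of the spin-split band structure.
    return material.get("is_altermagnet", False)

def screen(candidates, budget=25):
    scored = [(m, gnn_score(m)) for m in candidates if passes_symmetry_screen(m)]
    shortlist = select_candidates(scored, budget)
    return [m for m, _ in shortlist if dft_verify(m)]

toy = [{"symmetry_ok": True, "score": 0.9, "is_altermagnet": True},
       {"symmetry_ok": True, "score": 0.4, "is_altermagnet": False},
       {"symmetry_ok": False, "score": 0.8, "is_altermagnet": True}]
print(len(screen(toy)))  # 1
```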

* 38 pages; 22 figures; 3 tables 

Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study

Jul 26, 2023
Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen

Despite their superior performance, large language models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs and increase inference speed. However, a major challenge is that low-bit quantization methods often lead to performance degradation, so it is important to understand how quantization impacts the capacity of LLMs. Unlike previous studies focused on overall performance, this work investigates the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantized models, while 2-bit models encounter severe performance degradation on tests of these abilities. To improve the performance of low-bit models, we conduct two special experiments: (1) a fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings on the impact of quantization on emergent abilities and sheds light on the possibilities of extremely low-bit quantization for LLMs.
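
As a concrete illustration of the kind of low-bit setup studied here, the sketch below loads a causal language model in 4-bit precision with Hugging Face transformers and bitsandbytes and probes it with a simple in-context-learning prompt. The checkpoint name is a placeholder, and this is not the paper's evaluation code.

```python
# Illustrative only: loading a causal LM in 4-bit with Hugging Face transformers
# and bitsandbytes, the kind of low-bit quantization the paper studies.
# Requires a CUDA GPU plus the bitsandbytes and accelerate packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "facebook/opt-1.3b"  # placeholder checkpoint, not one from the paper

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)

# Simple in-context-learning style prompt to probe whether the quantized model
# still follows the demonstrated pattern.
prompt = "Translate to French.\nEnglish: cat\nFrench: chat\nEnglish: dog\nFrench:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=5)[0]))
```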

* 15 pages, 4 figures 

Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture

Apr 11, 2023
Peiyu Liu, Ze-Feng Gao, Yushuo Chen, Wayne Xin Zhao, Ji-Rong Wen

In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to greater depth. Unlike prior work that shares all parameters or uses extra blocks, we design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition reorganizes and factorizes a parameter matrix into two parts: a central tensor that contains the core information and auxiliary tensors that hold only a small proportion of the parameters. Based on this decomposition, our architecture shares the central tensor across all layers to reduce the model size, while keeping layer-specific auxiliary tensors (together with adapters) to enhance adaptation flexibility. To improve model training, we further propose a stable initialization algorithm tailored to the MPO-based architecture. Extensive experiments demonstrate the effectiveness of the proposed model in reducing the model size while achieving highly competitive performance.
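
A minimal NumPy sketch of the kind of MPO factorization referred to here: a weight matrix is split by sequential SVDs into a large middle core (playing the role of the central tensor) and small end cores (auxiliary tensors). The 3-core layout, factor shapes, and truncation rank are illustrative assumptions, not the paper's configuration.

```python
# Minimal MPO decomposition sketch via sequential SVDs (3 cores).
# Shapes and truncation rank are illustrative, not the paper's settings.
import numpy as np

def mpo_decompose(W, in_dims, out_dims, rank):
    """Factor a (prod(in_dims) x prod(out_dims)) matrix into a 3-site MPO:
    two small auxiliary cores at the ends and one central core in the middle."""
    assert W.shape == (np.prod(in_dims), np.prod(out_dims))
    # Reshape to (i1, i2, i3, o1, o2, o3) and interleave input/output indices.
    T = W.reshape(*in_dims, *out_dims).transpose(0, 3, 1, 4, 2, 5)
    tensors, bond = [], 1
    for k in range(2):  # two SVD splits -> three cores
        i_k, o_k = in_dims[k], out_dims[k]
        M = T.reshape(bond * i_k * o_k, -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(rank, len(S))
        tensors.append(U[:, :r].reshape(bond, i_k, o_k, r))
        T = np.diag(S[:r]) @ Vt[:r]
        bond = r
        # Remaining tensor keeps the not-yet-split input/output index pairs.
        T = T.reshape(bond, *[d for pair in zip(in_dims[k + 1:], out_dims[k + 1:]) for d in pair])
    tensors.append(T.reshape(bond, in_dims[-1], out_dims[-1], 1))
    return tensors  # tensors[1] plays the role of the central tensor

# Toy example: a 64 x 64 weight matrix with in_dims = out_dims = (4, 4, 4).
W = np.random.randn(64, 64)
cores = mpo_decompose(W, (4, 4, 4), (4, 4, 4), rank=8)
print([c.shape for c in cores])  # [(1, 4, 4, 8), (8, 4, 4, 8), (8, 4, 4, 1)]
```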

* 14 pages, 4 figures, 6 tables 

Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models

Mar 02, 2022
Ze-Feng Gao, Peiyu Liu, Wayne Xin Zhao, Zhong-Yi Lu, Ji-Rong Wen

The state-of-the-art Mixture-of-Experts (MoE) architecture has achieved remarkable successes in increasing model capacity. However, its widespread adoption has been hindered by complexity, communication costs, and training instability. Here we present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics. It decomposes an original matrix into central tensors (containing the core information) and auxiliary tensors (with only a small proportion of parameters). With this decomposed MPO structure, we reduce the parameters of the original MoE architecture by sharing a global central tensor across experts and keeping expert-specific auxiliary tensors. We also design a gradient mask strategy for the MPO tensor structure to alleviate overfitting. Experiments on three well-known downstream natural language datasets with GPT-2 show improved performance and efficiency in increasing model capacity (7.26x fewer parameters with the same number of experts). We additionally demonstrate improved positive transfer effects of our approach in multi-task learning.
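
The PyTorch sketch below illustrates only the parameter-sharing idea: every expert reuses a single central component and adds a small expert-specific auxiliary correction, routed by a soft gate. The low-rank form of the auxiliary path and all dimensions are simplifying assumptions; the paper's actual experts use the MPO tensor structure and a gradient mask, which are not reproduced here.

```python
# Illustrative parameter-sharing "MoE" layer: one shared central component,
# small expert-specific auxiliary components, and a soft gate.
# A stand-in for the MPO structure, not the paper's architecture.
import torch
import torch.nn as nn

class SharedCentralMoE(nn.Module):
    def __init__(self, d_model=256, n_experts=4, aux_rank=8):
        super().__init__()
        # Shared central component: one copy reused by all experts.
        self.central = nn.Linear(d_model, d_model, bias=False)
        # Expert-specific auxiliary components: small low-rank corrections.
        self.aux_in = nn.Parameter(torch.randn(n_experts, d_model, aux_rank) * 0.01)
        self.aux_out = nn.Parameter(torch.randn(n_experts, aux_rank, d_model) * 0.01)
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):                       # x: (batch, d_model)
        weights = self.gate(x).softmax(dim=-1)  # (batch, n_experts)
        shared = self.central(x)                # same central path for every expert
        # Per-expert auxiliary path: (batch, n_experts, d_model)
        aux = torch.einsum("bd,edr,erk->bek", x, self.aux_in, self.aux_out)
        return shared + torch.einsum("be,bek->bk", weights, aux)

moe = SharedCentralMoE()
print(moe(torch.randn(2, 256)).shape)  # torch.Size([2, 256])
```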

Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators

Jun 04, 2021
Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Z. Y. Xie, Zhong-Yi Lu, Ji-Rong Wen

This paper presents a novel pre-trained language model (PLM) compression approach based on the matrix product operator (MPO) from quantum many-body physics. It decomposes an original matrix into central tensors (containing the core information) and auxiliary tensors (with only a small proportion of parameters). With this decomposed MPO structure, we propose a novel fine-tuning strategy that updates only the parameters of the auxiliary tensors, and we design an optimization algorithm for MPO-based approximation over stacked network architectures. Our approach can be applied to either the original or compressed PLMs in a general way, yielding a lighter network and significantly reducing the number of parameters to be fine-tuned. Extensive experiments demonstrate the effectiveness of the proposed approach in model compression, especially the reduction in fine-tuned parameters (91% reduction on average).
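
A minimal sketch of the lightweight fine-tuning strategy described above, assuming a toy layer split into a frozen "central" part and trainable "aux" parts; the layer definition and parameter names are illustrative, not the paper's decomposition.

```python
# Minimal sketch: freeze the central tensor and fine-tune only the small
# auxiliary tensors. Module and parameter names are illustrative.
import torch
import torch.nn as nn

class MPOLikeLayer(nn.Module):
    def __init__(self, d=128, rank=8):
        super().__init__()
        self.central = nn.Parameter(torch.randn(d, d) * 0.02)    # core information
        self.aux_in = nn.Parameter(torch.randn(d, rank) * 0.02)  # small corrections
        self.aux_out = nn.Parameter(torch.randn(rank, d) * 0.02)

    def forward(self, x):
        return x @ self.central + (x @ self.aux_in) @ self.aux_out

model = MPOLikeLayer()
# Freeze the central tensor; fine-tune the auxiliary tensors only.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("aux")

print([n for n, p in model.named_parameters() if p.requires_grad])  # ['aux_in', 'aux_out']
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```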

* Accepted by ACL 2021 main conference 

A Model Compression Method with Matrix Product Operators for Speech Enhancement

Oct 10, 2020
Xingwei Sun, Ze-Feng Gao, Zhong-Yi Lu, Junfeng Li, Yonghong Yan

Deep neural network (DNN) based speech enhancement approaches have achieved promising performance. However, the number of parameters in these models is usually too large for real-world speech enhancement on devices with limited resources, which seriously restricts their application. To deal with this issue, model compression techniques are being widely studied. In this paper, we propose a model compression method based on matrix product operators (MPO) to substantially reduce the number of parameters in DNN models for speech enhancement. In this method, the weight matrices in the linear transformations of the neural network model are replaced by the MPO decomposition format before training. In experiments, this process is applied to causal neural network models, namely the feedforward multilayer perceptron (MLP) and long short-term memory (LSTM) models. Both the MLP and LSTM models, with and without compression, are then used to estimate the ideal ratio mask for monaural speech enhancement. The experimental results show that our proposed MPO-based method outperforms the widely used pruning method for speech enhancement at various compression rates, and that further improvement can be achieved at low compression rates. Our proposal provides an effective model compression method for speech enhancement, especially for cloud-free applications.
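
The sketch below shows, under illustrative shapes and bond dimension, what "replacing a weight matrix by the MPO decomposition format before training" can look like for a single linear layer in PyTorch: the dense weight never exists, and the input is contracted directly with three small trainable cores. It is not the paper's MLP/LSTM code.

```python
# Illustrative MPO-format linear layer: the dense (512 x 512) weight matrix is
# replaced before training by three small trainable cores, and the forward pass
# contracts the input with the cores directly. Shapes and bond dimension are
# illustrative choices, not the paper's settings.
import torch
import torch.nn as nn

class MPOLinear(nn.Module):
    def __init__(self, in_dims=(8, 8, 8), out_dims=(8, 8, 8), bond=4):
        super().__init__()
        bonds = [1, bond, bond, 1]
        self.in_dims, self.out_dims = in_dims, out_dims
        self.cores = nn.ParameterList(
            nn.Parameter(torch.randn(bonds[k], i, o, bonds[k + 1]) * 0.1)
            for k, (i, o) in enumerate(zip(in_dims, out_dims))
        )

    def forward(self, x):                # x: (batch, 8*8*8) = (batch, 512)
        b = x.shape[0]
        x = x.reshape(b, *self.in_dims)  # (batch, i1, i2, i3)
        c0, c1, c2 = self.cores
        # Contract input with the MPO cores; boundary bonds x and w have size 1.
        y = torch.einsum("bijk,xiou,ujpv,vkqw->bopq", x, c0, c1, c2)
        return y.reshape(b, -1)          # (batch, 512)

layer = MPOLinear()
dense_params = 512 * 512
mpo_params = sum(p.numel() for p in layer.parameters())
print(layer(torch.randn(2, 512)).shape, dense_params, mpo_params)  # ... 262144 1536
```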

* IEEE Transactions on Audio, Speech and Language Processing.2020.3030495(2020)  
* 11 pages, 6 figures, 7 tables 

Compressing deep neural networks by matrix product operators

Apr 11, 2019
Ze-Feng Gao, Song Cheng, Rong-Qiang He, Z. Y. Xie, Hui-Hai Zhao, Zhong-Yi Lu, Tao Xiang

A deep neural network is a parameterization of a multi-layer mapping of signals in terms of many alternately arranged linear and nonlinear transformations. The linear transformations, which are generally used in the fully-connected as well as convolutional layers, contain most of the variational parameters that are trained and stored. Compressing a deep neural network to reduce its number of variational parameters without sacrificing its predictive power is an important but challenging problem, both for training these parameters efficiently and for lowering the risk of overfitting. Here we show that this problem can be effectively solved by representing linear transformations with matrix product operators (MPO). We have tested this approach on five representative neural networks, namely FC2, LeNet-5, VGG, ResNet, and DenseNet, on two widely used datasets, MNIST and CIFAR-10, and found that this MPO representation indeed provides a faithful and efficient mapping between input and output signals, keeping or even improving the prediction accuracy with a dramatically reduced number of parameters.
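
For a sense of the compression such a representation allows, the short calculation below counts parameters for a dense fully-connected layer versus an MPO with a given bond dimension; the layer size and bond dimension are generic illustrations, not numbers reported in the paper.

```python
# Back-of-the-envelope parameter count for representing a dense layer as an MPO
# with k cores and bond dimension D. The FC example is a generic illustration.
import numpy as np

def mpo_param_count(in_dims, out_dims, bond):
    bonds = [1] + [bond] * (len(in_dims) - 1) + [1]
    return sum(bonds[k] * i * o * bonds[k + 1]
               for k, (i, o) in enumerate(zip(in_dims, out_dims)))

# Example: a 1024 x 1024 fully-connected layer factored as 4 x 4 x 4 x 4 x 4 on
# each side, with bond dimension 4.
in_dims = out_dims = (4, 4, 4, 4, 4)
dense = int(np.prod(in_dims)) * int(np.prod(out_dims))
mpo = mpo_param_count(in_dims, out_dims, bond=4)
print(dense, mpo, f"compression ~{dense / mpo:.0f}x")  # 1048576 896 compression ~1170x
```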

* 7+5 pages, 3 figures, 2+7 tables 