Hui Jiang

FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout

Jul 14, 2023
Jingjing Xue, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang

Federated Learning (FL) has emerged as a distributed machine learning paradigm that avoids transmitting end-user data, effectively preventing privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, causing a severe uplink communication bottleneck. A prominent direction for alleviating this problem is federated dropout, which drops a fraction of the weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, so their performance is not guaranteed. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of the local training loss. With FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and transmits only the parameters of non-dropped weight rows, mitigating uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that, compared with status quo approaches, FedBIAD provides a 2x uplink reduction with an accuracy increase of up to 2.41% even on non-Independent and Identically Distributed (non-IID) data, which yields up to a 72% decrease in training time.
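
As a rough illustration of the idea, the sketch below shows a client that scores each weight row, adjusts its keep ratio according to the trend of the local training loss, and uploads only the retained rows. The row-norm importance score, the loss-trend adjustment, and the keep-ratio bounds are hypothetical placeholders, not FedBIAD's actual indicator or its Bayesian formulation.

```python
# Illustrative sketch (not the paper's exact algorithm): a client scores each
# weight row of a layer, keeps the rows whose importance is highest under an
# adaptive keep ratio tied to the local-loss trend, and uploads only those rows.
import numpy as np

def adaptive_row_dropout(weight, prev_loss, curr_loss, base_keep=0.5):
    """Return the indices of uploaded rows and the compressed payload.

    The importance score (row L2 norm) and the loss-trend adjustment below are
    hypothetical stand-ins for FedBIAD's importance indicator.
    """
    importance = np.linalg.norm(weight, axis=1)
    # Heuristic: if the local loss is decreasing, drop more rows; if it is
    # increasing, keep more rows to protect accuracy.
    trend = np.clip(curr_loss - prev_loss, -1.0, 1.0)
    keep_ratio = float(np.clip(base_keep + 0.25 * trend, 0.1, 1.0))
    k = max(1, int(round(keep_ratio * weight.shape[0])))
    kept_rows = np.argsort(importance)[-k:]            # most important rows
    payload = {int(r): weight[r] for r in kept_rows}   # only these are uploaded
    return kept_rows, payload

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
rows, payload = adaptive_row_dropout(W, prev_loss=0.92, curr_loss=0.85)
print(sorted(payload))   # indices of the uploaded weight rows
```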

* 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  

A Latent Space Theory for Emergent Abilities in Large Language Models

Apr 24, 2023
Hui Jiang

Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match the marginal distribution of languages due to the sparsity. With the advent of LLMs trained with big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inference. In this paper, we categorize languages as either unambiguous or ε-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
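
The argument can be illustrated with a toy joint distribution (all numbers below are made up): because the joint distribution over utterances and meanings is sparse and heavily peaked, the Bayesian posterior over meanings given an utterance concentrates on the intended meaning, while the marginal over utterances, which is what an LLM models, reflects those peaks.

```python
# Toy illustration (made-up numbers): a sparse, peaked joint distribution over
# (utterance, meaning) lets Bayes' rule recover the intended meaning.
import numpy as np

utterances = ["the cat sat", "bank of a river", "open the window"]
meanings = ["CAT_SITS", "RIVER_EDGE", "OPEN_WINDOW"]

# Sparse joint distribution P(utterance, meaning), heavily peaked on the diagonal.
joint = np.array([
    [0.32, 0.00, 0.01],
    [0.01, 0.30, 0.00],
    [0.00, 0.01, 0.35],
])
joint /= joint.sum()

p_utterance = joint.sum(axis=1)            # marginal over language (what an LLM models)
posterior = joint / p_utterance[:, None]   # P(meaning | utterance) by Bayes' rule

for i, u in enumerate(utterances):
    j = int(posterior[i].argmax())
    print(f"{u!r} -> {meanings[j]} (posterior {posterior[i, j]:.2f})")
```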

* 17 pages, 3 figures 

Towards Robust k-Nearest-Neighbor Machine Translation

Oct 17, 2022
Hui Jiang, Ziyao Lu, Fandong Meng, Chulun Zhou, Jie Zhou, Degen Huang, Jinsong Su

k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years. Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model. However, noisy retrieved pairs can dramatically degrade model performance. In this paper, we conduct a preliminary study and find that this problem results from not fully exploiting the prediction of the NMT model. To alleviate the impact of noise, we propose a confidence-enhanced kNN-MT model with robust training. Concretely, we introduce the NMT confidence to refine the modeling of two important components of kNN-MT: the kNN distribution and the interpolation weight. Meanwhile, we inject two types of perturbations into the retrieved pairs for robust training. Experimental results on four benchmark datasets demonstrate that our model not only achieves significant improvements over current kNN-MT models but also exhibits better robustness. Our code is available at https://github.com/DeepLearnXMU/Robust-knn-mt.
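
The sketch below illustrates the standard kNN-MT interpolation with a confidence-modulated interpolation weight (it does not cover the paper's refinement of the kNN distribution itself). Defining confidence as the NMT model's top probability and scaling the weight linearly are illustrative choices here, not necessarily the paper's exact formulation; see the repository above for the actual implementation.

```python
# Sketch of confidence-modulated kNN-MT interpolation (illustrative placeholders).
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Distribution over the vocabulary built from the k nearest datastore entries."""
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        p_knn[values[idx]] += w
    return p_knn

def combine(p_nmt, p_knn, base_lambda=0.5):
    # Hypothetical confidence: the NMT model's top probability. When the NMT
    # model is confident, rely less on the (possibly noisy) retrieved pairs.
    confidence = p_nmt.max()
    lam = base_lambda * (1.0 - confidence)
    return lam * p_knn + (1.0 - lam) * p_nmt

vocab_size = 6
rng = np.random.default_rng(1)
keys = rng.normal(size=(20, 8))                      # toy datastore keys
values = rng.integers(0, vocab_size, size=20)        # toy target tokens
query = rng.normal(size=8)
p_nmt = np.array([0.05, 0.6, 0.1, 0.1, 0.1, 0.05])   # toy NMT distribution
p_final = combine(p_nmt, knn_distribution(query, keys, values, vocab_size))
print(p_final.round(3), p_final.sum().round(3))
```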

* Accepted to EMNLP 2022 

DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

Jul 14, 2022
Liang Qiao, Hui Jiang, Ying Chen, Can Li, Pengfei Li, Zaisheng Li, Baorui Zou, Dashan Guo, Yingda Xu, Yunlu Xu, Zhanzhan Cheng, Yi Niu

This paper presents DavarOCR, an open-source toolbox for OCR and document understanding tasks. DavarOCR currently implements 19 advanced algorithms covering 9 different task forms, and provides detailed usage instructions and trained models for each algorithm. Compared with previous open-source OCR toolboxes, DavarOCR offers more complete support for the sub-tasks of cutting-edge document understanding technology. To promote the development and application of OCR technology in academia and industry, we emphasize modules that different technical sub-domains can share. DavarOCR is publicly released at https://github.com/hikopensource/Davar-Lab-OCR.

* Short paper, Accept by ACM MM2022 

Exploring Dynamic Selection of Branch Expansion Orders for Code Generation

Jun 01, 2021
Hui Jiang, Chulun Zhou, Fandong Meng, Biao Zhang, Jie Zhou, Degen Huang, Qingqiang Wu, Jinsong Su

Owing to its great potential for facilitating software development, code generation has attracted increasing attention recently. The dominant models are generally Seq2Tree models, which convert an input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an Abstract Syntax Tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector, which dynamically determines the optimal expansion order of branches for multi-branch nodes. In particular, since the selection of expansion orders is a non-differentiable multi-step operation, we optimize the selector through reinforcement learning and formulate the reward function as the difference between model losses obtained with different expansion orders. Experimental results and in-depth analysis on several commonly used datasets demonstrate the effectiveness and generality of our approach. We have released our code at https://github.com/DeepLearnXMU/CG-RL.
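
A minimal sketch of this kind of reward signal: a toy branch selector is updated with REINFORCE, using the difference between the Seq2Tree losses obtained under a greedy and a sampled expansion order as the reward. The selector architecture and the dummy loss below are placeholders, not the released implementation (see the repository linked above for the actual code).

```python
# Toy REINFORCE update for a branch-order selector; reward = loss(greedy order)
# minus loss(sampled order), so orders that lower the loss get reinforced.
import torch
import torch.nn as nn

class BranchSelector(nn.Module):
    """Toy context-based selector scoring candidate branch-expansion orders."""
    def __init__(self, dim, n_orders):
        super().__init__()
        self.score = nn.Linear(dim, n_orders)

    def forward(self, context):
        return self.score(context)

def reinforce_step(selector, order_loss, context, optimizer):
    logits = selector(context)
    dist = torch.distributions.Categorical(logits=logits)
    sampled = dist.sample()            # explored expansion order
    greedy = logits.argmax()           # greedy order used as a baseline
    with torch.no_grad():
        reward = order_loss(greedy) - order_loss(sampled)
    loss = -reward * dist.log_prob(sampled)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()

# Dummy stand-in for the Seq2Tree training loss under a given expansion order.
order_loss = lambda order: torch.tensor(1.0 + 0.1 * float(order))
selector = BranchSelector(dim=16, n_orders=3)
opt = torch.optim.Adam(selector.parameters(), lr=1e-3)
print(reinforce_step(selector, order_loss, torch.randn(16), opt))
```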

* Accepted by ACL 2021 main conference 

Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition

May 13, 2021
Hui Jiang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Wenqi Ren, Fei Wu, Wenming Tan

Text recognition is a popular topic owing to its broad applications. In this work, we exploit an implicit task, character counting, within traditional text recognition without additional annotation cost. The implicit task serves as an auxiliary branch that complements sequential recognition. We design a two-branch reciprocal feature learning framework to adequately utilize the features from both tasks. By exploiting the complementary effect between the explicit and implicit tasks, the features are reliably enhanced. Extensive experiments on 7 benchmarks show the advantages of the proposed method on both text recognition and the newly built character counting task. In addition, the framework is convenient and effective to equip with various networks and tasks. We offer abundant ablation studies and generalization experiments with a deeper understanding of the tasks. Code is available.
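
A minimal sketch of such a two-branch setup, assuming a shared backbone feeding an explicit per-step recognition head and an implicit character-counting head; the modules, shapes, and pooling choices below are illustrative placeholders rather than the paper's architecture.

```python
# Illustrative two-branch model: shared backbone, explicit recognition head,
# implicit character-counting head (placeholder architecture, not the paper's).
import torch
import torch.nn as nn

class TwoBranchRecognizer(nn.Module):
    def __init__(self, feat_dim=256, num_classes=37, max_len=25):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1),
                                      nn.AdaptiveAvgPool2d((1, max_len)))
        self.recognition_head = nn.Linear(feat_dim, num_classes)  # per-step char logits
        self.counting_head = nn.Linear(feat_dim, 1)               # predicted char count

    def forward(self, images):
        feats = self.backbone(images)                 # (B, C, 1, T)
        seq = feats.squeeze(2).transpose(1, 2)        # (B, T, C)
        char_logits = self.recognition_head(seq)      # explicit task
        count = self.counting_head(seq.mean(dim=1))   # implicit auxiliary task
        return char_logits, count

model = TwoBranchRecognizer()
logits, count = model(torch.randn(2, 3, 32, 100))
print(logits.shape, count.shape)   # torch.Size([2, 25, 37]) torch.Size([2, 1])
```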

* Accepted by ICDAR 2021 

Enhanced Aspect-Based Sentiment Analysis Models with Progressive Self-supervised Attention Learning

Mar 05, 2021
Jinsong Su, Jialong Tang, Hui Jiang, Ziyao Lu, Yubin Ge, Linfeng Song, Deyi Xiong, Le Sun, Jiebo Luo

In aspect-based sentiment analysis (ABSA), many neural models are equipped with an attention mechanism to quantify the contribution of each context word to sentiment prediction. However, such a mechanism suffers from one drawback: only a few frequent words with sentiment polarity tend to be taken into consideration for the final sentiment decision, while abundant infrequent sentiment words are ignored by models. To deal with this issue, we propose a progressive self-supervised attention learning approach for attentional ABSA models. In this approach, we iteratively perform sentiment prediction on all training instances and continually learn useful attention supervision information in the meantime. During training, at each iteration, the context words with the highest impact on sentiment prediction, identified based on their attention weights or gradients, are extracted as words with active/misleading influence on the correct/incorrect prediction for each instance. Words extracted in this way are masked for subsequent iterations. To exploit these extracted words for refining ABSA models, we augment the conventional training objective with a regularization term that encourages ABSA models not only to take full advantage of the extracted active context words but also to decrease the weights of the misleading words. We integrate the proposed approach into three state-of-the-art neural ABSA models. Experimental results and in-depth analyses show that our approach yields better attention results and significantly enhances the performance of all three models. We release the source code and trained models at https://github.com/DeepLearnXMU/PSSAttention.
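
A schematic sketch of the progressive extraction loop: at each iteration, the highest-attention unmasked context word of each instance is recorded as active (prediction correct) or misleading (prediction wrong) and then masked out for later iterations. The dummy model and data interface are placeholders; the released code linked above is the authoritative implementation.

```python
# Schematic progressive extraction of attention supervision (toy model and data).
import random
from collections import defaultdict

random.seed(0)

class DummyABSAModel:
    """Stand-in for an attentional ABSA model: returns a sentiment prediction
    and one attention weight per context word (random here, for illustration)."""
    def predict(self, tokens, aspect):
        attention = [random.random() for _ in tokens]
        pred = random.choice(["positive", "negative"])
        return pred, attention

def extract_supervision(model, instances, n_iters=3, mask_token="<mask>"):
    """Per iteration, mask the highest-attention unmasked word of each instance
    and record it as active (correct prediction) or misleading (wrong prediction)."""
    active, misleading = defaultdict(list), defaultdict(list)
    masked = {i: list(tokens) for i, (tokens, _, _) in enumerate(instances)}
    for _ in range(n_iters):
        for i, (tokens, aspect, label) in enumerate(instances):
            pred, attn = model.predict(masked[i], aspect)
            top = max(range(len(tokens)),
                      key=lambda j: -1.0 if masked[i][j] == mask_token else attn[j])
            bucket = active if pred == label else misleading
            bucket[i].append(masked[i][top])
            masked[i][top] = mask_token        # exclude it from later iterations
    return active, misleading

instances = [("the pasta was great but service slow".split(), "pasta", "positive")]
print(extract_supervision(DummyABSAModel(), instances))
```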

* Artificial Intelligence 2021  
* 31 pages. arXiv admin note: text overlap with arXiv:1906.01213 

Match$^2$: A Matching over Matching Model for Similar Question Identification

Jun 21, 2020
Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xueqi Cheng, Hui Jiang, Xiaozhao Wang

Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification becomes a core task in CQA, which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers for the enrichment of the archived questions. Traditional methods typically adopt a one-sided usage, which leverages the answer as an expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-sided usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions could be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching over matching model, namely Match$^2$, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.
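
A toy sketch of the matching-over-matching idea: compute each question's matching pattern over the shared answer, then compare the two patterns. Simple word-overlap scores stand in for the neural matching signals used in the actual model.

```python
# Toy "matching over matching": word-overlap patterns over a shared answer,
# then cosine similarity of the two patterns (illustrative only).
import numpy as np

def matching_pattern(question, answer_sentences):
    """One matching score per answer sentence (here: word overlap with the question)."""
    q = set(question.lower().split())
    return np.array([len(q & set(s.lower().split())) / max(len(q), 1)
                     for s in answer_sentences])

def match_over_match(question_a, question_b, answer_sentences):
    """Similarity of the two questions' matching patterns over the same answer."""
    pa = matching_pattern(question_a, answer_sentences)
    pb = matching_pattern(question_b, answer_sentences)
    denom = np.linalg.norm(pa) * np.linalg.norm(pb)
    return float(pa @ pb / denom) if denom > 0 else 0.0

answer = ["Restart the router and wait thirty seconds.",
          "Then reconnect the laptop to the wifi network."]
print(match_over_match("how do I fix my wifi connection",
                       "my laptop cannot connect to wifi", answer))
print(match_over_match("how do I fix my wifi connection",
                       "how do I bake a cake", answer))
```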

* Accepted by SIGIR 2020. 10 pages 

On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks

Feb 10, 2020
Behnam Asadi, Hui Jiang

In this paper, we extend the well-established universal approximation theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We prove that a sufficiently large neural network using the ReLU activation function can approximate any function in $L^1$ up to arbitrary precision. Moreover, our theoretical results show that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L^1$, which is equivalent to the mutually exclusive class labels of any realistic multi-class pattern classification problem. To the best of our knowledge, this work is the first theoretical justification for using softmax output layers in neural networks for pattern classification.
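
In schematic form, the two results can be read as $L^1$ approximation statements; the formulation below is paraphrased from the abstract and is not the paper's exact theorem statement or constants.

```latex
% Schematic, paraphrased form of the approximation claims:
% (1) ReLU networks are dense in L^1: for every f in L^1(R^d) and every eps > 0
%     there exists a sufficiently large ReLU network g_theta with
\int_{\mathbb{R}^d} \bigl| f(x) - g_\theta(x) \bigr| \, dx < \varepsilon .
% (2) With a nonlinear softmax output layer, the same L^1 bound holds when f is
%     an indicator function 1_A, i.e. mutually exclusive class labels.
```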

* 8 pages 