Gang Chen

A critical look at the current train/test split in machine learning

Jun 08, 2021
Jimin Tan, Jianan Yang, Sai Wu, Gang Chen, Jake Zhao

The randomized or cross-validated split of training and testing sets has been adopted as the gold standard of machine learning for decades. These split protocols rest on two assumptions: (i) the dataset is fixed and eternally static, so that different machine learning algorithms or models can be evaluated on it; (ii) a complete set of annotated data is available to researchers or industrial practitioners. In this article, we take a closer and critical look at the split protocol itself and point out its weaknesses and limitations, especially for industrial applications. In many real-world problems, we must acknowledge that assumption (ii) frequently does not hold. For instance, in interdisciplinary applications such as drug discovery, annotating data often requires real lab experiments, which impose huge costs in both time and money. In other words, it can be very difficult or even impossible to satisfy assumption (ii). In this article, we address this problem, revisit the paradigm of active learning, and investigate its potential for solving problems under unconventional train/test split protocols. We further propose a new adaptive active learning architecture (AAL) that involves an adaptation policy, in contrast to traditional active learning, which only adds data points to the training pool unidirectionally. We primarily support our points through an extensive investigation of an interdisciplinary drug-protein binding problem. We additionally evaluate AAL on more conventional machine learning benchmark datasets such as CIFAR-10 to demonstrate the generalizability and efficacy of the new framework.
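
The abstract contrasts AAL with traditional active learning, which only ever adds points to the training pool. As an illustration of that baseline only (AAL's adaptation policy is not specified here), the following is a minimal sketch of pool-based active learning with uncertainty sampling. The toy threshold classifier, the `oracle` callback (standing in for a costly lab annotation), and all function names are hypothetical.

```python
def fit_threshold(labeled):
    # Hypothetical toy 1-D classifier standing in for a real model: the decision
    # threshold is the midpoint between the mean positive and mean negative input.
    # Requires at least one labeled example of each class.
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def most_uncertain(pool, threshold):
    # Uncertainty sampling: the unlabeled point closest to the decision boundary.
    return min(pool, key=lambda x: abs(x - threshold))


def active_learning(pool, oracle, seed_labeled, budget):
    # Traditional, unidirectional loop: each round queries one label and moves that
    # point from the unlabeled pool into the training set; nothing is ever removed.
    labeled = list(seed_labeled)
    pool = list(pool)  # copy so the caller's pool is not mutated
    for _ in range(min(budget, len(pool))):
        threshold = fit_threshold(labeled)
        x = most_uncertain(pool, threshold)
        pool.remove(x)
        labeled.append((x, oracle(x)))  # the expensive step, e.g. a lab experiment
    return fit_threshold(labeled), labeled
```

For example, `active_learning([i / 100 for i in range(1, 100)], lambda x: int(x > 0.3), [(0.0, 0), (1.0, 1)], budget=5)` spends its five queries near the true boundary at 0.3 instead of sampling the pool at random.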

Machine vision detection to daily facial fatigue with a nonlocal 3D attention network

Apr 21, 2021
Zeyu Chen, Xinhang Zhang, Juan Li, Jingxuan Ni, Gang Chen, Shaohua Wang, Fangfang Fan, Changfeng Charles Wang, Xiaotao Li

Fatigue detection is valuable for maintaining people's mental health and preventing safety accidents. However, detecting facial fatigue in the real world via machine vision, especially mild fatigue, is still challenging due to the lack of non-laboratory datasets and well-defined algorithms. To improve facial fatigue detection so that it can be used widely in daily life, this paper provides an audiovisual dataset named DLFD (daily-life fatigue dataset), which reflects people's facial fatigue state in the wild. A framework using 3D-ResNet with a non-local attention mechanism was trained to extract local and long-range features in the spatial and temporal dimensions. Then, a combined loss function of mean squared error and cross-entropy was designed to predict both continuous and categorical fatigue degrees. Our proposed framework reached an average accuracy of 90.8% on the validation set and 72.5% on the test set for binary classification, which compares well with other state-of-the-art methods. Feature map visualization revealed that our framework captured facial dynamics and attempted to relate them to the fatigue state. Our experimental results on multiple metrics show that our framework captured typical, micro, and dynamic facial features along the spatiotemporal dimensions, contributing to mild fatigue detection in the wild.

* 25 pages, 6 figures, 5 tables
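
The abstract describes a loss that combines mean squared error on a continuous fatigue degree with cross-entropy on a categorical one. The exact combination is not given, so the sketch below assumes a simple weighted sum for a single sample, with a hypothetical `weight` hyperparameter; the function names are also illustrative.

```python
import math


def softmax_cross_entropy(logits, target_class):
    # Cross-entropy of softmax(logits) against an integer class label,
    # computed with the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_class]


def combined_fatigue_loss(pred_score, true_score, logits, true_class, weight=1.0):
    # Hypothetical joint loss: squared error on the continuous fatigue score
    # plus a weighted cross-entropy on the categorical fatigue level.
    mse = (pred_score - true_score) ** 2
    return mse + weight * softmax_cross_entropy(logits, true_class)
```

A batch loss would average this over samples; the weighting between the two terms is a design choice that the abstract does not specify.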

AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

Apr 01, 2021
Can Cui, Wei Wang, Meihui Zhang, Gang Chen, Zhaojing Luo, Beng Chin Ooi

Alphas are stock prediction models that capture trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify risk. Existing alphas fall into two classes. Formulaic alphas are simple algebraic expressions of scalar features, and thus generalize well and can be mined into a weakly correlated set. Machine learning alphas are data-driven models over vector and matrix features; they are more predictive than formulaic alphas but too complex to mine into a weakly correlated set. In this paper, we introduce a new class of alphas that models scalar, vector, and matrix features and possesses the strengths of both existing classes. The new alphas predict returns with high accuracy and can be mined into a weakly correlated set. In addition, we propose a novel alpha mining framework based on AutoML, called AlphaEvolve, to generate the new alphas. To this end, we first propose operators for generating the new alphas and for selectively injecting relational domain knowledge to model the relations between stocks. We then accelerate alpha mining with a pruning technique for redundant alphas. Experiments show that AlphaEvolve can evolve initial alphas into new alphas with high returns and weak correlations.

* Accepted by SIGMOD 2021 Data Science and Engineering Track
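
To make the distinction concrete: a formulaic alpha is an algebraic expression over scalar features, and one common way to check whether a set of alphas is weakly correlated is the Pearson correlation between their signals. The sketch below assumes hypothetical daily open/high/low/close features and an illustrative alpha expression; it is not one of the alphas produced by AlphaEvolve.

```python
import math


def alpha_intraday_momentum(bar):
    # Hypothetical formulaic alpha over one day's scalar features:
    # (close - open) / (high - low), with a small epsilon to avoid division by zero.
    return (bar["close"] - bar["open"]) / (bar["high"] - bar["low"] + 1e-9)


def pearson(xs, ys):
    # Pearson correlation between two equally long series of alpha signals.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Evaluating two candidate alphas over the same days and computing `pearson` on their outputs gives a value near 0 for a weakly correlated pair and near +1 or -1 for redundant ones.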

Guided Interpolation for Adversarial Training

Feb 15, 2021
Chen Chen, Jingfeng Zhang, Xilie Xu, Tianlei Hu, Gang Niu, Gang Chen, Masashi Sugiyama

To enhance adversarial robustness, adversarial training learns deep neural networks on adversarial variants generated from their natural data. However, as training progresses, the training data becomes less and less attackable, undermining the robustness enhancement. A straightforward remedy is to incorporate more training data, but this can incur an unaffordable cost. In this paper, to mitigate this issue, we propose the guided interpolation framework (GIF): in each epoch, the GIF uses the previous epoch's meta information to guide the interpolation of the data. Compared with vanilla mixup, the GIF provides a higher ratio of attackable data, which benefits the robustness enhancement; meanwhile, it mitigates the model's linear behavior between classes, which favors generalization but not robustness. As a result, the GIF encourages the model to predict invariantly within the cluster of each class. Experiments demonstrate that the GIF can indeed enhance adversarial robustness across various adversarial training methods and datasets.
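
The GIF is positioned against vanilla mixup. For reference, here is a minimal sketch of vanilla mixup only: a convex combination of two inputs and their one-hot labels, with the mixing coefficient drawn from Beta(alpha, alpha). The GIF's use of the previous epoch's meta information to decide what to interpolate is not modeled; vectors are plain Python lists for illustration.

```python
import random


def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    # Vanilla mixup: draw lam ~ Beta(alpha, alpha), then interpolate both the
    # inputs and the one-hot labels with the same coefficient.
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the labels are mixed linearly along with the inputs, vanilla mixup encourages linear behavior between classes, which is the property the abstract says the GIF mitigates.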

Multi-Agent Deep Reinforcement Learning for Request Dispatching in Distributed-Controller Software-Defined Networking

Feb 06, 2021
Victoria Huang, Gang Chen, Qiang Fu

Recently, distributed controller architectures have been quickly gaining popularity in Software-Defined Networking (SDN). However, the use of distributed controllers introduces a new and important Request Dispatching (RD) problem, in which every SDN switch must properly dispatch its requests among all controllers so as to optimize network performance. This goal can be fulfilled by designing an RD policy to guide the distribution of requests at each switch. In this paper, we propose a Multi-Agent Deep Reinforcement Learning (MA-DRL) approach to automatically design RD policies with high adaptability and performance. This is achieved through a new problem formulation in the form of a Multi-Agent Markov Decision Process (MA-MDP), a new adaptive RD policy design, and a new MA-DRL algorithm called MA-PPO. Extensive simulation studies show that our MA-DRL technique can effectively train RD policies that significantly outperform man-made policies, model-based policies, and RD policies learned via single-agent DRL algorithms.
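
MA-PPO builds on Proximal Policy Optimization (PPO). The multi-agent specifics are not described in the abstract, so the sketch below shows only the standard single-sample PPO clipped surrogate objective, with the common default `clip_eps=0.2` taken as an assumption.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratio = math.exp(logp_new - logp_old)
    # Clip r to [1 - eps, 1 + eps] so a single update cannot move the policy too far.
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms (to be maximized).
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage, the objective stops rewarding ratios above `1 + clip_eps`; with a negative advantage, the unclipped term still penalizes a large ratio in full.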

Neural Machine Translation: A Review of Methods, Resources, and Tools

Dec 31, 2020
Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu

Machine translation (MT) is an important sub-field of natural language processing that aims to translate natural languages using computers. In recent years, end-to-end neural machine translation (NMT) has achieved great success and has become the new mainstream method in practical MT systems. In this article, we first provide a broad review of the methods for NMT and focus on methods relating to architectures, decoding, and data augmentation. Then we summarize the resources and tools that are useful for researchers. Finally, we conclude with a discussion of possible future research directions.

* Accepted by AI Open

Towards a Universal Continuous Knowledge Base

Dec 25, 2020
Gang Chen, Maosong Sun, Yang Liu

In artificial intelligence, knowledge is the information required by an intelligent system to accomplish tasks. While traditional knowledge bases use discrete, symbolic representations, detecting knowledge encoded in the continuous representations learned from data has recently received increasing attention. In this work, we propose a method for building a continuous knowledge base that can store knowledge imported from multiple, diverse neural networks. The key idea of our approach is to define an interface for each neural network and cast knowledge transfer as a function simulation problem. Preliminary experiments on text classification show promising results: we first import the knowledge encoded in an RNN model and a CNN model into the knowledge base, from which the fused knowledge is exported back to the RNN model, achieving higher classification accuracy than the original RNN model. With the continuous knowledge base, it is also easy to achieve knowledge distillation and transfer learning. Our work opens the door to building a universal continuous knowledge base to collect, store, and organize all continuous knowledge encoded in different neural networks trained for different AI tasks.

Learning Symbolic Expressions via Gumbel-Max Equation Learner Network

Dec 12, 2020
Gang Chen

Although modern machine learning, in particular deep learning, has achieved outstanding success in scientific and engineering research, most of the neural networks (NNs) learned via these state-of-the-art techniques are black-box models. For machine learning to succeed widely in science and engineering, it is important to develop new NN architectures that can effectively extract high-level mathematical knowledge from complex datasets. To meet this research demand, this paper focuses on the symbolic regression problem and develops a new NN architecture called the Gumbel-Max Equation Learner (GMEQL) network. Unlike previously proposed Equation Learner (EQL) networks, GMEQL applies continuous relaxation to the network structure via the Gumbel-Max trick and introduces two types of trainable parameters: structure parameters and regression parameters. This paper also proposes a new two-stage training process and new techniques, based on an elite repository, to train structure parameters in both the online and offline settings. On 8 benchmark symbolic regression problems, GMEQL is experimentally shown to outperform several cutting-edge techniques for symbolic regression.
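
The Gumbel-Max trick that GMEQL relies on can be sketched as follows: adding independent Gumbel noise to logits and taking the argmax yields an exact sample from the softmax distribution, and replacing the argmax with a temperature-scaled softmax gives a differentiable relaxation (often called Gumbel-softmax). How GMEQL applies this to its structure parameters is not shown; the function names and the temperature default are illustrative.

```python
import math
import random


def gumbel_noise(rng):
    # Standard Gumbel sample via -log(-log U), U ~ Uniform(0, 1); reject U == 0.
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng):
    # Gumbel-Max trick: argmax(logits + noise) is distributed as softmax(logits).
    scores = [z + gumbel_noise(rng) for z in logits]
    return max(range(len(scores)), key=scores.__getitem__)


def gumbel_softmax(logits, rng, temperature=0.5):
    # Continuous relaxation: softmax((logits + noise) / temperature).
    # As temperature -> 0 the output approaches a one-hot vector.
    scores = [(z + gumbel_noise(rng)) / temperature for z in logits]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Sampling `gumbel_max_sample([math.log(0.7), math.log(0.2), math.log(0.1)], random.Random(0))` many times returns index 0 roughly 70% of the time.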

LINDT: Tackling Negative Federated Learning with Local Adaptation

Nov 23, 2020
Hong Lin, Lidan Shou, Ke Chen, Gang Chen, Sai Wu

Federated Learning (FL) is a promising distributed learning paradigm that allows a number of data owners (also called clients) to collaboratively learn a shared model without disclosing each client's data. However, FL may fail to proceed properly, falling into a state that we call negative federated learning (NFL). This paper addresses the problem of NFL. We formulate a rigorous definition of NFL and analyze its essential cause. We propose a novel framework called LINDT for tackling NFL at run time. The framework can potentially work with any neural-network-based FL system for NFL detection and recovery. Specifically, we introduce a metric for detecting NFL from the server. For NFL recovery, the framework adapts the federated model to each client's local data by learning a Layer-wise Intertwined Dual-model. Experimental results show that the proposed approach can significantly improve the performance of FL on local data in various NFL scenarios.
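
LINDT is designed to work on top of an existing FL system. For context, a common server-side aggregation rule in such systems is FedAvg, which averages client parameters weighted by local dataset size. The minimal sketch below shows only that baseline step, not LINDT's NFL detection metric or its Layer-wise Intertwined Dual-model; parameter vectors are plain lists for illustration.

```python
def federated_average(client_weights, client_sizes):
    # FedAvg-style aggregation: each coordinate of the shared model is the
    # average of the clients' values, weighted by their local dataset sizes.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

Only parameters leave the clients, never their raw data, which is the privacy property the abstract refers to.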

BERT-JAM: Boosting BERT-Enhanced Neural Machine Translation with Joint Attention

Nov 09, 2020
Zhebin Zhang, Sai Wu, Dawei Jiang, Gang Chen

BERT-enhanced neural machine translation (NMT) aims to leverage BERT-encoded representations for translation tasks. A recently proposed approach uses attention mechanisms to fuse the Transformer's encoder and decoder layers with BERT's last-layer representation and shows enhanced performance. However, that method does not allow a flexible distribution of attention between the BERT representation and the encoder/decoder representation. In this work, we propose a novel BERT-enhanced NMT model called BERT-JAM, which improves upon existing models in two respects: 1) BERT-JAM uses joint-attention modules to allow the encoder/decoder layers to dynamically allocate attention between different representations, and 2) BERT-JAM allows the encoder/decoder layers to use BERT's intermediate representations by composing them with a gated linear unit (GLU). We train BERT-JAM with a novel three-phase optimization strategy that progressively unfreezes different components of BERT-JAM. Our experiments show that BERT-JAM achieves state-of-the-art BLEU scores on multiple translation tasks.
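
BERT-JAM composes BERT's intermediate representations with a gated linear unit. As a generic illustration of a GLU only (the abstract does not give BERT-JAM's exact composition), the sketch below splits a vector into two halves and gates the first half with the sigmoid of the second, the convention used, for example, by PyTorch's `torch.nn.functional.glu`.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(v):
    # Gated linear unit on an even-length vector: first half * sigmoid(second half).
    half = len(v) // 2
    if len(v) != 2 * half:
        raise ValueError("GLU input length must be even")
    return [a * sigmoid(b) for a, b in zip(v[:half], v[half:])]
```

The gate lets the network learn, per dimension, how much of each candidate value to pass through, rather than mixing representations with fixed weights.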
