Quantizing deep neural networks (DNNs) is a promising approach to deploying them on embedded devices. However, most existing methods do not quantize gradients, and the quantization process still involves many floating-point operations, which hinders the broader application of quantized DNNs. To address this problem, we propose a new heuristic method for quantizing DNNs based on cooperative coevolution. Within the cooperative coevolution framework, we use an estimation of distribution algorithm to search for low-bit weights. Specifically, we first construct an initial quantized network from a pre-trained network instead of initializing randomly, and then search from it within a restricted search space. To the best of our knowledge, this is the largest discrete problem solved by evolutionary algorithms to date. Experiments show that our method can train a 4-bit ResNet-20 on the CIFAR-10 dataset without sacrificing accuracy.
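The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of how an estimation of distribution algorithm can search low-bit weights inside a cooperative-coevolution loop. Weights are split into groups that are optimized in turn while the other groups stay fixed at the current best; each group keeps a UMDA-style categorical distribution per weight; the start point is the pre-trained weights rounded to the nearest level; and the search is restricted to ±1 level around that start point. The function names, hyperparameters, and the toy linear-regression fitness are all hypothetical.

```python
import numpy as np

def quantize_nearest(w, levels):
    """Index of the nearest quantization level for each float weight."""
    return np.abs(w[:, None] - levels[None, :]).argmin(axis=1)

def cc_eda_quantize(w_pre, levels, fitness, n_groups=4, cycles=2, pop=20,
                    elite=5, iters=10, lr=0.5, seed=0):
    """Toy cooperative coevolution + UMDA over quantization-level indices (hypothetical)."""
    rng = np.random.default_rng(seed)
    base = quantize_nearest(w_pre, levels)      # initial network from pre-trained weights
    best = base.copy()
    best_fit = fitness(levels[best])
    offsets = np.array([-1, 0, 1])              # restricted search space around base
    groups = np.array_split(np.arange(len(base)), n_groups)
    for _cycle in range(cycles):
        for idx in groups:                      # each group acts as one cooperating species
            probs = np.full((len(idx), 3), 1.0 / 3.0)
            for _gen in range(iters):
                # Sample offsets for every weight in the group: shape (pop, len(idx)).
                samples = np.array([[rng.choice(3, p=p) for p in probs]
                                    for _ in range(pop)])
                cands = np.clip(base[idx] + offsets[samples], 0, len(levels) - 1)
                fits = np.empty(pop)
                for j, c in enumerate(cands):
                    trial = best.copy()
                    trial[idx] = c              # other groups held at the current best
                    fits[j] = fitness(levels[trial])
                top = np.argsort(-fits)[:elite]  # elites (fitness is maximized)
                if fits[top[0]] > best_fit:
                    best[idx] = cands[top[0]]
                    best_fit = fits[top[0]]
                # UMDA-style update: move each marginal toward the elite frequencies.
                freq = np.stack([(samples[top] == v).mean(axis=0) for v in range(3)], axis=1)
                probs = (1 - lr) * probs + lr * freq
                probs /= probs.sum(axis=1, keepdims=True)
    return best, best_fit

# Toy usage: a 16-weight linear model stands in for a network.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(64, 16))
w_true = data_rng.normal(size=16)
y = X @ w_true
levels = np.linspace(-2.0, 2.0, 16)                 # 16 levels = 4 bits
w_pre = w_true + 0.05 * data_rng.normal(size=16)    # stand-in for pre-trained weights
neg_mse = lambda w: -float(np.mean((X @ w - y) ** 2))
best_idx, best_fit = cc_eda_quantize(w_pre, levels, neg_mse)
```

Because the search starts from the rounded pre-trained network and only accepts improvements, the result is never worse than plain rounding; restricting each weight to three nearby levels keeps every marginal distribution tiny even when the network is large.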
The field of adversarial textual attack has grown significantly in recent years, with the commonly considered objective being to craft adversarial examples that successfully fool the target models. However, the imperceptibility of attacks, which is also an essential objective, is often overlooked in previous studies. In this work, we advocate considering both objectives simultaneously and propose a novel multi-objective optimization approach (dubbed HydraText) with a provable performance guarantee to achieve successful attacks with high imperceptibility. We demonstrate the efficacy of HydraText through extensive experiments under both score-based and decision-based settings, involving five modern NLP models across five benchmark datasets. Compared with existing state-of-the-art attacks, HydraText consistently achieves higher success rates, lower modification rates, and higher semantic similarity to the original texts at the same time. A human evaluation study shows that the adversarial examples crafted by HydraText maintain validity and naturalness well. Finally, these examples also exhibit good transferability and can notably improve the robustness of the target models through adversarial training.
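Pursuing attack success and imperceptibility at the same time means candidate attacks are compared by Pareto dominance rather than by a single score. The following is a minimal, generic sketch of that comparison, not HydraText's actual algorithm; it assumes each candidate is summarized by a hypothetical pair (attack-success score to maximize, modification rate to minimize), and the example values are made up.

```python
from typing import List, Tuple

# Hypothetical summary of a candidate: (success_score, modification_rate).
Candidate = Tuple[float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b: no worse on both objectives and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

cands = [(1.0, 0.10), (1.0, 0.05), (0.6, 0.02), (0.4, 0.20)]
front = pareto_front(cands)
print(front)  # [(1.0, 0.05), (0.6, 0.02)]
```

The first candidate is dropped because another one fools the model equally well with fewer modifications; the two survivors represent different trade-offs that a single weighted score would collapse.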
Over the past few years, various word-level textual attack approaches have been proposed to reveal the vulnerability of deep neural networks used in natural language processing. Typically, these approaches involve an important optimization step to determine which substitute to use for each word in the original input. However, research on this step is still rather limited, from the perspectives of both problem understanding and problem solving. In this paper, we address these issues by uncovering the theoretical properties of the problem and proposing an efficient local search algorithm (LS) to solve it. We establish the first provable approximation guarantee for solving the problem in general cases. Notably, for adversarial textual attack, it is even better than the previous bound, which holds only in special cases. Extensive experiments involving five NLP tasks, six datasets, and eleven NLP models show that LS can substantially reduce the number of queries needed to achieve high attack success rates, usually by an order of magnitude. Further experiments show that the adversarial examples crafted by LS usually have higher quality, exhibit better transferability, and yield greater robustness improvements for victim models through adversarial training.
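The abstract does not give the LS procedure itself, so the following is only a generic first-improvement local search over per-word substitutions, meant to illustrate the structure of the optimization step. The `score` function and the word weights are hypothetical stand-ins for queries to a victim model; the counter shows where the query budget is spent.

```python
from typing import Callable, List, Sequence, Tuple

def local_search(options: Sequence[Sequence[str]],
                 score: Callable[[List[str]], float],
                 max_rounds: int = 100) -> Tuple[List[str], float, int]:
    """First-improvement local search over word substitutions (generic sketch).

    options[i] lists candidate words for position i; options[i][0] is the original word.
    A move replaces the word at one position; any move that improves the score is kept.
    Returns the final word list, its score, and the number of score queries used.
    """
    current = [opts[0] for opts in options]
    best = score(current)
    queries = 1
    for _ in range(max_rounds):
        improved = False
        for i, opts in enumerate(options):
            for w in opts:
                if w == current[i]:
                    continue
                trial = current.copy()
                trial[i] = w
                s = score(trial)
                queries += 1
                if s > best:
                    current, best, improved = trial, s, True
        if not improved:
            break  # local optimum: no single-word change improves the score
    return current, best, queries

# Hypothetical victim-model response: an additive "confidence drop" per word.
weights = {"good": 0.0, "fine": 0.3, "decent": 0.5,
           "movie": 0.0, "film": 0.2, "flick": 0.1}
options = [["good", "fine", "decent"], ["movie", "film", "flick"]]
score = lambda words: sum(weights[w] for w in words)
sentence, s, q = local_search(options, score)
print(sentence, q)  # ['decent', 'film'] 9
```

Each evaluated move costs one query, so the number of rounds until no single substitution helps directly determines the query cost that the abstract reports reducing.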
We predict asset returns and measure risk premia using a prominent technique from artificial intelligence: deep sequence modeling. Because asset returns often exhibit sequential dependence that conventional time series models may not capture effectively, sequence modeling, with its data-driven approach and superior performance, offers a promising path. In this paper, we first review the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. We then perform a comparative analysis of these methods using data on U.S. equities. We demonstrate how sequence modeling benefits investors in general by incorporating complex historical path dependence, and that Long Short-Term Memory (LSTM) based models tend to have the best out-of-sample performance.
The past decade has seen rapid development of reinforcement learning (RL), which achieves impressive performance given abundant training resources. However, one of the greatest challenges in RL is generalization efficiency (i.e., generalization performance per unit time). This paper proposes a framework of Active Reinforcement Learning (ARL) over MDPs that improves generalization efficiency under limited resources through instance selection. Given a number of instances, the algorithm selects valuable instances as the training set while training the policy. Unlike existing approaches, we actively select and use training data rather than training on all the given data, thereby consuming fewer resources. Furthermore, we introduce a general instance evaluation metric and selection mechanism into the framework. Experimental results reveal that the proposed framework, with Proximal Policy Optimization as the policy optimizer, improves generalization efficiency more effectively than methods with no selection or with unbiased selection.
Building classifiers on multiple domains is a common practical problem. Instead of building classifiers one by one, multi-domain learning (MDL) builds classifiers on multiple domains simultaneously, using the information shared among the domains to improve performance. Because MDL is a supervised learning problem, its labeling cost is still high. This cost can usually be reduced by active learning, so it is natural to use active learning to reduce the labeling effort in MDL; we refer to this setting as multi-domain active learning (MDAL). However, only a few works address this setting, and researchers facing this problem have no off-the-shelf solution. Under these circumstances, combining current multi-domain learning models with single-domain active learning strategies may serve as a preliminary solution to the MDAL problem. To assess the potential of this preliminary solution, this paper presents a comparative study of 5 models and 4 selection strategies. To the best of our knowledge, this is the first work to formally define MDAL and the first comparative study of the MDAL problem. The results show that the Multinomial Adversarial Networks (MAN) model combined with a simple best-versus-second-best (BvSB) uncertainty strategy performs best in most cases. We therefore recommend this combination as an off-the-shelf solution for the MDAL problem.
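BvSB measures uncertainty as the gap between the two largest predicted class probabilities of an instance; a small gap means the model is torn between two labels, so labeling that instance is likely to be informative. A minimal sketch, assuming class-probability outputs from any multi-domain model (such as MAN) pooled over the unlabeled instances of all domains; the function name and the example probabilities are hypothetical.

```python
import numpy as np

def bvsb_select(probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k unlabeled samples with the smallest best-vs-second-best margin.

    probs has shape (n_samples, n_classes) and holds predicted class probabilities.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]     # columns: second-best, best
    margin = top2[:, 1] - top2[:, 0]          # best minus second best
    return np.argsort(margin, kind="stable")[:k]

probs = np.array([[0.90, 0.05, 0.05],   # confident: margin 0.85
                  [0.40, 0.35, 0.25],   # margin 0.05
                  [0.50, 0.10, 0.40],   # margin 0.10
                  [0.34, 0.33, 0.33]])  # most uncertain: margin 0.01
picked = bvsb_select(probs, 2)
print(picked)  # [3 1]
```

Unlike entropy, BvSB looks only at the top two classes, which keeps it cheap and insensitive to how the remaining probability mass is spread.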
Dynamic Network Embedding (DNE) has recently attracted considerable attention due to the advantages of network embedding in various applications and the dynamic nature of many real-world networks. For dynamic networks, the degree of changes, defined as the average number of changed edges between consecutive snapshots spanning a dynamic network, can vary widely across real-world scenarios. Although quite a few DNE methods have been proposed, it remains unclear whether and to what extent existing DNE methods are robust to the degree of changes, which is nevertheless an important factor in both academic research and industrial applications. In this work, we investigate, for the first time, the robustness of DNE methods with respect to the degree of changes, and accordingly propose a robust DNE method. Specifically, the proposed method follows the notion of ensembles, where each base learner adopts an incremental Skip-Gram neural embedding approach. To further boost performance, a novel strategy is proposed to enhance the diversity among base learners at each timestep by capturing different levels of local-global topology. Extensive experiments demonstrate the benefits of the special designs in the proposed method and its superior performance compared with state-of-the-art methods. The comparative study also reveals the robustness issue of some DNE methods. The source code is available at https://github.com/houchengbin/SG-EDNE
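The degree of changes can be computed directly from a sequence of snapshots. A minimal sketch, assuming undirected edges and counting a changed edge as one that is added or removed between consecutive snapshots (the symmetric difference of their edge sets); the paper may count changes differently, and the example snapshots are made up.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

def edge_set(edges: Iterable[Tuple[int, int]]) -> Set[FrozenSet[int]]:
    """Undirected edge set: (u, v) and (v, u) map to the same frozenset."""
    return {frozenset(e) for e in edges}

def degree_of_changes(snapshots: List[Iterable[Tuple[int, int]]]) -> float:
    """Average number of added or removed edges between consecutive snapshots."""
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    sets = [edge_set(s) for s in snapshots]
    changes = [len(prev ^ curr) for prev, curr in zip(sets, sets[1:])]
    return sum(changes) / len(changes)

snapshots = [
    [(0, 1), (1, 2)],
    [(1, 0), (1, 2), (2, 3)],  # adds (2, 3); (1, 0) is the same edge as (0, 1)
    [(2, 3), (3, 4)],          # removes (0, 1) and (1, 2), adds (3, 4)
]
print(degree_of_changes(snapshots))  # 2.0
```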
Evolutionary algorithms (EAs) are general-purpose optimization algorithms inspired by natural evolution. Recent theoretical studies have shown that EAs can achieve good approximation guarantees for the problem classes of submodular optimization, which have a wide range of applications, such as maximum coverage, sparse regression, influence maximization, document summarization, and sensor placement, to name a few. Although these studies provide some theoretical explanation for the general-purpose nature of EAs, the submodular objective functions they consider are defined only over sets or multisets. To complement this line of research, this paper studies the problem class of maximizing monotone submodular functions over sequences, where the objective function depends on the order of items. We prove that for each previously studied class of monotone submodular objective functions over sequences, namely prefix monotone submodular functions, weakly monotone and strongly submodular functions, and DAG monotone submodular functions, a simple multi-objective EA, GSEMO, always reaches or improves the best-known approximation guarantee in expected polynomial running time. Previously, these best-known approximation guarantees were obtained only by different greedy-style algorithms. Empirical studies on various applications, e.g., accomplishing tasks, maximizing information gain, search-and-tracking, and recommender systems, demonstrate the excellent performance of GSEMO.
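GSEMO treats a budget-constrained maximization as a bi-objective problem: maximize f(s) while minimizing the sequence length |s|, keeping an archive of mutually non-dominated solutions and mutating a uniformly chosen member each iteration. A minimal sketch for sequences of distinct items under a length budget k, assuming a mutation that performs one random insertion or deletion (the operators analyzed in the paper may differ) and a hypothetical discounted-value objective that merely illustrates order dependence.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Seq = Tuple[int, ...]
Obj = Tuple[float, int]

def gsemo_sequence(items: Sequence[int], f: Callable[[Seq], float], k: int,
                   iters: int = 2000, seed: int = 0) -> Tuple[Seq, float]:
    """GSEMO-style search for a sequence of distinct items with |s| <= k maximizing f."""
    rng = random.Random(seed)

    def objectives(s: Seq) -> Obj:
        # Objective 1: f(s), with over-length sequences set to -inf.
        # Objective 2: -|s|, so that shorter sequences are preferred.
        return (f(s) if len(s) <= k else float("-inf"), -len(s))

    def weakly_dominates(a: Obj, b: Obj) -> bool:
        return a[0] >= b[0] and a[1] >= b[1]

    archive: Dict[Seq, Obj] = {(): objectives(())}  # start from the empty sequence
    for _ in range(iters):
        child = list(rng.choice(list(archive)))      # uniform parent selection
        missing = [x for x in items if x not in child]
        if child and (not missing or rng.random() < 0.5):
            del child[rng.randrange(len(child))]      # delete a random position
        elif missing:                                 # insert a new item at a random position
            child.insert(rng.randrange(len(child) + 1), rng.choice(missing))
        s = tuple(child)
        obj = objectives(s)
        # Discard the child if an archived solution strictly dominates it.
        if any(weakly_dominates(o, obj) and o != obj for o in archive.values()):
            continue
        # Otherwise remove every archived solution the child weakly dominates, then add it.
        archive = {t: o for t, o in archive.items() if not weakly_dominates(obj, o)}
        archive[s] = obj
    best = max((t for t in archive if len(t) <= k), key=lambda t: archive[t][0])
    return best, archive[best][0]

values = {0: 3.0, 1: 1.0, 2: 2.0}

def discounted_value(s: Seq) -> float:
    # Hypothetical order-dependent objective: earlier positions count more.
    return sum(values[x] * 0.5 ** i for i, x in enumerate(s))

best, val = gsemo_sequence(list(values), discounted_value, k=2)
```

The empty sequence can never be weakly dominated by anything else, so it stays in the archive and strictly dominates every over-length sequence; as a result, the archive only ever holds feasible solutions.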
Black-box problems are common in real life, for example in structural design, drug experiments, and machine learning. When optimizing black-box systems, decision-makers usually consider multiple performance measures and reach a final decision through comprehensive evaluation. Motivated by such practical needs, we focus on constrained black-box problems in which the objective and constraints lack known special structure, and evaluations are expensive and possibly noisy. We develop a novel constrained Bayesian optimization approach based on the knowledge gradient method ($c$-KG). A new acquisition function is proposed to determine the next batch of samples, accounting for both optimality and feasibility. An unbiased estimator of the gradient of the new acquisition function is derived to implement the $c$-KG approach.