Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

TND-NAS: Towards Non-differentiable Objectives in Progressive Differentiable NAS Framework

Nov 06, 2021
Bo Lyu, Shiping Wen, Zheng Yan, Kaibo Shi, Ke Li, Tingwen Huang

Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) for its capability to improve efficiency compared with the early NAS (EA-based, RL-based) methods. Recent differentiable NAS also aims at further improving search efficiency, reducing the GPU-memory consumption, and addressing the "depth gap" issue. However, these methods are no longer capable of tackling the non-differentiable objectives, let alone multi-objectives, e.g., performance, robustness, efficiency, and other metrics. We propose an end-to-end architecture search framework towards non-differentiable objectives, TND-NAS, with the merits of the high efficiency in differentiable NAS framework and the compatibility among non-differentiable metrics in Multi-objective NAS (MNAS). Under differentiable NAS framework, with the continuous relaxation of the search space, TND-NAS has the architecture parameters ($\alpha$) been optimized in discrete space, while resorting to the search policy of progressively shrinking the supernetwork by $\alpha$. Our representative experiment takes two objectives (Parameters, Accuracy) as an example, we achieve a series of high-performance compact architectures on CIFAR10 (1.09M/3.3%, 2.4M/2.95%, 9.57M/2.54%) and CIFAR100 (2.46M/18.3%, 5.46/16.73%, 12.88/15.20%) datasets. Favorably, under real-world scenarios (resource-constrained, platform-specialized), the Pareto-optimal solutions can be conveniently reached by TND-NAS.

  Access Paper or Ask Questions

Stock price prediction using BERT and GAN

Jul 18, 2021
Priyank Sonkiya, Vikas Bajpai, Anukriti Bansal

The stock market has been a popular topic of interest in the recent past. The growth in the inflation rate has compelled people to invest in the stock and commodity markets and other areas rather than saving. Further, the ability of Deep Learning models to make predictions on the time series data has been proven time and again. Technical analysis on the stock market with the help of technical indicators has been the most common practice among traders and investors. One more aspect is the sentiment analysis - the emotion of the investors that shows the willingness to invest. A variety of techniques have been used by people around the globe involving basic Machine Learning and Neural Networks. Ranging from the basic linear regression to the advanced neural networks people have experimented with all possible techniques to predict the stock market. It's evident from recent events how news and headlines affect the stock markets and cryptocurrencies. This paper proposes an ensemble of state-of-the-art methods for predicting stock prices. Firstly sentiment analysis of the news and the headlines for the company Apple Inc, listed on the NASDAQ is performed using a version of BERT, which is a pre-trained transformer model by Google for Natural Language Processing (NLP). Afterward, a Generative Adversarial Network (GAN) predicts the stock price for Apple Inc using the technical indicators, stock indexes of various countries, some commodities, and historical prices along with the sentiment scores. Comparison is done with baseline models like - Long Short Term Memory (LSTM), Gated Recurrent Units (GRU), vanilla GAN, and Auto-Regressive Integrated Moving Average (ARIMA) model.

  Access Paper or Ask Questions

Question Generation for Supporting Informational Query Intents

Oct 19, 2020
Xusen Yin, Jonathan May, Li Zhou, Kevin Small

Users frequently ask simple factoid questions when encountering question answering (QA) systems, attenuating the impact of myriad recent works designed to support more complex questions. Prompting users with automatically generated suggested questions (SQs) can improve understanding of QA system capabilities and thus facilitate using this technology more effectively. While question generation (QG) is a well-established problem, existing methods are not targeted at producing SQ guidance for human users seeking more in-depth information about a specific concept. In particular, existing QG works are insufficient for this task as the generated questions frequently (1) require access to supporting documents as comprehension context (e.g., How many points did LeBron score?) and (2) focus on short answer spans, often producing peripheral factoid questions unlikely to attract interest. In this work, we aim to generate self-explanatory questions that focus on the main document topics and are answerable with variable length passages as appropriate. We satisfy these requirements by using a BERT-based Pointer-Generator Network (BertPGN) trained on the Natural Questions (NQ) dataset. First, we show that the BertPGN model produces state-of-the-art QG performance for long and short answers for in-domain NQ (BLEU-4 for 20.13 and 28.09, respectively). Secondly, we evaluate this QG model on the out-of-domain NewsQA dataset automatically and with human evaluation, demonstrating that our method produces better SQs for news articles, even those from a different domain than the training data.

* 9 pages 

  Access Paper or Ask Questions

Deep Learning-based Pipeline for Module Power Prediction from EL Measurements

Sep 30, 2020
Mathis Hoffmann, Claudia Buerhop-Lutz, Luca Reeb, Tobias Pickel, Thilo Winkler, Bernd Doll, Tobias W眉rfl, Ian Marius Peters, Christoph Brabec, Andreas Maier, Vincent Christlein

Automated inspection plays an important role in monitoring large-scale photovoltaic power plants. Commonly, electroluminescense measurements are used to identify various types of defects on solar modules but have not been used to determine the power of a module. However, knowledge of the power at maximum power point is important as well, since drops in the power of a single module can affect the performance of an entire string. By now, this is commonly determined by measurements that require to discontact or even dismount the module, rendering a regular inspection of individual modules infeasible. In this work, we bridge the gap between electroluminescense measurements and the power determination of a module. We compile a large dataset of 719 electroluminescense measurementsof modules at various stages of degradation, especially cell cracks and fractures, and the corresponding power at maximum power point. Here,we focus on inactive regions and cracks as the predominant type of defect. We set up a baseline regression model to predict the power from electroluminescense measurements with a mean absolute error of 9.0+/-3.7W (4.0+/-8.4%). Then, we show that deep-learning can be used to train a model that performs significantly better (7.3+/-2.7W or 3.2+/-6.5%). With this work, we aim to open a new research topic. Therefore, we publicly release the dataset, the code and trained models to empower other researchers to compare against our results. Finally, we present a thorough evaluation of certain boundary conditions like the dataset size and an automated preprocessing pipeline for on-site measurements showing multiple modules at once.

  Access Paper or Ask Questions

Dise帽o e implementaci贸n de una meta-heur铆stica multi-poblacional de optimizaci贸n combinatoria enfocada a la resoluci贸n de problemas de asignaci贸n de rutas a veh铆culos

Mar 24, 2020
Eneko Osaba

Transportation is an essential area in the nowadays society, both for business sector and citizenry. There are different kinds of transportation systems, each one with its own characteristics. In the same way, various areas of knowledge can deal efficiently with the transport planning. The majority of the problems related with the transport and logistics have common characteristics, so they can be modeled as optimization problems, being able to see them as special cases of other generic problems. These problems fit into the combinatorial optimization field. Much of the problems of this type have an exceptional complexity. A great amount of meta-heuristics can be found the literature, each one with its advantages and disadvantages. Due to the high complexity of combinatorial optimization problems, there is no technique able to solve all these problems optimally. This fact makes the fields of combinatorial optimization and vehicle routing problems be a hot topic of research. This doctoral thesis will focus its efforts on developing a new meta-heuristic to solve different kind of vehicle routing problems. The presented technique offers an added value compared to existing methods, either in relation to the performance, and the contribution of conceptual originality. With the aim of validating the proposed model, the results obtained by the developed meta-heuristic have been compared with the ones obtained by other four algorithms of similar philosophy. Four well-known routing problems have been used in this experimentation, as well as two classical combinatorial optimization problems. In addition to the comparisons based on parameters such as the mean, or the standard deviation, two different statistical tests have been carried out. Thanks to these tests it can be affirmed that the proposed meta-heuristic is competitive in terms of performance and conceptual originality.

* 228 pages, in Spanish, 25 figures, This is my PhD thesis in the University of Deusto, Bilbao, Spain 

  Access Paper or Ask Questions

Privacy Attacks on Network Embeddings

Dec 23, 2019
Michael Ellers, Michael Cochez, Tobias Schumacher, Markus Strohmaier, Florian Lemmerich

Data ownership and data protection are increasingly important topics with ethical and legal implications, e.g., with the right to erasure established in the European General Data Protection Regulation (GDPR). In this light, we investigate network embeddings, i.e., the representation of network nodes as low-dimensional vectors. We consider a typical social network scenario with nodes representing users and edges relationships between them. We assume that a network embedding of the nodes has been trained. After that, a user demands the removal of his data, requiring the full deletion of the corresponding network information, in particular the corresponding node and incident edges. In that setting, we analyze whether after the removal of the node from the network and the deletion of the vector representation of the respective node in the embedding significant information about the link structure of the removed node is still encoded in the embedding vectors of the remaining nodes. This would require a (potentially computationally expensive) retraining of the embedding. For that purpose, we deploy an attack that leverages information from the remaining network and embedding to recover information about the neighbors of the removed node. The attack is based on (i) measuring distance changes in network embeddings and (ii) a machine learning classifier that is trained on networks that are constructed by removing additional nodes. Our experiments demonstrate that substantial information about the edges of a removed node/user can be retrieved across many different datasets. This implies that to fully protect the privacy of users, node deletion requires complete retraining - or at least a significant modification - of original network embeddings. Our results suggest that deleting the corresponding vector representation from network embeddings alone is not sufficient from a privacy perspective.

  Access Paper or Ask Questions

Dual Residual Network for Accurate Human Activity Recognition

Mar 13, 2019
Jun Long, WuQing Sun, Zhan Yang, Osolo Ian Raymond, Bin Li

Human Activity Recognition (HAR) using deep neural network has become a hot topic in human-computer interaction. Machine can effectively identify human naturalistic activities by learning from a large collection of sensor data. Activity recognition is not only an interesting research problem, but also has many real-world practical applications. Based on the success of residual networks in achieving a high level of aesthetic representation of the automatic learning, we propose a novel \textbf{D}ual \textbf{R}esidual \textbf{N}etwork, named DRN. DRN is implemented using two identical path frameworks consisting of (1) a short time window, which is used to capture spatial features, and (2) a long time window, which is used to capture fine temporal features. The long time window path can be made very lightweight by reducing its channel capacity, yet still being able to learn useful temporal representations for activity recognition. In this paper, we mainly focus on proposing a new model to improve the accuracy of HAR. In order to demonstrate the effectiveness of DRN model, we carried out extensive experiments and compared with conventional recognition methods (HC, CBH, CBS) and learning-based methods (AE, MLP, CNN, LSTM, Hybrid, ResNet). The benchmark datasets (OPPORTUNITY, UniMiB-SHAR) were adopted by our experiments. Results from our experiments show that our model is effective in recognizing human activities via wearable datasets. We discuss the influence of networks parameters on performance to provide insights about its optimization.

* submitted to Information 

  Access Paper or Ask Questions

Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning

Jan 04, 2019
Roi Ceren

Optimal decision making with limited or no information in stochastic environments where multiple agents interact is a challenging topic in the realm of artificial intelligence. Reinforcement learning (RL) is a popular approach for arriving at optimal strategies by predicating stimuli, such as the reward for following a strategy, on experience. RL is heavily explored in the single-agent context, but is a nascent concept in multiagent problems. To this end, I propose several principled model-free and partially model-based reinforcement learning approaches for several multiagent settings. In the realm of normative reinforcement learning, I introduce scalable extensions to Monte Carlo exploring starts for partially observable Markov Decision Processes (POMDP), dubbed MCES-P, where I expand the theory and algorithm to the multiagent setting. I first examine MCES-P with probably approximately correct (PAC) bounds in the context of multiagent setting, showing MCESP+PAC holds in the presence of other agents. I then propose a more sample-efficient methodology for antagonistic settings, MCESIP+PAC. For cooperative settings, I extend MCES-P to the Multiagent POMDP, dubbed MCESMP+PAC. I then explore the use of reinforcement learning as a methodology in searching for optima in realistic and latent model environments. First, I explore a parameterized Q-learning approach in modeling humans learning to reason in an uncertain, multiagent environment. Next, I propose an implementation of MCES-P, along with image segmentation, to create an adaptive team-based reinforcement learning technique to positively identify the presence of phenotypically-expressed water and pathogen stress in crop fields.

* Phd thesis, University of Georgia (2018) 

  Access Paper or Ask Questions

A Survey on Application of Machine Learning Techniques in Optical Networks

Oct 05, 2018
Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore

Today's telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users' behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and take decisions pertaining to the proper functioning of the networks from the network-generated data. Among these mathematical tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth of network complexity faced by optical networks in the last few years. Such complexity increase is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes, etc.) that are enabled by the usage of coherent transmission/reception technologies, advanced digital signal processing and compensation of nonlinear effects in optical fiber propagation. In this paper we provide an overview of the application of ML to optical communications and networking. We classify and survey relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy: to stimulate further work in this area, we conclude the paper proposing new possible research directions.

  Access Paper or Ask Questions

Story Disambiguation: Tracking Evolving News Stories across News and Social Streams

Aug 16, 2018
Bichen Shi, Thanh-Binh Le, Neil Hurley, Georgiana Ifrim

Following a particular news story online is an important but difficult task, as the relevant information is often scattered across different domains/sources (e.g., news articles, blogs, comments, tweets), presented in various formats and language styles, and may overlap with thousands of other stories. In this work we join the areas of topic tracking and entity disambiguation, and propose a framework named Story Disambiguation - a cross-domain story tracking approach that builds on real-time entity disambiguation and a learning-to-rank framework to represent and update the rich semantic structure of news stories. Given a target news story, specified by a seed set of documents, the goal is to effectively select new story-relevant documents from an incoming document stream. We represent stories as entity graphs and we model the story tracking problem as a learning-to-rank task. This enables us to track content with high accuracy, from multiple domains, in real-time. We study a range of text, entity and graph based features to understand which type of features are most effective for representing stories. We further propose new semi-supervised learning techniques to automatically update the story representation over time. Our empirical study shows that we outperform the accuracy of state-of-the-art methods for tracking mixed-domain document streams, while requiring fewer labeled data to seed the tracked stories. This is particularly the case for local news stories that are easily over shadowed by other trending stories, and for complex news stories with ambiguous content in noisy stream environments.

* 24 pages 

  Access Paper or Ask Questions