We introduce synthetic oversampling in anomaly detection for multi-feature sequence datasets based on autoencoders and generative adversarial networks. The first approach considers the use of an autoencoder in conjunction with standard oversampling methods to generate synthetic data that captures the sequential nature of the data. A different model uses generative adversarial networks to generate structure preserving synthetic data for the minority class. We also use generative adversarial networks on the majority class as an outlier detection method for novelty detection. We show that the use of generative adversarial network based synthetic data improves classification model performance on a variety of sequence data sets.
Deep recurrent neural networks perform well on sequence data and are the model of choice. It is a daunting task to decide the number of layers, especially considering different computational needs for tasks within a sequence of different difficulties. We propose a layer flexible recurrent neural network with adaptive computational time, and expand it to a sequence to sequence model. Contrary to the adaptive computational time model, our model has a dynamic number of transmission states which vary by step and sequence. We evaluate the model on a financial dataset. Experimental results show the performance improvement and indicate the model's ability to dynamically change the number of layers.
Deep learning models based on CNNs are predominantly used in image classification tasks. Such approaches, assuming independence of object categories, normally use a CNN as a feature learner and apply a flat classifier on top of it. Object classes in many settings have hierarchical relations, and classifiers exploiting these relations should perform better. We propose hierarchical classification models combining a CNN to extract hierarchical representations of images, and an RNN or sequence-to-sequence model to capture a hierarchical tree of classes. In addition, we apply residual learning to the RNN part in oder to facilitate training our compound model and improve generalization of the model. Experimental results on a real world proprietary dataset of images show that our hierarchical networks perform better than state-of-the-art CNNs.
There are time series that are amenable to recurrent neural network (RNN) solutions when treated as sequences, but some series, e.g. asynchronous time series, provide a richer variation of feature types than current RNN cells take into account. In order to address such situations, we introduce a unified RNN that handles five different feature types, each in a different manner. Our RNN framework separates sequential features into two groups dependent on their frequency, which we call sparse and dense features, and which affect cell updates differently. Further, we also incorporate time features at the sequential level that relate to the time between specified events in the sequence and are used to modify the cell's memory state. We also include two types of static (whole sequence level) features, one related to time and one not, which are combined with the encoder output. The experiments show that the modeling framework proposed does increase performance compared to standard cells.
There are classification tasks that take as inputs groups of images rather than single images. In order to address such situations, we introduce a nested multi-instance deep network. The approach is generic in that it is applicable to general data instances, not just images. The network has several convolutional neural networks grouped together at different stages. This primarily differs from other previous works in that we organize instances into relevant groups that are treated differently. We also introduce a method to replace instances that are missing which successfully creates neutral input instances and consistently outperforms standard fill-in methods in real world use cases. In addition, we propose a method for manual dropout when a whole group of instances is missing that allows us to use richer training data and obtain higher accuracy at the end of training. With specific pretraining, we find that the model works to great effect on our real world and pub-lic datasets in comparison to baseline methods, justifying the different treatment among groups of instances.
Recurrent neural networks and sequence to sequence models require a predetermined length for prediction output length. Our model addresses this by allowing the network to predict a variable length output in inference. A new loss function with a tailored gradient computation is developed that trades off prediction accuracy and output length. The model utilizes a function to determine whether a particular output at a time should be evaluated or not given a predetermined threshold. We evaluate the model on the problem of predicting the prices of securities. We find that the model makes longer predictions for more stable securities and it naturally balances prediction accuracy and length.
In this work, we design a machine learning based method, online adaptive primal support vector regression (SVR), to model the implied volatility surface (IVS). The algorithm proposed is the first derivation and implementation of an online primal kernel SVR. It features enhancements that allow efficient online adaptive learning by embedding the idea of local fitness and budget maintenance to dynamically update support vectors upon pattern drifts. For algorithm acceleration, we implement its most computationally intensive parts in a Field Programmable Gate Arrays hardware, where a 132x speedup over CPU is achieved during online prediction. Using intraday tick data from the E-mini S&P 500 options market, we show that the Gaussian kernel outperforms the linear kernel in regulating the size of support vectors, and that our empirical IVS algorithm beats two competing online methods with regards to model complexity and regression errors (the mean absolute percentage error of our algorithm is up to 13%). Best results are obtained at the center of the IVS grid due to its larger number of adjacent support vectors than the edges of the grid. Sensitivity analysis is also presented to demonstrate how hyper parameters affect the error rates and model complexity.
This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. In our setting the model and algorithm do not decouple by agent. In order to find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to previous approaches using tabular representations. Moreover, with sub-optimal expert demonstrations our algorithms recover both reward functions and strategies with good quality.
The objective of this work is to take advantage of deep neural networks in order to make next day crime count predictions in a fine-grain city partition. We make predictions using Chicago and Portland crime data, which is augmented with additional datasets covering weather, census data, and public transportation. The crime counts are broken into 10 bins and our model predicts the most likely bin for a each spatial region at a daily level. We train this data using increasingly complex neural network structures, including variations that are suited to the spatial and temporal aspects of the crime prediction problem. With our best model we are able to predict the correct bin for overall crime count with 75.6% and 65.3% accuracy for Chicago and Portland, respectively. The results show the efficacy of neural networks for the prediction problem and the value of using external datasets in addition to standard crime data.