This study presents a semi-nonparametric Latent Class Choice Model (LCCM) with a flexible class membership component. The proposed model formulates the latent classes using mixture models as an alternative approach to the traditional random utility specification with the aim of comparing the two approaches on various measures including prediction accuracy and representation of heterogeneity in the choice process. Mixture models are parametric model-based clustering techniques that have been widely used in areas such as machine learning, data mining and patter recognition for clustering and classification problems. An Expectation-Maximization (EM) algorithm is derived for the estimation of the proposed model. Using two different case studies on travel mode choice behavior, the proposed model is compared to traditional discrete choice models on the basis of parameter estimates' signs, value of time, statistical goodness-of-fit measures, and cross-validation tests. Results show that mixture models improve the overall performance of latent class choice models by providing better out-of-sample prediction accuracy in addition to better representations of heterogeneity without weakening the behavioral and economic interpretability of the choice models.
Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. In our experiments, we show that our method can be applied to modern deep neural networks on a wide range of datasets, including Cifar-10, ImageNet, MNIST, and SVHN. For all tasks, MACER spends less training time than state-of-the-art adversarial training algorithms, and the learned models achieve larger average certified radius.
Reaction-diffusion systems are ubiquitous in nature and in engineering applications, and are often modeled using a non-linear system of governing equations. While robust numerical methods exist to solve them, deep learning-based reduced ordermodels (ROMs) are gaining traction as they use linearized dynamical models to advance the solution in time. One such family of algorithms is based on Koopman theory, and this paper applies this numerical simulation strategy to reaction-diffusion systems. Adversarial and gradient losses are introduced, and are found to robustify the predictions. The proposed model is extended to handle missing training data as well as recasting the problem from a control perspective. The efficacy of these developments are demonstrated for two different reaction-diffusion problems: (1) the Kuramoto-Sivashinsky equation of chaos and (2) the Turing instability using the Gray-Scott model.
Aiming at a drastic speedup for point-data embeddings at test time, we propose a new framework that uses a pair of multi-layer perceptron (MLP) and look-up table (LUT) to transform point-coordinate inputs into high-dimensional features. When compared with PointNet's feature embedding part realized by MLP that requires millions of dot products, ours at test time requires no such layers of matrix-vector products but requires only looking up the nearest entities followed by interpolation, from the tabulated MLP defined over discrete inputs on a 3D lattice. We call this framework as "LUTI-MLP: LUT Interpolation MLP" that provides a way to train end-to-end tabulated MLP coupled to a LUT in a specific manner without the need for any approximation at test time. LUTI-MLP also provides significant speedup for Jacobian computation of the embedding function wrt global pose coordinate on Lie algebra $\mathfrak{se}(3)$ at test time, which could be used for point-set registration problems. After extensive architectural analysis using ModelNet40 dataset, we confirmed that our LUTI-MLP even with a small-sized table ($8\times 8\times 8$) yields performance comparable to that of MLP while achieving significant speedup: $80\times$ for embedding, $12\times$ for approximate Jacobian, and $860\times$ for canonical Jacobian.
Driving behaviour has a great impact on road safety. A popular way of analysing driving behaviour is to move the focus to the manoeuvres as they give useful information about the driver who is performing them. In this paper, we investigate a new way of identifying manoeuvres from vehicle telematics data, through motif detection in time-series. We implement a modified version of the Extended Motif Discovery (EMD) algorithm, a classical variable-length motif detection algorithm for time-series and we applied it to the UAH-DriveSet, a publicly available naturalistic driving dataset. After a systematic exploration of the extracted motifs, we were able to conclude that the EMD algorithm was not only capable of extracting simple manoeuvres such as accelerations, brakes and curves, but also more complex manoeuvres, such as lane changes and overtaking manoeuvres, which validates motif discovery as a worthwhile line for future research.
Recent studies have demonstrated the cross-lingual alignment ability of multilingual pretrained language models. In this work, we found that the cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs. We utilized these findings to develop a new approach -- cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. Using this method, we achieved state-of-the-art unsupervised machine translation results on 9 language directions with an average improvement of 2.4 BLEU, and on the Tatoeba sentence retrieval task in the XTREME benchmark on 16 languages with an average improvement of 21.5% in absolute accuracy. Furthermore, CRISS also brings an additional 1.8 BLEU improvement on average compared to mBART, when finetuned on supervised machine translation downstream tasks.
We propose a hierarchical learning strategy aimed at generating sparse representations and associated models for large noisy datasets. The hierarchy follows from approximation spaces identified at successively finer scales. For promoting model generalization at each scale, we also introduce a novel, projection based penalty operator across multiple dimension, using permutation operators for incorporating proximity and ordering information. The paper presents a detailed analysis of approximation properties in the reconstruction Reproducing Kernel Hilbert Spaces (RKHS) with emphasis on optimality and consistency of predictions and behavior of error functionals associated with the produced sparse representations. Results show the performance of the approach as a data reduction and modeling strategy on both synthetic (univariate and multivariate) and real datasets (time series). The sparse model for the test datasets, generated by the presented approach, is also shown to efficiently reconstruct the underlying process and preserve generalizability.
Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise is the outcome of their predictions. However, this comes at a cost: their running time for classification grows linearly with the number of trees, i.e. the size of the forest. In this paper, we propose a method to aggregate large Random Forests into a single, semantically equivalent decision diagram. Our experiments on various popular datasets show speed-ups of several orders of magnitude, while, at the same time, also significantly reducing the size of the required data structure.
Recurrent neural networks (RNN) such as long-short-term memory (LSTM) networks are essential in a multitude of daily live tasks such as speech, language, video, and multimodal learning. The shift from cloud to edge computation intensifies the need to contain the growth of RNN parameters. Current research on RNN shows that despite the performance obtained on convolutional neural networks (CNN), keeping a good performance in compressed RNNs is still a challenge. Most of the literature on compression focuses on CNNs using matrix product (MPO) operator tensor trains. However, matrix product state (MPS) tensor trains have more attractive features than MPOs, in terms of storage reduction and computing time at inference. We show that MPS tensor trains should be at the forefront of LSTM network compression through a theoretical analysis and practical experiments on NLP task.
Improving performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. There are several approaches proposed to address these challenges one of which is to increase the depth of the neural networks. Such deeper networks not only increase training times, but also suffer from vanishing gradients problem while training. In this work, we propose gradient amplification approach for training deep learning models to prevent vanishing gradients and also develop a training strategy to enable or disable gradient amplification method across several epochs with different learning rates. We perform experiments on VGG-19 and resnet (Resnet-18 and Resnet-34) models, and study the impact of amplification parameters on these models in detail. Our proposed approach improves performance of these deep learning models even at higher learning rates, thereby allowing these models to achieve higher performance with reduced training time.