Task embeddings in multi-layer perceptrons for multi-task learning and inductive transfer learning in renewable power forecasts have recently been introduced. In many cases, this approach improves the forecast error and reduces the required training data. However, it does not take the seasonal influences in power forecasts within a day into account, i.e., the diurnal cycle. Therefore, we extended this idea to temporal convolutional networks to consider those seasonalities. We propose transforming the embedding space, which contains the latent similarities between tasks, through convolution and providing these results to the network's residual block. The proposed architecture significantly improves up to 25 percent for multi-task learning for power forecasts on the EuropeWindFarm and GermanSolarFarm dataset compared to the multi-layer perceptron approach. Based on the same data, we achieve a ten percent improvement for the wind datasets and more than 20 percent in most cases for the solar dataset for inductive transfer learning without catastrophic forgetting. Finally, we are the first proposing zero-shot learning for renewable power forecasts to provide predictions even if no training data is available.
Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving prior results achieving 40% success.
Current semi-supervised video object segmentation (VOS) methods usually leverage the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we present a Region Aware Video Object Segmentation (RAVOS) approach that predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS includes a fast object motion tracker to predict their ROIs in the next frame. For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation. For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames. Besides RAVOS, we also propose a large-scale dataset, dubbed OVOS, to benchmark the performance of VOS models under occlusions. Evaluation on DAVIS and YouTube-VOS benchmarks and our new OVOS dataset show that our method achieves state-of-the-art performance with significantly faster inference time, e.g., 86.1 J&F at 42 FPS on DAVIS and 84.4 J&F at 23 FPS on YouTube-VOS.
NatiQ is end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention. We used both tacotron-based models (tacotron-1 and tacotron-2) and the faster transformer model for generating mel-spectrograms from characters. We concatenated Tacotron1 with the WaveRNN vocoder, Tacotron2 with the WaveGlow vocoder and ESPnet transformer with the parallel wavegan vocoder to synthesize waveforms from the spectrograms. We used in-house speech data for two voices: 1) neutral male "Hamza"- narrating general content and news, and 2) expressive female "Amina"- narrating children story books to train our models. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza respectively. The objective evaluation of the systems using word and character error rate (WER and CER) as well as the response time measured by real-time factor favored the end-to-end architecture ESPnet. NatiQ demo is available on-line at https://tts.qcri.org
Deep neural networks (DNNs) are one of the most highlighted methods in machine learning. However, as DNNs are black-box models, they lack explanatory power for their predictions. Recently, neural additive models (NAMs) have been proposed to provide this power while maintaining high prediction performance. In this paper, we propose a novel NAM approach for multivariate nowcasting (NC) problems, which comprise an important focus area of machine learning. For the multivariate time-series data used in NC problems, explanations should be considered for every input value to the variables at distinguishable time steps. By employing generalized additive models, the proposed NAM-NC successfully explains each input value's importance for multiple variables and time steps. Experimental results involving a toy example and two real-world datasets show that the NAM-NC predicts multivariate time-series data as accurately as state-of-the-art neural networks, while also providing the explanatory importance of each input value. We also examine parameter-sharing networks using NAM-NC to decrease their complexity, and NAM-MC's hard-tied feature net extracted explanations with good performance.
We present a safety verification framework for design-time and run-time assurance of learning-based components in aviation systems. Our proposed framework integrates two novel methodologies. From the design-time assurance perspective, we propose offline mixed-fidelity verification tools that incorporate knowledge from different levels of granularity in simulated environments. From the run-time assurance perspective, we propose reachability- and statistics-based online monitoring and safety guards for a learning-based decision-making model to complement the offline verification methods. This framework is designed to be loosely coupled among modules, allowing the individual modules to be developed using independent methodologies and techniques, under varying circumstances and with different tool access. The proposed framework offers feasible solutions for meeting system safety requirements at different stages throughout the system development and deployment cycle, enabling the continuous learning and assessment of the system product.
Deepfakes can degrade the fabric of society by limiting our ability to trust video content from leaders, authorities, and even friends. Cryptographically secure digital signatures may be used by video streaming platforms to endorse content, but these signatures are applied by the content distributor rather than the participants in the video. We introduce WordSig, a simple protocol allowing video participants to digitally sign the words they speak using a stream of QR codes, and allowing viewers to verify the consistency of signatures across videos. This allows establishing a trusted connection between the viewer and the participant that is not mediated by the content distributor. Given the widespread adoption of QR codes for distributing hyperlinks and vaccination records, and the increasing prevalence of celebrity deepfakes, 2022 or later may be a good time for public figures to begin using and promoting QR-based self-authentication tools.
We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrangian multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes) and they are all intertwined with each other. Our main contribution is to derive a finite-time analysis for the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate $\mathcal{O}(1/K^{1/6})$, where K is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
The Levenberg-Marquardt (LM) optimization algorithm has been widely used for solving machine learning problems. Literature reviews have shown that the LM can be very powerful and effective on moderate function approximation problems when the number of weights in the network is not more than a couple of hundred. In contrast, the LM does not seem to perform as well when dealing with pattern recognition or classification problems, and inefficient when networks become large (e.g. with more than 500 weights). In this paper, we exploit the true power of LM algorithm using some real world aircraft datasets. On these datasets most other commonly used optimizers are unable to detect the anomalies caused by the changing conditions of the aircraft engine. The challenging nature of the datasets are the abrupt changes in the time series data. We find that the LM optimizer has a much better ability to approximate abrupt changes and detect anomalies than other optimizers. We compare the performance, in addressing this anomaly/change detection problem, of the LM and several other optimizers. We assess the relative performance based on a range of measures including network complexity (i.e. number of weights), fitting accuracy, over fitting, training time, use of GPUs and memory requirement etc. We also discuss the issue of robust LM implementation in MATLAB and Tensorflow for promoting more popular usage of the LM algorithm and potential use of LM optimizer for large-scale problems.
In this paper, we show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago. We propose a methodology based on unsupervised saliency masks and self-supervised feature clustering to kickstart object discovery followed by training a semantic segmentation network on pseudo-labels to bootstrap the system on images with multiple objects. We present results on PASCAL VOC that go far beyond the current state of the art (47.3 mIoU), and we report for the first time results on MS COCO for the whole set of 81 classes: our method discovers 34 categories with more than $20\%$ IoU, while obtaining an average IoU of 19.6 for all 81 categories.