It was recently proposed that neural networks could be used to approximate many-dimensional probability distributions that appear e.g. in lattice field theories or statistical mechanics. Subsequently they can be used as variational approximators to asses extensive properties of statistical systems, like free energy, and also as neural samplers used in Monte Carlo simulations. The practical application of this approach is unfortunately limited by its unfavorable scaling both of the numerical cost required for training, and the memory requirements with the system size. This is due to the fact that the original proposition involved a neural network of width which scaled with the total number of degrees of freedom, e.g. $L^2$ in case of a two dimensional $L\times L$ lattice. In this work we propose a hierarchical association of physical degrees of freedom, for instance spins, to neurons which replaces it with the scaling with the linear extent $L$ of the system. We demonstrate our approach on the two-dimensional Ising model by simulating lattices of various sizes up to $128 \times 128$ spins, with time benchmarks reaching lattices of size $512 \times 512$. We observe that our proposal improves the quality of neural network training, i.e. the approximated probability distribution is closer to the target that could be previously achieved. As a consequence, the variational free energy reaches a value closer to its theoretical expectation and, if applied in a Markov Chain Monte Carlo algorithm, the resulting autocorrelation time is smaller. Finally, the replacement of a single neural network by a hierarchy of smaller networks considerably reduces the memory requirements.
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, such as quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ Self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences at one forward operation rather than a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population parameters under the constraint of local differential privacy (LDP). Given observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we introduce confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star \in \mathbb R$ when only given access to the privatized data. We introduce a nonparametric and sequentially interactive generalization of Warner's famous "randomized response" mechanism, satisfying LDP for arbitrary bounded random variables, and then provide CIs and CSs for their means given access to the resulting privatized observations. We extend these CSs to capture time-varying (non-stationary) means, and conclude by illustrating how these methods can be used to conduct private online A/B tests.
Matching cost construction is a key step in light field (LF) depth estimation, but was rarely studied in the deep learning era. Recent deep learning-based LF depth estimation methods construct matching cost by sequentially shifting each sub-aperture image (SAI) with a series of predefined offsets, which is complex and time-consuming. In this paper, we propose a simple and fast cost constructor to construct matching cost for LF depth estimation. Our cost constructor is composed by a series of convolutions with specifically designed dilation rates. By applying our cost constructor to SAI arrays, pixels under predefined disparities can be integrated and matching cost can be constructed without using any shifting operation. More importantly, the proposed cost constructor is occlusion-aware and can handle occlusions by dynamically modulating pixels from different views. Based on the proposed cost constructor, we develop a deep network for LF depth estimation. Our network ranks first on the commonly used 4D LF benchmark in terms of the mean square error (MSE), and achieves a faster running time than other state-of-the-art methods.
We consider systems that require timely monitoring of sources over a communication network, where the cost of delayed information is unknown, time-varying and possibly adversarial. For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight. For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader and show that it has low regret compared to the best fixed scheduling policy in hindsight, while remaining computationally feasible. The algorithm and its regret analysis are novel and of independent interest to the study of online restless multi-armed bandit problems. We further design algorithms that achieve sublinear regret compared to the best dynamic policy when the environment is slowly varying. Finally, we apply our algorithms to a mobility tracking problem. We consider non-stationary and adversarial mobility models and illustrate the performance benefit of using our online learning algorithms compared to an oblivious scheduling policy.
Hearth Rate (HR) monitoring is increasingly performed in wrist-worn devices using low-cost photoplethysmography (PPG) sensors. However, Motion Artifacts (MAs) caused by movements of the subject's arm affect the performance of PPG-based HR tracking. This is typically addressed coupling the PPG signal with acceleration measurements from an inertial sensor. Unfortunately, most standard approaches of this kind rely on hand-tuned parameters, which impair their generalization capabilities and their applicability to real data in the field. In contrast, methods based on deep learning, despite their better generalization, are considered to be too complex to deploy on wearable devices. In this work, we tackle these limitations, proposing a design space exploration methodology to automatically generate a rich family of deep Temporal Convolutional Networks (TCNs) for HR monitoring, all derived from a single "seed" model. Our flow involves a cascade of two Neural Architecture Search (NAS) tools and a hardware-friendly quantizer, whose combination yields both highly accurate and extremely lightweight models. When tested on the PPG-Dalia dataset, our most accurate model sets a new state-of-the-art in Mean Absolute Error. Furthermore, we deploy our TCNs on an embedded platform featuring a STM32WB55 microcontroller, demonstrating their suitability for real-time execution. Our most accurate quantized network achieves 4.41 Beats Per Minute (BPM) of Mean Absolute Error (MAE), with an energy consumption of 47.65 mJ and a memory footprint of 412 kB. At the same time, the smallest network that obtains a MAE < 8 BPM, among those generated by our flow, has a memory footprint of 1.9 kB and consumes just 1.79 mJ per inference.
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. Especially for continuous control, solving this differential equation and its extension the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a give task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration for continuous actions and states as well as the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and cartpole. We show that both algorithms obtain the optimal policy. The robustness Sim2Real experiments on the physical systems show that the policies successfully achieve the task in the real-world. When changing the masses of the pendulum, we observe that robust value iteration is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi
Reasoning about visual relationships is central to how humans interpret the visual world. This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization. In this work, we use vision transformers (ViTs) as our base model for visual reasoning and make better use of concepts defined as object entities and their relations to improve the reasoning ability of ViTs. Specifically, we introduce a novel concept-feature dictionary to allow flexible image feature retrieval at training time with concept keys. This dictionary enables two new concept-guided auxiliary tasks: 1) a global task for promoting relational reasoning, and 2) a local task for facilitating semantic object-centric correspondence learning. To examine the systematic generalization of visual reasoning models, we introduce systematic splits for the standard HICO and GQA benchmarks. We show the resulting model, Concept-guided Vision Transformer (or RelViT for short) significantly outperforms prior approaches on HICO and GQA by 16% and 13% in the original split, and by 43% and 18% in the systematic split. Our ablation analyses also reveal our model's compatibility with multiple ViT variants and robustness to hyper-parameters.
Sparse linear regression methods including the well-known LASSO and the Dantzig selector have become ubiquitous in the engineering practice, including in medical imaging. Among other tasks, they have been successfully applied for the estimation of neuronal activity from functional magnetic resonance data without prior knowledge of the stimulus or activation timing, utilizing an approximate knowledge of the hemodynamic response to local neuronal activity. These methods work by generating a parametric family of solutions with different sparsity, among which an ultimate choice is made using an information criteria. We propose a novel approach, that instead of selecting a single option from the family of regularized solutions, utilizes the whole family of such sparse regression solutions. Namely, their ensemble provides a first approximation of probability of activation at each time-point, and together with the conditional neuronal activity distributions estimated with the theory of mixtures with varying concentrations, they serve as the inputs to a Bayes classifier eventually deciding on the verity of activation at each time-point. We show in extensive numerical simulations that this new method performs favourably in comparison with standard approaches in a range of realistic scenarios. This is mainly due to the avoidance of overfitting and underfitting that commonly plague the solutions based on sparse regression combined with model selection methods, including the corrected Akaike Information Criterion. This advantage is finally documented in selected fMRI task datasets.
In display advertising, a small group of sellers and bidders face each other in up to 10 12 auctions a day. In this context, revenue maximisation via monopoly price learning is a high-value problem for sellers. By nature, these auctions are online and produce a very high frequency stream of data. This results in a computational strain that requires algorithms be real-time. Unfortunately, existing methods inherited from the batch setting suffer O($\sqrt t$) time/memory complexity at each update, prohibiting their use. In this paper, we provide the first algorithm for online learning of monopoly prices in online auctions whose update is constant in time and memory.