We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions. The challenges of this problem arise from the tension among safety, exploration, performance, and computation. To address these challenges, we propose a polynomial-time algorithm that guarantees feasibility and constraint satisfaction with high probability under proper conditions. Our algorithm is implemented on a single trajectory and does not require system restarts. Further, we analyze the regret of our learning algorithm compared to the optimal safe linear controller with known model information. The proposed algorithm can achieve a $\tilde O(T^{2/3})$ regret, where $T$ is the number of stages and $\tilde O(\cdot)$ absorbs some logarithmic terms of $T$.
In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video. Specifically, given an untrimmed video, WSSTAD aims to localize a spatio-temporal tube (i.e., a sequence of bounding boxes at consecutive times) that encloses the abnormal event, with only coarse video-level annotations as supervision during training. To address this challenging task, we propose a dual-branch network which takes as input the proposals with multi-granularities in both spatial-temporal domains. Each branch employs a relationship reasoning module to capture the correlation between tubes/videolets, which can provide rich contextual information and complex entity relationships for the concept learning of abnormal behaviors. Mutually-guided Progressive Refinement framework is set up to employ dual-path mutual guidance in a recurrent manner, iteratively sharing auxiliary supervision information across branches. It impels the learned concepts of each branch to serve as a guide for its counterpart, which progressively refines the corresponding branch and the whole framework. Furthermore, we contribute two datasets, i.e., ST-UCF-Crime and STRA, consisting of videos containing spatio-temporal abnormal annotations to serve as the benchmarks for WSSTAD. We conduct extensive qualitative and quantitative evaluations to demonstrate the effectiveness of the proposed approach and analyze the key factors that contribute more to handle this task.
The detection of traffic anomalies is a critical component of the intelligent city transportation management system. Previous works have proposed a variety of notable insights and taken a step forward in this field, however, dealing with the complex traffic environment remains a challenge. Moreover, the lack of high-quality data and the complexity of the traffic scene, motivate us to study this problem from a hand-crafted perspective. In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic track module, and post-processing. With video stabilization, background modeling, and vehicle detection, the pro-processing phase aims to generate candidate anomalies. The dynamic tracking module seeks and locates the start time of anomalies by utilizing vehicle motion patterns and spatiotemporal status. Finally, we use post-processing to fine-tune the temporal boundary of anomalies. Not surprisingly, our proposed framework was ranked $1^{st}$ in the NVIDIA AI CITY 2021 leaderboard for traffic anomaly detection. The code is available at: https://github.com/Endeavour10020/AICity2021-Anomaly-Detection .
Long-range and short-range temporal modeling are two complementary and crucial aspects of video recognition. Most of the state-of-the-arts focus on short-range spatio-temporal modeling and then average multiple snippet-level predictions to yield the final video-level prediction. Thus, their video-level prediction does not consider spatio-temporal features of how video evolves along the temporal dimension. In this paper, we introduce a novel Dynamic Segment Aggregation (DSA) module to capture relationship among snippets. To be more specific, we attempt to generate a dynamic kernel for a convolutional operation to aggregate long-range temporal information among adjacent snippets adaptively. The DSA module is an efficient plug-and-play module and can be combined with the off-the-shelf clip-based models (i.e., TSM, I3D) to perform powerful long-range modeling with minimal overhead. The final video architecture, coined as DSANet. We conduct extensive experiments on several video recognition benchmarks (i.e., Mini-Kinetics-200, Kinetics-400, Something-Something V1 and ActivityNet) to show its superiority. Our proposed DSA module is shown to benefit various video recognition models significantly. For example, equipped with DSA modules, the top-1 accuracy of I3D ResNet-50 is improved from 74.9% to 78.2% on Kinetics-400. Codes will be available.
We consider online convex optimization with time-varying stage costs and additional switching costs. Since the switching costs introduce coupling across all stages, multi-step-ahead (long-term) predictions are incorporated to improve the online performance. However, longer-term predictions tend to suffer from lower quality. Thus, a critical question is: how to reduce the impact of long-term prediction errors on the online performance? To address this question, we introduce a gradient-based online algorithm, Receding Horizon Inexact Gradient (RHIG), and analyze its performance by dynamic regrets in terms of the temporal variation of the environment and the prediction errors. RHIG only considers at most $W$-step-ahead predictions to avoid being misled by worse predictions in the longer term. The optimal choice of $W$ suggested by our regret bounds depends on the tradeoff between the variation of the environment and the prediction accuracy. Additionally, we apply RHIG to a well-established stochastic prediction error model and provide expected regret and concentration bounds under correlated prediction errors. Lastly, we numerically test the performance of RHIG on quadrotor tracking problems.
Video segmentation approaches are of great importance for numerous vision tasks especially in video manipulation for entertainment. Due to the challenges associated with acquiring high-quality per-frame segmentation annotations and large video datasets with different environments at scale, learning approaches shows overall higher accuracy on test dataset but lack strict temporal constraints to self-correct jittering artifacts in most practical applications. We investigate how this jittering artifact degrades the visual quality of video segmentation results and proposed a metric of temporal stability to numerically evaluate it. In particular, we propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts, which combines with high accuracy and high consistency. Equipped with our method, existing video object/semantic segmentation approaches achieve a significant improvement in term of more satisfactory visual quality on video human dataset, which we provide for further research in this field, and also on DAVIS and Cityscape.
This paper studies the automated control method for regulating air conditioner (AC)-type loads in incentive-based residential demand response (DR). The critical challenge is that the customer responses to load adjustment are uncertain and unknown in practice. In this paper, we formulate the AC control problem in a DR event as a Markov decision process that integrates the indoor thermal dynamics and customer opt-out status transition. Specifically, machine learning techniques including Gaussian process and logistic regression are employed to learn the unknown thermal dynamics model and customer opt-out behavior model, respectively. We consider two typical DR objectives for AC load control: 1) minimizing the total load demand, 2) closely tracking a regulated power trajectory. Based on the Thompson sampling framework, we propose an online DR control algorithm to learn the customer behaviors and make real-time AC control schemes. This algorithm considers the influence of various environmental factors on customer behaviors, and is implemented in a distributed fashion to preserve the privacy of customers. Numerical simulations demonstrate the control optimality and learning efficiency of the proposed algorithm.
This paper considers online optimal control with affine constraints on the states and actions under linear dynamics with random disturbances. We consider convex stage cost functions that change adversarially. Besides, we consider time-invariant and known system dynamics and constraints. To solve this problem, we propose Online Gradient Descent with Buffer Zone (OGD-BZ). Theoretically, we show that OGD-BZ can guarantee the system to satisfy all the constraints despite any realization of the disturbances under proper parameters. Further, we investigate the policy regret of OGD-BZ, which compares OGD-BZ's performance with the performance of the optimal linear policy in hindsight. We show that OGD-BZ can achieve $\tilde O(\sqrt T)$ policy regret under proper parameters, where $\tilde O(\cdot)$ absorbs logarithmic terms of $T$.
This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose the Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial with the error tolerance's inverse and the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated by ZODPO are stabilizing with high probability. Lastly, we numerically test ZODPO on a multi-zone HVAC system.