Department of Automation, Shanghai Jiao Tong University, Shanghai, China, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China, Shanghai Engineering Research Center of Intelligent Control and Management, Shanghai, China
Abstract:With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be used. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating the diversity of video content and the assessment of its quality without reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods with an average SRCC value of 0.8658 and PLCC value of 0.8872. We will open source the code and trained models to facilitate further research and applications of NR-VQA: https://github.com/xinyiW915/ReLaX-VQA.
Abstract:Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a standalone monocular camera. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation. To this end, we propose to improve FSM from two perspectives: depth network structure optimization and training pipeline optimization. First, we construct a transformer-based depth network with neighbor-enhanced cross-view attention (NCA). The cross-attention modules can better aggregate the cross-view context in both global and neighboring views. Second, we formulate a transformer-based feature matching scheme with progressive training to improve the structure-from-motion (SfM) pipeline. That allows us to learn scale-awareness with sufficient matches and further facilitate network convergence by removing mismatches based on SfM loss. Experiments demonstrate that the resulting Scale-aware full surround monodepth (SA-FSM) method largely improves the scale-aware depth predictions without median-scaling at the test time, and performs favorably against the state-of-the-art FSM methods, e.g., surpassing SurroundDepth by 3.8% in terms of accuracy at delta<1.25 on the DDAD benchmark.
Abstract:The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within continuous space. To further optimize multi-agent cooperation, the Voronoi partition technique is employed, ensuring a reduction in repetitive flight patterns and making the control of multiple agents in a decentralized way. Through a series of experiments, the evaluation and comparison results demonstrate the efficiency of our approach in various environments. The primary application of this innovative approach is demonstrated in the search for horseshoe crabs within their wild habitats, showcasing its potential to revolutionize ecological survey and conservation efforts.
Abstract:Despite the exceptional performance of deep neural networks (DNNs) across different domains, they are vulnerable to adversarial samples, in particular for tasks related to computer vision. Such vulnerability is further influenced by the digital container formats used in computers, where the discrete numerical values are commonly used for storing the pixel values. This paper examines how information loss in file formats impacts the effectiveness of adversarial attacks. Notably, we observe a pronounced hindrance to the adversarial attack performance due to the information loss of the non-integer pixel values. To address this issue, we explore to leverage the gradient information of the attack samples within the model to mitigate the information loss. We introduce the Do More Steps (DMS) algorithm, which hinges on two core techniques: gradient ascent-based \textit{adversarial integerization} (DMS-AI) and integrated gradients-based \textit{attribution selection} (DMS-AS). Our goal is to alleviate such lossy process to retain the attack performance when storing these adversarial samples digitally. In particular, DMS-AI integerizes the non-integer pixel values according to the gradient direction, and DMS-AS selects the non-integer pixels by comparing attribution results. We conduct thorough experiments to assess the effectiveness of our approach, including the implementations of the DMS-AI and DMS-AS on two large-scale datasets with various latest gradient-based attack methods. Our empirical findings conclusively demonstrate the superiority of our proposed DMS-AI and DMS-AS pixel integerization methods over the standardised methods, such as rounding, truncating and upper approaches, in maintaining attack integrity.
Abstract:Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve $\textbf{both fast and high-quality video generation}$. We introduce T2V-Turbo, which integrates feedback from a mixture of differentiable reward models into the consistency distillation (CD) process of a pre-trained T2V model. Notably, we directly optimize rewards associated with single-step generations that arise naturally from computing the CD loss, effectively bypassing the memory constraints imposed by backpropagating gradients through an iterative sampling process. Remarkably, the 4-step generations from our T2V-Turbo achieve the highest total score on VBench, even surpassing Gen-2 and Pika. We further conduct human evaluations to corroborate the results, validating that the 4-step generations from our T2V-Turbo are preferred over the 50-step DDIM samples from their teacher models, representing more than a tenfold acceleration while improving video generation quality.
Abstract:Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication network. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We firstly present the relationship between KL divergence and explicit ISAC performance metric, i.e., demodulation error and probability of detection. Thereafter, we investigate the impact of constellation and beamforming design on the Pareto bound via deep learning and semi-definite relaxation (SDR) techniques. Simulation results show the trade-off between sensing and communication performance in terms of bit error rate (BER) and probability of detection under different parameter set-ups.
Abstract:This paper investigates simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) aided physical layer security (PLS) in multiple-input multiple-output (MIMO) systems, where the base station (BS) transmits secrecy information with the aid of STAR-RIS against multiple eavesdroppers equipped with multiple antennas. We aim to maximize the secrecy rate by jointly optimizing the active beamforming at the BS and passive beamforming at the STAR-RIS, subject to the hardware constraint for STAR-RIS. To handle the coupling variables, a minimum mean-square error (MMSE) based alternating optimization (AO) algorithm is applied. In particular, the amplitudes and phases of STAR-RIS are divided into two blocks to simplify the algorithm design. Besides, by applying the Majorization-Minimization (MM) method, we derive a closed-form expression of the STAR-RIS's phase shifts. Numerical results show that the proposed scheme significantly outperforms various benchmark schemes, especially as the number of STAR-RIS elements increases.
Abstract:Purpose This article presents a case for a next-generation grid monitoring and control system, leveraging recent advances in generative artificial intelligence (AI), machine learning, and statistical inference. Advancing beyond earlier generations of wide-area monitoring systems built upon supervisory control and data acquisition (SCADA) and synchrophasor technologies, we argue for a monitoring and control framework based on the streaming of continuous point-on-wave (CPOW) measurements with AI-powered data compression and fault detection. Methods and Results: The architecture of the proposed design originates from the Wiener-Kallianpur innovation representation of a random process that transforms causally a stationary random process into an innovation sequence with independent and identically distributed random variables. This work presents a generative AI approach that (i) learns an innovation autoencoder that extracts innovation sequence from CPOW time series, (ii) compresses the CPOW streaming data with innovation autoencoder and subband coding, and (iii) detects unknown faults and novel trends via nonparametric sequential hypothesis testing. Conclusion: This work argues that conventional monitoring using SCADA and phasor measurement unit (PMU) technologies is ill-suited for a future grid with deep penetration of inverter-based renewable generations and distributed energy resources. A monitoring system based on CPOW data streaming and AI data analytics should be the basic building blocks for situational awareness of a highly dynamic future grid.
Abstract:This paper presents a novel generative probabilistic forecasting approach derived from the Wiener-Kallianpur innovation representation of nonparametric time series. Under the paradigm of generative artificial intelligence, the proposed forecasting architecture includes an autoencoder that transforms nonparametric multivariate random processes into canonical innovation sequences, from which future time series samples are generated according to their probability distributions conditioned on past samples. A novel deep-learning algorithm is proposed that constrains the latent process to be an independent and identically distributed sequence with matching autoencoder input-output conditional probability distributions. Asymptotic optimality and structural convergence properties of the proposed generative forecasting approach are established. Three applications involving highly dynamic and volatile time series in real-time market operations are considered: (i) locational marginal price forecasting for merchant storage participants, {(ii) interregional price spread forecasting for interchange markets,} and (iii) area control error forecasting for frequency regulations. Numerical studies based on market data from multiple independent system operators demonstrate superior performance against leading traditional and machine learning-based forecasting techniques under both probabilistic and point forecast metrics.
Abstract:A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality of available text data can vary. Filtering out data can also decrease the carbon footprint and financial costs of training models by reducing the amount of training required. Data selection methods aim to determine which candidate data points to include in the training dataset and how to appropriately sample from the selected data points. The promise of improved data selection methods has caused the volume of research in the area to rapidly expand. However, because deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive, few organizations have the resources for extensive data selection research. Consequently, knowledge of effective data selection practices has become concentrated within a few organizations, many of which do not openly share their findings and methodologies. To narrow this gap in knowledge, we present a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches. By describing the current landscape of research, this work aims to accelerate progress in data selection by establishing an entry point for new and established researchers. Additionally, throughout this review we draw attention to noticeable holes in the literature and conclude the paper by proposing promising avenues for future research.