Robotics and Intelligent Manufacturing & School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China




Abstract:It's challenging to balance the networks stability and plasticity in continual learning scenarios, considering stability suffers from the update of model and plasticity benefits from it. Existing works usually focus more on the stability and restrict the learning plasticity of later tasks to avoid catastrophic forgetting of learned knowledge. Differently, we propose a continual learning method named Split2MetaFusion which can achieve better trade-off by employing a two-stage strategy: splitting and meta-weighted fusion. In this strategy, a slow model with better stability, and a fast model with better plasticity are learned sequentially at the splitting stage. Then stability and plasticity are both kept by fusing the two models in an adaptive manner. Towards this end, we design an optimizer named Task-Preferred Null Space Projector(TPNSP) to the slow learning process for narrowing the fusion gap. To achieve better model fusion, we further design a Dreaming-Meta-Weighted fusion policy for better maintaining the old and new knowledge simultaneously, which doesn't require to use the previous datasets. Experimental results and analysis reported in this work demonstrate the superiority of the proposed method for maintaining networks stability and keeping its plasticity. Our code will be released.
Abstract:As black-box machine learning models grow in complexity and find applications in high-stakes scenarios, it is imperative to provide explanations for their predictions. Although Local Interpretable Model-agnostic Explanations (LIME) [22] is a widely adpoted method for understanding model behaviors, it is unstable with respect to random seeds [35,24,3] and exhibits low local fidelity (i.e., how well the explanation approximates the model's local behaviors) [21,16]. Our study shows that this instability problem stems from small sample weights, leading to the dominance of regularization and slow convergence. Additionally, LIME's sampling neighborhood is non-local and biased towards the reference, resulting in poor local fidelity and sensitivity to reference choice. To tackle these challenges, we introduce GLIME, an enhanced framework extending LIME and unifying several prior methods. Within the GLIME framework, we derive an equivalent formulation of LIME that achieves significantly faster convergence and improved stability. By employing a local and unbiased sampling distribution, GLIME generates explanations with higher local fidelity compared to LIME. GLIME explanations are independent of reference choice. Moreover, GLIME offers users the flexibility to choose a sampling distribution based on their specific scenarios.




Abstract:The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenges categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficicent inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
Abstract:When engaging in strategic decision-making, we are frequently confronted with overwhelming information and data. The situation can be further complicated when certain pieces of evidence contradict each other or become paradoxical. The primary challenge is how to determine which information can be trusted when we adopt Artificial Intelligence (AI) systems for decision-making. This issue is known as deciding what to decide or Trustworthy AI. However, the AI system itself is often considered an opaque black box. We propose a new approach to address this issue by introducing a novel framework of Trustworthy AI (TAI) encompassing three crucial components of AI: representation space, loss function, and optimizer. Each component is loosely coupled with four TAI properties. Altogether, the framework consists of twelve TAI properties. We aim to use this framework to conduct the TAI experiments by quantitive and qualitative research methods to satisfy TAI properties for the decision-making context. The framework allows us to formulate an optimal prediction model trained by the given dataset for applying the strategic investment decision of credit default swaps (CDS) in the technology sector. Finally, we provide our view of the future direction of TAI research




Abstract:Human preference alignment is a crucial training step to improve the interaction quality of large language models (LLMs). Existing aligning methods depend on manually annotated preference data to guide the LLM optimization directions. However, in practice, continuously updating LLMs raises a distribution gap between model-generated samples and human-preferred responses, which hinders model fine-tuning efficiency. To mitigate this issue, previous methods require additional preference annotation on generated samples to adapt the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an adversarial preference optimization (APO) framework, where the LLM agent and the preference model update alternatively via a min-max game. Without additional annotation, our APO method can make a self-adaption to the generation distribution gap through the adversarial learning process. In experiments, we empirically verify the effectiveness of APO in improving LLM's helpfulness and harmlessness compared with rejection sampling baselines.




Abstract:Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM, DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.




Abstract:Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/




Abstract:Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain, and Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift approach to capture the evolution of two coupled parameters, and the nonlinearity in value function approximation further requires us to characterize the approximation error. Combing these provide Neural-Q-Whittle with $\mathcal{O}(1/k^{2/3})$ convergence rate, where $k$ is the number of iterations.
Abstract:Advancing our understanding of climate processes in regions characterized by intricate terrain complexity is a paramount challenge in contemporary climate science, particularly in the context of global climate change. Notably, the scarcity of observational data in these regions has imposed substantial limitations on understanding the nuanced climate dynamics therein. For the first time, utilizing cutting-edge AI-driven knowledge discovery techniques, we have uncovered explicit equations that elucidate the intricate relationship between terrain features and precipitation patterns, illuminating the previously concealed complexities governing these relationships. These equations, thus far undisclosed, exhibit remarkable accuracy compared to conventional empirical models when applied to precipitation data. Building on this foundation, we reveal a phenomenon known as the '1995 turning point,' indicating a significant shift in the terrain-precipitation relationship in approximately 1995, related to the forces of climate change. These equations have practical applications, particularly in achieving fine-scale downscaling precipitation predictions from low-resolution future climate data. This capability provides invaluable insights into the expected changes in precipitation patterns across diverse terrains under future climate scenarios.




Abstract:With the development of aerospace technology, the increasing population of space debris has posed a great threat to the safety of spacecraft. However, the low intensity of reflected light and high angular velocity of space debris impede the extraction. Besides, due to the limitations of the ground observation methods, small space debris can hardly be detected, making it necessary to enhance the spacecraft's capacity for space situational awareness (SSA). Considering that traditional methods have some defects in low-SNR target detection, such as low effectiveness and large time consumption, this paper proposes a method for low-SNR streak extraction based on local contrast and maximum likelihood estimation (MLE), which can detect space objects with SNR 2.0 efficiently. In the proposed algorithm, local contrast will be applied for crude classifications, which will return connected components as preliminary results, and then MLE will be performed to reconstruct the connected components of targets via orientated growth, further improving the precision. The algorithm has been verified with both simulated streaks and real star tracker images, and the average centroid error of the proposed algorithm is close to the state-of-the-art method like ODCC. At the same time, the algorithm in this paper has significant advantages in efficiency compared with ODCC. In conclusion, the algorithm in this paper is of high speed and precision, which guarantees its promising applications in the extraction of high dynamic targets.