Abstract:One explanation for the strong generalization ability of neural networks is implicit bias. Yet the definition and mechanism of implicit bias in nonlinear settings remain poorly understood. In this work, we propose to characterize implicit bias by the count of connected regions in the input space with the same predicted label. Compared with parameter-dependent metrics (e.g., norm or normalized margin), region count is better suited to nonlinear, overparameterized models, because it is determined by the function mapping and is invariant to reparametrization. Empirically, we find that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance. We also observe that good hyperparameter choices, such as larger learning rates and smaller batch sizes, can induce small region counts. We further establish theoretical connections and explain how a larger learning rate can induce small region counts in neural networks.
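As a rough illustration of the region-count idea, the sketch below counts how many maximal runs of identical predicted labels occur along a straight segment between two inputs. The abstract does not specify how regions are counted, so the restriction to a one-dimensional segment, the uniform sampling, and the names `count_regions_on_segment` and `band_classifier` are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence


def count_regions_on_segment(
    predict: Callable[[Sequence[float]], Hashable],
    x_start: Sequence[float],
    x_end: Sequence[float],
    num_samples: int = 1000,
) -> int:
    """Count maximal runs of identical predicted labels along a segment.

    Assumption: the segment is sampled uniformly, so regions narrower than
    the sampling step can be missed.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    regions = 0
    previous_label = None
    for i in range(num_samples):
        t = i / (num_samples - 1)
        # Linear interpolation between the two endpoints.
        point = [a + t * (b - a) for a, b in zip(x_start, x_end)]
        label = predict(point)
        # A new region starts at the first sample and at every label change.
        if i == 0 or label != previous_label:
            regions += 1
        previous_label = label
    return regions


def band_classifier(x: Sequence[float]) -> int:
    """Hypothetical toy classifier: label 1 inside [1, 2), label 0 elsewhere."""
    return 1 if 1.0 <= x[0] < 2.0 else 0


def constant_classifier(x: Sequence[float]) -> int:
    """Hypothetical toy classifier that always predicts label 0."""
    return 0


if __name__ == "__main__":
    print(count_regions_on_segment(band_classifier, [0.0], [3.0], 301))      # 3
    print(count_regions_on_segment(constant_classifier, [0.0], [3.0], 301))  # 1
```

The count depends only on the predictions of the function, not on how the model is parameterized, which is the property the abstract emphasizes.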
Abstract:This paper introduces MutualNeRF, a framework that enhances Neural Radiance Field (NeRF) performance under limited samples using mutual information theory. While NeRF excels in 3D scene synthesis, challenges arise with limited data, and existing methods that aim to introduce prior knowledge lack theoretical support in a unified framework. We introduce a simple but theoretically robust concept, mutual information, as a metric to uniformly measure the correlation between images, considering both macro (semantic) and micro (pixel) levels. For sparse view sampling, we strategically select additional viewpoints containing more non-overlapping scene information by minimizing mutual information without knowing ground truth images beforehand. Our framework employs a greedy algorithm, offering a near-optimal solution. For few-shot view synthesis, we maximize the mutual information between inferred images and ground truth, expecting inferred images to gain more relevant information from known images. This is achieved by incorporating efficient, plug-and-play regularization terms. Experiments under limited samples show consistent improvement over state-of-the-art baselines in different settings, affirming the efficacy of our framework.
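The greedy view selection described above might look roughly like the following sketch. The abstract does not give the actual mutual information estimator or selection rule, so the scoring rule (sum of pairwise mutual information with the views chosen so far), the function `greedy_select_views`, and the stand-in estimator `toy_mutual_info` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

View = TypeVar("View")


def greedy_select_views(
    known_views: Sequence[View],
    candidate_views: Sequence[View],
    mutual_info: Callable[[View, View], float],
    budget: int,
) -> List[View]:
    """Greedily pick up to `budget` candidates that share the least
    information with the views selected so far.

    Assumption: `mutual_info` is a user-supplied pairwise estimate; the
    score of a candidate is the sum of its estimates against all views
    already in the set.
    """
    selected = list(known_views)
    remaining = list(candidate_views)
    chosen: List[View] = []
    for _ in range(min(budget, len(remaining))):
        # Pick the candidate with the smallest total overlap.
        best = min(
            remaining,
            key=lambda c: sum(mutual_info(c, s) for s in selected),
        )
        remaining.remove(best)
        selected.append(best)
        chosen.append(best)
    return chosen


def toy_mutual_info(a: float, b: float) -> float:
    """Hypothetical proxy: nearby camera angles share more information."""
    return 1.0 / (1.0 + abs(a - b))


if __name__ == "__main__":
    # Views are represented by a single camera angle in degrees (toy setup).
    picks = greedy_select_views([0.0], [10.0, 50.0, 100.0], toy_mutual_info, 2)
    print(picks)  # [100.0, 50.0]
```

In this toy setup the farthest view is picked first, then the view midway between the two already selected, which matches the intuition of favoring non-overlapping scene information.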
Abstract:In the past few years, Artificial Intelligence (AI)-based weather forecasting methods have proven highly competitive among weather forecasting systems. However, these methods are insufficient for high-spatial-resolution short-term nowcasting within 6 hours, which is crucial for warning of short-duration, mesoscale, and small-scale weather events. Geostationary satellite remote sensing provides detailed, all-day observations at high spatio-temporal resolution, which can address the above limitations of existing methods. Therefore, this paper proposes an advanced data-driven thermal infrared cloud image forecasting model, "DaYu." Unlike existing data-driven weather forecasting models, DaYu is specifically designed for geostationary satellite observations, with a temporal resolution of 0.5 hours and a spatial resolution of ${0.05}^\circ$ $\times$ ${0.05}^\circ$. DaYu is based on a large-scale transformer architecture, which enables it to capture fine-grained cloud structures and learn fast-changing spatio-temporal evolution features effectively. Moreover, its attention mechanism design balances computational complexity, making it practical for applications. DaYu not only achieves accurate forecasts, with a correlation coefficient higher than 0.9 for forecasts up to 3 hours, higher than 0.8 for 6 hours, and higher than 0.7 for 12 hours, but also detects short-duration, mesoscale, and small-scale weather events with enhanced detail, effectively addressing the shortcomings of existing methods in providing detailed short-term nowcasting within 6 hours. Furthermore, DaYu has significant potential in short-term climate disaster prevention and mitigation.
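For readers unfamiliar with the reported metric, the sketch below computes a correlation coefficient between a forecast and an observation. The abstract does not say which correlation coefficient is used or how images are compared, so the choice of Pearson correlation over flattened pixel values and the name `pearson_correlation` are assumptions.

```python
import math
from typing import Sequence


def pearson_correlation(
    forecast: Sequence[float], observed: Sequence[float]
) -> float:
    """Pearson correlation between two equally long value sequences.

    Assumption: each image has been flattened into a list of pixel values
    (for example, brightness temperatures) before the call.
    """
    if len(forecast) != len(observed) or len(forecast) < 2:
        raise ValueError("inputs must have the same length of at least 2")
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_f == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_f * var_o)


if __name__ == "__main__":
    # Toy values: a forecast that is a scaled copy of the observation.
    print(pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

A value near 1 means the forecast field rises and falls with the observed field; it does not by itself measure bias or absolute error.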
Abstract:In the realm of event prediction, temporal knowledge graph forecasting (TKGF) stands as a pivotal technique. Previous approaches face the challenges of not utilizing experience during testing and relying on a single short-term history, which limits adaptation to evolving data. In this paper, we introduce the Online Neural-Symbolic Event Prediction (ONSEP) framework, which innovates by integrating dynamic causal rule mining (DCRM) and dual history augmented generation (DHAG). DCRM dynamically constructs causal rules from real-time data, allowing for swift adaptation to new causal relationships. In parallel, DHAG merges short-term and long-term historical contexts, leveraging a bi-branch approach to enrich event prediction. Our framework demonstrates notable performance enhancements across diverse datasets, with significant Hit@k (k=1,3,10) improvements, showcasing its ability to augment large language models (LLMs) for event prediction without necessitating extensive retraining. The ONSEP framework not only advances the field of TKGF but also underscores the potential of neural-symbolic approaches in adapting to dynamic data environments.
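The Hit@k metric mentioned above is commonly defined as the fraction of queries whose correct answer appears among the top k ranked predictions. The sketch below follows that common definition; the function name `hits_at_k` and the toy data are illustrative and not taken from the paper.

```python
from typing import Hashable, Sequence


def hits_at_k(
    ranked_predictions: Sequence[Sequence[Hashable]],
    gold_answers: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of queries whose gold answer is in the top-k predictions."""
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("one ranked list is required per gold answer")
    if not gold_answers:
        raise ValueError("at least one query is required")
    hits = 0
    for ranked, gold in zip(ranked_predictions, gold_answers):
        # Only the first k entries of each ranking count.
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_answers)


if __name__ == "__main__":
    # Toy rankings for three queries whose correct entity is always "a".
    rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
    gold = ["a", "a", "a"]
    for k in (1, 2, 3):
        print(k, hits_at_k(rankings, gold, k))
```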
Abstract:Identifying whether an artwork was used to train a diffusion model is an important research topic, given the rising popularity of AI-generated art and the associated copyright concerns. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify the limitations of applying existing MIA methods to copyright protection: the required access to internal U-nets and the choice of non-member datasets for evaluation. To address these problems, we introduce a novel black-box membership inference attack method that operates without access to the model's internal U-net. We then construct a DALL-E generated dataset for a more comprehensive evaluation. We validate our method across various setups, and in our experiments it outperforms previous works.
Abstract:In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem, i.e., population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR can achieve comparable diversity scores with higher computational efficiency, leading to improved solution quality in practice. Based on our analysis, we further combine ITR with two tractable realizations of the state-distance-based diversity measures and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.
Abstract:This paper presents the design of a 6-DOF all-terrain micro aerial vehicle and two control strategies for multimodal flight, both of which are experimentally validated. The micro aerial vehicle is propelled by four motors and uses a single servo to control the speed and lift direction of the cycloidal rotors (cyclorotors). Despite the addition of the servo, the system remains underactuated. To address the traditional underactuation problem of cycloidal rotor aircraft, we increase the number of control variables. We propose a PID controller and a nonlinear model predictive control (NMPC) framework to tackle the model's nonlinearities and achieve control of attitude, position, and their derivatives. Experimental results demonstrate the effectiveness of the proposed multimodal control strategy for 6-DOF all-terrain micro aerial vehicles. The vehicle can operate in aerial, terrestrial, and aquatic modes and can adapt to different terrains and environmental conditions. Our approach enhances the vehicle's performance in each mode of operation, and the results show the advantages of the proposed strategy compared to other control strategies.
Abstract:Parkinson's Disease (PD) is a progressive nervous system disorder that has affected more than 5.8 million people, especially the elderly. Due to the complexity of its symptoms and its similarity to other neurological disorders, early detection requires the involvement of neurologists or PD specialists, whom most older people cannot readily access. Therefore, we integrate smart mobile devices with AI technologies. In this paper, we introduce the framework of our PD early detection system, which combines different tasks evaluating both motor and non-motor symptoms. With the developed model, we help users detect PD in a timely manner in non-clinical settings and identify their most severe symptoms. The results are expected to be further used for PD rehabilitation guidance and the detection of other neurological disorders.
Abstract:This paper introduces a structure-deformable land-air robot that possesses both excellent ground driving and flying ability, with a smooth switching mechanism between the two modes. A detailed coupled dynamics model of the proposed robot is established, covering the rotors, the chassis, and especially the deformable structures. Furthermore, taking fused locomotion and complex near-ground situations into consideration, a model-based controller is designed for landing and mode switching under various harsh conditions, in which we realize cooperation between the two fused motion modes. The entire system is implemented in ADAMS/Simulink simulation and in practice. We conduct experiments under various complex scenarios. The results show that our robot can accomplish land-air switching swiftly and smoothly, and that the designed controller can effectively improve landing flexibility and reliability.
Abstract:Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbations of the environment. The robust Markov decision process (MDP) framework -- in which the transition probabilities belong to an uncertainty set around a nominal model -- provides one way to develop robust models. While previous analysis shows that RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider the online robust MDP setting, in which the agent interacts with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.
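To make the robust MDP definition concrete, the sketch below performs a single worst-case Bellman backup for one state-action pair. It illustrates only the idea of an uncertainty set around a nominal model; it is not the optimistic policy optimization algorithm or the Fenchel-conjugate update from the paper. Representing the uncertainty set as a finite list of candidate distributions, and the name `robust_backup`, are simplifying assumptions.

```python
from typing import Sequence


def robust_backup(
    reward: float,
    gamma: float,
    candidate_dists: Sequence[Sequence[float]],
    values: Sequence[float],
) -> float:
    """Worst-case one-step backup for a single state-action pair.

    Assumption: the uncertainty set is a finite list of next-state
    distributions (the nominal one plus perturbations), each aligned
    index by index with `values`.
    """
    if not candidate_dists:
        raise ValueError("the uncertainty set must not be empty")
    for dist in candidate_dists:
        if len(dist) != len(values):
            raise ValueError("each distribution must cover every state")
    # The adversarial environment picks the distribution that minimizes
    # the expected next-state value.
    worst_case = min(
        sum(p * v for p, v in zip(dist, values)) for dist in candidate_dists
    )
    return reward + gamma * worst_case


if __name__ == "__main__":
    next_state_values = [0.0, 10.0]
    nominal = [0.5, 0.5]    # hypothetical nominal transition probabilities
    perturbed = [0.7, 0.3]  # hypothetical perturbation inside the set
    print(robust_backup(1.0, 0.5, [nominal, perturbed], next_state_values))
```

With only the nominal distribution in the set, the backup reduces to the standard, non-robust Bellman backup.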