This paper systematically studies the cooperative area coverage and target tracking problem of multiple-unmanned aerial vehicles (multi-UAVs). The problem is solved by decomposing into three sub-problems: information fusion, task assignment, and multi-UAV behavior decision-making. Specifically, in the information fusion process, we use the maximum consistency protocol to update the joint estimation states of multi-targets (JESMT) and the area detection information. The area detection information is represented by the equivalent visiting time map (EVTM), which is built based on the detection probability and the actual visiting time of the area. Then, we model the task assignment problem of multi-UAV searching and tracking multi-targets as a network flow model with upper and lower flow bounds. An algorithm named task assignment minimum-cost maximum-flow (TAMM) is proposed. Cooperative behavior decision-making uses Fisher information as the mission reward to obtain the optimal tracking action of the UAV. Furthermore, a coverage behavior decision-making algorithm based on the anti-flocking method is designed for those UAVs assigned the coverage task. Finally, a distributed multi-UAV cooperative area coverage and target tracking algorithm is designed, which integrates information fusion, task assignment, and behavioral decision-making. Numerical and hardware-in-the-loop simulation results show that the proposed method can achieve persistent area coverage and cooperative target tracking.
Despite recent breakthroughs in the field of artificial intelligence (AI) - or more specifically machine learning (ML) algorithms for object recognition and natural language processing - it seems to be the majority view that current AI approaches are still no real match for natural intelligence (NI). More importantly, philosophers have collected a long catalogue of features which imply that NI works differently from current AI not only in a gradual sense, but in a more substantial way: NI is closely related to consciousness, intentionality and experiential features like qualia (the subjective contents of mental states) and allows for understanding (e.g., taking insight into causal relationships instead of 'blindly' relying on correlations), as well as aesthetical and ethical judgement beyond what we can put into (explicit or data-induced implicit) rules to program machines with. Additionally, Psychologists find NI to range from unconscious psychological processes to focused information processing, and from embodied and implicit cognition to 'true' agency and creativity. NI thus seems to transcend any neurobiological functionalism by operating on 'bits of meaning' instead of information in the sense of data, quite unlike both the 'good old fashioned', symbolic AI of the past, as well as the current wave of deep neural network based, 'sub-symbolic' AI, which both share the idea of thinking as (only) information processing. In the following I propose an alternative view of NI as information processing plus 'bundle pushing', discuss an example which illustrates how bundle pushing can cut information processing short, and suggest first ideas for scientific experiments in neuro-biology and information theory as further investigations.
This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem to find pseudo-labels that maximize the mutual information between the label and data while being close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables a simple yet effective clustering-based representation learning without incorporating extra training techniques or artificial constraints such as sampling strategy, equipartition constraints, etc. With relatively small training epochs, representation learned by MIRA achieves state-of-the-art performance on various downstream tasks, including the linear/k-NN evaluation and transfer learning. Especially, with only 400 epochs, our method applied to ImageNet dataset with ResNet-50 architecture achieves 75.6% linear evaluation accuracy.
Multi-view clustering (MvC) aims at exploring the category structure among multi-view data without label supervision. Multiple views provide more information than single views and thus existing MvC methods can achieve satisfactory performance. However, their performance might seriously degenerate when the views are noisy in practical scenarios. In this paper, we first formally investigate the drawback of noisy views and then propose a theoretically grounded deep MvC method (namely MvCAN) to address this issue. Specifically, we propose a novel MvC objective that enables un-shared parameters and inconsistent clustering predictions across multiple views to reduce the side effects of noisy views. Furthermore, a non-parametric iterative process is designed to generate a robust learning target for mining multiple views' useful information. Theoretical analysis reveals that MvCAN works by achieving the multi-view consistency, complementarity, and noise robustness. Finally, experiments on public datasets demonstrate that MvCAN outperforms state-of-the-art methods and is robust against the existence of noisy views.
In recent years, estimating the duration of medical intervention based on electronic health records (EHRs) has gained significant attention in the filed of clinical decision support. However, current models largely focus on structured data, leaving out information from the unstructured clinical free-text data. To address this, we present a novel language-enhanced transformer-based framework, which projects all relevant clinical data modalities (continuous, categorical, binary, and free-text features) into a harmonized language latent space using a pre-trained sentence encoder with the help of medical prompts. The proposed method enables the integration of information from different modalities within the cell transformer encoder and leads to more accurate duration estimation for medical intervention. Our experimental results on both US-based (length of stay in ICU estimation) and Asian (surgical duration prediction) medical datasets demonstrate the effectiveness of our proposed framework, which outperforms tailored baseline approaches and exhibits robustness to data corruption in EHRs.
Existing question-answering research focuses on unanswerable questions in the context of always providing an answer when a system can\dots but what about cases where a system {\bf should not} answer a question. This can either be to protect sensitive users or sensitive information. Many models expose sensitive information under interrogation by an adversarial user. We seek to determine if it is possible to teach a question-answering system to keep a specific fact secret. We design and implement a proof-of-concept architecture and through our evaluation determine that while possible, there are numerous directions for future research to reduce system paranoia (false positives), information leakage (false negatives) and extend the implementation of the work to more complex problems with preserving secrecy in the presence of information aggregation.
Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users. In this paper, we focus on the multi-scenario click-through rate (CTR) prediction, i.e., training a unified model to serve all scenarios. Existing multi-scenario based CTR methods struggle in the context of OTP setting due to the ignorance of the cold-start users who have very limited data. To fill this gap, we propose a novel method named Cold-Start based Multi-scenario Network (CSMN). Specifically, it consists of two basic components including: 1) User Interest Projection Network (UIPN), which firstly purifies users' behaviors by eliminating the scenario-irrelevant information in behaviors with respect to the visiting scenario, followed by obtaining users' scenario-specific interests by summarizing the purified behaviors with respect to the target item via an attention mechanism; and 2) User Representation Memory Network (URMN), which benefits cold-start users from users with rich behaviors through a memory read and write mechanism. CSMN seamlessly integrates both components in an end-to-end learning framework. Extensive experiments on real-world offline dataset and online A/B test demonstrate the superiority of CSMN over state-of-the-art methods.
The recent interweaving of AI-6G technologies has sparked extensive research interest in further enhancing reliable and timely communications. \emph{Age of Information} (AoI), as a novel and integrated metric implying the intricate trade-offs among reliability, latency, and update frequency, has been well-researched since its conception. This paper contributes new results in this area by employing a Deep Reinforcement Learning (DRL) approach to intelligently decide how to allocate power resources and when to retransmit in a \emph{freshness-sensitive} downlink multi-user Hybrid Automatic Repeat reQuest with Chase Combining (HARQ-CC) aided Non-Orthogonal Multiple Access (NOMA) network. Specifically, an AoI minimization problem is formulated as a Markov Decision Process (MDP) problem. Then, to achieve deterministic, age-optimal, and intelligent power allocations and retransmission decisions, the Double-Dueling-Deep Q Network (DQN) is adopted. Furthermore, a more flexible retransmission scheme, referred to as Retransmit-At-Will scheme, is proposed to further facilitate the timeliness of the HARQ-aided NOMA network. Simulation results verify the superiority of the proposed intelligent scheme and demonstrate the threshold structure of the retransmission policy. Also, answers to whether user pairing is necessary are discussed by extensive simulation results.
Deep learning systems have been reported to achieve state-of-the-art performances in many applications, and a key is the existence of well trained classifiers on benchmark datasets. As a main-stream loss function, the cross entropy can easily lead us to find models which demonstrate severe overfitting behavior. In this paper, we show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. In this paper, we propose a mutual information learning framework where we train deep neural network classifiers via learning the mutual information between the label and the input. Theoretically, we give the population classification error lower bound in terms of the mutual information. In addition, we derive the mutual information lower and upper bounds for a concrete binary classification data model in $\mathbb{R}^n$, and also the error probability lower bound in this scenario. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. The mutual information learned classifiers (MILCs) achieve far better generalization performances than the conditional entropy learned classifiers (CELCs) with an improvement which can exceed more than 10\% in testing accuracy.
The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in different files. We propose RepoCoder, a simple, generic, and effective framework to address the challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model, which allows for the effective utilization of repository-level information for code completion and grants the ability to generate code at various levels of granularity. Furthermore, RepoCoder utilizes a novel iterative retrieval-generation paradigm that bridges the gap between retrieval context and the intended completion target. We also propose a new benchmark RepoEval, which consists of the latest and high-quality real-world repositories covering line, API invocation, and function body completion scenarios. We test the performance of RepoCoder by using various combinations of code retrievers and generators. Experimental results indicate that RepoCoder significantly improves the zero-shot code completion baseline by over 10% in all settings and consistently outperforms the vanilla retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research.