Unmanned aerial vehicles (UAVs)-enabled aerial communication provides a flexible, reliable, and cost-effective solution for a range of wireless applications. However, due to the high line-of-sight (LoS) probability, aerial communications between UAVs are vulnerable to eavesdropping attacks, particularly when multiple eavesdroppers collude. In this work, we aim to introduce distributed collaborative beamforming (DCB) into UAV swarms and handle the eavesdropper collusion by controlling the corresponding signal distributions. Specifically, we consider a two-way DCB-enabled aerial communication between two UAV swarms and construct these swarms as two UAV virtual antenna arrays. Then, we minimize the two-way known secrecy capacity and the maximum sidelobe level to avoid information leakage from the known and unknown eavesdroppers, respectively. Simultaneously, we also minimize the energy consumption of UAVs for constructing virtual antenna arrays. Due to the conflicting relationships between secure performance and energy efficiency, we consider these objectives as a multi-objective optimization problem. Following this, we propose an enhanced multi-objective swarm intelligence algorithm via the characterized properties of the problem. Simulation results show that our proposed algorithm can obtain a set of informative solutions and outperform other state-of-the-art baseline algorithms. Experimental tests demonstrate that our method can be deployed in limited computing power platforms of UAVs and is beneficial for saving computational resources.
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-context LMs, we create a new synthetic benchmark RULER with flexible configurations for customized sequence length and task complexity. RULER expands upon the vanilla NIAH test to encompass variations with diverse types and quantities of needles. Moreover, RULER introduces new task categories multi-hop tracing and aggregation to test behaviors beyond searching from context. We evaluate ten long-context LMs with 13 representative tasks in RULER. Despite achieving nearly perfect accuracy in the vanilla NIAH test, all models exhibit large performance drops as the context length increases. While these models all claim context sizes of 32K tokens or greater, only four models (GPT-4, Command-R, Yi-34B, and Mixtral) can maintain satisfactory performance at the length of 32K. Our analysis of Yi-34B, which supports context length of 200K, reveals large room for improvement as we increase input length and task complexity. We open source RULER to spur comprehensive evaluation of long-context LMs.
In this paper, we introduce GoodDrag, a novel approach to improve the stability and image quality of drag editing. Unlike existing methods that struggle with accumulated perturbations and often result in distortions, GoodDrag introduces an AlDD framework that alternates between drag and denoising operations within the diffusion process, effectively improving the fidelity of the result. We also propose an information-preserving motion supervision operation that maintains the original features of the starting point for precise manipulation and artifact reduction. In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. Extensive experiments demonstrate that the proposed GoodDrag compares favorably against the state-of-the-art approaches both qualitatively and quantitatively. The project page is https://gooddrag.github.io.
This study addresses the challenge of, without training or fine-tuning, controlling the global color aspect of images generated with a diffusion model. We rewrite the guidance equations to ensure that the outputs are closer to a known color map, and this without hindering the quality of the generation. Our method leads to new guidance equations. We show in the color guidance context that, the scaling of the guidance should not decrease but remains high throughout the diffusion process. In a second contribution, our guidance is applied in a compression framework, we combine both semantic and general color information on the image to decode the images at low cost. We show that our method is effective at improving fidelity and realism of compressed images at extremely low bit rates, when compared to other classical or more semantic oriented approaches.
Characterization of joint probability distribution for large networks of random variables remains a challenging task in data science. Probabilistic graph approximation with simple topologies has practically been resorted to; typically the tree topology makes joint probability computation much simpler and can be effective for statistical inference on insufficient data. However, to characterize network components where multiple variables cooperate closely to influence others, model topologies beyond a tree are needed, which unfortunately are infeasible to acquire. In particular, our previous work has related optimal approximation of Markov networks of tree-width k >=2 closely to the graph-theoretic problem of finding maximum spanning k-tree (MSkT), which is a provably intractable task. This paper investigates optimal approximation of Markov networks with k-tree topology that retains some designated underlying subgraph. Such a subgraph may encode certain background information that arises in scientific applications, for example, about a known significant pathway in gene networks or the indispensable backbone connectivity in the residue interaction graphs for a biomolecule 3D structure. In particular, it is proved that the \beta-retaining MSkT problem, for a number of classes \beta of graphs, admit O(n^{k+1})-time algorithms for every fixed k>= 1. These \beta-retaining MSkT algorithms offer efficient solutions for approximation of Markov networks with k-tree topology in the situation where certain persistent information needs to be retained.
This paper proposes \textit{GO4Align}, a multi-task optimization approach that tackles task imbalance by explicitly aligning the optimization across tasks. To achieve this, we design an adaptive group risk minimization strategy, compromising two crucial techniques in implementation: (i) dynamical group assignment, which clusters similar tasks based on task interactions; (ii) risk-guided group indicators, which exploit consistent task correlations with risk information from previous iterations. Comprehensive experimental results on diverse typical benchmarks demonstrate our method's performance superiority with even lower computational costs.
Due to the latest environmental concerns in keeping at bay contaminants emissions in urban areas, air pollution forecasting has been rising the forefront of all researchers around the world. When predicting pollutant concentrations, it is common to include the effects of environmental factors that influence these concentrations within an extended period, like traffic, meteorological conditions and geographical information. Most of the existing approaches exploit this information as past covariates, i.e., past exogenous variables that affected the pollutant but were not affected by it. In this paper, we present a novel forecasting methodology to predict NO$_2$ concentration via both past and future covariates. Future covariates are represented by weather forecasts and future calendar events, which are already known at prediction time. In particular, we deal with air quality observations in a city-wide network of ground monitoring stations, modeling the data structure and estimating the predictions with a Spatiotemporal Graph Neural Network (STGNN). We propose a conditioning block that embeds past and future covariates into the current observations. After extracting meaningful spatiotemporal representations, these are fused together and projected into the forecasting horizon to generate the final prediction. To the best of our knowledge, it is the first time that future covariates are included in time series predictions in a structured way. Remarkably, we find that conditioning on future weather information has a greater impact than considering past traffic conditions. We release our code implementation at https://github.com/polimi-ispl/MAGCRN.
In response to the rising threat of the face morphing attack, this paper introduces and explores the potential of Video-based Morphing Attack Detection (V-MAD) systems in real-world operational scenarios. While current morphing attack detection methods primarily focus on a single or a pair of images, V-MAD is based on video sequences, exploiting the video streams often acquired by face verification tools available, for instance, at airport gates. Through this study, we show for the first time the advantages that the availability of multiple probe frames can bring to the morphing attack detection task, especially in scenarios where the quality of probe images is varied and might be affected, for instance, by pose or illumination variations. Experimental results on a real operational database demonstrate that video sequences represent valuable information for increasing the robustness and performance of morphing attack detection systems.
In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is consistent structure preservation while enabling effective style transfer effects. The past approaches in this field directly concatenate the content and style prompts for a prompt-level style injection, leading to unavoidable structure distortions. In this work, we propose a novel solution to the text-driven style transfer task, namely, Adaptive Style Incorporation~(ASI), to achieve fine-grained feature-level style incorporation. It consists of the Siamese Cross-Attention~(SiCA) to decouple the single-track cross-attention to a dual-track structure to obtain separate content and style features, and the Adaptive Content-Style Blending (AdaBlending) module to couple the content and style information from a structure-consistent manner. Experimentally, our method exhibits much better performance in both structure preservation and stylized effects.
Molecular generation, an essential method for identifying new drug structures, has been supported by advancements in machine learning and computational technology. However, challenges remain in multi-objective generation, model adaptability, and practical application in drug discovery. In this study, we developed a versatile 'plug-in' molecular generation model that incorporates multiple objectives related to target affinity, drug-likeness, and synthesizability, facilitating its application in various drug development contexts. We improved the Particle Swarm Optimization (PSO) in the context of drug discoveries, and identified PSO-ENP as the optimal variant for multi-objective molecular generation and optimization through comparative experiments. The model also incorporates a novel target-ligand affinity predictor, enhancing the model's utility by supporting three-dimensional information and improving synthetic feasibility. Case studies focused on generating and optimizing drug-like big marine natural products were performed, underscoring PSO-ENP's effectiveness and demonstrating its considerable potential for practical drug discovery applications.