For transfer-based attacks, the adversarial examples are crafted on the surrogate model, which can be implemented to mislead the target model effectively. The conventional method for maximizing adversarial transferability involves: (1) fine-tuning hyperparameters to generate multiple batches of adversarial examples on the substitute model; (2) conserving the batch of adversarial examples that have the best comprehensive performance on substitute model and target model, and discarding the others. In this work, we revisit the second step of this process in the context of fine-tuning hyperparameters to craft adversarial examples, where multiple batches of fine-tuned adversarial examples often appear in a single high error hilltop. We demonstrate that averaging multiple batches of adversarial examples under different hyperparameter configurations, which refers to as "adversarial example soups", can often enhance adversarial transferability. Compared with traditional methods, the proposed method incurs no additional generation time and computational cost. Besides, our method is orthogonal to existing transfer-based methods and can be combined with them seamlessly to generate more transferable adversarial examples. Extensive experiments on the ImageNet dataset show that our methods achieve a higher attack success rate than the state-of-the-art attacks.
In the rapidly evolving domain of electrical power systems, the Volt-VAR optimization (VVO) is increasingly critical, especially with the burgeoning integration of renewable energy sources. Traditional approaches to learning-based VVO in expansive and dynamically changing power systems are often hindered by computational complexities. To address this challenge, our research presents a novel framework that harnesses the potential of Deep Reinforcement Learning (DRL), specifically utilizing the Importance Weighted Actor-Learner Architecture (IMPALA) algorithm, executed on the RAY platform. This framework, built upon RLlib-an industry-standard in Reinforcement Learning-ingeniously capitalizes on the distributed computing capabilities and advanced hyperparameter tuning offered by RAY. This design significantly expedites the exploration and exploitation phases in the VVO solution space. Our empirical results demonstrate that our approach not only surpasses existing DRL methods in achieving superior reward outcomes but also manifests a remarkable tenfold reduction in computational requirements. The integration of our DRL agent with the RAY platform facilitates the creation of RLlib-IMPALA, a novel framework that efficiently uses RAY's resources to improve system adaptability and control. RLlib-IMPALA leverages RAY's toolkit to enhance analytical capabilities and significantly speeds up training to become more than 10 times faster than other state-of-the-art DRL methods.
Low-Dose computer tomography (LDCT) is an ideal alternative to reduce radiation risk in clinical applications. Although supervised-deep-learning-based reconstruction methods have demonstrated superior performance compared to conventional model-driven reconstruction algorithms, they require collecting massive pairs of low-dose and norm-dose CT images for neural network training, which limits their practical application in LDCT imaging. In this paper, we propose an unsupervised and training data-free learning reconstruction method for LDCT imaging that avoids the requirement for training data. The proposed method is a post-processing technique that aims to enhance the initial low-quality reconstruction results, and it reconstructs the high-quality images by neural work training that minimizes the $\ell_1$-norm distance between the CT measurements and their corresponding simulated sinogram data, as well as the total variation (TV) value of the reconstructed image. Moreover, the proposed method does not require to set the weights for both the data fidelity term and the plenty term. Experimental results on the AAPM challenge data and LoDoPab-CT data demonstrate that the proposed method is able to effectively suppress the noise and preserve the tiny structures. And these results also shows the proposed method's low computational cost and rapid convergence. The source code is available at \url{https://github.com/linfengyu77/IRLDCT}.
Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for training. TML is simple: we first dim the input and then increase its brightness. TML is based on two core components. First, the troublemaker model (TM) constructs pseudo low-light images from normal images to relieve the cost of pairwise data. Second, the predicting model (PM) enhances the brightness of pseudo low-light images. Additionally, we incorporate an enhancing model (EM) to further improve the visual performance of PM outputs. Moreover, in LLIE tasks, characterizing global element correlations is important because more information on the same object can be captured. CNN cannot achieve this well, and self-attention has high time complexity. Accordingly, we propose Global Dynamic Convolution (GDC) with O(n) time complexity, which essentially imitates the partial calculation process of self-attention to formulate elementwise correlations. Based on the GDC module, we build the UGDC model. Extensive quantitative and qualitative experiments demonstrate that UGDC trained with TML can achieve competitive performance against state-of-the-art approaches on public datasets. The code is available at https://github.com/Rainbowman0/TML_LLIE.
Effectively and efficiently retrieving images from remote sensing databases is a critical challenge in the realm of remote sensing big data. Utilizing hand-drawn sketches as retrieval inputs offers intuitive and user-friendly advantages, yet the potential of multi-level feature integration from sketches remains underexplored, leading to suboptimal retrieval performance. To address this gap, our study introduces a novel zero-shot, sketch-based retrieval method for remote sensing images, leveraging multi-level, attention-guided tokenization. This approach starts by employing multi-level self-attention feature extraction to tokenize the query sketches, as well as self-attention feature extraction to tokenize the candidate images. It then employs cross-attention mechanisms to establish token correspondence between these two modalities, facilitating the computation of sketch-to-image similarity. Our method demonstrates superior retrieval accuracy over existing sketch-based remote sensing image retrieval techniques, as evidenced by tests on four datasets. Notably, it also exhibits robust zero-shot learning capabilities and strong generalizability in handling unseen categories and novel remote sensing data. The method's scalability can be further enhanced by the pre-calculation of retrieval tokens for all candidate images in a database. This research underscores the significant potential of multi-level, attention-guided tokenization in cross-modal remote sensing image retrieval. For broader accessibility and research facilitation, we have made the code and dataset used in this study publicly available online. Code and dataset are available at https://github.com/Snowstormfly/Cross-modal-retrieval-MLAGT.
In this paper, we focus on improving autonomous driving safety via task offloading from cellular vehicles (CVs), using vehicle-to-infrastructure (V2I) links, to an multi-access edge computing (MEC) server. Considering that the frequencies used for V2I links can be reused for vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of each V2I link may suffer from severe interference, causing outages in the task offloading process. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) to enable, not only V2I reflective links, but also interference cancellation at the V2V links exploiting the computational capability of its metamaterials. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices, with the objective to maximize a safety-based autonomous driving task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems and solved via an alternate approximation method. Our simulation results demonstrate the effectiveness of the proposed RICS optimization in improving the safety in autonomous driving networks.
Deep learning techniques have been used to build velocity models (VMs) for seismic traveltime tomography and have shown encouraging performance in recent years. However, they need to generate labeled samples (i.e., pairs of input and label) to train the deep neural network (NN) with end-to-end learning, and the real labels for field data inversion are usually missing or very expensive. Some traditional tomographic methods can be implemented quickly, but their effectiveness is often limited by prior assumptions. To avoid generating labeled samples, we propose a novel method by integrating deep learning and dictionary learning to enhance the VMs with low resolution by using the traditional tomography-least square method (LSQR). We first design a type of shallow and simple NN to reduce computational cost followed by proposing a two-step strategy to enhance the VMs with low resolution: (1) Warming up. An initial dictionary is trained from the estimation by LSQR through dictionary learning method; (2) Dictionary optimization. The initial dictionary obtained in the warming-up step will be optimized by the NN, and then it will be used to reconstruct high-resolution VMs with the reference slowness and the estimation by LSQR. Furthermore, we design a loss function to minimize traveltime misfit to ensure that NN training is label-free, and the optimized dictionary can be obtained after each epoch of NN training. We demonstrate the effectiveness of the proposed method through numerical tests.
Property prediction is a fundamental task in crystal material research. To model atoms and structures, structures represented as graphs are widely used and graph learning-based methods have achieved significant progress. Bond angles and bond distances are two key structural information that greatly influence crystal properties. However, most of the existing works only consider bond distances and overlook bond angles. The main challenge lies in the time cost of handling bond angles, which leads to a significant increase in inference time. To solve this issue, we first propose a crystal structure modeling based on dual scale neighbor partitioning mechanism, which uses a larger scale cutoff for edge neighbors and a smaller scale cutoff for angle neighbors. Then, we propose a novel Atom-Distance-Angle Graph Neural Network (ADA-GNN) for property prediction tasks, which can process node information and structural information separately. The accuracy of predictions and inference time are improved with the dual scale modeling and the specially designed architecture of ADA-GNN. The experimental results validate that our approach achieves state-of-the-art results in two large-scale material benchmark datasets on property prediction tasks.
Few-shot object detection (FSOD) aims at extending a generic detector for novel object detection with only a few training examples. It attracts great concerns recently due to the practical meanings. Meta-learning has been demonstrated to be an effective paradigm for this task. In general, methods based on meta-learning employ an additional support branch to encode novel examples (a.k.a. support images) into class prototypes, which are then fused with query branch to facilitate the model prediction. However, the class-level prototypes are difficult to precisely generate, and they also lack detailed information, leading to instability in performance.New methods are required to capture the distinctive local context for more robust novel object detection. To this end, we propose to distill the most representative support features into fine-grained prototypes. These prototypes are then assigned into query feature maps based on the matching results, modeling the detailed feature relations between two branches. This process is realized by our Fine-Grained Feature Aggregation (FFA) module. Moreover, in terms of high-level feature fusion, we propose Balanced Class-Agnostic Sampling (B-CAS) strategy and Non-Linear Fusion (NLF) module from differenct perspectives. They are complementary to each other and depict the high-level feature relations more effectively. Extensive experiments on PASCAL VOC and MS COCO benchmarks show that our method sets a new state-of-the-art performance in most settings. Our code is available at https://github.com/wangchen1801/FPD.
Prompt treatment for melanoma is crucial. To assist physicians in identifying lesion areas precisely in a quick manner, we propose a novel skin lesion segmentation technique namely SLP-Net, an ultra-lightweight segmentation network based on the spiking neural P(SNP) systems type mechanism. Most existing convolutional neural networks achieve high segmentation accuracy while neglecting the high hardware cost. SLP-Net, on the contrary, has a very small number of parameters and a high computation speed. We design a lightweight multi-scale feature extractor without the usual encoder-decoder structure. Rather than a decoder, a feature adaptation module is designed to replace it and implement multi-scale information decoding. Experiments at the ISIC2018 challenge demonstrate that the proposed model has the highest Acc and DSC among the state-of-the-art methods, while experiments on the PH2 dataset also demonstrate a favorable generalization ability. Finally, we compare the computational complexity as well as the computational speed of the models in experiments, where SLP-Net has the highest overall superiority