Track one of CTI competition is on click-through rate (CTR) prediction. The dataset contains millions of records and each field-wise feature in a record consists of hashed integers for privacy. For this task, the keys of network-based methods might be type-wise feature extraction and information fusion across different fields. Multi-layer perceptrons (MLPs) are able to extract field feature, but could not efficiently fuse features. Motivated by the natural fusion characteristic of cross attention and the efficiency of transformer-based structures, we propose simple plug-in mixers for field/type-wise feature fusion, and thus construct an field&type-wise ensemble model, namely EMOFM (Ensemble MLP mOdel with Feature-based Mixers). In the experiments, the proposed model is evaluated on the dataset, the optimization process is visualized and ablation studies are explored. It is shown that EMOFM outperforms compared baselines. In the end, we discuss on future work. WARNING: The comparison might not be fair enough since the proposed method is designed for this data in particular while compared methods are not. For example, EMOFM especially takes different types of interactions into consideration while others do not. Anyway, we do hope that the ideas inside our method could help other developers/learners/researchers/thinkers and so on.
Prompt engineering is a new paradigm for enhancing the performance of trained neural network models. For optimizing text-style prompts, existing methods usually individually operate small portions of a text step by step, which either breaks the fluency or could not globally adjust a prompt. Since large language models (LLMs) have powerful ability of generating coherent texts token by token, can we utilize LLMs for improving prompts? Based on this motivation, in this paper, considering a trained LLM as a text generator, we attempt to design a black-box evolution algorithm for automatically optimizing texts, namely SPELL (Semantic Prompt Evolution based on a LLM). The proposed method is evaluated with different LLMs and evolution parameters in different text tasks. Experimental results show that SPELL could rapidly improve the prompts indeed. We further explore the evolution process and discuss on the limitations, potential possibilities and future work.
Noise is a pervasive element within real-world measurement data, significantly undermining the performance of network inference models. However, the quest for a comprehensive enhancement framework capable of bolstering noise resistance across a diverse array of network inference models has remained elusive. Here, we present an elegant and efficient framework tailored to amplify the capabilities of network inference models in the presence of noise. Leveraging curriculum learning, we mitigate the deleterious impact of noisy samples on network inference models. Our proposed framework is model-agnostic, seamlessly integrable into a plethora of model-based and model-free network inference methods. Notably, we utilize one model-based and three model-free network inference methods as the foundation. Extensive experimentation across various synthetic and real-world networks, encapsulating diverse nonlinear dynamic processes, showcases substantial performance augmentation under varied noise types, particularly thriving in scenarios enriched with clean samples. This framework's adeptness in fortifying both model-free and model-based network inference methodologies paves the avenue towards a comprehensive and unified enhancement framework, encompassing the entire spectrum of network inference models. Available Code: https://github.com/xiaoyuans/MANIE.
Despite recent advances in semantic segmentation, an inevitable challenge is the performance degradation caused by the domain shift in real application. Current dominant approach to solve this problem is unsupervised domain adaptation (UDA). However, the absence of labeled target data in UDA is overly restrictive and limits performance. To overcome this limitation, a more practical scenario called semi-supervised domain adaptation (SSDA) has been proposed. Existing SSDA methods are derived from the UDA paradigm and primarily focus on leveraging the unlabeled target data and source data. In this paper, we highlight the significance of exploiting the intra-domain information between the limited labeled target data and unlabeled target data, as it greatly benefits domain adaptation. Instead of solely using the scarce labeled data for supervision, we propose a novel SSDA framework that incorporates both inter-domain mixing and intra-domain mixing, where inter-domain mixing mitigates the source-target domain gap and intra-domain mixing enriches the available target domain information. By simultaneously learning from inter-domain mixing and intra-domain mixing, the network can capture more domain-invariant features and promote its performance on the target domain. We also explore different domain mixing operations to better exploit the target domain information. Comprehensive experiments conducted on the GTA5toCityscapes and SYNTHIA2Cityscapes benchmarks demonstrate the effectiveness of our method, surpassing previous methods by a large margin.
With more and more deep neural networks being deployed as various daily services, their reliability is essential. It's frightening that deep neural networks are vulnerable and sensitive to adversarial attacks, the most common one of which for the services is evasion-based. Recent works usually strengthen the robustness by adversarial training or leveraging the knowledge of an amount of clean data. However, in practical terms, retraining and redeploying the model need a large computational budget, leading to heavy losses to the online service. In addition, when adversarial examples of a certain attack are detected, only limited adversarial examples are available for the service provider, while much clean data may not be accessible. Given the mentioned problems, we propose a new scenario, RaPiD (Rapid Plug-in Defender), which is to rapidly defend against a certain attack for the frozen original service model with limitations of few clean and adversarial examples. Motivated by the generalization and the universal computation ability of pre-trained transformer models, we come up with a new defender method, CeTaD, which stands for Considering Pre-trained Transformers as Defenders. In particular, we evaluate the effectiveness and the transferability of CeTaD in the case of one-shot adversarial examples and explore the impact of different parts of CeTaD as well as training data conditions. CeTaD is flexible, able to be embedded into an arbitrary differentiable model, and suitable for various types of attacks.
Learning to optimize (L2O) has emerged as a powerful framework for black-box optimization (BBO). L2O learns the optimization strategies from the target task automatically without human intervention. This paper focuses on obtaining better performance when handling high-dimensional and expensive BBO with little function evaluation cost, which is the core challenge of black-box optimization. However, current L2O-based methods are weak for this due to a large number of evaluations on expensive black-box functions during training and poor representation of optimization strategy. To achieve this, 1) we utilize the cheap surrogate functions of the target task to guide the design of the optimization strategies; 2) drawing on the mechanism of evolutionary algorithm (EA), we propose a novel framework called B2Opt, which has a stronger representation of optimization strategies. Compared to the BBO baselines, B2Opt can achieve 3 to $10^6$ times performance improvement with less function evaluation cost. We test our proposal in high-dimensional synthetic functions and two real-world applications. We also find that deep B2Opt performs better than shallow ones.
Evolutionary algorithms (EAs) have emerged as a powerful framework for expensive black-box optimization. Obtaining better solutions with less computational cost is essential and challenging for black-box optimization. The most critical obstacle is figuring out how to effectively use the target task information to form an efficient optimization strategy. However, current methods are weak due to the poor representation of the optimization strategy and the inefficient interaction between the optimization strategy and the target task. To overcome the above limitations, we design a learned EA (LEA) to realize the move from hand-designed optimization strategies to learned optimization strategies, including not only hyperparameters but also update rules. Unlike traditional EAs, LEA has high adaptability to the target task and can obtain better solutions with less computational cost. LEA is also able to effectively utilize the low-fidelity information of the target task to form an efficient optimization strategy. The experimental results on one synthetic case, CEC 2013, and two real-world cases show the advantages of learned optimization strategies over human-designed baselines. In addition, LEA is friendly to the acceleration provided by Graphics Processing Units and runs 102 times faster than unaccelerated EA when evolving 32 populations, each containing 6400 individuals.
Accurate state-of-health (SOH) estimation is critical to guarantee the safety, efficiency and reliability of battery-powered applications. Most SOH estimation methods focus on the 0-100\% full state-of-charge (SOC) range that has similar distributions. However, the batteries in real-world applications usually work in the partial SOC range under shallow-cycle conditions and follow different degradation profiles with no labeled data available, thus making SOH estimation challenging. To estimate shallow-cycle battery SOH, a novel unsupervised deep transfer learning method is proposed to bridge different domains using self-attention distillation module and multi-kernel maximum mean discrepancy technique. The proposed method automatically extracts domain-variant features from charge curves to transfer knowledge from the large-scale labeled full cycles to the unlabeled shallow cycles. The CALCE and SNL battery datasets are employed to verify the effectiveness of the proposed method to estimate the battery SOH for different SOC ranges, temperatures, and discharge rates. The proposed method achieves a root-mean-square error within 2\% and outperforms other transfer learning methods for different SOC ranges. When applied to batteries with different operating conditions and from different manufacturers, the proposed method still exhibits superior SOH estimation performance. The proposed method is the first attempt at accurately estimating battery SOH under shallow-cycle conditions without needing a full-cycle characteristic test.
Deep learning (DL) algorithms have been widely applied to short-term voltage stability (STVS) assessment in power systems. However, transferring the knowledge learned in one power grid to other power grids with topology changes is still a challenging task. This paper proposed a transferable DL-based model for STVS assessment by constructing the topology-aware voltage dynamic features from raw PMU data. Since the reactive power flow and grid topology are essential to voltage stability, the topology-aware and physics-informed voltage dynamic features are utilized to effectively represent the topological and temporal patterns from post-disturbance system dynamic trajectories. The proposed DL-based STVS assessment model is tested under random operating conditions on the New England 39-bus system. It has 99.99\% classification accuracy of the short-term voltage stability status using the topology-aware and physics-informed voltage dynamic features. In addition to high accuracy, the experiments show good adaptability to PMU errors. Moreover, The proposed STVS assessment method has outstanding performance on new grid topologies after fine-tuning. In particular, the highest accuracy reaches 99.68\% in evaluation, which demonstrates a good knowledge transfer ability of the proposed model for power grid topology change.
Joint radar and communications (JRC) can realize two radio frequency (RF) functions using one set of resources, greatly saving hardware, energy and spectrum for wireless systems needing both functions. Frequency-hopping (FH) MIMO radar is a popular candidate for JRC, as the achieved communication symbol rate can greatly exceed radar pulse repetition frequency. However, practical transceiver imperfections can fail many existing theoretical designs. In this work, we unveil for the first time the non-trivial impact of hardware imperfections on FH-MIMO JRC and analytically model the impact. We also design new waveforms and, accordingly, develop a low-complexity algorithm to jointly estimate the hardware imperfections of unsynchronized receiver. Moreover, employing low-cost software-defined radios and commercial off-the-shelf (COTS) products, we build the first FH-MIMO JRC experiment platform with radar and communications simultaneously validated over the air. Corroborated by simulation and experiment results, the proposed designs achieves high performances for both radar and communications.