Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China, School of Computing, University of Portsmouth, Portsmouth, United Kingdom
Abstract:To provide a lightweight and cost-effective solution for the long-wave infrared imaging using a singlet, we develop a camera by integrating a High-Frequency-Enhancing Cycle-GAN neural network into a metalens imaging system. The High-Frequency-Enhancing Cycle-GAN improves the quality of the original metalens images by addressing inherent frequency loss introduced by the metalens. In addition to the bidirectional cyclic generative adversarial network, it incorporates a high-frequency adversarial learning module. This module utilizes wavelet transform to extract high-frequency components, and then establishes a high-frequency feedback loop. It enables the generator to enhance the camera outputs by integrating adversarial feedback from the high-frequency discriminator. This ensures that the generator adheres to the constraints imposed by the high-frequency adversarial loss, thereby effectively recovering the camera's frequency loss. This recovery guarantees high-fidelity image output from the camera, facilitating smooth video production. Our camera is capable of achieving dynamic imaging at 125 frames per second with an End Point Error value of 12.58. We also achieve 0.42 for Fr\'echet Inception Distance, 30.62 for Peak Signal to Noise Ratio, and 0.69 for Structural Similarity in the recorded videos.
Abstract:Detecting ships in synthetic aperture radar (SAR) images is challenging due to strong speckle noise, complex surroundings, and varying scales. This paper proposes MLDet, a multitask learning framework for SAR ship detection, consisting of object detection, speckle suppression, and target segmentation tasks. An angle classification loss with aspect ratio weighting is introduced to improve detection accuracy by addressing angular periodicity and object proportions. The speckle suppression task uses a dual-feature fusion attention mechanism to reduce noise and fuse shallow and denoising features, enhancing robustness. The target segmentation task, leveraging a rotated Gaussian-mask, aids the network in extracting target regions from cluttered backgrounds and improves detection efficiency with pixel-level predictions. The Gaussian-mask ensures ship centers have the highest probabilities, gradually decreasing outward under a Gaussian distribution. Additionally, a weighted rotated boxes fusion (WRBF) strategy combines multi-direction anchor predictions, filtering anchors beyond boundaries or with high overlap but low confidence. Extensive experiments on SSDD+ and HRSID datasets demonstrate the effectiveness and superiority of MLDet.
Abstract:It is well-known that a diverse corpus is critical for training large language models, which are typically constructed from a mixture of various domains. In general, previous efforts resort to sampling training data from different domains with static proportions, as well as adjusting data proportions during training. However, few methods have addressed the complexities of domain-adaptive continual pre-training. To fill this gap, we propose Velocitune, a novel framework dynamically assesses learning velocity and adjusts data proportions accordingly, favoring slower-learning domains while shunning faster-learning ones, which is guided by a scaling law to indicate the desired learning goal for each domain with less associated cost. To evaluate the effectiveness of Velocitune, we conduct experiments in a reasoning-focused dataset with CodeLlama, as well as in a corpus specialised for system command generation with Llama3 and Mistral. Velocitune achieves performance gains in both math and code reasoning tasks and command-line generation benchmarks. Further analysis reveals that key factors driving Velocitune's effectiveness include target loss prediction and data ordering.
Abstract:Traditional approaches for designing analog circuits are time-consuming and require significant human expertise. Existing automation efforts using methods like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal and costly to generalize across different topologies and technology nodes. In our work, we introduce a novel approach, LEDRO, utilizing Large Language Models (LLMs) in conjunction with optimization techniques to iteratively refine the design space for analog circuit sizing. LEDRO is highly generalizable compared to other RL and BO baselines, eliminating the need for design annotation or model training for different topologies or technology nodes. We conduct a comprehensive evaluation of our proposed framework and baseline on 22 different Op-Amp topologies across four FinFET technology nodes. Results demonstrate the superior performance of LEDRO as it outperforms our best baseline by an average of 13% FoM improvement with 2.15x speed-up on low complexity Op-Amps and 48% FoM improvement with 1.7x speed-up on high complexity Op-Amps. This highlights LEDRO's effective performance, efficiency, and generalizability.
Abstract:Reinforcement Learning with Human Feedback (RLHF) is the key to the success of large language models (LLMs) in recent years. In this work, we first introduce the concepts of knowledge breadth and knowledge depth, which measure the comprehensiveness and depth of an LLM or knowledge source respectively. We reveal that the imbalance in the number of prompts and responses can lead to a potential disparity in breadth and depth learning within alignment tuning datasets by showing that even a simple uniform method for balancing the number of instructions and responses can lead to significant improvements. Building on this, we further propose Balanced Preference Optimization (BPO), designed to dynamically augment the knowledge depth of each sample. BPO is motivated by the observation that the usefulness of knowledge varies across samples, necessitating tailored learning of knowledge depth. To achieve this, we introduce gradient-based clustering, estimating the knowledge informativeness and usefulness of each augmented sample based on the model's optimization direction. Our experimental results across various benchmarks demonstrate that BPO outperforms other baseline methods in alignment tuning while maintaining training efficiency. Furthermore, we conduct a detailed analysis of each component of BPO, providing guidelines for future research in preference data optimization.
Abstract:In this paper, we review legal testing methods based on Large Language Models (LLMs), using the OPENAI o1 model as a case study to evaluate the performance of large models in applying legal provisions. We compare current state-of-the-art LLMs, including open-source, closed-source, and legal-specific models trained specifically for the legal domain. Systematic tests are conducted on English and Chinese legal cases, and the results are analyzed in depth. Through systematic testing of legal cases from common law systems and China, this paper explores the strengths and weaknesses of LLMs in understanding and applying legal texts, reasoning through legal issues, and predicting judgments. The experimental results highlight both the potential and limitations of LLMs in legal applications, particularly in terms of challenges related to the interpretation of legal language and the accuracy of legal reasoning. Finally, the paper provides a comprehensive analysis of the advantages and disadvantages of various types of models, offering valuable insights and references for the future application of AI in the legal field.
Abstract:This paper provides an in-depth analysis of Token2Wave, a novel token representation method derived from the Wave Network, designed to capture both global and local semantics of input text through wave-inspired complex vectors. In Token2Wave, each token is represented with a magnitude component, capturing the global semantics of the entire input text, and a phase component, encoding the relationships between individual tokens and the global semantics. Building on prior research that demonstrated the effectiveness of wave-like operations, such as interference and modulation, during forward propagation, this study investigates the convergence behavior, backpropagation characteristics, and embedding independence within the Token2Wave framework. A detailed computational complexity analysis shows that Token2Wave can significantly reduce video memory usage and training time compared to BERT. Gradient comparisons for the [CLS] token, total input text, and classifier parameters further highlight Token2Wave's unique characteristics. This research offers new insights into wave-based token representations, demonstrating their potential to enable efficient and computationally friendly language model architectures.
Abstract:Explainability is an essential reason limiting the application of neural networks in many vital fields. Although neuro-symbolic AI hopes to enhance the overall explainability by leveraging the transparency of symbolic learning, the results are less evident than imagined. This article proposes a classification for explainability by considering both model design and behavior of 191 studies from 2013, focusing on neuro-symbolic AI, hoping to inspire scholars who want to understand the explainability of neuro-symbolic AI. Precisely, we classify them into five categories by considering whether the form of bridging the representation differences is readable as their design factor, if there are representation differences between neural networks and symbolic logic learning, and whether a model decision or prediction process is understandable as their behavior factor: implicit intermediate representations and implicit prediction, partially explicit intermediate representations and partially explicit prediction, explicit intermediate representations or explicit prediction, explicit intermediate representation and explicit prediction, unified representation and explicit prediction. We also analyzed the research trends and three significant challenges: unified representations, explainability and transparency, and sufficient cooperation from neural networks and symbolic learning. Finally, we put forward suggestions for future research in three aspects: unified representations, enhancing model explainability, ethical considerations, and social impact.
Abstract:Neuro-symbolic AI is an effective method for improving the overall performance of AI models by combining the advantages of neural networks and symbolic learning. However, there are differences between the two in terms of how they process data, primarily because they often use different data representation methods, which is often an important factor limiting the overall performance of the two. From this perspective, we analyzed 191 studies from 2013 by constructing a four-level classification framework. The first level defines five types of representation spaces, and the second level focuses on five types of information modalities that the representation space can represent. Then, the third level describes four symbolic logic methods. Finally, the fourth-level categories propose three collaboration strategies between neural networks and symbolic learning. Furthermore, we conducted a detailed analysis of 46 research based on their representation space.
Abstract:We propose an innovative token representation and update method in a new ultra-small language model: the Wave network. Specifically, we use a \textbf{complex vector} to represent each token, encoding both global and local semantics of the input text. A \textbf{complex vector} consists of two components: a magnitude vector representing the \textit{global semantics} of the input text, and a phase vector capturing the \textit{relationships between individual tokens and global semantics}. Experiments on the AG News text classification task demonstrate that, when generating complex vectors from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91\% accuracy with wave interference and 91.66\% with wave modulation -- outperforming a single Transformer layer using BERT pre-trained embeddings by 19.23\% and 19.98\%, respectively, and approaching the accuracy of the pre-trained and fine-tuned BERT base model (94.64\%). Additionally, compared to BERT base, the Wave Network reduces video memory usage and training time by 77.34\% and 85.62\% during wave modulation. In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.