Abstract:Traffic time series imputation is crucial for the safety and reliability of intelligent transportation systems, while diverse types of missing data, including random, fiber, and block missing make the imputation task challenging. Existing models often focus on disentangling and separately modeling spatial and temporal patterns based on relationships between data points. However, these approaches struggle to adapt to the random missing positions, and fail to learn long-term and large-scale dependencies, which are essential in extensive missing conditions. In this paper, patterns are categorized into two types to handle various missing data conditions: primary patterns, which originate from internal relationships between data points, and auxiliary patterns, influenced by external factors like timestamps and node attributes. Accordingly, we propose the Primary-Auxiliary Spatio-Temporal network (PAST). It comprises a graph-integrated module (GIM) and a cross-gated module (CGM). GIM captures primary patterns via dynamic graphs with interval-aware dropout and multi-order convolutions, and CGM extracts auxiliary patterns through bidirectional gating on embedded external features. The two modules interact via shared hidden vectors and are trained under an ensemble self-supervised framework. Experiments on three datasets under 27 missing data conditions demonstrate that the imputation accuracy of PAST outperforms seven state-of-the-art baselines by up to 26.2% in RMSE and 31.6% in MAE.
Abstract:Training large language models (LLMs) poses significant challenges regarding computational resources and memory capacity. Although distributed training techniques help mitigate these issues, they still suffer from considerable communication overhead. Existing approaches primarily rely on static gradient compression to enhance communication efficiency; however, these methods neglect the dynamic nature of evolving gradients during training, leading to performance degradation. Accelerating LLM training via compression without sacrificing performance remains a challenge. In this paper, we propose an entropy-driven dynamic gradient compression framework called EDGC. The core concept is to adjust the compression rate during LLM training based on the evolving trends of gradient entropy, taking into account both compression efficiency and error. EDGC consists of three key components.First, it employs a down-sampling method to efficiently estimate gradient entropy, reducing computation overhead. Second, it establishes a theoretical model linking compression rate with gradient entropy, enabling more informed compression decisions. Lastly, a window-based adjustment mechanism dynamically adapts the compression rate across pipeline stages, improving communication efficiency and maintaining model performance. We implemented EDGC on a 32-NVIDIA-V100 cluster and a 64-NVIDIA-H100 cluster to train GPT2-2.5B and GPT2-12.1B, respectively. The results show that EDGC significantly reduces communication latency and training time by up to 46.45% and 16.13% while preserving LLM accuracy.
Abstract:Knowledge Graphs (KGs) structure real-world entities and their relationships into triples, enhancing machine reasoning for various tasks. While domain-specific KGs offer substantial benefits, their manual construction is often inefficient and requires specialized knowledge. Recent approaches for knowledge graph construction (KGC) based on large language models (LLMs), such as schema-guided KGC and reference knowledge integration, have proven efficient. However, these methods are constrained by their reliance on manually defined schema, single-document processing, and public-domain references, making them less effective for domain-specific corpora that exhibit complex knowledge dependencies and specificity, as well as limited reference knowledge. To address these challenges, we propose LKD-KGC, a novel framework for unsupervised domain-specific KG construction. LKD-KGC autonomously analyzes document repositories to infer knowledge dependencies, determines optimal processing sequences via LLM driven prioritization, and autoregressively generates entity schema by integrating hierarchical inter-document contexts. This schema guides the unsupervised extraction of entities and relationships, eliminating reliance on predefined structures or external knowledge. Extensive experiments show that compared with state-of-the-art baselines, LKD-KGC generally achieves improvements of 10% to 20% in both precision and recall rate, demonstrating its potential in constructing high-quality domain-specific KGs.




Abstract:Software systems often record important runtime information in logs to help with troubleshooting. Log-based anomaly detection has become a key research area that aims to identify system issues through log data, ultimately enhancing the reliability of software systems. Traditional deep learning methods often struggle to capture the semantic information embedded in log data, which is typically organized in natural language. In this paper, we propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs). LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences. Additionally, we introduce a projector to align the vector representation spaces of BERT and Llama, ensuring a cohesive understanding of log semantics. Unlike conventional methods that require log parsers to extract templates, LogLLM preprocesses log messages with regular expressions, streamlining the entire process. Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability. Experimental results across four public datasets demonstrate that LogLLM outperforms state-of-the-art methods. Even when handling unstable logs, it effectively captures the semantic meaning of log messages and detects anomalies accurately.
Abstract:Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anomaly detection methods treat a trace (i.e., process instance) as multiple event pairs, disrupting long-distance dependencies. In this paper, we introduce DABL, a novel approach for detecting semantic anomalies in business processes using large language models (LLMs). We collect 143,137 real-world process models from various domains. By generating normal traces through the playout of these process models and simulating both ordering and exclusion anomalies, we fine-tune Llama 2 using the resulting log. Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes. Users can directly apply DABL to detect semantic anomalies in their own datasets without the need for additional training. Furthermore, DABL offers the capability to interpret the causes of anomalies in natural language, providing valuable insights into the detected anomalies.
Abstract:Multi-task learning (MTL) aims at solving multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several tasks simultaneously. Some related work attributed the source of the problem is the conflicting gradients. In this case, it is needed to select useful gradient updates for all tasks carefully. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates gradients of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD decomposes gradients into task-shared and task-conflict components explicitly and adopts a general update rule for avoiding interference across all task gradients. This allows guiding the update directions depending on the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experiment results on several multi-task datasets not only demonstrate the significant improvement of GDOD performed to existing MTL models but also prove that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss metrics.