Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qi Zhu

Northwestern University

Phase-driven Domain Generalizable Learning for Nonstationary Time Series

Feb 05, 2024

Payal Mohapatra, Lixu Wang, Qi Zhu

Figure 1 for Phase-driven Domain Generalizable Learning for Nonstationary Time Series

Figure 2 for Phase-driven Domain Generalizable Learning for Nonstationary Time Series

Figure 3 for Phase-driven Domain Generalizable Learning for Nonstationary Time Series

Figure 4 for Phase-driven Domain Generalizable Learning for Nonstationary Time Series

Abstract:Monitoring and recognizing patterns in continuous sensing data is crucial for many practical applications. These real-world time-series data are often nonstationary, characterized by varying statistical and spectral properties over time. This poses a significant challenge in developing learning models that can effectively generalize across different distributions. In this work, based on our observation that nonstationary statistics are intrinsically linked to the phase information, we propose a time-series learning framework, PhASER. It consists of three novel elements: 1) phase augmentation that diversifies non-stationarity while preserving discriminatory semantics, 2) separate feature encoding by viewing time-varying magnitude and phase as independent modalities, and 3) feature broadcasting by incorporating phase with a novel residual connection for inherent regularization to enhance distribution invariant learning. Upon extensive evaluation on 5 datasets from human activity recognition, sleep-stage classification, and gesture recognition against 10 state-of-the-art baseline methods, we demonstrate that PhASER consistently outperforms the best baselines by an average of 5% and up to 13% in some cases. Moreover, PhASER's principles can be applied broadly to boost the generalization ability of existing time series classification models.

Via

Access Paper or Ask Questions

Federated Learning with New Knowledge: Fundamentals, Advances, and Futures

Feb 03, 2024

Lixu Wang, Yang Zhao, Jiahua Dong, Ating Yin, Qinbin Li, Xiao Wang, Dusit Niyato, Qi Zhu

Abstract:Federated Learning (FL) is a privacy-preserving distributed learning approach that is rapidly developing in an era where privacy protection is increasingly valued. It is this rapid development trend, along with the continuous emergence of new demands for FL in the real world, that prompts us to focus on a very important problem: Federated Learning with New Knowledge. The primary challenge here is to effectively incorporate various new knowledge into existing FL systems and evolve these systems to reduce costs, extend their lifespan, and facilitate sustainable development. In this paper, we systematically define the main sources of new knowledge in FL, including new features, tasks, models, and algorithms. For each source, we thoroughly analyze and discuss how to incorporate new knowledge into existing FL systems and examine the impact of the form and timing of new knowledge arrival on the incorporation process. Furthermore, we comprehensively discuss the potential future directions for FL with new knowledge, considering a variety of factors such as scenario setups, efficiency, and security. There is also a continuously updating repository for this topic: https://github.com/conditionWang/FLNK.

* 10 pages

Via

Access Paper or Ask Questions

Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Jan 30, 2024

Shijue Huang, Wanjun Zhong, Jianqiao Lu, Qi Zhu, Jiahui Gao, Weiwen Liu, Yutai Hou, Xingshan Zeng, Yasheng Wang, Lifeng Shang(+3 more)

Figure 1 for Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Figure 2 for Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Figure 3 for Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Figure 4 for Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Abstract:The recent trend of using Large Language Models (LLMs) as intelligent agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating, and using tools. However, existing benchmarks typically focus on simple synthesized queries that do not reflect real-world complexity, thereby offering limited perspectives in evaluating tool utilization. To address this issue, we present UltraTool, a novel benchmark designed to improve and evaluate LLMs' ability in tool utilization within real-world scenarios. UltraTool focuses on the entire process of using tools - from planning and creating to applying them in complex tasks. It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving. A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage and simplifies the task solving by mapping out the intermediate steps. Thus, unlike previous work, it eliminates the restriction of pre-defined toolset during planning. Through extensive experiments on various LLMs, we offer novel insights into the evaluation of capabilities of LLMs in tool utilization, thereby contributing a fresh perspective to this rapidly evolving field. The benchmark is publicly available at https://github.com/JoeYing1019/UltraTool.

Via

Access Paper or Ask Questions

YODA: Teacher-Student Progressive Learning for Language Models

Jan 28, 2024

Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang(+3 more)

Figure 1 for YODA: Teacher-Student Progressive Learning for Language Models

Figure 2 for YODA: Teacher-Student Progressive Learning for Language Models

Figure 3 for YODA: Teacher-Student Progressive Learning for Language Models

Figure 4 for YODA: Teacher-Student Progressive Learning for Language Models

Abstract:Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually generalize and handle more complex problems, and refine their skills with continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-student progressive learning framework that emulates the teacher-student education process to improve the efficacy of model fine-tuning. The framework operates on an interactive \textit{basic-generalized-harder} loop. The teacher agent provides tailored feedback on the student's answers, and systematically organizes the education process. This process unfolds by teaching the student basic examples, reinforcing understanding through generalized questions, and then enhancing learning by posing questions with progressively enhanced complexity. With the teacher's guidance, the student learns to iteratively refine its answer with feedback, and forms a robust and comprehensive understanding of the posed questions. The systematic procedural data, which reflects the progressive learning process of humans, is then utilized for model training. Taking math reasoning as a testbed, experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain (+17.01\% on GSM8K and +9.98\% on MATH). In addition, we find that training with curriculum learning further improves learning robustness.

* 14 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

Software Engineering for Robotics: Future Research Directions; Report from the 2023 Workshop on Software Engineering for Robotics

Jan 22, 2024

Claire Le Goues, Sebastian Elbaum, David Anthony, Z. Berkay Celik, Mauricio Castillo-Effen, Nikolaus Correll, Pooyan Jamshidi, Morgan Quigley, Trenton Tabor, Qi Zhu

Abstract:Robots are experiencing a revolution as they permeate many aspects of our daily lives, from performing house maintenance to infrastructure inspection, from efficiently warehousing goods to autonomous vehicles, and more. This technical progress and its impact are astounding. This revolution, however, is outstripping the capabilities of existing software development processes, techniques, and tools, which largely have remained unchanged for decades. These capabilities are ill-suited to handling the challenges unique to robotics software such as dealing with a wide diversity of domains, heterogeneous hardware, programmed and learned components, complex physical environments captured and modeled with uncertainty, emergent behaviors that include human interactions, and scalability demands that span across multiple dimensions. Looking ahead to the need to develop software for robots that are ever more ubiquitous, autonomous, and reliant on complex adaptive components, hardware, and data, motivated an NSF-sponsored community workshop on the subject of Software Engineering for Robotics, held in Detroit, Michigan in October 2023. The goal of the workshop was to bring together thought leaders across robotics and software engineering to coalesce a community, and identify key problems in the area of SE for robotics that that community should aim to solve over the next 5 years. This report serves to summarize the motivation, activities, and findings of that workshop, in particular by articulating the challenges unique to robot software, and identifying a vision for fruitful near-term research directions to tackle them.

* 16 pages

Via

Access Paper or Ask Questions

DACR: Distribution-Augmented Contrastive Reconstruction for Time-Series Anomaly Detection

Jan 20, 2024

Lixu Wang, Shichao Xu, Xinyu Du, Qi Zhu

Figure 1 for DACR: Distribution-Augmented Contrastive Reconstruction for Time-Series Anomaly Detection

Figure 2 for DACR: Distribution-Augmented Contrastive Reconstruction for Time-Series Anomaly Detection

Figure 3 for DACR: Distribution-Augmented Contrastive Reconstruction for Time-Series Anomaly Detection

Figure 4 for DACR: Distribution-Augmented Contrastive Reconstruction for Time-Series Anomaly Detection

Abstract:Anomaly detection in time-series data is crucial for identifying faults, failures, threats, and outliers across a range of applications. Recently, deep learning techniques have been applied to this topic, but they often struggle in real-world scenarios that are complex and highly dynamic, e.g., the normal data may consist of multiple distributions, and various types of anomalies may differ from the normal data to different degrees. In this work, to tackle these challenges, we propose Distribution-Augmented Contrastive Reconstruction (DACR). DACR generates extra data disjoint from the normal data distribution to compress the normal data's representation space, and enhances the feature extractor through contrastive learning to better capture the intrinsic semantics from time-series data. Furthermore, DACR employs an attention mechanism to model the semantic dependencies among multivariate time-series features, thereby achieving more robust reconstruction for anomaly detection. Extensive experiments conducted on nine benchmark datasets in various anomaly detection scenarios demonstrate the effectiveness of DACR in achieving new state-of-the-art time-series anomaly detection.

* 5 pages, 3 figures, accepted at ICASSP 2024

Via

Access Paper or Ask Questions

Fine-tuning Graph Neural Networks by Preserving Graph Generative Patterns

Dec 21, 2023

Yifei Sun, Qi Zhu, Yang Yang, Chunping Wang, Tianyu Fan, Jiajun Zhu, Lei Chen

Abstract:Recently, the paradigm of pre-training and fine-tuning graph neural networks has been intensively studied and applied in a wide range of graph mining tasks. Its success is generally attributed to the structural consistency between pre-training and downstream datasets, which, however, does not hold in many real-world scenarios. Existing works have shown that the structural divergence between pre-training and downstream graphs significantly limits the transferability when using the vanilla fine-tuning strategy. This divergence leads to model overfitting on pre-training graphs and causes difficulties in capturing the structural properties of the downstream graphs. In this paper, we identify the fundamental cause of structural divergence as the discrepancy of generative patterns between the pre-training and downstream graphs. Furthermore, we propose G-Tuning to preserve the generative patterns of downstream graphs. Given a downstream graph G, the core idea is to tune the pre-trained GNN so that it can reconstruct the generative patterns of G, the graphon W. However, the exact reconstruction of a graphon is known to be computationally expensive. To overcome this challenge, we provide a theoretical analysis that establishes the existence of a set of alternative graphons called graphon bases for any given graphon. By utilizing a linear combination of these graphon bases, we can efficiently approximate W. This theoretical finding forms the basis of our proposed model, as it enables effective learning of the graphon bases and their associated coefficients. Compared with existing algorithms, G-Tuning demonstrates an average improvement of 0.5% and 2.6% on in-domain and out-of-domain transfer learning experiments, respectively.

* Accepted to AAAI 2024

Via

Access Paper or Ask Questions

Federated Continual Novel Class Learning

Dec 21, 2023

Lixu Wang, Chenxi Liu, Junfeng Guo, Jiahua Dong, Xiao Wang, Heng Huang, Qi Zhu

Figure 1 for Federated Continual Novel Class Learning

Figure 2 for Federated Continual Novel Class Learning

Figure 3 for Federated Continual Novel Class Learning

Figure 4 for Federated Continual Novel Class Learning

Abstract:In a privacy-focused era, Federated Learning (FL) has emerged as a promising machine learning technique. However, most existing FL studies assume that the data distribution remains nearly fixed over time, while real-world scenarios often involve dynamic and continual changes. To equip FL systems with continual model evolution capabilities, we focus on an important problem called Federated Continual Novel Class Learning (FedCN) in this work. The biggest challenge in FedCN is to merge and align novel classes that are discovered and learned by different clients without compromising privacy. To address this, we propose a Global Alignment Learning (GAL) framework that can accurately estimate the global novel class number and provide effective guidance for local training from a global perspective, all while maintaining privacy protection. Specifically, GAL first locates high-density regions in the representation space through a bi-level clustering mechanism to estimate the novel class number, with which the global prototypes corresponding to novel classes can be constructed. Then, GAL uses a novel semantic weighted loss to capture all possible correlations between these prototypes and the training data for mitigating the impact of pseudo-label noise and data heterogeneity. Extensive experiments on various datasets demonstrate GAL's superior performance over state-of-the-art novel class discovery methods. In particular, GAL achieves significant improvements in novel-class performance, increasing the accuracy by 5.1% to 10.6% in the case of one novel class learning stage and by 7.8% to 17.9% in the case of two novel class learning stages, without sacrificing known-class performance. Moreover, GAL is shown to be effective in equipping a variety of different mainstream FL algorithms with novel class discovery and learning capability, highlighting its potential for many real-world applications.

* 23 pages, 3 figures

Via

Access Paper or Ask Questions

Data Management For Large Language Models: A Survey

Dec 04, 2023

Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu

Figure 1 for Data Management For Large Language Models: A Survey

Figure 2 for Data Management For Large Language Models: A Survey

Figure 3 for Data Management For Large Language Models: A Survey

Figure 4 for Data Management For Large Language Models: A Survey

Abstract:Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, holds significance for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community still falls short in providing a systematic analysis of the rationale behind management strategy selection, its consequential effects, methodologies for evaluating curated datasets, and the ongoing pursuit of improved strategies. Consequently, the exploration of data management has attracted more and more attention among the research community. This survey provides a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various noteworthy aspects of data management strategy design: data quantity, data quality, domain/task composition, etc. Looking toward the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through effective data management practices. The collection of the latest papers is available at https://github.com/ZigeW/data_management_LLM.

* Work in progress

Via

Access Paper or Ask Questions

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

Nov 28, 2023

Yixuan Wang, Ruochen Jiao, Chengtian Lang, Sinong Simon Zhan, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

Abstract:Autonomous Driving (AD) faces crucial hurdles for commercial launch, notably in the form of diminished public trust and safety concerns from long-tail unforeseen driving scenarios. This predicament is due to the limitation of deep neural networks in AD software, which struggle with interpretability and exhibit poor generalization capabilities in out-of-distribution and uncertain scenarios. To this end, this paper advocates for the integration of Large Language Models (LLMs) into the AD system, leveraging their robust common-sense knowledge, reasoning abilities, and human-interaction capabilities. The proposed approach deploys the LLM as an intelligent decision-maker in planning, incorporating safety verifiers for contextual safety learning to enhance overall AD performance and safety. We present results from two case studies that affirm the efficacy of our approach. We further discuss the potential integration of LLM for other AD software components including perception, prediction, and simulation. Despite the observed challenges in the case studies, the integration of LLMs is promising and beneficial for reinforcing both safety and performance in AD.

* 14 pages, 7 figures

Via

Access Paper or Ask Questions