Qingsong Zou

Benchmarking GNNs for OOD Materials Property Prediction with Uncertainty Quantification

Nov 12, 2025

Liqin Tan, Pin Chen, Menghan Liu, Xiean Wang, Jianhuan Cen, Qingsong Zou

Abstract: We present MatUQ, a benchmark framework for evaluating graph neural networks (GNNs) on out-of-distribution (OOD) materials property prediction with uncertainty quantification (UQ). MatUQ comprises 1,375 OOD prediction tasks constructed from six materials datasets using five OFM-based splitting strategies and a newly proposed structure-aware splitting strategy, SOAP-LOCO, which captures local atomic environments more effectively. We evaluate 12 representative GNN models under a unified uncertainty-aware training protocol that combines Monte Carlo Dropout and Deep Evidential Regression (DER), and introduce a novel uncertainty metric, D-EviU, which shows the strongest correlation with prediction errors in most tasks. Our experiments yield two key findings. First, the uncertainty-aware training approach significantly improves model prediction accuracy, reducing errors by an average of 70.6% across challenging OOD scenarios. Second, the benchmark reveals that no single model dominates universally: earlier models such as SchNet and ALIGNN remain competitive, while newer models like CrystalFramer and SODNet demonstrate superior performance on specific material properties. These results provide practical insights for selecting reliable models under distribution shifts in materials discovery.

* 12 pages, 1 figure, 5 tables
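
The two uncertainty sources named in the abstract can be illustrated with a minimal sketch. It assumes the standard Deep Evidential Regression parameterization, in which the network outputs Normal-Inverse-Gamma parameters (gamma, nu, alpha, beta), and it treats Monte Carlo Dropout as repeated stochastic forward passes. The function names are hypothetical, and the D-EviU metric itself is not reproduced here because the abstract does not define it.

```python
from statistics import fmean, pvariance


def der_uncertainty(gamma, nu, alpha, beta):
    """Return (prediction, aleatoric, epistemic) from Normal-Inverse-Gamma
    parameters, using the standard DER moments (assumed, not taken from MatUQ)."""
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("require nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1)         # expected data noise, E[sigma^2]
    epistemic = beta / (nu * (alpha - 1))  # variance of the predicted mean, Var[mu]
    return gamma, aleatoric, epistemic


def mc_dropout_summary(samples):
    """Mean and population variance over predictions from several
    forward passes made with dropout left active."""
    return fmean(samples), pvariance(samples)


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    print(der_uncertainty(gamma=1.0, nu=2.0, alpha=3.0, beta=4.0))
    print(mc_dropout_summary([1.0, 2.0, 3.0]))
```

How the benchmark combines these quantities into a single score is a design choice of the paper and is left open in this sketch.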

Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes

Jan 31, 2025

Zhiyao Xu, Dan Zhao, Qingsong Zou, Jingyu Xiao, Yong Jiang, Zhenhui Yuan, Qing Li

Abstract: In recent years, as smart home systems have become more widespread, security concerns within these environments have grown. Currently, most smart home security solutions, such as anomaly detection and behavior prediction models, are trained on fixed, precollected datasets. However, dataset collection is time-consuming and lacks the flexibility needed to adapt to the constantly evolving smart home environment. Additionally, the collection of personal data raises significant privacy concerns for users. Recently, large language models (LLMs) have emerged as a powerful tool for a wide range of tasks across diverse application domains, thanks to their strong capabilities in natural language processing, reasoning, and problem-solving. In this paper, we propose IoTGen, an LLM-based synthetic dataset generation framework, to enhance the generalization of downstream smart home intelligent models. By generating new synthetic datasets that reflect changes in the environment, smart home intelligent models can be retrained to overcome the limitations of fixed and outdated data, allowing them to better align with the dynamic nature of real-world home environments. Specifically, we first propose a Structure Pattern Perception Compression (SPPC) method tailored for IoT behavior data, which preserves the most informative content in the data while significantly reducing token consumption. Then, we propose a systematic approach to create prompts and implement data generation to automatically generate IoT synthetic data with normative and reasonable properties, assisting task models in adaptive training to improve generalization and real-world performance.

Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

Jun 18, 2024

Jingyu Xiao, Zhiyao Xu, Qingsong Zou, Qing Li, Dan Zhao, Dong Fang, Ruoyu Li, Wenxin Tang, Kang Li, Xudong Zuo(+4 more)

Abstract: Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations by users and potential attacks by malicious actors. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effectively learn less frequent behaviors, consider temporal context, or account for the impact of noise in human behaviors. In this paper, we propose SmartGuard, an autoencoder-based unsupervised user behavior anomaly detection framework. First, we design a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors, which are often overlooked during learning. Second, we propose a Three-level Time-aware Position Embedding (TTPE) to incorporate temporal information into positional embedding to detect temporal context anomalies. Third, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) that assigns different weights to routine behaviors and noise behaviors to mitigate the interference of noise behaviors during inference. Comprehensive experiments on three datasets with ten types of anomalous behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines and also offers highly interpretable results.

* KDD 2024
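
The idea of a loss-guided mask can be sketched as follows. This is one plausible reading, not SmartGuard's actual rule: it assumes that the positions with the highest per-behavior reconstruction loss from the previous pass are the ones masked next, so that the autoencoder is pushed to reconstruct behaviors it currently handles poorly. The function names, the `mask_ratio` parameter, and the `[MASK]` token are all hypothetical.

```python
def loss_guided_mask(token_losses, mask_ratio=0.2):
    """Return the sorted positions to mask, chosen as the entries with the
    highest reconstruction loss (an assumed selection rule)."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    n_mask = int(len(token_losses) * mask_ratio)
    ranked = sorted(range(len(token_losses)),
                    key=lambda i: token_losses[i], reverse=True)
    return sorted(ranked[:n_mask])


def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the given positions with a mask placeholder."""
    masked = set(positions)
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    # Illustrative behavior sequence and losses; not data from the paper.
    behaviors = ["light_on", "tv_on", "oven_on", "door_unlock"]
    losses = [0.1, 0.9, 0.3, 0.7]
    print(apply_mask(behaviors, loss_guided_mask(losses, mask_ratio=0.5)))
```

A random mask would spend most of its budget on frequent behaviors; ranking by loss is a simple way to redirect it toward rare ones, which is the motivation the abstract gives for LDMS.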

Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases

Apr 18, 2023

Wentao Zhang, Yujun Huang, Tong Zhang, Qingsong Zou, Wei-Shi Zheng, Ruixuan Wang

Abstract: Currently, intelligent diagnosis systems lack the ability to continually learn to diagnose new diseases once deployed while preserving knowledge of old diseases. In particular, updating an intelligent diagnosis system with training data of new diseases would cause catastrophic forgetting of old disease knowledge. To address the catastrophic forgetting issue, a novel adapter-based strategy is proposed to help effectively learn a set of new diseases at each round (or task) of continual learning, without changing the shared feature extractor. The learnable lightweight task-specific adapter(s) can be flexibly designed (e.g., two convolutional layers) and then added to the pretrained and fixed feature extractor. Together with a specially designed task-specific head which absorbs all previously learned old diseases as a single 'out-of-distribution' category, task-specific adapter(s) can help the pretrained feature extractor more effectively extract discriminative features between diseases. In addition, a simple yet effective fine-tuning is applied to collaboratively fine-tune multiple task-specific heads such that outputs from different heads are comparable and consequently the appropriate classifier head can be more accurately selected during model inference. Extensive empirical evaluations on three image datasets demonstrate the superior performance of the proposed method in continual learning of new diseases. The source code will be released publicly.

* 10 pages
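
Head selection at inference time, as described in the abstract, might look like the sketch below. It assumes that every task-specific head emits logits whose last entry is the 'out-of-distribution' category, and that the head with the highest in-distribution softmax probability is chosen. Both the selection rule and the function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_head(head_logits):
    """Pick (head_index, class_index) with the largest in-distribution
    probability. Each inner list needs at least two entries, and its last
    entry is treated as that head's out-of-distribution class."""
    best = None
    for head, logits in enumerate(head_logits):
        in_dist = softmax(logits)[:-1]  # drop the OOD probability
        cls = max(range(len(in_dist)), key=in_dist.__getitem__)
        if best is None or in_dist[cls] > best[0]:
            best = (in_dist[cls], head, cls)
    if best is None:
        raise ValueError("at least one head is required")
    return best[1], best[2]


if __name__ == "__main__":
    # Head 0 leans toward its OOD class; head 1 is confident in its class 1.
    print(select_head([[2.0, 0.5, 3.0], [0.1, 4.0, 0.2]]))
```

Comparing probabilities across heads is only meaningful if the heads are calibrated against each other, which is the role the abstract assigns to the collaborative fine-tuning step.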
