Yan Feng

CDR: Conservative Doubly Robust Learning for Debiased Recommendation

Aug 17, 2023
ZiJie Song, JiaWei Chen, Sheng Zhou, QiHao Shi, Yan Feng, Chun Chen, Can Wang

In recommendation systems (RS), user behavior data is observational rather than experimental, resulting in widespread bias in the data. Consequently, tackling bias has emerged as a major challenge in the field. Recently, Doubly Robust Learning (DR) has gained significant attention due to its remarkable performance and robustness properties. However, our experimental findings indicate that existing DR methods are severely impacted by the presence of so-called Poisonous Imputation, where the imputation significantly deviates from the truth and becomes counterproductive. To address this issue, this work proposes the Conservative Doubly Robust strategy (CDR), which filters imputations by scrutinizing their mean and variance. Theoretical analyses show that CDR offers reduced variance and improved tail bounds. In addition, our experiments demonstrate that CDR significantly enhances performance and indeed reduces the frequency of poisonous imputation.
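
As a rough illustration of the conservative idea (hypothetical thresholds and function names, not the authors' implementation), a doubly robust estimate can fall back to plain inverse-propensity scoring wherever an imputation's statistics look suspect:

```python
import numpy as np

def conservative_dr(pred_err, imputed_err, propensity, observed,
                    mean_tol=0.5, var_tol=1.0):
    """Doubly robust risk estimate that discards suspect imputations.

    pred_err:    prediction error, meaningful where observed == 1
    imputed_err: model-imputed error for every user-item pair
    propensity:  estimated observation probabilities
    observed:    0/1 indicator of which pairs were actually observed
    """
    # Standard DR estimate: imputation plus propensity-weighted correction.
    dr = imputed_err + observed / propensity * (pred_err - imputed_err)
    # Conservative filter: trust an imputation only if its deviation from
    # the mean observed error is small and the imputation variance is low.
    mean_dev = np.abs(imputed_err - pred_err[observed == 1].mean())
    trust = (mean_dev < mean_tol) & (np.var(imputed_err) < var_tol)
    ips = observed / propensity * pred_err   # imputation-free fallback
    return np.mean(np.where(trust, dr, ips))

# Toy usage on random data.
rng = np.random.default_rng(0)
n = 1000
obs = rng.integers(0, 2, n).astype(float)
print(conservative_dr(rng.normal(0.5, 0.2, n), rng.normal(0.5, 0.2, n),
                      np.full(n, 0.5), obs))
```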

A data-driven approach to predict decision point choice during normal and evacuation wayfinding in multi-story buildings

Aug 07, 2023
Yan Feng, Panchamy Krishnakumari

Understanding pedestrian route choice behavior in complex buildings is important for ensuring pedestrian safety. Previous studies have mostly used traditional data collection methods and discrete choice modeling to understand the influence of different factors on pedestrian route and exit choice, particularly in simple indoor environments, and research on pedestrian route choice in complex buildings remains limited. This paper presents a data-driven approach for understanding and predicting pedestrian decision point choice during normal and emergency wayfinding in a multi-story building. Pedestrian behavioral data were collected through a Virtual Reality experiment. We first built an indoor network representation and proposed a data mapping technique to map VR coordinates onto it. We then used a well-established machine learning algorithm, the random forest (RF) model, to predict pedestrian decision point choice along a route during four wayfinding tasks. The results show a much higher prediction accuracy of decision points with the RF model (93% on average) than with a logistic regression model; the highest accuracy was 96%, for task 3. Additionally, we tested the model with personal characteristics added as features and found that they did not affect decision point choice. This paper demonstrates the potential of applying machine learning to study pedestrian route choice behavior in complex indoor environments.
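
A hedged sketch of the prediction setup with scikit-learn on synthetic data; the feature columns here are illustrative stand-ins, not the paper's exact inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(0, 4, n),     # current decision point (node id)
    rng.integers(0, 2, n),     # task type: normal vs. evacuation
    rng.uniform(0, 50, n),     # distance walked so far (m)
])
y = rng.integers(0, 3, n)      # chosen outgoing link at the decision point

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```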

OpenGSL: A Comprehensive Benchmark for Graph Structure Learning

Jun 17, 2023
Zhiyao Zhou, Sheng Zhou, Bochao Mao, Xuanyi Zhou, Jiawei Chen, Qiaoyu Tan, Daochen Zha, Can Wang, Yan Feng, Chun Chen

Graph Neural Networks (GNNs) have emerged as the de facto standard for representation learning on graphs, owing to their ability to effectively integrate graph topology and node attributes. However, the inherent suboptimality of node connections, resulting from the complex and contingent formation process of graphs, presents significant challenges to modeling them effectively. To tackle this issue, Graph Structure Learning (GSL), a family of data-centric learning approaches, has garnered substantial attention in recent years. The core idea behind GSL is to jointly optimize the graph structure and the corresponding GNN model. Despite the proposal of numerous GSL methods, progress in this field remains unclear due to inconsistent experimental protocols, including variations in datasets, data processing techniques, and splitting strategies. In this paper, we introduce OpenGSL, the first comprehensive benchmark for GSL, aimed at addressing this gap. OpenGSL enables a fair comparison among state-of-the-art GSL methods by evaluating them across various popular datasets using uniform data processing and splitting strategies. Through extensive experiments, we observe that existing GSL methods do not consistently outperform vanilla GNN counterparts. We do observe, however, that the learned graph structure generalizes strongly across different GNN backbones, despite the high computational and memory cost of learning it. We hope that our open-sourced library will facilitate rapid and equitable evaluation and inspire further innovative research in the field of GSL. The code of the benchmark is available at https://github.com/OpenGSL/OpenGSL.
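
The joint optimization at the core of GSL can be sketched generically: a learnable dense adjacency co-trained with a small classifier via one-hop propagation. This is not the OpenGSL API (whose usage is documented in the linked repository), just a minimal sketch of the idea:

```python
import torch
import torch.nn as nn

n, d, c = 50, 16, 3                        # nodes, feature dim, classes
X = torch.randn(n, d)
labels = torch.randint(0, c, (n,))

A_logits = nn.Parameter(torch.randn(n, n))         # learnable structure
gnn = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, c))
opt = torch.optim.Adam([A_logits, *gnn.parameters()], lr=1e-2)

for step in range(200):
    A = torch.sigmoid(A_logits)                    # dense soft adjacency
    A = A / A.sum(dim=1, keepdim=True)             # row-normalize
    H = A @ X                                      # one propagation step
    loss = nn.functional.cross_entropy(gnn(H), labels)
    opt.zero_grad(); loss.backward(); opt.step()   # update structure + GNN
```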

* 9 pages, 4 figures 

Generalizable Black-Box Adversarial Attack with Meta Learning

Jan 01, 2023
Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, Yujiu Yang

In the black-box adversarial attack scenario, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries to attack each benign example. To reduce query cost, we propose to utilize feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework that trains a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned using the feedback from the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help attack the target model. The proposed framework, with these two types of adversarial transferability, can be naturally combined with any off-the-shelf query-based attack method to boost its performance, as verified by extensive experiments.
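
A loose sketch of the per-example fine-tuning step, under strong simplifications: `meta_gen` and `query` are hypothetical interfaces, and the feedback is treated as differentiable (in the real black-box setting it would be consumed through a surrogate or a score-based estimator):

```python
import copy
import torch

def attack(meta_gen, query, x, steps=5, lr=1e-3, eps=8 / 255):
    gen = copy.deepcopy(meta_gen)              # task-specific copy
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    for _ in range(steps):                     # few-shot adaptation
        delta = eps * torch.tanh(gen(x))       # bounded perturbation
        loss = query(x + delta)                # feedback signal
        opt.zero_grad(); loss.backward(); opt.step()
    return (x + eps * torch.tanh(gen(x))).detach()

# Toy usage with a dummy generator and a surrogate-model feedback signal.
net = torch.nn.Conv2d(3, 3, 3, padding=1)
surrogate = lambda z: z.square().mean()        # stand-in for model feedback
x0 = torch.rand(1, 3, 32, 32)
x_adv = attack(net, surrogate, x0)
```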

* T-PAMI 2022. Project Page is at https://github.com/SCLBD/MCG-Blackbox 

Robust Sequence Networked Submodular Maximization

Dec 28, 2022
Qihao Shi, Bingyang Fu, Can Wang, Jiawei Chen, Sheng Zhou, Yan Feng, Chun Chen

In this paper, we study the Robust optimization for sequence Networked submodular maximization (RoseNets) problem, which interweaves robust optimization with sequence networked submodular maximization. The elements are connected by a directed acyclic graph, and the objective function is submodular not on the elements but on the edges of the graph. In such a networked submodular scenario, the impact of removing an element from a sequence depends on both its position in the sequence and its position in the network, which makes existing robust algorithms inapplicable. We take the first step toward studying the RoseNets problem and design a robust greedy algorithm that is robust against the removal of an arbitrary subset of the selected elements. The approximation ratio of the algorithm depends on both the number of removed elements and the network topology. We further conduct experiments on real-world recommendation and link prediction applications. The experimental results demonstrate the effectiveness of the proposed algorithm.
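
To make the edge-based objective concrete, here is a plain (non-robust) greedy skeleton over a DAG with an illustrative coverage-style objective; the paper's robust variant additionally guards against the removal of selected elements:

```python
import networkx as nx

def greedy_sequence(G, f, k):
    """Greedily append the node whose newly activated DAG edges give the
    largest marginal gain under the edge-set objective f."""
    seq, active = [], set()
    for _ in range(k):
        best_v, best_gain, best_edges = None, float("-inf"), set()
        for v in G.nodes:
            if v in seq:
                continue
            # Edges activated by appending v: arcs from already-chosen nodes.
            new_edges = {(u, v) for u in seq if G.has_edge(u, v)}
            gain = f(active | new_edges) - f(active)
            if gain > best_gain:
                best_v, best_gain, best_edges = v, gain, new_edges
        seq.append(best_v)
        active |= best_edges
    return seq

# Toy run: objective = number of activated edges.
G = nx.DiGraph([(0, 1), (0, 2), (1, 3), (2, 3)])
print(greedy_sequence(G, lambda E: len(E), 3))
```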

* 12 pages, 14 figures; accepted to AAAI 2023 

Statistical treatment of convolutional neural network super-resolution of inland surface wind for subgrid-scale variability quantification

Nov 30, 2022
Daniel Getter, Julie Bessac, Johann Rudi, Yan Feng

Machine learning models are frequently employed to perform either purely physics-free or hybrid downscaling of climate data. However, the majority of these implementations operate over relatively small downscaling factors of about 4--6x. This study examines the ability of convolutional neural networks (CNNs) to downscale surface wind speed data from three coarse resolutions (25 km, 48 km, and 100 km side-length grid cells) to 3 km, and additionally focuses on the ability to recover subgrid-scale variability. For each downscaling factor, namely 8x, 16x, and 32x, we consider models that produce fine-scale wind speed predictions from different input features: coarse wind fields only; coarse wind and fine-scale topography; and coarse wind, topography, and temporal information in the form of a timestamp. Furthermore, we train one model at 25 km to 3 km resolution whose fine-scale outputs are the parameters of a probability density function from which wind speed samples can be drawn. All CNN predictions, evaluated on an out-of-sample dataset, outperform classical interpolation. Models using coarse wind and fine-scale topography exhibit the best performance among models operating at the same downscaling factor, while our timestamp encoding results in lower out-of-sample generalizability than the other input configurations. Overall, the downscaling factor plays the largest role in model performance.
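
A schematic of the coarse-wind plus fine-topography configuration at the 8x factor (layer sizes and the upsample-then-refine layout are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindDownscaler(nn.Module):
    """Upsample coarse wind 8x, then refine jointly with fine topography."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),    # fine-scale wind speed
        )

    def forward(self, coarse_wind, fine_topo):
        w = F.interpolate(coarse_wind, scale_factor=8, mode="bilinear",
                          align_corners=False)
        return self.refine(torch.cat([w, fine_topo], dim=1))

model = WindDownscaler()
wind = torch.randn(1, 1, 16, 16)       # e.g., 25 km grid cells
topo = torch.randn(1, 1, 128, 128)     # 3 km topography
print(model(wind, topo).shape)         # torch.Size([1, 1, 128, 128])
```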

Accelerating Diffusion Sampling with Classifier-based Feature Distillation

Nov 22, 2022
Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen

Although diffusion models have shown great potential for generating higher-quality images than GANs, their slow sampling speed hinders wide application in practice. Progressive distillation has thus been proposed for fast sampling by progressively aligning the output images of an N-step teacher sampler with those of an N/2-step student sampler. In this paper, we argue that this distillation-based acceleration can be further improved, especially for few-step samplers, with our proposed Classifier-based Feature Distillation (CFD). Instead of aligning output images, we distill the teacher's sharpened feature distribution into the student with a dataset-independent classifier, making the student focus on the important features to improve performance. We also introduce a dataset-oriented loss to further optimize the model. Experiments on CIFAR-10 show the superiority of our method in achieving high quality and fast sampling. Code will be released soon.
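
A rough sketch of a classifier-based feature distillation loss under stated assumptions: a frozen classifier head maps both feature sets to distributions, the teacher's is temperature-sharpened, and the student matches it via KL divergence (temperature and shapes are illustrative, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def cfd_loss(feat_s, feat_t, classifier, T=0.5):
    logits_s = classifier(feat_s)
    with torch.no_grad():                         # teacher side is frozen
        logits_t = classifier(feat_t)
    target = F.softmax(logits_t / T, dim=-1)      # sharpened (T < 1)
    return F.kl_div(F.log_softmax(logits_s, dim=-1), target,
                    reduction="batchmean")

# Toy usage with a linear head standing in for the classifier.
feat_s = torch.randn(4, 64, requires_grad=True)   # student features
feat_t = torch.randn(4, 64)                       # teacher features
head = torch.nn.Linear(64, 10)
print(cfd_loss(feat_s, feat_t, head))
```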

Multi-Scale Architectures Matter: On the Adversarial Robustness of Flow-based Lossless Compression

Aug 26, 2022
Yi-chong Xia, Bin Chen, Yan Feng, Tian-shuo Ge

As a probabilistic modeling technique, the flow-based model has demonstrated remarkable potential in the field of lossless compression [idf, idf++, lbb, ivpf, iflow]. Compared with other deep generative models (e.g., autoregressive models, VAEs) [bitswap, hilloc, pixelcnn++, pixelsnail] that explicitly model the data distribution, flow-based models perform better due to their excellent probability density estimation and satisfactory inference speed. In flow-based models, the multi-scale architecture provides a shortcut from shallow layers to the output layer, which significantly reduces computational complexity and avoids performance degradation when adding more layers. This is essential for constructing an advanced flow-based learnable bijective mapping. Furthermore, the lightweight model designs required in practical compression tasks suggest that flows with a multi-scale architecture achieve the best trade-off between coding complexity and compression efficiency.
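
A toy illustration of the multi-scale shortcut, where half of the dimensions are factored out to the output at each scale; the linear blocks are stand-ins for invertible coupling layers, so this sketches only the data flow, not a real normalizing flow:

```python
import torch
import torch.nn as nn

def multiscale_forward(x, blocks):
    """x: (batch, dim); blocks: list of per-scale transforms."""
    factored = []
    for block in blocks:
        x = block(x)                   # flow step at this scale
        x, z = x.chunk(2, dim=1)       # split: z exits early (shortcut)
        factored.append(z)
    factored.append(x)
    return torch.cat(factored, dim=1)  # latent code for entropy coding

blocks = [nn.Linear(16, 16), nn.Linear(8, 8)]   # stand-ins for couplings
z = multiscale_forward(torch.randn(4, 16), blocks)
print(z.shape)                                  # torch.Size([4, 16])
```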

sqSGD: Locally Private and Communication Efficient Federated Learning

Jun 22, 2022
Yan Feng, Tao Xiong, Ruofan Wu, LingJuan Lv, Leilei Shi

Federated learning (FL) is a technique for training machine learning models from decentralized data sources. We study FL under local privacy constraints, which provide strong protection against sensitive data disclosure by obfuscating the data before it leaves the client. We identify two major concerns in designing practical privacy-preserving FL algorithms: communication efficiency and high-dimensional compatibility. We then develop a gradient-based learning algorithm called sqSGD (selective quantized stochastic gradient descent) that addresses both. The proposed algorithm is based on a novel privacy-preserving quantization scheme that uses a constant number of bits per dimension per client. We improve the base algorithm in three ways: first, we apply a gradient subsampling strategy that simultaneously offers better training performance and smaller communication costs under a fixed privacy budget; second, we utilize randomized rotation as a preprocessing step to reduce quantization error; third, we adopt an adaptive strategy that shrinks the upper bound on the gradient norm to improve accuracy and stabilize training. Finally, the practicality of the proposed framework is demonstrated on benchmark datasets. Experimental results show that sqSGD successfully learns large models like LeNet and ResNet under local privacy constraints, and that, at a fixed privacy and communication level, it significantly outperforms various baseline algorithms.
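
A hedged sketch of a constant-bits-per-dimension stochastic quantizer of the kind such schemes build on; the clipping bound and bit width are illustrative, and the paper's scheme additionally randomizes the output for local differential privacy:

```python
import numpy as np

def quantize(g, bits=2, C=1.0, rng=np.random.default_rng()):
    """Clip to [-C, C], then stochastically round to 2**bits levels."""
    levels = 2 ** bits - 1
    g = np.clip(g, -C, C)
    scaled = (g + C) / (2 * C) * levels          # map to [0, levels]
    low = np.floor(scaled)
    # Stochastic rounding keeps the quantizer unbiased in expectation.
    q = low + (rng.random(g.shape) < (scaled - low))
    return q / levels * 2 * C - C                # dequantized estimate

g = np.random.randn(8) * 0.3
print(quantize(g))
```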

Improving Knowledge Graph Embedding via Iterative Self-Semantic Knowledge Distillation

Jun 07, 2022
Zhehui Zhou, Defang Chen, Can Wang, Yan Feng, Chun Chen

Knowledge graph embedding (KGE) has been intensively investigated for link prediction by projecting entities and relations into continuous vector spaces. Current popular high-dimensional KGE methods obtain only slight performance gains while requiring enormous computation and memory. In contrast, training low-dimensional models is more efficient and more practical for deployment in real intelligent systems. However, model expressiveness for the semantic information in knowledge graphs (KGs) is highly limited in a low-dimensional parameter space. In this paper, we propose an iterative self-semantic knowledge distillation strategy to improve KGE model expressiveness in the low-dimensional space. A KGE model trained with our strategy plays the teacher and student roles alternately during training: at a given iteration, the model acts as a teacher, providing semantic information for the student; at the next iteration, it acts as a student, incorporating the semantic information transferred from the teacher. We also design a novel semantic extraction block to extract iteration-based semantic information for the model's self-distillation. Iteratively incorporating and accumulating this semantic information enables the low-dimensional model to be more expressive for better link prediction in KGs. Only one model is maintained throughout training, which limits the growth in computational cost and memory requirements. Furthermore, the proposed strategy is model-agnostic and can be seamlessly combined with other KGE models. Consistent and significant performance gains on four standard datasets demonstrate the effectiveness of the proposed self-distillation strategy.
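
A generic sketch of the alternating self-distillation loop, with a linear scorer standing in for a low-dimensional KGE model (interfaces, temperature, and loss weights are illustrative assumptions, not the paper's semantic extraction block):

```python
import torch
import torch.nn.functional as F

def self_distill_step(model, x, target, prev_scores, opt, alpha=0.5, T=2.0):
    """One step where the model's own previous scores act as the teacher."""
    scores = model(x)
    hard = F.cross_entropy(scores, target)
    soft = F.kl_div(F.log_softmax(scores / T, dim=-1),
                    F.softmax(prev_scores / T, dim=-1),
                    reduction="batchmean")
    loss = hard + alpha * soft
    opt.zero_grad(); loss.backward(); opt.step()
    return scores.detach()             # teacher signal for the next iteration

# Toy usage: a linear scorer over entity candidates.
model = torch.nn.Linear(32, 100)       # feature -> entity scores
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 32)
y = torch.randint(0, 100, (64,))
prev = torch.zeros(64, 100)            # first iteration: uniform teacher
for _ in range(3):
    prev = self_distill_step(model, x, y, prev, opt)
```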
