Alert button
Picture for Rui Wang

Rui Wang

Alert button

Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection

Sep 20, 2023
Shaocong Xu, Pengfei Li, Xinyu Liu, Qianpu Sun, Yang Li, Shihui Guo, Zhen Wang, Bo Jiang, Rui Wang, Kehua Sheng, Bo Zhang, Hao Zhao

Figure 1 for Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection
Figure 2 for Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection
Figure 3 for Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection
Figure 4 for Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection

LiDAR-based semantic scene understanding is an important module in the modern autonomous driving perception stack. However, identifying Out-Of-Distribution (OOD) points in a LiDAR point cloud is challenging as point clouds lack semantically rich features when compared with RGB images. We revisit this problem from the perspective of selective classification, which introduces a selective function into the standard closed-set classification setup. Our solution is built upon the basic idea of abstaining from choosing any known categories but learns a point-wise abstaining penalty with a marginbased loss. Synthesizing outliers to approximate unlimited OOD samples is also critical to this idea, so we propose a strong synthesis pipeline that generates outliers originated from various factors: unrealistic object categories, sampling patterns and sizes. We demonstrate that learning different abstaining penalties, apart from point-wise penalty, for different types of (synthesized) outliers can further improve the performance. We benchmark our method on SemanticKITTI and nuScenes and achieve state-of-the-art results. Risk-coverage analysis further reveals intrinsic properties of different methods. Codes and models will be publicly available.

* codes is available at https://github.com/Daniellli/PAD.git 
Viaarxiv icon

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

Sep 19, 2023
Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji

Diffusion models are emerging expressive generative models, in which a large number of time steps (inference steps) are required for a single image generation. To accelerate such tedious process, reducing steps uniformly is considered as an undisputed principle of diffusion models. We consider that such a uniform assumption is not the optimal solution in practice; i.e., we can find different optimal time steps for different models. Therefore, we propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. Then, a two stage evolutionary algorithm is introduced to find the optimal solution in the designed search space. To further accelerate the search process, we employ FID score between generated and real samples to estimate the performance of the sampled examples. As a result, the proposed method is (i).training-free, obtaining the optimal time steps and model architecture without any training process; (ii). orthogonal to most advanced diffusion samplers and can be integrated to gain better sample quality. (iii). generalized, where the searched time steps and architectures can be directly applied on different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance by using only a few time steps, e.g. 17.86 FID score on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM.

Viaarxiv icon

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Sep 17, 2023
Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan

Figure 1 for UGC: Unified GAN Compression for Efficient Image-to-Image Translation
Figure 2 for UGC: Unified GAN Compression for Efficient Image-to-Image Translation
Figure 3 for UGC: Unified GAN Compression for Efficient Image-to-Image Translation
Figure 4 for UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii)data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.

Viaarxiv icon

A Benchmark for Text Expansion: Datasets, Metrics, and Baselines

Sep 17, 2023
Yi Chen, Haiyun Jiang, Wei Bi, Rui Wang, Longyue Wang, Shuming Shi, Ruifeng Xu

This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations of the plain text to concretize or vivify human writings. Different from existing insertion-based writing assistance tasks, TE requires the model to be more flexible in both locating and generation, and also more cautious in keeping basic semantics. We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references for both English and Chinese. To facilitate automatic evaluation, we design various metrics from multiple perspectives. In particular, we propose Info-Gain to effectively measure the informativeness of expansions, which is an important quality dimension in TE. On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate the superiority over the Text2Text baselines, especially in expansion informativeness. Experiments verify the feasibility of the TE task and point out potential directions for future research toward better automatic text expansion.

Viaarxiv icon

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Sep 15, 2023
Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and generation tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation by up to 25% and 14% respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.

* Work in progress 
Viaarxiv icon

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Sep 07, 2023
Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma

Figure 1 for DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection
Figure 2 for DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection
Figure 3 for DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection
Figure 4 for DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we presentDiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.

* Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine 
Viaarxiv icon

Exploring the Robustness of Human Parsers Towards Common Corruptions

Sep 07, 2023
Sanyi Zhang, Xiaochun Cao, Rui Wang, Guo-Jun Qi, Jie Zhou

Figure 1 for Exploring the Robustness of Human Parsers Towards Common Corruptions
Figure 2 for Exploring the Robustness of Human Parsers Towards Common Corruptions
Figure 3 for Exploring the Robustness of Human Parsers Towards Common Corruptions
Figure 4 for Exploring the Robustness of Human Parsers Towards Common Corruptions

Human parsing aims to segment each pixel of the human image with fine-grained semantic categories. However, current human parsers trained with clean data are easily confused by numerous image corruptions such as blur and noise. To improve the robustness of human parsers, in this paper, we construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models. Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions. Specifically, two types of data augmentations from different views, i.e., image-aware augmentation and model-aware image-to-image transformation, are integrated in a sequential manner for adapting to unforeseen image corruptions. The image-aware augmentation can enrich the high diversity of training images with the help of common image operations. The model-aware augmentation strategy that improves the diversity of input data by considering the model's randomness. The proposed method is model-agnostic, and it can plug and play into arbitrary state-of-the-art human parsing frameworks. The experimental results show that the proposed method demonstrates good universality which can improve the robustness of the human parsing models and even the semantic segmentation models when facing various image common corruptions. Meanwhile, it can still obtain approximate performance on clean data.

* Accepted by IEEE Transactions on Image Processing (TIP) 
Viaarxiv icon

What are Public Concerns about ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You

Sep 04, 2023
Rui Wang, Xing Liu, Yanan Wang, Haiping Huang

Figure 1 for What are Public Concerns about ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You
Figure 2 for What are Public Concerns about ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You
Figure 3 for What are Public Concerns about ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You
Figure 4 for What are Public Concerns about ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You

The recently released artificial intelligence conversational agent, ChatGPT, has gained significant attention in academia and real life. A multitude of early ChatGPT users eagerly explore its capabilities and share their opinions on it via social media. Both user queries and social media posts express public concerns regarding this advanced dialogue system. To mine public concerns about ChatGPT, a novel Self-Supervised neural Topic Model (SSTM), which formalizes topic modeling as a representation learning procedure, is proposed in this paper. Extensive experiments have been conducted on Twitter posts about ChatGPT and queries asked by ChatGPT users. And experimental results demonstrate that the proposed approach could extract higher quality public concerns with improved interpretability and diversity, surpassing the performance of state-of-the-art approaches.

Viaarxiv icon

M2HGCL: Multi-Scale Meta-Path Integrated Heterogeneous Graph Contrastive Learning

Sep 03, 2023
Yuanyuan Guo, Yu Xia, Rui Wang, Rongcheng Duan, Lu Li, Jiangmeng Li

Inspired by the successful application of contrastive learning on graphs, researchers attempt to impose graph contrastive learning approaches on heterogeneous information networks. Orthogonal to homogeneous graphs, the types of nodes and edges in heterogeneous graphs are diverse so that specialized graph contrastive learning methods are required. Most existing methods for heterogeneous graph contrastive learning are implemented by transforming heterogeneous graphs into homogeneous graphs, which may lead to ramifications that the valuable information carried by non-target nodes is undermined thereby exacerbating the performance of contrastive learning models. Additionally, current heterogeneous graph contrastive learning methods are mainly based on initial meta-paths given by the dataset, yet according to our deep-going exploration, we derive empirical conclusions: only initial meta-paths cannot contain sufficiently discriminative information; and various types of meta-paths can effectively promote the performance of heterogeneous graph contrastive learning methods. To this end, we propose a new multi-scale meta-path integrated heterogeneous graph contrastive learning (M2HGCL) model, which discards the conventional heterogeneity-homogeneity transformation and performs the graph contrastive learning in a joint manner. Specifically, we expand the meta-paths and jointly aggregate the direct neighbor information, the initial meta-path neighbor information and the expanded meta-path neighbor information to sufficiently capture discriminative information. A specific positive sampling strategy is further imposed to remedy the intrinsic deficiency of contrastive learning, i.e., the hard negative sample sampling issue. Through extensive experiments on three real-world datasets, we demonstrate that M2HGCL outperforms the current state-of-the-art baseline models.

* Accepted to the conference of ADMA2023 as an Oral presentation 
Viaarxiv icon

FTA: Stealthy and Robust Backdoor Attack with Flexible Trigger on Federated Learning

Aug 31, 2023
Yanqi Qiao, Congwen Chen, Rui Wang, Kaitai Liang

Current backdoor attacks against federated learning (FL) strongly rely on universal triggers or semantic patterns, which can be easily detected and filtered by certain defense mechanisms such as norm clipping, comparing parameter divergences among local updates. In this work, we propose a new stealthy and robust backdoor attack with flexible triggers against FL defenses. To achieve this, we build a generative trigger function that can learn to manipulate the benign samples with an imperceptible flexible trigger pattern and simultaneously make the trigger pattern include the most significant hidden features of the attacker-chosen label. Moreover, our trigger generator can keep learning and adapt across different rounds, allowing it to adjust to changes in the global model. By filling the distinguishable difference (the mapping between the trigger pattern and target label), we make our attack naturally stealthy. Extensive experiments on real-world datasets verify the effectiveness and stealthiness of our attack compared to prior attacks on decentralized learning framework with eight well-studied defenses.

Viaarxiv icon