Pengfei Li

Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection

Sep 20, 2023
Shaocong Xu, Pengfei Li, Xinyu Liu, Qianpu Sun, Yang Li, Shihui Guo, Zhen Wang, Bo Jiang, Rui Wang, Kehua Sheng, Bo Zhang, Hao Zhao

LiDAR-based semantic scene understanding is an important module in the modern autonomous driving perception stack. However, identifying Out-Of-Distribution (OOD) points in a LiDAR point cloud is challenging, as point clouds lack the semantically rich features of RGB images. We revisit this problem from the perspective of selective classification, which introduces a selective function into the standard closed-set classification setup. Our solution is built upon the basic idea of abstaining from choosing any known category, but learns a point-wise abstaining penalty with a margin-based loss. Synthesizing outliers to approximate unlimited OOD samples is also critical to this idea, so we propose a strong synthesis pipeline that generates outliers originating from various factors: unrealistic object categories, sampling patterns, and sizes. We demonstrate that learning different abstaining penalties for different types of (synthesized) outliers, in addition to the point-wise penalty, can further improve performance. We benchmark our method on SemanticKITTI and nuScenes and achieve state-of-the-art results. Risk-coverage analysis further reveals intrinsic properties of different methods. Code and models will be publicly available.
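
To make the central idea concrete, below is a minimal sketch of a margin-based abstaining loss with a learned point-wise penalty. The tensor names (`abstain_logit`, `outlier_mask`, `penalty`), the sign conventions, and the exact margin structure are our own illustrative choices, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def abstaining_loss(logits, abstain_logit, labels, outlier_mask, penalty, margin=1.0):
    """Margin-based abstaining loss with a learned point-wise penalty (illustrative sketch).

    logits:        (N, K) per-point scores for the K known classes
    abstain_logit: (N,)   per-point score for the abstain option
    labels:        (N,)   ground-truth class ids, valid for inlier points
    outlier_mask:  (N,)   True for synthesized outlier points
    penalty:       (N,)   learned point-wise abstaining penalty
    """
    inlier_mask = ~outlier_mask

    # Synthesized outliers: the abstain score should beat the best known-class
    # score by at least (margin + penalty), so the penalty widens the gap per point.
    best_known = logits[outlier_mask].max(dim=1).values
    outlier_loss = F.relu(margin + penalty[outlier_mask] + best_known - abstain_logit[outlier_mask])

    # Inlier points: standard classification, plus keep the abstain score below
    # the true-class score by the same learned margin.
    true_score = logits[inlier_mask].gather(1, labels[inlier_mask].unsqueeze(1)).squeeze(1)
    inlier_loss = F.cross_entropy(logits[inlier_mask], labels[inlier_mask], reduction="none") \
        + F.relu(margin + penalty[inlier_mask] + abstain_logit[inlier_mask] - true_score)

    return torch.cat([outlier_loss, inlier_loss]).mean()
```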

* Code is available at https://github.com/Daniellli/PAD.git

3D Implicit Transporter for Temporally Consistent Keypoint Discovery

Sep 10, 2023
Chengliang Zhong, Yuhang Zheng, Yupeng Zheng, Hao Zhao, Li Yi, Xiaodong Mu, Ling Wang, Pengfei Li, Guyue Zhou, Chao Yang, Xinliang Zhang, Jian Zhao

Keypoint-based representation has proven advantageous in various visual and robotic tasks. However, the existing 2D and 3D methods for detecting keypoints mainly rely on geometric consistency to achieve spatial alignment, neglecting temporal consistency. To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information. However, the direct application of the Transporter to 3D point clouds is infeasible due to their structural differences from 2D images. Thus, we propose the first 3D version of the Transporter, which leverages hybrid 3D representation, cross attention, and implicit reconstruction. We apply this new learning system to 3D articulated objects and non-rigid animals (humans and rodents) and show that the learned keypoints are spatio-temporally consistent. Additionally, we propose a closed-loop control strategy that utilizes the learned keypoints for 3D object manipulation and demonstrate its superior performance. Codes are available at https://github.com/zhongcl-thu/3D-Implicit-Transporter.
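
For context, the feature-transport step of the original 2D Transporter (Kulkarni et al., 2019), which this work lifts to 3D with hybrid representations and implicit reconstruction, can be written in a few lines. The 3D point-cloud version is substantially more involved; the snippet below only illustrates the 2D idea.

```python
def transport(feat_src, feat_tgt, heat_src, heat_tgt):
    """Transporter-style feature transport.

    feat_*: (B, C, H, W) feature maps; heat_*: (B, 1, H, W) keypoint heatmaps in [0, 1].
    Features are erased at both frames' keypoint locations and the target-frame
    features are pasted back in at the target keypoints; reconstructing the target
    frame from this composite forces the keypoints to track the moving parts.
    """
    return (1.0 - heat_src) * (1.0 - heat_tgt) * feat_src + heat_tgt * feat_tgt
```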

* ICCV 2023 oral paper

Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering

Jul 11, 2023
Pengfei Li, Gang Liu, Jinlong He, Zixu Zhao, Shenjun Zhong

Medical visual question answering (VQA) is a challenging task that requires answering clinical questions about a given medical image by taking both visual and language information into account. However, due to the small scale of training data for medical VQA, the pre-training and fine-tuning paradigm has become a commonly used solution to improve model generalization. In this paper, we present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text using medical image-caption datasets, by leveraging both unimodal and multimodal contrastive losses, along with masked language modeling and image-text matching, as pre-training objectives. The pre-trained model is then transferred to downstream medical VQA tasks. The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets, with significant accuracy improvements of 2.2%, 14.7%, and 1.7%, respectively. In addition, we conduct a comprehensive analysis to validate the effectiveness of different components of the approach and study different pre-training settings. Our code and models are available at https://github.com/pengfeiliHEU/MUMC.
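
As a concrete reference for one of the listed objectives, below is a standard symmetric image-text contrastive (InfoNCE) loss; the embedding names and temperature value are illustrative and not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text InfoNCE loss over a batch of matched (image, caption) pairs."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                     # (B, B) cosine-similarity matrix
    targets = torch.arange(img.size(0), device=img.device)   # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```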

* Accepted by MICCAI 2023

Towards Environmentally Equitable AI via Geographical Load Balancing

Jun 20, 2023
Pengfei Li, Jianyi Yang, Adam Wierman, Shaolei Ren

Fueled by the soaring popularity of large language and foundation models, the accelerated growth of artificial intelligence (AI) models' enormous environmental footprint has come under increased scrutiny. While many approaches have been proposed to make AI more energy-efficient and environmentally friendly, environmental inequity -- the fact that AI's environmental footprint can be disproportionately higher in certain regions than in others -- has emerged, raising social-ecological justice concerns. This paper takes a first step toward addressing AI's environmental inequity by balancing its regional negative environmental impact. Concretely, we focus on the carbon and water footprints of AI model inference and propose equity-aware geographical load balancing (GLB) to explicitly address AI's environmental impacts on the most disadvantaged regions. We run trace-based simulations by considering a set of 10 geographically-distributed data centers that serve inference requests for a large language AI model. The results demonstrate that existing GLB approaches may amplify environmental inequity while our proposed equity-aware GLB can significantly reduce the regional disparity in terms of carbon and water footprints.
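
To make the idea of equity-aware GLB concrete, here is a toy min-max linear program that splits a fixed inference demand across data centers so the worst-off region's footprint is minimized. The intensity and capacity numbers are invented for illustration, and the paper's formulation additionally accounts for performance and other GLB objectives.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-unit footprint intensities and capacities for a few data centers.
intensity = np.array([1.8, 0.9, 2.5, 1.2])    # regional footprint per unit of load
capacity = np.array([40.0, 60.0, 50.0, 30.0])
demand = 100.0                                 # total inference load to be served

n = len(intensity)
# Variables: x_1..x_n (load routed to each data center) and t (worst regional footprint).
c = np.zeros(n + 1)
c[-1] = 1.0                                    # minimize t
A_ub = np.hstack([np.diag(intensity), -np.ones((n, 1))])   # intensity_i * x_i - t <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])      # sum_i x_i == demand
b_eq = np.array([demand])
bounds = [(0.0, float(cap)) for cap in capacity] + [(0.0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("load split:", res.x[:-1], "worst-region footprint:", res.x[-1])
```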

* Source code: https://github.com/Ren-Research/Environmentally-Equitable-AI 

Learning-Augmented Decentralized Online Convex Optimization in Networks

Jun 16, 2023
Pengfei Li, Jianyi Yang, Adam Wierman, Shaolei Ren

This paper studies decentralized online convex optimization in a networked multi-agent system and proposes a novel algorithm, Learning-Augmented Decentralized Online optimization (LADO), for individual agents to select actions based only on local online information. LADO leverages a baseline policy to safeguard online actions for worst-case robustness guarantees, while staying close to the machine learning (ML) policy for average performance improvement. In stark contrast to existing learning-augmented online algorithms, which focus on centralized settings, LADO achieves strong robustness guarantees in a decentralized setting. We also prove an average cost bound for LADO, revealing the tradeoff between average performance and worst-case robustness and demonstrating the advantage of training the ML policy by explicitly considering the robustness requirement.
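
A heavily simplified sketch of the safeguarding idea is shown below: an agent falls back to the baseline action whenever following the ML policy would break a (1 + lam) cost budget relative to the baseline. The function name and the exact budget test are hypothetical placeholders; LADO's actual rule also handles coupling across agents in the network.

```python
def lado_safeguarded_action(ml_action, base_action, cum_ml_cost, cum_base_cost, cost_fn, lam=0.2):
    """Follow the ML policy only while the cumulative cost stays within a
    (1 + lam) budget relative to the baseline policy; otherwise fall back.
    All names and the budget test are illustrative, not LADO's exact condition."""
    ml_total = cum_ml_cost + cost_fn(ml_action)
    budget = (1.0 + lam) * (cum_base_cost + cost_fn(base_action))
    return ml_action if ml_total <= budget else base_action
```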


Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees

May 31, 2023
Pengfei Li, Jianyi Yang, Shaolei Ren

Many problems, such as online ad display, can be formulated as online bipartite matching. The crucial challenge lies in the nature of sequentially-revealed online item information, based on which we make irreversible matching decisions at each step. While numerous expert online algorithms have been proposed with bounded worst-case competitive ratios, they may not offer satisfactory performance in average cases. On the other hand, reinforcement learning (RL) has been applied to improve the average performance, but it lacks robustness and can perform arbitrarily poorly. In this paper, we propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and worst-case performance. The key novelty of LOMAR is a new online switching operation which, based on a judicious condition to hedge against future uncertainties, decides whether to follow the expert's decision or the RL decision for each online item. We prove that for any $\rho\in[0,1]$, LOMAR is $\rho$-competitive against any given expert online algorithm. To improve the average performance, we train the RL policy by explicitly considering the online switching operation. Finally, we run empirical experiments to demonstrate the advantages of LOMAR compared to existing baselines. Our code is available at: https://github.com/Ren-Research/LOMAR
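
The switching operation can be sketched as a per-item competitiveness check, as below; the function, the slack term, and the acceptance condition are simplified placeholders rather than LOMAR's precise hedging condition.

```python
def lomar_switch(rl_reward, expert_reward, cum_rl, cum_expert, rho=0.8, slack=0.0):
    """Follow the RL decision for the current item only if doing so keeps the
    cumulative reward rho-competitive against the expert, with extra slack
    reserved to hedge against future items; otherwise copy the expert."""
    if cum_rl + rl_reward >= rho * (cum_expert + expert_reward) + slack:
        return "rl"
    return "expert"
```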

* Accepted by ICML 2023 

Robustified Learning for Online Optimization with Memory Costs

May 01, 2023
Pengfei Li, Jianyi Yang, Shaolei Ren

Online optimization with memory costs has many real-world applications, where sequential actions are made without knowing the future input. Nonetheless, the memory cost couples the actions over time, adding substantial challenges. Conventionally, this problem has been approached by various expert-designed online algorithms with the goal of achieving bounded worst-case competitive ratios, but the resulting average performance is often unsatisfactory. On the other hand, emerging machine learning (ML) based optimizers can improve the average performance, but suffer from the lack of worst-case performance robustness. In this paper, we propose a novel expert-robustified learning (ERL) approach, achieving both good average performance and robustness. More concretely, for robustness, ERL introduces a novel projection operator that robustifies ML actions by utilizing an expert online algorithm; for average performance, ERL trains the ML optimizer based on a recurrent architecture by explicitly considering downstream expert robustification. We prove that, for any $\lambda\geq1$, ERL is $\lambda$-competitive against the expert algorithm and $\lambda\cdot C$-competitive against the optimal offline algorithm (where $C$ is the expert's competitive ratio). Additionally, we extend our analysis to a novel setting of multi-step memory costs. Finally, our analysis is supported by empirical experiments for an energy scheduling application.
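
A minimal illustration of the robustification step: project (here, simply clip) the ML action into a slack interval around the expert's action. Deriving the admissible slack from the hitting and memory costs is the paper's contribution and is not reproduced here; the function below is only a sketch with assumed names.

```python
def erl_robustify(ml_action, expert_action, slack):
    """Project (here: clip) the ML optimizer's action into an interval around the
    expert's action. Choosing `slack` so the trajectory stays lambda-competitive,
    accounting for the memory (switching) cost, is the part the paper derives."""
    low, high = expert_action - slack, expert_action + slack
    return min(max(ml_action, low), high)
```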

* Accepted by and to be presented at INFOCOM 2023

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

Apr 25, 2023
Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, Guyue Zhou

In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has recently triggered notable progress for semi-supervised learning. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals generated in the first stage and resulting in spatially sparse training signals. In contrast, we propose the first semi-supervised 3D detection algorithm that works in a single-stage manner and allows spatially dense training signals. A fundamental issue of this new design is the quantization error caused by point-to-voxel discretization, which inevitably leads to misalignment between two transformed views in the voxel domain. To this end, we derive and implement closed-form rules that compensate for this misalignment on-the-fly. Our results are significant, e.g., promoting ScanNet mAP@0.5 from 35.2% to 48.5% using 20% annotation. Code and data will be publicly available.
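
The misalignment being compensated can be reproduced in a few lines: voxelizing after a view transform does not commute with transforming already-voxelized points. The snippet below is an illustration of that residual under a simple floor-based discretization, not the paper's closed-form compensation rules.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Snap (N, 3) points to the centers of their voxels (floor-based discretization)."""
    return (np.floor(points / voxel_size) + 0.5) * voxel_size

def quantization_misalignment(points, transform, voxel_size):
    """Residual between 'transform then voxelize' and 'voxelize then transform'
    for a view transform (an (N, 3) -> (N, 3) callable). This is the per-point
    misalignment that dense self-teaching targets must be corrected for."""
    view_then_voxel = voxelize(transform(points), voxel_size)
    voxel_then_view = transform(voxelize(points, voxel_size))
    return view_then_voxel - voxel_then_view
```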

* Code: https://github.com/AIR-DISCOVER/DQS3D 

Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models

Apr 06, 2023
Pengfei Li, Jianyi Yang, Mohammad A. Islam, Shaolei Ren

The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3 and GPT-4, has been undergoing public scrutiny. Unfortunately, however, the equally important and enormous water footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft's state-of-the-art U.S. data centers can directly consume 700,000 liters of clean freshwater (enough for producing 370 BMW cars or 320 Tesla electric vehicles), and the water consumption would have tripled if training were done in Microsoft's Asian data centers, but such information has been kept secret. This is extremely concerning, as freshwater scarcity has become one of the most pressing challenges shared by all of us amid rapidly growing populations, depleting water resources, and aging water infrastructure. To respond to global water challenges, AI models can, and also should, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate the fine-grained water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models' runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI.
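
A back-of-the-envelope estimate in the spirit of this methodology combines on-site water (server energy times the data center's water usage effectiveness) with off-site water (facility energy times the grid's electricity water intensity). The default numbers below are placeholders, not measurements from the paper.

```python
def water_footprint_liters(server_energy_kwh, wue_onsite=0.55, pue=1.2, ewif_grid=3.1):
    """Rough estimate: on-site (scope-1) water scales with server energy via the
    data center's Water Usage Effectiveness (WUE, L/kWh); off-site (scope-2) water
    scales with total facility energy via the grid's electricity water intensity
    factor (EWIF, L/kWh). Default values are placeholders, not measured figures."""
    onsite = server_energy_kwh * wue_onsite
    offsite = server_energy_kwh * pue * ewif_grid
    return onsite + offsite

# e.g., a 10 kWh inference workload:
print(water_footprint_liters(10.0))
```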

* Source codes available at: https://github.com/Ren-Research/Making-AI-Less-Thirsty 

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Mar 20, 2023
Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, Weichao Wang, Pengfei Li, Xiaoda Zhang, Alexander Podolskiy, Grigory Arshinov, Andrey Bout, Irina Piontkovskaya, Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao

The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trains a trillion-parameter language model on a cluster of Ascend 910 AI processors using the MindSpore framework, and present the resulting 1.085T-parameter language model, PanGu-{\Sigma}. With parameters inherited from PanGu-{\alpha}, we extend the dense Transformer model to a sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens using Expert Computation and Storage Separation (ECSS). This results in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-{\Sigma} provides state-of-the-art performance in zero-shot learning on various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned on application data for open-domain dialogue, question answering, machine translation, and code generation.
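
To convey the flavor of routing without a learned gate, the sketch below assigns each token to an expert by a fixed hash of an integer routing key. The class name and routing key are hypothetical; the paper's Random Routed Experts mechanism involves additional design (e.g., how experts are organized and placed across devices) that is not captured here.

```python
import torch
import torch.nn as nn

class RandomRoutedExperts(nn.Module):
    """Sparse feed-forward layer where each token is routed to one expert by a
    fixed hash of an integer routing key, so no gating network is learned.
    Illustrative only; not PanGu-Sigma's implementation."""

    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x, routing_keys):
        # x: (num_tokens, d_model); routing_keys: (num_tokens,) integer ids
        expert_idx = routing_keys % len(self.experts)   # fixed pseudo-random assignment
        out = torch.empty_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```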
