Bo Jiang

Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection

Sep 20, 2023
Shaocong Xu, Pengfei Li, Xinyu Liu, Qianpu Sun, Yang Li, Shihui Guo, Zhen Wang, Bo Jiang, Rui Wang, Kehua Sheng, Bo Zhang, Hao Zhao

LiDAR-based semantic scene understanding is an important module in the modern autonomous driving perception stack. However, identifying Out-Of-Distribution (OOD) points in a LiDAR point cloud is challenging as point clouds lack semantically rich features when compared with RGB images. We revisit this problem from the perspective of selective classification, which introduces a selective function into the standard closed-set classification setup. Our solution is built upon the basic idea of abstaining from choosing any known categories but learns a point-wise abstaining penalty with a margin-based loss. Synthesizing outliers to approximate unlimited OOD samples is also critical to this idea, so we propose a strong synthesis pipeline that generates outliers originating from various factors: unrealistic object categories, sampling patterns and sizes. We demonstrate that learning different abstaining penalties, apart from the point-wise penalty, for different types of (synthesized) outliers can further improve the performance. We benchmark our method on SemanticKITTI and nuScenes and achieve state-of-the-art results. Risk-coverage analysis further reveals intrinsic properties of different methods. Code and models will be publicly available.

* Code is available at https://github.com/Daniellli/PAD.git 
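
The risk-coverage analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual selective-classification definition: predictions are accepted in order of decreasing confidence, coverage is the fraction accepted, and risk is the error rate among the accepted ones. The function name and the toy confidence values are hypothetical and are not taken from the paper or its repository.

```python
def risk_coverage_curve(confidences, correct):
    """Return (coverage, risk) pairs, accepting the most confident predictions first.

    confidences: per-sample confidence scores (higher means more certain).
    correct: per-sample booleans, True if the prediction was right.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    curve = []
    errors = 0
    for accepted, idx in enumerate(order, start=1):
        errors += 0 if correct[idx] else 1
        coverage = accepted / len(order)   # fraction of samples not abstained on
        risk = errors / accepted           # error rate among accepted samples
        curve.append((coverage, risk))
    return curve


# Hypothetical toy data: four predictions with their confidences.
conf = [0.9, 0.8, 0.6, 0.4]
correct = [True, True, False, True]
curve = risk_coverage_curve(conf, correct)
for coverage, risk in curve:
    print(f"coverage={coverage:.2f} risk={risk:.3f}")
```

A method that abstains well keeps the risk low at small coverage and lets it rise only as less confident predictions are admitted.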

Joint Beamforming and Compression Design for Per-Antenna Power Constrained Cooperative Cellular Networks

Sep 11, 2023
Xilai Fan, Ya-Feng Liu, Bo Jiang

In the cooperative cellular network, relay-like base stations are connected to the central processor (CP) via rate-limited fronthaul links and the joint processing is performed at the CP, which thus can effectively mitigate the multiuser interference. In this paper, we consider the joint beamforming and compression problem with per-antenna power constraints in the cooperative cellular network. We first establish the equivalence between the considered problem and its semidefinite relaxation (SDR). Then we further derive the partial Lagrangian dual of the SDR problem and show that the objective function of the obtained dual problem is differentiable. Based on the differentiability, we propose two efficient projected gradient ascent algorithms for solving the dual problem, which are projected exact gradient ascent (PEGA) and projected inexact gradient ascent (PIGA). While PEGA is guaranteed to find the global solution of the dual problem (and hence the global solution of the original problem), PIGA is more computationally efficient due to the lower complexity in inexactly computing the gradient. Global optimality and high efficiency of the proposed algorithms are demonstrated via numerical experiments.

* 5 pages, 2 figures, submitted for possible publication 
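
As a generic illustration of the projected gradient ascent idea named in the abstract, the sketch below maximizes a toy concave function over the nonnegative orthant. It is not the paper's PEGA or PIGA algorithm: the objective, the nonnegativity constraint, the step size, and all function names are assumptions chosen only to show the update "take a gradient step, then project back onto the feasible set".

```python
def projected_gradient_ascent(grad, project, x0, step, iterations):
    """Repeat: ascend along the gradient, then project onto the feasible set."""
    x = list(x0)
    for _ in range(iterations):
        g = grad(x)
        x = project([xi + step * gi for xi, gi in zip(x, g)])
    return x


# Toy concave objective (assumed for illustration): g(x) = -sum((x_i - t_i)^2).
target = [1.0, -2.0]


def toy_gradient(x):
    return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]


def project_nonnegative(x):
    # Euclidean projection onto {x : x >= 0}.
    return [max(0.0, xi) for xi in x]


solution = projected_gradient_ascent(
    toy_gradient, project_nonnegative, x0=[0.0, 0.0], step=0.1, iterations=200
)
print(solution)
```

In this toy problem the unconstrained maximizer has a negative second coordinate, so the projection clamps it to zero while the first coordinate converges to its target.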

A study on the impact of pre-trained model on Just-In-Time defect prediction

Sep 05, 2023
Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W. K. Chan, Bo Jiang

Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that each model based on a different backbone shows improvements, and that when the backbones' pre-trained models are similar, the training resources consumed are much closer. We also observe that Commit code plays a significant role in defect detection, and different pre-trained models demonstrate better defect detection ability with a balanced dataset under few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets respectively under 2000 training samples. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.

Cerberus: A Deep Learning Hybrid Model for Lithium-Ion Battery Aging Estimation and Prediction Based on Relaxation Voltage Curves

Aug 15, 2023
Yue Xiang, Bo Jiang, Haifeng Dai

The degradation process of lithium-ion batteries is intricately linked to their entire lifecycle as power sources and energy storage devices, encompassing aspects such as performance delivery and cycling utilization. Consequently, the accurate and expedient estimation or prediction of the aging state of lithium-ion batteries has garnered extensive attention. Nonetheless, prevailing research predominantly concentrates on either aging estimation or prediction, neglecting the dynamic fusion of both facets. This paper proposes a hybrid model for capacity aging estimation and prediction based on deep learning, wherein salient features highly pertinent to aging are extracted from charge and discharge relaxation processes. By amalgamating historical capacity decay data, the model dynamically furnishes estimations of the present capacity and forecasts of future capacity for lithium-ion batteries. Our approach is validated against a novel dataset involving charge and discharge cycles at varying rates. Specifically, under a charging condition of 0.25C, a mean absolute percentage error (MAPE) of 0.29% is achieved. This outcome underscores the model's adeptness in harnessing relaxation processes commonly encountered in the real world and synergizing with historical capacity records within battery management systems (BMS), thereby affording estimations and prognostications of capacity decline with heightened precision.

* 3 figures, 1 table, 9 pages 
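
For reference, the mean absolute percentage error (MAPE) quoted in the abstract is conventionally computed as the average of |true - predicted| / |true|, expressed as a percentage. The sketch below assumes that standard definition; the capacity values are hypothetical and are not data from the paper.

```python
def mape(true_values, predicted_values):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(true_values, predicted_values))
    return 100.0 * total / len(true_values)


# Hypothetical capacities in ampere-hours (illustrative only).
measured = [2.0, 4.0]
estimated = [1.9, 4.2]
error_percent = mape(measured, estimated)
print(f"MAPE = {error_percent:.2f}%")
```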

MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction

Aug 10, 2023
Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

High-definition (HD) maps provide abundant and precise static environmental information about the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving systems. In this paper, we present Map TRansformer, an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision. The proposed method copes well with various map elements of arbitrary shapes. It runs at real-time inference speed and achieves state-of-the-art performance on both the nuScenes and Argoverse2 datasets. Abundant qualitative results show stable and robust map construction quality in complex and varied driving scenes. Code and more demos are available at https://github.com/hustvl/MapTR to facilitate further studies and applications.

* Code available at https://github.com/hustvl/MapTR . arXiv admin note: substantial text overlap with arXiv:2208.14437 
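
The phrase "a point set with a group of equivalent permutations" can be made concrete with a small sketch. It assumes one plausible reading of the abstract: an open polyline has two equivalent point orderings (forward and reversed), while a closed polygon with N points has 2N (every starting point, in both directions). The function name and the sample coordinates are hypothetical and are not taken from the MapTR code base.

```python
def equivalent_permutations(points, closed):
    """Enumerate point orderings that describe the same shape.

    points: list of (x, y) tuples sampled along a map element.
    closed: True for a polygon (e.g. a crossing), False for a polyline.
    """
    reversed_points = points[::-1]
    if not closed:
        # A polyline can be traversed from either end.
        return [list(points), list(reversed_points)]
    permutations = []
    for sequence in (points, reversed_points):
        for shift in range(len(sequence)):
            # A polygon can start at any vertex, in either direction.
            permutations.append(list(sequence[shift:]) + list(sequence[:shift]))
    return permutations


lane_divider = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
crossing = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(len(equivalent_permutations(lane_divider, closed=False)))
print(len(equivalent_permutations(crossing, closed=True)))
```

Under this reading, a matching loss would treat every ordering in the returned list as an equally valid ground truth rather than penalizing a prediction for picking a different start point or direction.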

SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition

Aug 08, 2023
Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian

Event camera-based pattern recognition is a newly arising research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited due to the following two issues. Firstly, they adopt spatially sparse event streams for recognition only, which may fail to capture the color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition. However, few of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., a memory support Transformer network for RGB frame encoding, a spiking neural network for raw event stream encoding, a multi-modal bottleneck fusion module for RGB-Event feature aggregation, and a prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset which contains 114 classes and 27102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validate the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and the source code of this work will be released at https://github.com/Event-AHU/SSTFormer.

* In Peer Review 

Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

Jul 03, 2023
Bo Jiang, Tianchi Zhao, Ming Li

This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with a local differential privacy (LDP) guarantee. Given a fixed privacy budget $\epsilon$, we consider three privatizing mechanisms under the Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive a stochastic regret bound for the Thompson Sampling algorithm. Finally, we run simulations to illustrate the convergence of different mechanisms under different privacy budgets.

* Accepted by ICML 22 workshop 
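
The overall setting can be sketched as follows: each Bernoulli reward is privatized on the user side before the learner sees it, and Thompson Sampling updates a Beta posterior from the privatized bits only. The sketch below uses classic binary randomized response as a stand-in privatizer; it is not one of the paper's linear, quadratic, or exponential mechanisms, whose definitions are not given in the abstract. All function names, arm means, and parameter values are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Keep the bit with probability e^eps / (1 + e^eps), otherwise flip it.

    This satisfies epsilon-LDP for a single binary value (stand-in mechanism).
    """
    keep_probability = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_probability else 1 - bit


def thompson_sampling_ldp(true_means, epsilon, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling that only observes privatized rewards."""
    rng = random.Random(seed)
    arms = range(len(true_means))
    alpha = [1.0] * len(true_means)  # Beta prior: pseudo-count of ones
    beta = [1.0] * len(true_means)   # Beta prior: pseudo-count of zeros
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the best sample.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in arms]
        arm = max(arms, key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        private_reward = randomized_response(reward, epsilon, rng)
        alpha[arm] += private_reward
        beta[arm] += 1 - private_reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling_ldp([0.2, 0.8], epsilon=1.0, horizon=200)
print(pulls)
```

With randomized response, the mean of the privatized bit is an increasing affine function of the true mean whenever epsilon is positive, so the ordering of the arms is preserved; a smaller privacy budget shrinks the gap between arms and should slow down learning.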

Point-Voxel Absorbing Graph Representation Learning for Event Stream based Recognition

Jun 08, 2023
Bo Jiang, Chengguo Yuan, Xiao Wang, Zhimin Bao, Lin Zhu, Bin Luo

Considering the balance of performance and efficiency, sampled point and voxel methods are usually employed to down-sample dense events into sparse ones. After that, one popular way is to leverage a graph model which treats the sparse points/voxels as nodes and adopts graph neural networks (GNNs) to learn the representation for event data. Although good performance can be obtained, their results are still limited mainly due to two issues. (1) Existing event GNNs generally adopt an additional max (or mean) pooling layer to summarize all node embeddings into a single graph-level representation for the whole event data representation. However, this approach fails to capture the importance of graph nodes and also fails to be fully aware of the node representations. (2) Existing methods generally employ either a sparse point or a voxel graph representation model, and thus do not consider the complementarity between these two types of representation models. To address these issues, in this paper, we propose a novel dual point-voxel absorbing graph representation learning for event stream data representation. To be specific, given the input event stream, we first transform it into the sparse event cloud and voxel grids and build dual absorbing graph models for them respectively. Then, we design a novel absorbing graph convolutional network (AGCN) for our dual absorbing graph representation and learning. The key aspect of the proposed AGCN is its ability to effectively capture the importance of nodes and thus be fully aware of node representations in summarizing all node representations through the introduced absorbing nodes. Finally, the event representations of dual learning branches are concatenated together to extract the complementary information of two cues. The output is then fed into a linear layer for event data classification.

AMatFormer: Efficient Feature Matching via Anchor Matching Transformer

May 30, 2023
Bo Jiang, Shuxian Luo, Xiao Wang, Chuanfu Li, Jin Tang

Learning-based feature matching methods have been commonly studied in recent years. The core issue for learning feature matching is how to learn (1) discriminative representations for feature points (or regions) within each image and (2) consensus representations for feature points across images. Recently, self- and cross-attention models have been exploited to address this issue. However, in many scenes, features are large-scale, redundant and contaminated by outliers. Previous self-/cross-attention models generally conduct message passing on all primal features, which thus leads to redundant learning and high computational cost. To mitigate these limitations, inspired by recent seed matching methods, in this paper, we propose a novel efficient Anchor Matching Transformer (AMatFormer) for the feature matching problem. AMatFormer has two main aspects: First, it mainly conducts self-/cross-attention on some anchor features and leverages these anchor features as a message bottleneck to learn the representations for all primal features. Thus, it can be implemented efficiently and compactly. Second, AMatFormer adopts a shared FFN module to further embed the features of two images into the common domain and thus learn the consensus feature representations for the matching problem. Experiments on several benchmarks demonstrate the effectiveness and efficiency of the proposed AMatFormer matching approach.

* Accepted by IEEE Transactions on Multimedia (TMM) 2023 

Prediction with Incomplete Data under Agnostic Mask Distribution Shift

May 18, 2023
Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang, Chenghu Zhou

Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., the mask distribution, may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask, there is an invariant optimal predictor. To avoid the exponential explosion when learning them separately, we approximate the optimal predictors jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on the intra-mask correlation and on the correlation between features and mask. We perform decorrelation to minimize this effect. Combining the techniques above, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.
