Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiangyang Xue

Fudan University

Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

May 06, 2023

Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue

Figure 1 for Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

Figure 2 for Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

Figure 3 for Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

Figure 4 for Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

Abstract:Federated learning is a privacy-preserving collaborative learning approach. Recently, some studies have proposed the semi-supervised federated learning setting to handle the commonly seen real-world scenarios with labeled data on the server and unlabeled data on the clients. However, existing methods still face challenges such as high communication costs, training pressure on the client devices, and distribution differences among the server and the clients. In this paper, we introduce the powerful pre-trained diffusion models into federated learning and propose FedDISC, a Federated Diffusion Inspired Semi-supervised Co-training method, to address these challenges. Specifically, we first extract prototypes from the labeled data on the server and send them to the clients. The clients then use these prototypes to predict pseudo-labels of the local data, and compute the cluster centroids and domain-specific features to represent their personalized distributions. After adding noise, the clients send these features and their corresponding pseudo-labels back to the server, which uses a pre-trained diffusion model to conditionally generate pseudo-samples complying with the client distributions and train an aggregated model on them. Our method does not require local training and only involves forward inference on the clients. Our extensive experiments on DomainNet, Openimage, and NICO++ demonstrate that the proposed FedDISC method effectively addresses the one-shot semi-supervised problem on Non-IID clients and outperforms the compared SOTA methods. We also demonstrate through visualization that it is of neglectable possibility for FedDISC to leak privacy-sensitive information of the clients.

Via

Access Paper or Ask Questions

Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

Apr 28, 2023

Shoumeng Qiu, Feng Jiang, Haiqiang Zhang, Xiangyang Xue, Jian Pu

Figure 1 for Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

Figure 2 for Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

Figure 3 for Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

Figure 4 for Multi-to-Single Knowledge Distillation for Point Cloud Semantic Segmentation

Abstract:3D point cloud semantic segmentation is one of the fundamental tasks for environmental understanding. Although significant progress has been made in recent years, the performance of classes with few examples or few points is still far from satisfactory. In this paper, we propose a novel multi-to-single knowledge distillation framework for the 3D point cloud semantic segmentation task to boost the performance of those hard classes. Instead of fusing all the points of multi-scans directly, only the instances that belong to the previously defined hard classes are fused. To effectively and sufficiently distill valuable knowledge from multi-scans, we leverage a multilevel distillation framework, i.e., feature representation distillation, logit distillation, and affinity distillation. We further develop a novel instance-aware affinity distillation algorithm for capturing high-level structural knowledge to enhance the distillation efficacy for hard classes. Finally, we conduct experiments on the SemanticKITTI dataset, and the results on both the validation and test sets demonstrate that our method yields substantial improvements compared with the baseline method. The code is available at \Url{https://github.com/skyshoumeng/M2SKD}.

* ICRA 2023

Via

Access Paper or Ask Questions

Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions

Apr 24, 2023

Yun He, Danhang Tang, Yinda Zhang, Xiangyang Xue, Yanwei Fu

Abstract:Most existing point cloud upsampling methods have roughly three steps: feature extraction, feature expansion and 3D coordinate prediction. However,they usually suffer from two critical issues: (1)fixed upsampling rate after one-time training, since the feature expansion unit is customized for each upsampling rate; (2)outliers or shrinkage artifact caused by the difficulty of precisely predicting 3D coordinates or residuals of upsampled points. To adress them, we propose a new framework for accurate point cloud upsampling that supports arbitrary upsampling rates. Our method first interpolates the low-res point cloud according to a given upsampling rate. And then refine the positions of the interpolated points with an iterative optimization process, guided by a trained model estimating the difference between the current point cloud and the high-res target. Extensive quantitative and qualitative results on benchmarks and downstream tasks demonstrate that our method achieves the state-of-the-art accuracy and efficiency.

* Accepted by CVPR 2023. Code is avaliable at https://github.com/yunhe20/Grad-PU

Via

Access Paper or Ask Questions

GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

Mar 29, 2023

Xinxin Hu, Haotian Chen, Junjie Zhang, Hongchang Chen, Shuxin Liu, Xing Li, Yahui Wang, Xiangyang Xue

Figure 1 for GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

Figure 2 for GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

Figure 3 for GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

Figure 4 for GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection

Abstract:Along with the rapid evolution of mobile communication technologies, such as 5G, there has been a drastically increase in telecom fraud, which significantly dissipates individual fortune and social wealth. In recent years, graph mining techniques are gradually becoming a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, brings severe challenges to graph data mining. This is a new and challenging problem, but little previous work has been noticed. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming the state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model is also helpful for solving the widespread over-smoothing problem in GNNs. The GAT-COBO code and datasets are available at https://github.com/xxhu94/GAT-COBO.

Via

Access Paper or Ask Questions

Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

Mar 28, 2023

Xinxin Hu, Haotian Chen, Hongchang Chen, Shuxin Liu, Xing Li, Shibo Zhang, Yahui Wang, Xiangyang Xue

Figure 1 for Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

Figure 2 for Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

Figure 3 for Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

Figure 4 for Cost Sensitive GNN-based Imbalanced Learning for Mobile Social Network Fraud Detection

Abstract:With the rapid development of mobile networks, the people's social contacts have been considerably facilitated. However, the rise of mobile social network fraud upon those networks, has caused a great deal of distress, in case of depleting personal and social wealth, then potentially doing significant economic harm. To detect fraudulent users, call detail record (CDR) data, which portrays the social behavior of users in mobile networks, has been widely utilized. But the imbalance problem in the aforementioned data, which could severely hinder the effectiveness of fraud detectors based on graph neural networks(GNN), has hardly been addressed in previous work. In this paper, we are going to present a novel Cost-Sensitive Graph Neural Network (CSGNN) by creatively combining cost-sensitive learning and graph neural networks. We conduct extensive experiments on two open-source realworld mobile network fraud datasets. The results show that CSGNN can effectively solve the graph imbalance problem and then achieve better detection performance than the state-of-the-art algorithms. We believe that our research can be applied to solve the graph imbalance problems in other fields. The CSGNN code and datasets are publicly available at https://github.com/xxhu94/CSGNN.

Via

Access Paper or Ask Questions

Semantic Neural Decoding via Cross-Modal Generation

Mar 26, 2023

Xuelin Qian, Yikai Wang, Yanwei Fu, Xiangyang Xue, Jianfeng Feng

Abstract:Semantic neural decoding aims to elucidate the cognitive processes of the human brain by reconstructing observed images from brain recordings. Although recent works have utilized deep generative models to generate images conditioned on fMRI signals, achieving high-quality generation with consistent semantics has proven to be a formidable challenge. To address this issue, we propose an end-to-end framework, SemanSig, which directly encodes fMRI signals and extracts semantic information. SemanSig leverages a deep generative model to decode the semantic information into high-quality images. To enhance the effectiveness of our framework, we use the ImageNet class prototype space as the internal representation space of fMRI signals, thereby reducing signal redundancy and learning difficulty. Consequently, this forms a semantic-rich and visually-friendly internal representation for generative models to decode. Notably, SemanSig does not require pre-training on a large fMRI dataset, and performs remarkably well when trained from scratch, even when the fMRI signal is limited. Our experimental results validate the effectiveness of SemanSig in achieving high-quality image generation with consistent semantics.

Via

Access Paper or Ask Questions

Learning Versatile 3D Shape Generation with Improved AR Models

Mar 26, 2023

Simian Luo, Xuelin Qian, Yanwei Fu, Yinda Zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Xiangyang Xue

Abstract:Auto-Regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space. While this approach has been extended to the 3D domain for powerful shape generation, it still has two limitations: expensive computations on volumetric grids and ambiguous auto-regressive order along grid dimensions. To overcome these limitations, we propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids. Our approach not only reduces computational costs but also preserves essential geometric details by learning the joint distribution in a more tractable order. Moreover, thanks to the simplicity of our model architecture, we can naturally extend it from unconditional to conditional generation by concatenating various conditioning inputs, such as point clouds, categories, images, and texts. Extensive experiments demonstrate that ImAM can synthesize diverse and faithful shapes of multiple categories, achieving state-of-the-art performance.

* 26 pages, 26 figures

Via

Access Paper or Ask Questions

Weakly-Supervised Text Instance Segmentation

Mar 23, 2023

Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

Abstract:Text segmentation is a challenging vision task with many downstream applications. Current text segmentation methods require pixel-level annotations, which are expensive in the cost of human labor and limited in application scenarios. In this paper, we take the first attempt to perform weakly-supervised text instance segmentation by bridging text recognition and text segmentation. The insight is that text recognition methods provide precise attention position of each text instance, and the attention location can feed to both a text adaptive refinement head (TAR) and a text segmentation head. Specifically, the proposed TAR generates pseudo labels by performing two-stage iterative refinement operations on the attention location to fit the accurate boundaries of the corresponding text instance. Meanwhile, the text segmentation head takes the rough attention location to predict segmentation masks which are supervised by the aforementioned pseudo labels. In addition, we design a mask-augmented contrastive learning by treating our segmentation result as an augmented version of the input text image, thus improving the visual representation and further enhancing the performance of both recognition and segmentation. The experimental results demonstrate that the proposed method significantly outperforms weakly-supervised instance segmentation methods on ICDAR13-FST (18.95$\%$ improvement) and TextSeg (17.80$\%$ improvement) benchmarks.

Via

Access Paper or Ask Questions

Rethinking the Multi-view Stereo from the Perspective of Rendering-based Augmentation

Mar 11, 2023

Chenjie Cao, Xinlin Ren, Xiangyang Xue, Yanwei Fu

Abstract:GigaMVS presents several challenges to existing Multi-View Stereo (MVS) algorithms for its large scale, complex occlusions, and gigapixel images. To address these problems, we first apply one of the state-of-the-art learning-based MVS methods, --MVSFormer, to overcome intractable scenarios such as textureless and reflections regions suffered by traditional PatchMatch methods, but it fails in a few large scenes' reconstructions. Moreover, traditional PatchMatch algorithms such as ACMMP, OpenMVS, and RealityCapture are leveraged to further improve the completeness in large scenes. Furthermore, to unify both advantages of deep learning methods and the traditional PatchMatch, we propose to render depth and color images to further fine-tune the MVSFormer model. Notably, we find that the MVS method could produce much better predictions through rendered images due to the coincident illumination, which we believe is significant for the MVS community. Thus, MVSFormer is capable of generalizing to large-scale scenes and complementarily solves the textureless reconstruction problem. Finally, we have assembled all point clouds mentioned above \textit{except ones from RealityCapture} and ranked Top-1 on the competitive GigaReconstruction.

* This is a technical report of team Ewrfcas (Fudan University) in the GigaMVS reconstruction competition. Our method achieved the 1st performance in the 2022 reconstruction benchmark

Via

Access Paper or Ask Questions

SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Feb 25, 2023

Jiawei Hou, Qi Chen, Yurong Cheng, Guang Chen, Xiangyang Xue, Taiping Zeng, Jian Pu

Figure 1 for SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Figure 2 for SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Figure 3 for SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Figure 4 for SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Abstract:Automatic underground parking has attracted considerable attention as the scope of autonomous driving expands. The auto-vehicle is supposed to obtain the environmental information, track its location, and build a reliable map of the scenario. Mainstream solutions consist of well-trained neural networks and simultaneous localization and mapping (SLAM) methods, which need numerous carefully labeled images and multiple sensor estimations. However, there is a lack of underground parking scenario datasets with multiple sensors and well-labeled images that support both SLAM tasks and perception tasks, such as semantic segmentation and parking slot detection. In this paper, we present SUPS, a simulated dataset for underground automatic parking, which supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images according to timestamps. We intend to cover the defect of existing datasets with the variability of environments and the diversity and accessibility of sensors in the virtual scene. Specifically, the dataset records frames from four surrounding fisheye cameras, two forward pinhole cameras, a depth camera, and data from LiDAR, inertial measurement unit (IMU), GNSS. Pixel-level semantic labels are provided for objects, especially ground signs such as arrows, parking lines, lanes, and speed bumps. Perception, 3D reconstruction, depth estimation, and SLAM, and other relative tasks are supported by our dataset. We also evaluate the state-of-the-art SLAM algorithms and perception models on our dataset. Finally, we open source our virtual 3D scene built based on Unity Engine and release our dataset at https://github.com/jarvishou829/SUPS.

* Accepted for publication at the 25th IEEE Intelligent Transportation Systems Conference (ITSC 2022)

Via

Access Paper or Ask Questions