Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xia Li

LIDILEM

Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

Feb 04, 2024

Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

Abstract:Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, which only ensures alignment in 2D space, potentially leading to the overfitting problem. To address this, we propose an Uncertainty-Aware testing-time Optimization (UAO) framework, which keeps the prior information of pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Specifically, during the training phase, we design an effective 2D-to-3D network for estimating the corresponding 3D pose while quantifying the uncertainty of each 3D joint. For optimization during testing, the proposed optimization framework freezes the pre-trained model and optimizes only a latent state. Projection loss is then employed to ensure the generated poses are well aligned in 2D space for high-quality optimization. Furthermore, we utilize the uncertainty of each joint to determine how much each joint is allowed for optimization. The effectiveness and superiority of the proposed framework are validated through extensive experiments on two challenging datasets: Human3.6M and MPI-INF-3DHP. Notably, our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M. Our source code will be open-sourced.

Via

Access Paper or Ask Questions

ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

Jan 16, 2024

Zhongbin Fang, Xia Li, Xiangtai Li, Shen Zhao, Mengyuan Liu

Figure 1 for ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

Figure 2 for ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

Figure 3 for ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

Figure 4 for ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

Abstract:Recently, 3D point cloud classification has made significant progress with the help of many datasets. However, these datasets do not reflect the incomplete nature of real-world point clouds caused by occlusion, which limits the practical application of current methods. To bridge this gap, we propose ModelNet-O, a large-scale synthetic dataset of 123,041 samples that emulate real-world point clouds with self-occlusion caused by scanning from monocular cameras. ModelNet-O is 10 times larger than existing datasets and offers more challenging cases to evaluate the robustness of existing methods. Our observation on ModelNet-O reveals that well-designed sparse structures can preserve structural information of point clouds under occlusion, motivating us to propose a robust point cloud processing method that leverages a critical point sampling (CPS) strategy in a multi-level manner. We term our method PointMLS. Through extensive experiments, we demonstrate that our PointMLS achieves state-of-the-art results on ModelNet-O and competitive results on regular datasets, and it is robust and effective. More experiments also demonstrate the robustness and effectiveness of PointMLS.

* Project page: https://github.com/fanglaosi/PointMLS

Via

Access Paper or Ask Questions

Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos

Jan 07, 2024

Rongqin Liang, Yuanman Li, Jiantao Zhou, Xia Li

Abstract:Traffic anomaly detection (TAD) in driving videos is critical for ensuring the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods primarily rely on frame prediction, making them vulnerable to interference from dynamic backgrounds induced by the rapid movement of the dashboard camera. While two-stage TAD methods appear to be a natural solution to mitigate such interference by pre-extracting background-independent features (such as bounding boxes and optical flow) using perceptual algorithms, they are susceptible to the performance of first-stage perceptual algorithms and may result in error propagation. In this paper, we introduce TTHF, a novel single-stage method aligning video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervised signal of our method is derived from languages rather than orthogonal one-hot vectors, providing a more comprehensive representation. Further, concerning visual representation, we propose to model the high frequency of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. It is shown that our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and achieving high generalization on the DADA dataset.

* 12 pages, 5 figures

Via

Access Paper or Ask Questions

SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

Dec 11, 2023

Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yu Yao, Yali Shen

Figure 1 for SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

Figure 2 for SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

Figure 3 for SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

Figure 4 for SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

Abstract:Radiation therapy serves as an effective and standard method for cancer treatment. Excellent radiation therapy plans always rely on high-quality dose distribution maps obtained through repeated trial and error by experienced experts. However, due to individual differences and complex clinical situations, even seasoned expert teams may need help to achieve the best treatment plan every time quickly. Many automatic dose distribution prediction methods have been proposed recently to accelerate the radiation therapy planning process and have achieved good results. However, these results suffer from over-smoothing issues, with the obtained dose distribution maps needing more high-frequency details, limiting their clinical application. To address these limitations, we propose a dose prediction diffusion model based on SwinTransformer and a projector, SP-DiffDose. To capture the direct correlation between anatomical structure and dose distribution maps, SP-DiffDose uses a structural encoder to extract features from anatomical images, then employs a conditional diffusion process to blend noise and anatomical images at multiple scales and gradually map them to dose distribution maps. To enhance the dose prediction distribution for organs at risk, SP-DiffDose utilizes SwinTransformer in the deeper layers of the network to capture features at different scales in the image. To learn good representations from the fused features, SP-DiffDose passes the fused features through a designed projector, improving dose prediction accuracy. Finally, we evaluate SP-DiffDose on an internal dataset. The results show that SP-DiffDose outperforms existing methods on multiple evaluation metrics, demonstrating the superiority and generalizability of our method.

Via

Access Paper or Ask Questions

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Dec 06, 2023

Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Chen Chen, Mengyuan Liu

Figure 1 for Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Figure 2 for Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Figure 3 for Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Figure 4 for Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Abstract:In-context learning provides a new perspective for multi-task modeling for vision and NLP. Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning. However, Skeleton sequence modeling via in-context learning remains unexplored. Directly applying existing in-context models from other areas onto skeleton sequences fails due to the inter-frame and cross-task pose similarity that makes it outstandingly hard to perceive the task correctly from a subtle context. To address this challenge, we propose Skeleton-in-Context (SiC), an effective framework for in-context skeleton sequence modeling. Our SiC is able to handle multiple skeleton-based tasks simultaneously after a single training process and accomplish each task from context according to the given prompt. It can further generalize to new, unseen tasks according to customized prompts. To facilitate context perception, we additionally propose a task-unified prompt, which adaptively learns tasks of different natures, such as partial joint-level generation, sequence-level prediction, or 2D-to-3D motion prediction. We conduct extensive experiments to evaluate the effectiveness of our SiC on multiple tasks, including motion prediction, pose estimation, joint completion, and future pose estimation. We also evaluate its generalization capability on unseen tasks such as motion-in-between. These experiments show that our model achieves state-of-the-art multi-task performance and even outperforms single-task methods on certain tasks.

* Project page: https://github.com/fanglaosi/Skeleton-in-Context

Via

Access Paper or Ask Questions

FedLPA: Personalized One-shot Federated Learning with Layer-Wise Posterior Aggregation

Oct 03, 2023

Xiang Liu, Liangxi Liu, Feiyang Ye, Yunheng Shen, Xia Li, Linshan Jiang, Jialin Li

Abstract:Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by diminishing privacy concerns, mitigating potential attacks, and reducing the overhead of communication, one-shot federated learning (i.e., limiting client-server communication into a single round) has gained popularity among researchers. However, the one-shot aggregation performances are sensitively affected by the non-identical training data distribution, which exhibits high statistical heterogeneity in some real-world scenarios. To address this issue, we propose a novel one-shot aggregation method with Layer-wise Posterior Aggregation, named FedLPA. FedLPA aggregates local models to obtain a more accurate global model without requiring extra auxiliary datasets or exposing any confidential local information, e.g., label distributions. To effectively capture the statistics maintained in the biased local datasets in the practical non-IID scenario, we efficiently infer the posteriors of each layer in each local model using layer-wise Laplace approximation and aggregate them to train the global parameters. Extensive experimental results demonstrate that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.

* 26pages, 6 figures

Via

Access Paper or Ask Questions

Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

Sep 25, 2023

Muxin Liao, Shishun Tian, Yuhang Zhang, Guoguang Hua, Wenbin Zou, Xia Li

Figure 1 for Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

Figure 2 for Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

Figure 3 for Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

Figure 4 for Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

Abstract:Prototypical contrastive learning (PCL) has been widely used to learn class-wise domain-invariant features recently. These methods are based on the assumption that the prototypes, which are represented as the central value of the same class in a certain domain, are domain-invariant. Since the prototypes of different domains have discrepancies as well, the class-wise domain-invariant features learned from the source domain by PCL need to be aligned with the prototypes of other domains simultaneously. However, the prototypes of the same class in different domains may be different while the prototypes of different classes may be similar, which may affect the learning of class-wise domain-invariant features. Based on these observations, a calibration-based dual prototypical contrastive learning (CDPCL) approach is proposed to reduce the domain discrepancy between the learned class-wise features and the prototypes of different domains for domain generalization semantic segmentation. It contains an uncertainty-guided PCL (UPCL) and a hard-weighted PCL (HPCL). Since the domain discrepancies of the prototypes of different classes may be different, we propose an uncertainty probability matrix to represent the domain discrepancies of the prototypes of all the classes. The UPCL estimates the uncertainty probability matrix to calibrate the weights of the prototypes during the PCL. Moreover, considering that the prototypes of different classes may be similar in some circumstances, which means these prototypes are hard-aligned, the HPCL is proposed to generate a hard-weighted matrix to calibrate the weights of the hard-aligned prototypes during the PCL. Extensive experiments demonstrate that our approach achieves superior performance over current approaches on domain generalization semantic segmentation tasks.

* Accepted by ACM MM'23

Via

Access Paper or Ask Questions

Multi-scale Target-Aware Framework for Constrained Image Splicing Detection and Localization

Aug 21, 2023

Yuxuan Tan, Yuanman Li, Limin Zeng, Jiaxiong Ye, Wei wang, Xia Li

Abstract:Constrained image splicing detection and localization (CISDL) is a fundamental task of multimedia forensics, which detects splicing operation between two suspected images and localizes the spliced region on both images. Recent works regard it as a deep matching problem and have made significant progress. However, existing frameworks typically perform feature extraction and correlation matching as separate processes, which may hinder the model's ability to learn discriminative features for matching and can be susceptible to interference from ambiguous background pixels. In this work, we propose a multi-scale target-aware framework to couple feature extraction and correlation matching in a unified pipeline. In contrast to previous methods, we design a target-aware attention mechanism that jointly learns features and performs correlation matching between the probe and donor images. Our approach can effectively promote the collaborative learning of related patches, and perform mutual promotion of feature learning and correlation matching. Additionally, in order to handle scale transformations, we introduce a multi-scale projection method, which can be readily integrated into our target-aware framework that enables the attention process to be conducted between tokens containing information of varying scales. Our experiments demonstrate that our model, which uses a unified pipeline, outperforms state-of-the-art methods on several benchmark datasets and is robust against scale transformations.

* accepted by ACMMM2023

Via

Access Paper or Ask Questions

Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video

Aug 20, 2023

Yingxuan You, Hong Liu, Ti Wang, Wenhao Li, Runwei Ding, Xia Li

Abstract:Despite significant progress in single image-based 3D human mesh recovery, accurately and smoothly recovering 3D human motion from a video remains challenging. Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertices regression from the estimated 3D pose and temporal image feature. Specifically, we propose a two-stream encoder that estimates mid-frame 3D pose and extracts a temporal image feature from the input image sequence. In addition, we design a co-evolution decoder that performs pose and mesh interactions with the image-guided Adaptive Layer Normalization (AdaLN) to make pose and mesh fit the human body shape. Extensive experiments demonstrate that the proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M, and MPI-INF-3DHP. Our code is available at https://github.com/kasvii/PMCE.

* Accepted by ICCV 2023. Project page: https://kasvii.github.io/PMCE

Via

Access Paper or Ask Questions

Image Copy-Move Forgery Detection via Deep Cross-Scale PatchMatch

Aug 08, 2023

Yingjie He, Yuanman Li, Changsheng Chen, Xia Li

Abstract:The recently developed deep algorithms achieve promising progress in the field of image copy-move forgery detection (CMFD). However, they have limited generalizability in some practical scenarios, where the copy-move objects may not appear in the training images or cloned regions are from the background. To address the above issues, in this work, we propose a novel end-to-end CMFD framework by integrating merits from both conventional and deep methods. Specifically, we design a deep cross-scale patchmatch method tailored for CMFD to localize copy-move regions. In contrast to existing deep models, our scheme aims to seek explicit and reliable point-to-point matching between source and target regions using features extracted from high-resolution scales. Further, we develop a manipulation region location branch for source/target separation. The proposed CMFD framework is completely differentiable and can be trained in an end-to-end manner. Extensive experimental results demonstrate the high generalizability of our method to different copy-move contents, and the proposed scheme achieves significantly better performance than existing approaches.

* 6 pages, 4 figures, accepted by ICME2023

Via

Access Paper or Ask Questions