Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuequan Lu

PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

Nov 01, 2024

Jincen Jiang, Qianyu Zhou, Yuhang Li, Xinkui Zhao, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang, Xuequan Lu

Figure 1 for PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

Figure 2 for PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

Figure 3 for PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

Figure 4 for PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

Abstract:In this paper, we present PCoTTA, an innovative, pioneering framework for Continual Test-Time Adaptation (CoTTA) in multi-task point cloud understanding, enhancing the model's transferability towards the continually changing target domain. We introduce a multi-task setting for PCoTTA, which is practical and realistic, handling multiple tasks within one unified model during the continual adaptation. Our PCoTTA involves three key components: automatic prototype mixture (APM), Gaussian Splatted feature shifting (GSFS), and contrastive prototype repulsion (CPR). Firstly, APM is designed to automatically mix the source prototypes with the learnable prototypes with a similarity balancing factor, avoiding catastrophic forgetting. Then, GSFS dynamically shifts the testing sample toward the source domain, mitigating error accumulation in an online manner. In addition, CPR is proposed to pull the nearest learnable prototype close to the testing feature and push it away from other prototypes, making each prototype distinguishable during the adaptation. Experimental comparisons lead to a new benchmark, demonstrating PCoTTA's superiority in boosting the model's transferability towards the continually changing target domain.

* Accepted to NeurIPS 2024

Via

Access Paper or Ask Questions

Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

Oct 17, 2024

Fengqi Liu, Hexiang Wang, Jingyu Gong, Ran Yi, Qianyu Zhou, Xuequan Lu, Jiangbo Lu, Lizhuang Ma

Figure 1 for Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

Figure 2 for Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

Figure 3 for Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

Figure 4 for Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

Abstract:Speech-driven gesture generation aims at synthesizing a gesture sequence synchronized with the input speech signal. Previous methods leverage neural networks to directly map a compact audio representation to the gesture sequence, ignoring the semantic association of different modalities and failing to deal with salient gestures. In this paper, we propose a novel speech-driven gesture generation method by emphasizing the semantic consistency of salient posture. Specifically, we first learn a joint manifold space for the individual representation of audio and body pose to exploit the inherent semantic association between two modalities, and propose to enforce semantic consistency via a consistency loss. Furthermore, we emphasize the semantic consistency of salient postures by introducing a weakly-supervised detector to identify salient postures, and reweighting the consistency loss to focus more on learning the correspondence between salient postures and the high-level semantics of speech content. In addition, we propose to extract audio features dedicated to facial expression and body gesture separately, and design separate branches for face and body gesture synthesis. Extensive experimental results demonstrate the superiority of our method over the state-of-the-art approaches.

Via

Access Paper or Ask Questions

RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Aug 31, 2024

Kunming Su, Qiuxia Wu, Panpan Cai, Xiaogang Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

Figure 1 for RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Figure 2 for RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Figure 3 for RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Figure 4 for RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Abstract:Masked point modeling methods have recently achieved great success in self-supervised learning for point cloud data. However, these methods are sensitive to rotations and often exhibit sharp performance drops when encountering rotational variations. In this paper, we propose a novel Rotation-Invariant Masked AutoEncoders (RI-MAE) to address two major challenges: 1) achieving rotation-invariant latent representations, and 2) facilitating self-supervised reconstruction in a rotation-invariant manner. For the first challenge, we introduce RI-Transformer, which features disentangled geometry content, rotation-invariant relative orientation and position embedding mechanisms for constructing rotation-invariant point cloud latent space. For the second challenge, a novel dual-branch student-teacher architecture is devised. It enables the self-supervised learning via the reconstruction of masked patches within the learned rotation-invariant latent space. Each branch is based on an RI-Transformer, and they are connected with an additional RI-Transformer predictor. The teacher encodes all point patches, while the student solely encodes unmasked ones. Finally, the predictor predicts the latent features of the masked patches using the output latent embeddings from the student, supervised by the outputs from the teacher. Extensive experiments demonstrate that our method is robust to rotations, achieving the state-of-the-art performance on various downstream tasks.

Via

Access Paper or Ask Questions

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Aug 29, 2024

Sicheng Liu, Lintao Wang, Xiaogan Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

Figure 1 for SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Figure 2 for SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Figure 3 for SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Figure 4 for SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Abstract:Extreme Multimodal Summarization with Multimodal Output (XMSMO) becomes an attractive summarization approach by integrating various types of information to create extremely concise yet informative summaries for individual modalities. Existing methods overlook the issue that multimodal data often contains more topic irrelevant information, which can mislead the model into producing inaccurate summaries especially for extremely short ones. In this paper, we propose SITransformer, a Shared Information-guided Transformer for extreme multimodal summarization. It has a shared information guided pipeline which involves a cross-modal shared information extractor and a cross-modal interaction module. The extractor formulates semantically shared salient information from different modalities by devising a novel filtering process consisting of a differentiable top-k selector and a shared-information guided gating unit. As a result, the common, salient, and relevant contents across modalities are identified. Next, a transformer with cross-modal attentions is developed for intra- and inter-modality learning with the shared information guidance to produce the extreme summary. Comprehensive experiments demonstrate that SITransformer significantly enhances the summarization quality for both video and text summaries for XMSMO. Our code will be publicly available at https://github.com/SichengLeoLiu/MMAsia24-XMSMO.

* 8 pages, 5 figures, submitted to ACM Multimedia Asia 2024

Via

Access Paper or Ask Questions

PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

Aug 24, 2024

Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan

Figure 1 for PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

Figure 2 for PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

Figure 3 for PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

Figure 4 for PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

Abstract:Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains. However, they often suffer from limited receptive fields or quadratic complexity due to the use of convolution neural networks or vision Transformers. In this paper, we present the first work that studies the generalizability of state space models (SSMs) in DG PCC and find that directly applying SSMs into DG PCC will encounter several challenges: the inherent topology of the point cloud tends to be disrupted and leads to noise accumulation during the serialization stage. Besides, the lack of designs in domain-agnostic feature learning and data scanning will introduce unanticipated domain-specific information into the 3D sequence data. To this end, we propose a novel framework, PointDGMamba, that excels in strong generalizability toward unseen domains and has the advantages of global receptive fields and efficient linear complexity. PointDGMamba consists of three innovative components: Masked Sequence Denoising (MSD), Sequence-wise Cross-domain Feature Aggregation (SCFA), and Dual-level Domain Scanning (DDS). In particular, MSD selectively masks out the noised point tokens of the point cloud sequences, SCFA introduces cross-domain but same-class point cloud features to encourage the model to learn how to extract more generalized features. DDS includes intra-domain scanning and cross-domain scanning to facilitate information exchange between features. In addition, we propose a new and more challenging benchmark PointDG-3to1 for multi-domain generalization. Extensive experiments demonstrate the effectiveness and state-of-the-art performance of our presented PointDGMamba.

Via

Access Paper or Ask Questions

DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Jul 11, 2024

Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

Figure 1 for DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Figure 2 for DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Figure 3 for DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Figure 4 for DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Abstract:Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task learning capability, it usually relies on high-quality context-rich data and considers a single dataset, and has rarely been studied in point cloud understanding. In this paper, we introduce a novel, practical, multi-domain multi-task setting, handling multiple domains and multiple tasks within one unified model for domain generalized point cloud understanding. To this end, we propose Domain Generalized Point-In-Context Learning (DG-PIC) that boosts the generalizability across various tasks and domains at testing time. In particular, we develop dual-level source prototype estimation that considers both global-level shape contextual and local-level geometrical structures for representing source domains and a dual-level test-time feature shifting mechanism that leverages both macro-level domain semantic information and micro-level patch positional relationships to pull the target data closer to the source ones during the testing. Our DG-PIC does not require any model updates during the testing and can handle unseen domains and multiple tasks, \textit{i.e.,} point cloud reconstruction, denoising, and registration, within one unified model. We also introduce a benchmark for this new setting. Comprehensive experiments demonstrate that DG-PIC outperforms state-of-the-art techniques significantly.

* Accepted to ECCV 2024

Via

Access Paper or Ask Questions

Consistent Point Orientation for Manifold Surfaces via Boundary Integration

Jul 03, 2024

Weizhou Liu, Xingce Wang, Haichuan Zhao, Xingfei Xue, Zhongke Wu, Xuequan Lu, Ying He

Abstract:This paper introduces a new approach for generating globally consistent normals for point clouds sampled from manifold surfaces. Given that the generalized winding number (GWN) field generated by a point cloud with globally consistent normals is a solution to a PDE with jump boundary conditions and possesses harmonic properties, and the Dirichlet energy of the GWN field can be defined as an integral over the boundary surface, we formulate a boundary energy derived from the Dirichlet energy of the GWN. Taking as input a point cloud with randomly oriented normals, we optimize this energy to restore the global harmonicity of the GWN field, thereby recovering the globally consistent normals. Experiments show that our method outperforms state-of-the-art approaches, exhibiting enhanced robustness to noise, outliers, complex topologies, and thin structures. Our code can be found at \url{https://github.com/liuweizhou319/BIM}.

* accepted in siggraph2024

Via

Access Paper or Ask Questions

Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

May 27, 2024

Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling

Figure 1 for Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

Figure 2 for Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

Figure 3 for Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

Figure 4 for Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

Abstract:Adversarial attacks on Face Recognition (FR) systems have proven highly effective in compromising pure FR models, yet adversarial examples may be ineffective to the complete FR systems as Face Anti-Spoofing (FAS) models are often incorporated and can detect a significant number of them. To address this under-explored and essential problem, we propose a novel setting of adversarially attacking both FR and FAS models simultaneously, aiming to enhance the practicability of adversarial attacks on FR systems. In particular, we introduce a new attack method, namely Style-aligned Distribution Biasing (SDB), to improve the capacity of black-box attacks on both FR and FAS models. Specifically, our SDB framework consists of three key components. Firstly, to enhance the transferability of FAS models, we design a Distribution-aware Score Biasing module to optimize adversarial face examples away from the distribution of spoof images utilizing scores. Secondly, to mitigate the substantial style differences between live images and adversarial examples initialized with spoof images, we introduce an Instance Style Alignment module that aligns the style of adversarial examples with live images. In addition, to alleviate the conflicts between the gradients of FR and FAS models, we propose a Gradient Consistency Maintenance module to minimize disparities between the gradients using Hessian approximation. Extensive experiments showcase the superiority of our proposed attack method to state-of-the-art adversarial attacks.

Via

Access Paper or Ask Questions

StraightPCF: Straight Point Cloud Filtering

May 14, 2024

Dasith de Silva Edirimuni, Xuequan Lu, Gang Li, Lei Wei, Antonio Robles-Kelly, Hongdong Li

Abstract:Point cloud filtering is a fundamental 3D vision task, which aims to remove noise while recovering the underlying clean surfaces. State-of-the-art methods remove noise by moving noisy points along stochastic trajectories to the clean surfaces. These methods often require regularization within the training objective and/or during post-processing, to ensure fidelity. In this paper, we introduce StraightPCF, a new deep learning based method for point cloud filtering. It works by moving noisy points along straight paths, thus reducing discretization errors while ensuring faster convergence to the clean surfaces. We model noisy patches as intermediate states between high noise patch variants and their clean counterparts, and design the VelocityModule to infer a constant flow velocity from the former to the latter. This constant flow leads to straight filtering trajectories. In addition, we introduce a DistanceModule that scales the straight trajectory using an estimated distance scalar to attain convergence near the clean surface. Our network is lightweight and only has $\sim530K$ parameters, being 17% of IterativePFN (a most recent point cloud filtering network). Extensive experiments on both synthetic and real-world data show our method achieves state-of-the-art results. Our method also demonstrates nice distributions of filtered points without the need for regularization. The implementation code can be found at: https://github.com/ddsediri/StraightPCF.

* This paper has been accepted to the IEEE/CVF CVPR Conference, 2024

Via

Access Paper or Ask Questions

DGMamba: Domain Generalization via Generalized State Space Model

Apr 11, 2024

Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan

Figure 1 for DGMamba: Domain Generalization via Generalized State Space Model

Figure 2 for DGMamba: Domain Generalization via Generalized State Space Model

Figure 3 for DGMamba: Domain Generalization via Generalized State Space Model

Figure 4 for DGMamba: Domain Generalization via Generalized State Space Model

Abstract:Domain generalization~(DG) aims at solving distribution shift problems in various scenes. Existing approaches are based on Convolution Neural Networks (CNNs) or Vision Transformers (ViTs), which suffer from limited receptive fields or quadratic complexities issues. Mamba, as an emerging state space model (SSM), possesses superior linear complexity and global receptive fields. Despite this, it can hardly be applied to DG to address distribution shifts, due to the hidden state issues and inappropriate scan mechanisms. In this paper, we propose a novel framework for DG, named DGMamba, that excels in strong generalizability toward unseen domains and meanwhile has the advantages of global receptive fields, and efficient linear complexity. Our DGMamba compromises two core components: Hidden State Suppressing~(HSS) and Semantic-aware Patch refining~(SPR). In particular, HSS is introduced to mitigate the influence of hidden states associated with domain-specific features during output prediction. SPR strives to encourage the model to concentrate more on objects rather than context, consisting of two designs: Prior-Free Scanning~(PFS), and Domain Context Interchange~(DCI). Concretely, PFS aims to shuffle the non-semantic patches within images, creating more flexible and effective sequences from images, and DCI is designed to regularize Mamba with the combination of mismatched non-semantic and semantic information by fusing patches among domains. Extensive experiments on four commonly used DG benchmarks demonstrate that the proposed DGMamba achieves remarkably superior results to state-of-the-art models. The code will be made publicly available.

Via

Access Paper or Ask Questions