Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Lu

Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

Jul 11, 2024

Chang Sun, Hui Yuan, Xiaolong Mao, Xin Lu, Raouf Hamzaoui

Figure 1 for Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

Figure 2 for Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

Figure 3 for Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

Figure 4 for Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

Abstract:In point cloud geometry compression, most octreebased context models use the cross-entropy between the onehot encoding of node occupancy and the probability distribution predicted by the context model as the loss. This approach converts the problem of predicting the number (a regression problem) and the position (a classification problem) of occupied child nodes into a 255-dimensional classification problem. As a result, it fails to accurately measure the difference between the one-hot encoding and the predicted probability distribution. We first analyze why the cross-entropy loss function fails to accurately measure the difference between the one-hot encoding and the predicted probability distribution. Then, we propose an attention-based child node number prediction (ACNP) module to enhance the context models. The proposed module can predict the number of occupied child nodes and map it into an 8- dimensional vector to assist the context model in predicting the probability distribution of the occupancy of the current node for efficient entropy coding. Experimental results demonstrate that the proposed module enhances the coding efficiency of octree-based context models.

* IEEE Signal Processing Letters, 2024
* 2 figures and 2 tables

Via

Access Paper or Ask Questions

PCAC-GAN:ASparse-Tensor-Based Generative Adversarial Network for 3D Point Cloud Attribute Compression

Jul 09, 2024

Xiaolong Mao, Hui Yuan, Xin Lu, Raouf Hamzaoui, Wei Gao

Figure 1 for PCAC-GAN:ASparse-Tensor-Based Generative Adversarial Network for 3D Point Cloud Attribute Compression

Figure 2 for PCAC-GAN:ASparse-Tensor-Based Generative Adversarial Network for 3D Point Cloud Attribute Compression

Figure 3 for PCAC-GAN:ASparse-Tensor-Based Generative Adversarial Network for 3D Point Cloud Attribute Compression

Figure 4 for PCAC-GAN:ASparse-Tensor-Based Generative Adversarial Network for 3D Point Cloud Attribute Compression

Abstract:Learning-based methods have proven successful in compressing geometric information for point clouds. For attribute compression, however, they still lag behind non-learning-based methods such as the MPEG G-PCC standard. To bridge this gap, we propose a novel deep learning-based point cloud attribute compression method that uses a generative adversarial network (GAN) with sparse convolution layers. Our method also includes a module that adaptively selects the resolution of the voxels used to voxelize the input point cloud. Sparse vectors are used to represent the voxelized point cloud, and sparse convolutions process the sparse tensors, ensuring computational efficiency. To the best of our knowledge, this is the first application of GANs to compress point cloud attributes. Our experimental results show that our method outperforms existing learning-based techniques and rivals the latest G-PCC test model (TMC13v23) in terms of visual quality.

* 14 pages, 5 figures

Via

Access Paper or Ask Questions

SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Jul 02, 2024

Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Hailiang Huang, Guanhua Chen, Yun Chen

Figure 1 for SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Figure 2 for SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Figure 3 for SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Figure 4 for SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Abstract:The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. Inspired by the social facilitation concept, SoP generates and optimizes multiple jailbreak characters to bypass the guardrails of the target LLM. Different from previous work which relies on proprietary LLMs or seed jailbreak templates crafted by human expertise, SoP can generate and optimize the jailbreak prompt in a cold-start scenario using open-sourced LLMs without any seed jailbreak templates. Experimental results show that SoP achieves attack success rates of 88% and 60% in bypassing the safety alignment of GPT-3.5-1106 and GPT-4, respectively. Furthermore, we extensively evaluate the transferability of the generated templates across different LLMs and held-out malicious requests, while also exploring defense strategies against the jailbreak attack designed by SoP. Code is available at https://github.com/Yang-Yan-Yang-Yan/SoP.

Via

Access Paper or Ask Questions

TraceNet: Segment one thing efficiently

Jun 21, 2024

Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt

Figure 1 for TraceNet: Segment one thing efficiently

Figure 2 for TraceNet: Segment one thing efficiently

Figure 3 for TraceNet: Segment one thing efficiently

Figure 4 for TraceNet: Segment one thing efficiently

Abstract:Efficient single instance segmentation is essential for unlocking features in the mobile imaging applications, such as capture or editing. Existing on-the-fly mobile imaging applications scope the segmentation task to portraits or the salient subject due to the computational constraints. Instance segmentation, despite its recent developments towards efficient networks, is still heavy due to the cost of computation on the entire image to identify all instances. To address this, we propose and formulate a one tap driven single instance segmentation task that segments a single instance selected by a user via a positive tap. This task, in contrast to the broader task of segmenting anything as suggested in the Segment Anything Model \cite{sam}, focuses on efficient segmentation of a single instance specified by the user. To solve this problem, we present TraceNet, which explicitly locates the selected instance by way of receptive field tracing. TraceNet identifies image regions that are related to the user tap and heavy computations are only performed on selected regions of the image. Therefore overall computation cost and memory consumption are reduced during inference. We evaluate the performance of TraceNet on instance IoU average over taps and the proportion of the region that a user tap can fall into for a high-quality single-instance mask. Experimental results on MS-COCO and LVIS demonstrate the effectiveness and efficiency of the proposed approach. TraceNet can jointly achieve the efficiency and interactivity, filling in the gap between needs for efficient mobile inference and recent research trend towards multimodal and interactive segmentation models.

Via

Access Paper or Ask Questions

Non-autoregressive Personalized Bundle Generation

Jun 11, 2024

Wenchuan Yang, Cheng Yang, Jichao Li, Yuejin Tan, Xin Lu, Chuan Shi

Figure 1 for Non-autoregressive Personalized Bundle Generation

Figure 2 for Non-autoregressive Personalized Bundle Generation

Figure 3 for Non-autoregressive Personalized Bundle Generation

Figure 4 for Non-autoregressive Personalized Bundle Generation

Abstract:The personalized bundle generation problem, which aims to create a preferred bundle for user from numerous candidate items, receives increasing attention in recommendation. However, existing works ignore the order-invariant nature of the bundle and adopt sequential modeling methods as the solution, which might introduce inductive bias and cause a large latency in prediction. To address this problem, we propose to perform the bundle generation via non-autoregressive mechanism and design a novel encoder-decoder framework named BundleNAT, which can effectively output the targeted bundle in one-shot without relying on any inherent order. In detail, instead of learning sequential dependency, we propose to adopt pre-training techniques and graph neural network to fully embed user-based preference and item-based compatibility information, and use a self-attention based encoder to further extract global dependency pattern. We then design a permutation-equivariant decoding architecture that is able to directly output the desired bundle in a one-shot manner. Experiments on three real-world datasets from Youshu and Netease show the proposed BundleNAT significantly outperforms the current state-of-the-art methods in average by up to 35.92%, 10.97% and 23.67% absolute improvements in Precision, Precision+, and Recall, respectively.

* Submitted to Information Processing & Management

Via

Access Paper or Ask Questions

A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Mar 15, 2024

Hongyang Zhu, Xin Lu, Yanwei Qin, Xinran Yu, Tianjiao Sun, Yunsong Zhao

Figure 1 for A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Figure 2 for A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Figure 3 for A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Figure 4 for A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Abstract:Ring artifacts in computed tomography images, arising from the undesirable responses of detector units, significantly degrade image quality and diagnostic reliability. To address this challenge, we propose a dual-domain regularization model to effectively remove ring artifacts, while maintaining the integrity of the original CT image. The proposed model corrects the vertical stripe artifacts on the sinogram by innovatively updating the response inconsistency compensation coefficients of detector units, which is achieved by employing the group sparse constraint and the projection-view direction sparse constraint on the stripe artifacts. Simultaneously, we apply the sparse constraint on the reconstructed image to further rectified ring artifacts in the image domain. The key advantage of the proposed method lies in considering the relationship between the response inconsistency compensation coefficients of the detector units and the projection views, which enables a more accurate correction of the response of the detector units. An alternating minimization method is designed to solve the model. Comparative experiments on real photon counting detector data demonstrate that the proposed method not only surpasses existing methods in removing ring artifacts but also excels in preserving structural details and image fidelity.

Via

Access Paper or Ask Questions

Vanilla Transformers are Transfer Capability Teachers

Mar 04, 2024

Xin Lu, Yanyan Zhao, Bing Qin

Abstract:Recently, Mixture of Experts (MoE) Transformers have garnered increasing attention due to their advantages in model capacity and computational efficiency. However, studies have indicated that MoE Transformers underperform vanilla Transformers in many downstream tasks, significantly diminishing the practical value of MoE models. To explain this issue, we propose that the pre-training performance and transfer capability of a model are joint determinants of its downstream task performance. MoE models, in comparison to vanilla models, have poorer transfer capability, leading to their subpar performance in downstream tasks. To address this issue, we introduce the concept of transfer capability distillation, positing that although vanilla models have weaker performance, they are effective teachers of transfer capability. The MoE models guided by vanilla models can achieve both strong pre-training performance and transfer capability, ultimately enhancing their performance in downstream tasks. We design a specific distillation method and conduct experiments on the BERT architecture. Experimental results show a significant improvement in downstream performance of MoE models, and many further evidences also strongly support the concept of transfer capability distillation. Finally, we attempt to interpret transfer capability distillation and provide some insights from the perspective of model feature.

Via

Access Paper or Ask Questions

How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider Transformer Models

Mar 04, 2024

Xin Lu, Yanyan Zhao, Bing Qin

Abstract:Pre-trained language models have been proven to possess strong base capabilities, which not only excel in in-distribution language modeling but also show powerful abilities in out-of-distribution language modeling, transfer learning and few-shot learning. Unlike existing work focusing on the influence of scale on base capabilities, our work examines the influence of architecture on those. Specifically, our concern is: How does architecture influence the base capabilities of pre-trained language models? In this work, we attempt to explain and reverse the decline in base capabilities caused by the architecture of FFN-Wider Transformers, seeking to provide some insights. Through analysis, we found the contribution ratio of Multi-Head Attention (a combination function) to pre-trained language modeling is a key factor affecting base capabilities. FFN-Wider Transformers reduce the contribution ratio of this combination function, leading to a decline in base capabilities. We confirmed this by experiments and proposed Combination Enhancement Architecture (CEA) to address the decline in base capabilities of such models. Significantly, we extended our explanation and CEA to Mixture of Experts (MoE) architecture Transformers, which also alleviated their decline in base capabilities to some extent, proving our work can offer useful guidance for architecture analysis, architecture improvement and architecture design.

Via

Access Paper or Ask Questions

Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval

Nov 10, 2023

Xin Lu, Shikun Chen, Yichao Cao, Xin Zhou, Xiaobo Lu

Abstract:In recent years, hashing methods have been popular in the large-scale media search for low storage and strong representation capabilities. To describe objects with similar overall appearance but subtle differences, more and more studies focus on hashing-based fine-grained image retrieval. Existing hashing networks usually generate both local and global features through attention guidance on the same deep activation tensor, which limits the diversity of feature representations. To handle this limitation, we substitute convolutional descriptors for attention-guided features and propose an Attributes Grouping and Mining Hashing (AGMH), which groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation for efficient fine-grained image retrieval. Specifically, an Attention Dispersion Loss (ADL) is designed to force the descriptors to attend to various local regions and capture diverse subtle details. Moreover, we propose a Stepwise Interactive External Attention (SIEA) to mine critical attributes in each descriptor and construct correlations between fine-grained attributes and objects. The attention mechanism is dedicated to learning discrete attributes, which will not cost additional computations in hash codes generation. Finally, the compact binary codes are learned by preserving pairwise similarities. Experimental results demonstrate that AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.

* Proceedings of the 31st ACM International Conference on on Multimedia (MM '23), 2023, 6558-6566

Via

Access Paper or Ask Questions

Hybrid of representation learning and reinforcement learning for dynamic and complex robotic motion planning

Sep 07, 2023

Chengmin Zhou, Xin Lu, Jiapeng Dai, Bingding Huang, Xiaoxu Liu, Pasi Fränti

Abstract:Motion planning is the soul of robot decision making. Classical planning algorithms like graph search and reaction-based algorithms face challenges in cases of dense and dynamic obstacles. Deep learning algorithms generate suboptimal one-step predictions that cause many collisions. Reinforcement learning algorithms generate optimal or near-optimal time-sequential predictions. However, they suffer from slow convergence, suboptimal converged results, and overfittings. This paper introduces a hybrid algorithm for robotic motion planning: long short-term memory (LSTM) pooling and skip connection for attention-based discrete soft actor critic (LSA-DSAC). First, graph network (relational graph) and attention network (attention weight) interpret the environmental state for the learning of the discrete soft actor critic algorithm. The expressive power of attention network outperforms that of graph in our task by difference analysis of these two representation methods. However, attention based DSAC faces the overfitting problem in training. Second, the skip connection method is integrated to attention based DSAC to mitigate overfitting and improve convergence speed. Third, LSTM pooling is taken to replace the sum operator of attention weigh and eliminate overfitting by slightly sacrificing convergence speed at early-stage training. Experiments show that LSA-DSAC outperforms the state-of-the-art in training and most evaluations. The physical robot is also implemented and tested in the real world.

Via

Access Paper or Ask Questions