Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Xu

Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving

Mar 06, 2024

He Li, Ruihua Han, Zirui Zhao, Wei Xu, Qi Hao, Shuai Wang, Chengzhong Xu

Abstract:Virtual reality (VR) is a promising data engine for autonomous driving (AD). However, data fidelity in this paradigm is often degraded by VR inconsistency, for which the existing VR approaches become ineffective, as they ignore the inter-dependency between low-level VR synchronizer designs (i.e., data collector) and high-level VR synthesizer designs (i.e., data processor). This paper presents a seamless virtual reality SVR platform for AD, which mitigates such inconsistency, enabling VR agents to interact with each other in a shared symbiotic world. The crux to SVR is an integrated synchronizer and synthesizer IS2 design, which consists of a drift-aware lidar-inertial synchronizer for VR colocation and a motion-aware deep visual synthesis network for augmented reality image generation. We implement SVR on car-like robots in two sandbox platforms, achieving a cm-level VR colocalization accuracy and 3.2% VR image deviation, thereby avoiding missed collisions or model clippings. Experiments show that the proposed SVR reduces the intervention times, missed turns, and failure rates compared to other benchmarks. The SVR-trained neural network can handle unseen situations in real-world environments, by leveraging its knowledge learnt from the VR space.

Via

Access Paper or Ask Questions

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Mar 03, 2024

Xin Zhou, Dingkang Liang, Wei Xu, Xingkui Zhu, Yihan Xu, Zhikang Zou, Xiang Bai

Figure 1 for Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Figure 2 for Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Figure 3 for Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Figure 4 for Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Abstract:Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models. However, existing methods for model adaptation usually update all model parameters, i.e., full fine-tuning paradigm, which is inefficient as it relies on high computational costs (e.g., training GPU memory) and massive storage space. In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency. To achieve this goal, we freeze the parameters of the default pre-trained models and then propose the Dynamic Adapter, which generates a dynamic scale for each token, considering the token significance to the downstream task. We further seamlessly integrate Dynamic Adapter with Prompt Tuning (DAPT) by constructing Internal Prompts, capturing the instance-specific features for interaction. Extensive experiments conducted on five challenging datasets demonstrate that the proposed DAPT achieves superior performance compared to the full fine-tuning counterparts while significantly reducing the trainable parameters and training GPU memory by 95% and 35%, respectively. Code is available at https://github.com/LMD0311/DAPT.

* Accepted to CVPR 2024. Code is available at https://github.com/LMD0311/DAPT. We will post a camera-ready version later

Via

Access Paper or Ask Questions

ELA: Efficient Local Attention for Deep Convolutional Neural Networks

Mar 02, 2024

Wei Xu, Yi Wan

Figure 1 for ELA: Efficient Local Attention for Deep Convolutional Neural Networks

Figure 2 for ELA: Efficient Local Attention for Deep Convolutional Neural Networks

Figure 3 for ELA: Efficient Local Attention for Deep Convolutional Neural Networks

Figure 4 for ELA: Efficient Local Attention for Deep Convolutional Neural Networks

Abstract:The attention mechanism has gained significant recognition in the field of computer vision due to its ability to effectively enhance the performance of deep neural networks. However, existing methods often struggle to effectively utilize spatial information or, if they do, they come at the cost of reducing channel dimensions or increasing the complexity of neural networks. In order to address these limitations, this paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure. By analyzing the limitations of the Coordinate Attention method, we identify the lack of generalization ability in Batch Normalization, the adverse effects of dimension reduction on channel attention, and the complexity of attention generation process. To overcome these challenges, we propose the incorporation of 1D convolution and Group Normalization feature enhancement techniques. This approach enables accurate localization of regions of interest by efficiently encoding two 1D positional feature maps without the need for dimension reduction, while allowing for a lightweight implementation. We carefully design three hyperparameters in ELA, resulting in four different versions: ELA-T, ELA-B, ELA-S, and ELA-L, to cater to the specific requirements of different visual tasks such as image classification, object detection and sementic segmentation. ELA can be seamlessly integrated into deep CNN networks such as ResNet, MobileNet, and DeepLab. Extensive evaluations on the ImageNet, MSCOCO, and Pascal VOC datasets demonstrate the superiority of the proposed ELA module over current state-of-the-art methods in all three aforementioned visual tasks.

Via

Access Paper or Ask Questions

Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

Mar 01, 2024

Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Qianqian Yang, Wei Xu, Zhaoyang Zhang

Figure 1 for Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

Figure 2 for Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

Figure 3 for Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

Figure 4 for Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

Abstract:In this paper, the problem of joint transmission and computation resource allocation for probabilistic semantic communication (PSC) system with rate splitting multiple access (RSMA) is investigated. In the considered model, the base station (BS) needs to transmit a large amount of data to multiple users with RSMA. Due to limited communication resources, the BS is required to utilize semantic communication techniques to compress the large-sized data. The semantic communication is enabled by shared probability graphs between the BS and the users. The probability graph can be used to further compress the transmission data at the BS, while the received compressed semantic information can be recovered through using the same shared probability graph at each user side. The semantic information compression progress consumes additional computation power at the BS, which inevitably decreases the transmission power due to limited total power budget. Considering both the effect of semantic compression ratio and computation power, the semantic rate expression for RSMA is first obtained. Then, based on the obtained rate expression, an optimization problem is formulated with the aim of maximizing the sum of semantic rates of all users under total power, semantic compression ratio, and rate allocation constraints. To tackle this problem, an iterative algorithm is proposed, where the rate allocation and transmit beamforming design subproblem is solved using a successive convex approximation method, and the semantic compression ratio subproblem is addressed using a greedy algorithm. Numerical results validate the effectiveness of the proposed scheme.

Via

Access Paper or Ask Questions

PointMamba: A Simple State Space Model for Point Cloud Analysis

Feb 19, 2024

Dingkang Liang, Xin Zhou, Xinyu Wang, Xingkui Zhu, Wei Xu, Zhikang Zou, Xiaoqing Ye, Xiang Bai

Figure 1 for PointMamba: A Simple State Space Model for Point Cloud Analysis

Figure 2 for PointMamba: A Simple State Space Model for Point Cloud Analysis

Figure 3 for PointMamba: A Simple State Space Model for Point Cloud Analysis

Figure 4 for PointMamba: A Simple State Space Model for Point Cloud Analysis

Abstract:Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity and is difficult to extend to long sequence modeling due to limited computational resources and so on. Recently, state space models (SSM), a new family of deep sequence models, have presented great potential for sequence modeling in NLP tasks. In this paper, taking inspiration from the success of SSM in NLP, we propose PointMamba, a framework with global modeling and linear complexity. Specifically, by taking embedded point patches as input, we proposed a reordering strategy to enhance SSM's global modeling ability by providing a more logical geometric scanning order. The reordered point tokens are then sent to a series of Mamba blocks to causally capture the point cloud structure. Experimental results show our proposed PointMamba outperforms the transformer-based counterparts on different point cloud analysis datasets, while significantly saving about 44.3% parameters and 25% FLOPs, demonstrating the potential option for constructing foundational 3D vision models. We hope our PointMamba can provide a new perspective for point cloud analysis. The code is available at https://github.com/LMD0311/PointMamba.

* The code is available at https://github.com/LMD0311/PointMamba

Via

Access Paper or Ask Questions

NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms

Feb 19, 2024

Jonathan Zheng, Alan Ritter, Wei Xu

Figure 1 for NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms

Figure 2 for NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms

Figure 3 for NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms

Figure 4 for NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms

Abstract:The performance of Large Language Models (LLMs) degrades from the temporal drift between data used for model training and newer text seen during inference. One understudied avenue of language change causing data drift is the emergence of neologisms -- new word forms -- over time. We create a diverse resource of recent English neologisms by using several popular collection methods. We analyze temporal drift using neologisms by comparing sentences containing new words with near-identical sentences that replace neologisms with existing substitute words. Model performance is nearly halved in machine translation when a single neologism is introduced in a sentence. Motivated by these results, we construct a benchmark to evaluate LLMs' ability to generalize to neologisms with various natural language understanding tasks and model perplexity. Models with later knowledge cutoff dates yield lower perplexities and perform better in downstream tasks. LLMs are also affected differently based on the linguistic origins of words, indicating that neologisms are complex for static LLMs to address. We will release our benchmark and code for reproducing our experiments.

Via

Access Paper or Ask Questions

FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence

Feb 18, 2024

Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li

Abstract:Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly inform patient treatment. FactPICO consists of 345 plain language summaries of RCT abstracts generated from three LLMs (i.e., GPT-4, Llama-2, and Alpaca), with fine-grained evaluation and natural language rationales from experts. We assess the factuality of critical elements of RCTs in those summaries: Populations, Interventions, Comparators, Outcomes (PICO), as well as the reported findings concerning these. We also evaluate the correctness of the extra information (e.g., explanations) added by LLMs. Using FactPICO, we benchmark a range of existing factuality metrics, including the newly devised ones based on LLMs. We find that plain language summarization of medical evidence is still challenging, especially when balancing between simplicity and factuality, and that existing metrics correlate poorly with expert judgments on the instance level.

Via

Access Paper or Ask Questions

Digital versus Analog Transmissions for Federated Learning over Wireless Networks

Feb 15, 2024

Jiacheng Yao, Wei Xu, Zhaohui Yang, Xiaohu You, Mehdi Bennis, H. Vincent Poor

Figure 1 for Digital versus Analog Transmissions for Federated Learning over Wireless Networks

Figure 2 for Digital versus Analog Transmissions for Federated Learning over Wireless Networks

Abstract:In this paper, we quantitatively compare these two effective communication schemes, i.e., digital and analog ones, for wireless federated learning (FL) over resource-constrained networks, highlighting their essential differences as well as their respective application scenarios. We first examine both digital and analog transmission methods, together with a unified and fair comparison scheme under practical constraints. A universal convergence analysis under various imperfections is established for FL performance evaluation in wireless networks. These analytical results reveal that the fundamental difference between the two paradigms lies in whether communication and computation are jointly designed or not. The digital schemes decouple the communication design from specific FL tasks, making it difficult to support simultaneous uplink transmission of massive devices with limited bandwidth. In contrast, the analog communication allows over-the-air computation (AirComp), thus achieving efficient spectrum utilization. However, computation-oriented analog transmission reduces power efficiency, and its performance is sensitive to computational errors. Finally, numerical simulations are conducted to verify these theoretical observations.

* Accepted by ICC 2024

Via

Access Paper or Ask Questions

Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Feb 12, 2024

Wei Xu, An Liu, Yiting Zhang, Vincent Lau

Figure 1 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Figure 2 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Figure 3 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Figure 4 for Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Abstract:Federated learning (FL) is a machine learning paradigm where the clients possess decentralized training data and the central server handles aggregation and scheduling. Typically, FL algorithms involve clients training their local models using stochastic gradient descent (SGD), which carries drawbacks such as slow convergence and being prone to getting stuck in suboptimal solutions. In this work, we propose a message passing based Bayesian federated learning (BFL) framework to avoid these drawbacks.Specifically, we formulate the problem of deep neural network (DNN) learning and compression and as a sparse Bayesian inference problem, in which group sparse prior is employed to achieve structured model compression. Then, we propose an efficient BFL algorithm called EMTDAMP, where expectation maximization (EM) and turbo deep approximate message passing (TDAMP) are combined to achieve distributed learning and compression. The central server aggregates local posterior distributions to update global posterior distributions and update hyperparameters based on EM to accelerate convergence. The clients perform TDAMP to achieve efficient approximate message passing over DNN with joint prior distribution. We detail the application of EMTDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results to demonstrate the advantages of EMTDAMP.

Via

Access Paper or Ask Questions

Non-autoregressive Generative Models for Reranking Recommendation

Feb 10, 2024

Yuxin Ren, Qiya Yang, Yichun Wu, Wei Xu, Yalong Wang, Zhiqiang Zhang

Abstract:In a multi-stage recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items.The key challenge of reranking lies in the exploration of optimal sequences within the combinatorial space of permutations. Recent research proposes a generator-evaluator learning paradigm, where the generator generates multiple feasible sequences and the evaluator picks out the best sequence based on the estimated listwise score. Generator is of vital importance, and generative models are well-suited for the generator function. Current generative models employ an autoregressive strategy for sequence generation. However, deploying autoregressive models in real-time industrial systems is challenging. Hence, we propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. To address challenges related to sparse training samples and dynamic candidates impacting model convergence, we introduce a matching model. Considering the diverse nature of user feedback, we propose a sequence-level unlikelihood training objective to distinguish feasible from unfeasible sequences. Additionally, to overcome the lack of dependency modeling in non-autoregressive models regarding target items, we introduce contrastive decoding to capture correlations among these items. Extensive offline experiments on publicly available datasets validate the superior performance of our proposed approach compared to the existing state-of-the-art reranking methods. Furthermore, our method has been fully deployed in a popular video app Kuaishou with over 300 million daily active users, significantly enhancing online recommendation quality, and demonstrating the effectiveness and efficiency of our approach.

* Work in progress

Via

Access Paper or Ask Questions