Lei Wang

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models

Jun 05, 2024

Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu

Abstract:Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on processing of numerals and units of measurement by constructing datasets with perturbations. We first anatomize the reasoning of math word problems to different sub-procedures like numeral conversions from language to numbers and measurement conversions based on units. Then we further annotate math word problems from ancient Chinese arithmetic works which are challenging in numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.

* Findings of ACL 2024

Information Leakage from Embedding in Large Language Models

May 22, 2024

Zhipeng Wan, Anda Cheng, Yinggui Wang, Lei Wang

Abstract:The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study aims to investigate the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could potentially recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find that these two methods are effective in attacking the embeddings from shallow layers, but their effectiveness decreases when attacking embeddings from deeper layers. To address this issue, we then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers. Our analysis reveals that Embed Parrot effectively reconstructs original inputs from the hidden states of ChatGLM-6B and Llama2-7B, showcasing stable performance across various token lengths and data distributions. To mitigate the risk of privacy breaches, we introduce a defense mechanism to deter exploitation of the embedding reconstruction process. Our findings emphasize the importance of safeguarding user privacy in distributed learning systems and contribute valuable insights to enhance the security protocols within such environments.

SignLLM: Sign Languages Production Large Language Models

May 17, 2024

Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen

Abstract:In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which builds from public data including American Sign Language (ASL) and seven others. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow for the generation of sign language gestures from input text or prompt. Both of the modes can use a new loss and a module based on reinforcement learning, which accelerates the training by enhancing the model's capability to autonomously sample high-quality data. We present benchmark results of SignLLM, which demonstrate that our model achieves state-of-the-art performance on SLP tasks across eight sign languages.

* 33 pages, website at https://signllm.github.io/

On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

May 13, 2024

Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

Abstract:Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. Adversarial samples in these studies are designed to attack only one dimension of either bitrate or distortion, targeting a submodel with a specific compression ratio. However, adversaries in real-world scenarios are neither confined to singular dimensional attacks nor always have control over compression ratios. This variability highlights the inadequacy of existing research in comprehensively assessing the adversarial robustness of LIC algorithms in practical applications. To tackle this issue, this paper presents two joint rate-distortion attack paradigms at both submodel and algorithm levels, i.e., Specific-ratio Rate-Distortion Attack (SRDA) and Agnostic-ratio Rate-Distortion Attack (ARDA). Additionally, a suite of multi-granularity assessment tools is introduced to evaluate the attack results from various perspectives. On this basis, extensive experiments on eight prominent LIC algorithms are conducted to offer a thorough analysis of their inherent vulnerabilities. Furthermore, we explore the efficacy of two defense techniques in improving the performance under joint rate-distortion attacks. The findings from these experiments can provide a valuable reference for the development of compression algorithms with enhanced adversarial robustness.

Ditto: Quantization-aware Secure Inference of Transformers upon MPC

May 09, 2024

Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

Abstract:Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into the MPC domain remains unclear. To bridge this gap, we propose the framework named Ditto to enable more efficient quantization-aware secure Transformer inference. Concretely, we first incorporate an MPC-friendly quantization into Transformer inference and employ a quantization-aware distillation procedure to maintain the model utility. Then, we propose novel MPC primitives to support the type conversions that are essential in quantization and implement the quantization-aware MPC execution of secure quantized inference. This approach significantly decreases both computation and communication overhead, leading to improvements in overall efficiency. We conduct extensive experiments on Bert and GPT2 models to evaluate the performance of Ditto. The results demonstrate that Ditto is about $3.14\sim 4.40\times$ faster than MPCFormer (ICLR 2023) and $1.44\sim 2.35\times$ faster than the state-of-the-art work PUMA with negligible utility degradation.

* to be published in ICML 2024

A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

May 02, 2024

Hongliang Zhang, Yilan Zhou, Lei Wang, Tengchao Huang

Abstract:Initial alignment is one of the key technologies in strapdown inertial navigation systems (SINS), providing initial state information for vehicle attitude and navigation. In some situations, such as attitude heading reference systems, the position is not necessarily required or even available, so self-alignment that does not rely on any external aid becomes necessary. This study presents a new self-alignment method under swaying conditions that, unlike existing methods, determines the latitude and attitude simultaneously by utilizing all observation vectors without solving the Wahba problem. By constructing the dyadic tensor of each observation and reference vector with itself, all equations relating observation and reference vectors are accumulated into one equation; the latitude variable is extracted and solved using the fact that the similar matrices on both sides of the equation share the same eigenvalues, while the attitude is obtained by eigenvalue decomposition. Simulations and experimental tests verify the effectiveness of the proposed method: the alignment result is better than TRIAD in convergence speed and stability, and comparable with the OBA method in alignment accuracy, with or without latitude. It is useful for guiding the design of initial alignment in autonomous vehicle applications.

SATO: Stable Text-to-Motion Framework

May 02, 2024

Wenshuo Chen, Hongru Xiao, Erhang Zhang, Lijie Hu, Lei Wang, Mengyuan Liu, Chen Chen

Abstract:Is the Text to Motion model robust? Recent advancements in Text to Motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with the text-to-motion model: its predictions often exhibit inconsistent outputs, resulting in vastly different or even incorrect poses when presented with semantically similar or identical text inputs. In this paper, we undertake an analysis to elucidate the underlying causes of this instability, establishing a clear link between the unpredictability of model outputs and the erratic attention patterns of the text encoder module. Consequently, we introduce a formal framework aimed at addressing this issue, which we term the Stable Text-to-Motion Framework (SATO). SATO consists of three modules, each dedicated to stable attention, stable prediction, and maintaining a balance between accuracy and robustness trade-off. We present a methodology for constructing an SATO that satisfies the stability of attention and prediction. To verify the stability of the model, we introduced a new textual synonym perturbation dataset based on HumanML3D and KIT-ML. Results show that SATO is significantly more stable against synonyms and other slight perturbations while keeping its high accuracy performance.

Perceptual Constancy Constrained Single Opinion Score Calibration for Image Quality Assessment

Apr 30, 2024

Lei Wang, Desen Yuan

Abstract:In this paper, we propose a highly efficient method to estimate an image's mean opinion score (MOS) from a single opinion score (SOS). Assuming that each SOS is the observed sample of a normal distribution and the MOS is its unknown expectation, the MOS inference is formulated as a maximum likelihood estimation problem, where the perceptual correlation of pairwise images is considered in modeling the likelihood of SOS. More specifically, by means of the quality-aware representations learned from the self-supervised backbone, we introduce a learnable relative quality measure to predict the MOS difference between two images. Then, the current image's maximum likelihood estimation towards MOS is represented by the sum of another reference image's estimated MOS and their relative quality. Ideally, no matter which image is selected as the reference, the MOS of the current image should remain unchanged, which is termed perceptual constancy constrained calibration (PC3). Finally, we alternatively optimize the relative quality measure's parameter and the current image's estimated MOS via backpropagation and Newton's method respectively. Experiments show that the proposed method is efficient in calibrating the biased SOS and significantly improves IQA model learning when only SOSs are available.

Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment

Apr 30, 2024

Lei Wang, Desen Yuan

Abstract:Despite great success in modeling visual perception, deep neural network based image quality assessment (IQA) still remains unreliable in real-world applications due to its vulnerability to adversarial perturbations and its inexplicit black-box structure. In this paper, we propose to build a trustworthy IQA model via Causal Perception inspired Representation Learning (CPRL), along with a score reflection attack method for IQA models. More specifically, we assume that each image is composed of a Causal Perception Representation (CPR) and a non-causal perception representation (N-CPR). CPR serves as the causation of the subjective quality label and is invariant to imperceptible adversarial perturbations. Conversely, N-CPR presents spurious associations with the subjective quality label, which may change significantly under adversarial perturbations. To extract the CPR from each input image, we develop a soft ranking based channel-wise activation function to mediate the causally sufficient (beneficial for high prediction accuracy) and necessary (beneficial for high robustness) deep features, and employ an intervention-based minimax game for optimization. Experiments on four benchmark databases show that the proposed CPRL method outperforms many state-of-the-art adversarial defense methods and provides explicit model interpretation.

Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity

Apr 30, 2024

Lei Wang, Desen Yuan

Abstract:Image quality assessment often relies on raw opinion scores provided by subjects in subjective experiments, which can be noisy and unreliable. To address this issue, postprocessing procedures such as ITU-R BT.500, ITU-T P.910, and ITU-T P.913 have been standardized to clean up the original opinion scores. These methods use annotator-based statistical priors, but they do not take into account extensive information about the image itself, which limits their performance in less annotated scenarios. Generally speaking, image quality datasets usually contain similar scenes or distortions, and subjects inevitably compare images with one another to arrive at a reasonable score. Therefore, in this paper, we propose a subjective image quality score preprocessing method, Perceptual Similarity Subjective Preprocessing (PSP), which exploits the perceptual similarity between images to alleviate subjective bias in less annotated scenarios. Specifically, we model subjective scoring as a conditional probability model based on perceptual similarity with previously scored images, called subconscious reference scoring. The reference images are stored in a neighbor dictionary, which is obtained by a normalized vector dot-product based nearest neighbor search over the images' perceptual depth features. Then the preprocessed score is updated by the exponential moving average (EMA) of the subconscious reference scoring, called similarity regularized EMA. Our experiments on multiple datasets (LIVE, TID2013, CID2013) show that this method can effectively remove the bias of the subjective scores. Additionally, experiments show that the preprocessed dataset can improve the performance of downstream IQA tasks.
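
The two mechanisms named in the abstract above, a normalized dot-product nearest-neighbor search and an EMA-style score update, can be illustrated with a toy sketch. This is not the authors' PSP implementation: the function names, the use of a plain mean over neighbors as the reference score, and the parameters `k` and `alpha` are all assumptions made for illustration, and the feature vectors are assumed to be nonzero.

```python
import math


def normalize(vec):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nearest_neighbors(features, index, k):
    """Return the indices of the k items most similar to features[index],
    ranked by normalized dot product (cosine similarity)."""
    query = normalize(features[index])
    sims = []
    for j, feat in enumerate(features):
        if j == index:
            continue  # an image is not its own reference
        sim = sum(a * b for a, b in zip(query, normalize(feat)))
        sims.append((sim, j))
    sims.sort(reverse=True)
    return [j for _, j in sims[:k]]


def ema_preprocess(raw_scores, features, k=1, alpha=0.5):
    """Blend each raw score with the mean raw score of its k nearest
    neighbors (hypothetical stand-in for a reference score)."""
    smoothed = []
    for i, score in enumerate(raw_scores):
        neighbors = nearest_neighbors(features, i, k)
        reference = sum(raw_scores[j] for j in neighbors) / len(neighbors)
        # EMA-style update: alpha keeps the raw score, (1 - alpha) pulls
        # it toward the scores of perceptually similar images.
        smoothed.append(alpha * score + (1 - alpha) * reference)
    return smoothed


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
scores = [4.0, 2.0, 1.0]
print(ema_preprocess(scores, features, k=1, alpha=0.5))
```

In this toy input the first two images have nearly parallel feature vectors, so their divergent raw scores are pulled toward each other, while the third image is only mildly adjusted toward its closest neighbor.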
