Lu Liu

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Feb 24, 2025

Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang(+1 more)

Abstract: This paper comprehensively evaluates several recently proposed optimizers for 4-bit training, revealing that low-bit precision amplifies sensitivity to learning rates and often causes unstable gradient norms, leading to divergence at higher learning rates. Among these, SPAM, a recent optimizer featuring momentum reset and spike-aware gradient clipping, achieves the best performance across various bit levels, but struggles to stabilize gradient norms, requiring careful learning rate tuning. To address these limitations, we propose Stable-SPAM, which incorporates enhanced gradient normalization and clipping techniques. In particular, Stable-SPAM (1) adaptively updates the clipping threshold for spiked gradients by tracking their historical maxima; (2) normalizes the entire gradient matrix based on its historical $l_2$-norm statistics; and (3) inherits momentum reset from SPAM to periodically reset the first and second moments of Adam, mitigating the accumulation of spiked gradients. Extensive experiments show that Stable-SPAM effectively stabilizes gradient norms in 4-bit LLM training, delivering superior performance compared to Adam and SPAM. Notably, our 4-bit LLaMA-1B model trained with Stable-SPAM outperforms the BF16 LLaMA-1B trained with Adam by up to 2 perplexity. Furthermore, when both models are trained in 4-bit, Stable-SPAM achieves the same loss as Adam while requiring only about half the training steps. Code is available at https://github.com/TianjinYellow/StableSPAM.git.
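
The first two ingredients named in the abstract (a clipping threshold that follows the historical maxima, and rescaling by historical $l_2$-norm statistics) can be illustrated with a small sketch. This is not the authors' code: the class name `GradientStabilizer`, the decay rates, the bias correction, and the exact update rules are assumptions made for illustration, and the gradient is treated as a flat list rather than a matrix. Momentum reset, the third ingredient, is left out.

```python
import math


class GradientStabilizer:
    """Hypothetical sketch of adaptive clipping plus norm-based rescaling."""

    def __init__(self, gamma_max=0.7, gamma_m=0.9, gamma_v=0.999, eps=1e-8):
        # All decay rates are assumed values, not taken from the paper.
        self.gamma_max, self.gamma_m, self.gamma_v = gamma_max, gamma_m, gamma_v
        self.eps = eps
        self.max_ema = 0.0   # running average of the largest |g| seen per step
        self.norm_m = 0.0    # running average of the gradient l2-norm
        self.norm_v = 0.0    # running average of the squared l2-norm
        self.t = 0

    def __call__(self, grad):
        self.t += 1
        # (1) Track the historical maximum and clip entries that exceed it.
        peak = max(abs(g) for g in grad)
        self.max_ema = self.gamma_max * self.max_ema + (1 - self.gamma_max) * peak
        threshold = self.max_ema / (1 - self.gamma_max ** self.t)
        grad = [math.copysign(threshold, g) if abs(g) > threshold else g
                for g in grad]
        # (2) Rescale the whole gradient using its norm history.
        norm = math.sqrt(sum(g * g for g in grad))
        self.norm_m = self.gamma_m * self.norm_m + (1 - self.gamma_m) * norm
        self.norm_v = self.gamma_v * self.norm_v + (1 - self.gamma_v) * norm * norm
        m_hat = self.norm_m / (1 - self.gamma_m ** self.t)
        v_hat = self.norm_v / (1 - self.gamma_v ** self.t)
        scale = m_hat / (math.sqrt(v_hat) + self.eps) / (norm + self.eps)
        return [g * scale for g in grad]
```

In this sketch a steady gradient stream comes out with a norm close to 1, while a sudden spike is first clipped and then scaled down further because the squared-norm history reacts more strongly than the norm history. The stabilized gradient would then be passed to an Adam-style update.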

Causal Learning for Heterogeneous Subgroups Based on Nonlinear Causal Kernel Clustering

Jan 20, 2025

Lu Liu, Yang Tang, Kexuan Zhang, Qiyu Sun

Abstract: Due to the challenge posed by multi-source and heterogeneous data collected from diverse environments, causal relationships among features can exhibit variations influenced by different time spans, regions, or strategies. This diversity makes a single causal model inadequate for accurately representing complex causal relationships in all observational data, a crucial consideration in causal learning. To address this challenge, we introduce the nonlinear Causal Kernel Clustering method designed for heterogeneous subgroup causal learning, illuminating variations in causal relationships across diverse subgroups. It comprises two primary components. First, the construction of a sample mapping function forms the basis of the subsequent nonlinear causal kernel. This function assesses the differences in potential nonlinear causal relationships in various samples, supported by our causal identifiability theory. Second, a nonlinear causal kernel is proposed for clustering heterogeneous subgroups. Experimental results showcase the exceptional performance of our method in accurately identifying heterogeneous subgroups and effectively enhancing causal learning, leading to a great reduction in prediction error.

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Jan 12, 2025

Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu

Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks, yet their training remains highly resource-intensive and susceptible to critical challenges such as training instability. A predominant source of this instability stems from gradient and loss spikes, which disrupt the learning process, often leading to costly interventions like checkpoint recovery and experiment restarts, further amplifying inefficiencies. This paper presents a comprehensive investigation into gradient spikes observed during LLM training, revealing their prevalence across multiple architectures and datasets. Our analysis shows that these spikes can be up to $1000\times$ larger than typical gradients, substantially deteriorating model performance. To address this issue, we propose Spike-Aware Adam with Momentum Reset (SPAM), a novel optimizer designed to counteract gradient spikes through momentum reset and spike-aware gradient clipping. Extensive experiments, including both pre-training and fine-tuning, demonstrate that SPAM consistently surpasses Adam and its variants across various tasks, including (1) LLM pre-training from 60M to 1B, (2) 4-bit LLM pre-training, (3) reinforcement learning, and (4) time series forecasting. Additionally, SPAM facilitates memory-efficient training by enabling sparse momentum, where only a subset of momentum terms are maintained and updated. When operating under memory constraints, SPAM outperforms state-of-the-art memory-efficient optimizers such as GaLore and Adam-Mini. Our work underscores the importance of mitigating gradient spikes in LLM training and introduces an effective optimization strategy that enhances both training stability and resource efficiency at scale. Code is available at https://github.com/TianjinYellow/SPAM-Optimizer.git
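
The two mechanisms named in the abstract, momentum reset and spike-aware gradient clipping, can be sketched on top of a plain Adam update. The following is an illustrative toy, not the released optimizer: the class name `ToySpam`, the spike test (a squared gradient exceeding `theta` times the bias-corrected second moment), and the values of `theta` and `reset_interval` are assumptions.

```python
import math


def spike_aware_clip(grad, second_moment, theta=50.0):
    """Clip entries whose square exceeds theta times the second-moment estimate."""
    clipped = []
    for g, v in zip(grad, second_moment):
        if v > 0.0 and g * g > theta * v:
            # Keep the sign, shrink the magnitude to the spike boundary.
            clipped.append(math.copysign(math.sqrt(theta * v), g))
        else:
            clipped.append(g)
    return clipped


class ToySpam:
    """Adam-style update with periodic moment reset (hypothetical sketch)."""

    def __init__(self, size, beta1=0.9, beta2=0.999, reset_interval=500, theta=50.0):
        self.m = [0.0] * size
        self.v = [0.0] * size
        self.beta1, self.beta2 = beta1, beta2
        self.reset_interval = reset_interval
        self.theta = theta
        self.t = 0  # steps taken since the last reset

    def update(self, grad, lr=1e-3, eps=1e-8):
        # Momentum reset: discard both moments every reset_interval steps.
        if self.t >= self.reset_interval:
            self.m = [0.0] * len(self.m)
            self.v = [0.0] * len(self.v)
            self.t = 0
        # Spike-aware clipping against the bias-corrected second moment.
        if self.t > 0:
            v_hat = [v / (1 - self.beta2 ** self.t) for v in self.v]
            grad = spike_aware_clip(grad, v_hat, self.theta)
        self.t += 1
        self.m = [self.beta1 * m + (1 - self.beta1) * g for m, g in zip(self.m, grad)]
        self.v = [self.beta2 * v + (1 - self.beta2) * g * g for v, g in zip(self.v, grad)]
        bc1 = 1 - self.beta1 ** self.t
        bc2 = 1 - self.beta2 ** self.t
        # Return the parameter delta; the caller adds it to the weights.
        return [-lr * (m / bc1) / (math.sqrt(v / bc2) + eps)
                for m, v in zip(self.m, self.v)]
```

Restarting the bias correction from the reset step keeps the first update after a reset on the same scale as the very first update of training. A real implementation would likely also need a short learning-rate warmup after each reset, which this sketch omits.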

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Dec 26, 2024

Huiyu Duan, Qiang Hu, Jiarui Wang, Liu Yang, Zitong Xu, Lu Liu, Xiongkuo Min, Chunlei Cai, Tianxiao Ye, Xiaoyun Zhang(+1 more)

Abstract: The rapid growth of user-generated content (UGC) videos has produced an urgent need for effective video quality assessment (VQA) algorithms to monitor video quality and guide optimization and recommendation procedures. However, current VQA models generally only give an overall rating for a UGC video, which lacks fine-grained labels for serving video processing and recommendation applications. To address the challenges and promote the development of UGC videos, we establish the first large-scale Fine-grained Video quality assessment Database, termed FineVD, which comprises 6104 UGC videos with fine-grained quality scores and descriptions across multiple dimensions. Based on this database, we propose a Fine-grained Video Quality assessment (FineVQ) model to learn the fine-grained quality of UGC videos, with the capabilities of quality rating, quality scoring, and quality attribution. Extensive experimental results demonstrate that our proposed FineVQ can produce fine-grained video-quality results and achieve state-of-the-art performance on FineVD and other commonly used UGC-VQA datasets. Both FineVD and FineVQ will be made publicly available.

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration

Dec 17, 2024

Lu Liu, Huiyu Duan, Qiang Hu, Liu Yang, Chunlei Cai, Tianxiao Ye, Huayu Liu, Xiaoyun Zhang, Guangtao Zhai

Abstract: Artificial intelligence generative models exhibit remarkable capabilities in content creation, particularly in face image generation, customization, and restoration. However, current AI-generated faces (AIGFs) often fall short of human preferences due to unique distortions, unrealistic details, and unexpected identity shifts, underscoring the need for a comprehensive quality evaluation framework for AIGFs. To address this need, we introduce FaceQ, a large-scale, comprehensive database of AI-generated Face images with fine-grained Quality annotations reflecting human preferences. The FaceQ database comprises 12,255 images generated by 29 models across three tasks: (1) face generation, (2) face customization, and (3) face restoration. It includes 32,742 mean opinion scores (MOSs) from 180 annotators, assessed across multiple dimensions: quality, authenticity, identity (ID) fidelity, and text-image correspondence. Using the FaceQ database, we establish F-Bench, a benchmark for comparing and evaluating face generation, customization, and restoration models, highlighting strengths and weaknesses across various prompts and evaluation dimensions. Additionally, we assess the performance of existing image quality assessment (IQA), face quality assessment (FQA), AI-generated content image quality assessment (AIGCIQA), and preference evaluation metrics, showing that these standard metrics are relatively ineffective in evaluating authenticity, ID fidelity, and text-image correspondence. The FaceQ database will be publicly available upon publication.

Vision-based Tactile Image Generation via Contact Condition-guided Diffusion Model

Dec 02, 2024

Xi Lin, Weiliang Xu, Yixian Mao, Jing Wang, Meixuan Lv, Lu Liu, Xihui Luo, Xinming Li

Abstract: Vision-based tactile sensors, through high-resolution optical measurements, can effectively perceive the geometric shape of objects and the force information during the contact process, thus helping robots acquire higher-dimensional tactile data. Vision-based tactile sensor simulation supports the acquisition and understanding of tactile information without physical sensors by accurately capturing and analyzing contact behavior and physical properties. However, the complexity of contact dynamics and lighting modeling limits the accurate reproduction of real sensor responses in simulations, making it difficult to meet the needs of different sensor setups and affecting the reliability and effectiveness of strategy transfer to practical applications. In this letter, we propose a contact-condition guided diffusion model that maps RGB images of objects and contact force data to high-fidelity, detail-rich vision-based tactile sensor images. Evaluations show that the three-channel tactile images generated by this method achieve a 60.58% reduction in mean squared error and a 38.1% reduction in marker displacement error compared to existing approaches based on lighting model and mechanical model, validating the effectiveness of our approach. The method is successfully applied to various types of tactile vision sensors and can effectively generate corresponding tactile images under complex loads. Additionally, it demonstrates outstanding reconstruction of fine texture features of objects in a Montessori tactile board texture generation task.

Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models

Nov 18, 2024

Muhammad Saad Zia, Ashiq Anjum, Lu Liu, Anthony Conway, Anasol Pena Rios

Abstract: Physics Informed Machine Learning has emerged as a popular approach in modelling and simulation for digital twins to generate accurate models of processes and behaviours of real-world systems. However, despite their success in generating accurate and reliable models, the existing methods either use simple regularizations in loss functions to offer limited physics integration or are too specific in architectural definitions to be generalized to a wide variety of physical systems. This paper presents a generic approach based on a novel physics-encoded residual neural network architecture to combine data-driven and physics-based analytical models to address these limitations. Our method combines physics blocks as mathematical operators from physics-based models with learning blocks comprising feed-forward layers. Intermediate residual blocks are incorporated for stable gradient flow as they train on physical system observation data. This way, the model learns to comply with the geometric and kinematic aspects of the physical system. Compared to conventional neural network-based methods, our method improves generalizability with substantially lower data requirements and model complexity in terms of parameters, especially in scenarios where prior physics knowledge is either elementary or incomplete. We investigate our approach in two application domains. The first is a basic robotic motion model using Euler Lagrangian equations of motion as physics prior. The second application is a complex scenario of a steering model for a self-driving vehicle in a simulation. In both applications, our method outperforms both conventional neural network-based approaches as well as state-of-the-art Physics Informed Machine Learning methods.

Symbol Level Precoding for Systems with Improper Gaussian Interference

Sep 11, 2024

Lu Liu, Rang Liu, Ly V. Nguyen, A. Lee Swindlehurst

Abstract: This paper focuses on precoding design in multi-antenna systems with improper Gaussian interference (IGI), characterized by correlated real and imaginary parts. We first study block level precoding (BLP) and symbol level precoding (SLP) assuming the receivers apply a pre-whitening filter to decorrelate and normalize the IGI. We then shift to the scenario where the base station (BS) incorporates the IGI statistics in the SLP design, which allows the receivers to employ a standard detection algorithm without pre-whitening. Finally, we address the case where the channel and statistics of the IGI are unknown, and we formulate robust BLP and SLP designs that minimize the worst case performance in such settings. Interestingly, we show that for BLP, the worst-case IGI is in fact proper, while for SLP the worst case occurs when the interference signal is maximally improper, with fully correlated real and imaginary parts. Numerical results reveal the superior performance of SLP in terms of symbol error rate (SER) and energy efficiency (EE), especially for the case where there is uncertainty in the non-circularity of the jammer.

* 13 pages, 12 figures
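
A scalar toy example may help make "improper" and "pre-whitening" concrete. For a zero-mean complex variable z, the pseudo-variance E[z^2] is zero when z is proper and nonzero when the real and imaginary parts are correlated or have unequal power. The sketch below is an illustration under simplifying assumptions, not the paper's multi-antenna filter: it uses a single complex scalar, sample second moments, and hypothetical helper names (`correlated_samples`, `pseudo_variance`, `prewhiten`).

```python
import math
import random


def correlated_samples(n, rho, rng):
    """Draw complex samples whose real and imaginary parts have correlation rho."""
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append(complex(x, y))
    return out


def pseudo_variance(samples):
    """Sample estimate of E[z^2]; close to zero for proper signals."""
    return sum(z * z for z in samples) / len(samples)


def prewhiten(samples):
    """Decorrelate and normalize real/imaginary parts via a 2x2 Cholesky factor."""
    n = len(samples)
    sxx = sum(z.real * z.real for z in samples) / n
    syy = sum(z.imag * z.imag for z in samples) / n
    sxy = sum(z.real * z.imag for z in samples) / n
    # Cholesky factor [[a, 0], [b, c]] of the matrix [[sxx, sxy], [sxy, syy]].
    # Assumes the parts are not perfectly correlated, so c is well defined.
    a = math.sqrt(sxx)
    b = sxy / a
    c = math.sqrt(syy - b * b)
    whitened = []
    for z in samples:
        x = z.real / a
        y = (z.imag - b * x) / c
        whitened.append(complex(x, y))
    return whitened


if __name__ == "__main__":
    rng = random.Random(0)
    interference = correlated_samples(5000, 0.8, rng)
    print("before:", abs(pseudo_variance(interference)))
    print("after: ", abs(pseudo_variance(prewhiten(interference))))
```

After whitening, the real and imaginary parts are uncorrelated with equal power, so the sample pseudo-variance vanishes up to rounding error and a standard detector designed for proper noise becomes applicable.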

An Empirical Study on Information Extraction using Large Language Models

Sep 04, 2024

Ridong Han, Chaohao Yang, Tao Peng, Prayag Tiwari, Xiang Wan, Lu Liu, Benyou Wang

Abstract: Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstrate the latest representative progress in LLMs' information extraction ability, we assess the information extraction ability of GPT-4 (the latest version of GPT at the time of writing this paper) from four perspectives: Performance, Evaluation Criteria, Robustness, and Error Types. Our results suggest a visible performance gap between GPT-4 and state-of-the-art (SOTA) IE methods. To alleviate this problem, considering the LLMs' human-like characteristics, we propose and analyze the effects of a series of simple prompt-based methods, which can be generalized to other LLMs and NLP tasks. Rich experiments show our methods' effectiveness and some of their remaining issues in improving GPT-4's information extraction ability.

* This article has an original arXiv version entitled "Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors", available at arXiv/2305.14450

Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters

Feb 21, 2024

Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian

Abstract: Animating virtual characters has always been a fundamental research problem in virtual reality (VR). Facial animations play a crucial role as they effectively convey emotions and attitudes of virtual humans. However, creating such facial animations can be challenging, as current methods often involve utilization of expensive motion capture devices or significant investments of time and effort from human animators in tuning animation parameters. In this paper, we propose a holistic solution to automatically animate virtual human faces. In our solution, a deep learning model was first trained to retarget the facial expression from input face images to virtual human faces by estimating the blendshape coefficients. This method offers the flexibility of generating animations with characters of different appearances and blendshape topologies. Second, a practical toolkit was developed using Unity 3D, making it compatible with the most popular VR applications. The toolkit accepts both image and video as input to animate the target virtual human faces and enables users to manipulate the animation results. Furthermore, inspired by the spirit of Human-in-the-loop (HITL), we leveraged user feedback to further improve the performance of the model and toolkit, thereby increasing the customization properties to suit user preferences. The whole solution, for which we will make the code public, has the potential to accelerate the generation of facial animations for use in VR applications.

* 9 pages. To appear in IEEE-VR
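
For readers unfamiliar with blendshape coefficients, the standard linear blendshape model computes each vertex as the neutral position plus a weighted sum of per-shape offsets. The sketch below shows only that generic formula, not the paper's model or toolkit; the function name, the shape names, and the clamping of weights to [0, 1] are illustrative assumptions.

```python
def apply_blendshapes(neutral, deltas, weights):
    """Return vertices of neutral + sum_i w_i * delta_i (linear blendshape model).

    neutral: list of (x, y, z) vertex positions of the neutral face.
    deltas:  dict mapping a shape name to per-vertex (dx, dy, dz) offsets.
    weights: dict mapping a shape name to its coefficient.
    """
    result = [list(vertex) for vertex in neutral]
    for name, weight in weights.items():
        weight = min(1.0, max(0.0, weight))  # assumed convention: clamp to [0, 1]
        for vertex, delta in zip(result, deltas[name]):
            for axis in range(3):
                vertex[axis] += weight * delta[axis]
    return [tuple(vertex) for vertex in result]


if __name__ == "__main__":
    # Two-vertex toy mesh with two hypothetical shapes.
    neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    deltas = {
        "smile": [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0)],
        "jaw_open": [(0.0, -2.0, 0.0), (0.0, 0.0, 0.0)],
    }
    print(apply_blendshapes(neutral, deltas, {"smile": 0.5, "jaw_open": 0.25}))
```

Because the model is linear in the coefficients, a network that predicts one coefficient per shape can drive any character that exposes the same named shapes, regardless of its mesh.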
