Xiang Wang

Victor

BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

Jul 12, 2024

Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

Abstract: Spiking neural networks (SNNs), which mimic the biological neural system by conveying information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-steps) have recently emerged. Nevertheless, because it is difficult to derive a precise gradient estimate for discrete spikes with learning-based methods, a distinct accuracy gap persists between SNNs and their artificial neural network (ANN) counterparts. To address this issue, we propose a blurred knowledge distillation (BKD) technique, which leverages randomly blurred SNN features to restore and imitate the ANN features. Note that our BKD is applied to the feature map right before the last layer of the SNN, and it can also be combined with prior logits-based knowledge distillation for a maximized accuracy boost. To the best of our knowledge, in the category of learning-based methods, our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets. On the ImageNet dataset, BKDSNN outperforms prior best results by 4.51% and 0.93% with CNN and Transformer network topologies, respectively.
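
As a rough illustration of the surrogate gradient idea mentioned in the abstract, the sketch below pairs a hard-threshold spike in the forward pass with a smooth stand-in derivative for the backward pass. It is not the BKDSNN method: the sigmoid-shaped surrogate, the leaky integrate-and-fire update with hard reset, and all names and default values (`spike_forward`, `spike_surrogate_grad`, `lif_run`, `tau`, `alpha`, `threshold`) are assumptions chosen for illustration.

```python
import math


def spike_forward(v, threshold=1.0):
    """Forward pass: emit a spike (1.0) once the membrane potential reaches the threshold."""
    return 1.0 if v >= threshold else 0.0


def spike_surrogate_grad(v, threshold=1.0, alpha=4.0):
    """Backward pass stand-in: derivative of sigmoid(alpha * (v - threshold)).

    The true derivative of the step function is zero almost everywhere,
    so a smooth surrogate is used instead (one common choice, assumed here).
    """
    s = 1.0 / (1.0 + math.exp(-alpha * (v - threshold)))
    return alpha * s * (1.0 - s)


def lif_run(inputs, tau=2.0, threshold=1.0):
    """Run a single leaky integrate-and-fire neuron over a sequence of input currents."""
    v = 0.0
    spikes = []
    for x in inputs:
        v = v + (x - v) / tau          # leaky integration toward the input
        s = spike_forward(v, threshold)
        spikes.append(s)
        v = v * (1.0 - s)              # hard reset to 0 after a spike
    return spikes


if __name__ == "__main__":
    print(lif_run([1.5, 1.5, 1.5, 1.5]))
    print(spike_surrogate_grad(1.0))
```

The number of entries in `inputs` plays the role of the time-step count: fewer steps mean lower inference latency but a coarser spike code.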

$β$-DPO: Direct Preference Optimization with Dynamic $β$

Jul 11, 2024

Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\beta$, as well as to the quality of the preference data. We analyze the impact of $\beta$ and data quality on DPO, uncovering that optimal $\beta$ values vary with the informativeness of pairwise data. Addressing the limitations of static $\beta$ values, we introduce a novel framework that dynamically calibrates $\beta$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $\beta$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $\beta$ adjustment technique significantly improves DPO's performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at https://github.com/junkangwu/beta-DPO.
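
To make the role of $\beta$ concrete, here is a minimal sketch of the standard per-pair DPO loss with a fixed $\beta$. It does not implement the dynamic, batch-level calibration or the data filtering proposed in the paper. The function names (`softplus`, `dpo_loss`) and the example log-probabilities are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model (illustrative inputs).
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Same pair, two beta values: a larger beta amplifies the margin.
    for beta in (0.1, 0.5):
        print(beta, dpo_loss(-10.0, -12.0, -11.0, -11.0, beta))
```

With a fixed pair, changing `beta` rescales the margin inside the sigmoid, which is one way to see why the loss is sensitive to this parameter.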

Disentangling Masked Autoencoders for Unsupervised Domain Generalization

Jul 10, 2024

An Zhang, Han Wang, Xiang Wang, Tat-Seng Chua

Abstract: Domain Generalization (DG), designed to enhance out-of-distribution (OOD) generalization, is all about learning invariance against domain shifts utilizing sufficient supervision signals. Yet, the scarcity of such labeled data has led to the rise of unsupervised domain generalization (UDG), a more important yet challenging task in which models are trained across diverse domains in an unsupervised manner and eventually tested on unseen domains. UDG is fast gaining attention but is still far from well-studied. To close the research gap, we propose a novel learning framework designed for UDG, termed the Disentangled Masked Auto Encoder (DisMAE), aiming to discover the disentangled representations that faithfully reveal the intrinsic features and superficial variations without access to the class label. At its core is the distillation of domain-invariant semantic features, which cannot be distinguished by a domain classifier, while filtering out the domain-specific variations (for example, color schemes and texture patterns) that are unstable and redundant. Notably, DisMAE co-trains the asymmetric dual-branch architecture with semantic and lightweight variation encoders, offering dynamic data manipulation and representation-level augmentation capabilities. Extensive experiments on four benchmark datasets (i.e., DomainNet, PACS, VLCS, Colored MNIST) with both DG and UDG tasks demonstrate that DisMAE can achieve competitive OOD performance compared with the state-of-the-art DG and UDG baselines, which sheds light on a potential research line for improving generalization ability with large-scale unlabeled data.

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Jul 10, 2024

Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

Abstract:This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $\beta$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $\beta'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.

Language Models Encode Collaborative Signals in Recommendation

Jul 07, 2024

Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, Tat-Seng Chua

Abstract:Recent studies empirically indicate that language models (LMs) encode rich world knowledge beyond mere semantics, attracting significant attention across various fields. However, in the recommendation domain, it remains uncertain whether LMs implicitly encode user preference information. Contrary to the prevailing understanding that LMs and traditional recommender models learn two distinct representation spaces due to a huge gap in language and behavior modeling objectives, this work rethinks such understanding and explores extracting a recommendation space directly from the language representation space. Surprisingly, our findings demonstrate that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance. This outcome suggests the homomorphism between the language representation space and an effective recommendation space, implying that collaborative signals may indeed be encoded within advanced LMs. Motivated by these findings, we propose a simple yet effective collaborative filtering (CF) model named AlphaRec, which utilizes language representations of item textual metadata (e.g., titles) instead of traditional ID-based embeddings. Specifically, AlphaRec is comprised of three main components: a multilayer perceptron (MLP), graph convolution, and contrastive learning (CL) loss function, making it extremely easy to implement and train. Our empirical results show that AlphaRec outperforms leading ID-based CF models on multiple datasets, marking the first instance of such a recommender with text embeddings achieving this level of performance. Moreover, AlphaRec introduces a new language-representation-based CF paradigm with several desirable advantages: being easy to implement, lightweight, rapid convergence, superior zero-shot recommendation abilities in new domains, and being aware of user intention.

* Codes are available at https://github.com/LehengTHU/AlphaRec

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

Jul 04, 2024

Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu

Abstract:Large language models (LLMs) have made impressive progress in handling simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employs the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed as DotaMath. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, obtaining fine-grained feedback from the code interpreter, and engaging in self-reflection and correction. By annotating diverse interactive tool-use trajectories and employing query evolution on GSM8K and MATH datasets, we generate an instruction fine-tuning dataset called DotaMathQA with 574K query-response pairs. We train a series of base LLMs using imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared to open-source LLMs across various in-domain and out-of-domain benchmarks. Notably, DotaMath-deepseek-7B showcases an outstanding performance of 64.8% on the competitive MATH dataset and 86.7% on GSM8K. Besides, DotaMath-deepseek-7B maintains strong competitiveness on a series of in-domain and out-of-domain benchmarks (Avg. 80.1%). Looking forward, we anticipate that the DotaMath paradigm will open new pathways for addressing intricate mathematical problems. Our code is publicly available at https://github.com/ChengpengLi1003/DotaMath.

* Work in progress

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

Jun 18, 2024

Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

Abstract:Towards open-ended Video Anomaly Detection (VAD), existing methods often exhibit biased detection when faced with challenging or unseen events and lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations. Firstly, towards unbiased and explainable VAD system, we construct the first large-scale multimodal VAD instruction-tuning benchmark, i.e., VAD-Instruct50k. This dataset is created using a carefully designed semi-automatic labeling paradigm. Efficient single-frame annotations are applied to the collected untrimmed videos, which are then synthesized into high-quality analyses of both abnormal and normal video clips using a robust off-the-shelf video captioner and a large language model (LLM). Building upon the VAD-Instruct50k dataset, we develop a customized solution for interpretable video anomaly detection. We train a lightweight temporal sampler to select frames with high anomaly response and fine-tune a multimodal large language model (LLM) to generate explanatory content. Extensive experimental results validate the generality and interpretability of the proposed Holmes-VAD, establishing it as a novel interpretable technique for real-world video anomaly analysis. To support the community, our benchmark and model will be publicly available at https://github.com/pipixin321/HolmesVAD.

* 19 pages, 9 figures

On Softmax Direct Preference Optimization for Recommendation

Jun 14, 2024

Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

Abstract:Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-tuning LM with a language modeling loss. However, the current objective fails to fully leverage preference data and is not optimized for personalized ranking tasks, which hinders the performance of LM-based recommenders. Inspired by the current advancement of Direct Preference Optimization (DPO) in human preference alignment and the success of softmax loss in recommendations, we propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives, rather than solely focusing on positives. Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders, connected to softmax sampling strategies. Theoretically, we bridge S-DPO with the softmax loss over negative sampling and find that it has a side effect of mining hard negatives, which assures its exceptional capabilities in recommendation tasks. Empirically, extensive experiments conducted on three real-world datasets demonstrate the superiority of S-DPO to effectively model user preference and further boost recommendation performance while mitigating the data likelihood decline issue of DPO. Our codes are available at https://github.com/chenyuxin1999/S-DPO.

Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

Jun 09, 2024

Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

Abstract:Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona management, which enable reasoning for appropriate long-term dialogue responses. Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. For the event memory module, long and short-term memory banks are employed to separately focus on historical and ongoing sessions, while a topic-based retrieval mechanism is introduced to enhance the accuracy of memory retrieval. Furthermore, the persona module conducts dynamic persona modeling for both users and agents. The integration of retrieved memories and extracted personas is subsequently fed into the generator to induce appropriate responses. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated across various illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.

* 17 pages, 4 figures

SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Jun 05, 2024

Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He

Abstract: Spiking neural networks (SNNs) have attracted great attention due to their high efficiency and accuracy. Currently, ANN-to-SNN conversion methods can obtain SNNs with accuracy on par with ANNs at ultra-low latency (8 time-steps) for CNN structures on computer vision (CV) tasks. However, although Transformer-based networks have achieved prevailing precision on both CV and natural language processing (NLP) tasks, Transformer-based SNNs still suffer from lower accuracy than their ANN counterparts. In this work, we introduce a novel ANN-to-SNN conversion method called SpikeZIP-TF, where the ANN and SNN are exactly equivalent, thus incurring no accuracy degradation. SpikeZIP-TF achieves 83.82% accuracy on a CV dataset (ImageNet) and 93.79% accuracy on an NLP dataset (SST-2), which are higher than SOTA Transformer-based SNNs. The code is available on GitHub: https://github.com/Intelligent-Computing-Research-Group/SpikeZIP_transformer

* * These authors contributed equally to this work
