Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Chen

Soochow University

FedDTPT: Federated Discrete and Transferable Prompt Tuning for Black-Box Large Language Models

Nov 01, 2024

Jiaqi Wu, Simin Chen, Yuzhe Yang, Yijiang Li, Shiyue Hou, Rui Jing, Zehua Wang, Wei Chen, Zijian Tian

Abstract:In recent years, large language models (LLMs) have significantly advanced the field of natural language processing (NLP). By fine-tuning LLMs with data from specific scenarios, these foundation models can better adapt to various downstream tasks. However, the fine-tuning process poses privacy leakage risks, particularly in centralized data processing scenarios. To address user privacy concerns, federated learning (FL) has been introduced to mitigate the risks associated with centralized data collection from multiple sources. Nevertheless, the privacy of LLMs themselves is equally critical, as potential malicious attacks challenge their security, an issue that has received limited attention in current research. Consequently, establishing a trusted multi-party model fine-tuning environment is essential. Additionally, the local deployment of large LLMs incurs significant storage costs and high computational demands. To address these challenges, we propose for the first time a federated discrete and transferable prompt tuning, namely FedDTPT, for black-box large language models. In the client optimization phase, we adopt a token-level discrete prompt optimization method that leverages a feedback loop based on prediction accuracy to drive gradient-free prompt optimization through the MLM API. For server optimization, we employ an attention mechanism based on semantic similarity to filter all local prompt tokens, along with an embedding distance elbow detection and DBSCAN clustering strategy to enhance the filtering process. Experimental results demonstrate that, compared to state-of-the-art methods, our approach achieves higher accuracy, reduced communication overhead, and robustness to non-iid data in a black-box setting. Moreover, the optimized prompts are transferable.

Via

Access Paper or Ask Questions

Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Oct 31, 2024

Jianqun Zhou, Yuanlei Zheng, Wei Chen, Qianqian Zheng, Zeyuan Shang, Wei Zhang, Rui Meng, Xiaoyu Shen

Figure 1 for Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Figure 2 for Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Figure 3 for Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Figure 4 for Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Abstract:Instruction-following capabilities in large language models (LLMs) have significantly progressed, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances, most of them still relies on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance, which neglects the importance of customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce novel metrics -- Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE) to accurately assess the models' responsiveness to instructions. Our findings reveal that while reranking models generally surpass retrieval models in instruction following, they still face challenges in handling certain attributes. Moreover, although instruction fine-tuning and increased model size lead to better performance, most models fall short of achieving comprehensive instruction compliance as assessed by our benchmark.

Via

Access Paper or Ask Questions

RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

Oct 31, 2024

Zhenbiao Cao, Yuanlei Zheng, Zhihao Fan, Xiaojin Zhang, Wei Chen

Figure 1 for RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

Figure 2 for RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

Figure 3 for RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

Figure 4 for RSL-SQL: Robust Schema Linking in Text-to-SQL Generation

Abstract:Text-to-SQL generation aims to translate natural language questions into SQL statements. In large language models (LLMs) based Text-to-SQL, schema linking is a widely adopted strategy to streamline the input for LLMs by selecting only relevant schema elements, therefore reducing noise and computational overhead. However, schema linking faces risks that requires caution, including the potential omission of necessary elements and disruption of database structural integrity. To address these challenges, we propose a novel framework called RSL-SQL that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction. Our approach improves the recall of schema linking through forward and backward pruning and hedges the risk by voting between full schema and contextual information augmented simplified schema. Experiments on the BIRD and Spider benchmarks demonstrate that our approach achieves state-of-the-art execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on Spider using GPT-4o. Furthermore, our approach outperforms a series of GPT-4 based Text-to-SQL systems when adopting DeepSeek (much cheaper) with same intact prompts. Extensive analysis and ablation studies confirm the effectiveness of each component in our framework. The codes are available at https://github.com/Laqcce-cao/RSL-SQL.

Via

Access Paper or Ask Questions

CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense

Oct 30, 2024

Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng

Abstract:Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions only based on the label-causative factors. Concretely, we propose a casual diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of casual factors by learning towards a novel casual information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).

* accepted by NeurIPS 2024

Via

Access Paper or Ask Questions

Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment

Oct 29, 2024

Jiaqi Wu, Simin Chen, Zehua Wang, Wei Chen, Zijian Tian, F. Richard Yu, Victor C. M. Leung

Abstract:As the volume of image data grows, data-oriented cloud computing in Internet of Video Things (IoVT) systems encounters latency issues. Task-oriented edge computing addresses this by shifting data analysis to the edge. However, limited computational power of edge devices poses challenges for executing visual tasks. Existing methods struggle to balance high model performance with low resource consumption; lightweight neural networks often underperform, while device-specific models designed by Neural Architecture Search (NAS) fail to adapt to heterogeneous devices. For these issues, we propose a novel co-design framework to optimize neural network architecture and deployment strategies during inference for high-throughput. Specifically, it implements a dynamic model structure based on re-parameterization, coupled with a Roofline-based model partitioning strategy to enhance the computational performance of edge devices. We also employ a multi-objective co-optimization approach to balance throughput and accuracy. Additionally, we derive mathematical consistency and convergence of partitioned models. Experimental results demonstrate significant improvements in throughput (12.05\% on MNIST, 18.83\% on ImageNet) and superior classification accuracy compared to baseline algorithms. Our method consistently achieves stable performance across different devices, underscoring its adaptability. Simulated experiments further confirm its efficacy in high-accuracy, real-time detection for small objects in IoVT systems.

Via

Access Paper or Ask Questions

KANsformer for Scalable Beamforming

Oct 28, 2024

Xinke Xie, Yang Lu, Chong-Yung Chi, Wei Chen, Bo Ai, Dusit Niyato

Figure 1 for KANsformer for Scalable Beamforming

Figure 2 for KANsformer for Scalable Beamforming

Figure 3 for KANsformer for Scalable Beamforming

Figure 4 for KANsformer for Scalable Beamforming

Abstract:This paper proposes an unsupervised deep-learning (DL) approach by integrating transformer and Kolmogorov-Arnold networks (KAN) termed KANsformer to realize scalable beamforming for mobile communication systems. Specifically, we consider a classic multi-input-single-output energy efficiency maximization problem subject to the total power budget. The proposed KANsformer first extracts hidden features via a multi-head self-attention mechanism and then reads out the desired beamforming design via KAN. Numerical results are provided to evaluate the KANsformer in terms of generalization performance, transfer learning and ablation experiment. Overall, the KANsformer outperforms existing benchmark DL approaches, and is adaptable to the change in the number of mobile users with real-time and near-optimal inference.

Via

Access Paper or Ask Questions

FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Oct 23, 2024

Wei Chen, Meng Yuan, Zhao Zhang, Ruobing Xie, Fuzhen Zhuang, Deqing Wang, Rui Liu

Figure 1 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Figure 2 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Figure 3 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Figure 4 for FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning

Abstract:As trustworthy AI continues to advance, the fairness issue in recommendations has received increasing attention. A recommender system is considered unfair when it produces unequal outcomes for different user groups based on user-sensitive attributes (e.g., age, gender). Some researchers have proposed data augmentation-based methods aiming at alleviating user-level unfairness by altering the skewed distribution of training data among various user groups. Despite yielding promising results, they often rely on fairness-related assumptions that may not align with reality, potentially reducing the data quality and negatively affecting model effectiveness. To tackle this issue, in this paper, we study how to implement high-quality data augmentation to improve recommendation fairness. Specifically, we propose FairDgcl, a dynamic graph adversarial contrastive learning framework aiming at improving fairness in recommender system. First, FairDgcl develops an adversarial contrastive network with a view generator and a view discriminator to learn generating fair augmentation strategies in an adversarial style. Then, we propose two dynamic, learnable models to generate contrastive views within contrastive learning framework, which automatically fine-tune the augmentation strategies. Meanwhile, we theoretically show that FairDgcl can simultaneously generate enhanced representations that possess both fairness and accuracy. Lastly, comprehensive experiments conducted on four real-world datasets demonstrate the effectiveness of the proposed FairDgcl.

* 12 pages, submitted to TKDE

Via

Access Paper or Ask Questions

Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting

Oct 16, 2024

Wei Chen, Yuxuan Liang

Figure 1 for Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting

Figure 2 for Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting

Figure 3 for Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting

Figure 4 for Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting

Abstract:The widespread deployment of sensing devices leads to a surge in data for spatio-temporal forecasting applications such as traffic flow, air quality, and wind energy. Although spatio-temporal graph neural networks have achieved success in modeling various static spatio-temporal forecasting scenarios, real-world spatio-temporal data are typically received in a streaming manner, and the network continuously expands with the installation of new sensors. Thus, spatio-temporal forecasting in streaming scenarios faces dual challenges: the inefficiency of retraining models over newly arrived data and the detrimental effects of catastrophic forgetting over long-term history. To address these challenges, we propose a novel prompt tuning-based continuous forecasting method, following two fundamental tuning principles guided by empirical and theoretical analysis: expand and compress, which effectively resolve the aforementioned problems with lightweight tuning parameters. Specifically, we integrate the base spatio-temporal graph neural network with a continuous prompt pool, utilizing stored prompts (i.e., few learnable parameters) in memory, and jointly optimize them with the base spatio-temporal graph neural network. This method ensures that the model sequentially learns from the spatio-temporal data stream to accomplish tasks for corresponding periods. Extensive experimental results on multiple real-world datasets demonstrate the multi-faceted superiority of our method over the state-of-the-art baselines, including effectiveness, efficiency, universality, etc.

Via

Access Paper or Ask Questions

Learning to Customize Text-to-Image Diffusion In Diverse Context

Oct 14, 2024

Taewook Kim, Wei Chen, Qiang Qiu

Figure 1 for Learning to Customize Text-to-Image Diffusion In Diverse Context

Figure 2 for Learning to Customize Text-to-Image Diffusion In Diverse Context

Figure 3 for Learning to Customize Text-to-Image Diffusion In Diverse Context

Figure 4 for Learning to Customize Text-to-Image Diffusion In Diverse Context

Abstract:Most text-to-image customization techniques fine-tune models on a small set of \emph{personal concept} images captured in minimal contexts. This often results in the model becoming overfitted to these training images and unable to generalize to new contexts in future text prompts. Existing customization methods are built on the success of effectively representing personal concepts as textual embeddings. Thus, in this work, we resort to diversifying the context of these personal concepts \emph{solely} within the textual space by simply creating a contextually rich set of text prompts, together with a widely used self-supervised learning objective. Surprisingly, this straightforward and cost-effective method significantly improves semantic alignment in the textual space, and this effect further extends to the image space, resulting in higher prompt fidelity for generated images. Additionally, our approach does not require any architectural modifications, making it highly compatible with existing text-to-image customization methods. We demonstrate the broad applicability of our approach by combining it with four different baseline methods, achieving notable CLIP score improvements.

Via

Access Paper or Ask Questions

ChartKG: A Knowledge-Graph-Based Representation for Chart Images

Oct 13, 2024

Zhiguang Zhou, Haoxuan Wang, Zhengqing Zhao, Fengling Zheng, Yongheng Wang, Wei Chen, Yong Wang

Figure 1 for ChartKG: A Knowledge-Graph-Based Representation for Chart Images

Figure 2 for ChartKG: A Knowledge-Graph-Based Representation for Chart Images

Figure 3 for ChartKG: A Knowledge-Graph-Based Representation for Chart Images

Figure 4 for ChartKG: A Knowledge-Graph-Based Representation for Chart Images

Abstract:Chart images, such as bar charts, pie charts, and line charts, are explosively produced due to the wide usage of data visualizations. Accordingly, knowledge mining from chart images is becoming increasingly important, which can benefit downstream tasks like chart retrieval and knowledge graph completion. However, existing methods for chart knowledge mining mainly focus on converting chart images into raw data and often ignore their visual encodings and semantic meanings, which can result in information loss for many downstream tasks. In this paper, we propose ChartKG, a novel knowledge graph (KG) based representation for chart images, which can model the visual elements in a chart image and semantic relations among them including visual encodings and visual insights in a unified manner. Further, we develop a general framework to convert chart images to the proposed KG-based representation. It integrates a series of image processing techniques to identify visual elements and relations, e.g., CNNs to classify charts, yolov5 and optical character recognition to parse charts, and rule-based methods to construct graphs. We present four cases to illustrate how our knowledge-graph-based representation can model the detailed visual elements and semantic relations in charts, and further demonstrate how our approach can benefit downstream applications such as semantic-aware chart retrieval and chart question answering. We also conduct quantitative evaluations to assess the two fundamental building blocks of our chart-to-KG framework, i.e., object recognition and optical character recognition. The results provide support for the usefulness and effectiveness of ChartKG.

Via

Access Paper or Ask Questions