Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yang Xiang

Rethinking Membership Inference Attacks Against Transfer Learning

Jan 20, 2025

Cong Wu, Jing Chen, Qianru Fang, Kun He, Ziming Zhao, Hao Ren, Guowen Xu, Yang Liu, Yang Xiang

Abstract:Transfer learning, successful in knowledge translation across related tasks, faces a substantial privacy threat from membership inference attacks (MIAs). These attacks, despite posing significant risk to ML model's training data, remain limited-explored in transfer learning. The interaction between teacher and student models in transfer learning has not been thoroughly explored in MIAs, potentially resulting in an under-examined aspect of privacy vulnerabilities within transfer learning. In this paper, we propose a new MIA vector against transfer learning, to determine whether a specific data point was used to train the teacher model while only accessing the student model in a white-box setting. Our method delves into the intricate relationship between teacher and student models, analyzing the discrepancies in hidden layer representations between the student model and its shadow counterpart. These identified differences are then adeptly utilized to refine the shadow model's training process and to inform membership inference decisions effectively. Our method, evaluated across four datasets in diverse transfer learning tasks, reveals that even when an attacker only has access to the student model, the teacher model's training data remains susceptible to MIAs. We believe our work unveils the unexplored risk of membership inference in transfer learning.

Via

Access Paper or Ask Questions

Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Jan 19, 2025

Jingran Xie, Shun Lei, Yue Yu, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu

Figure 1 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Figure 2 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Figure 3 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Figure 4 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Abstract:Empathetic dialogue is crucial for natural human-computer interaction, allowing the dialogue system to respond in a more personalized and emotionally aware manner, improving user satisfaction and engagement. The emergence of large language models (LLMs) has revolutionized dialogue generation by harnessing their powerful capabilities and shown its potential in multimodal domains. Many studies have integrated speech with text-based LLMs to take speech question as input and output text response. However, the lack of spoken question-answering datasets that include speech style information to supervised fine-tuning (SFT) limits the performance of these systems. As a result, while these systems excel at understanding speech content, they often struggle to generate empathetic responses. In response, we propose a novel approach that circumvents the need for question-answering data, called Listen, Perceive, and Express (LPE). Our method employs a two-stage training process, initially guiding the LLM to listen the content and perceive the emotional aspects of speech. Subsequently, we utilize Chain-of-Thought (CoT) prompting to unlock the model's potential for expressing empathetic responses based on listened spoken content and perceived emotional cues. We employ experiments to prove the effectiveness of proposed method. To our knowledge, this is the first attempt to leverage CoT for speech-based dialogue.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Dec 18, 2024

Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Min Zhang

Figure 1 for Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Figure 2 for Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Figure 3 for Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Figure 4 for Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Abstract:Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across diverse tasks. Despite great success, recent studies show that LVLMs encounter substantial limitations when engaging with visual graphs. To study the reason behind these limitations, we propose VGCure, a comprehensive benchmark covering 22 tasks for examining the fundamental graph understanding and reasoning capacities of LVLMs. Extensive evaluations conducted on 14 LVLMs reveal that LVLMs are weak in basic graph understanding and reasoning tasks, particularly those concerning relational or structurally complex information. Based on this observation, we propose a structure-aware fine-tuning framework to enhance LVLMs with structure learning abilities through 3 self-supervised learning tasks. Experiments validate the effectiveness of our method in improving LVLMs' zero-shot performance on fundamental graph learning tasks, as well as enhancing the robustness of LVLMs against complex visual graphs.

Via

Access Paper or Ask Questions

LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Tasks

Dec 17, 2024

Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

Abstract:Large language models (LLMs) have demonstrated impressive multilingual understanding and reasoning capabilities, driven by extensive pre-training multilingual corpora and fine-tuning instruction data. However, a performance gap persists between high-resource and low-resource language tasks due to language imbalance in the pre-training corpus, even using more low-resource data during fine-tuning. To alleviate this issue, we propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language tasks. An additional language alignment layer is first integrated into the LLM to adapt a pre-trained multilingual encoder, thereby enhancing multilingual alignment through code-switched fine-tuning. The second stage fine-tunes LLM with English-only instruction data while freezing the language alignment layer, allowing LLM to transfer task-specific capabilities from English to low-resource language tasks. Additionally, we introduce the Multilingual Math World Problem (MMWP) benchmark, which spans 21 low-resource, 17 medium-resource, and 10 high-resource languages, enabling comprehensive evaluation of multilingual reasoning. Experimental results show that LinguaLIFT outperforms several competitive baselines across MMWP and other widely used benchmarks.

Via

Access Paper or Ask Questions

A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Oct 29, 2024

Yang Xiang, Li Fan, Tulika Saha, Yushan Pan, Haiyang Zhang, Chengtao Ji

Figure 1 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Figure 2 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Figure 3 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Figure 4 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Abstract:Graph clustering is an essential aspect of network analysis that involves grouping nodes into separate clusters. Recent developments in deep learning have resulted in advanced deep graph clustering techniques, which have proven effective in many applications. Nonetheless, these methods often encounter difficulties when dealing with the complexities of real-world graphs, particularly in the presence of noisy edges. Additionally, many denoising graph clustering strategies tend to suffer from lower performance compared to their non-denoised counterparts, training instability, and challenges in scaling to large datasets. To tackle these issues, we introduce a new framework called the Dual Adaptive Assignment Approach for Robust Graph-Based Clustering (RDSA). RDSA consists of three key components: (i) a node embedding module that effectively integrates the graph's topological features and node attributes; (ii) a structure-based soft assignment module that improves graph modularity by utilizing an affinity matrix for node assignments; and (iii) a node-based soft assignment module that identifies community landmarks and refines node assignments to enhance the model's robustness. We assess RDSA on various real-world datasets, demonstrating its superior performance relative to existing state-of-the-art methods. Our findings indicate that RDSA provides robust clustering across different graph types, excelling in clustering effectiveness and robustness, including adaptability to noise, stability, and scalability.

Via

Access Paper or Ask Questions

Beware of Calibration Data for Pruning Large Language Models

Oct 23, 2024

Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

Figure 1 for Beware of Calibration Data for Pruning Large Language Models

Figure 2 for Beware of Calibration Data for Pruning Large Language Models

Figure 3 for Beware of Calibration Data for Pruning Large Language Models

Figure 4 for Beware of Calibration Data for Pruning Large Language Models

Abstract:As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and only needs a small amount of calibration data to assess the importance of parameters. Previous research has primarily focused on designing advanced pruning methods, while different calibration data's impact on pruning performance still lacks systematical exploration. We fill this blank and surprisingly observe that the effects of calibration data even value more than designing advanced pruning strategies, especially for high sparsity. Our preliminary exploration also discloses that using calibration data similar to the training data can yield better performance. As pre-training data is usually inaccessible for advanced LLMs, we further provide a self-generating calibration data synthesis strategy to construct feasible calibration data. We conduct experiments on the recent strong open-source LLMs (e.g., DCLM, and LLaMA-3), and the results show that the proposed method outperforms commonly used calibration data and can effectively enhance strong pruning methods (e.g., Wanda, OWL).

* under review

Via

Access Paper or Ask Questions

LLM-based Translation Inference with Iterative Bilingual Understanding

Oct 17, 2024

Andong Chen, Kehai Chen, Yang Xiang, Xuefeng Bai, Muyun Yang, Tiejun Zhao, Min zhang

Figure 1 for LLM-based Translation Inference with Iterative Bilingual Understanding

Figure 2 for LLM-based Translation Inference with Iterative Bilingual Understanding

Figure 3 for LLM-based Translation Inference with Iterative Bilingual Understanding

Figure 4 for LLM-based Translation Inference with Iterative Bilingual Understanding

Abstract:The remarkable understanding and generation capabilities of large language models (LLMs) have greatly improved translation performance. However, incorrect understanding of the sentence to be translated can degrade translation quality. To address this issue, we proposed a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of LLMs and the dual characteristics of translation tasks. The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately. Furthermore, the dual characteristics allow IBUT to generate effective cross-lingual feedback, iteratively refining contextual understanding, thereby reducing errors and improving translation performance. Experimental results showed that the proposed IBUT outperforms several strong comparison methods, especially being generalized to multiple domains (e.g., news, commonsense, and cultural translation benchmarks).

* Work in progress

Via

Access Paper or Ask Questions

Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Oct 08, 2024

Chuqi Chen, Qixuan Zhou, Yahong Yang, Yang Xiang, Tao Luo

Figure 1 for Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Figure 2 for Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Figure 3 for Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Figure 4 for Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Abstract:Neural network-based methods have emerged as powerful tools for solving partial differential equations (PDEs) in scientific and engineering applications, particularly when handling complex domains or incorporating empirical data. These methods leverage neural networks as basis functions to approximate PDE solutions. However, training such networks can be challenging, often resulting in limited accuracy. In this paper, we investigate the training dynamics of neural network-based PDE solvers with a focus on the impact of initialization techniques. We assess training difficulty by analyzing the eigenvalue distribution of the kernel and apply the concept of effective rank to quantify this difficulty, where a larger effective rank correlates with faster convergence of the training error. Building upon this, we discover through theoretical analysis and numerical experiments that two initialization techniques, partition of unity (PoU) and variance scaling (VS), enhance the effective rank, thereby accelerating the convergence of training error. Furthermore, comprehensive experiments using popular PDE-solving frameworks, such as PINN, Deep Ritz, and the operator learning framework DeepOnet, confirm that these initialization techniques consistently speed up convergence, in line with our theoretical findings.

Via

Access Paper or Ask Questions

CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification

Sep 29, 2024

Xuan Tan, Xun Gong, Yang Xiang

Figure 1 for CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification

Figure 2 for CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification

Figure 3 for CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification

Figure 4 for CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification

Abstract:Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks due to its inherent advantage in generating textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) presents challenges. ICS ReID requires independent identity labeling within each camera, without associations across cameras. This limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Accordingly, two custom modules are designed to guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. Then, we design ICDL to increase inter-class variation by considering the hard positive and hard negative samples within each camera, thereby learning intra-camera finer-grained pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model's ability to predict the camera from which a pedestrian image originates, thus enhancing the model's capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach. Especially, on the challenging MSMT17 dataset, we arrive at 58.9\% in terms of mAP accuracy, surpassing state-of-the-art methods by 7.6\%. Code will be available at: https://github.com/Trangle12/CCAFL.

* Submitted to IEEE TCSVT

Via

Access Paper or Ask Questions

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Sep 29, 2024

Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin

Figure 1 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Figure 2 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Figure 3 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Figure 4 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Abstract:Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that utilizes multimodal CoT to decompose speech translation into sequential steps of speech recognition and translation. We validated the effectiveness of our method on two datasets: the CoVoST-2 dataset and MuST-C dataset. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2). This work is open sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2 .

Via

Access Paper or Ask Questions