Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hao Cheng

Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels

Jan 21, 2025

Yaxuan Wang, Hao Cheng, Jing Xiong, Qingsong Wen, Han Jia, Ruixuan Song, Liyuan Zhang, Zhaowei Zhu, Yang Liu

Figure 1 for Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels

Figure 2 for Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels

Figure 3 for Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels

Figure 4 for Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels

Abstract:Detecting anomalies in temporal data has gained significant attention across various real-world applications, aiming to identify unusual events and mitigate potential hazards. In practice, situations often involve a mix of segment-level labels (detected abnormal events with segments of time points) and unlabeled data (undetected events), while the ideal algorithmic outcome should be point-level predictions. Therefore, the huge label information gap between training data and targets makes the task challenging. In this study, we formulate the above imperfect information as noisy labels and propose NRdetector, a noise-resilient framework that incorporates confidence-based sample selection, robust segment-level learning, and data-centric point-level detection for multivariate time series anomaly detection. Particularly, to bridge the information gap between noisy segment-level labels and missing point-level labels, we develop a novel loss function that can effectively mitigate the label noise and consider the temporal features. It encourages the smoothness of consecutive points and the separability of points from segments with different labels. Extensive experiments on real-world multivariate time series datasets with 11 different evaluation metrics demonstrate that NRdetector consistently achieves robust results across multiple real-world datasets, outperforming various baselines adapted to operate in our setting.

* Accepted by 2025 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'25)

Via

Access Paper or Ask Questions

Pre-training, Fine-tuning and Re-ranking: A Three-Stage Framework for Legal Question Answering

Dec 27, 2024

Shiwen Ni, Hao Cheng, Min Yang

Figure 1 for Pre-training, Fine-tuning and Re-ranking: A Three-Stage Framework for Legal Question Answering

Figure 2 for Pre-training, Fine-tuning and Re-ranking: A Three-Stage Framework for Legal Question Answering

Figure 3 for Pre-training, Fine-tuning and Re-ranking: A Three-Stage Framework for Legal Question Answering

Abstract:Legal question answering (QA) has attracted increasing attention from people seeking legal advice, which aims to retrieve the most applicable answers from a large-scale database of question-answer pairs. Previous methods mainly use a dual-encoder architecture to learn dense representations of both questions and answers. However, these methods could suffer from lacking domain knowledge and sufficient labeled training data. In this paper, we propose a three-stage (\underline{p}re-training, \underline{f}ine-tuning and \underline{r}e-ranking) framework for \underline{l}egal \underline{QA} (called PFR-LQA), which promotes the fine-grained text representation learning and boosts the performance of dense retrieval with the dual-encoder architecture. Concretely, we first conduct domain-specific pre-training on legal questions and answers through a self-supervised training objective, allowing the pre-trained model to be adapted to the legal domain. Then, we perform task-specific fine-tuning of the dual-encoder on legal question-answer pairs by using the supervised learning objective, leading to a high-quality dual-encoder for the specific downstream QA task. Finally, we employ a contextual re-ranking objective to further refine the output representations of questions produced by the document encoder, which uses contextual similarity to increase the discrepancy between the anchor and hard negative samples for better question re-ranking. We conduct extensive experiments on a manually annotated legal QA dataset. Experimental results show that our PFR-LQA method achieves better performance than the strong competitors for legal question answering.

* ICASSP 2025

Via

Access Paper or Ask Questions

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

Dec 27, 2024

Jiang Liu, Bolin Li, Haoyuan Li, Tianwei Lin, Wenqiao Zhang, Tao Zhong, Zhelun Yu, Jinghao Wei, Hao Cheng, Hao Jiang(+4 more)

Abstract:Efficient multimodal large language models (EMLLMs), in contrast to multimodal large language models (MLLMs), reduce model size and computational costs and are often deployed on resource-constrained devices. However, due to data privacy concerns, existing open-source EMLLMs rarely have access to private domain-specific data during the pre-training process, making them difficult to directly apply in device-specific domains, such as certain business scenarios. To address this weakness, this paper focuses on the efficient adaptation of EMLLMs to private domains, specifically in two areas: 1) how to reduce data requirements, and 2) how to avoid parameter fine-tuning. Specifically, we propose a tun\textbf{\underline{I}}ng-free, a\textbf{\underline{D}}aptiv\textbf{\underline{E}}, univers\textbf{\underline{AL}} \textbf{\underline{Prompt}} Optimization Framework, abbreviated as \textit{\textbf{\ourmethod{}}} which consists of two stages: 1) Predefined Prompt, based on the reinforcement searching strategy, generate a prompt optimization strategy tree to acquire optimization priors; 2) Prompt Reflection initializes the prompt based on optimization priors, followed by self-reflection to further search and refine the prompt. By doing so, \ourmethod{} elegantly generates the ``ideal prompts'' for processing private domain-specific data. Note that our method requires no parameter fine-tuning and only a small amount of data to quickly adapt to the data distribution of private data. Extensive experiments across multiple tasks demonstrate that our proposed \ourmethod{} significantly improves both efficiency and performance compared to baselines.

Via

Access Paper or Ask Questions

Uncovering Vision Modality Threats in Image-to-Image Tasks

Dec 07, 2024

Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu

Figure 1 for Uncovering Vision Modality Threats in Image-to-Image Tasks

Figure 2 for Uncovering Vision Modality Threats in Image-to-Image Tasks

Figure 3 for Uncovering Vision Modality Threats in Image-to-Image Tasks

Figure 4 for Uncovering Vision Modality Threats in Image-to-Image Tasks

Abstract:Current image generation models can effortlessly produce high-quality, highly realistic images, but this also increases the risk of misuse. In various Text-to-Image or Image-to-Image tasks, attackers can generate a series of images containing inappropriate content by simply editing the language modality input. Currently, to prevent this security threat, the various guard or defense methods that are proposed also focus on defending the language modality. However, in practical applications, threats in the visual modality, particularly in tasks involving the editing of real-world images, pose greater security risks as they can easily infringe upon the rights of the image owner. Therefore, this paper uses a method named typographic attack to reveal that various image generation models also commonly face threats in the vision modality. Furthermore, we also evaluate the defense performance of various existing methods when facing threats in the vision modality and uncover their ineffectiveness. Finally, we propose the Vision Modal Threats in Image Generation Models (VMT-IGMs) dataset, which would serve as a baseline for evaluating the vision modality vulnerability of various image generation models.

Via

Access Paper or Ask Questions

Reassessing Layer Pruning in LLMs: New Insights and Methods

Nov 23, 2024

Yao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei Zhu

Figure 1 for Reassessing Layer Pruning in LLMs: New Insights and Methods

Figure 2 for Reassessing Layer Pruning in LLMs: New Insights and Methods

Figure 3 for Reassessing Layer Pruning in LLMs: New Insights and Methods

Figure 4 for Reassessing Layer Pruning in LLMs: New Insights and Methods

Abstract:Although large language models (LLMs) have achieved remarkable success across various domains, their considerable scale necessitates substantial computational resources, posing significant challenges for deployment in resource-constrained environments. Layer pruning, as a simple yet effective compression method, removes layers of a model directly, reducing computational overhead. However, what are the best practices for layer pruning in LLMs? Are sophisticated layer selection metrics truly effective? Does the LoRA (Low-Rank Approximation) family, widely regarded as a leading method for pruned model fine-tuning, truly meet expectations when applied to post-pruning fine-tuning? To answer these questions, we dedicate thousands of GPU hours to benchmarking layer pruning in LLMs and gaining insights across multiple dimensions. Our results demonstrate that a simple approach, i.e., pruning the final 25\% of layers followed by fine-tuning the \texttt{lm\_head} and the remaining last three layer, yields remarkably strong performance. Following this guide, we prune Llama-3.1-8B-It and obtain a model that outperforms many popular LLMs of similar size, such as ChatGLM2-6B, Vicuna-7B-v1.5, Qwen1.5-7B and Baichuan2-7B. We release the optimal model weights on Huggingface, and the code is available on GitHub.

Via

Access Paper or Ask Questions

Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Nov 15, 2024

Xiaofeng Zhang, Yihao Quan, Chaochen Gu, Chen Shen, Xiaosong Yuan, Shaotian Yan, Hao Cheng, Kaijie Wu, Jieping Ye

Figure 1 for Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Figure 2 for Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Figure 3 for Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Figure 4 for Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Abstract:The hallucination problem in multimodal large language models (MLLMs) remains a common issue. Although image tokens occupy a majority of the input sequence of MLLMs, there is limited research to explore the relationship between image tokens and hallucinations. In this paper, we analyze the distribution of attention scores for image tokens across each layer and head of the model, revealing an intriguing and common phenomenon: most hallucinations are closely linked to the pattern of attention sinks in the self-attention matrix of image tokens, where shallow layers exhibit dense attention sinks and deeper layers show sparse attention sinks. We further analyze the attention heads of different layers and find that heads with high-density attention sink in the image part play a positive role in alleviating hallucinations. In this paper, we propose a training-free method named \textcolor{red}{\textbf{E}}nhancing \textcolor{red}{\textbf{A}}ttention \textcolor{red}{\textbf{H}}eads (EAH), an approach designed to enhance the convergence of image tokens attention sinks in the shallow layers. EAH identifies the attention head that shows the vision sink in a shallow layer and extracts its attention matrix. This attention map is then broadcast to other heads in the layer, thereby strengthening the layer to pay more attention to the image itself. With extensive experiments, EAH shows significant hallucination-mitigating performance on different MLLMs and metrics, proving its effectiveness and generality.

Via

Access Paper or Ask Questions

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Nov 08, 2024

Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng

Figure 1 for Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Figure 2 for Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Figure 3 for Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Figure 4 for Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Abstract:Large language models (LMs) are typically adapted to improve performance on new contexts (\eg text prompts that define new tasks or domains) through fine-tuning or prompting. However, there is an accuracy compute tradeoff -- fine-tuning incurs significant training cost and prompting increases inference overhead. We introduce $GenerativeAdapter$, an effective and efficient adaptation method that directly maps new contexts to low-rank LM adapters, thereby significantly reducing inference overhead with no need for finetuning. The adapter generator is trained via self-supervised learning, and can be used to adapt a single frozen LM for any new task simply by mapping the associated task or domain context to a new adapter. We apply $GenerativeAdapter$ to two pretrained LMs (Mistral-7B-Instruct and Llama2-7B-Chat) and evaluate the adapted models in three adaption scenarios: knowledge acquisition from documents, learning from demonstrations, and personalization for users. In StreamingQA, our approach is effective in injecting knowledge into the LM's parameters, achieving a 63.5% improvement in F1 score over the model with supervised fine-tuning (from $19.5$ to $31.5$) for contexts as long as 32K tokens. In the MetaICL in-context learning evaluation, our method achieves an average accuracy of $44.9$ across 26 tasks, outperforming the base model. On MSC, our method proves to be highly competitive in memorizing user information from conversations with a 4x reduction in computation and memory costs compared to prompting with full conversation history. Together, these results suggest that $GenerativeAdapter$ should allow for general adaption to a wide range of different contexts.

Via

Access Paper or Ask Questions

Explainable few-shot learning workflow for detecting invasive and exotic tree species

Nov 01, 2024

Caroline M. Gevaert, Alexandra Aguiar Pedro, Ou Ku, Hao Cheng, Pranav Chandramouli, Farzaneh Dadrass Javan, Francesco Nattino, Sonja Georgievska

Figure 1 for Explainable few-shot learning workflow for detecting invasive and exotic tree species

Figure 2 for Explainable few-shot learning workflow for detecting invasive and exotic tree species

Figure 3 for Explainable few-shot learning workflow for detecting invasive and exotic tree species

Figure 4 for Explainable few-shot learning workflow for detecting invasive and exotic tree species

Abstract:Deep Learning methods are notorious for relying on extensive labeled datasets to train and assess their performance. This can cause difficulties in practical situations where models should be trained for new applications for which very little data is available. While few-shot learning algorithms can address the first problem, they still lack sufficient explanations for the results. This research presents a workflow that tackles both challenges by proposing an explainable few-shot learning workflow for detecting invasive and exotic tree species in the Atlantic Forest of Brazil using Unmanned Aerial Vehicle (UAV) images. By integrating a Siamese network with explainable AI (XAI), the workflow enables the classification of tree species with minimal labeled data while providing visual, case-based explanations for the predictions. Results demonstrate the effectiveness of the proposed workflow in identifying new tree species, even in data-scarce conditions. With a lightweight backbone, e.g., MobileNet, it achieves a F1-score of 0.86 in 3-shot learning, outperforming a shallow CNN. A set of explanation metrics, i.e., correctness, continuity, and contrastivity, accompanied by visual cases, provide further insights about the prediction results. This approach opens new avenues for using AI and UAVs in forest management and biodiversity conservation, particularly concerning rare or under-studied species.

Via

Access Paper or Ask Questions

ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Oct 24, 2024

Xiaodong Yu, Ben Zhou, Hao Cheng, Dan Roth

Figure 1 for ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Figure 2 for ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Abstract:Existing math datasets evaluate the reasoning abilities of large language models (LLMs) by either using the final answer or the intermediate reasoning steps derived from static examples. However, the former approach fails to surface model's uses of shortcuts and wrong reasoning while the later poses challenges in accommodating alternative solutions. In this work, we seek to use symbolic programs as a means for automated evaluation if a model can consistently produce correct final answers across various inputs to the program. We begin by extracting programs for popular math datasets (GSM8K and MATH) using GPT4-o. For those executable programs verified using the original input-output pairs, they are found to encapsulate the proper reasoning required to solve the original text questions. We then prompt GPT4-o to generate new questions using alternative input-output pairs based the extracted program. We apply the resulting datasets to evaluate a collection of LLMs. In our experiments, we observe significant accuracy drops using our proposed evaluation compared with original static examples, suggesting the fragility of math reasoning in state-of-the-art LLMs.

Via

Access Paper or Ask Questions

Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

Oct 24, 2024

Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao

Figure 1 for Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

Figure 2 for Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

Figure 3 for Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

Figure 4 for Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

Abstract:Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where adversarial suffixes crafted by algorithms appended to harmful queries bypass safety alignment and trigger unintended responses. Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100\% ASR on various open-source LLMs. Moreover, it exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3. Beyond improving jailbreak ability, ADV-LLM provides valuable insights for future safety alignment research through its ability to generate large datasets for studying LLM safety. Our code is available at: https://github.com/SunChungEn/ADV-LLM

* 18 pages

Via

Access Paper or Ask Questions