Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shu Wu

Open Problems and a Hypothetical Path Forward in LLM Knowledge Paradigms

Apr 09, 2025

Xiaotian Ye, Mengqi Zhang, Shu Wu

Abstract:Knowledge is fundamental to the overall capabilities of Large Language Models (LLMs). The knowledge paradigm of a model, which dictates how it encodes and utilizes knowledge, significantly affects its performance. Despite the continuous development of LLMs under existing knowledge paradigms, issues within these frameworks continue to constrain model potential. This blog post highlight three critical open problems limiting model capabilities: (1) challenges in knowledge updating for LLMs, (2) the failure of reverse knowledge generalization (the reversal curse), and (3) conflicts in internal knowledge. We review recent progress made in addressing these issues and discuss potential general solutions. Based on observations in these areas, we propose a hypothetical paradigm based on Contextual Knowledge Scaling, and further outline implementation pathways that remain feasible within contemporary techniques. Evidence suggests this approach holds potential to address current shortcomings, serving as our vision for future model paradigms. This blog post aims to provide researchers with a brief overview of progress in LLM knowledge systems, while provide inspiration for the development of next-generation model architectures.

* Blog post preprint, work in progress

Via

Access Paper or Ask Questions

Personalized Text Generation with Contrastive Activation Steering

Mar 07, 2025

Jinghao Zhang, Yuting Liu, Wenjie Wang, Qiang Liu, Shu Wu, Liang Wang, Tat-Seng Chua

Figure 1 for Personalized Text Generation with Contrastive Activation Steering

Figure 2 for Personalized Text Generation with Contrastive Activation Steering

Figure 3 for Personalized Text Generation with Contrastive Activation Steering

Figure 4 for Personalized Text Generation with Contrastive Activation Steering

Abstract:Personalized text generation aims to infer users' writing style preferences from their historical texts and generate outputs that faithfully reflect these stylistic characteristics. Existing solutions primarily adopt two paradigms: retrieval-augmented generation (RAG) and parameter-efficient fine-tuning (PEFT). While these approaches have advanced the field, they suffer from two critical limitations: (1) the entanglement of content semantics and stylistic patterns in historical texts impedes accurate modeling of user-specific writing preferences; and (2) scalability challenges arising from both RAG's inference latency by retrieval operations and PEFT's parameter storage requirements for per user model. To overcome these limitations, we propose StyleVector, a training-free framework that disentangles and represents personalized writing style as a vector in LLM's activation space, enabling style-steered generation during inference without requiring costly retrieval or parameter storage. Comprehensive experiments demonstrate that our framework achieves a significant 8% relative improvement in personalized generation while reducing storage requirements by 1700 times over PEFT method.

Via

Access Paper or Ask Questions

MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

Feb 22, 2025

Liang Wang, Shaozhen Liu, Yu Rong, Deli Zhao, Qiang Liu, Shu Wu

Abstract:Establishing the relationship between 3D structures and the energy states of molecular systems has proven to be a promising approach for learning 3D molecular representations. However, existing methods are limited to modeling the molecular energy states from classical mechanics. This limitation results in a significant oversight of quantum mechanical effects, such as quantized (discrete) energy level structures, which offer a more accurate estimation of molecular energy and can be experimentally measured through energy spectra. In this paper, we propose to utilize the energy spectra to enhance the pre-training of 3D molecular representations (MolSpectra), thereby infusing the knowledge of quantum mechanics into the molecular representations. Specifically, we propose SpecFormer, a multi-spectrum encoder for encoding molecular spectra via masked patch reconstruction. By further aligning outputs from the 3D encoder and spectrum encoder using a contrastive objective, we enhance the 3D encoder's understanding of molecules. Evaluations on public benchmarks reveal that our pre-trained representations surpass existing methods in predicting molecular properties and modeling dynamics.

* Accepted by ICLR 2025

Via

Access Paper or Ask Questions

STRIVE: Structured Reasoning for Self-Improvement in Claim Verification

Feb 17, 2025

Haisong Gong, Jing Li, Junfei Wu, Qiang Liu, Shu Wu, Liang Wang

Abstract:Claim verification is the task of determining whether a claim is supported or refuted by evidence. Self-improvement methods, where reasoning chains are generated and those leading to correct results are selected for training, have succeeded in tasks like mathematical problem solving. However, in claim verification, this approach struggles. Low-quality reasoning chains may falsely match binary truth labels, introducing faulty reasoning into the self-improvement process and ultimately degrading performance. To address this, we propose STRIVE: Structured Reasoning for Self-Improved Verification. Our method introduces a structured reasoning design with Claim Decomposition, Entity Analysis, and Evidence Grounding Verification. These components improve reasoning quality, reduce errors, and provide additional supervision signals for self-improvement. STRIVE begins with a warm-up phase, where the base model is fine-tuned on a small number of annotated examples to learn the structured reasoning design. It is then applied to generate reasoning chains for all training examples, selecting only those that are correct and structurally sound for subsequent self-improvement training. We demonstrate that STRIVE achieves significant improvements over baseline models, with a 31.4% performance gain over the base model and 20.7% over Chain of Thought on the HOVER datasets, highlighting its effectiveness.

Via

Access Paper or Ask Questions

Diffusion Models for Molecules: A Survey of Methods and Tasks

Feb 13, 2025

Liang Wang, Chao Song, Zhiyuan Liu, Yu Rong, Qiang Liu, Shu Wu

Figure 1 for Diffusion Models for Molecules: A Survey of Methods and Tasks

Figure 2 for Diffusion Models for Molecules: A Survey of Methods and Tasks

Figure 3 for Diffusion Models for Molecules: A Survey of Methods and Tasks

Figure 4 for Diffusion Models for Molecules: A Survey of Methods and Tasks

Abstract:Generative tasks about molecules, including but not limited to molecule generation, are crucial for drug discovery and material design, and have consistently attracted significant attention. In recent years, diffusion models have emerged as an impressive class of deep generative models, sparking extensive research and leading to numerous studies on their application to molecular generative tasks. Despite the proliferation of related work, there remains a notable lack of up-to-date and systematic surveys in this area. Particularly, due to the diversity of diffusion model formulations, molecular data modalities, and generative task types, the research landscape is challenging to navigate, hindering understanding and limiting the area's growth. To address this, this paper conducts a comprehensive survey of diffusion model-based molecular generative methods. We systematically review the research from the perspectives of methodological formulations, data modalities, and task types, offering a novel taxonomy. This survey aims to facilitate understanding and further flourishing development in this area. The relevant papers are summarized at: https://github.com/AzureLeon1/awesome-molecular-diffusion-models.

Via

Access Paper or Ask Questions

Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models

Jan 17, 2025

Qiang Liu, Xinlong Chen, Yue Ding, Shizhen Xu, Shu Wu, Liang Wang

Abstract:Hallucination has emerged as a significant barrier to the effective application of Large Language Models (LLMs). In this work, we introduce a novel Attention-Guided SElf-Reflection (AGSER) approach for zero-shot hallucination detection in LLMs. The AGSER method utilizes attention contributions to categorize the input query into attentive and non-attentive queries. Each query is then processed separately through the LLMs, allowing us to compute consistency scores between the generated responses and the original answer. The difference between the two consistency scores serves as a hallucination estimator. In addition to its efficacy in detecting hallucinations, AGSER notably reduces computational complexity, requiring only three passes through the LLM and utilizing two sets of tokens. We have conducted extensive experiments with four widely-used LLMs across three different hallucination benchmarks, demonstrating that our approach significantly outperforms existing methods in zero-shot hallucination detection.

Via

Access Paper or Ask Questions

Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate

Dec 06, 2024

Mingqing Zhang, Haisong Gong, Qiang Liu, Shu Wu, Liang Wang

Figure 1 for Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate

Figure 2 for Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate

Figure 3 for Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate

Figure 4 for Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate

Abstract:The rapid spread of rumors on social media platforms during breaking events severely hinders the dissemination of the truth. Previous studies reveal that the lack of annotated resources hinders the direct detection of unforeseen breaking events not covered in yesterday's news. Leveraging large language models (LLMs) for rumor detection holds significant promise. However, it is challenging for LLMs to provide comprehensive responses to complex or controversial issues due to limited diversity. In this work, we propose the Stance Separated Multi-Agent Debate (S2MAD) to address this issue. Specifically, we firstly introduce Stance Separation, categorizing comments as either supporting or opposing the original claim. Subsequently, claims are classified as subjective or objective, enabling agents to generate reasonable initial viewpoints with different prompt strategies for each type of claim. Debaters then follow specific instructions through multiple rounds of debate to reach a consensus. If a consensus is not reached, a judge agent evaluates the opinions and delivers a final verdict on the claim's veracity. Extensive experiments conducted on two real-world datasets demonstrate that our proposed model outperforms state-of-the-art methods in terms of performance and effectively improves the performance of LLMs in breaking event rumor detection.

Via

Access Paper or Ask Questions

GOT4Rec: Graph of Thoughts for Sequential Recommendation

Nov 22, 2024

Zewen Long, Liang Wang, Shu Wu, Qiang Liu

Figure 1 for GOT4Rec: Graph of Thoughts for Sequential Recommendation

Figure 2 for GOT4Rec: Graph of Thoughts for Sequential Recommendation

Figure 3 for GOT4Rec: Graph of Thoughts for Sequential Recommendation

Figure 4 for GOT4Rec: Graph of Thoughts for Sequential Recommendation

Abstract:With the advancement of large language models (LLMs), researchers have explored various methods to optimally leverage their comprehension and generation capabilities in sequential recommendation scenarios. However, several challenges persist in this endeavor. Firstly, most existing approaches rely on the input-output prompting paradigm, which can result in irrelevant or inaccurate responses. Secondly, while there have been attempts to enhance LLMs using prompting strategies such as chain-of-thought (CoT), these efforts have not fully harnessed the reasoning abilities of LLMs or effectively captured the multifaceted information contained within user sequences. To address these limitations, we propose GOT4Rec, a sequential recommendation method that utilizes the graph of thoughts (GoT) prompting strategy. Specifically, we identify and utilize three key types of information within user history sequences: short-term interests, long-term interests and collaborative information from other users. Our approach enables LLMs to independently reason and generate recommendations based on these distinct types of information, subsequently aggregating the results within the GoT framework to derive the final recommended items. This method allows LLMs, with enhanced reasoning capabilities, to more effectively consider the diverse information within user sequences, resulting in more accurate recommendations and more comprehensive explanations. Extensive experiments on real-world datasets demonstrate the effectiveness of GOT4Rec, indicating that it outperforms existing state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/GOT4Rec-ED99.

Via

Access Paper or Ask Questions

Playing Language Game with LLMs Leads to Jailbreaking

Nov 16, 2024

Yu Peng, Zewen Long, Fangming Dong, Congyi Li, Shu Wu, Kai Chen

Figure 1 for Playing Language Game with LLMs Leads to Jailbreaking

Figure 2 for Playing Language Game with LLMs Leads to Jailbreaking

Figure 3 for Playing Language Game with LLMs Leads to Jailbreaking

Figure 4 for Playing Language Game with LLMs Leads to Jailbreaking

Abstract:The advent of large language models (LLMs) has spurred the development of numerous jailbreak techniques aimed at circumventing their security defenses against malicious attacks. An effective jailbreak approach is to identify a domain where safety generalization fails, a phenomenon known as mismatched generalization. In this paper, we introduce two novel jailbreak methods based on mismatched generalization: natural language games and custom language games, both of which effectively bypass the safety mechanisms of LLMs, with various kinds and different variants, making them hard to defend and leading to high attack rates. Natural language games involve the use of synthetic linguistic constructs and the actions intertwined with these constructs, such as the Ubbi Dubbi language. Building on this phenomenon, we propose the custom language games method: by engaging with LLMs using a variety of custom rules, we successfully execute jailbreak attacks across multiple LLM platforms. Extensive experiments demonstrate the effectiveness of our methods, achieving success rates of 93% on GPT-4o, 89% on GPT-4o-mini and 83% on Claude-3.5-Sonnet. Furthermore, to investigate the generalizability of safety alignments, we fine-tuned Llama-3.1-70B with the custom language games to achieve safety alignment within our datasets and found that when interacting through other language games, the fine-tuned models still failed to identify harmful content. This finding indicates that the safety alignment knowledge embedded in LLMs fails to generalize across different linguistic formats, thus opening new avenues for future research in this area.

Via

Access Paper or Ask Questions

Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

Nov 02, 2024

Liang Wang, Qiang Liu, Shaozhen Liu, Xin Sun, Shu Wu

Figure 1 for Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

Figure 2 for Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

Figure 3 for Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

Figure 4 for Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

Abstract:Molecular property prediction (MPP) is integral to drug discovery and material science, but often faces the challenge of data scarcity in real-world scenarios. Addressing this, few-shot molecular property prediction (FSMPP) has been developed. Unlike other few-shot tasks, FSMPP typically employs a pre-trained molecular encoder and a context-aware classifier, benefiting from molecular pre-training and molecular context information. Despite these advancements, existing methods struggle with the ineffective fine-tuning of pre-trained encoders. We attribute this issue to the imbalance between the abundance of tunable parameters and the scarcity of labeled molecules, and the lack of contextual perceptiveness in the encoders. To overcome this hurdle, we propose a parameter-efficient in-context tuning method, named Pin-Tuning. Specifically, we propose a lightweight adapter for pre-trained message passing layers (MP-Adapter) and Bayesian weight consolidation for pre-trained atom/bond embedding layers (Emb-BWC), to achieve parameter-efficient tuning while preventing over-fitting and catastrophic forgetting. Additionally, we enhance the MP-Adapters with contextual perceptiveness. This innovation allows for in-context tuning of the pre-trained encoder, thereby improving its adaptability for specific FSMPP tasks. When evaluated on public datasets, our method demonstrates superior tuning with fewer trainable parameters, improving few-shot predictive performance.

* Accepted by NeurIPS 2024

Via

Access Paper or Ask Questions