Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Huajun Chen

Zhejiang University

LightThinker: Thinking Step-by-Step Compression

Feb 21, 2025

Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang

Abstract:Large language models (LLMs) have shown remarkable performance in complex reasoning tasks, but their efficiency is hindered by the substantial memory and computational costs associated with generating lengthy tokens. In this paper, we propose LightThinker, a novel method that enables LLMs to dynamically compress intermediate thoughts during reasoning. Inspired by human cognitive processes, LightThinker compresses verbose thought steps into compact representations and discards the original reasoning chains, thereby significantly reducing the number of tokens stored in the context window. This is achieved by training the model on when and how to perform compression through data construction, mapping hidden states to condensed gist tokens, and creating specialized attention masks. Additionally, we introduce the Dependency (Dep) metric to quantify the degree of compression by measuring the reliance on historical tokens during generation. Extensive experiments on four datasets and two models show that LightThinker reduces peak memory usage and inference time, while maintaining competitive accuracy. Our work provides a new direction for improving the efficiency of LLMs in complex reasoning tasks without sacrificing performance. Code will be released at https://github.com/zjunlp/LightThinker.

Via

Access Paper or Ask Questions

K-ON: Stacking Knowledge On the Head Layer of Large Language Model

Feb 10, 2025

Lingbing Guo, Yichi Zhang, Zhongpu Bo, Zhuo Chen, Mengshu Sun, Zhiqiang Zhang, Wen Zhang, Huajun Chen

Figure 1 for K-ON: Stacking Knowledge On the Head Layer of Large Language Model

Figure 2 for K-ON: Stacking Knowledge On the Head Layer of Large Language Model

Figure 3 for K-ON: Stacking Knowledge On the Head Layer of Large Language Model

Figure 4 for K-ON: Stacking Knowledge On the Head Layer of Large Language Model

Abstract:Recent advancements in large language models (LLMs) have significantly improved various natural language processing (NLP) tasks. Typically, LLMs are trained to predict the next token, aligning well with many NLP tasks. However, in knowledge graph (KG) scenarios, entities are the fundamental units and identifying an entity requires at least several tokens. This leads to a granularity mismatch between KGs and natural languages. To address this issue, we propose K-ON, which integrates KG knowledge into the LLM by employing multiple head layers for next k-step prediction. K-ON can not only generate entity-level results in one step, but also enables contrastive loss against entities, which is the most powerful tool in KG representation learning. Experimental results show that K-ON outperforms state-of-the-art methods that incorporate text and even the other modalities.

* AAAI 2025 (Oral)

Via

Access Paper or Ask Questions

OntoTune: Ontology-Driven Self-training for Aligning Large Language Models

Feb 08, 2025

Zhiqiang Liu, Chengtao Gan, Junjie Wang, Yichi Zhang, Zhongpu Bo, Mengshu Sun, Huajun Chen, Wen Zhang

Figure 1 for OntoTune: Ontology-Driven Self-training for Aligning Large Language Models

Figure 2 for OntoTune: Ontology-Driven Self-training for Aligning Large Language Models

Figure 3 for OntoTune: Ontology-Driven Self-training for Aligning Large Language Models

Figure 4 for OntoTune: Ontology-Driven Self-training for Aligning Large Language Models

Abstract:Existing domain-specific Large Language Models (LLMs) are typically developed by fine-tuning general-purposed LLMs with large-scale domain-specific corpora. However, training on large-scale corpora often fails to effectively organize domain knowledge of LLMs, leading to fragmented understanding. Inspired by how humans connect concepts and organize knowledge through mind maps, we aim to emulate this approach by using ontology with hierarchical conceptual knowledge to reorganize LLM's domain knowledge. From this perspective, we propose an ontology-driven self-training framework called OntoTune, which aims to align LLMs with ontology through in-context learning, enabling the generation of responses guided by the ontology. We leverage in-context learning to identify whether the LLM has acquired the specific concept's ontology knowledge, and select the entries not yet mastered by LLM as the training set to further align the LLM with ontology. Compared to existing domain LLMs based on newly collected large-scale domain-specific corpora, our OntoTune, which relies on the existing, long-term developed ontology and LLM itself, significantly reduces data maintenance costs and offers improved generalization ability. We conduct our study in the medical domain to evaluate the effectiveness of OntoTune, utilizing a standardized medical ontology, SNOMED CT as our ontology source. Experimental results demonstrate that OntoTune achieves state-of-the-art performance in both in-ontology task hypernym discovery and out-of-ontology task medical domain QA. Moreover, compared to the latest direct ontology injection method TaxoLLaMA, our OntoTune better preserves original knowledge of LLM. The code and data are available at https://github.com/zjukg/OntoTune.

* Accepted by WWW25

Via

Access Paper or Ask Questions

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

Jan 16, 2025

Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Ningyu Zhang, Jiang Yong, Pengjun Xie, Fei Huang, Huajun Chen

Abstract:Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined within the boundaries of the model's predefined scope, limiting the generation of content with rich information. Specifically, vanilla-retrieved information tends to lack depth, utility, and suffers from redundancy, which negatively impacts the quality of generated articles, leading to shallow, repetitive, and unoriginal outputs. To address these issues, we propose OmniThink, a machine writing framework that emulates the human-like process of iterative expansion and reflection. The core idea behind OmniThink is to simulate the cognitive behavior of learners as they progressively deepen their knowledge of the topics. Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth. Human evaluations and expert feedback further highlight the potential of OmniThink to address real-world challenges in the generation of long-form articles.

Via

Access Paper or Ask Questions

A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Jan 15, 2025

Yin Fang, Xinle Deng, Kangwei Liu, Ningyu Zhang, Jingyang Qian, Penghui Yang, Xiaohui Fan, Huajun Chen

Figure 1 for A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Figure 2 for A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Figure 3 for A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Figure 4 for A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Abstract:Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. In the life sciences, single-cell RNA sequencing (scRNA-seq) data serves as the "language of cellular biology", capturing intricate gene expression patterns at the single-cell level. However, interacting with this "language" through conventional tools is often inefficient and unintuitive, posing challenges for researchers. To address these limitations, we present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. We construct a comprehensive multi-modal instruction dataset that pairs text-based instructions with scRNA-seq profiles from diverse tissues and species. Building on this, we develop a multi-modal cell language architecture capable of simultaneously interpreting and processing both modalities. InstructCell empowers researchers to accomplish critical tasks-such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction-using straightforward natural language commands. Extensive evaluations demonstrate that InstructCell consistently meets or exceeds the performance of existing single-cell foundation models, while adapting to diverse experimental conditions. More importantly, InstructCell provides an accessible and intuitive tool for exploring complex single-cell data, lowering technical barriers and enabling deeper biological insights.

* 37 pages; 13 figures; Code: https://github.com/zjunlp/Instructcell, Models: https://huggingface.co/zjunlp/Instructcell-chat, https://huggingface.co/zjunlp/InstructCell-instruct

Via

Access Paper or Ask Questions

Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking

Dec 31, 2024

Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Shaokai Chen, Mengshu Sun, Binbin Hu, Zhiqiang Zhang, Lei Liang, Wen Zhang(+1 more)

Figure 1 for Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking

Figure 2 for Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking

Figure 3 for Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking

Figure 4 for Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking

Abstract:Large language models (LLMs) have demonstrated exceptional performance in text generation within current NLP research. However, the lack of factual accuracy is still a dark cloud hanging over the LLM skyscraper. Structural knowledge prompting (SKP) is a prominent paradigm to integrate external knowledge into LLMs by incorporating structural representations, achieving state-of-the-art results in many knowledge-intensive tasks. However, existing methods often focus on specific problems, lacking a comprehensive exploration of the generalization and capability boundaries of SKP. This paper aims to evaluate and rethink the generalization capability of the SKP paradigm from four perspectives including Granularity, Transferability, Scalability, and Universality. To provide a thorough evaluation, we introduce a novel multi-granular, multi-level benchmark called SUBARU, consisting of 9 different tasks with varying levels of granularity and difficulty.

* Work in progress

Via

Access Paper or Ask Questions

OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

Dec 28, 2024

Yujie Luo, Xiangyuan Ru, Kangwei Liu, Lin Yuan, Mengshu Sun, Ningyu Zhang, Lei Liang, Zhiqiang Zhang, Jun Zhou, Lanning Wei(+3 more)

Figure 1 for OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

Figure 2 for OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

Figure 3 for OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

Abstract:We introduce OneKE, a dockerized schema-guided knowledge extraction system, which can extract knowledge from the Web and raw PDF Books, and support various domains (science, news, etc.). Specifically, we design OneKE with multiple agents and a configure knowledge base. Different agents perform their respective roles, enabling support for various extraction scenarios. The configure knowledge base facilitates schema configuration, error case debugging and correction, further improving the performance. Empirical evaluations on benchmark datasets demonstrate OneKE's efficacy, while case studies further elucidate its adaptability to diverse tasks across multiple domains, highlighting its potential for broad applications. We have open-sourced the Code at https://github.com/zjunlp/OneKE and released a Video at http://oneke.openkg.cn/demo.mp4.

* Work in progress

Via

Access Paper or Ask Questions

UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

Nov 11, 2024

Zhiqiang Liu, Mingyang Chen, Yin Hua, Zhuo Chen, Ziqi Liu, Lei Liang, Huajun Chen, Wen Zhang

Figure 1 for UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

Figure 2 for UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

Figure 3 for UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

Figure 4 for UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

Abstract:Beyond-triple fact representations including hyper-relational facts with auxiliary key-value pairs, temporal facts with additional timestamps, and nested facts implying relationships between facts, are gaining significant attention. However, existing link prediction models are usually designed for one specific type of facts, making it difficult to generalize to other fact representations. To overcome this limitation, we propose a Unified Hierarchical Representation learning framework (UniHR) for unified knowledge graph link prediction. It consists of a unified Hierarchical Data Representation (HiDR) module and a unified Hierarchical Structure Learning (HiSL) module as graph encoder. The HiDR module unifies hyper-relational KGs, temporal KGs, and nested factual KGs into triple-based representations. Then HiSL incorporates intra-fact and inter-fact message passing, focusing on enhancing the semantic information within individual facts and enriching the structural information between facts. Experimental results across 7 datasets from 3 types of KGs demonstrate that our UniHR outperforms baselines designed for one specific kind of KG, indicating strong generalization capability of HiDR form and the effectiveness of HiSL module. Code and data are available at https://github.com/Lza12a/UniHR.

Via

Access Paper or Ask Questions

Exploring Model Kinship for Merging Large Language Models

Oct 16, 2024

Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen

Figure 1 for Exploring Model Kinship for Merging Large Language Models

Figure 2 for Exploring Model Kinship for Merging Large Language Models

Figure 3 for Exploring Model Kinship for Merging Large Language Models

Figure 4 for Exploring Model Kinship for Merging Large Language Models

Abstract:Model merging has become one of the key technologies for enhancing the capabilities and efficiency of Large Language Models (LLMs). However, our understanding of the expected performance gains and principles when merging any two models remains limited. In this work, we introduce model kinship, the degree of similarity or relatedness between LLMs, analogous to biological evolution. With comprehensive empirical analysis, we find that there is a certain relationship between model kinship and the performance gains after model merging, which can help guide our selection of candidate models. Inspired by this, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets. Specifically, we discover that using model kinship as a criterion can assist us in continuously performing model merging, alleviating the degradation (local optima) in model evolution, whereas model kinship can serve as a guide to escape these traps. Code is available at https://github.com/zjunlp/ModelKinship.

* Ongoing work

Via

Access Paper or Ask Questions

MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Oct 15, 2024

Chenxi Wang, Xiang Chen, Ningyu Zhang, Bozhong Tian, Haoming Xu, Shumin Deng, Huajun Chen

Figure 1 for MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Figure 2 for MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Figure 3 for MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Figure 4 for MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Abstract:Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena, but the underlying reasons remain poorly understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate the objects in the final output, they are actually able to recognize visual objects in the preceding layers. We speculate that this may be due to the strong knowledge priors of the language model suppressing the visual information, leading to hallucinations. Motivated by this, we propose a novel dynamic correction decoding method for MLLMs (DeCo), which adaptively selects the appropriate preceding layers and proportionally integrates knowledge into the final layer to adjust the output logits. Note that DeCo is model agnostic and can be seamlessly incorporated with various classic decoding strategies and applied to different MLLMs. We evaluate DeCo on widely-used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines, highlighting its potential to mitigate hallucinations. Code is available at https://github.com/zjunlp/DeCo.

* Ongoing work

Via

Access Paper or Ask Questions