Jian Gu

Rethinking Weight Tying: Pseudo-Inverse Tying for Stable LM Training and Updates

Feb 04, 2026

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

Abstract:Weight tying is widely used in compact language models to reduce parameters by sharing the token table between the input embedding and the output projection. However, weight sharing does not guarantee a stable token interface: during training, the correspondence between encoding tokens into hidden states and decoding hidden states into logits can drift, worsening optimization sensitivity and making post-training interventions such as editing, patching, and lightweight adaptation less predictable. We propose Pseudo-Inverse Tying (PIT), which synchronizes embedding and unembedding as coupled projections of a shared latent token memory, guaranteeing a pseudo-inverse-consistent interface throughout training. PIT maintains an orthonormal shared memory, obtained by thin polar decomposition for teacher initialization or random orthonormal initialization from scratch, and introduces a fully learned symmetric positive definite hidden-space transform parameterized via a Cholesky factor. The output head applies this transform to hidden states before the vocabulary projection, while the embedding applies the inverse transform to token vectors using stable triangular solves, avoiding explicit pseudo-inverse recomputation and any vocabulary-sized auxiliary parameters. We evaluate PIT on on-device models spanning 256M-1.3B parameters across pretraining and adaptation, and consistently observe improved training stability, stronger layerwise semantic consistency, and substantially reduced side effects.

* an early-stage version
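The coupled interface described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the exponential-diagonal parameterization of the Cholesky factor, and the hand-written triangular solves are all assumptions made for illustration. The sketch takes the shared memory `M` to be a vocabulary-by-hidden matrix with orthonormal columns and the transform to be `S = L @ L.T`, so the output head computes `M @ (S @ h)` and the embedding of token `t` is `S^{-1} m_t`.

```python
import numpy as np


def polar_orthonormal(teacher_table):
    """Orthonormal factor Q of the thin polar decomposition W = Q P."""
    u, _, vt = np.linalg.svd(teacher_table, full_matrices=False)
    return u @ vt  # shape (V, d), orthonormal columns when W has full column rank


def random_orthonormal(vocab_size, hidden_dim, rng):
    """Random matrix with orthonormal columns, taken from a thin QR."""
    q, _ = np.linalg.qr(rng.standard_normal((vocab_size, hidden_dim)))
    return q


def cholesky_factor(raw):
    """Map an unconstrained square matrix to a lower-triangular factor with a
    positive diagonal, so that S = L @ L.T is symmetric positive definite.
    (Hypothetical parameterization chosen for this sketch.)"""
    return np.tril(raw, k=-1) + np.diag(np.exp(np.diag(raw)))


def solve_lower(L, b):
    """Forward substitution for L x = b with L lower triangular."""
    x = np.zeros_like(b)
    for i in range(L.shape[0]):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x


def solve_upper(U, b):
    """Back substitution for U x = b with U upper triangular."""
    x = np.zeros_like(b)
    for i in reversed(range(U.shape[0])):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x


def unembed(memory, chol, hidden):
    """Logits M (S h) with S = L L^T; hidden has shape (d,) or (d, n)."""
    return memory @ (chol @ (chol.T @ hidden))


def embed(memory, chol, token_ids):
    """Token embeddings S^{-1} m_t via two triangular solves; returns (n, d)."""
    m = memory[token_ids].T          # (d, n)
    y = solve_lower(chol, m)         # solve L y = m
    return solve_upper(chol.T, y).T  # solve L^T e = y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, d = 12, 4
    M = random_orthonormal(V, d, rng)
    L = cholesky_factor(0.1 * rng.standard_normal((d, d)))
    E = embed(M, L, np.arange(V))  # full embedding table, shape (V, d)
    W_out = M @ L @ L.T            # explicit output projection, shape (V, d)
    # Check that the embedding table is the transposed pseudo-inverse of W_out.
    print(np.allclose(E.T, np.linalg.pinv(W_out)))
```

Under these assumptions, `M.T @ M = I` gives `(M S)^+ = S^{-1} M^T`, so the embedding table equals the transposed pseudo-inverse of the output projection for any valid `L`, and no explicit pseudo-inverse or extra vocabulary-sized parameter is needed. In practice a library triangular solver would replace the two hand-written loops.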

SeMe: Training-Free Language Model Merging via Semantic Alignment

May 26, 2025

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

Abstract:Despite the remarkable capabilities of Language Models (LMs) across diverse tasks, no single model consistently outperforms others, necessitating efficient methods to combine their strengths without expensive retraining. Existing model merging techniques, such as parameter averaging and task-guided fusion, often rely on data-dependent computations or fail to preserve internal knowledge, limiting their robustness and scalability. We introduce SeMe (Semantic-based Merging), a novel, data-free, and training-free approach that leverages latent semantic alignment to merge LMs at a fine-grained, layer-wise level. Unlike prior work, SeMe not only preserves model behaviors but also explicitly stabilizes internal knowledge, addressing a critical gap in LM fusion. Through extensive experiments across diverse architectures and tasks, we demonstrate that SeMe outperforms existing methods in both performance and efficiency while eliminating reliance on external data. Our work establishes a new paradigm for knowledge-aware model merging and provides insights into the semantic structure of LMs, paving the way for more scalable and interpretable model composition.

* an early-stage version

A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation

Mar 17, 2025

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

Abstract:Language Models (LMs) are widely used in software engineering for code generation, but they may produce code with errors. Rather than repairing the generated code, an alternative is to address the underlying failures of the models. LM repair offers a lightweight solution to this challenge: it requires minimal data, lowers computational costs, and reduces side effects. Unlike retraining, LM repair focuses on applying tailored updates to targeted neurons, making it ideal for scenarios with limited resources, high-performance demands, or strict safety requirements. In this paper, we propose Semantic Targeting for Analytical Repair (STAR), a novel semantic-based optimization approach for repairing LLMs. STAR realizes the main operations of LM repair methods in an optimization process, including locating "buggy neurons", solving "neuron patches", and patching "buggy neurons". Correspondingly, it computes the deltas of the weight matrix as prior information to guide optimization, and attributes the targeted layers and neurons leveraging statistical insights. The neuron patches are computed with a semantic-based analytical formula, which directly bridges the changes to logits with the deltas of neurons by steering latent representations. Compared to prior work on LM repair (MINT) and optimization methods (SGD), STAR integrates their strengths while mitigating their limitations. STAR supports solving multiple failures together, significantly improving its usefulness. Evaluated on three code generation tasks using popular code LMs, STAR demonstrates superior effectiveness. Additionally, STAR exhibits better efficiency. In terms of side effects, namely the balance between generalization and specificity, STAR outperforms prior work by a significant margin.

* 12 pages, 6 figures, 6 tables, under peer-review

A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models

Jun 17, 2024

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

Abstract:Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on how to finetune but neglects the issue of where to finetune. As a pioneering work on answering where to finetune (at the layer level), we conduct a semantic analysis of the LM inference process. We first propose a virtual transition of the latent representation and then trace its factual transition. Based on the deviation between the two transitions, we estimate the gain of finetuning each model layer and then narrow down the scope for finetuning. We perform extensive experiments across well-known LMs and datasets. The results show that our approach is effective and efficient, and outperforms the existing baselines. Our approach is orthogonal to existing efficient techniques, such as PEFT methods, offering practical value for LM finetuning.

* 13 pages, 5 figures, under peer-review

On the Semantics of LM Latent Space: A Vocabulary-defined Approach

Feb 02, 2024

Jian Gu, Chunyang Chen, Aldeida Aleti

Abstract:Understanding the latent space of language models (LMs) is crucial to refining their performance and interpretability. Existing analyses often fall short in providing disentangled (model-centric) insights into LM semantics, and neglect essential aspects of LM adaptation. In response, we introduce a pioneering method called vocabulary-defined semantics, which establishes a reference frame within the LM latent space, ensuring disentangled semantic analysis grounded in LM vocabulary. Our approach transcends prior entangled analysis, leveraging LM vocabulary for model-centric insights. Furthermore, we propose a novel technique to compute logits, emphasising differentiability and local isotropy, and introduce a neural clustering module for semantically calibrating data representations during LM adaptation. Through extensive experiments across diverse text understanding datasets, our approach outperforms state-of-the-art methods of retrieval-augmented generation and parameter-efficient finetuning, showcasing its efficacy and broad applicability. Our findings not only shed light on LM mechanics, but also offer practical solutions to enhance LM performance and interpretability.

* under peer review

Neuron Patching: Neuron-level Model Editing on Code Generation and LLMs

Dec 08, 2023

Jian Gu, Chunyang Chen, Aldeida Aleti

Abstract:Large Language Models are successfully adopted in software engineering, especially in code generation. Updating these models with new knowledge is very expensive, yet is often required to fully realize their value. In this paper, we propose a novel and effective model editing approach, MENT, to patch LLMs in coding tasks. Based on the mechanism of generative LLMs, MENT enables model editing in next-token predictions, and further supports common coding tasks. MENT is effective, efficient, and reliable. It can correct a neural model by patching 1 or 2 neurons. As pioneering work on neuron-level model editing of generative models, we formalize the editing process and introduce the involved concepts. We also introduce new measures to evaluate its generalization ability, and build a benchmark for further study. Our approach is evaluated on three coding tasks, including API-seq recommendation, line-level code generation, and pseudocode-to-code translation. It outperforms the state-of-the-art by a significant margin on both effectiveness and efficiency measures. In addition, we demonstrate the uses of MENT for LLM reasoning in software engineering. By editing the LLM knowledge with MENT, the directly or indirectly dependent behaviors in the chain-of-thought change accordingly and automatically.

* 12 pages, 5 figures, 6 tables, under peer review

Towards Top-Down Deep Code Generation in Limited Scopes

Sep 04, 2022

Jian Gu, Harald C. Gall

Abstract:Deep code generation is a topic of deep learning for software engineering (DL4SE), which adopts neural models to generate code for the intended functions. Since end-to-end neural methods lack awareness of domain knowledge and software hierarchy, the results often require manual correction. To systematically explore the potential improvements of code generation, we let it participate in the whole top-down development from intentions to realizations, which is possible in limited scopes. In the process, it benefits from massive samples, features, and knowledge. As the foundation, we suggest building a taxonomy on code data, namely code taxonomy, leveraging the categorization of code information. Moreover, we introduce a three-layer semantic pyramid (SP) to associate text data and code data. It identifies information at different abstraction levels, and thus introduces domain knowledge on development and reveals the hierarchy of software. Furthermore, we propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity. SPF divides the code generation process into stages and reserves spots for potential interactions. Finally, we outline application scopes for SPF.

* 5 pages, 3 figures, 2 tables, under revision

Efficient Virtual View Selection for 3D Hand Pose Estimation

Mar 29, 2022

Jian Cheng, Yanguang Wan, Dexin Zuo, Cuixia Ma, Jian Gu, Ping Tan, Hongan Wang, Xiaoming Deng, Yinda Zhang

Abstract:3D hand pose estimation from a single depth image is a fundamental problem in computer vision and has wide applications. However, existing methods still cannot achieve satisfactory hand pose estimation results due to view variation and occlusion of the human hand. In this paper, we propose a new virtual view selection and fusion module for 3D hand pose estimation from a single depth image. We propose to automatically select multiple virtual viewpoints for pose estimation and fuse the results of all of them, and find that this empirically delivers accurate and robust pose estimation. In order to select the most effective virtual views for pose fusion, we evaluate the virtual views based on their confidence using a light-weight network obtained via network distillation. Experiments on three main benchmark datasets, NYU, ICVL, and Hands2019, demonstrate that our method outperforms the state of the art on NYU and ICVL and achieves very competitive performance on Hands2019-Task1, and that our proposed virtual view selection and fusion module is effective for 3D hand pose estimation.

* Accepted by AAAI2022
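The select-then-fuse idea in the abstract above can be pictured with a small NumPy sketch. Everything specific here is an assumption for illustration: the abstract does not state the fusion rule, so this sketch uses a confidence-weighted average over the `k` most confident views, assumes per-view joint predictions are already expressed in a common coordinate frame, and uses the hypothetical name `fuse_views`.

```python
import numpy as np


def fuse_views(poses, confidences, k):
    """Keep the k most confident views and average their joint predictions.

    poses:       (n_views, n_joints, 3) per-view 3D joint estimates,
                 assumed to be in a shared coordinate frame.
    confidences: (n_views,) positive confidence scores, one per view.
    Returns the selected view indices and the fused (n_joints, 3) pose.
    """
    top = np.argsort(-confidences)[:k]              # most confident first
    weights = confidences[top] / confidences[top].sum()
    fused = np.tensordot(weights, poses[top], axes=1)
    return top, fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.standard_normal((5, 14, 3))         # 5 virtual views, 14 joints
    confidences = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
    selected, fused = fuse_views(poses, confidences, k=3)
    print(selected, fused.shape)
```

The weighted average is only one possible fusion rule; the point of the sketch is that a cheap per-view confidence score lets the expensive pose estimator run on a few views instead of all of them.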

Assemble Foundation Models for Automatic Code Summarization

Jan 13, 2022

Jian Gu, Pasquale Salza, Harald C. Gall

Abstract:Automatic code summarization is beneficial to software development and maintenance since it reduces the burden of manual tasks. Currently, artificial intelligence is undergoing a paradigm shift. The foundation models pretrained on massive data and finetuned to downstream tasks surpass specially customized models. This trend inspired us to consider reusing foundation models instead of learning from scratch. Based on this, we propose a flexible and robust approach for automatic code summarization based on neural networks. We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo. Moreover, we utilize Gaussian noise as the simulation of contextual information to optimize the latent representation. Furthermore, we introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning, and design intermediate stage tasks for general sequence-to-sequence learning. Finally, we evaluate AdaMo against a benchmark dataset for code summarization, by comparing it with state-of-the-art models.

* 12 pages, 2 figures, 8 tables, accepted by SANER 2022, the camera-ready version

Multimodal Representation for Neural Code Search

Jul 23, 2021

Jian Gu, Zimin Chen, Martin Monperrus

Abstract:Semantic code search is about finding semantically relevant code snippets for a given natural language query. In the state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance of their representation in the shared vector space. In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build the multimodal representation for the code data. We conduct extensive experiments using a single corpus that is large-scale and multi-language: CodeSearchNet. Our results show that both our tree-serialized representations and multimodal learning model improve the performance of code search. Last, we define intuitive quantification metrics oriented to the completeness of semantic and syntactic information of the code data, to help understand the experimental findings.

* 12 pages, 9 figures, accepted by ICSME 2021, the camera-ready version
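The retrieval step that the abstract above takes as its starting point, scoring code against a query by their distance in a shared vector space, can be sketched as follows. The sketch assumes cosine similarity as the measure and assumes the query and snippet vectors come from some already-trained encoders; the function name `rank_snippets` is hypothetical and the paper's tree-serialized, multimodal encoder is not reproduced here.

```python
import numpy as np


def rank_snippets(query_vec, code_vecs):
    """Rank code snippets by cosine similarity to a query embedding.

    query_vec: (d,) embedding of the natural language query.
    code_vecs: (n, d) embeddings of the candidate code snippets.
    Returns indices ordered from most to least similar, and the raw scores.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = code_vecs / np.linalg.norm(code_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores), scores


if __name__ == "__main__":
    query = np.array([1.0, 0.0])
    snippets = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
    order, scores = rank_snippets(query, snippets)
    print(order, scores)
```

Improving the representation, as the paper sets out to do, changes the vectors fed into this step while the ranking itself stays the same.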
