Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Peng

Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs

May 26, 2025

Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou

Abstract:The Linux kernel is a critical system, serving as the foundation for numerous systems. Bugs in the Linux kernel can cause serious consequences, affecting billions of users. Fault localization (FL), which aims at identifying the buggy code elements in software, plays an essential role in software quality assurance. While recent LLM agents have achieved promising accuracy in FL on recent benchmarks like SWE-bench, it remains unclear how well these methods perform in the Linux kernel, where FL is much more challenging due to the large-scale code base, limited observability, and diverse impact factors. In this paper, we introduce LinuxFLBench, a FL benchmark constructed from real-world Linux kernel bugs. We conduct an empirical study to assess the performance of state-of-the-art LLM agents on the Linux kernel. Our initial results reveal that existing agents struggle with this task, achieving a best top-1 accuracy of only 41.6% at file level. To address this challenge, we propose LinuxFL$^+$, an enhancement framework designed to improve FL effectiveness of LLM agents for the Linux kernel. LinuxFL$^+$ substantially improves the FL accuracy of all studied agents (e.g., 7.2% - 11.2% accuracy increase) with minimal costs. Data and code are available at https://github.com/FudanSELab/LinuxFLBench.

Via

Access Paper or Ask Questions

Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation

Apr 17, 2025

Mingwei Liu, Juntao Li, Ying Wang, Xueying Du, Zuoyu Ou, Qiuyuan Chen, Bingxu An, Zhao Wei, Yong Xu, Fangming Zou(+2 more)

Abstract:Despite recent advances in Large Language Models (LLMs) for code generation, the quality of LLM-generated code still faces significant challenges. One significant issue is code repetition, which refers to the model's tendency to generate structurally redundant code, resulting in inefficiencies and reduced readability. To address this, we conduct the first empirical study to investigate the prevalence and nature of repetition across 19 state-of-the-art code LLMs using three widely-used benchmarks. Our study includes both quantitative and qualitative analyses, revealing that repetition is pervasive and manifests at various granularities and extents, including character, statement, and block levels. We further summarize a taxonomy of 20 repetition patterns. Building on our findings, we propose DeRep, a rule-based technique designed to detect and mitigate repetition in generated code. We evaluate DeRep using both open-source benchmarks and in an industrial setting. Our results demonstrate that DeRep significantly outperforms baselines in reducing repetition (with an average improvements of 91.3%, 93.5%, and 79.9% in rep-3, rep-line, and sim-line metrics) and enhancing code quality (with a Pass@1 increase of 208.3% over greedy search). Furthermore, integrating DeRep improves the performance of existing repetition mitigation methods, with Pass@1 improvements ranging from 53.7% to 215.7%.

Via

Access Paper or Ask Questions

The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments

Apr 11, 2025

Jiafan Lu, Dongcheng Hu, Yitian Ye, Anqi Liu, Zixian Zhang, Xin Peng

Figure 1 for The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments

Figure 2 for The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments

Figure 3 for The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments

Figure 4 for The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments

Abstract:Indoor poultry farms require inspection robots to maintain precise environmental control, which is crucial for preventing the rapid spread of disease and large-scale bird mortality. However, the complex conditions within these facilities, characterized by areas of intense illumination and water accumulation, pose significant challenges. Traditional navigation methods that rely on a single sensor often perform poorly in such environments, resulting in issues like laser drift and inaccuracies in visual navigation line extraction. To overcome these limitations, we propose a novel composite navigation method that integrates both laser and vision technologies. This approach dynamically computes a fused yaw angle based on the real-time reliability of each sensor modality, thereby eliminating the need for physical navigation lines. Experimental validation in actual poultry house environments demonstrates that our method not only resolves the inherent drawbacks of single-sensor systems, but also significantly enhances navigation precision and operational efficiency. As such, it presents a promising solution for improving the performance of inspection robots in complex indoor poultry farming settings.

Via

Access Paper or Ask Questions

RoadGen: Generating Road Scenarios for Autonomous Vehicle Testing

Nov 29, 2024

Fan Yang, You Lu, Bihuan Chen, Peng Qin, Xin Peng

Figure 1 for RoadGen: Generating Road Scenarios for Autonomous Vehicle Testing

Figure 2 for RoadGen: Generating Road Scenarios for Autonomous Vehicle Testing

Figure 3 for RoadGen: Generating Road Scenarios for Autonomous Vehicle Testing

Figure 4 for RoadGen: Generating Road Scenarios for Autonomous Vehicle Testing

Abstract:With the rapid development of autonomous vehicles, there is an increasing demand for scenario-based testing to simulate diverse driving scenarios. However, as the base of any driving scenarios, road scenarios (e.g., road topology and geometry) have received little attention by the literature. Despite several advances, they either generate basic road components without a complete road network, or generate a complete road network but with simple road components. The resulting road scenarios lack diversity in both topology and geometry. To address this problem, we propose RoadGen to systematically generate diverse road scenarios. The key idea is to connect eight types of parameterized road components to form road scenarios with high diversity in topology and geometry. Our evaluation has demonstrated the effectiveness and usefulness of RoadGen in generating diverse road scenarios for simulation.

* 7 pages

Via

Access Paper or Ask Questions

AdvFuzz: Finding More Violations Caused by the EGO Vehicle in Simulation Testing by Adversarial NPC Vehicles

Nov 29, 2024

You Lu, Yifan Tian, Dingji Wang, Bihuan Chen, Xin Peng

Figure 1 for AdvFuzz: Finding More Violations Caused by the EGO Vehicle in Simulation Testing by Adversarial NPC Vehicles

Figure 2 for AdvFuzz: Finding More Violations Caused by the EGO Vehicle in Simulation Testing by Adversarial NPC Vehicles

Figure 3 for AdvFuzz: Finding More Violations Caused by the EGO Vehicle in Simulation Testing by Adversarial NPC Vehicles

Figure 4 for AdvFuzz: Finding More Violations Caused by the EGO Vehicle in Simulation Testing by Adversarial NPC Vehicles

Abstract:Recently, there has been a significant escalation in both academic and industrial commitment towards the development of autonomous driving systems (ADSs). A number of simulation testing approaches have been proposed to generate diverse driving scenarios for ADS testing. However, scenarios generated by these previous approaches are static and lack interactions between the EGO vehicle and the NPC vehicles, resulting in a large amount of time on average to find violation scenarios. Besides, a large number of the violations they found are caused by aggressive behaviors of NPC vehicles, revealing none bugs of ADS. In this work, we propose the concept of adversarial NPC vehicles and introduce AdvFuzz, a novel simulation testing approach, to generate adversarial scenarios on main lanes (e.g., urban roads and highways). AdvFuzz allows NPC vehicles to dynamically interact with the EGO vehicle and regulates the behaviors of NPC vehicles, finding more violation scenarios caused by the EGO vehicle more quickly. We compare AdvFuzz with a random approach and three state-of-the-art scenario-based testing approaches. Our experiments demonstrate that AdvFuzz can generate 198.34% more violation scenarios compared to the other four approaches in 12 hours and increase the proportion of violations caused by the EGO vehicle to 87.04%, which is more than 7 times that of other approaches. Additionally, AdvFuzz is at least 92.21% faster in finding one violation caused by the EGO vehicle than that of the other approaches.

* 21 pages

Via

Access Paper or Ask Questions

TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation

Sep 30, 2024

Zhiqiang Yuan, Weitong Chen, Hanlin Wang, Kai Yu, Xin Peng, Yiling Lou

Abstract:Code translation converts code from one programming language to another while maintaining its original functionality, which is crucial for software migration, system refactoring, and cross-platform development. Traditional rule-based methods rely on manually-written rules, which can be time-consuming and often result in less readable code. To overcome this, learning-based methods have been developed, leveraging parallel data to train models for automated code translation. More recently, the advance of Large Language Models (LLMs) further boosts learning-based code translation. Although promising, LLM-translated program still suffers from diverse quality issues (e.g., syntax errors and semantic errors). In particular, it can be challenging for LLMs to self-debug these errors when simply provided with the corresponding error messages. In this work, we propose a novel LLM-based multi-agent system TRANSAGENT, which enhances LLM-based code translation by fixing the syntax errors and semantic errors with the synergy between four LLM-based agents, including Initial Code Translator, Syntax Error Fixer, Code Aligner, and Semantic Error Fixer. The main insight of TRANSAGENT is to first localize the error code block in the target program based on the execution alignment between the target and source program, which can narrow down the fixing space and thus lower down the fixing difficulties. To evaluate TRANSAGENT, we first construct a new benchmark from recent programming tasks to mitigate the potential data leakage issue. On our benchmark, TRANSAGENT outperforms the latest LLM-based code translation technique UniTrans in both translation effectiveness and efficiency; additionally, our evaluation on different LLMs show the generalization of TRANSAGENT and our ablation study shows the contribution of each agent.

Via

Access Paper or Ask Questions

Large Language Model-Based Agents for Software Engineering: A Survey

Sep 04, 2024

Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, Yiling Lou

Figure 1 for Large Language Model-Based Agents for Software Engineering: A Survey

Figure 2 for Large Language Model-Based Agents for Software Engineering: A Survey

Figure 3 for Large Language Model-Based Agents for Software Engineering: A Survey

Figure 4 for Large Language Model-Based Agents for Software Engineering: A Survey

Abstract:The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs with the capabilities of perceiving and utilizing external resources and tools. To date, LLM-based agents have been applied and shown remarkable effectiveness in Software Engineering (SE). The synergy between multiple agents and human interaction brings further promise in tackling complex real-world SE problems. In this work, we present a comprehensive and systematic survey on LLM-based agents for SE. We collect 106 papers and categorize them from two perspectives, i.e., the SE and agent perspectives. In addition, we discuss open challenges and future directions in this critical domain. The repository of this survey is at https://github.com/FudanSELab/Agent4SE-Paper-List.

Via

Access Paper or Ask Questions

Flow Perturbation to Accelerate Unbiased Sampling of Boltzmann distribution

Jul 15, 2024

Xin Peng, Ang Gao

Abstract:Flow-based generative models have been employed for sampling the Boltzmann distribution, but their application to high-dimensional systems is hindered by the significant computational cost of obtaining the Jacobian of the flow. To overcome this challenge, we introduce the flow perturbation method, which incorporates optimized stochastic perturbations into the flow. By reweighting trajectories generated by the perturbed flow, our method achieves unbiased sampling of the Boltzmann distribution with orders of magnitude speedup compared to both brute force Jacobian calculations and the Hutchinson estimator. Notably, it accurately sampled the Chignolin protein with all atomic Cartesian coordinates explicitly represented, which, to our best knowledge, is the largest molecule ever Boltzmann sampled in such detail using generative models.

Via

Access Paper or Ask Questions

Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Jun 17, 2024

Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Xin Peng, Tao Ma, Yiling Lou

Figure 1 for Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Figure 2 for Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Figure 3 for Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Figure 4 for Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Abstract:Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique Vul-RAG, which leverages knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerability for the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG} retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning the presence of vulnerability causes and fixing solutions of the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines by 12.96\%/110\% relative improvement in accuracy/pairwise-accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations which can improve the manual detection accuracy from 0.60 to 0.77.

Via

Access Paper or Ask Questions

Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

Apr 05, 2024

Tong Su, Xin Peng, Sarubi Thillainathan, David Guzmán, Surangika Ranathunga, En-Shiun Annie Lee

Figure 1 for Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

Figure 2 for Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

Figure 3 for Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

Figure 4 for Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

Abstract:Parameter-efficient fine-tuning (PEFT) methods are increasingly vital in adapting large-scale pre-trained language models for diverse tasks, offering a balance between adaptability and computational efficiency. They are important in Low-Resource Language (LRL) Neural Machine Translation (NMT) to enhance translation accuracy with minimal resources. However, their practical effectiveness varies significantly across different languages. We conducted comprehensive empirical experiments with varying LRL domains and sizes to evaluate the performance of 8 PEFT methods with in total of 15 architectures using the SacreBLEU score. We showed that 6 PEFT architectures outperform the baseline for both in-domain and out-domain tests and the Houlsby+Inversion adapter has the best performance overall, proving the effectiveness of PEFT methods.

* Accepted to the Findings of NAACL 2024

Via

Access Paper or Ask Questions