Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiaye Lin

D-LEAF: Localizing and Correcting Hallucinations in Multimodal LLMs via Layer-to-head Attention Diagnostics

Sep 09, 2025

Tiancheng Yang, Lin Zhang, Jiaye Lin, Guimin Hu, Di Wang, Lijie Hu

Abstract:Multimodal Large Language Models (MLLMs) achieve strong performance on tasks like image captioning and visual question answering, but remain prone to hallucinations, where generated text conflicts with the visual input. Prior work links this partly to insufficient visual attention, but existing attention-based detectors and mitigation typically apply uniform adjustments across layers and heads, obscuring where errors originate. In this paper, we first show these methods fail to accurately localize problematic layers. Then, we introduce two diagnostics: Layer Image Attention Entropy (LIAE) which flags anomalous layers, and Image Attention Focus (IAF) which scores attention heads within those layers. Analysis shows that LIAE pinpoints faulty layers and IAF reliably ranks heads that warrant correction. Guided by these signals, we propose Dynamic Layer-wise Entropy and Attention Fusion (D-LEAF), a task-agnostic, attention-guided method that dynamically localizes and corrects errors during inference with negligible overhead. Results show our D-LEAF delivers a 53% relative improvement on standard captioning benchmarks, and on VQA both accuracy and F1-score improve by approximately 4%, substantially suppressing hallucinations while preserving efficiency.

Via

Access Paper or Ask Questions

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

May 27, 2025

Huacan Wang, Ziyi Ni, Shuo Zhang, Shuo Lu, Sen Hu, Ziyang He, Chen Hu, Jiaye Lin, Yifu Guo, Yuntao Du(+1 more)

Figure 1 for RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Figure 2 for RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Figure 3 for RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Figure 4 for RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Abstract:The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand full-fledged code repositories rather than simple scripts. Building such repositories from scratch remains a major challenge. Fortunately, GitHub hosts a vast, evolving collection of open-source repositories, which developers frequently reuse as modular components for complex tasks. Yet, existing frameworks like OpenHands and SWE-Agent still struggle to effectively leverage these valuable resources. Relying solely on README files provides insufficient guidance, and deeper exploration reveals two core obstacles: overwhelming information and tangled dependencies of repositories, both constrained by the limited context windows of current LLMs. To tackle these issues, we propose RepoMaster, an autonomous agent framework designed to explore and reuse GitHub repositories for solving complex tasks. For efficient understanding, RepoMaster constructs function-call graphs, module-dependency graphs, and hierarchical code trees to identify essential components, providing only identified core elements to the LLMs rather than the entire repository. During autonomous execution, it progressively explores related components using our exploration tools and prunes information to optimize context usage. Evaluated on the adjusted MLE-bench, RepoMaster achieves a 110% relative boost in valid submissions over the strongest baseline OpenHands. On our newly released GitTaskBench, RepoMaster lifts the task-pass rate from 24.1% to 62.9% while reducing token usage by 95%. Our code and demonstration materials are publicly available at https://github.com/wanghuacan/RepoMaster.

* A novel approach; Very practical

Via

Access Paper or Ask Questions

Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

May 26, 2025

Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang

Abstract:Reward models trained with conventional Reinforcement Learning from AI Feedback (RLAIF) methods suffer from limited generalizability, which hinders the alignment performance of the policy model during reinforcement learning (RL). This challenge stems from various issues, including distribution shift, preference label noise, and mismatches between overly challenging samples and model capacity. In this paper, we attempt to enhance the generalizability of reward models through a data-centric approach, driven by the insight that these issues are inherently intertwined from the perspective of data difficulty. To address this, we propose a novel framework, $\textit{Curriculum-RLAIF}$, which constructs preference pairs with varying difficulty levels and produces a curriculum that progressively incorporates preference pairs of increasing difficulty for reward model training. Our experimental results suggest that reward models trained with Curriculum-RLAIF achieve improved generalizability, significantly increasing the alignment performance of the policy model by a large margin without incurring additional inference costs compared to various non-curriculum baselines. Detailed analysis and comparisons with alternative approaches, including data selection via external pretrained reward models or internal self-selection mechanisms, as well as other curriculum strategies, further demonstrate the superiority of our approach in terms of simplicity, efficiency, and effectiveness.

Via

Access Paper or Ask Questions

LOG-LIO2: A LiDAR-Inertial Odometry with Efficient Uncertainty Analysis

May 02, 2024

Kai Huang, Junqiao Zhao, Jiaye Lin, Zhongyang Zhu, Shuangfu Song, Chen Ye, Tiantian Feng

Abstract:Uncertainty in LiDAR measurements, stemming from factors such as range sensing, is crucial for LIO (LiDAR-Inertial Odometry) systems as it affects the accurate weighting in the loss function. While recent LIO systems address uncertainty related to range sensing, the impact of incident angle on uncertainty is often overlooked by the community. Moreover, the existing uncertainty propagation methods suffer from computational inefficiency. This paper proposes a comprehensive point uncertainty model that accounts for both the uncertainties from LiDAR measurements and surface characteristics, along with an efficient local uncertainty analytical method for LiDAR-based state estimation problem. We employ a projection operator that separates the uncertainty into the ray direction and its orthogonal plane. Then, we derive incremental Jacobian matrices of eigenvalues and eigenvectors w.r.t. points, which enables a fast approximation of uncertainty propagation. This approach eliminates the requirement for redundant traversal of points, significantly reducing the time complexity of uncertainty propagation from $\mathcal{O} (n)$ to $\mathcal{O} (1)$ when a new point is added. Simulations and experiments on public datasets are conducted to validate the accuracy and efficiency of our formulations. The proposed methods have been integrated into a LIO system, which is available at https://github.com/tiev-tongji/LOG-LIO2.

Via

Access Paper or Ask Questions

N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping

Jan 07, 2024

Shuangfu Song, Junqiao Zhao, Kai Huang, Jiaye Lin, Chen Ye, Tiantian Feng

$Figure 1 for N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping$

$Figure 2 for N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping$

$Figure 3 for N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping$

$Figure 4 for N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping$

Abstract:Accurate and dense mapping in large-scale environments is essential for various robot applications. Recently, implicit neural signed distance fields (SDFs) have shown promising advances in this task. However, most existing approaches employ projective distances from range data as SDF supervision, introducing approximation errors and thus degrading the mapping quality. To address this problem, we introduce N3-Mapping, an implicit neural mapping system featuring normal-guided neural non-projective signed distance fields. Specifically, we directly sample points along the surface normal, instead of the ray, to obtain more accurate non-projective distance values from range data. Then these distance values are used as supervision to train the implicit map. For large-scale mapping, we apply a voxel-oriented sliding window mechanism to alleviate the forgetting issue with a bounded memory footprint. Besides, considering the uneven distribution of measured point clouds, a hierarchical sampling strategy is designed to improve training efficiency. Experiments demonstrate that our method effectively mitigates SDF approximation errors and achieves state-of-the-art mapping quality compared to existing approaches.

* 8 pages, 8 figures

Via

Access Paper or Ask Questions

Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks

Aug 29, 2020

Shimin Gong, Jiaye Lin, Jinbei Zhang, Dusit Niyato, Dong In Kim, Mohsen Guizani

Figure 1 for Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks

Figure 2 for Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks

Figure 3 for Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks

Figure 4 for Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks

Abstract:Intelligent reflecting surface (IRS) has been recently employed to reshape the wireless channels by controlling individual scattering elements' phase shifts, namely, passive beamforming. Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity and inexact channel information. In this article, we focus on machine learning (ML) approaches for performance maximization in IRS-assisted wireless networks. In general, ML approaches provide enhanced flexibility and robustness against uncertain information and imprecise modeling. Practical challenges still remain mainly due to the demand for a large dataset in offline training and slow convergence in online learning. These observations motivate us to design a novel optimization-driven ML framework for IRS-assisted wireless networks, which takes both advantages of the efficiency in model-based optimization and the robustness in model-free ML approaches. By splitting the decision variables into two parts, one part is obtained by the outer-loop ML approach, while the other part is optimized efficiently by solving an approximate problem. Numerical results verify that the optimization-driven ML approach can improve both the convergence and the reward performance compared to conventional model-free learning approaches.

* submitted to IEEE Communications Magazine

Via

Access Paper or Ask Questions

Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications

May 25, 2020

Jiaye Lin, Yuze Zou, Xiaoru Dong, Shimin Gong, Dinh Thai Hoang, Dusit Niyato

Figure 1 for Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications

Figure 2 for Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications

Figure 3 for Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications

Figure 4 for Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications

Abstract:Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver. In this paper, we minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming. Due to uncertain channel conditions, we formulate a robust power minimization problem subject to the receiver's signal-to-noise ratio (SNR) requirement and the IRS's power budget constraint. We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences. To improve the learning performance, we derive a convex approximation as a lower bound on the robust problem, which is integrated into the DRL framework and thus promoting a novel optimization-driven deep deterministic policy gradient (DDPG) approach. In particular, when the DDPG algorithm generates a part of the action (e.g., passive beamforming), we can use the model-based convex approximation to optimize the other part (e.g., active beamforming) of the action more efficiently. Our simulation results demonstrate that the optimization-driven DDPG algorithm can improve both the learning rate and reward performance significantly compared to the conventional model-free DDPG algorithm.

Via

Access Paper or Ask Questions