Xinhao Huang

DeInfer: Efficient Parallel Inferencing for Decomposed Large Language Models

Apr 20, 2026

You-Liang Huang, Xinhao Huang, Chengxi Liao, Zeyi Wen

Abstract: Existing work on large language model (LLM) decomposition mainly focuses on improving performance on downstream tasks, but ignores the poor parallel inference performance that arises when scaling up the model size. To mitigate this performance issue, this paper introduces DeInfer, a high-performance inference system dedicated to parallel inference of decomposed LLMs. It combines multiple optimizations to maximize performance while remaining compatible with state-of-the-art optimization techniques. Extensive experiments evaluate DeInfer's performance, and the results demonstrate its superiority, suggesting it can greatly facilitate the parallel inference of decomposed LLMs.

* Accepted at DAC'26

SEAL: Structure and Element Aware Learning to Improve Long Structured Document Retrieval

Aug 28, 2025

Xinhao Huang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen

Abstract: In long structured document retrieval, existing methods typically fine-tune pre-trained language models (PLMs) using contrastive learning on datasets lacking explicit structural information. This practice suffers from two critical issues: 1) current methods fail to leverage structural features and element-level semantics effectively, and 2) datasets containing structural metadata are lacking. To bridge these gaps, we propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both released and industrial datasets across various modern PLMs, along with online A/B testing, demonstrate consistent performance improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3. The resources are available at https://github.com/xinhaoH/SEAL.

* Accepted at EMNLP 2025 Main Conference
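
For readers unfamiliar with the NDCG@10 metric quoted in the abstract, the following is a minimal sketch of one common formulation (linear gain, log2 rank discount). It is not taken from the SEAL repository; the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the sketch assumes every relevant document for the query appears in the ranked list, so the ideal ordering can be derived from the list itself. The paper's own evaluation code may differ.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items.

    `relevances` holds graded relevance labels in ranked order
    (position 0 is the top result). Uses linear gain: rel / log2(rank + 1),
    with ranks counted from 1.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering.

    Assumption: all relevant documents are present in `relevances`,
    so sorting the same list gives the ideal ranking.
    """
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents at all for this query
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant document ranked second instead of first.
    print(round(ndcg_at_k([0, 1, 0, 0]), 4))
```

A reported figure such as 77.84% would typically be this per-query value averaged over all test queries and scaled by 100.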
