Hao Liu

Tony

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

Mar 05, 2025

Lu Dai, Yijie Xu, Jinhui Ye, Hao Liu, Hui Xiong

Abstract:Large Language Models (LLMs) have demonstrated improved generation performance by incorporating externally retrieved knowledge, a process known as retrieval-augmented generation (RAG). Despite the potential of this approach, existing studies evaluate RAG effectiveness by 1) assessing retrieval and generation components jointly, which obscures retrieval's distinct contribution, or 2) examining retrievers using traditional metrics such as NDCG, which creates a gap in understanding retrieval's true utility in the overall generation process. To address the above limitations, in this work, we introduce an automatic evaluation method that measures retrieval quality through the lens of information gain within the RAG framework. Specifically, we propose Semantic Perplexity (SePer), a metric that captures the LLM's internal belief about the correctness of the retrieved information. We quantify the utility of retrieval by the extent to which it reduces semantic perplexity post-retrieval. Extensive experiments demonstrate that SePer not only aligns closely with human preferences but also offers a more precise and efficient evaluation of retrieval utility across diverse RAG scenarios.

* ICLR 2025 Spotlight
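The idea of scoring retrieval by how much it shifts the model's belief toward the correct answer can be illustrated with a toy sketch. This is not the paper's definition of SePer: the helper names (`belief_in_reference`, `retrieval_utility`, `normalize_match`) are hypothetical, exact string matching stands in for a real semantic-equivalence check, and the sampled answers are made-up illustration data rather than model outputs.

```python
from typing import Callable, Sequence


def belief_in_reference(
    samples: Sequence[str],
    reference: str,
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Fraction of sampled answers judged equivalent to the reference answer."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    hits = sum(1 for s in samples if same_meaning(s, reference))
    return hits / len(samples)


def retrieval_utility(
    before: Sequence[str],
    after: Sequence[str],
    reference: str,
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Change in estimated belief once retrieved context is added (assumed proxy)."""
    return (
        belief_in_reference(after, reference, same_meaning)
        - belief_in_reference(before, reference, same_meaning)
    )


def normalize_match(a: str, b: str) -> bool:
    """Crude stand-in for semantic equivalence: case- and whitespace-insensitive match."""
    return a.strip().lower() == b.strip().lower()


# Made-up answers sampled without and with retrieved context.
before = ["Lyon", "Paris", "Marseille", "paris"]
after = ["Paris", "paris", "Paris ", "Lyon"]
gain = retrieval_utility(before, after, "Paris", normalize_match)
print(gain)  # 0.75 - 0.5 = 0.25
```

A positive value means the retrieved context moved more of the sampled answers onto the reference answer; a negative value would suggest the retrieval was distracting.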

HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation

Feb 18, 2025

Hao Liu, Zhengren Wang, Xi Chen, Zhiyu Li, Feiyu Xiong, Qinhan Yu, Wentao Zhang

Abstract:Retrieval-Augmented Generation (RAG) systems often struggle with imperfect retrieval, as traditional retrievers focus on lexical or semantic similarity rather than logical relevance. To address this, we propose HopRAG, a novel RAG framework that augments retrieval with logical reasoning through graph-structured knowledge exploration. During indexing, HopRAG constructs a passage graph, with text chunks as vertices and logical connections established via LLM-generated pseudo-queries as edges. During retrieval, it employs a retrieve-reason-prune mechanism: starting with lexically or semantically similar passages, the system explores multi-hop neighbors guided by pseudo-queries and LLM reasoning to identify truly relevant ones. Extensive experiments demonstrate HopRAG's superiority, achieving 76.78% higher answer accuracy and 65.07% improved retrieval F1 score compared to conventional methods. The repository is available at https://github.com/LIU-Hao-2002/HopRAG.
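The retrieve-reason-prune mechanism described in the abstract can be sketched roughly as a graph traversal. This is a minimal illustration under stated assumptions, not the code in the HopRAG repository: `hop_retrieve` and `overlap` are hypothetical names, token overlap stands in for both the similarity retriever and the LLM reasoning step, and visit counts are an assumed pruning score.

```python
from collections import deque


def overlap(a: str, b: str) -> int:
    """Count shared lowercase tokens; a crude stand-in for similarity."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def hop_retrieve(query, passages, edges, n_seeds=1, max_hops=2, top_k=2):
    """Toy retrieve-reason-prune over a passage graph.

    passages: dict of passage id -> text (graph vertices).
    edges: dict of passage id -> list of (pseudo_query, neighbor id).
    """
    # Retrieve: seed with the passages most lexically similar to the query.
    seeds = sorted(
        passages, key=lambda p: overlap(query, passages[p]), reverse=True
    )[:n_seeds]
    visits = {s: 1 for s in seeds}
    frontier = deque((s, 0) for s in seeds)

    # Reason: follow an edge only if its pseudo-query looks related to the
    # query (assumption: overlap replaces the LLM's judgment here).
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for pseudo_query, neighbor in edges.get(node, []):
            if overlap(query, pseudo_query) > 0:
                first_visit = neighbor not in visits
                visits[neighbor] = visits.get(neighbor, 0) + 1
                if first_visit:
                    frontier.append((neighbor, hops + 1))

    # Prune: keep the top_k passages by visit count, then by similarity.
    ranked = sorted(
        visits,
        key=lambda p: (visits[p], overlap(query, passages[p])),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up example data.
passages = {
    "p1": "Marie Curie was born in Warsaw",
    "p2": "Warsaw is the capital of Poland",
    "p3": "Bananas are rich in potassium",
}
edges = {
    "p1": [("Which country is Warsaw the capital of", "p2")],
    "p3": [("What mineral is in bananas", "p2")],
}
result = hop_retrieve("In which country was Marie Curie born", passages, edges)
print(result)  # ['p1', 'p2']
```

In this example the second passage shares no tokens with the query, so a similarity-only retriever would miss it; it is reached only by hopping along the pseudo-query edge from the seed passage.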

GPU-accelerated Multi-relational Parallel Graph Retrieval for Web-scale Recommendations

Feb 17, 2025

Zhuoning Guo, Guangxing Chen, Qian Gao, Xiaochao Liao, Jianjia Zheng, Lu Shen, Hao Liu

Abstract:Web recommendations provide personalized items from massive catalogs for users, which rely heavily on retrieval stages to trade off the effectiveness and efficiency of selecting a small relevant set from billion-scale candidates in online digital platforms. As one of the largest Chinese search engine and news feed providers, Baidu resorts to Deep Neural Network (DNN) and graph-based Approximate Nearest Neighbor Search (ANNS) algorithms for accurate relevance estimation and efficient search for relevant items. However, current retrieval at Baidu fails in comprehensive user-item relational understanding due to dissected interaction modeling, and performs inefficiently in large-scale graph-based ANNS because of suboptimal traversal navigation and the GPU computational bottleneck under high concurrency. To this end, we propose a GPU-accelerated Multi-relational Parallel Graph Retrieval (GMP-GR) framework to achieve effective yet efficient retrieval in web-scale recommendations. First, we propose a multi-relational user-item relevance metric learning method that unifies diverse user behaviors through multi-objective optimization and employs a self-covariant loss to enhance pathfinding performance. Second, we develop a hierarchical parallel graph-based ANNS to boost graph retrieval throughput, which conducts breadth-depth-balanced searches on a large-scale item graph and cost-effectively handles irregular neural computation via adaptive aggregation on GPUs. In addition, we integrate system optimization strategies in the deployment of GMP-GR in Baidu. Extensive experiments demonstrate the superiority of GMP-GR in retrieval accuracy and efficiency. Deployed across more than twenty applications at Baidu, GMP-GR serves hundreds of millions of users with a throughput exceeding one hundred million requests per second.

Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study

Feb 17, 2025

Yujie Lin, Ante Wang, Moye Chen, Jingyao Liu, Hao Liu, Jinsong Su, Xinyan Xiao

Abstract:Recently, inference-time scaling of chain-of-thought (CoT) has been demonstrated as a promising approach for addressing multi-modal reasoning tasks. While existing studies have predominantly centered on text-based thinking, the integration of both visual and textual modalities within the reasoning process remains unexplored. In this study, we pioneer the exploration of inference-time scaling with multi-modal thought, aiming to bridge this gap. To provide a comprehensive analysis, we systematically investigate popular sampling-based and tree search-based inference-time scaling methods on 10 challenging tasks spanning various domains. Besides, we uniformly adopt a consistency-enhanced verifier to ensure effective guidance for both methods across different thought paradigms. Results show that multi-modal thought promotes better performance against conventional text-only thought, and blending the two types of thought fosters more diverse thinking. Despite these advantages, multi-modal thoughts necessitate higher token consumption for processing richer visual inputs, which raises concerns in practical applications. We hope that our findings on the merits and drawbacks of this research line will inspire future works in the field.

Bag of Tricks for Inference-time Computation of LLM Reasoning

Feb 12, 2025

Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu

Abstract:With the advancement of large language models (LLMs), solving complex reasoning tasks has gained increasing attention. Inference-time computation methods (e.g., Best-of-N, beam search, etc.) are particularly valuable as they can enhance reasoning performance without modifying model parameters or requiring additional training. However, these techniques come with implementation challenges, and most existing methods remain at the proof-of-concept stage with limited practical adoption due to their computational complexity and varying effectiveness across different tasks. In this paper, we investigate and benchmark diverse inference-time computation strategies across reasoning tasks of varying complexity. Since most current methods rely on a proposer-verifier pipeline that first generates candidate solutions (e.g., reasoning solutions) and then selects the best one based on reward signals (e.g., RLHF rewards, process rewards), our research focuses on optimizing both candidate solution generation (e.g., instructing prompts, hyperparameters such as temperature and top-p) and reward mechanisms (e.g., self-evaluation, reward types). Through extensive experiments (more than 20,000 A100-80G GPU hours with over 1,000 experiments) across a variety of models (e.g., Llama, Qwen, and Mistral families) of various sizes, our ablation studies reveal that previously overlooked strategies can significantly enhance performance (e.g., tuning temperature can improve reasoning task performance by up to 5%). Furthermore, we establish a standardized benchmark for inference-time computation by systematically evaluating six representative methods across eight reasoning tasks. These findings provide a stronger foundation for future research. The code is available at https://github.com/usail-hkust/benchmark_inference_time_computation_LL
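The proposer-verifier pipeline mentioned in the abstract can be illustrated with a minimal Best-of-N sketch. It is not taken from the paper's benchmark code: `best_of_n`, `toy_propose`, and `toy_score` are hypothetical names, a random integer guess stands in for sampling from an LLM, and a simple arithmetic check stands in for a reward model.

```python
import random
from typing import Callable


def best_of_n(
    propose: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> str:
    """Generate n candidates with the proposer, return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    candidates = [propose(rng) for _ in range(n)]
    return max(candidates, key=score)


def toy_propose(rng: random.Random) -> str:
    """Stand-in proposer: guess a positive integer whose square is 49."""
    return str(rng.randint(1, 10))


def toy_score(candidate: str) -> float:
    """Stand-in verifier: higher is better, 0 means the square is exactly 49."""
    x = int(candidate)
    return -abs(x * x - 49)


print(best_of_n(toy_propose, toy_score, n=8))
```

Increasing `n` raises the chance that at least one candidate is correct, at the cost of more proposer calls; the quality of the final answer is still bounded by how well the verifier ranks candidates.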

MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation

Feb 12, 2025

Min Hou, Chenxi Bai, Le Wu, Hao Liu, Kun Zhang, Kai Zhang, Richang Hong, Meng Wang

Abstract:Large Language Models (LLMs) have achieved remarkable success in recent years, owing to their impressive generalization capabilities and rich world knowledge. To capitalize on the potential of using LLMs as recommender systems, mainstream approaches typically focus on two paradigms. The first paradigm designs multi-domain or multi-task instruction data for generalizable recommendation, so as to align LLMs with general recommendation areas and deal with cold-start recommendation. The second paradigm enhances domain-specific recommendation tasks with parameter-efficient fine-tuning techniques, in order to improve models under the warm recommendation scenarios. While most previous works treat these two paradigms separately, we argue that they have complementary advantages, and combining them would be helpful. To that end, in this paper, we propose a generalizable and efficient LLM-based recommendation framework, MoLoRec. Our approach starts by parameter-efficient fine-tuning of a domain-general module with general recommendation instruction data, to align the LLM with recommendation knowledge. Then, given users' behavior in a specific domain, we construct a domain-specific instruction dataset and apply efficient fine-tuning to the pre-trained LLM. After that, we provide approaches to integrate the above domain-general part and domain-specific part through a mixture of parameters. Notably, MoLoRec is efficient and plug-and-play, as the domain-general module is trained only once, and any domain-specific plug-in can be efficiently merged with only domain-specific fine-tuning. Extensive experiments on multiple datasets under both warm and cold-start recommendation scenarios validate the effectiveness and generality of the proposed MoLoRec.

Stacked Intelligent Metasurface Enabled Near-Field Multiuser Beamfocusing in the Wave Domain

Feb 09, 2025

Xing Jia, Jiancheng An, Hao Liu, Lu Gan, Marco Di Renzo, Mérouane Debbah, Chau Yuen

Abstract:Intelligent surfaces represent a breakthrough technology capable of customizing the wireless channel cost-effectively. However, the existing works generally focus on planar wavefronts, neglecting near-field spherical wavefront characteristics caused by large array aperture and high operation frequencies in the terahertz (THz) band. Additionally, the single-layer reconfigurable intelligent surface (RIS) lacks the signal processing ability to mitigate the computational complexity at the base station (BS). To address these issues, we introduce a novel stacked intelligent metasurface (SIM) comprising an array of programmable metasurface layers. The SIM aims to substitute conventional digital baseband architecture to execute computing tasks with ultra-low processing delay, albeit with a reduced number of radio-frequency (RF) chains and low-resolution digital-to-analog converters. In this paper, we present a SIM-aided multiuser multiple-input single-output (MU-MISO) near-field system, where the SIM is integrated into the BS to perform beamfocusing in the wave domain and customize an end-to-end channel with minimized inter-user interference. Finally, the numerical results demonstrate that near-field communication achieves superior spatial gain over the far-field, and the SIM effectively suppresses inter-user interference as the wireless signals propagate through it.

* 12 pages, 5 figures, presented at VTC Spring 2024, Singapore

Geoinformatics-Guided Machine Learning for Power Plant Classification

Feb 03, 2025

Blessing Austin-Gabriel, Aparna S. Varde, Hao Liu

Abstract:This paper proposes an approach in the area of Knowledge-Guided Machine Learning (KGML) via a novel integrated framework comprising CNN (Convolutional Neural Networks) and ViT (Vision Transformers) along with GIS (Geographic Information Systems) to enhance power plant classification in the context of energy management. Knowledge from geoinformatics derived through Spatial Masks (SM) in GIS is infused into an architecture of CNN and ViT, in this proposed KGML approach. It is found to provide much better performance compared to the baseline of CNN and ViT only in the classification of multiple types of power plants from real satellite imagery, hence emphasizing the vital role of the geoinformatics-guided approach. This work makes a contribution to the main theme of KGML that can be beneficial in many AI systems today. It makes broader impacts on AI in Smart Cities, and Environmental Computing.

* AAAI 2025 Conference Bridge Program

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024

Ling Fu, Biao Yang, Zhebin Kuang, Jiajun Song, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Mingxin Huang (+14 more)

Abstract:Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4x more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios including street scene, receipt, formula, diagram, and so on), and thorough evaluation metrics, with a total of 10,000 human-verified question-answering pairs and a high proportion of difficult samples. After carefully benchmarking state-of-the-art LMMs on OCRBench v2, we find that 20 out of 22 LMMs score below 50 out of 100 and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-liu/MultimodalOCR.

Planning, Living and Judging: A Multi-agent LLM-based Framework for Cyclical Urban Planning

Dec 29, 2024

Hang Ni, Yuzhi Wang, Hao Liu

Abstract:Urban regeneration presents significant challenges within the context of urbanization, requiring adaptive approaches to tackle evolving needs. Leveraging advancements in large language models (LLMs), we propose Cyclical Urban Planning (CUP), a new paradigm that continuously generates, evaluates, and refines urban plans in a closed loop. Specifically, our multi-agent LLM-based framework consists of three key components: (1) Planning, where LLM agents generate and refine urban plans based on contextual data; (2) Living, where agents simulate the behaviors and interactions of residents, modeling life in the urban environment; and (3) Judging, which involves evaluating plan effectiveness and providing iterative feedback for improvement. The cyclical process enables a dynamic and responsive planning approach. Experiments on a real-world dataset demonstrate the effectiveness of our framework as a continuous and adaptive planning process.

* 4 pages, 2 figures, accepted by The 1st Workshop on AI for Urban Planning (AAAI 2025's Workshop)
