Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hua Wang

URPO: A Unified Reward & Policy Optimization Framework for Large Language Models

Jul 23, 2025

Songshuo Lu, Hua Wang, Zhi Chen, Yaohua Tang

Abstract:Large-scale alignment pipelines typically pair a policy model with a separately trained reward model whose parameters remain frozen during reinforcement learning (RL). This separation creates a complex, resource-intensive pipeline and suffers from a performance ceiling due to a static reward signal. We propose a novel framework, Unified Reward & Policy Optimization (URPO), that unifies instruction-following ("player") and reward modeling ("referee") within a single model and a single training phase. Our method recasts all alignment data-including preference pairs, verifiable reasoning, and open-ended instructions-into a unified generative format optimized by a single Group-Relative Policy Optimization (GRPO) loop. This enables the model to learn from ground-truth preferences and verifiable logic while simultaneously generating its own rewards for open-ended tasks. Experiments on the Qwen2.5-7B model demonstrate URPO's superiority. Our unified model significantly outperforms a strong baseline using a separate generative reward model, boosting the instruction-following score on AlpacaEval from 42.24 to 44.84 and the composite reasoning average from 32.66 to 35.66. Furthermore, URPO cultivates a superior internal evaluator as a byproduct of training, achieving a RewardBench score of 85.15 and surpassing the dedicated reward model it replaces (83.55). By eliminating the need for a separate reward model and fostering a co-evolutionary dynamic between generation and evaluation, URPO presents a simpler, more efficient, and more effective path towards robustly aligned language models.

Via

Access Paper or Ask Questions

ABG-NAS: Adaptive Bayesian Genetic Neural Architecture Search for Graph Representation Learning

Apr 30, 2025

Sixuan Wang, Jiao Yin, Jinli Cao, MingJian Tang, Hua Wang, Yanchun Zhang

Abstract:Effective and efficient graph representation learning is essential for enabling critical downstream tasks, such as node classification, link prediction, and subgraph search. However, existing graph neural network (GNN) architectures often struggle to adapt to diverse and complex graph structures, limiting their ability to provide robust and generalizable representations. To address this challenge, we propose ABG-NAS, a novel framework for automated graph neural network architecture search tailored for efficient graph representation learning. ABG-NAS encompasses three key components: a Comprehensive Architecture Search Space (CASS), an Adaptive Genetic Optimization Strategy (AGOS), and a Bayesian-Guided Tuning Module (BGTM). CASS systematically explores diverse propagation (P) and transformation (T) operations, enabling the discovery of GNN architectures capable of capturing intricate graph characteristics. AGOS dynamically balances exploration and exploitation, ensuring search efficiency and preserving solution diversity. BGTM further optimizes hyperparameters periodically, enhancing the scalability and robustness of the resulting architectures. Empirical evaluations on benchmark datasets (Cora, PubMed, Citeseer, and CoraFull) demonstrate that ABG-NAS consistently outperforms both manually designed GNNs and state-of-the-art neural architecture search (NAS) methods. These results highlight the potential of ABG-NAS to advance graph representation learning by providing scalable and adaptive solutions for diverse graph structures. Our code is publicly available at https://github.com/sserranw/ABG-NAS.

Via

Access Paper or Ask Questions

DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Mar 14, 2025

Yibin Xu, Liang Yang, Hao Chen, Hua Wang, Zhi Chen, Yaohua Tang

Figure 1 for DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Figure 2 for DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Figure 3 for DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Figure 4 for DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Abstract:The limitation of graphical user interface (GUI) data has been a significant barrier to the development of GUI agents today, especially for the desktop / computer use scenarios. To address this, we propose an automated GUI data generation pipeline, AutoCaptioner, which generates data with rich descriptions while minimizing human effort. Using AutoCaptioner, we created a novel large-scale desktop GUI dataset, DeskVision, along with the largest desktop test benchmark, DeskVision-Eval, which reflects daily usage and covers diverse systems and UI elements, each with rich descriptions. With DeskVision, we train a new GUI understanding model, GUIExplorer. Results show that GUIExplorer achieves state-of-the-art (SOTA) performance in understanding/grounding visual elements without the need for complex architectural designs. We further validated the effectiveness of the DeskVision dataset through ablation studies on various large visual language models (LVLMs). We believe that AutoCaptioner and DeskVision will significantly advance the development of GUI agents, and will open-source them for the community.

Via

Access Paper or Ask Questions

SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models

Feb 27, 2025

Zicheng Cai, Yaohua Tang, Yutao Lai, Hua Wang, Zhi Chen, Hao Chen

Abstract:We introduce SEKI, a novel large language model (LLM)-based neural architecture search (NAS) method. Inspired by the chain-of-thought (CoT) paradigm in modern LLMs, SEKI operates in two key stages: self-evolution and knowledge distillation. In the self-evolution stage, LLMs initially lack sufficient reference examples, so we implement an iterative refinement mechanism that enhances architectures based on performance feedback. Over time, this process accumulates a repository of high-performance architectures. In the knowledge distillation stage, LLMs analyze common patterns among these architectures to generate new, optimized designs. Combining these two stages, SEKI greatly leverages the capacity of LLMs on NAS and without requiring any domain-specific data. Experimental results show that SEKI achieves state-of-the-art (SOTA) performance across various datasets and search spaces while requiring only 0.05 GPU-days, outperforming existing methods in both efficiency and accuracy. Furthermore, SEKI demonstrates strong generalization capabilities, achieving SOTA-competitive results across multiple tasks.

Via

Access Paper or Ask Questions

Transfer Learning Assisted Fast Design Migration Over Technology Nodes: A Study on Transformer Matching Network

Feb 25, 2025

Chenhao Chu, Yuhao Mao, Hua Wang

Abstract:In this study, we introduce an innovative methodology for the design of mm-Wave passive networks that leverages knowledge transfer from a pre-trained synthesis neural network (NN) model in one technology node and achieves swift and reliable design adaptation across different integrated circuit (IC) technologies, operating frequencies, and metal options. We prove this concept through simulation-based demonstrations focusing on the training and comparison of the coefficient of determination (R2) of synthesis NNs for 1:1 on-chip transformers in GlobalFoundries(GF) 22nm FDX+ (target domain), with and without transfer learning from a model trained in GF 45nm SOI (source domain). In the experiments, we explore varying target data densities of 0.5%, 1%, 5%, and 100% with a complete dataset of 0.33 million in GF 22FDX+, and for comparative analysis, apply source data densities of 25%, 50%, 75%, and 100% with a complete dataset of 2.5 million in GF 45SOI. With the source data only at 30GHz, the experiments span target data from two metal options in GF 22FDX+ at frequencies of 30 and 39 GHz. The results prove that the transfer learning with the source domain knowledge (GF 45SOI) can both accelerate the training process in the target domain (GF 22FDX+) and improve the R2 values compared to models without knowledge transfer. Furthermore, it is observed that a model trained with just 5% of target data and augmented by transfer learning achieves R2 values superior to a model trained with 20% of the data without transfer, validating the advantage seen from 1% to 5% data density. This demonstrates a notable reduction of 4X in the necessary dataset size highlighting the efficacy of utilizing transfer learning to mm-Wave passive network design. The PyTorch learning and testing code is publicly available at https://github.com/ChenhaoChu/RFIC-TL.

* Publihsed and Presented at IEEE MTT-S International Microwave Symposium (IMS 2024), Washington, DC, USA

Via

Access Paper or Ask Questions

Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference

Feb 21, 2025

Yaohua Tang, Zhicheng Hu, Kun Cheng, Fan Mo, Qiheng Lv, Hua Wang, Zhi Chen

Figure 1 for Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference

Figure 2 for Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference

Figure 3 for Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference

Figure 4 for Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference

Abstract:The increasing context window size in large language models (LLMs) has improved their ability to handle complex, long-text tasks. However, as the conversation rounds continue, it is required to store a large amount of KV cache in GPU memory, which significantly affects the efficiency and even availability of the model serving systems. This paper analyzes dialogue data from real users and discovers that the LLM inference manifests a watershed layer, after which the distribution of round-level attention shows notable similarity. We propose Round Attention, a novel round-level attention mechanism that only recalls and computes the KV cache of the most relevant rounds. The experiments show that our method saves 55\% memory usage without compromising model performance.

Via

Access Paper or Ask Questions

Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising

Nov 05, 2024

Tao Huang, Jiayang Meng, Hong Chen, Guolong Zheng, Xu Yang, Xun Yi, Hua Wang

Figure 1 for Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising

Figure 2 for Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising

Figure 3 for Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising

Figure 4 for Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising

Abstract:We investigate the construction of gradient-guided conditional diffusion models for reconstructing private images, focusing on the adversarial interplay between differential privacy noise and the denoising capabilities of diffusion models. While current gradient-based reconstruction methods struggle with high-resolution images due to computational complexity and prior knowledge requirements, we propose two novel methods that require minimal modifications to the diffusion model's generation process and eliminate the need for prior knowledge. Our approach leverages the strong image generation capabilities of diffusion models to reconstruct private images starting from randomly generated noise, even when a small amount of differentially private noise has been added to the gradients. We also conduct a comprehensive theoretical analysis of the impact of differential privacy noise on the quality of reconstructed images, revealing the relationship among noise magnitude, the architecture of attacked models, and the attacker's reconstruction capability. Additionally, extensive experiments validate the effectiveness of our proposed methods and the accuracy of our theoretical findings, suggesting new directions for privacy risk auditing using conditional diffusion models.

Via

Access Paper or Ask Questions

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text

Oct 10, 2024

Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang

Figure 1 for TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text

Figure 2 for TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text

Figure 3 for TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text

Figure 4 for TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text

Abstract:Current Retrieval-Augmented Generation (RAG) systems concatenate and process numerous retrieved document chunks for prefill which requires a large volume of computation, therefore leading to significant latency in time-to-first-token (TTFT). To reduce the computation overhead as well as TTFT, we introduce TurboRAG, a novel RAG system that redesigns the inference paradigm of the current RAG system by first pre-computing and storing the key-value (KV) caches of documents offline, and then directly retrieving the saved KV cache for prefill. Hence, online computation of KV caches is eliminated during inference. In addition, we provide a number of insights into the mask matrix and positional embedding mechanisms, plus fine-tune a pretrained language model to maintain model accuracy of TurboRAG. Our approach is applicable to most existing large language models and their applications without any requirement in modification of models and inference systems. Experimental results across a suite of RAG benchmarks demonstrate that TurboRAG reduces TTFT by up to 9.4x compared to the conventional RAG systems (on an average of 8.6x), but reserving comparable performance to the standard RAG systems.

Via

Access Paper or Ask Questions

MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

Aug 09, 2024

Junhao Xu, Zhenlin Liang, Yi Liu, Yichao Hu, Jian Li, Yajun Zheng, Meng Cai, Hua Wang

Abstract:In this paper, we present MooER, a LLM-based large-scale automatic speech recognition (ASR) / automatic speech translation (AST) model of Moore Threads. A 5000h pseudo labeled dataset containing open source and self collected speech data is used for training. We achieve performance comparable to other open source models trained with up to hundreds of thousands of hours of labeled speech data. Meanwhile, experiments conducted on Covost2 Zh2en testset suggest that our model outperforms other open source Speech LLMs. A BLEU score of 25.2 can be obtained. The main contributions of this paper are summarized as follows. First, this paper presents a training strategy for encoders and LLMs on speech related tasks (including ASR and AST) using a small size of pseudo labeled data without any extra manual annotation and selection. Second, we release our ASR and AST models and plan to open-source our training code and strategy in the near future. Moreover, a model trained on 8wh scale training data is planned to be released later on.

Via

Access Paper or Ask Questions

A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation

Jul 13, 2024

Maohan Liang, Ryan Wen Liu, Ruobin Gao, Zhe Xiao, Xiaocai Zhang, Hua Wang

Abstract:Vessel trajectory clustering, a crucial component of the maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. Initially, we conducted a thorough literature review using relevant keywords to gather and summarize pertinent research papers and datasets. Then, this paper discussed the principal methods of data pre-processing that prepare data for further analysis. The survey progresses to detail the leading algorithms for measuring vessel trajectory similarity and the main clustering techniques used in the field today. Furthermore, the various applications of trajectory clustering within the maritime context are explored. Finally, the paper evaluates the effectiveness of different algorithm combinations and pre-processing methods through experimental analysis, focusing on their impact on the performance of distance-based trajectory clustering algorithms. The experimental results demonstrate the effectiveness of various trajectory clustering algorithms and notably highlight the significant improvements that trajectory compression techniques contribute to the efficiency and accuracy of trajectory clustering. This comprehensive approach ensures a deep understanding of current capabilities and future directions in vessel trajectory clustering.

Via

Access Paper or Ask Questions