Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zeyu Zheng

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Oct 09, 2025

Congming Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu(+1 more)

Abstract:Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models(PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through the full loop: how to generate process data, build PRMs, and use PRMs for test-time scaling and reinforcement learning. We summarize applications across math, code, text, multimodal reasoning, robotics, and agents, and review emerging benchmarks. Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.

Via

Access Paper or Ask Questions

An Energy-Efficient Edge Coprocessor for Neural Rendering with Explicit Data Reuse Strategies

Oct 09, 2025

Binzhe Yuan, Xiangyu Zhang, Zeyu Zheng, Yuefeng Zhang, Haochuan Wan, Zhechen Yuan, Junsheng Chen, Yunxiang He, Junran Ding, Xiaoming Zhang(+5 more)

Figure 1 for An Energy-Efficient Edge Coprocessor for Neural Rendering with Explicit Data Reuse Strategies

Figure 2 for An Energy-Efficient Edge Coprocessor for Neural Rendering with Explicit Data Reuse Strategies

Figure 3 for An Energy-Efficient Edge Coprocessor for Neural Rendering with Explicit Data Reuse Strategies

Figure 4 for An Energy-Efficient Edge Coprocessor for Neural Rendering with Explicit Data Reuse Strategies

Abstract:Neural radiance fields (NeRF) have transformed 3D reconstruction and rendering, facilitating photorealistic image synthesis from sparse viewpoints. This work introduces an explicit data reuse neural rendering (EDR-NR) architecture, which reduces frequent external memory accesses (EMAs) and cache misses by exploiting the spatial locality from three phases, including rays, ray packets (RPs), and samples. The EDR-NR architecture features a four-stage scheduler that clusters rays on the basis of Z-order, prioritize lagging rays when ray divergence happens, reorders RPs based on spatial proximity, and issues samples out-of-orderly (OoO) according to the availability of on-chip feature data. In addition, a four-tier hierarchical RP marching (HRM) technique is integrated with an axis-aligned bounding box (AABB) to facilitate spatial skipping (SS), reducing redundant computations and improving throughput. Moreover, a balanced allocation strategy for feature storage is proposed to mitigate SRAM bank conflicts. Fabricated using a 40 nm process with a die area of 10.5 mmX, the EDR-NR chip demonstrates a 2.41X enhancement in normalized energy efficiency, a 1.21X improvement in normalized area efficiency, a 1.20X increase in normalized throughput, and a 53.42% reduction in on-chip SRAM consumption compared to state-of-the-art accelerators.

* 11 pages, 17 figures, 2 tables

Via

Access Paper or Ask Questions

Controllable Coupled Image Generation via Diffusion Models

Jun 07, 2025

Chenfei Yuan, Nanshan Jia, Hangqi Li, Peter W. Glynn, Zeyu Zheng

Abstract:We provide an attention-level control method for the task of coupled image generation, where "coupled" means that multiple simultaneously generated images are expected to have the same or very similar backgrounds. While backgrounds coupled, the centered objects in the generated images are still expected to enjoy the flexibility raised from different text prompts. The proposed method disentangles the background and entity components in the model's cross-attention modules, attached with a sequence of time-varying weight control parameters depending on the time step of sampling. We optimize this sequence of weight control parameters with a combined objective that assesses how coupled the backgrounds are as well as text-to-image alignment and overall visual quality. Empirical results demonstrate that our method outperforms existing approaches across these criteria.

Via

Access Paper or Ask Questions

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis

May 23, 2025

Hanyu Li, Haoyu Liu, Tingyu Zhu, Tianyu Guo, Zeyu Zheng, Xiaotie Deng, Michael I. Jordan

Abstract:Large Language Models (LLMs) show promise as data analysis agents, but existing benchmarks overlook the iterative nature of the field, where experts' decisions evolve with deeper insights of the dataset. To address this, we introduce IDA-Bench, a novel benchmark evaluating LLM agents in multi-round interactive scenarios. Derived from complex Kaggle notebooks, tasks are presented as sequential natural language instructions by an LLM-simulated user. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Initial results show that even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on < 50% of the tasks, highlighting limitations not evident in single-turn tests. This work underscores the need to improve LLMs' multi-round capabilities for building more reliable data analysis agents, highlighting the necessity of achieving a balance between instruction following and reasoning.

Via

Access Paper or Ask Questions

Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

Apr 15, 2025

Nanshan Jia, Chenfei Yuan, Yuhang Wu, Zeyu Zheng

Figure 1 for Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

Figure 2 for Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

Figure 3 for Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

Figure 4 for Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

Abstract:The fast development of Large Language Models (LLMs) offers growing opportunities to further improve sequential recommendation systems. Yet for some practitioners, integrating LLMs to their existing base recommendation systems raises questions about model interpretability, transparency and related safety. To partly alleviate challenges from these questions, we propose guided embedding refinement, a method that carries out a guided and interpretable usage of LLM to enhance the embeddings associated with the base recommendation system. Instead of directly using LLMs as the backbone of sequential recommendation systems, we utilize them as auxiliary tools to emulate the sales logic of recommendation and generate guided embeddings that capture domain-relevant semantic information on interpretable attributes. Benefiting from the strong generalization capabilities of the guided embedding, we construct refined embedding by using the guided embedding and reduced-dimension version of the base embedding. We then integrate the refined embedding into the recommendation module for training and inference. A range of numerical experiments demonstrate that guided embedding is adaptable to various given existing base embedding models, and generalizes well across different recommendation tasks. The numerical results show that the refined embedding not only improves recommendation performance, achieving approximately $10\%$ to $50\%$ gains in Mean Reciprocal Rank (MRR), Recall rate, and Normalized Discounted Cumulative Gain (NDCG), but also enhances interpretability, as evidenced by case studies.

Via

Access Paper or Ask Questions

Collaborative Bayesian Optimization via Wasserstein Barycenters

Apr 15, 2025

Donglin Zhan, Haoting Zhang, Rhonda Righter, Zeyu Zheng, James Anderson

Figure 1 for Collaborative Bayesian Optimization via Wasserstein Barycenters

Figure 2 for Collaborative Bayesian Optimization via Wasserstein Barycenters

Figure 3 for Collaborative Bayesian Optimization via Wasserstein Barycenters

Figure 4 for Collaborative Bayesian Optimization via Wasserstein Barycenters

Abstract:Motivated by the growing need for black-box optimization and data privacy, we introduce a collaborative Bayesian optimization (BO) framework that addresses both of these challenges. In this framework agents work collaboratively to optimize a function they only have oracle access to. In order to mitigate against communication and privacy constraints, agents are not allowed to share their data but can share their Gaussian process (GP) surrogate models. To enable collaboration under these constraints, we construct a central model to approximate the objective function by leveraging the concept of Wasserstein barycenters of GPs. This central model integrates the shared models without accessing the underlying data. A key aspect of our approach is a collaborative acquisition function that balances exploration and exploitation, allowing for the optimization of decision variables collaboratively in each iteration. We prove that our proposed algorithm is asymptotically consistent and that its implementation via Monte Carlo methods is numerically accurate. Through numerical experiments, we demonstrate that our approach outperforms other baseline collaborative frameworks and is competitive with centralized approaches that do not consider data privacy.

Via

Access Paper or Ask Questions

Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Jan 29, 2025

Jinghai He, Cheng Hua, Chunyang Zhou, Zeyu Zheng

Figure 1 for Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Figure 2 for Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Figure 3 for Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Figure 4 for Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

Abstract:We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space into a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management.

Via

Access Paper or Ask Questions

Black-box Optimization with Simultaneous Statistical Inference for Optimal Performance

Jan 14, 2025

Teng Lian, Jian-Qiang Hu, Yuhang Wu, Zeyu Zheng

Abstract:Black-box optimization is often encountered for decision-making in complex systems management, where the knowledge of system is limited. Under these circumstances, it is essential to balance the utilization of new information with computational efficiency. In practice, decision-makers often face the dual tasks of optimization and statistical inference for the optimal performance, in order to achieve it with a high reliability. Our goal is to address the dual tasks in an online fashion. Wu et al (2022) [arXiv preprint: 2210.06737] point out that the sample average of performance estimates generated by the optimization algorithm needs not to admit a central limit theorem. We propose an algorithm that not only tackles this issue, but also provides an online consistent estimator for the variance of the performance. Furthermore, we characterize the convergence rate of the coverage probabilities of the asymptotic confidence intervals.

Via

Access Paper or Ask Questions

Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Oct 24, 2024

Nanshan Jia, Tingyu Zhu, Haoyu Liu, Zeyu Zheng

Figure 1 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Figure 2 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Figure 3 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Figure 4 for Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Abstract:We propose a class of structured diffusion models, in which the prior distribution is chosen as a mixture of Gaussians, rather than a standard Gaussian distribution. The specific mixed Gaussian distribution, as prior, can be chosen to incorporate certain structured information of the data. We develop a simple-to-implement training procedure that smoothly accommodates the use of mixed Gaussian as prior. Theory is provided to quantify the benefits of our proposed models, compared to the classical diffusion models. Numerical experiments with synthetic, image and operational data are conducted to show comparative advantages of our model. Our method is shown to be robust to mis-specifications and in particular suits situations where training resources are limited or faster training in real time is desired.

Via

Access Paper or Ask Questions

Symbolic Music Generation with Fine-grained Interactive Textural Guidance

Oct 11, 2024

Tingyu Zhu, Haoyu Liu, Zhimin Jiang, Zeyu Zheng

Abstract:The problem of symbolic music generation presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To overcome these difficulties, we introduce Fine-grained Textural Guidance (FTG) within diffusion models to correct errors in the learned distributions. By incorporating FTG, the diffusion models improve the accuracy of music generation, which makes them well-suited for advanced tasks such as progressive music generation, improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effect of the FTG approach. We provide numerical experiments and a demo page for interactive music generation with user input to showcase the effectiveness of our approach.

Via

Access Paper or Ask Questions