Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bei Peng

How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?

Feb 02, 2026

Xiaoyuan Cheng, Wenxuan Yuan, Boyang Li, Yuanchao Xu, Yiming Yang, Hao Liang, Bei Peng, Robert Loftin, Zhuo Sun, Yukun Hu

Abstract:Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on offline settings for reward maximization, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based model, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, direct usage destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, yielding a stabilized policy generation and training process without altering the distribution of the optimal policy. Theoretical analysis and extensive experiments demonstrate that ALGD is both theoretically grounded and empirically effective, achieving strong and stable performance across diverse environments.

Via

Access Paper or Ask Questions

Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

Nov 13, 2025

Oliver Dippel, Alexei Lisitsa, Bei Peng

Figure 1 for Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

Figure 2 for Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

Figure 3 for Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

Figure 4 for Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

Abstract:Transformers have demonstrated exceptional in-context learning (ICL) capabilities, enabling applications across natural language processing, computer vision, and sequential decision-making. In reinforcement learning, ICL reframes learning as a supervised problem, facilitating task adaptation without parameter updates. Building on prior work leveraging transformers for sequential decision-making, we propose Heuristic Transformer (HT), an in-context reinforcement learning (ICRL) approach that augments the in-context dataset with a belief distribution over rewards to achieve better decision-making. Using a variational auto-encoder (VAE), a low-dimensional stochastic variable is learned to represent the posterior distribution over rewards, which is incorporated alongside an in-context dataset and query states as prompt to the transformer policy. We assess the performance of HT across the Darkroom, Miniworld, and MuJoCo environments, showing that it consistently surpasses comparable baselines in terms of both effectiveness and generalization. Our method presents a promising direction to bridge the gap between belief-based augmentations and transformer-based decision-making.

Via

Access Paper or Ask Questions

FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Oct 26, 2025

Shan Zhong, Shutong Ding, He Diao, Xiangyu Wang, Kah Chan Teh, Bei Peng

Figure 1 for FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Figure 2 for FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Figure 3 for FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Figure 4 for FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

Abstract:Reliable value estimation serves as the cornerstone of reinforcement learning (RL) by evaluating long-term returns and guiding policy improvement, significantly influencing the convergence speed and final performance. Existing works improve the reliability of value function estimation via multi-critic ensembles and distributional RL, yet the former merely combines multi point estimation without capturing distributional information, whereas the latter relies on discretization or quantile regression, limiting the expressiveness of complex value distributions. Inspired by flow matching's success in generative modeling, we propose a generative paradigm for value estimation, named FlowCritic. Departing from conventional regression for deterministic value prediction, FlowCritic leverages flow matching to model value distributions and generate samples for value estimation.

Via

Access Paper or Ask Questions

MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures

Jun 04, 2025

Elena Zamaraeva, Christopher M. Collins, George R. Darling, Matthew S. Dyer, Bei Peng, Rahul Savani, Dmytro Antypov, Vladimir V. Gusev, Judith Clymo, Paul G. Spirakis(+1 more)

Abstract:Geometry optimization of atomic structures is a common and crucial task in computational chemistry and materials design. Following the learning to optimize paradigm, we propose a new multi-agent reinforcement learning method called Multi-Agent Crystal Structure optimization (MACS) to address periodic crystal structure optimization. MACS treats geometry optimization as a partially observable Markov game in which atoms are agents that adjust their positions to collectively discover a stable configuration. We train MACS across various compositions of reported crystalline materials to obtain a policy that successfully optimizes structures from the training compositions as well as structures of larger sizes and unseen compositions, confirming its excellent scalability and zero-shot transferability. We benchmark our approach against a broad range of state-of-the-art optimization methods and demonstrate that MACS optimizes periodic crystal structures significantly faster, with fewer energy calculations, and the lowest failure rate.

Via

Access Paper or Ask Questions

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

May 24, 2025

Zhenglin Huang, Tianxiao Li, Xiangtai Li, Haiquan Wen, Yiwei He, Jiangning Zhang, Hao Fei, Xi Yang, Xiaowei Huang, Bei Peng(+1 more)

Figure 1 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Figure 2 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Figure 3 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Figure 4 for So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Abstract:Recent advances in AI-powered generative models have enabled the creation of increasingly realistic synthetic images, posing significant risks to information integrity and public trust on social media platforms. While robust detection frameworks and diverse, large-scale datasets are essential to mitigate these risks, existing academic efforts remain limited in scope: current datasets lack the diversity, scale, and realism required for social media contexts, while detection methods struggle with generalization to unseen generative technologies. To bridge this gap, we introduce So-Fake-Set, a comprehensive social media-oriented dataset with over 2 million high-quality images, diverse generative sources, and photorealistic imagery synthesized using 35 state-of-the-art generative models. To rigorously evaluate cross-domain robustness, we establish a novel and large-scale (100K) out-of-domain benchmark (So-Fake-OOD) featuring synthetic imagery from commercial models explicitly excluded from the training distribution, creating a realistic testbed for evaluating real-world performance. Leveraging these resources, we present So-Fake-R1, an advanced vision-language framework that employs reinforcement learning for highly accurate forgery detection, precise localization, and explainable inference through interpretable visual rationales. Extensive experiments show that So-Fake-R1 outperforms the second-best method, with a 1.3% gain in detection accuracy and a 4.5% increase in localization IoU. By integrating a scalable dataset, a challenging OOD benchmark, and an advanced detection framework, this work establishes a new foundation for social media-centric forgery detection research. The code, models, and datasets will be released publicly.

Via

Access Paper or Ask Questions

A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models

Apr 19, 2025

Chengming Wang, Dongyao Jia, Wei Wang, Dong Ngoduy, Bei Peng, Jianping Wang

Abstract:Car-following models (CFMs) are fundamental to traffic flow analysis and autonomous driving. Although calibrated physics-based and trained data-driven CFMs can replicate human driving behavior, their reliance on specific datasets limits generalization across diverse scenarios and reduces reliability in real-world deployment. Moreover, these models typically focus on behavioral fidelity and do not support the explicit optimization of local and string stability, which are increasingly important for the safe and efficient operation of autonomous vehicles (AVs). To address these limitations, we propose a Knowledge-Informed Deep Learning (KIDL) paradigm that distills the generalization capabilities of pre-trained Large Language Models (LLMs) into a lightweight and stability-aware neural architecture. LLMs are used to extract fundamental car-following knowledge beyond dataset-specific patterns, and this knowledge is transferred to a reliable, tractable, and computationally efficient model through knowledge distillation. KIDL also incorporates stability constraints directly into its training objective, ensuring that the resulting model not only emulates human-like behavior but also satisfies the local and string stability requirements essential for real-world AV deployment. We evaluate KIDL on the real-world NGSIM and HighD datasets, comparing its performance with representative physics-based, data-driven, and hybrid CFMs. Both empirical and theoretical results consistently demonstrate KIDL's superior behavioral generalization and traffic flow stability, offering a robust and scalable solution for next-generation traffic systems.

Via

Access Paper or Ask Questions

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Dec 05, 2024

Zhenglin Huang, Jinwei Hu, Xiangtai Li, Yiwei He, Xingyu Zhao, Bei Peng, Baoyuan Wu, Xiaowei Huang, Guangliang Cheng

Figure 1 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Figure 2 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Figure 3 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Figure 4 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Abstract:The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead extensive audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diversified deepfake detection dataset for social media, nor has it devised an effective solution to address this issue. In this paper, we introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages: (1) extensive volume, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations, (2) broad diversity, encompassing fully synthetic and tampered images across various classes, and (3) elevated realism, with images that are predominantly indistinguishable from genuine ones through mere visual inspection. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA (Social media Image Detection, localization, and explanation Assistant). SIDA not only discerns the authenticity of images, but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Compared with state-of-the-art deepfake detection models on SID-Set and other benchmarks, extensive experiments demonstrate that SIDA achieves superior performance among diversified settings. The code, model, and dataset will be released.

Via

Access Paper or Ask Questions

Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

Apr 25, 2024

Tianhui Zhang, Bei Peng, Danushka Bollegala

Figure 1 for Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

Figure 2 for Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

Figure 3 for Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

Figure 4 for Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

Abstract:Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge, while generating coherent sentences. Although the quality of the generated sentences is crucial, the diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts. Large Language Models (LLMs) have shown proficiency in enhancing the generation quality across various tasks through in-context learning (ICL) using given examples without the need for any fine-tuning. However, the diversity aspect in LLM outputs has not been systematically studied before. To address this, we propose a simple method that diversifies the LLM generations, while preserving their quality. Experimental results on three benchmark GCR datasets show that our method achieves an ideal balance between the quality and diversity. Moreover, the sentences generated by our proposed method can be used as training data to improve diversity in existing commonsense generators.

* 16 pages, 6 figures

Via

Access Paper or Ask Questions

Gradable ChatGPT Translation Evaluation

Jan 18, 2024

Hui Jiao, Bei Peng, Lu Zong, Xiaojun Zhang, Xinwei Li

Abstract:ChatGPT, as a language model based on large-scale pre-training, has exerted a profound influence on the domain of machine translation. In ChatGPT, a "Prompt" refers to a segment of text or instruction employed to steer the model towards generating a specific category of response. The design of the translation prompt emerges as a key aspect that can wield influence over factors such as the style, precision and accuracy of the translation to a certain extent. However, there is a lack of a common standard and methodology on how to design and select a translation prompt. Accordingly, this paper proposes a generic taxonomy, which defines gradable translation prompts in terms of expression type, translation style, POS information and explicit statement, thus facilitating the construction of prompts endowed with distinct attributes tailored for various translation tasks. Specific experiments and cases are selected to validate and illustrate the effectiveness of the method.

* Under review in the journal Procesamiento del Lenguaje Natural

Via

Access Paper or Ask Questions

A Covariance Adaptive Student's t Based Kalman Filter

Sep 18, 2023

Benyang Gong, Jiacheng He, Gang Wang, Bei Peng

Figure 1 for A Covariance Adaptive Student's t Based Kalman Filter

Figure 2 for A Covariance Adaptive Student's t Based Kalman Filter

Figure 3 for A Covariance Adaptive Student's t Based Kalman Filter

Figure 4 for A Covariance Adaptive Student's t Based Kalman Filter

Abstract:In the classical Kalman filter(KF), the estimated state is a linear combination of the one-step predicted state and measurement state, their confidence level change when the prediction mean square error matrix and covariance matrix of measurement noise vary. The existing student's t based Kalman filter(TKF) works similarly to the way KF works, they both work well with impulse noise, but when it comes to Gaussian noise, TKF encounters an adjustment limit of the confidence level, this can lead to inaccuracies in such situations. This brief optimizes TKF by using the Gaussian mixture model(GMM), which generates a reasonable covariance matrix from the measurement noise to replace the one used in the existing algorithm and breaks the adjustment limit of the confidence level. At the end of the brief, the performance of the covariance adaptive student's t based Kalman filter(TGKF) is verified.

Via

Access Paper or Ask Questions