Xuyuan Xiong

LithoGRPO: Fast Inverse Lithography via GRPO Reinforced Flow Matching

May 29, 2026

Yao Lai, Xuyuan Xiong, Zeyue Xue, Guojin Chen, Jing Wang, Xihui Liu, Rui Zhang, Robert Mullins, Bei Yu, Ping Luo

Abstract: In semiconductor manufacturing, lithography projects circuit layouts onto silicon wafers through an optical mask. As circuit features shrink below the wavelength of light, optical diffraction causes the printed patterns to deviate from their intended layouts. Inverse Lithography Technology (ILT) addresses this challenge by generating optimized masks that enhance the fidelity of pattern transfer onto wafers. While ILT resembles an image synthesis task, its reliance on explicit physical metrics for mask evaluation limits the applicability of existing generative models. We introduce LithoGRPO, an ILT framework that integrates the flow-matching paradigm with GRPO-based reinforcement learning (RL) fine-tuning, enabling efficient exploration of diverse masks for a given target layout. Unlike purely generative or optimization-based approaches, RL in LithoGRPO exploits the explicitly defined, physics-based reward function of ILT, enabling optimization under complex, process-aware constraints. To the best of our knowledge, this is the first framework that unifies flow matching and RL for mask optimization. To improve RL sampling efficiency, we propose a fast shot-counting algorithm for manufacturability evaluation, achieving over 130x speedup while preserving the mask ranking of the traditional shot-count metric. Extensive experiments demonstrate that LithoGRPO achieves state-of-the-art performance over both optimization-based and learning-based methods, while maintaining efficient mask generation.

* ICML 2026
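
The abstract does not spell out how GRPO scores a group of sampled masks. As a rough illustration only, the sketch below shows the group-relative advantage normalization commonly associated with GRPO: each sample's reward is compared against the mean and standard deviation of its own group. The function name, the `eps` constant, and the reward values are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumed constant) avoids division by zero when all rewards match
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate masks sampled for one target layout
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above their group's average receive a positive advantage and the rest a negative one, so no separately learned value baseline is needed in this sketch.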


SPOT: Scalable Policy Optimization with Trees for Markov Decision Processes

Oct 22, 2025

Xuyuan Xiong, Pedro Chumpitaz-Flores, Kaixun Hua, Cheng Hua

Abstract: Interpretable reinforcement learning policies are essential for high-stakes decision-making, yet optimizing decision tree policies in Markov Decision Processes (MDPs) remains challenging. We propose SPOT, a novel method for computing decision tree policies, which formulates the optimization problem as a mixed-integer linear program (MILP). To enhance efficiency, we employ a reduced-space branch-and-bound approach that decouples the MDP dynamics from tree-structure constraints, enabling efficient parallel search. This significantly improves runtime and scalability compared to previous methods. Our approach ensures that each iteration yields the optimal decision tree. Experimental results on standard benchmarks demonstrate that SPOT achieves substantial speedup and scales to larger MDPs with a significantly higher number of states. The resulting decision tree policies are interpretable and compact, maintaining transparency without compromising performance. These results demonstrate that our approach simultaneously achieves interpretability and scalability, delivering high-quality policies an order of magnitude faster than existing approaches.
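
To make the notion of a decision tree policy concrete, here is a minimal sketch that evaluates a fixed depth-1 tree policy on a toy MDP by iterative policy evaluation. It does not implement SPOT's MILP formulation or its branch-and-bound search; the three-state MDP, the single state feature, the threshold, and all function names are hypothetical choices made for illustration.

```python
def tree_policy(feature, threshold=1.5):
    """Depth-1 decision tree: a single split on one state feature."""
    return 0 if feature <= threshold else 1


def evaluate_policy(policy, P, R, features, gamma=0.9, iters=500):
    """Iterative policy evaluation for a fixed policy on a finite MDP."""
    n = len(features)
    V = [0.0] * n
    for _ in range(iters):
        new_V = []
        for s in range(n):
            a = policy(features[s])
            expected_next = sum(p * V[t] for t, p in enumerate(P[s][a]))
            new_V.append(R[s][a] + gamma * expected_next)
        V = new_V
    return V


# Hypothetical MDP: action 0 moves one state to the right, action 1 stays put.
# P[s][a] lists transition probabilities to states 0, 1, 2.
P = [
    [[0, 1, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 1]],
]
# R[s][a]: only staying in state 2 is rewarded.
R = [[0, 0], [0, 0], [0, 1]]
features = [0, 1, 2]

V = evaluate_policy(tree_policy, P, R, features)
```

The policy here is readable as a single rule ("move right until the feature exceeds 1.5, then stay"), which is the kind of compactness the abstract refers to; finding the best such tree is the part that SPOT addresses and this sketch does not.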


MetaMath: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models

Sep 28, 2024

Xuyuan Xiong, Simeng Han, Ziyue Zhou, Arman Cohan

Abstract: Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language than in code. Additionally, although natural language and code serve as complementary forms of reasoning, they can negatively affect each other in certain scenarios. These insights motivate our development of a new prompting method, MetaMath, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.
