Abstract:Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. However, these solutions rely on multiple LLM calls, resulting in prohibitive latencies and computational costs. We propose Operation-R1, the first framework that trains lightweight LLMs (e.g., Qwen-4B/1.7B) via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step. To train such an LLM, we first introduce a self-supervised rewarding mechanism to automatically obtain fine-grained pipeline-wise supervision signals for LLM training. We also propose variance-aware group resampling to mitigate training instability. To further enhance robustness of pipeline generation, we develop two complementary mechanisms: operation merge, which filters spurious operations through multi-candidate consensus, and adaptive rollback, which offers runtime protection against information loss in data transformation. Experiments on two benchmark datasets show that, with the same LLM backbone, Operation-R1 achieves average absolute accuracy gains of 9.55 and 6.08 percentage points over multi-step preparation baselines, with 79\% table compression and a 2.2$\times$ reduction in monetary cost.
Abstract:Constrained Reinforcement Learning (CRL) aims to optimize decision-making policies under constraint conditions, making it highly applicable to safety-critical domains such as autonomous driving, robotics, and power grid management. However, existing robust CRL approaches predominantly focus on single-step perturbations and temporally independent adversarial models, lacking explicit modeling of robustness against temporally coupled perturbations. To tackle these challenges, we propose TCRL, a novel temporal-coupled adversarial training framework for robust constrained reinforcement learning (TCRL) in worst-case scenarios. First, TCRL introduces a worst-case-perceived cost constraint function that estimates safety costs under temporally coupled perturbations without the need to explicitly model adversarial attackers. Second, TCRL establishes a dual-constraint defense mechanism on the reward to counter temporally coupled adversaries while maintaining reward unpredictability. Experimental results demonstrate that TCRL consistently outperforms existing methods in terms of robustness against temporally coupled perturbation attacks across a variety of CRL tasks.
Abstract:The algorithms based on message passing neural networks (MPNNs) on graphs have recently achieved great success for various graph applications. However, studies find that these methods always propagate the information to very limited neighborhoods with shallow depth, particularly due to over-smoothing. That means most of the existing MPNNs fail to be so `deep'. Although some previous work tended to handle this challenge via optimization- or structure-level remedies, the overall performance of GCNs still suffers from limited accuracy, poor stability, and unaffordable computational cost. Moreover, neglect of higher-order relationships during the propagation of MPNNs has further limited the performance of them. To overcome these challenges, a novel variant of PageRank named motif-based personalized PageRank (MPPR) is proposed to measure the influence of one node to another on the basis of considering higher-order motif relationships. Secondly, the MPPR is utilized to the message passing process of GCNs, thereby guiding the message passing process at a relatively `high' level. The experimental results show that the proposed method outperforms almost all of the baselines on accuracy, stability, and time consumption. Additionally, the proposed method can be considered as a component that can underpin almost all GCN tasks, with DGCRL being demonstrated in the experiment. The anonymous code repository is available at: https://anonymous.4open.science/r/GCN-MPPR-AFD6/.