Abstract:Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and solutions (or final states), enabling both forward simulation and inverse recovery with limited paired data. At inference, sampling is guided by a composite loss that enforces agreement with sparse measurements and reduces the PDE residual; we support deterministic, stochastic, and hybrid samplers. We provide error guarantees for these guided procedures. For the deterministic optimizer, a coercivity condition ensures trajectory boundedness and a phase-wise contraction yields logarithmic complexity in the target accuracy. For the stochastic sampler, we introduce adaptive guidance and assume dissipativity of the velocity field to obtain uniform moment bounds independent of the noise-floor parameter. This leads to polynomial-time error bounds, and a matching lower bound shows constant guidance induces an unavoidable positive bias, motivating adaptivity. A hybrid deterministic-stochastic analysis is also provided. Experiments on static and time-dependent benchmark PDEs demonstrate competitive accuracy and faster inference than diffusion-based generative models.
Abstract:The expansion of data centers (DCs) drives a sustained increase in electricity demand and associated water withdrawals. These withdrawals occur at generation sites and are virtually allocated to demand based on network power flows. Consequently, the actual water footprint of a specific load varies dynamically with generation dispatch and network conditions. Existing approaches typically rely on static statistical accounting to quantify these water footprints. However, such static methods fail to capture how dispatch optimization and workload relocation dynamically affect water withdrawals. As a result, they remain decoupled from the optimization process and cannot guide workload relocation or power dispatch to mitigate water stress. To address this limitation, this paper develops an operational electricity-computation-water (ECW) nexus framework that internalizes virtual water impacts directly into power system dispatch. The framework represents dispatch optimization as a differentiable optimization layer embedded within a deep learning architecture, enabling efficient end-to-end learning of coordination policies while preserving operational feasibility. Combined with fixed-point coordination, the framework enforces consistency between virtual water attribution and physical generation-side withdrawals. Case studies on the IEEE 30-bus and 118-bus test systems demonstrate reliable convergence, exact power-water consistency, and reductions of approximately 3-5% in generation-related freshwater withdrawals under water-constrained conditions.
Abstract:Dynamic line rating (DLR) is a methodology that requires timely monitoring data to determine the real-time ampacity of power lines. However, DLR monitoring devices (MDs) are vulnerable to connectivity disruptions, leading to missing or delayed data. Although unmanned aerial vehicles (UAVs) can enable resilient data collection from MDs, their limited onboard energy challenges timely monitoring over extended transmission corridors with flight hazards. This paper proposes a cooperative UAV-based data collection framework with integrated sensing and communication (ISAC) to support timely DLR updates. In this framework, ISAC is employed to maintain the sensing and communication quality required for safe and cooperative UAV data collection. Accordingly, a joint energy minimization problem is formulated over UAV trajectories and collection scheduling under ISAC constraints. To solve it, a hybrid algorithm combining deep reinforcement learning (DRL) and semidefinite relaxation (SDR) is proposed, where DRL optimizes the trajectory and collection scheduling, while SDR is used to handle the non-convex ISAC constraints. Simulation results show that the proposed scheme reduces energy consumption by up to 34.6% compared with offline benchmarks and by about 2.2% compared with the separated sensing-and-communication baseline, while satisfying the minute-level timescale requirement of DLR.
Abstract:Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior work evaluates pedagogical quality via answer leakage (the disclosure of complete solutions instead of scaffolding) but typically assumes well-intentioned learners, leaving tutor robustness under student misuse largely unexplored. In this paper, we study scenarios where students behave adversarially and aim to obtain the correct answer from the tutor. We evaluate a broad set of LLM-based tutor models, including different model families, pedagogically aligned models, and a multi-agent design, under a range of adversarial student attacks. We adapt six groups of adversarial and persuasive techniques to the educational setting and use them to probe how likely a tutor is to reveal the final answer. We evaluate answer leakage robustness using different types of in-context adversarial student agents, finding that they often fail to carry out effective attacks. We therefore introduce an adversarial student agent that we fine-tune to jailbreak LLM-based tutors, which we propose as the core of a standardized benchmark for evaluating tutor robustness. Finally, we present simple but effective defense strategies that reduce answer leakage and strengthen the robustness of LLM-based tutors in adversarial scenarios.
Abstract:Knowledge distillation is typically realized by transferring a teacher model's knowledge into a student's parameters through supervised or reinforcement-based optimization. While effective, such approaches require repeated parameter updates and large-scale training data, limiting their applicability in resource-constrained environments. In this work, we propose TED, a training-free, context-based distillation framework that shifts the update target of distillation from model parameters to an in-context experience injected into the student's prompt. For each input, the student generates multiple reasoning trajectories, while a teacher independently produces its own solution. The teacher then compares the student trajectories with its reasoning and the ground-truth answer, extracting generalized experiences that capture effective reasoning patterns. These experiences are continuously refined and updated over time. A key challenge of context-based distillation is unbounded experience growth and noise accumulation. TED addresses this with an experience compression mechanism that tracks usage statistics and selectively merges, rewrites, or removes low-utility experiences. Experiments on multimodal reasoning benchmarks MathVision and VisualPuzzles show that TED consistently improves performance. On MathVision, TED raises the performance of Qwen3-VL-8B from 0.627 to 0.702, and on VisualPuzzles from 0.517 to 0.561 with just 100 training samples. Under this low-data, no-update setting, TED achieves performance competitive with fully trained parameter-based distillation while reducing training cost by over 5x, demonstrating that meaningful knowledge transfer can be achieved through contextual experience.
Abstract:Parametric partial differential equations (PDEs) are fundamental for modeling a wide range of physical and engineering systems influenced by uncertain or varying parameters. Traditional neural network-based solvers, such as Physics-Informed Neural Networks (PINNs) and Deep Galerkin Methods, often face challenges in generalization and long-time prediction efficiency due to their dependence on full space-time approximations. To address these issues, we propose a novel and scalable framework that significantly enhances the Neural Galerkin Method (NGM) by incorporating the Meta-Auto-Decoder (MAD) paradigm. Our approach leverages space-time decoupling to enable more stable and efficient time integration, while meta-learning-driven adaptation allows rapid generalization to unseen parameter configurations with minimal retraining. Furthermore, randomized sparse updates effectively reduce computational costs without compromising accuracy. Together, these advancements enable our method to achieve physically consistent, long-horizon predictions for complex parameterized evolution equations with significantly lower computational overhead. Numerical experiments on benchmark problems demonstrate that our method performs comparatively well in terms of accuracy, robustness, and adaptability.




Abstract:Text-to-SQL is a fundamental and longstanding problem in NLP that aims to convert natural language queries into SQL, enabling non-expert users to operate databases. Recent advances in LLMs have greatly improved text-to-SQL performance. However, challenges persist, especially when dealing with complex user queries. Current approaches (e.g., CoT prompting and multi-agent frameworks) rely on the ability of models to plan and generate SQL autonomously, but controlling performance remains difficult. In addition, LLMs are still prone to hallucinations. To alleviate these challenges, we design MCTS-SQL, a novel approach that guides SQL generation iteratively. The approach generates SQL queries through Monte Carlo Tree Search (MCTS) and uses a heuristic self-refinement mechanism to enhance accuracy and reliability. Key components include a schema selector for extracting relevant information and an MCTS-based generator for iterative query refinement. Experimental results on the SPIDER and BIRD benchmarks show that MCTS-SQL achieves state-of-the-art performance. Specifically, on the BIRD development dataset, MCTS-SQL achieves an Execution (EX) accuracy of 69.40% using GPT-4o as the base model and a significant improvement on challenging tasks, with an EX of 51.48%, which is 3.41% higher than the existing method.
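As a rough illustration of how a tree search can steer iterative query refinement, the sketch below shows the selection and backpropagation steps of a generic MCTS loop using the standard UCT (UCB1) rule. The abstract does not state which selection rule or reward MCTS-SQL uses, so the `Node` class, the exploration constant `c`, and the reward values are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate SQL query in the search tree (hypothetical structure)."""
    sql: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited candidates are always tried first
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus


def select_child(node: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(path: list, reward: float) -> None:
    """Credit a reward (e.g. from a refinement check) to every node on the path."""
    for n in path:
        n.visits += 1
        n.total_reward += reward
```

In a full loop, the selected node would be expanded with refined queries and scored before `backpropagate` is called; those steps depend on the LLM and the scoring heuristic and are omitted here.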




Abstract:Clustered federated learning (CFL) addresses the performance challenges posed by data heterogeneity in federated learning (FL) by organizing edge devices with similar data distributions into clusters, enabling collaborative model training tailored to each group. However, existing CFL approaches strictly limit knowledge sharing to within clusters, lacking the integration of global knowledge with intra-cluster training, which leads to suboptimal performance. Moreover, traditional clustering methods incur significant computational overhead, especially as the number of edge devices increases. In this paper, we propose LCFed, an efficient CFL framework to combat these challenges. By leveraging model partitioning and adopting distinct aggregation strategies for each sub-model, LCFed effectively incorporates global knowledge into intra-cluster co-training, achieving optimal training performance. Additionally, LCFed customizes a computationally efficient model similarity measurement method based on low-rank models, enabling real-time cluster updates with minimal computational overhead. Extensive experiments show that LCFed outperforms state-of-the-art benchmarks in both test accuracy and clustering computational efficiency.
Abstract:In the field of finance, the prediction of individual credit default is of vital importance. However, existing methods face problems such as insufficient interpretability and transparency as well as limited performance when dealing with high-dimensional and nonlinear data. To address these issues, this paper introduces a method based on Kolmogorov-Arnold Networks (KANs). KANs are a new type of neural network architecture with learnable activation functions and no linear weights, which has potential advantages in handling complex multi-dimensional data. Specifically, this paper applies KANs to the field of individual credit risk prediction for the first time and constructs the Kolmogorov-Arnold Credit Default Predict (KACDP) model. Experiments show that the KACDP model outperforms mainstream credit default prediction models in performance metrics (ROC_AUC and F1 values). Meanwhile, through methods such as feature attribution scores and visualization of the model structure, the model's decision-making process and the importance of different features are clearly demonstrated, providing a transparent and interpretable basis for decision-making by financial institutions and meeting the industry's strict requirements for model interpretability. In conclusion, the KACDP model constructed in this paper exhibits excellent predictive performance and satisfactory interpretability in individual credit risk prediction, providing an effective way to address the limitations of existing methods and offering a new and practical credit risk prediction tool for financial institutions.
Abstract:The high-index saddle dynamics (HiSD) method is a powerful approach for computing saddle points and constructing solution landscapes. However, its practical applicability is constrained by the need for an explicit expression of the energy function. To overcome this challenge, we propose a neural network-based high-index saddle dynamics (NN-HiSD) method. It utilizes a neural network-based surrogate model to approximate the energy function, allowing the use of the HiSD method in cases where the energy function is either unavailable or computationally expensive. We further enhance the efficiency of the NN-HiSD method by incorporating momentum acceleration techniques, specifically Nesterov's acceleration and the heavy-ball method. We also provide a rigorous convergence analysis of the NN-HiSD method. We conduct numerical experiments on systems with and without explicit energy functions, specifically including the alanine dipeptide model and bacterial ribosomal assembly intermediates for the latter, demonstrating the effectiveness and reliability of the proposed method.
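To make the saddle dynamics concrete, here is a minimal sketch of index-1 saddle dynamics with a heavy-ball momentum term on a toy energy. It uses the standard index-1 form, x' = -(I - 2vv^T)∇E(x) and v' = -(I - vv^T)∇²E(x)v, discretized with explicit Euler steps. The quadratic energy E(x, y) = (-x² + 2y²)/2 stands in for a trained surrogate, and the step size, momentum coefficient, and function names are assumptions for this example rather than the paper's settings.

```python
import math


def grad(p):
    """Gradient of the toy energy E(x, y) = (-x^2 + 2*y^2) / 2 (assumed stand-in)."""
    x, y = p
    return (-x, 2.0 * y)


def hess_vec(p, v):
    """Hessian-vector product; the Hessian of the toy energy is diag(-1, 2)."""
    return (-v[0], 2.0 * v[1])


def normalize(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


def hisd_heavy_ball(p0, v0, step=0.1, momentum=0.5, iters=500):
    """Index-1 saddle search: reflected gradient step plus heavy-ball momentum."""
    p_prev = p = p0
    v = normalize(v0)
    for _ in range(iters):
        g = grad(p)
        gv = g[0] * v[0] + g[1] * v[1]
        # Reflected force -(I - 2 v v^T) grad E: ascend along v, descend elsewhere.
        d = (-(g[0] - 2.0 * gv * v[0]), -(g[1] - 2.0 * gv * v[1]))
        p_next = (
            p[0] + step * d[0] + momentum * (p[0] - p_prev[0]),
            p[1] + step * d[1] + momentum * (p[1] - p_prev[1]),
        )
        # Direction update -(I - v v^T) H v drives v toward the lowest eigenvector.
        hv = hess_vec(p, v)
        rq = v[0] * hv[0] + v[1] * hv[1]  # Rayleigh quotient v^T H v
        v = normalize((v[0] - step * (hv[0] - rq * v[0]),
                       v[1] - step * (hv[1] - rq * v[1])))
        p_prev, p = p, p_next
    return p, v


saddle, direction = hisd_heavy_ball((0.5, 0.5), (1.0, 1.0))
```

For this toy energy the only stationary point is the index-1 saddle at the origin, with the unstable direction along the x-axis, so `saddle` should approach (0, 0) and `direction` should align with (±1, 0). In a surrogate-based setting, `grad` and `hess_vec` would instead be evaluated on the trained network, for example by automatic differentiation.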