Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ying Sun

Deep classifier kriging for probabilistic spatial prediction of air quality index

Dec 29, 2025

Junyu Chen, Pratik Nag, Huixia Judy-Wang, Ying Sun

Abstract:Accurate spatial interpolation of the air quality index (AQI), computed from concentrations of multiple air pollutants, is essential for regulatory decision-making, yet AQI fields are inherently non-Gaussian and often exhibit complex nonlinear spatial structure. Classical spatial prediction methods such as kriging are linear and rely on Gaussian assumptions, which limits their ability to capture these features and to provide reliable predictive distributions. In this study, we propose \textit{deep classifier kriging} (DCK), a flexible, distribution-free deep learning framework for estimating full predictive distribution functions for univariate and bivariate spatial processes, together with a \textit{data fusion} mechanism that enables modeling of non-collocated bivariate processes and integration of heterogeneous air pollution data sources. Through extensive simulation experiments, we show that DCK consistently outperforms conventional approaches in predictive accuracy and uncertainty quantification. We further apply DCK to probabilistic spatial prediction of AQI by fusing sparse but high-quality station observations with spatially continuous yet biased auxiliary model outputs, yielding spatially resolved predictive distributions that support downstream tasks such as exceedance and extreme-event probability estimation for regulatory risk assessment and policy formulation.

Via

Access Paper or Ask Questions

ChemATP: A Training-Free Chemical Reasoning Framework for Large Language Models

Dec 22, 2025

Mingxu Zhang, Dazhong Shen, Qi Zhang, Ying Sun

Abstract:Large Language Models (LLMs) exhibit strong general reasoning but struggle in molecular science due to the lack of explicit chemical priors in standard string representations. Current solutions face a fundamental dilemma. Training-based methods inject priors into parameters, but this static coupling hinders rapid knowledge updates and often compromises the model's general reasoning capabilities. Conversely, existing training-free methods avoid these issues but rely on surface-level prompting, failing to provide the fine-grained atom-level priors essential for precise chemical reasoning. To address this issue, we introduce ChemATP, a framework that decouples chemical knowledge from the reasoning engine. By constructing the first atom-level textual knowledge base, ChemATP enables frozen LLMs to explicitly retrieve and reason over this information dynamically. This architecture ensures interpretability and adaptability while preserving the LLM's intrinsic general intelligence. Experiments show that ChemATP significantly outperforms training-free baselines and rivals state-of-the-art training-based models, demonstrating that explicit prior injection is a competitive alternative to implicit parameter updates.

Via

Access Paper or Ask Questions

Robust fuzzy clustering for high-dimensional multivariate time series with outlier detection

Oct 30, 2025

Ziling Ma, Ángel López-Oriona, Hernando Ombao, Ying Sun

Figure 1 for Robust fuzzy clustering for high-dimensional multivariate time series with outlier detection

Figure 2 for Robust fuzzy clustering for high-dimensional multivariate time series with outlier detection

Figure 3 for Robust fuzzy clustering for high-dimensional multivariate time series with outlier detection

Figure 4 for Robust fuzzy clustering for high-dimensional multivariate time series with outlier detection

Abstract:Fuzzy clustering provides a natural framework for modeling partial memberships, particularly important in multivariate time series (MTS) where state boundaries are often ambiguous. For example, in EEG monitoring of driver alertness, neural activity evolves along a continuum (from unconscious to fully alert, with many intermediate levels of drowsiness) so crisp labels are unrealistic and partial memberships are essential. However, most existing algorithms are developed for static, low-dimensional data and struggle with temporal dependence, unequal sequence lengths, high dimensionality, and contamination by noise or artifacts. To address these challenges, we introduce RFCPCA, a robust fuzzy subspace-clustering method explicitly tailored to MTS that, to the best of our knowledge, is the first of its kind to simultaneously: (i) learn membership-informed subspaces, (ii) accommodate unequal lengths and moderately high dimensions, (iii) achieve robustness through trimming, exponential reweighting, and a dedicated noise cluster, and (iv) automatically select all required hyperparameters. These components enable RFCPCA to capture latent temporal structure, provide calibrated membership uncertainty, and flag series-level outliers while remaining stable under contamination. On driver drowsiness EEG, RFCPCA improves clustering accuracy over related methods and yields a more reliable characterization of uncertainty and outlier structure in MTS.

Via

Access Paper or Ask Questions

OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects

Oct 23, 2025

Mark He Huang, Lin Geng Foo, Christian Theobalt, Ying Sun, De Wen Soh

Abstract:Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatial-guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.

* NeurIPS 2025 (Spotlight)

Via

Access Paper or Ask Questions

Scalable Asynchronous Federated Modeling for Spatial Data

Oct 02, 2025

Jianwei Shi, Sameh Abdulah, Ying Sun, Marc G. Genton

Abstract:Spatial data are central to applications such as environmental monitoring and urban planning, but are often distributed across devices where privacy and communication constraints limit direct sharing. Federated modeling offers a practical solution that preserves data privacy while enabling global modeling across distributed data sources. For instance, environmental sensor networks are privacy- and bandwidth-constrained, motivating federated spatial modeling that shares only privacy-preserving summaries to produce timely, high-resolution pollution maps without centralizing raw data. However, existing federated modeling approaches either ignore spatial dependence or rely on synchronous updates that suffer from stragglers in heterogeneous environments. This work proposes an asynchronous federated modeling framework for spatial data based on low-rank Gaussian process approximations. The method employs block-wise optimization and introduces strategies for gradient correction, adaptive aggregation, and stabilized updates. We establish linear convergence with explicit dependence on staleness, a result of standalone theoretical significance. Moreover, numerical experiments demonstrate that the asynchronous algorithm achieves synchronous performance under balanced resource allocation and significantly outperforms it in heterogeneous settings, showcasing superior robustness and scalability.

Via

Access Paper or Ask Questions

Flow Matching-Based Active Learning for Radio Map Construction with Low-Altitude UAVs

Sep 17, 2025

Hao Sun, Shicong Liu, Xianghao Yu, Ying Sun

Abstract:The employment of unmanned aerial vehicles (UAVs) in the lowaltitude economy necessitates precise and real-time radio maps for reliable communication and safe navigation. However, constructing such maps is hindered by the infeasibility of exhaustive measurements due to UAVs' limited flight endurance. To address this, we propose a novel active learning framework for low-altitude radio map construction based on limited measurements. First, a Plug-and-Play (PnP)-refined flow matching algorithm is introduced, which leverages flow matching as a powerful generative prior within a PnP scheme to reconstruct high-fidelity radio maps. Second, the generative nature of flow matching is exploited to quantify uncertainty by generating an ensemble of radio maps and computing the location-wise variance. The resulting uncertainty map guides a multi-objective candidate selection and then a trajectory is planned via utility-aware path search (UAPS), directing the UAV to the most informative locations while taking travel costs into account. Simulation results demonstrate that our method significantly outperforms the baselines, achieving more than a 70% reduction in normalized mean squared error (NMSE).

Via

Access Paper or Ask Questions

Modeling nonstationary spatial processes with normalizing flows

Sep 16, 2025

Pratik Nag, Andrew Zammit-Mangion, Ying Sun

Figure 1 for Modeling nonstationary spatial processes with normalizing flows

Figure 2 for Modeling nonstationary spatial processes with normalizing flows

Figure 3 for Modeling nonstationary spatial processes with normalizing flows

Figure 4 for Modeling nonstationary spatial processes with normalizing flows

Abstract:Nonstationary spatial processes can often be represented as stationary processes on a warped spatial domain. Selecting an appropriate spatial warping function for a given application is often difficult and, as a result of this, warping methods have largely been limited to two-dimensional spatial domains. In this paper, we introduce a novel approach to modeling nonstationary, anisotropic spatial processes using neural autoregressive flows (NAFs), a class of invertible mappings capable of generating complex, high-dimensional warpings. Through simulation studies we demonstrate that a NAF-based model has greater representational capacity than other commonly used spatial process models. We apply our proposed modeling framework to a subset of the 3D Argo Floats dataset, highlighting the utility of our framework in real-world applications.

Via

Access Paper or Ask Questions

NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation

Jul 17, 2025

Yuanxin Zhuang, Dazhong Shen, Ying Sun

Abstract:Graph generation plays a pivotal role across numerous domains, including molecular design and knowledge graph construction. Although existing methods achieve considerable success in generating realistic graphs, their interpretability remains limited, often obscuring the rationale behind structural decisions. To address this challenge, we propose the Neural Graph Topic Model (NGTM), a novel generative framework inspired by topic modeling in natural language processing. NGTM represents graphs as mixtures of latent topics, each defining a distribution over semantically meaningful substructures, which facilitates explicit interpretability at both local and global scales. The generation process transparently integrates these topic distributions with a global structural variable, enabling clear semantic tracing of each generated graph. Experiments demonstrate that NGTM achieves competitive generation quality while uniquely enabling fine-grained control and interpretability, allowing users to tune structural features or induce biological properties through topic-level adjustments.

Via

Access Paper or Ask Questions

Improving Recommendation Fairness without Sensitive Attributes Using Multi-Persona LLMs

May 26, 2025

Haoran Xin, Ying Sun, Chao Wang, Yanke Yu, Weijia Zhang, Hui Xiong

Abstract:Despite the success of recommender systems in alleviating information overload, fairness issues have raised concerns in recent years, potentially leading to unequal treatment for certain user groups. While efforts have been made to improve recommendation fairness, they often assume that users' sensitive attributes are available during model training. However, collecting sensitive information can be difficult, especially on platforms that involve no personal information disclosure. Therefore, we aim to improve recommendation fairness without any access to sensitive attributes. However, this is a non-trivial task because uncovering latent sensitive patterns from complicated user behaviors without explicit sensitive attributes can be difficult. Consequently, suboptimal estimates of sensitive distributions can hinder the fairness training process. To address these challenges, leveraging the remarkable reasoning abilities of Large Language Models (LLMs), we propose a novel LLM-enhanced framework for Fair recommendation withOut Sensitive Attributes (LLMFOSA). A Multi-Persona Sensitive Information Inference module employs LLMs with distinct personas that mimic diverse human perceptions to infer and distill sensitive information. Furthermore, a Confusion-Aware Sensitive Representation Learning module incorporates inference results and rationales to develop robust sensitive representations, considering the mislabeling confusion and collective consensus among agents. The model is then optimized by a formulated mutual information objective. Extensive experiments on two public datasets validate the effectiveness of LLMFOSA in improving fairness.

* 18 pages, 9 figures

Via

Access Paper or Ask Questions

LLMs as Better Recommenders with Natural Language Collaborative Signals: A Self-Assessing Retrieval Approach

May 26, 2025

Haoran Xin, Ying Sun, Chao Wang, Weijia Zhang, Hui Xiong

Abstract:Incorporating collaborative information (CI) effectively is crucial for leveraging LLMs in recommendation tasks. Existing approaches often encode CI using soft tokens or abstract identifiers, which introduces a semantic misalignment with the LLM's natural language pretraining and hampers knowledge integration. To address this, we propose expressing CI directly in natural language to better align with LLMs' semantic space. We achieve this by retrieving a curated set of the most relevant user behaviors in natural language form. However, identifying informative CI is challenging due to the complexity of similarity and utility assessment. To tackle this, we introduce a Self-assessing COllaborative REtrieval framework (SCORE) following the retrieve-rerank paradigm. First, a Collaborative Retriever (CAR) is developed to consider both collaborative patterns and semantic similarity. Then, a Self-assessing Reranker (SARE) leverages LLMs' own reasoning to assess and prioritize retrieved behaviors. Finally, the selected behaviors are prepended to the LLM prompt as natural-language CI to guide recommendation. Extensive experiments on two public datasets validate the effectiveness of SCORE in improving LLM-based recommendation.

* 13 pages, 6 figures

Via

Access Paper or Ask Questions