Xi'an Jiaotong University
Abstract: Removing objects from videos remains difficult in the presence of real-world imperfections such as shadows, abrupt motion, and defective masks. Existing diffusion-based video inpainting models often struggle to maintain temporal stability and visual consistency under these challenges. We propose Stable Video Object Removal (SVOR), a robust framework that achieves shadow-free, flicker-free, and mask-defect-tolerant removal through three key designs: (1) Mask Union for Stable Erasure (MUSE), a windowed union strategy applied during temporal mask downsampling to preserve all target regions observed within each window, effectively handling abrupt motion and reducing missed removals; (2) Denoising-Aware Segmentation (DA-Seg), a lightweight segmentation head on a decoupled side branch equipped with Denoising-Aware AdaLN and trained with mask degradation to provide an internal diffusion-aware localization prior without affecting content generation; and (3) Curriculum Two-Stage Training, where Stage I performs self-supervised pretraining on unpaired real-background videos with online random masks to learn realistic background and temporal priors, and Stage II refines on synthetic pairs using mask degradation and side-effect-weighted losses, jointly removing objects and their associated shadows/reflections while improving cross-domain robustness. Extensive experiments show that SVOR attains new state-of-the-art results across multiple datasets and degraded-mask benchmarks, advancing video object removal from ideal settings toward real-world applications.
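The windowed-union idea behind MUSE can be illustrated with a minimal sketch (function name and shapes are assumptions, not the authors' code): when downsampling a temporal mask sequence, take the logical union over each window rather than picking a single representative frame, so no observed target region is dropped.

```python
import numpy as np

def muse_downsample(masks: np.ndarray, window: int) -> np.ndarray:
    """Temporal mask downsampling via windowed union (sketch of the MUSE idea).

    masks: (T, H, W) binary array; window: temporal window size.
    Returns (ceil(T / window), H, W) binary array.
    """
    T = masks.shape[0]
    out = []
    for start in range(0, T, window):
        # Union (logical OR / elementwise max) over the window instead of
        # keeping one frame, so fast-moving objects are not missed.
        out.append(masks[start:start + window].max(axis=0))
    return np.stack(out)
```

With a fast-moving object that occupies different pixels in consecutive frames, the union keeps every observed location, whereas naive frame subsampling would lose some of them.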
Abstract: Dynamic functional connectivity captures time-varying brain states for better neuropsychiatric diagnosis and spatio-temporal interpretability, i.e., identifying when discriminative disease signatures emerge and where they reside in the connectivity topology. Reliable interpretability faces major challenges: diagnostic signals are often subtle and sparsely distributed across both time and topology, while nuisance fluctuations and non-diagnostic connectivities are pervasive. To address these issues, we propose BrainSTR, a spatio-temporal contrastive learning framework for interpretable dynamic brain network modeling. BrainSTR learns state-consistent phase boundaries via a data-driven Adaptive Phase Partition module, identifies diagnostically critical phases with attention, and extracts disease-related connectivity within each phase using an Incremental Graph Structure Generator regularized by binarization, temporal smoothness, and sparsity. Then, we introduce a spatio-temporal supervised contrastive learning approach that leverages diagnosis-relevant spatio-temporal patterns to refine the similarity metric between samples and capture more discriminative spatio-temporal features, thereby constructing a well-structured semantic space for coherent and interpretable representations. Experiments on ASD, BD, and MDD validate the effectiveness of BrainSTR, and the discovered critical phases and subnetworks provide interpretable evidence consistent with prior neuroimaging findings. Our code: https://anonymous.4open.science/r/BrainSTR1.
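The dynamic functional connectivity that BrainSTR operates on is conventionally computed with sliding-window correlations; a generic sketch follows (this is the standard construction, not the BrainSTR implementation, and the function name and parameters are assumptions).

```python
import numpy as np

def dynamic_fc(ts: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Sliding-window dynamic functional connectivity (generic sketch).

    ts: (T, R) time series for R brain regions.
    Returns (num_windows, R, R) Pearson correlation matrices.
    """
    T, R = ts.shape
    mats = []
    for start in range(0, T - window + 1, stride):
        seg = ts[start:start + window]
        # Correlation across regions within this temporal window.
        mats.append(np.corrcoef(seg, rowvar=False))
    return np.stack(mats)
```

Phase-partition modules like the one described above would then group these per-window matrices into state-consistent segments before extracting subnetworks.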
Abstract: Surface partial differential equations arise in numerous scientific and engineering applications. Their numerical solution on static and evolving surfaces remains challenging due to geometric complexity and, for evolving geometries, the need for repeated mesh updates and geometry or solution transfer. While neural-network-based methods offer mesh-free discretizations, approaches based on nonconvex training can be costly and may fail to deliver high accuracy in practice. In this work, we develop a randomized neural network (RaNN) method for solving PDEs on both static and evolving surfaces: the hidden-layer parameters are randomly generated and kept fixed, and the output-layer coefficients are determined efficiently by solving a least-squares problem. For static surfaces, we present formulations for parametrized surfaces, implicit level-set surfaces, and point-cloud geometries, and provide a corresponding theoretical analysis for the parametrization-based formulation with interface compatibility. For evolving surfaces with topology preserved over time, we introduce a RaNN-based strategy that learns the surface evolution through a flow-map representation and then solves the surface PDE on a space--time collocation set, avoiding remeshing. Extensive numerical experiments demonstrate broad applicability and favorable accuracy--efficiency performance on representative benchmarks.
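The core RaNN mechanic described here, fixed random hidden parameters plus a linear least-squares solve for the output layer, can be shown in a one-dimensional regression sketch (an illustration of the principle, not the surface-PDE solver; all names and hyperparameters are assumptions).

```python
import numpy as np

def rann_fit(x, y, n_features=200, scale=2.0, seed=0):
    """Randomized neural network (RaNN) fit: hidden weights are random and
    FIXED; only the linear output layer is solved, via least squares,
    so no nonconvex training is needed."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, scale, size=(1, n_features))   # fixed random weights
    b = rng.uniform(-scale, scale, size=n_features)    # fixed random biases
    Phi = np.tanh(x[:, None] * W + b)                  # random feature matrix
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # output coefficients
    return lambda xq: np.tanh(xq[:, None] * W + b) @ c
```

For a surface PDE, the feature matrix would instead be assembled from the differential operator evaluated at collocation points on the surface, but the linear-solve structure is the same.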
Abstract: Reliable Alzheimer's disease (AD) diagnosis increasingly relies on multimodal assessments combining structural Magnetic Resonance Imaging (MRI) and Electronic Health Records (EHR). However, deploying these models is bottlenecked by modality missingness, as MRI scans are expensive and frequently unavailable in many patient cohorts. Furthermore, synthesizing de novo 3D anatomical scans from sparse, high-dimensional tabular records is technically challenging and poses severe clinical risks. To address this, we introduce MIRAGE, a novel framework that reframes the missing-MRI problem as an anatomy-guided cross-modal latent distillation task. First, MIRAGE leverages a Biomedical Knowledge Graph (KG) and Graph Attention Networks to map heterogeneous EHR variables into a unified embedding space that can be propagated from cohorts with real MRIs to cohorts without them. To bridge the semantic gap and enforce physical spatial awareness, we employ a frozen pre-trained 3D U-Net decoder strictly as an auxiliary regularization engine. Supported by a novel cohort-aggregated skip feature compensation strategy, this decoder acts as a rigorous structural penalty, forcing 1D latent representations to encode biologically plausible, macro-level pathological semantics. By exclusively utilizing this distilled "diagnostic-surrogate" representation during inference, MIRAGE completely bypasses computationally expensive 3D voxel reconstruction. Experiments demonstrate that our framework successfully bridges the missing-modality gap, improving the AD classification rate by 13% compared to unimodal baselines in cohorts without real MRIs.
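The cross-modal latent distillation at the heart of this setup can be caricatured as fitting a map from tabular features to frozen image latents; the sketch below uses a plain linear map trained by gradient descent on a mean-squared distillation loss (a deliberately minimal stand-in for the KG/GAT encoder, with all names assumed).

```python
import numpy as np

def distill_latent(student_feats, teacher_latents, lr=0.1, steps=500):
    """Cross-modal latent distillation sketch: learn W so that tabular
    features (student) reproduce frozen image latents (teacher) under an
    MSE loss, minimized by full-batch gradient descent."""
    n, d_in = student_feats.shape
    d_out = teacher_latents.shape[1]
    W = np.zeros((d_in, d_out))
    for _ in range(steps):
        pred = student_feats @ W
        # Gradient of 0.5 * mean squared distillation error w.r.t. W.
        grad = student_feats.T @ (pred - teacher_latents) / n
        W -= lr * grad
    return W
```

At inference only the distilled student representation is needed, which is what lets a framework of this kind skip 3D voxel reconstruction entirely.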
Abstract: The incompressible magnetohydrodynamic (MHD) equations are fundamental in many scientific and engineering applications. However, their strong nonlinearity and dual divergence-free constraints make them highly challenging for conventional numerical solvers. To overcome these difficulties, we propose a Structure-Preserving Randomized Neural Network (SP-RaNN) that automatically and exactly satisfies the divergence-free conditions. Unlike deep neural network (DNN) approaches that rely on expensive nonlinear and nonconvex optimization, SP-RaNN reformulates the training process into a linear least-squares system, thereby eliminating nonconvex optimization. The method linearizes the governing equations through Picard or Newton iterations, discretizes them at collocation points within the domain and on the boundaries using finite-difference schemes, and solves the resulting linear system via a linear least-squares procedure. By design, SP-RaNN preserves the intrinsic mathematical structure of the equations within a unified space-time framework, ensuring both stability and accuracy. Numerical experiments on the Navier-Stokes, Maxwell, and MHD equations demonstrate that SP-RaNN achieves higher accuracy, faster convergence, and exact enforcement of divergence-free constraints compared with both traditional numerical methods and DNN-based approaches. This structure-preserving framework provides an efficient and reliable tool for solving complex PDE systems while rigorously maintaining their underlying physical laws.
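One standard way to make a network output exactly divergence-free, which illustrates the structure-preserving idea (the paper's precise construction may differ, and all names here are assumptions), is to represent a 2D stream function with fixed random features and take the velocity as its rotated gradient, so the divergence vanishes identically.

```python
import numpy as np

def divfree_velocity(n_features=50, scale=1.5, seed=0):
    """Exactly divergence-free 2D field: psi(x, y) is a random-feature
    network, and u = d(psi)/dy, v = -d(psi)/dx, so that
    du/dx + dv/dy = psi_yx - psi_xy = 0 identically."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, scale, size=(n_features, 2))  # fixed random weights
    b = rng.uniform(-1, 1, size=n_features)         # fixed random biases
    c = rng.normal(0, 1, size=n_features)           # output coefficients

    def uv(x, y):
        z = W[:, 0] * x + W[:, 1] * y + b
        dphi = 1.0 - np.tanh(z) ** 2                # tanh'(z)
        psi_x = np.sum(c * dphi * W[:, 0])
        psi_y = np.sum(c * dphi * W[:, 1])
        return psi_y, -psi_x                        # (u, v)
    return uv
```

Because the constraint holds by construction, only the PDE residual needs to enter the least-squares system; the divergence-free condition never has to be penalized.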
Abstract: This paper proposes an Adaptive-Growth Randomized Neural Network (AG-RaNN) method for computing multivalued solutions of nonlinear first-order PDEs with hyperbolic characteristics, including quasilinear hyperbolic balance laws and Hamilton--Jacobi equations. Such solutions arise in geometric optics, seismic waves, the semiclassical limit of quantum dynamics, and the high-frequency limit of linear waves, and differ markedly from viscosity or entropic solutions. The main computational challenge is that, after the formation of singularities, the solutions are no longer functions but become unions of multiple branches. Level-set formulations offer a systematic alternative by embedding the nonlinear dynamics into linear transport equations posed in an augmented phase space, at the price of substantially increased dimensionality. To alleviate this computational burden, we combine AG-RaNN with an adaptive collocation strategy that concentrates samples in a tubular neighborhood of the zero level set, together with a layer-growth mechanism that progressively enriches the randomized feature space. Under standard regularity assumptions on the transport field and the characteristic flow, we establish a convergence result for the AG-RaNN approximation of the level-set equations. Numerical experiments demonstrate that the proposed method can efficiently recover multivalued structures and resolve nonsmooth features in high-dimensional settings.
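The adaptive collocation strategy, concentrating samples in a tubular neighborhood of the zero level set, can be sketched by rejection filtering of uniform candidates (an illustrative sketch with assumed names, not the AG-RaNN implementation).

```python
import numpy as np

def tube_collocation(phi, n_candidates, delta, dim, rng=None):
    """Keep only candidate points inside the tube |phi(x)| < delta around
    the zero level set of phi, drawn uniformly from [-1, 1]^dim."""
    rng = rng or np.random.default_rng(0)
    pts = rng.uniform(-1, 1, size=(n_candidates, dim))
    vals = np.apply_along_axis(phi, 1, pts)  # evaluate phi at each point
    return pts[np.abs(vals) < delta]
```

Concentrating the least-squares collocation set this way spends the random-feature capacity where the level-set solution actually matters, which is what offsets the dimensionality added by the phase-space embedding.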
Abstract: Class-Incremental Learning (CIL) requires models to learn new classes without forgetting old ones. A common method is to freeze a pre-trained model and train a new, lightweight adapter for each task. While this prevents forgetting, it treats the learned knowledge as a simple, unstructured collection and fails to exploit the relationships between tasks. To this end, we propose the Semantic-guided Adaptive Expert Forest (SAEF), a new method that organizes adapters into a structured hierarchy for better knowledge sharing. SAEF first groups tasks into conceptual clusters based on their semantic relationships. Then, within each cluster, it builds a balanced expert tree by creating new adapters that merge the adapters of similar tasks. At inference time, SAEF finds and activates a set of relevant experts from the forest for any given input. The final prediction is made by combining the outputs of these activated experts, weighted by how confident each expert is. Experiments on several benchmark datasets show that SAEF achieves state-of-the-art performance.
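The two mechanics named above, merging similar adapters and combining activated experts by confidence, can be sketched as follows; parameter averaging and softmax max-probability confidence are simple illustrative choices, not necessarily the paper's exact rules.

```python
import numpy as np

def merge_adapters(adapters):
    """Merge similar task adapters by parameter averaging (one simple
    merging rule used here for illustration)."""
    return np.mean(np.stack(adapters), axis=0)

def forest_predict(x, experts, temperature=1.0):
    """Combine activated experts' logits, weighting each expert by its
    confidence (max softmax probability on this input)."""
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    logits = [W @ x for W in experts]  # each expert is a linear head here
    conf = np.array([softmax(l / temperature).max() for l in logits])
    w = conf / conf.sum()              # normalized confidence weights
    return sum(wi * li for wi, li in zip(w, logits))
```

A hierarchy then only changes which experts get activated for an input; the merge-and-weight machinery stays the same at every tree level.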
Abstract: Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental to modern generative modeling, yet they often suffer from training instability and "codebook collapse" due to the inherent coupling of representation learning and discrete codebook optimization. In this paper, we propose VP-VAE (Vector Perturbation VAE), a novel paradigm that decouples representation learning from discretization by eliminating the need for an explicit codebook during training. Our key insight is that, from the neural network's viewpoint, quantization primarily manifests as the injection of a structured perturbation in latent space. Accordingly, VP-VAE replaces the non-differentiable quantizer with distribution-consistent and scale-adaptive latent perturbations generated via Metropolis--Hastings sampling. This design enables stable training without a codebook while making the model robust to inference-time quantization error. Moreover, under the assumption of approximately uniform latent variables, we derive FSP (Finite Scalar Perturbation), a lightweight variant of VP-VAE that provides a unified theoretical explanation and a practical improvement for FSQ-style fixed quantizers. Extensive experiments on image and audio benchmarks demonstrate that VP-VAE and FSP improve reconstruction fidelity and achieve substantially more balanced token usage, while avoiding the instability inherent to coupled codebook training.
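The quantization-as-perturbation view can be illustrated in the simplest FSP-style setting: the paper samples perturbations with Metropolis--Hastings to match the error distribution, but under the uniform-latent assumption the training-time stand-in reduces to additive uniform noise at the quantizer's error scale (the sketch below uses that plain uniform stand-in; names are assumptions).

```python
import numpy as np

def fsq_quantize(z, levels):
    """FSQ-style fixed scalar quantizer: round each latent dimension to
    one of `levels` uniformly spaced levels in [-1, 1]."""
    step = 2.0 / (levels - 1)
    return np.round((np.clip(z, -1, 1) + 1) / step) * step - 1

def fsp_train_perturb(z, levels, rng):
    """Training-time substitute for rounding: additive uniform noise in
    [-step/2, step/2], matching the quantization-error scale, so the
    forward pass stays differentiable and codebook-free."""
    step = 2.0 / (levels - 1)
    return np.clip(z, -1, 1) + rng.uniform(-step / 2, step / 2, size=z.shape)
```

At inference the hard quantizer `fsq_quantize` is used; because training saw perturbations of the same magnitude, the decoder is already robust to that error.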
Abstract: Current Explainable AI (XAI) focuses on explaining a single application, but when encountering related applications, users may rely on their prior understanding from previous explanations. This leads to either overgeneralization and AI overreliance, or burdensome independent memorization. Indeed, related decision tasks can share explanatory factors, but with some notable differences; e.g., body mass index (BMI) affects the risks for heart disease and diabetes at the same rate, but chest pain is more indicative of heart disease. Similarly, models using different attributes for the same task still share signals; e.g., temperature and pressure affect air pollution but in opposite directions due to the ideal gas law. Leveraging transfer of learning, we propose Transferable XAI to enable users to transfer understanding across related domains by explaining the relationship between domain explanations using a general affine transformation framework applied to linear factor explanations. The framework supports explanation transfer across various domain types: translation for data subspace (subsuming prior work on Incremental XAI), scaling for decision task, and mapping for attributes. Focusing on task and attributes domain types, in formative and summative user studies, we investigated how well participants could understand AI decisions from one domain to another. Compared to single-domain and domain-independent explanations, Transferable XAI was the most helpful for understanding the second domain, leading to the best decision faithfulness, factor recall, and ability to relate explanations between domains. This framework contributes to improving the reusability of explanations across related AI applications by explaining factor relationships between subspaces, tasks, and attributes.
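The affine transformation framework over linear factor explanations amounts to w_tgt = A w_src + b; a minimal sketch (assumed names and example weights, not the paper's code) shows how the three domain types map onto A and b.

```python
import numpy as np

def transfer_explanation(w_src, A=None, b=None):
    """Affine transfer of a linear factor explanation, w_tgt = A @ w_src + b:
    a translation b covers data-subspace shifts, a diagonal A rescales
    per-factor importance across tasks, and a full A maps between
    attribute sets."""
    d = len(w_src)
    A = np.eye(d) if A is None else np.asarray(A)
    b = np.zeros(A.shape[0]) if b is None else np.asarray(b)
    return A @ w_src + b
```

For the heart-disease/diabetes example: with illustrative source weights [BMI, chest pain] = [0.5, 0.8], a diagonal A = diag(1.0, 0.1) keeps the BMI factor at the same rate while sharply downweighting chest pain for the diabetes task.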
Abstract: Explaining with examples is an intuitive way to justify AI decisions. However, it is challenging to understand how a decision value should change relative to examples whose many features differ by large amounts. We draw from real estate valuation, which uses Comparables: examples with known values for comparison. Estimates are made more accurate by hypothetically adjusting the attributes of each Comparable and correspondingly changing its value based on factors. We propose Comparables XAI for relatable example-based explanations of AI, with Trace adjustments that trace counterfactual changes from each Comparable to the Subject, one attribute at a time, monotonically along the AI feature space. In modelling and user studies, Trace-adjusted Comparables achieved the highest XAI faithfulness and precision, user accuracy, and narrowest uncertainty bounds compared to linear regression, linearly adjusted Comparables, or unadjusted Comparables. This work contributes a new analytical basis for using example-based explanations to improve user understanding of AI decisions.
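The one-attribute-at-a-time counterfactual walk behind Trace adjustments can be sketched as follows (an illustrative sketch with assumed names; it omits the monotone path ordering the paper applies in feature space).

```python
import numpy as np

def trace_adjust(model, comparable, subject, value):
    """Walk from the Comparable to the Subject one attribute at a time,
    adding the model's predicted value change at each step, so the total
    adjustment decomposes into per-attribute contributions."""
    x = np.array(comparable, dtype=float)
    contributions = []
    for i in range(len(x)):
        before = model(x)
        x[i] = subject[i]                  # change one attribute
        contributions.append(model(x) - before)
    return value + sum(contributions), contributions
```

For an additive model the traced estimate lands exactly on the model's prediction for the Subject, and each contribution tells the user how much that single attribute swap was worth.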