Abstract:Deep learning has achieved remarkable success in medical image analysis, yet its performance remains highly sensitive to the heterogeneity of clinical data. Differences in imaging hardware, staining protocols, and acquisition conditions produce substantial domain shifts that degrade model generalization across institutions. Here we present a physics-based data preprocessing framework built on the PhyCV (Physics-Inspired Computer Vision) family of algorithms, which standardizes medical images through deterministic transformations derived from optical physics. The framework models images as spatially varying optical fields that undergo virtual diffractive propagation followed by coherent phase detection. This process suppresses non-semantic variability such as color and illumination differences while preserving diagnostically relevant texture and structural features. When applied to histopathological images from the Camelyon17-WILDS benchmark, PhyCV preprocessing improves out-of-distribution breast-cancer classification accuracy from 70.8% (Empirical Risk Minimization baseline) to 90.9%, matching or exceeding data-augmentation and domain-generalization approaches at negligible computational cost. Because the transform is physically interpretable, parameterizable, and differentiable, it can be deployed as a fixed preprocessing stage or integrated into end-to-end learning. These results establish PhyCV as a generalizable data refinery for medical imaging, one that harmonizes heterogeneous datasets through first-principles physics, improving robustness, interpretability, and reproducibility in clinical AI systems.
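
The propagate-then-detect-phase idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the PhyCV API: the function name `virtual_propagation_phase`, the parameters `strength` and `warp`, and the phase-stretch-style kernel are assumptions chosen for illustration, and the paper's actual kernel and parameters may differ.

```python
import numpy as np

def virtual_propagation_phase(image, strength=0.5, warp=10.0):
    """Pass a real image through a spectral phase kernel and return the output phase.

    Hypothetical sketch: the kernel uses a phase-stretch-style profile,
    phi(r) ~ W*r*arctan(W*r) - 0.5*ln(1 + (W*r)^2), scaled to a peak of `strength`.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Normalized spatial-frequency grid in FFT ordering (zero frequency at [0, 0]).
    u = np.fft.fftfreq(h)[:, None]
    v = np.fft.fftfreq(w)[None, :]
    rho = np.sqrt(u**2 + v**2)
    kernel = warp * rho * np.arctan(warp * rho) - 0.5 * np.log1p((warp * rho) ** 2)
    kernel = strength * kernel / kernel.max()
    # Virtual diffraction: multiply the spectrum by a pure phase term.
    field = np.fft.ifft2(np.fft.fft2(img) * np.exp(-1j * kernel))
    # Coherent detection: keep only the phase of the complex output field.
    return np.angle(field)

# Toy checkerboard "tile" and the same tile under a global illumination gain.
tile = 0.6 + 0.3 * (np.indices((16, 16)).sum(axis=0) % 2)
phase = virtual_propagation_phase(tile)
brighter = virtual_propagation_phase(2.5 * tile)
```

Because the transform is linear up to the final phase readout, a positive global gain leaves the output unchanged, so `phase` and `brighter` agree to floating-point precision. This toy example shows only that gain invariance; it does not reproduce the color or stain handling reported in the abstract.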
Abstract:Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.
Abstract:Time series are highly valuable and rarely shareable across nodes, making federated learning a promising paradigm to leverage distributed temporal data. However, different sampling standards lead to diverse time granularities and variable sets across nodes, hindering classical federated learning. We propose PiXTime, a novel time series forecasting model designed for federated learning that enables effective prediction across nodes with multi-granularity and heterogeneous variable sets. PiXTime employs a personalized Patch Embedding to map node-specific granularity time series into token sequences of a unified dimension for processing by a subsequent shared model, and uses a global VE Table to align variable category semantics across nodes, thereby enhancing cross-node transferability. With a transformer-based shared model, PiXTime captures representations of auxiliary series with arbitrary numbers of variables and uses cross-attention to enhance the prediction of the target series. Experiments show PiXTime achieves state-of-the-art performance in federated settings and demonstrates superior performance on eight widely used real-world traditional benchmarks.
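
A rough sketch of the personalized patch embedding described above: each node keeps its own patch length and projection but emits tokens of one shared dimension. The class name `NodePatchEmbedding`, the value of `D_MODEL`, the random weights (stand-ins for learned ones), and the choice of patch lengths that cover the same wall-clock span are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

D_MODEL = 8  # shared token dimension (hypothetical value)

class NodePatchEmbedding:
    """Per-node linear patch embedding; weights are random stand-ins for learned ones."""

    def __init__(self, patch_len, d_model=D_MODEL, seed=0):
        rng = np.random.default_rng(seed)
        self.patch_len = patch_len
        self.weight = rng.normal(scale=patch_len ** -0.5, size=(patch_len, d_model))
        self.bias = np.zeros(d_model)

    def __call__(self, series):
        x = np.asarray(series, dtype=float)
        n_patches = len(x) // self.patch_len  # an incomplete trailing patch is dropped
        patches = x[: n_patches * self.patch_len].reshape(n_patches, self.patch_len)
        return patches @ self.weight + self.bias  # shape: (n_patches, d_model)

# Two nodes observe the same 24-hour window at different granularities.
hourly = np.sin(np.linspace(0, 2 * np.pi, 24))          # 24 samples
quarter_hourly = np.sin(np.linspace(0, 2 * np.pi, 96))  # 96 samples

embed_hourly = NodePatchEmbedding(patch_len=4, seed=1)    # 4 hours per patch
embed_quarter = NodePatchEmbedding(patch_len=16, seed=2)  # also 4 hours per patch

tokens_a = embed_hourly(hourly)
tokens_b = embed_quarter(quarter_hourly)
```

Both nodes produce token arrays of shape `(6, 8)`, so a single shared model could consume either sequence even though the raw series differ in length and sampling rate.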



Abstract:This work studies a composite minimization problem involving a differentiable function q and a nonsmooth function h, both of which may be nonconvex. This problem is ubiquitous in signal processing and machine learning yet remains challenging to solve efficiently, particularly when large-scale instances, poor conditioning, and nonconvexity coincide. To address these challenges, we propose a proximal conjugate gradient method (PCG) that matches the fast convergence of proximal (quasi-)Newton algorithms while reducing computation and memory complexity, and is especially effective for spectrally clustered Hessians. Our key innovation is to form, at each iteration, an approximation to the Newton direction based on CG iterations to build a majorization surrogate. We define this surrogate in a curvature-aware manner and equip it with a CG-derived isotropic weight, guaranteeing majorization of a local second-order model of q along the given direction. To better preserve majorization after the proximal step and enable further approximation refinement, we scale the CG direction by the ratio between the Cauchy step length and a step size derived from the largest Ritz value of the CG tridiagonal. All curvature is accessed via Hessian-vector products computed by automatic differentiation, keeping the method Hessian-free. Convergence to first-order critical points is established. Numerical experiments on CS-MRI with nonconvex regularization and on dictionary learning, against benchmark methods, demonstrate the efficiency of the proposed approach.
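
The Hessian-free ingredient of this abstract, approximating the Newton direction with CG while touching the Hessian only through Hessian-vector products, can be sketched as below. This is a generic truncated-CG solver, not the paper's PCG: the majorization surrogate, the Ritz-value scaling, and the proximal step are not reproduced. The names `cg_newton_direction` and `hvp` are hypothetical, and the explicit matrix in the toy `hvp` is a stand-in for a product computed by automatic differentiation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cg_newton_direction(hvp, grad, max_iter=50, tol=1e-20):
    """Approximately solve H d = -grad using only Hessian-vector products."""
    d = [0.0] * len(grad)
    r = [-g for g in grad]  # residual of H d = -grad at d = 0
    p = list(r)
    rs = dot(r, r)
    for k in range(max_iter):
        if rs <= tol:
            break
        Hp = hvp(p)
        curvature = dot(p, Hp)
        if curvature <= 0.0:
            # Nonpositive curvature: keep the current iterate, or fall back to
            # steepest descent if no CG step has been taken yet.
            return d if k > 0 else r
        alpha = rs / curvature
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

# Toy smooth term q(x) = 0.5 * x^T A x - b^T x, evaluated at x = 0.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def hvp(v):
    # Stand-in for an autodiff Hessian-vector product; here the Hessian is simply A.
    return [dot(row, v) for row in A]

grad_at_zero = [-bi for bi in b]
direction = cg_newton_direction(hvp, grad_at_zero)
```

For this two-dimensional quadratic, CG reaches the exact Newton direction `[1/11, 7/11]` in two iterations. Note that the step size of the first CG iteration equals the Cauchy step length along the negative gradient, which is one of the quantities the abstract mentions.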
Abstract:Current Simultaneous Localization and Mapping (SLAM) methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting excel in reconstructing static 3D scenes but struggle with tracking and reconstruction in dynamic environments, such as real-world scenes with moving elements. Existing NeRF-based SLAM approaches addressing dynamic challenges typically rely on RGB-D inputs, with few methods accommodating pure RGB input. To overcome these limitations, we propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. To address dynamic interference, we fuse optical flow masks and depth masks through a probabilistic model to obtain a fused dynamic mask. With only a single network iteration, this can constrain tracking scales and refine rendered geometry. Based on the fused dynamic mask, we design a novel motion loss to constrain the pose estimation network for tracking. In mapping, we use the rendering loss of dynamic pixels, color, and depth to eliminate transient interference and occlusion caused by dynamic objects. Experimental results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments, outperforming or matching existing RGB-D methods.
Abstract:Exploring the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction, this study contrasts these modern approaches with traditional Simultaneous Localization and Mapping (SLAM) systems. Utilizing datasets such as Replica and ScanNet, we assess performance based on tracking accuracy, mapping fidelity, and view synthesis. Findings reveal that NeRF excels in view synthesis, offering unique capabilities in generating new perspectives from existing data, albeit at slower processing speeds. Conversely, Gaussian-based methods provide rapid processing and significant expressiveness but lack comprehensive scene completion. Enhanced by global optimization and loop closure techniques, newer methods like NICE-SLAM and SplaTAM not only surpass older frameworks such as ORB-SLAM2 in terms of robustness but also demonstrate superior performance in dynamic and complex environments. This comparative analysis bridges theoretical research with practical implications, shedding light on future developments in robust 3D scene reconstruction across various real-world applications.




Abstract:Deep neural networks (DNNs) have achieved exceptional performance across various fields by learning complex nonlinear mappings from large-scale datasets. However, they encounter challenges such as high computational costs and limited interpretability. To address these issues, hybrid approaches that integrate physics with AI are gaining interest. This paper introduces a novel physics-based AI model called the "Nonlinear Schrödinger Network", which treats the Nonlinear Schrödinger Equation (NLSE) as a general-purpose trainable model for learning complex patterns including nonlinear mappings and memory effects from data. Existing physics-informed machine learning methods use neural networks to approximate the solutions of partial differential equations (PDEs). In contrast, our approach directly treats the PDE as a trainable model to obtain general nonlinear mappings that would otherwise require neural networks. As a physics-inspired approach, it offers a more interpretable and parameter-efficient alternative to traditional black-box neural networks, achieving comparable or better accuracy in time series classification tasks while significantly reducing the number of required parameters. Notably, the trained Nonlinear Schrödinger Network is interpretable, with all parameters having physical meanings as properties of a virtual physical system that transforms the data to a more separable space. This interpretability allows for insight into the underlying dynamics of the data transformation process. Applications to time series forecasting have also been explored. While our current implementation utilizes the NLSE, the proposed method of using physics equations as trainable models to learn nonlinear mappings from data is not limited to the NLSE and may be extended to other master equations of physics.
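
To make the idea of a PDE acting as the model concrete, the following sketch propagates a time series through a few virtual segments with the standard split-step Fourier method. The per-segment dispersion and nonlinearity coefficients stand in for the trainable parameters. The function name `nlse_forward`, the layer-wise parameterization, and the intensity readout are assumptions for illustration; the abstract does not specify them, and no training loop is shown.

```python
import numpy as np

def nlse_forward(signal, betas, gammas, dz=1.0, dt=1.0):
    """Split-step Fourier propagation through len(betas) virtual segments.

    Integrates one common form of the NLSE,
        dA/dz = -1j * (beta2 / 2) * d^2A/dt^2 + 1j * gamma * |A|^2 * A,
    with one (beta2, gamma) pair per segment playing the role of trainable parameters.
    """
    field = np.asarray(signal, dtype=complex)
    omega = 2 * np.pi * np.fft.fftfreq(field.size, d=dt)
    for beta2, gamma in zip(betas, gammas):
        # Linear (dispersion) step, applied in the frequency domain.
        field = np.fft.ifft(np.fft.fft(field) * np.exp(0.5j * beta2 * omega**2 * dz))
        # Nonlinear (Kerr-like) step, applied in the time domain.
        field = field * np.exp(1j * gamma * np.abs(field) ** 2 * dz)
    return field

t = np.linspace(-5.0, 5.0, 128)
pulse = np.exp(-t**2)  # toy input time series
out = nlse_forward(pulse, betas=[0.3, -0.2, 0.1], gammas=[0.5, 0.8, 0.2], dt=t[1] - t[0])
features = np.abs(out) ** 2  # intensity readout (an assumed choice of feature)
```

Both sub-steps multiply by pure phase factors, so the total energy `features.sum()` matches that of the input. Each coefficient also keeps a direct physical reading (dispersion or nonlinearity of a segment), which is the kind of interpretability the abstract emphasizes.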




Abstract:Gaussian SLAM systems have made significant advancements in improving the efficiency and fidelity of real-time reconstructions. However, these systems often encounter incomplete reconstructions in complex indoor environments, characterized by substantial holes due to unobserved geometry caused by obstacles or limited view angles. To address this challenge, we present Manhattan Gaussian SLAM (MG-SLAM), an RGB-D system that leverages the Manhattan World hypothesis to enhance geometric accuracy and completeness. By seamlessly integrating fused line segments derived from structured scenes, MG-SLAM ensures robust tracking in textureless indoor areas. Moreover, the extracted lines and planar surface assumption allow strategic interpolation of new Gaussians in regions of missing geometry, enabling efficient scene completion. Extensive experiments conducted on both synthetic and real-world scenes demonstrate that these advancements enable our method to achieve state-of-the-art performance, marking a substantial improvement in the capabilities of Gaussian SLAM systems.



Abstract:Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail the Pix2Pix model's utilization for generating high-fidelity datasets, supported by a dataset of paired map and aerial images, and enhanced by a tailored training regimen. The results demonstrate the model's capability to accurately render complex urban features, establishing its efficacy and potential for broad real-world applications.




Abstract:The digital transformation is fundamentally changing our industries, affecting the planning, execution, and monitoring of production processes in a wide range of application fields. With product line-ups becoming more and more versatile and diverse, the necessary inspection and monitoring places significant new requirements on the corresponding Nondestructive Evaluation (NDE) systems. The establishment of increasingly powerful approaches to incorporate Artificial Intelligence (AI) may provide the innovation needed to solve some of these challenges. In this paper we provide a comprehensive survey of the use of AI methods in NDE in light of the recent innovations towards NDE 4.0. Since we cannot discuss each NDE modality in one paper, we limit our attention to magnetic methods, ultrasound, thermography, and optical inspection. In addition to reviewing recent AI developments in each field, we draw common connections by pointing out NDE-related tasks that have a common underlying mathematical problem and categorizing the state of the art according to the corresponding sub-tasks. In so doing, interdisciplinary connections are drawn that provide a more complete overall picture.