Peking University
Abstract:To address the limitations of traditional reconfigurable intelligent surfaces (RIS) in spatial control capability, this paper introduces the concept of the fluid antenna system (FAS) and proposes a fluid simultaneously transmitting and reflecting RIS (FSTAR-RIS) assisted non-orthogonal multiple access (NOMA) multi-user communication system. In this system, each FSTAR-RIS element is capable of flexible mobility and can dynamically adjust its position in response to environmental variations, thereby enabling simultaneous service to users in both the transmission and reflection zones. This significantly enhances the system's spatial degrees of freedom (DoF) and service adaptability. To maximize the system's weighted sum-rate, we formulate a non-convex optimization problem that jointly optimizes the base station beamforming, the transmission/reflection coefficients of the FSTAR-RIS, and the element positions. An alternating optimization (AO) algorithm is developed, incorporating successive convex approximation (SCA), semi-definite relaxation (SDR), and majorization-minimization (MM) techniques. In particular, to address the complex channel coupling introduced by the coexistence of direct and FSTAR-RIS paths, the MM framework is employed in the element position optimization subproblem, enabling an efficient iterative solution strategy. Simulation results validate that the proposed system achieves up to a 27% increase in total sum rate compared to traditional STAR-RIS systems and requires approximately 50% fewer RIS elements to attain the same performance, highlighting its effectiveness for cost-efficient large-scale deployment.
Abstract:Artificial Intelligence has revolutionised critical care for common conditions. Yet, rare conditions in the intensive care unit (ICU), including recognised rare diseases and low-prevalence conditions in the ICU, remain underserved due to data scarcity and intra-condition heterogeneity. To bridge such gaps, we developed KnowRare, a domain adaptation-based deep learning framework for predicting clinical outcomes for rare conditions in the ICU. KnowRare mitigates data scarcity by initially learning condition-agnostic representations from diverse electronic health records through self-supervised pre-training. It addresses intra-condition heterogeneity by selectively adapting knowledge from clinically similar conditions with a developed condition knowledge graph. Evaluated on two ICU datasets across five clinical prediction tasks (90-day mortality, 30-day readmission, ICU mortality, remaining length of stay, and phenotyping), KnowRare consistently outperformed existing state-of-the-art models. Additionally, KnowRare demonstrated superior predictive performance compared to established ICU scoring systems, including APACHE IV and IV-a. Case studies further demonstrated KnowRare's flexibility in adapting its parameters to accommodate dataset-specific and task-specific characteristics, its generalisation to common conditions under limited data scenarios, and its rationality in selecting source conditions. These findings highlight KnowRare's potential as a robust and practical solution for supporting clinical decision-making and improving care for rare conditions in the ICU.
Abstract:Multi-sensor fusion perception (MSFP) is a key technology for embodied AI, which can serve a variety of downstream tasks (e.g., 3D object detection and semantic segmentation) and application scenarios (e.g., autonomous driving and swarm robotics). Recently, impressive achievements on AI-based MSFP methods have been reviewed in relevant surveys. However, we observe that the existing surveys have some limitations after a rigorous and detailed investigation. For one thing, most surveys are oriented to a single task or research field, such as 3D object detection or autonomous driving. Therefore, researchers in other related tasks often find it difficult to benefit directly. For another, most surveys only introduce MSFP from a single perspective of multi-modal fusion, while lacking consideration of the diversity of MSFP methods, such as multi-view fusion and time-series fusion. To this end, in this paper, we hope to organize MSFP research from a task-agnostic perspective, where methods are reported from various technical views. Specifically, we first introduce the background of MSFP. Next, we review multi-modal and multi-agent fusion methods. A step further, time-series fusion methods are analyzed. In the era of LLM, we also investigate multimodal LLM fusion methods. Finally, we discuss open challenges and future directions for MSFP. We hope this survey can help researchers understand the important progress in MSFP and provide possible insights for future research.
Abstract:Towards future 6G wireless networks, low earth orbit (LEO) satellites have been widely considered as a promising component to enhance the terrestrial communications. To ensure the link reliability of high-mobility satellite communication scenarios, the emerging orthogonal delay-Doppler division multiplexing (ODDM) modulation has attracted significant research attention. In this paper, we study the diversity gain achieved by ODDM modulation along with the mathematical analysis and numerical simulations. Additionally, we propose an orthogonal approximate message passing (OAMP) algorithm based detector to harvest the diversity gain promised by ODDM modulation. By operating the linear and non-linear estimator iteratively, the orthogonal approximate message passing (OAMP) detector can utilize the sparsity of the effective delay-Doppler (DD) domain channel and extract the full diversity. Simulation results reveal the relationship between diversity gain and system parameters, and demonstrate that our proposed detector can achieve better performance than the conventional message passing methods with significantly reduced complexity.
Abstract:Human activity intensity prediction is a crucial to many location-based services. Although tremendous progress has been made to model dynamic spatiotemporal patterns of human activity, most existing methods, including spatiotemporal graph neural networks (ST-GNNs), overlook physical constraints of spatial interactions and the over-smoothing phenomenon in spatial correlation modeling. To address these limitations, this work proposes a physics-informed deep learning framework, namely Gravity-informed Spatiotemporal Transformer (Gravityformer) by refining transformer attention to integrate the universal law of gravitation and explicitly incorporating constraints from spatial interactions. Specifically, it (1) estimates two spatially explicit mass parameters based on inflow and outflow, (2) models the likelihood of cross-unit interaction using closed-form solutions of spatial interactions to constrain spatial modeling randomness, and (3) utilizes the learned spatial interaction to guide and mitigate the over-smoothing phenomenon in transformer attention matrices. The underlying law of human activity can be explicitly modeled by the proposed adaptive gravity model. Moreover, a parallel spatiotemporal graph convolution transformer structure is proposed for achieving a balance between coupled spatial and temporal learning. Systematic experiments on six real-world large-scale activity datasets demonstrate the quantitative and qualitative superiority of our approach over state-of-the-art benchmarks. Additionally, the learned gravity attention matrix can be disentangled and interpreted based on geographical laws. This work provides a novel insight into integrating physical laws with deep learning for spatiotemporal predictive learning.
Abstract:Image inpainting is the task of reconstructing missing or damaged parts of an image in a way that seamlessly blends with the surrounding content. With the advent of advanced generative models, especially diffusion models and generative adversarial networks, inpainting has achieved remarkable improvements in visual quality and coherence. However, achieving seamless continuity remains a significant challenge. In this work, we propose two novel methods to address discrepancy issues in diffusion-based inpainting models. First, we introduce a modified Variational Autoencoder that corrects color imbalances, ensuring that the final inpainted results are free of color mismatches. Second, we propose a two-step training strategy that improves the blending of generated and existing image content during the diffusion process. Through extensive experiments, we demonstrate that our methods effectively reduce discontinuity and produce high-quality inpainting results that are coherent and visually appealing.
Abstract:Ferroelectric polarization switching underpins the functional performance of a wide range of materials and devices, yet its dependence on complex local microstructural features renders systematic exploration by manual or grid-based spectroscopic measurements impractical. Here, we introduce a multi-objective kernel-learning workflow that infers the microstructural rules governing switching behavior directly from high-resolution imaging data. Applied to automated piezoresponse force microscopy (PFM) experiments, our framework efficiently identifies the key relationships between domain-wall configurations and local switching kinetics, revealing how specific wall geometries and defect distributions modulate polarization reversal. Post-experiment analysis projects abstract reward functions, such as switching ease and domain symmetry, onto physically interpretable descriptors including domain configuration and proximity to boundaries. This enables not only high-throughput active learning, but also mechanistic insight into the microstructural control of switching phenomena. While demonstrated for ferroelectric domain switching, our approach provides a powerful, generalizable tool for navigating complex, non-differentiable design spaces, from structure-property correlations in molecular discovery to combinatorial optimization across diverse imaging modalities.
Abstract:Autoregressive image generation aims to predict the next token based on previous ones. However, existing image tokenizers encode tokens with bidirectional dependencies during the compression process, which hinders the effective modeling by autoregressive models. In this paper, we propose a novel Aligned Tokenizer (AliTok), which utilizes a causal decoder to establish unidirectional dependencies among encoded tokens, thereby aligning the token modeling approach between the tokenizer and autoregressive model. Furthermore, by incorporating prefix tokens and employing two-stage tokenizer training to enhance reconstruction consistency, AliTok achieves great reconstruction performance while being generation-friendly. On ImageNet-256 benchmark, using a standard decoder-only autoregressive model as the generator with only 177M parameters, AliTok achieves a gFID score of 1.50 and an IS of 305.9. When the parameter count is increased to 662M, AliTok achieves a gFID score of 1.35, surpassing the state-of-the-art diffusion method with 10x faster sampling speed. The code and weights are available at https://github.com/ali-vilab/alitok.
Abstract:Domain-wall dynamics in ferroelectric materials are strongly position-dependent since each polar interface is locked into a unique local microstructure. This necessitates spatially resolved studies of the wall-pinning using scanning-probe microscopy techniques. The pinning centers and preexisting domain walls are usually sparse within image plane, precluding the use of dense hyperspectral imaging modes and requiring time-consuming human experimentation. Here, a large area epitaxial PbTiO$_3$ film on cubic KTaO$_3$ were investigated to quantify the electric field driven dynamics of the polar-strain domain structures using ML-controlled automated Piezoresponse Force Microscopy. Analysis of 1500 switching events reveals that domain wall displacement depends not only on field parameters but also on the local ferroelectric-ferroelastic configuration. For example, twin boundaries in polydomains regions like a$_1^-$/$c^+$ $\parallel$ a$_2^-$/$c^-$ stay pinned up to a certain level of bias magnitude and change only marginally as the bias increases from 20V to 30V, whereas single variant boundaries like a$_2^+$/$c^+$ $\parallel$ a$_2^-$/$c^-$ stack are already activated at 20V. These statistics on the possible ferroelectric and ferroelastic wall orientations, together with the automated, high-throughput AFM workflow, can be distilled into a predictive map that links domain configurations to pulse parameters. This microstructure-specific rule set forms the foundation for designing ferroelectric memories.
Abstract:Exceptional behavior tests (EBTs) are crucial in software development for verifying that code correctly handles unwanted events and throws appropriate exceptions. However, prior research has shown that developers often prioritize testing "happy paths", e.g., paths without unwanted events over exceptional scenarios. We present exLong, a framework that automatically generates EBTs to address this gap. exLong leverages a large language model (LLM) fine-tuned from CodeLlama and incorporates reasoning about exception-throwing traces, conditional expressions that guard throw statements, and non-exceptional behavior tests that execute similar traces. Our demonstration video illustrates how exLong can effectively assist developers in creating comprehensive EBTs for their project (available at https://youtu.be/Jro8kMgplZk).