Abstract:Multi-modal sensing is an important enabler for future environment-aware wireless systems, since a single sensing modality is generally insufficient to provide accurate metric geometry, material awareness, and semantic interpretability in complex environments. This paper presents a measurement-based multi-modal THz sensing and vision framework for indoor environment reconstruction. A three-dimensional monostatic THz channel sounding system operating at 290-310 GHz is integrated with an omnidirectional fisheye camera to acquire radio-frequency and visual observations from a common sensing viewpoint. From the measured THz data, a signal processing pipeline extracts multipath components and infers geometry- and material-consistent structural primitives through trajectory tracking-assisted parameter estimation, graph-based structure discovery, planar reconstruction, and reflection-loss analysis. In parallel, AI-based visual perception modules extract object-level semantic masks and depth priors from panoramic images. To associate these heterogeneous representations, an agentic-AI-based task-driven THz-agent module is developed to select appropriate integration tools according to the attributes of the modality-specific outputs. Through angular alignment and consistency analysis, THz-derived metric geometry and material information are associated with vision-derived semantic regions and depth priors, enabling geometry-consistent and semantically interpretable environment reconstruction directly from measurements. Experimental validation in an indoor L-shaped hallway demonstrates that the proposed framework reconstructs dominant structural elements with centimeter-level accuracy while identifying semantic categories and material attributes of representative indoor objects.
Abstract:While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in general video understanding, their capacity to interpret involuntary and spatio-temporally evolving pathologic motor behaviors such as seizure semiology remains largely untested. To address this gap, we introduce Seizure-Semiology-Suite, a clinically grounded dataset and benchmark for fine-grained, structured seizure semiology understanding. The dataset includes 438 seizure videos annotated with over 35,000 dense labels covering 20 ILAE-defined semiological features. Building on this dataset, we propose a seven-task hierarchical benchmark that systematically evaluates MLLMs from low-level visual perception to temporal sequencing, narrative report generation, and seizure diagnosis. To enable clinically meaningful evaluation of generated reports, we further introduce the Report Quality Index for Seizure Semiology (Seizure-RQI). Extensive baselines across 11 open-weight MLLMs reveal systematic weaknesses in laterality reasoning, temporal localization, symptom sequencing, and clinically faithful reporting. We show that seizure-specific fine-tuning substantially improves performance across tasks, and that a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification. Seizure-Semiology-Suite establishes a rigorous benchmark for evaluating multimodal models in safety-critical medical video understanding and guides the development of clinically reliable, domain-adaptive multimodal intelligence.
Abstract:Accurate channel modeling is fundamental to the design and evaluation of Terahertz (THz) ultra-massive multiple-input multiple-output (UM-MIMO) systems. However, existing model-based approaches typically rely on simplified assumptions, such as sparsity or predefined parametric structures, which are insufficient to capture the complex spatial variations and cross far-/near-field propagation characteristics of practical THz channels. In this paper, a conditional diffusion transformer (CDiT) framework is proposed for high-fidelity THz channel generation. By leveraging the state-of-the-art hybrid planar-spherical wave model (HPSM), THz channel modeling is formulated as a geometry-aware conditional generative learning problem in the sparse beamspace domain. Position information is incorporated as a conditioning signal within a diffusion-transformer architecture, enabling effective learning of the spatially dependent channel distribution. By combining the strong distribution modeling capability of diffusion models with the global dependency modeling strength of transformers, the proposed framework achieves controllable and high-fidelity THz channel synthesis. Extensive experiments on realistic THz channel datasets demonstrate that the proposed framework converges stably and significantly outperforms representative benchmark methods. The proposed framework provides a promising data-driven paradigm for THz channel modeling in next-generation wireless systems.
Abstract:With the enlargement of antenna apertures in 6G Terahertz (THz) communications, the Rayleigh distance expands significantly, rendering near-field propagation a dominant scenario in THz links. Beyond conventional Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) conditions, quasi-LoS scenarios with partial obstructions have emerged as a critical challenge. Airy beams offer a promising solution to circumvent obstacles due to their unique curving trajectory. However, existing Airy beam training methods typically rely on parameter-based sampling or exhaustive search, leading to significant pilot overhead and low training efficiency. In this paper, an efficient Airy beam training framework is proposed to address this research gap. First, theoretical bounds on Airy beam generation under finite apertures are derived to prune physically invalid codewords. Based on this, a two-stage Non-Uniform Polar Codebook (NUPC) design is presented, utilizing a probing mechanism to resolve the bending direction and a polar-domain spatial sampling strategy to generate Airy beams. To address ultra-low latency requirements, a Fast-Scanning 1D Codebook (FS1C) is further developed that sweeps the entire LoS region with minimal codewords. Simulation results demonstrate that NUPC improves the average spectral efficiency (SE) by 13.4 bit/s/Hz while reducing training overhead by 54.2% compared to the state-of-the-art hierarchical focusing-Airy codebook (HFAC). Furthermore, FS1C reduces overhead by 92.9% with only a marginal 0.3 bit/s/Hz SE reduction compared with HFAC.
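The aperture dependence noted in the abstract above follows from the standard Rayleigh (Fraunhofer) distance d = 2D^2/lambda, which grows with the square of the aperture D. The following is a minimal sketch of that formula only; the 300 GHz carrier and the aperture sizes are illustrative assumptions and are not taken from the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def rayleigh_distance(aperture_m: float, freq_hz: float) -> float:
    """Return the Rayleigh distance 2 * D**2 / wavelength in metres."""
    wavelength = C / freq_hz
    return 2.0 * aperture_m ** 2 / wavelength


if __name__ == "__main__":
    # Assumed values for illustration: 300 GHz carrier, apertures of 5, 10 and 20 cm.
    for aperture in (0.05, 0.10, 0.20):
        d = rayleigh_distance(aperture, 300e9)
        print(f"D = {aperture:.2f} m -> Rayleigh distance = {d:.1f} m")
```

Doubling the aperture quadruples the distance, which is why near-field operation becomes the common case for large THz arrays.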
Abstract:Federated learning (FL) has emerged as a promising distributed training paradigm for Low Earth Orbit (LEO) networks by significantly reducing communication overhead. However, its deployment faces critical challenges, e.g., topology-induced model staleness, short contact windows, and unaddressed computing heterogeneity. To address these issues, a topology-aware two-stage FL framework is proposed in this paper. First, a multi-layer physical architecture utilizing high-altitude platforms (HAPs) and Sub-THz communications is designed to extend satellite-ground contact windows and enlarge available bandwidth. Second, a proxy-model-based approach is adopted to fully utilize heterogeneous resources and enable architecture-agnostic knowledge aggregation. Finally, building upon these foundations, a topology-aware two-stage aggregation mechanism is proposed as the central algorithmic design to overcome the topology-induced staleness. The mechanism dynamically partitions LEO satellites into localized groups based on their transient HAP coverage. Within each group, LEO satellites perform asynchronous aggregation at their associated HAP to naturally tolerate computational delays without penalizing faster nodes. Subsequently, a synchronous inter-group aggregation is executed among all HAPs at the Ground Station (GS) to strictly bound the maximum staleness and guarantee stable global convergence. Numerical results demonstrate that the proposed framework extends contact windows and achieves 86.59% to 90.57% test accuracy, outperforming the state-of-the-art heterogeneous baseline by 16.26% to 19.80%. Furthermore, it achieves a 1.5x to 2.2x convergence speedup, which closely approaches the ideal upper bound.
Abstract:The Terahertz (THz) band (0.1-10 THz) has emerged as a critical frontier for future communication systems, offering ultra-wide bandwidths that enable Terabits-per-second (Tbps) wireless links and high-precision sensing and imaging. However, practical deployment of THz systems is hindered by unique challenges, including intricate channel characteristics, high-dimensional and large-scale optimization problems, and highly dynamic network environments. Artificial Intelligence (AI) serves as a transformative enabler to address these challenges, providing robust capabilities for precise modeling, advanced signal processing, complex optimization, real-time decision-making, and prediction, among others. Reciprocally, the unprecedented bandwidth and high-resolution sensing capabilities of THz networks provide a promising physical infrastructure for AI, facilitating training, inference, and data collection. This survey presents a systematic and comprehensive overview of AI-driven solutions across the entire THz communication network and the symbiosis of AI and THz networks. To begin with, a foundational overview of AI technologies tailored for wireless communications is presented. Subsequently, AI-based innovations are investigated, spanning hardware design, channel modeling, and physical layer optimization, up to higher-layer network protocols and advanced THz services, including mobile edge computing and sensing-empowered applications. In parallel, the capacity of THz networks to serve AI is examined, underscoring a profound paradigm shift towards a mutual symbiosis where AI and THz co-evolve and empower each other. Finally, by synthesizing these state-of-the-art advancements and identifying open research directions, this survey highlights the potential of AI as a copilot in the development of THz communication systems.
Abstract:Terahertz (THz) ultra-massive multiple-input multiple-output (UM-MIMO) promises ultra-high throughput, while its highly directional beams demand rapid and accurate beam tracking driven by precise user-state estimation. Moreover, large array apertures at high frequencies induce near-field propagation effects, where far-field modeling becomes inaccurate and near-field parametric channel estimation is costly. Bypassing near-field codebooks, PAST-TT is proposed to bridge near-field tracking with low-overhead far-field codebook probing by exploiting parallax, amplified by widely spaced subarrays. With comb-type frequency-division multiplexing pilots, each subarray yields frequency-affine phase signatures whose frequency and temporal increments encode propagation delay and its variation between frames. Building on these signatures, a Parallax-Aware Spatial Transformer (PAST) compresses them and outputs per-frame position estimates with token reliability to downweight bad frames, regularized by a physics-in-the-loop consistency loss. A causal Temporal Transformer (TT) then performs reliability-aware filtering and prediction over a sliding window to initialize the beam of the next frame. Acting on short token sequences, PAST-TT avoids a monolithic spatial-temporal network over raw pilots, which keeps the model lightweight with a critical path latency of 0.61 ms. Simulations show that at 15 dB signal-to-noise ratio, PAST achieves 7.81 mm distance RMSE and 0.0588° angle RMSE. Even with a bad-frame rate of 0.1, TT reduces the distance and angle prediction RMSE by 23.1% and 32.8% compared with the best competing tracker.
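The statement in the abstract above that frequency increments of the pilot phase encode propagation delay can be illustrated with a single-path model H(f) = exp(-j 2 pi f tau), whose phase is affine in f with slope -2 pi tau. The sketch below recovers tau from a least-squares line fit to the unwrapped phase; it is a textbook estimator shown under assumed parameters (pilot spacing, carrier, delay), not the PAST-TT method itself, and the function name is hypothetical.

```python
import numpy as np


def delay_from_phase_slope(freqs_hz, pilots):
    """Estimate a single-path delay from the slope of unwrapped phase vs. frequency."""
    freqs = np.asarray(freqs_hz, dtype=float)
    phase = np.unwrap(np.angle(pilots))
    # Fit phase ~ slope * (f - f0) + offset; centring on f0 keeps the fit well conditioned.
    slope, _offset = np.polyfit(freqs - freqs[0], phase, 1)
    return -slope / (2.0 * np.pi)


# Assumed illustration values: 16 comb pilots spaced 10 MHz apart starting at 300 GHz,
# and a 20 ns delay. Unwrapping is unambiguous only while delay < 1 / (2 * spacing).
freqs = 300e9 + 10e6 * np.arange(16)
true_delay = 20e-9
pilots = np.exp(-2j * np.pi * freqs * true_delay)
print(delay_from_phase_slope(freqs, pilots))
```

The same fit applied to consecutive frames would give the delay variation between frames, which is the temporal increment the abstract refers to.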
Abstract:Terahertz (THz) communication can offer terabit-per-second rates in future wireless systems, thanks to its ultra-wide bandwidths, but requires large antenna arrays. As antenna apertures expand and links enter near-field scenarios, the conventional binary classification of communication links as either Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS) becomes insufficient. Instead, quasi-LoS scenarios, where the LoS path is partially obstructed, are increasingly prevalent, posing significant challenges for traditional LoS focusing and steering beams. The Airy beam serves as a promising alternative, utilizing its non-diffracting and curved trajectory properties to mitigate such blockages. However, while existing electromagnetics literature primarily explores their physical patterns without practical generation schemes, recent communication-oriented designs predominantly rely on learning-based frameworks lacking interpretable closed-form solutions. To address this issue, this paper investigates a closed-form Airy beam design to efficiently synthesize Airy beam phase profiles based on the positions of the transceivers and obstacles. Specifically, rigorous analytical derivations of the electric field and trajectory are presented to establish a deterministic closed-form design for uniform linear array (ULA) Airy beamforming. Leveraging 3D wavefront separability, this framework is extended to uniform planar arrays (UPAs) with two operation modes: the hybrid focusing-Airy mode and the dual Airy mode. Simulation results verify the effectiveness of the derived trajectory equations and demonstrate that the proposed closed-form design significantly outperforms conventional beamforming schemes in quasi-LoS scenarios. Furthermore, the proposed method achieves performance comparable to exhaustive numerical searches with low computational complexity and enhanced physical interpretability.
Abstract:Terahertz (THz) integrated sensing and communication (ISAC) offers high-speed communication alongside precise environmental sensing. This paper presents a computationally efficient framework for THz-based environment reconstruction by integrating connected component analysis (CCA)-assisted multipath component (MPC) estimation with a sliding-window refinement strategy. To start with, a monostatic sensing experiment is conducted in an indoor scenario using a vector network analyzer (VNA)-based sounder operating from 290 to 310 GHz. On the one hand, for geometry mapping, a CCA-based region search is employed to accelerate parameter extraction, significantly reducing the search space for space-alternating generalized expectation-maximization (SAGE)-based estimation and achieving an 8.4 times acceleration while preserving resolution. Further analysis of the connected component structure enables the identification of indoor features such as flat walls and corners. A sliding-window refinement applied to the identified regions improves geometric mapping, achieving a mean distance error of 4.9 mm, which is one order of magnitude better than results reported in the literature. On the other hand, the deterministic and stochastic components of the monostatic channel are classified through reflection loss analysis. Then, material identification is performed by looking up the reflection loss in a THz time-domain spectroscopy (THz-TDS) database, which comprises over 200 materials across a 0-6 THz range. Experimental results validate millimeter-level accuracy in geometry mapping and reliable material classification, enhancing the environmental awareness capabilities of THz ISAC systems.
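The database lookup described in the abstract above can be pictured as a nearest-match search over reference reflection losses. The sketch below is only one plausible matching rule (nearest neighbour with a rejection threshold), which is an assumption rather than the paper's procedure; the table entries, the threshold, and the function name are hypothetical placeholders and are not measured THz-TDS data.

```python
# Hypothetical reference table: material -> reflection loss in dB (placeholder values).
REFLECTION_LOSS_DB = {
    "glass": 8.0,
    "concrete": 12.0,
    "wood": 16.0,
    "plasterboard": 20.0,
}


def identify_material(measured_loss_db, database=REFLECTION_LOSS_DB, max_gap_db=2.0):
    """Return the material whose reference loss is closest to the measurement.

    Returns None when even the closest entry differs by more than max_gap_db,
    so that an out-of-range measurement is not forced onto a material.
    """
    name, reference = min(database.items(), key=lambda kv: abs(kv[1] - measured_loss_db))
    return name if abs(reference - measured_loss_db) <= max_gap_db else None


print(identify_material(11.4))
print(identify_material(30.0))
```

A real lookup would also have to account for frequency, incidence angle, and polarization, which this one-dimensional sketch ignores.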
Abstract:Terahertz (THz) extremely large-scale MIMO (XL-MIMO) is considered a key enabling technology for 6G and beyond due to its advantages such as wide bandwidth and high beam gain. As the frequency and array size increase, users are more likely to fall within the near-field (NF) region, where the far-field plane-wave assumption no longer holds. This also introduces spatial non-stationarity (SnS), as different antenna elements observe distinct multipath characteristics. Therefore, this paper proposes a THz XL-MIMO channel model that accounts for both NF propagation and SnS, validated using channel measurement data. In this work, we first conduct THz XL-MIMO channel measurements at 100 GHz and 132 GHz using 301- and 531-element ULAs in indoor environments, revealing pronounced NF effects characterized by nonlinear inter-element phase variations, as well as element-dependent delay and angle shifts. Moreover, the SnS phenomenon is observed, arising not only from blockage but also from inconsistent reflection or scattering. Based on these observations, a hybrid NF channel modeling approach combining the scatterer-excited point-source model and the specular reflection model is proposed to capture nonlinear phase variation. For SnS modeling, amplitude attenuation factors (AAFs) are introduced to characterize the continuous variation of path power across the array. By analyzing the statistical distribution and spatial autocorrelation properties of AAFs, a statistical rank-matching-based method is proposed for their generation. Finally, the model is validated using measured data. Evaluation across metrics such as entropy capacity, condition number, spatial correlation, channel gain, Rician K-factor, and RMS delay spread confirms that the proposed model closely aligns with measurements and effectively characterizes the essential features of THz XL-MIMO channels.
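Among the validation metrics listed in the abstract above, the RMS delay spread has a standard definition: the power-weighted standard deviation of the path delays. The following minimal sketch computes it from a list of path delays and linear powers; the two-path input is an illustrative assumption, not measured data.

```python
import numpy as np


def rms_delay_spread(delays_s, powers_lin):
    """Return the power-weighted standard deviation of path delays, in seconds."""
    delays = np.asarray(delays_s, dtype=float)
    powers = np.asarray(powers_lin, dtype=float)
    total = powers.sum()
    mean_delay = (powers * delays).sum() / total  # power-weighted mean excess delay
    return float(np.sqrt((powers * (delays - mean_delay) ** 2).sum() / total))


# Illustrative input: two equal-power paths 10 ns apart.
print(rms_delay_spread([0.0, 10e-9], [1.0, 1.0]))
```

Powers must be given on a linear scale; values in dB need to be converted with 10 ** (p_db / 10) first.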