This manuscript investigates the information-theoretic limits of integrated sensing and communications (ISAC), aiming for simultaneous reliable communication and precise channel state estimation. We model such a system with a state-dependent discrete memoryless channel (SD-DMC) with present or absent channel feedback and generalized side information at the transmitter and the receiver, where the joint task of message decoding and state estimation is performed at the receiver. The relationship between the achievable communication rate and estimation error, the capacity-distortion (C-D) trade-off, is characterized across different causality levels of the side information. This framework is shown to be capable of modeling various practical scenarios by assigning the side information with different meanings, including monostatic and bistatic radar systems. The analysis is then extended to the two-user degraded broadcast channel, and we derive an achievable C-D region that is tight under certain conditions. To solve the optimization problem arising in the computation of C-D functions/regions, we propose a proximal block coordinate descent (BCD) method, prove its convergence to a stationary point, and derive a stopping criterion. Finally, several representative examples are studied to demonstrate the versatility of our framework and the effectiveness of the proposed algorithm.
In this paper, a digital twinning framework for indoor integrated sensing, communications, and robotics is proposed, designed, and implemented. Besides leveraging powerful robotics and ray-tracing technologies, the framework also enables integration with real-world sensors and reactive updates triggered by changes in the environment. The framework is designed with commercial, off-the-shelf components in mind, thus facilitating experimentation in the different areas of communication, sensing, and robotics. Experimental results showcase the feasibility and accuracy of indoor localization using digital twins and validate our implementation both qualitatively and quantitatively.
Intracerebral Hemorrhage (ICH) is the deadliest subtype of stroke, necessitating timely and accurate prognostic evaluation to reduce mortality and disability. However, the multi-factorial nature and complexity of ICH make methods based solely on computed tomography (CT) image features inadequate. Despite the capacity of cross-modal networks to fuse additional information, the effective combination of different modal features remains a significant challenge. In this study, we propose a joint-attention fusion-based 3D cross-modal network termed ICHPro that simulates the ICH prognosis interpretation process utilized by neurosurgeons. ICHPro includes a joint-attention fusion module to fuse features from CT images with demographic and clinical textual data. To enhance the representation of cross-modal features, we introduce a joint loss function. ICHPro facilitates the extraction of richer cross-modal features, thereby improving classification performance. Upon testing our method using a five-fold cross-validation, we achieved an accuracy of 89.11%, an F1 score of 0.8767, and an AUC value of 0.9429. These results outperform those obtained from other advanced methods based on the test dataset, thereby demonstrating the superior efficacy of ICHPro. The code is available at our Github: https://github.com/YU-deep/ICH.
Understanding and recognizing emotions are important and challenging issues in the metaverse era. Understanding, identifying, and predicting fear, which is one of the fundamental human emotions, in virtual reality (VR) environments plays an essential role in immersive game development, scene development, and next-generation virtual human-computer interaction applications. In this article, we used VR horror games as a medium to analyze fear emotions by collecting multi-modal data (posture, audio, and physiological signals) from 23 players. We used an LSTM-based model to predict fear with accuracies of 65.31% and 90.47% under 6-level classification (no fear and five different levels of fear) and 2-level classification (no fear and fear), respectively. We constructed a multi-modal natural behavior dataset of immersive human fear responses (VRMN-bD) and compared it with existing relevant advanced datasets. The results show that our dataset has fewer limitations in terms of collection method, data scale and audience scope. We are unique and advanced in targeting multi-modal datasets of fear and behavior in VR stand-up interactive environments. Moreover, we discussed the implications of this work for communities and applications. The dataset and pre-trained model are available at https://github.com/KindOPSTAR/VRMN-bD.
Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial intelligence (AI) models have been explored to assist clinicians, but many prior studies focused on model modification without considering domain knowledge. This paper introduces a novel deep learning algorithm, GCS-ICHNet, which integrates multimodal brain CT image data and the Glasgow Coma Scale (GCS) score to improve ICH prognosis. The algorithm utilizes a transformer-based fusion module for assessment. GCS-ICHNet demonstrates high sensitivity 81.03% and specificity 91.59%, outperforming average clinicians and other state-of-the-art methods.
This paper investigates the optimization of reconfigurable intelligent surface (RIS) in an integrated sensing and communication (ISAC) system. \red{To meet the demand of growing number of devices, power domain non-orthogonal multiple access (NOMA) is considered. However, traditional NOMA with a large number of devices is challenging due to large decoding delay and propagation error introduced by successive interference cancellation (SIC). Thus, OMA is integrated into NOMA to support more devices}. We formulate a max-min problem to optimize the sensing beampattern \red{with constraints on communication rate}, through joint power allocation, active beamforming and RIS phase shift design. To solve the non-convex problem with a non-smooth objective function, we propose a low complexity alternating optimization (AO) algorithm, where a closed form expression for the intra-cluster power allocation (intra-CPA) is derived, and penalty and successive convex approximation (SCA) methods are used to optimize the beamforming and phase shift design. Simulation results show the effectiveness of the proposed algorithm in terms of improving minimum beampattern gain (MBPG) compared with other baselines. Furthermore, the trade-off between sensing and communication is analyzed and demonstrated in the simulation results.
In this paper, we investigate the fundamental limits of MIMO-OFDM integrated sensing and communications (ISAC) systems based on a Bayesian Cram\'er-Rao bound (BCRB) analysis. We derive the BCRB for joint channel parameter estimation and data symbol detection, in which a performance trade-off between both functionalities is observed. We formulate the optimization problem for a linear precoder design and propose the stochastic Riemannian gradient descent (SRGD) approach to solve the non-convex problem. We analyze the optimality conditions and show that SRGD ensures convergence with high probability. The simulation results verify our analyses and also demonstrate a fast convergence speed. Finally, the performance trade-off is illustrated and investigated.
We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely-used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interests. Existing approaches on street views either mainly focus on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither of them investigates multi-view implicit surface reconstruction, especially under settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by the unbounded street views that are captured with non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts, close-range, distant-view and sky, with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with road-surface initialization scheme for finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors that are estimated using general purpose monocular models. Coupled with our implementation of efficient and fine-grained multi-stage ray marching strategy, we achieve state of the art reconstruction quality in both geometry and appearance within only one to two hours of training time with a single RTX3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.
Rationalization is to employ a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text to the following predictor. However, rationalization suffers from two key challenges, i.e., spurious correlation and degeneration, where the predictor overfits the spurious or meaningless pieces solely selected by the not-yet well-trained generator and in turn deteriorates the generator. Although many studies have been proposed to address the two challenges, they are usually designed separately and do not take both of them into account. In this paper, we propose a simple yet effective method named MGR to simultaneously solve the two problems. The key idea of MGR is to employ multiple generators such that the occurrence stability of real pieces is improved and more meaningful pieces are delivered to the predictor. Empirically, we show that MGR improves the F1 score by up to 20.9% as compared to state-of-the-art methods. Codes are available at https://github.com/jugechengzi/Rationalization-MGR .