Self-supervised learning (SSL) has emerged as a promising technique for medical image analysis due to its ability to learn without annotations. However, conventional SSL methods encounter limitations, including challenges in achieving semantic alignment and capturing subtle details. This leads to suboptimal representations, which fail to accurately capture the underlying anatomical structures and pathological details. In response to these constraints, we introduce OPTiML, a novel SSL framework that employs optimal transport (OT) to capture dense semantic invariance and fine-grained details, thereby enhancing the overall effectiveness of SSL in medical image representation learning. The core idea is to integrate OT with a cross-viewpoint semantics infusion module (CV-SIM), which effectively captures complex, fine-grained details inherent in medical images across different viewpoints. In addition to the CV-SIM module, OPTiML imposes variance and covariance regularizations within the OT framework to force the model to focus on clinically relevant information while discarding less informative features. Through these, the proposed framework demonstrates its capacity to learn semantically rich representations that can be applied to various medical imaging tasks. To validate its effectiveness, we conduct experimental studies on three publicly available datasets from the chest X-ray modality. Our empirical results reveal OPTiML's superiority over state-of-the-art methods across all evaluated tasks.
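To make the optimal transport step in the OPTiML abstract above more concrete, the following is a minimal sketch of entropy-regularized OT (Sinkhorn iterations) between local features of two augmented views. It is an illustration under assumptions, not the authors' implementation: the cosine cost, the uniform marginals, the regularization strength `reg`, and the function names `cosine_cost` and `sinkhorn` are all hypothetical choices made here.

```python
import math

def cosine_cost(xs, ys):
    """Pairwise cost 1 - cosine similarity between two lists of feature vectors."""
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]
    xs, ys = [unit(x) for x in xs], [unit(y) for y in ys]
    return [[1.0 - sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan with uniform marginals (assumed here)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each feature of view A
    b = [1.0 / m] * m  # mass on each feature of view B
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy local features from two views of the same image (made-up numbers).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
plan = sinkhorn(cosine_cost(view_a, view_b))
```

In this toy case the plan puts almost all of its mass on the matching feature pairs, which is the kind of soft cross-view correspondence an OT-based alignment loss could build on.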
Self-supervised learning (SSL) is potentially useful in reducing the need for manual annotation and making deep learning models accessible for medical image analysis tasks. By leveraging the representations learned from unlabeled data, self-supervised models perform well on tasks that require little to no fine-tuning. However, for medical images, like chest X-rays, which are characterized by complex anatomical structures and diverse clinical conditions, there arises a need for representation learning techniques that can encode fine-grained details while preserving the broader contextual information. In this context, we introduce MLVICX (Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning), an approach to capture rich representations in the form of embeddings from chest X-ray images. Central to our approach is a novel multi-level variance and covariance exploration strategy that empowers the model to detect diagnostically meaningful patterns while reducing redundancy effectively. By enhancing the variance and covariance of the learned embeddings, MLVICX promotes the retention of critical medical insights by adapting both global and local contextual details. We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning through comprehensive experiments. The performance enhancements we observe across various downstream tasks highlight the significance of the proposed approach in enhancing the utility of chest X-ray embeddings for precision medical diagnosis and comprehensive image analysis. For pretraining, we used the NIH-Chest X-ray dataset, while for downstream tasks, we utilized the NIH-Chest X-ray, Vinbig-CXR, RSNA pneumonia, and SIIM-ACR Pneumothorax datasets. Overall, we observe more than 3% performance gains over state-of-the-art (SOTA) SSL approaches in various downstream tasks.
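The variance and covariance terms mentioned in the MLVICX abstract above can be sketched as follows. This follows the common VICReg-style formulation (a hinge on the per-dimension standard deviation plus a penalty on off-diagonal covariance). Whether MLVICX uses exactly these forms, and how it applies them at multiple levels, is not stated in the abstract, so the function names, the target `gamma`, and `eps` are assumptions for illustration only.

```python
import math

def variance_loss(z, gamma=1.0, eps=1e-4):
    """Penalize embedding dimensions whose batch std falls below gamma."""
    n, d = len(z), len(z[0])
    loss = 0.0
    for k in range(d):
        col = [row[k] for row in z]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        loss += max(0.0, gamma - math.sqrt(var + eps))
    return loss / d

def covariance_loss(z):
    """Penalize squared off-diagonal covariance, i.e. redundant dimensions."""
    n, d = len(z), len(z[0])
    means = [sum(row[k] for row in z) / n for k in range(d)]
    loss = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum((row[i] - means[i]) * (row[j] - means[j]) for row in z) / (n - 1)
                loss += cov ** 2
    return loss / d

# Toy batches of 2-D embeddings (made-up numbers).
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]                     # no variance
redundant = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]     # duplicated dims
healthy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]       # spread, decorrelated
```

The variance term is large for the collapsed batch, the covariance term is large for the redundant batch, and both terms vanish for the spread-out, decorrelated batch.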
Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations (BSs) pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles (UAVs), known for their high agility, mobility, and flexibility, present an alternative means to offload data traffic from terrestrial BSs by serving as additional access points. This paper introduces a novel approach to efficiently maximize the utilization of multiple UAVs for data traffic offloading from terrestrial BSs. Specifically, the focus is on maximizing user association with UAVs by jointly optimizing UAV trajectories and user association indicators under quality-of-service constraints. Since the formulated UAV control problem is nonconvex and combinatorial, this study leverages the multi-agent reinforcement learning framework. In this framework, each UAV acts as an independent agent while aiming to maintain cooperative inter-UAV behavior. The proposed approach utilizes a finite-state Markov decision process to account for the UAVs' velocity constraints and the relationship between their trajectories and the state space. A low-complexity distributed state-action-reward-state-action (SARSA) algorithm is presented to determine the UAVs' optimal sequential decision-making policies over training episodes. Extensive simulation results validate the proposed analysis and offer valuable insights into the optimal UAV trajectories. The derived trajectories demonstrate superior average UAV association performance compared to benchmark techniques such as Q-learning and particle swarm optimization.
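The SARSA update named in the abstract above can be illustrated with a single-agent toy problem. The sketch below is not the paper's distributed multi-UAV algorithm: the one-dimensional corridor, the reward values, and every name (`ACTIONS`, `epsilon_greedy`, `sarsa_update`, `train`) are hypothetical. In a multi-agent variant, each UAV would plausibly keep its own table and run the same update.

```python
import random
from collections import defaultdict

# Assumed velocity constraint: the UAV moves at most one cell per step.
ACTIONS = (-1, 0, 1)

def epsilon_greedy(Q, state, eps, rng):
    """Explore with probability eps, otherwise pick the highest-valued action."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: the target uses the action actually taken in s_next."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

def train(n_cells=5, goal=4, episodes=300, max_steps=50, eps=0.1, seed=0):
    """Toy setting: a UAV on a line of cells is rewarded for reaching the users at `goal`."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, eps, rng)
        for _ in range(max_steps):
            s_next = min(max(s + a, 0), n_cells - 1)  # stay inside the corridor
            r = 1.0 if s_next == goal else -0.01      # small penalty per wasted step
            a_next = epsilon_greedy(Q, s_next, eps, rng)
            sarsa_update(Q, s, a, r, s_next, a_next)
            s, a = s_next, a_next
            if s == goal:
                break
    return Q
```

Unlike Q-learning, which bootstraps from the best action in the next state, SARSA bootstraps from the action the current policy actually selects, so exploration noise is reflected in the learned values.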
In this paper, we study underlay device-to-device (D2D) communication systems empowered by a reconfigurable intelligent surface (RIS) for cognitive cellular networks. Considering Rayleigh fading channels and the general case where both the direct and the RIS-enabled D2D channels exist, the outage probability (OP) of the D2D communication link is presented in closed form. Next, for the considered RIS-empowered underlaid D2D system, we frame an OP minimization problem. We target the joint optimization of the transmit power at the D2D source and the RIS placement, under constraints on the transmit power at the D2D source and on the limited interference imposed on the cellular user, for two RIS deployment topologies. Due to the coupled optimization variables, the formulated optimization problem is intractable. We propose an equivalent transformation that we are able to solve analytically. In the transformed problem, an expression for the average value of the signal-to-interference-plus-noise ratio (SINR) at the D2D receiver is derived in closed form. Our theoretical derivations are corroborated through simulation results, and various system design insights are deduced. It is showcased that the proposed RIS-empowered underlaid D2D system design outperforms the benchmark semi-adaptive optimal power and optimal distance schemes, offering $44\%$ and $20\%$ performance improvement, respectively.
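As a simplified companion to the abstract above, the sketch below checks the textbook closed-form outage probability of a single Rayleigh-faded link against a Monte Carlo estimate, and applies a basic underlay power cap. It covers the direct D2D link only and ignores the RIS path and cellular interference at the D2D receiver, so it is not the paper's derivation. The average-interference form of the cap and all names and numbers are assumptions.

```python
import math
import random

def outage_closed_form(p_tx, mean_gain, noise, snr_th):
    """P(p_tx * g / noise < snr_th) when the channel power gain g ~ Exp(mean_gain)."""
    return 1.0 - math.exp(-snr_th * noise / (p_tx * mean_gain))

def outage_monte_carlo(p_tx, mean_gain, noise, snr_th, trials=20000, seed=0):
    """Empirical outage rate over random Rayleigh fading realizations."""
    rng = random.Random(seed)
    fails = sum(
        p_tx * rng.expovariate(1.0 / mean_gain) / noise < snr_th
        for _ in range(trials)
    )
    return fails / trials

def capped_power(p_max, interference_limit, mean_gain_to_cellular):
    """Assumed underlay rule: keep the average interference at the cellular user under the limit."""
    return min(p_max, interference_limit / mean_gain_to_cellular)

# Made-up example values.
p = capped_power(p_max=1.0, interference_limit=0.05, mean_gain_to_cellular=0.1)
op_exact = outage_closed_form(p, mean_gain=1.0, noise=0.1, snr_th=1.0)
op_sim = outage_monte_carlo(p, mean_gain=1.0, noise=0.1, snr_th=1.0)
```

The interference cap limits how far the outage probability can be reduced by raising power alone, which is one reason to optimize power and RIS placement jointly.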
Non-orthogonal multiple access (NOMA) has come to the fore as a spectrally efficient technique for fifth-generation and beyond communication networks. We consider the downlink of a NOMA system with untrusted users. To model a more realistic scenario, imperfect successive interference cancellation is assumed at the receivers during the decoding process. Since the pair outage probability (POP) ensures a minimum rate guarantee to each user, it serves as a measure of the quality of service for the pair of users. With the objective of designing a reliable communication protocol, we derive a closed-form expression for the POP. Further, we find the optimal power allocation that minimizes the POP. Lastly, numerical results are presented that validate the exactness of the analysis and reveal the effect of various key parameters on the achieved pair outage performance. In addition, we benchmark optimal power allocation against equal and fixed power allocations with respect to POP. The results indicate that optimal power allocation improves communication reliability.
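The pair outage probability discussed above can be estimated numerically with a generic two-user downlink NOMA model. The sketch below assumes the conventional decoding order, in which the near user removes the far user's signal by SIC, and models imperfect SIC with a residual interference fraction `beta`. The paper's untrusted-user system model and closed-form expression may differ, and all names and parameter values here are hypothetical.

```python
import random

def pair_outage_prob(a_far, beta, snr_db=20.0, rate_far=1.0, rate_near=1.0,
                     mean_gain_far=0.3, mean_gain_near=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of P(at least one user misses its target rate)."""
    rng = random.Random(seed)
    rho = 10 ** (snr_db / 10)            # transmit SNR (linear)
    a_near = 1.0 - a_far                 # power fractions sum to one
    th_far, th_near = 2 ** rate_far - 1, 2 ** rate_near - 1
    outages = 0
    for _ in range(trials):
        g_far = rng.expovariate(1.0 / mean_gain_far)    # Rayleigh fading power gains
        g_near = rng.expovariate(1.0 / mean_gain_near)
        # Far user decodes its own signal, treating the near user's signal as noise.
        sinr_far = a_far * rho * g_far / (a_near * rho * g_far + 1)
        # Near user first decodes the far user's signal (needed for SIC) ...
        sinr_near_far = a_far * rho * g_near / (a_near * rho * g_near + 1)
        # ... then its own signal, with a fraction beta of the far signal left over.
        sinr_near = a_near * rho * g_near / (beta * a_far * rho * g_near + 1)
        ok = sinr_far >= th_far and sinr_near_far >= th_far and sinr_near >= th_near
        outages += not ok
    return outages / trials

def best_far_power_fraction(beta, grid=None, **kwargs):
    """Grid search for the far-user power fraction that minimizes the estimated POP."""
    grid = grid or [0.5 + 0.05 * k for k in range(10)]  # 0.50, 0.55, ..., 0.95
    return min(grid, key=lambda a: pair_outage_prob(a, beta, **kwargs))
```

Because the same seed drives every evaluation, allocations are compared on identical channel realizations, so differences in the estimate reflect the allocation rather than sampling noise.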
Non-orthogonal multiple access (NOMA) serves multiple users simultaneously via the same resource block by exploiting superposition coding at the transmitter and successive interference cancellation (SIC) at the receivers. Under practical considerations, perfect SIC may not be achieved. Thus, residual interference (RI) occurs inevitably due to imperfect SIC. In this work, we first propose a novel model for characterizing RI to provide a more realistic secrecy performance analysis of a downlink NOMA system under imperfect SIC at the receivers. In the presence of untrusted users, NOMA has an inherent security flaw. Therefore, for this untrusted users' scenario, we derive new analytical expressions for the secrecy outage probability (SOP) of each user in a two-user untrusted NOMA system by using the proposed RI model. To gain a deeper understanding of the obtained results, a high signal-to-noise ratio approximation of the SOPs is also derived. Lastly, numerical investigations are provided to validate the accuracy of the derived analytical results and to present valuable insights into the impact of various system parameters on the secrecy rate performance of the secure NOMA communication system.
Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial subdomains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve the text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, task-agnostic large language models (LLMs) can outperform task-specific fully supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing an FM for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.
Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which, in general, is comparatively large and deep. These teacher networks are pre-trained and often uncalibrated, as no calibration technique is applied to the teacher model during training. Calibration of a network measures the probability of correctness for any of its predictions, which is critical in high-risk domains. In this paper, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach relies on the fusion of data-augmentation techniques, including but not limited to cutout, mixup, and CutMix, with knowledge distillation. We extend our approach beyond traditional knowledge distillation and find it suitable for Relational Knowledge Distillation and Contrastive Representation Distillation as well. The novelty of the work is that it provides a framework to distill a calibrated student from an uncalibrated teacher model without compromising the accuracy of the distilled student. We perform extensive experiments to validate our approach on various datasets, including CIFAR-10, CIFAR-100, CINIC-10, and TinyImageNet, and obtain calibrated student models. We also observe robust performance of our approach when evaluating it on corrupted CIFAR-100C data.
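The ingredients named in the abstract above can be sketched in a few lines: a temperature-scaled distillation loss, mixup of two inputs, and the expected calibration error (ECE) commonly used to quantify calibration. How the paper combines augmentation with distillation is not specified in the abstract, so this is only one plausible arrangement. The function names, the temperature, and the bin count are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s))
    return temperature ** 2 * kl

def mixup(x_i, x_j, lam):
    """Convex combination of two inputs; lam is typically drawn from Beta(alpha, alpha)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, hit))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += len(members) / n * abs(avg_conf - accuracy)
    return ece
```

In a training loop under these assumptions, the student and teacher would both see the mixed input `mixup(x_i, x_j, lam)`, the student would be trained with `kd_loss` on their logits, and ECE would be reported on held-out predictions.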
Graphene-based intelligent reflecting surface (GIRS) has been shown to provide a promising propagation environment to enhance the quality of high-frequency terahertz (THz) wireless communication. In this paper, we characterize GIRS for THz communication (GITz) using material-specific parameters of graphene to tune the reflection of the incident wave at the IRS. In particular, we propose a GITz design model that considers the incident signal frequency and material-level parameters such as conductivity, Fermi level, and patch width to control the reflection amplitude (RA) at the communication receiver. We obtain a closed-form expression of the RA for an accurate design and characterization of GIRS, which existing research treats incompletely because it considers only the phase shift. The numerical simulation results demonstrate the effectiveness of the proposed characterization and provide key insights.