Abstract:Total-body PET/CT enables system-wide molecular imaging, but heterogeneous anatomical and metabolic signals, approximately 2 m axial coverage, and structured radiology semantics challenge existing medical AI models that assume single-modality inputs, localized fields of view, and coarse image-text alignment. We introduce SDF-HOLO (Systemic Dual-stream Fusion Holo Model), a multimodal foundation model for holistic total-body PET/CT, pre-trained on more than 10,000 patients. SDF-HOLO decouples CT and PET representation learning with dual-stream encoders and couples them through a cross-modal interaction module, allowing anatomical context to refine PET aggregation while metabolic saliency guides subtle morphological reasoning. To model long-range dependencies across the body, hierarchical context modeling combines efficient local windows with global attention. To bridge voxels and clinical language, we use anatomical segmentation masks as explicit semantic anchors and perform voxel-mask-text alignment during pre-training. Across tumor segmentation, low-dose lesion detection, and multilingual diagnostic report generation, SDF-HOLO outperforms strong task-specific and clinical-reference baselines while reducing localization errors and hallucinated findings. Beyond focal interpretation, the model enables system-wide metabolic profiling and reveals tumor-associated fingerprints of inter-organ metabolic network interactions, providing a scalable computational foundation for total-body PET/CT diagnostics and system-level precision oncology.
Abstract:The rapid advancement of generative AI models has provided users with numerous options to address their prompts. When selecting a generative AI model for a given prompt, users should consider not only the performance of the chosen model but also its associated service cost. The principle guiding such consideration is to select the least expensive model among the available satisfactory options. However, existing model-selection approaches typically prioritize performance, overlooking pricing differences between models. In this paper, we introduce PromptWise, an online learning framework designed to assign a sequence of prompts to a group of large language models (LLMs) in a cost-effective manner. PromptWise strategically queries cheaper models first, progressing to more expensive options only if the lower-cost models fail to adequately address a given prompt. Through numerical experiments, we demonstrate PromptWise's effectiveness across various tasks, including puzzles of varying complexity and code generation/translation tasks. The results highlight that PromptWise consistently outperforms cost-unaware baseline methods, emphasizing that directly assigning prompts to the most expensive models can lead to higher costs and potentially lower average performance.



Abstract:Reinforcement Fine-Tuning (RFT) is proved to be greatly valuable for enhancing the reasoning ability of LLMs. Researchers have been starting to apply RFT to MLLMs, hoping it will also enhance the capabilities of visual understanding. However, these works are at a very early stage and have not examined how suitable RFT actually is for visual tasks. In this work, we endeavor to understand the suitabilities and limitations of RFT for visual tasks, through experimental analysis and observations. We start by quantitative comparisons on various tasks, which shows RFT is generally better than SFT on visual tasks. %especially when the number of training samples are limited. To check whether such advantages are brought up by the reasoning process, we design a new reward that encourages the model to ``think'' more, whose results show more thinking can be beneficial for complicated tasks but harmful for simple tasks. We hope this study can provide more insight for the rapid advancements on this topic.
Abstract:A robust full-space physical layer security (PLS) transmission scheme is proposed in this paper considering the full-space wiretapping challenge of wireless networks supported by simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). Different from the existing schemes, the proposed PLS scheme takes account of the uncertainty on the eavesdropper's position within the 360$^\circ$ service area offered by the STAR-RIS. Specifically, the large system analytical method is utilized to derive the asymptotic expression of the average security rate achieved by the security user, considering that the base station (BS) only has the statistical information of the eavesdropper's channel state information (CSI) and the uncertainty of its location. To evaluate the effectiveness of the proposed PLS scheme, we first formulate an optimization problem aimed at maximizing the weighted sum rate of the security user and the public user. This optimization is conducted under the power allocation constraint, and some practical limitations for STAR-RIS implementation, through jointly designing the active and passive beamforming variables. A novel iterative algorithm based on the minimum mean-square error (MMSE) and cross-entropy optimization (CEO) methods is proposed to effectively address the established non-convex optimization problem with discrete variables. Simulation results indicate that the proposed robust PLS scheme can effectively mitigate the information leakage across the entire coverage area of the STAR-RIS-assisted system, leading to superior performance gain when compared to benchmark schemes encompassing traditional RIS-aided scheme.




Abstract:Selecting a sample generation scheme from multiple text-based generative models is typically addressed by choosing the model that maximizes an averaged evaluation score. However, this score-based selection overlooks the possibility that different models achieve the best generation performance for different types of text prompts. An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models. In this work, we explore the possibility of varying rankings of text-based generative models for different text prompts and propose an online learning framework to predict the best data generation model for a given input prompt. The proposed framework adapts the kernelized contextual bandit (CB) methodology to a CB setting with shared context variables across arms, utilizing the generated data to update a kernel-based function that predicts which model will achieve the highest score for unseen text prompts. Additionally, we apply random Fourier features (RFF) to the kernelized CB algorithm to accelerate the online learning process and establish a $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bound for the proposed RFF-based CB algorithm over T iterations. Our numerical experiments on real and simulated text-to-image and image-to-text generative models show RFF-UCB performs successfully in identifying the best generation model across different sample types.




Abstract:The conventional reconfigurable intelligent surface (RIS) assisted far-field communication systems can only implement angle beamforming, which actually limits the capability for reconfiguring the wireless propagation environment. To overcome this limitation, this paper proposes a newly designed frequency diverse RIS (FD-RIS), which can achieve joint distance-angle beamforming with the assistance of the time modulation technology. The signal processing model for FD-RIS-aided wireless communications is first derived. Then, an optimization problem aimed at maximizing the achievable rate is formulated where the frequency-time modulations are jointly optimized to achieve distance-angle beamforming. Furthermore, a novel iterative algorithm based on the cross-entropy optimization (CEO) framework is proposed to effectively handle the non-convex optimization problem. The numerical results validate that the proposed FD-RIS assisted communication scheme can achieve a notable performance improvement compared with the baseline scheme utilizing traditional RIS. In addition, the effectiveness of the proposed CEO algorithm is further verified by comparing with the benchmark using the genetic algorithm (GA).




Abstract:Existing frameworks for evaluating and comparing generative models typically target an offline setting, where the evaluator has access to full batches of data produced by the models. However, in many practical scenarios, the goal is to identify the best model using the fewest generated samples to minimize the costs of querying data from the models. Such an online comparison is challenging with current offline assessment methods. In this work, we propose an online evaluation framework to find the generative model that maximizes a standard assessment score among a group of available models. Our method uses an optimism-based multi-armed bandit framework to identify the model producing data with the highest evaluation score, quantifying the quality and diversity of generated data. Specifically, we study the online assessment of generative models based on the Fr\'echet Inception Distance (FID) and Inception Score (IS) metrics and propose the FID-UCB and IS-UCB algorithms leveraging the upper confidence bound approach in online learning. We prove sub-linear regret bounds for these algorithms and present numerical results on standard image datasets, demonstrating their effectiveness in identifying the score-maximizing generative model.
Abstract:Constructive interference (CI) precoding, which converts the harmful multi-user interference into beneficial signals, is a promising and efficient interference management scheme in multi-antenna communication systems. However, CI-based symbol-level precoding (SLP) experiences high computational complexity as the number of symbol slots increases within a transmission block, rendering it unaffordable in practical communication systems. In this paper, we propose a symbol-level extrapolation (SLE) strategy to extrapolate the precoding matrix by leveraging the relationship between different symbol slots within in a transmission block, during which the channel state information (CSI) remains constant, where we design a closed-form iterative algorithm based on SLE for both PSK and QAM modulation. In order to further reduce the computational complexity, a sub-optimal closed-form solution based on SLE is further developed for PSK and QAM, respectively. Moreover, we design an unsupervised SLE-based neural network (SLE-Net) to unfold the proposed iterative algorithm, which helps enhance the interpretability of the neural network. By carefully designing the loss function of the SLE-Net, the time-complexity of the network can be reduced effectively. Extensive simulation results illustrate that the proposed algorithms can dramatically reduce the computational complexity and time complexity with only marginal performance loss, compared with the conventional SLP design methods.


Abstract:A Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) scheme with simultaneous wireless information and power transfer (SWIPT) is proposed in this paper. Unlike existing MEC-WPT schemes that disregard the downlink period for returning computing results to the ground equipment (GEs), our proposed scheme actively considers and capitalizes on this period. By leveraging the SWIPT technique, the UAV can simultaneously transmit energy and the computing results during the downlink period. In this scheme, our objective is to maximize the remaining energy among all GEs by jointly optimizing computing task scheduling, UAV transmit and receive beamforming, BS receive beamforming, GEs' transmit power and power splitting ratio for information decoding, time scheduling, and UAV trajectory. We propose an alternating optimization algorithm that utilizes the semidefinite relaxation (SDR), singular value decomposition (SVD), and fractional programming (FP) methods to effectively solve the nonconvex problem. Numerous experiments validate the effectiveness of the proposed scheme.




Abstract:A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) enhanced unnamed aerial vehicle (UAV)-enabled multi-user mobile edge computing (MEC) scheme is proposed in this paper. Different from the existing MEC works, the proposed scheme allows bi-directional offloading where users can simultaneously offload their computing tasks to the MEC servers situated at the ground base station (BS) and aerial UAV with the assistance of the STARRIS. Specifically, we formulate an optimization problem aiming at maximizing the energy efficiency of the system while ensuring the quality of service (QoS) constraint by jointly optimizing the resource allocation, user scheduling, passive beamforming of the STAR-RIS, and the UAV trajectory. An iterative algorithm designed with the Dinkelbach's algorithm and the successive convex approximation (SCA) is proposed to effectively handle the formulated non-convex optimization problem. Simulation results indicate that the proposed STAR-RIS enhanced UAV-enabled MEC scheme possesses significant advantages in enhancing the system energy efficiency over other baseline schemes including the conventional RIS-aided scheme.