This paper investigates physical layer security (PLS) for a movable antenna (MA)-assisted full-duplex (FD) system. In this system, an FD base station (BS) with multiple MAs for transmission and reception provides services for an uplink (UL) user and a downlink (DL) user. Each user operates in half-duplex (HD) mode and is equipped with a single fixed-position antenna (FPA), in the presence of a single-FPA eavesdropper (Eve). To ensure secure communication, artificial noise (AN) is transmitted to obstruct the interception of Eve. The objective of this paper is to maximize the sum secrecy rate (SSR) of the UL and DL users by jointly optimizing the beamformers of the BS and the positions of MAs. This paper also proposes an alternating optimization (AO) method to address the non-convex problem, which decomposes the optimization problem into three subproblems and solves them iteratively. Simulation results demonstrate a significant performance gain in the SSR achieved by the proposed scheme compared to the benchmark schemes.
Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists. Current RRG approaches are still unsatisfactory against clinical standards. This paper introduces a novel RRG method, \textbf{LM-RRG}, that integrates large models (LMs) with clinical quality reinforcement learning to generate accurate and comprehensive chest X-ray radiology reports. Our method first designs a large language model driven feature extractor to analyze and interpret different regions of the chest X-ray image, emphasizing specific regions with medical significance. Next, based on the large model's decoder, we develop a multimodal report generator that leverages multimodal prompts from visual features and textual instruction to produce the radiology report in an auto-regressive way. Finally, to better reflect the clinical significant and insignificant errors that radiologists would normally assign in the report, we introduce a novel clinical quality reinforcement learning strategy. It utilizes the radiology report clinical quality (RadCliQ) metric as a reward function in the learning process. Extensive experiments on the MIMIC-CXR and IU-Xray datasets demonstrate the superiority of our method over the state of the art.
Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This method is especially beneficial for co-frequency co-time full-duplex (CCFD) wireless communication, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/receive antennas, we can mitigate the SI and enhance the reception of incoming signals. Thus, this paper proposes a novel MA-enabled point-to-point CCFD system and formulates the minimum achievable rate of two CCFD terminals. To maximize the minimum achievable rate and determine the near-optimal positions of the MAs, we introduce a solution based on projected particle swarm optimization (PPSO), which can circumvent common suboptimal positioning issues. Moreover, numerical results reveal that the PPSO method leads to a better performance compared to the conventional alternating position optimization (APO). The results also demonstrate that an MA-enabled CCFD system outperforms the one using fixed-position antennas (FPAs).
Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This method is especially beneficial for full-duplex (FD) wireless, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/receive antennas, we can mitigate the SI and enhance the reception of incoming signals. Thus, this paper proposes a novel MA-enabled point-to-point FD wireless system and formulates the minimum achievable rate of two FD terminals. To maximize the minimum achievable rate and determine the near-optimal positions of the MAs, we introduce a solution based on projected particle swarm optimization (PPSO), which can circumvent common suboptimal positioning issues. Moreover, numerical results reveal that the PPSO method leads to a better performance compared to the conventional alternating position optimization (APO). The results also demonstrate that an MA-enabled FD system outperforms the one using fixed-position antennas (FPAs).
Panoptic Scene Graph Generation (PSG) aims at achieving a comprehensive image understanding by simultaneously segmenting objects and predicting relations among objects. However, the long-tail problem among relations leads to unsatisfactory results in real-world applications. Prior methods predominantly rely on vision information or utilize limited language information, such as object or relation names, thereby overlooking the utility of language information. Leveraging the recent progress in Large Language Models (LLMs), we propose to use language information to assist relation prediction, particularly for rare relations. To this end, we propose the Vision-Language Prompting (VLPrompt) model, which acquires vision information from images and language information from LLMs. Then, through a prompter network based on attention mechanism, it achieves precise relation prediction. Our extensive experiments show that VLPrompt significantly outperforms previous state-of-the-art methods on the PSG dataset, proving the effectiveness of incorporating language information and alleviating the long-tail problem of relations.
Despite the enhanced realism and immersion provided by VR headsets, users frequently encounter adverse effects such as digital eye strain (DES), dry eye, and potential long-term visual impairment due to excessive eye stimulation from VR displays and pressure from the mask. Recent VR headsets are increasingly equipped with eye-oriented monocular cameras to segment ocular feature maps. Yet, to compute the incident light stimulus and observe periocular condition alterations, it is imperative to transform these relative measurements into metric dimensions. To bridge this gap, we propose a lightweight framework derived from the U-Net 3+ deep learning backbone that we re-optimised, to estimate measurable periocular depth maps. Compatible with any VR headset equipped with an eye-oriented monocular camera, our method reconstructs three-dimensional periocular regions, providing a metric basis for related light stimulus calculation protocols and medical guidelines. Navigating the complexities of data collection, we introduce a Dynamic Periocular Data Generation (DPDG) environment based on UE MetaHuman, which synthesises thousands of training images from a small quantity of human facial scan data. Evaluated on a sample of 36 participants, our method exhibited notable efficacy in the periocular global precision evaluation experiment, and the pupil diameter measurement.
In this paper, we study data-driven localized wave solutions and parameter discovery in the massive Thirring (MT) model via the deep learning in the framework of physics-informed neural networks (PINNs) algorithm. Abundant data-driven solutions including soliton of bright/dark type, breather and rogue wave are simulated accurately and analyzed contrastively with relative and absolute errors. For higher-order localized wave solutions, we employ the extended PINNs (XPINNs) with domain decomposition to capture the complete pictures of dynamic behaviors such as soliton collisions, breather oscillations and rogue-wave superposition. In particular, we modify the interface line in domain decomposition of XPINNs into a small interface zone and introduce the pseudo initial, residual and gradient conditions as interface conditions linked adjacently with individual neural networks. Then this modified approach is applied successfully to various solutions ranging from bright-bright soliton, dark-dark soliton, dark-antidark soliton, general breather, Kuznetsov-Ma breather and second-order rogue wave. Experimental results show that this improved version of XPINNs reduce the complexity of computation with faster convergence rate and keep the quality of learned solutions with smoother stitching performance as well. For the inverse problems, the unknown coefficient parameters of linear and nonlinear terms in the MT model are identified accurately with and without noise by using the classical PINNs algorithm.
In this paper, we propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments in minimally invasive surgeries. We redefine the task as text promptable, thereby enabling a more nuanced comprehension of surgical instruments and adaptability to new instrument types. Inspired by recent advancements in vision-language models, we leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder consisting of attention- and convolution-based prompting schemes for surgical instrument segmentation prediction. Our model leverages multiple text prompts for each surgical instrument through a new mixture of prompts mechanism, resulting in enhanced segmentation performance. Additionally, we introduce a hard instrument area reinforcement module to improve image feature comprehension and segmentation precision. Extensive experiments on EndoVis2017 and EndoVis2018 datasets demonstrate our model's superior performance and promising generalization capability. To our knowledge, this is the first implementation of a promptable approach to surgical instrument segmentation, offering significant potential for practical application in the field of robotic-assisted surgery.
Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods.
The Shapley value (SV) is adopted in various scenarios in machine learning (ML), including data valuation, agent valuation, and feature attribution, as it satisfies their fairness requirements. However, as exact SVs are infeasible to compute in practice, SV estimates are approximated instead. This approximation step raises an important question: do the SV estimates preserve the fairness guarantees of exact SVs? We observe that the fairness guarantees of exact SVs are too restrictive for SV estimates. Thus, we generalise Shapley fairness to probably approximate Shapley fairness and propose fidelity score, a metric to measure the variation of SV estimates, that determines how probable the fairness guarantees hold. Our last theoretical contribution is a novel greedy active estimation (GAE) algorithm that will maximise the lowest fidelity score and achieve a better fairness guarantee than the de facto Monte-Carlo estimation. We empirically verify GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various ML scenarios using real-world datasets.