Integrated sensing and communication (ISAC) has attracted growing interest as an enabler of future 6G wireless networks, due to its capability of sharing spectrum and hardware resources between communication and sensing systems. However, existing works on ISAC usually need to modify the communication protocol to cater to new sensing performance requirements, which may be difficult to implement in practice. In this paper, we study a new intelligent reflecting surface (IRS) aided millimeter-wave (mmWave) ISAC system that exploits the distinct beam scanning operation in mmWave communications to achieve efficient sensing at the same time. First, we propose a two-phase ISAC protocol aided by a semi-passive IRS, consisting of beam scanning and data transmission. Specifically, in the beam scanning phase, the IRS finds the optimal beam for reflecting signals from the base station to a communication user via its passive elements. Meanwhile, the IRS directly estimates the angle of a nearby target based on echo signals from the target using its equipped active sensing element. Then, in the data transmission phase, the sensing accuracy is further improved by leveraging the data signals via possible IRS beam splitting. Next, we derive the achievable rate of the communication user as well as the Cramér-Rao bound and the approximate mean square error of the target angle estimation. Finally, extensive simulation results are provided to verify our analysis as well as the effectiveness of the proposed scheme.
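To make the angle-estimation benchmark concrete, below is a minimal sketch of the deterministic Cramér-Rao bound for single-target angle estimation with a uniform linear array. It assumes half-wavelength element spacing, a known waveform, and white Gaussian noise; all parameters are illustrative, and the paper's semi-passive IRS geometry will differ.

```python
# Minimal sketch (illustrative assumptions): deterministic CRB on the angle of
# a single source observed by a uniform linear array with half-wavelength spacing.
import numpy as np

def crb_angle(n_elements, snr_linear, n_snapshots, theta_rad, d_over_lambda=0.5):
    """Lower bound (rad^2) on the variance of any unbiased angle estimator."""
    n = np.arange(n_elements)
    centered = n - n.mean()  # positions referenced to the array phase center
    # Energy of the steering-vector derivative, projected off the signal direction
    deriv_energy = (2 * np.pi * d_over_lambda * np.cos(theta_rad)) ** 2 * np.sum(centered ** 2)
    return 1.0 / (2 * snr_linear * n_snapshots * deriv_energy)

# Example: 16 elements, 10 dB SNR (linear 10.0), 100 snapshots, target at 20 degrees
crb = crb_angle(16, 10.0, 100, np.deg2rad(20.0))
print(f"angle std. dev. lower bound ~ {np.sqrt(crb) * 180 / np.pi:.4f} deg")
```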
Tracking ripening tomatoes is time consuming and labor intensive. Artificial intelligence combined with computer vision can help users optimize the process of monitoring the ripening status of plants. To this end, we propose a deep-learning-based tomato ripening monitoring approach for complex scenes. The objective is to detect mature tomatoes so that they can be harvested in a timely manner. The proposed approach consists of two stages. First, images of the scene are passed to a pre-processing layer, which detects areas of interest (regions of the image containing tomatoes). These regions are then fed to the maturity detection layer. This layer, based on a deep neural network, classifies each tomato thumbnail into one of five categories: green, brittle, pink, pale red, mature red. The experiments are based on images collected from the internet through searches for tomato ripeness terms in several languages, including English, German, French, and Spanish. On a dataset composed of tomato images taken under extreme conditions, the maturity detection layer achieved a good classification rate.
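As a rough illustration of the maturity detection layer, here is a minimal PyTorch sketch of a five-class thumbnail classifier. The architecture, layer sizes, and class ordering are assumptions made for illustration, not the authors' network.

```python
# Hypothetical sketch of the maturity-detection layer: a small CNN classifying
# tomato crops (thumbnails from the pre-processing layer) into five categories.
import torch
import torch.nn as nn

CLASSES = ["green", "brittle", "pink", "pale_red", "mature_red"]  # assumed order

class MaturityClassifier(nn.Module):
    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # tolerate variable thumbnail sizes
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (B, 3, H, W) tomato crops
        return self.head(self.features(x).flatten(1))

logits = MaturityClassifier()(torch.randn(4, 3, 64, 64))
print(logits.argmax(dim=1))  # predicted class index per crop
```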
This paper addresses the beam-selection challenges in Multi-User Multiple Input Multiple Output (MU-MIMO) beamforming for mm-wave and THz channels, focusing on spectral efficiency (SE) and computational efficiency. We introduce a novel approach, the Greedy Interference-Optimized Singular Vector Beam-selection (G-IOSVB) algorithm, which strikes a balance between high SE and low computational complexity. We compare G-IOSVB against the traditional IOSVB and the exhaustive Singular-Vector Beamspace Search (SVBS) algorithms. The findings reveal that while SVBS achieves the highest SE, it incurs significant computational cost, approximately 162 seconds per channel realization. In contrast, G-IOSVB closely matches IOSVB in SE yet is markedly more computationally efficient. Heatmaps illustrate this efficiency, highlighting G-IOSVB's reduced computation time without sacrificing SE. We also present the mathematical details of G-IOSVB, supporting its theoretical and practical advantages with rigorous expressions and a detailed algorithmic analysis. The numerical results illustrate that G-IOSVB is an efficient, practical solution for MU-MIMO systems, making it a promising candidate for high-speed, high-efficiency wireless communication networks.
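To convey the flavor of greedy interference-aware beam selection, here is a minimal numpy sketch: candidate beams come from each user's dominant singular vector, and beams are committed one at a time to maximize the incremental sum SE under mutual interference. The signal model, variable names, and greedy order are illustrative assumptions, not the published G-IOSVB algorithm.

```python
# Illustrative greedy singular-vector beam selection (not the paper's exact method).
import numpy as np

def sum_se(H, assignment, noise_power):
    """Sum spectral efficiency (bits/s/Hz) treating other users' beams as interference."""
    rate = 0.0
    for u, w in assignment.items():
        sig = abs(H[u] @ w) ** 2
        intf = sum(abs(H[u] @ v) ** 2 for k, v in assignment.items() if k != u)
        rate += np.log2(1 + sig / (intf + noise_power))
    return rate

def greedy_beam_select(H, noise_power=1.0):
    """H: (n_users, n_tx) channel matrix. Greedily assigns one beam per user."""
    n_users = H.shape[0]
    # Candidate beams: dominant right singular vector of each user's channel row
    beams = [np.linalg.svd(H[u:u + 1], full_matrices=False)[2][0].conj() for u in range(n_users)]
    chosen, remaining = {}, set(range(n_users))
    while remaining:
        best = max(remaining, key=lambda u: sum_se(H, {**chosen, u: beams[u]}, noise_power))
        chosen[best] = beams[best]
        remaining.remove(best)
    return chosen

H = (np.random.randn(4, 16) + 1j * np.random.randn(4, 16)) / np.sqrt(2)
print(f"Greedy sum SE: {sum_se(H, greedy_beam_select(H), 1.0):.2f} bits/s/Hz")
```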
Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction. To improve the user experience, it is important to consider the user's input habits, segmentation quality, running time, and memory consumption. However, existing methods compromise user experience with a single input mode and slow running speed. Specifically, these methods only allow the user to interact with one single frame, which limits the expression of the user's intent. To overcome these limitations and better align with people's usage habits, we propose a framework that can accept multiple frames simultaneously and explore synergistic interaction across frames (SIAF). Concretely, we design the Across-Frame Interaction (AFI) module, which enables users to annotate different objects freely on multiple frames. The AFI module migrates scribble information among multiple interactive frames and generates multi-frame masks. Additionally, we employ an id-queried mechanism to process multiple objects in batches. Furthermore, for more efficient propagation and a lighter model, we design a truncated re-propagation strategy to replace the previous multi-round fusion module, aided by an across-round memory that stores important interaction information. Our SwinB-SIAF achieves new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60). Moreover, our R50-SIAF is more than 3× faster than the state-of-the-art competitor under challenging multi-object scenarios.
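The truncated re-propagation idea can be sketched abstractly: masks propagate outward from each interacted frame only until they reach another interacted frame or a truncation radius, rather than re-fusing the whole sequence every round. The function below is a hypothetical control-flow sketch; `propagate_one_step` stands in for the segmentation network's mask-propagation call and is not the paper's implementation.

```python
# Hypothetical sketch of truncated re-propagation (control flow only).
def truncated_repropagation(masks, interacted, n_frames, radius, propagate_one_step):
    """masks: per-frame masks; interacted: set of user-annotated frame indices."""
    for f in sorted(interacted):
        for direction in (+1, -1):  # propagate forward, then backward
            prev, steps = f, 0
            while steps < radius:
                nxt = prev + direction
                if nxt < 0 or nxt >= n_frames or nxt in interacted:
                    break  # stop at sequence borders or at fresher user input
                masks[nxt] = propagate_one_step(masks[prev], src=prev, dst=nxt)
                prev, steps = nxt, steps + 1
    return masks
```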
State-of-the-art Named Entity Recognition (NER) models have achieved an impressive ability to extract common phrases from text that belong to labels such as location, organization, time, and person. However, typical NER systems that rely on having seen a specific entity in their training data in order to label it perform poorly on rare or unseen entities (Derczynski et al., 2017). This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name. In order for downstream tasks to not exhibit bias based on cultural background, a model should perform well on names from a variety of backgrounds. In this paper I experiment with the training data and input structure of an English Bi-LSTM name recognition model. I look at names from 103 countries to compare how well the model performs on names from different cultures, specifically in the context of a downstream task where extracted names will be matched to information on file. I find that a model with combined character and word input outperforms word-only models and may improve on accuracy compared to classical NER models that are not geared toward identifying unseen entity values.
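The combined character-and-word input can be sketched as follows: each token's word embedding is concatenated with a pooled character-level Bi-LSTM encoding before a word-level Bi-LSTM assigns tags. Vocabulary sizes, dimensions, and the tag set below are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative char+word Bi-LSTM name tagger (dimensions are assumptions).
import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    def __init__(self, n_words=10000, n_chars=100, n_tags=3):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, 100)
        self.char_emb = nn.Embedding(n_chars, 25)
        self.char_lstm = nn.LSTM(25, 25, bidirectional=True, batch_first=True)
        self.word_lstm = nn.LSTM(100 + 50, 100, bidirectional=True, batch_first=True)
        self.tag = nn.Linear(200, n_tags)  # e.g. B-PER / I-PER / O

    def forward(self, words, chars):
        # words: (B, T) token ids; chars: (B, T, C) character ids per token
        B, T, C = chars.shape
        _, (h, _) = self.char_lstm(self.char_emb(chars.view(B * T, C)))
        char_feat = h.transpose(0, 1).reshape(B, T, -1)  # final fwd+bwd states
        x = torch.cat([self.word_emb(words), char_feat], dim=-1)
        out, _ = self.word_lstm(x)
        return self.tag(out)  # (B, T, n_tags) tag scores

scores = CharWordTagger()(torch.randint(0, 10000, (2, 8)), torch.randint(0, 100, (2, 8, 12)))
print(scores.shape)  # torch.Size([2, 8, 3])
```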
Introduction: Quantum Convolutional Neural Network (QCNN)-Long Short-Term Memory (LSTM) models were studied to provide sequential relationships for each timepoint in MRIs of patients with Multiple Sclerosis (MS). In this pilot study, we compared three QCNN-LSTM models for binary classification of MS disability, benchmarked against classical neural network architectures. Our hypothesis is that quantum models will provide competitive performance. Methods: Matrix Product State (MPS), reverse Multistate Entanglement Renormalization Ansatz (MERA), and Tree-Tensor Network (TTN) circuits were each paired with an LSTM layer to process near-annual MRI data of patients diagnosed with MS. These were benchmarked against a Visual Geometry Group (VGG)-LSTM and a Video Vision Transformer (ViViT). Predicted logits were measured against ground-truth labels of each patient's Extended Disability Severity Score (EDSS) using binary cross-entropy loss. Training/validation/holdout testing was partitioned using 5-fold cross-validation with a total split of 60:20:20. Levene's test was used to assess differences in variance, and Student's t-test for paired differences in mean. Results: The MPS-LSTM, reverse MERA-LSTM, and TTN-LSTM had holdout-testing ROC-AUC of 0.70, 0.77, and 0.81, respectively (p-value 0.915). VGG16-LSTM and ViViT performed similarly, with ROC-AUC of 0.73 and 0.77, respectively (p-value 0.631). Overall variance and mean were not statistically significantly different (p-value 0.713); however, time to train was significantly shorter for the QCNN-LSTMs (39.4 s per fold vs. 224 s and 218 s for VGG16-LSTM and ViViT, respectively; p-value <0.001). Conclusion: QCNN-LSTM models perform competitively with their classical counterparts, with greater training-time efficiency. Clinically, they can add efficiency to time-dependent deep learning prediction of disease progression based upon medical imaging.
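The evaluation protocol (k-fold splits, binary labels, ROC-AUC per fold) can be illustrated with a minimal scikit-learn sketch. The features, labels, and stand-in classifier below are placeholders, not the QCNN-LSTM models or the MRI data.

```python
# Illustrative 5-fold evaluation with ROC-AUC (stand-in model and synthetic data).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression

X = np.random.randn(100, 32)          # placeholder image-sequence features
y = np.random.randint(0, 2, 100)      # placeholder EDSS binarized to low/high disability

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
print(f"ROC-AUC per fold: {np.round(aucs, 3)}, mean {np.mean(aucs):.3f}")
```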
Despite advances in computational modeling for brain tumor segmentation, the computational complexity of existing models remains high, limiting their performance and efficiency in clinical application scenarios. Therefore, this paper proposes a shallow encoder and decoder network named SEDNet for brain tumor segmentation. The proposed network is adapted from the U-Net structure. Though brain tumors do not assume structures as complex as those in the task the traditional U-Net was designed for, their variability in appearance and shape and the ambiguity of their boundaries make segmentation a compellingly complex task. The SEDNet architecture is inspired by the localized nature of brain tumors in brain images, and thus consists of sufficient hierarchical convolutional blocks in the encoding pathway, capable of learning the intrinsic features of brain tumors in brain slices, and a decoding pathway with a selective skip path sufficient for capturing miniature local-level spatial features alongside the global-level features of brain tumors. With the integration of the proposed preprocessing algorithm and optimization function, SEDNet achieves impressive Dice scores of 0.9308, 0.9451, and 0.9026 and Hausdorff scores of 0.7040, 1.2866, and 0.7762 on the BraTS2020 set reserved for testing, for non-enhancing tumor core (NTC), peritumoral edema (ED), and enhancing tumor (ET), respectively. Furthermore, through transfer learning with initialized SEDNet pre-trained weights, termed SEDNetX, a performance increase is observed. The recorded Dice scores are 0.9336, 0.9478, and 0.9061, and the Hausdorff scores are 0.6983, 1.2691, and 0.7711, for NTC, ED, and ET, respectively. With about 1.3 million parameters and impressive performance in comparison to the state of the art, SEDNet(X) is shown to be computationally efficient for real-time clinical diagnosis.
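For reference, the two reported metrics can be computed as below: the Dice coefficient measures volumetric overlap, and the Hausdorff distance measures worst-case boundary disagreement. This is a minimal sketch on toy 2D masks; the paper's exact evaluation protocol (e.g., whether a 95th-percentile Hausdorff is used) may differ.

```python
# Dice coefficient and symmetric Hausdorff distance between binary masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / (pred.sum() + gt.sum() + eps)

def hausdorff(pred, gt):
    p, g = np.argwhere(pred), np.argwhere(gt)  # foreground pixel coordinates
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])

pred = np.zeros((64, 64), dtype=np.uint8); pred[20:40, 20:40] = 1
gt = np.zeros((64, 64), dtype=np.uint8); gt[22:42, 22:42] = 1
print(f"Dice={dice(pred, gt):.3f}, Hausdorff={hausdorff(pred, gt):.2f}")
```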
Although catheter ablation (CA) is still the first-line treatment for persistent atrial fibrillation (AF) patients, its limited long-term success rate has motivated clinical interest in preoperative prediction of the procedure's outcome, in order to optimize patient selection and limit repeated procedures, hospitalization rates, and treatment costs. In this respect, the dominant frequency (DF) and amplitude of fibrillatory waves (f-waves) reflected on the ECG have provided promising results. Hence, this work explores the ability of a novel set of frequency and amplitude f-wave features, namely spectral entropy (SE), spectral flatness measure (SFM), and amplitude spectrum area (AMSA), along with DF and normalized f-wave amplitude (NFWA), to improve CA outcome prediction. Although all single indices showed statistically significant differences between patients who relapsed to AF and those who maintained sinus rhythm after a follow-up of 9 months, computed over 204 ECG intervals of 6 s in length extracted from 51 persistent AF patients, they obtained a limited discriminant ability, ranging between 55 and 62%, which improved by 15-23% when NFWA, SE, and AMSA were combined. Consequently, this combination of frequency and amplitude features of the f-waves seems to provide new insights into atrial substrate remodeling, which could be helpful in improving preoperative CA outcome prediction.
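The named spectral features have standard definitions, sketched below over a typical atrial frequency band. The band limits, Welch settings, and the AMSA amplitude proxy are illustrative assumptions; the paper's exact f-wave preprocessing will differ.

```python
# Illustrative f-wave spectral features: DF, spectral entropy, flatness, AMSA.
import numpy as np
from scipy.signal import welch

def fwave_features(x, fs, band=(3.0, 12.0)):
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 2 * fs))
    m = (f >= band[0]) & (f <= band[1])
    f, pxx = f[m], pxx[m]
    p = pxx / pxx.sum()                                         # normalized spectrum
    df = f[np.argmax(pxx)]                                      # dominant frequency (Hz)
    se = -np.sum(p * np.log(p + 1e-12)) / np.log(len(p))        # spectral entropy
    sfm = np.exp(np.mean(np.log(pxx + 1e-12))) / np.mean(pxx)   # spectral flatness
    amsa = np.sum(np.sqrt(pxx) * f)                             # amplitude-weighted area
    return df, se, sfm, amsa

fs = 250
t = np.arange(0, 6, 1 / fs)                                     # a 6 s interval
x = np.sin(2 * np.pi * 6 * t) + 0.3 * np.random.randn(len(t))   # toy f-wave at ~6 Hz
print(fwave_features(x, fs))
```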
Maritime transport is a pivotal logistics mode for the long-distance and bulk transportation of goods. However, the intricate planning involved in this mode is often hindered by uncertainties, including weather conditions, cargo diversity, and port dynamics, leading to increased costs. Consequently, accurately estimating vessel total (stay) time at port and potential delays becomes imperative for effective planning and scheduling in port operations. This study develops a port operation solution with competitive prediction and classification capabilities for estimating vessel stay and delay times, addressing a significant gap in port analysis models and contributing to the field of maritime logistics. The proposed solution is designed to assist decision-making in port environments and predict service delays, as demonstrated through a case study on Brazilian ports. Additionally, feature analysis is used to understand the key factors impacting maritime logistics, enhancing the overall understanding of the complexities involved in port operations.
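A minimal sketch of the prediction-plus-feature-analysis workflow follows: a gradient-boosting regressor estimates stay time, and its feature importances point to key drivers. The feature names and synthetic data are placeholders, not the study's dataset or model.

```python
# Illustrative stay-time regression with feature-importance analysis.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

features = ["cargo_tonnage", "berth_occupancy", "wave_height", "queue_length"]  # assumed
X = pd.DataFrame(np.random.rand(500, 4), columns=features)
y = 10 * X["cargo_tonnage"] + 5 * X["queue_length"] + np.random.randn(500)  # toy stay time (h)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out data: {model.score(X_te, y_te):.3f}")
print(dict(zip(features, np.round(model.feature_importances_, 3))))  # key drivers
```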
This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs). ServerlessLLM exploits the substantial capacity and bandwidth of storage and memory devices available on GPU servers, thereby reducing costly remote checkpoint downloads and achieving efficient checkpoint loading. ServerlessLLM achieves this through three main contributions: (i) fast LLM checkpoint loading via a novel loading-optimized checkpoint format design, coupled with an efficient multi-tier checkpoint loading system; (ii) locality-driven LLM inference with live migration, which allows ServerlessLLM to effectively achieve locality-driven server allocation while preserving the low latency of ongoing LLM inference; and (iii) locality-aware server allocation, enabling ServerlessLLM to evaluate the status of each server in a cluster and effectively schedule model startup time to capitalize on local checkpoint placement. Our comprehensive experiments, which include microbenchmarks and real-world traces, show that ServerlessLLM surpasses state-of-the-art systems by 10-200× in latency when running various LLM inference workloads.
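To illustrate what a loading-optimized checkpoint layout might look like, here is a minimal sketch: tensor bytes are stored contiguously behind a small offset index, so a loader can issue one large sequential read (or mmap slice) per tensor instead of many scattered ones. This is a hypothetical format for illustration only, not ServerlessLLM's actual design.

```python
# Hypothetical loading-optimized checkpoint layout: offset index + contiguous data.
import json, mmap, struct
import numpy as np

def save_checkpoint(path, tensors):
    index, blobs, offset = {}, [], 0
    for name, t in tensors.items():
        b = t.tobytes()
        index[name] = {"offset": offset, "nbytes": len(b), "dtype": str(t.dtype), "shape": t.shape}
        blobs.append(b); offset += len(b)
    header = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte header length
        f.write(header)
        f.writelines(blobs)                      # contiguous tensor data

def load_tensor(path, name):
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        hlen = struct.unpack("<Q", mm[:8])[0]
        meta = json.loads(mm[8:8 + hlen])[name]
        start = 8 + hlen + meta["offset"]
        buf = mm[start:start + meta["nbytes"]]   # one sequential slice per tensor
        return np.frombuffer(buf, dtype=meta["dtype"]).reshape(meta["shape"])

save_checkpoint("ckpt.bin", {"w": np.arange(6, dtype=np.float32).reshape(2, 3)})
print(load_tensor("ckpt.bin", "w"))
```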