Relation extraction typically aims to extract semantic relationships between entities from unstructured text. One of the most essential data sources for relation extraction is spoken language, such as interviews and dialogues. However, the error propagation introduced by automatic speech recognition (ASR) has been ignored in relation extraction, and end-to-end speech-based relation extraction methods have rarely been explored. In this paper, we propose a new listening information extraction task, i.e., speech relation extraction. We construct the training dataset for speech relation extraction via text-to-speech systems, and we build the testing dataset via crowd-sourcing with native English speakers. We explore speech relation extraction via two approaches: a pipeline approach that conducts text-based extraction with a pretrained ASR module, and an end-to-end approach via a newly proposed encoder-decoder model, which we call SpeechRE. We conduct comprehensive experiments to identify the challenges in speech relation extraction, which may shed light on future explorations. We share the code and data at https://github.com/wutong8023/SpeechRE.
Processing information locked within clinical health records is a challenging task that remains an active area of research in biomedical NLP. In this work, we evaluate a broad set of machine learning techniques, ranging from simple RNNs to specialised transformers such as BioBERT, on a dataset containing clinical notes along with a set of annotations indicating whether a sample is cancer-related or not. Furthermore, we specifically employ efficient fine-tuning methods from NLP, namely bottleneck adapters and prompt tuning, to adapt the models to our specialised task. Our evaluations suggest that fine-tuning a frozen BERT model pre-trained on natural language with bottleneck adapters outperforms all other strategies, including full fine-tuning of the specialised BioBERT model. Based on our findings, we suggest that using bottleneck adapters in low-resource situations with limited access to labelled data or processing capacity could be a viable strategy in biomedical text mining. The code used in the experiments will be made available at https://github.com/omidrohanian/bottleneck-adapters.
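The core idea of a bottleneck adapter can be sketched in a few lines: a small trainable down-projection, a nonlinearity, an up-projection, and a residual connection, inserted into an otherwise frozen backbone. The sketch below is a generic, illustrative numpy version, not the authors' implementation; all dimensions and the near-identity initialisation are hypothetical.

```python
import numpy as np

def bottleneck_adapter(hidden, w_down, w_up):
    """Generic bottleneck adapter: down-project, ReLU, up-project,
    then add a residual connection back to the frozen hidden states."""
    z = hidden @ w_down          # (d_model -> d_bottleneck)
    z = np.maximum(z, 0.0)       # ReLU nonlinearity
    z = z @ w_up                 # (d_bottleneck -> d_model)
    return hidden + z            # residual connection

# Toy dimensions (hypothetical, not from the paper).
rng = np.random.default_rng(0)
d_model, d_bottleneck = 768, 64
h = rng.standard_normal((4, d_model))                 # 4 token states
w_down = rng.standard_normal((d_model, d_bottleneck)) * 0.01
w_up = np.zeros((d_bottleneck, d_model))              # near-identity init
out = bottleneck_adapter(h, w_down, w_up)             # out == h at init
```

Initialising the up-projection to zero makes the adapter an identity map at the start of training, so the frozen backbone's behaviour is preserved until the small adapter weights learn a task-specific correction.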
Cell-free (CF) massive multiple-input multiple-output (MIMO) is considered a promising technology for achieving the ultimate performance limit. However, due to its distributed architecture and low-cost access points (APs), the signals received at user equipments (UEs) are most likely asynchronous. In this paper, we investigate the performance of CF massive MIMO systems with asynchronous reception, including the effects of both delays and oscillator phases. Taking into account the imperfect channel state information caused by phase asynchronization and pilot contamination, we derive novel closed-form downlink spectral efficiency (SE) expressions for coherent and non-coherent data transmission schemes, respectively. Simulation results show that asynchronous reception destroys the orthogonality of pilots and the coherent transmission of data, and thus results in poor system performance. In addition, obtaining a highly accurate delay phase is essential for CF massive MIMO systems to achieve the coherent transmission gain. Moreover, the oscillator phase of the UEs has a larger effect on SE than that of the APs, because the effect of the latter can be significantly reduced by increasing the number of antennas.
This document outlines a PROSPERO pre-registered protocol for a systematic review of articulatory changes in speech following oral or oropharyngeal cancer treatment. Treatment of tumours in the oral cavity may result in physiological changes that can lead to articulatory difficulties. The tongue becomes less mobile due to scar tissue and/or potential (postoperative) radiation therapy. Moreover, tissue loss may create a bypass for airflow or limit constriction possibilities. To gain a better understanding of the nature of these speech problems, information regarding the movement of the articulators is needed, since perceptual or acoustic information provides only indirect evidence of articulatory changes. Therefore, this systematic review will examine studies that directly measured the articulatory movements of the tongue, jaw, and lips following treatment for oral or oropharyngeal cancer.
Cyber-resilience is an increasing concern in developing autonomous navigation solutions for marine vessels. This paper scrutinizes the cyber-resilience properties of marine navigation through a prism with three edges: multi-sensor information fusion, diagnosis of not-normal behaviours, and change detection. It proposes a two-stage estimator for the diagnosis and mitigation of sensor signals used for coastal navigation. Using a likelihood-field approach, a first stage extracts shoreline features from radar and matches them to the electronic navigation chart; a second stage associates buoy and beacon features from the radar with chart information. Using real data logged during sea tests, combined with simulated spoofing, the paper verifies the ability to diagnose and isolate an attempt to compromise position measurements in a timely manner. A new approach is suggested for high-level processing of received data to evaluate their consistency, one that is agnostic to the underlying technology of the individual sensory input. A combination of parametric Gaussian modelling and kernel density estimation is suggested and compared with a generalized likelihood ratio change detector that uses sliding windows. The paper shows how deviations from nominal behaviour can be detected and the affected components isolated when the system is under attack or when sensor defects occur.
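To illustrate the kind of sliding-window generalized likelihood ratio (GLR) change detector mentioned above, the sketch below tests for a shift in the mean of Gaussian residuals with known variance. This is a textbook GLR form, not the paper's detector; the noise model, window length, and threshold are all hypothetical.

```python
import numpy as np

def glr_mean_shift(x, mu0, sigma, window):
    """Sliding-window GLR statistic for a mean shift in Gaussian data
    with known variance: (M / (2*sigma^2)) * (window mean - mu0)^2.
    Returns one statistic per window position."""
    stats = []
    for t in range(window, len(x) + 1):
        mu_hat = x[t - window:t].mean()
        stats.append(window * (mu_hat - mu0) ** 2 / (2.0 * sigma ** 2))
    return np.array(stats)

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200),   # nominal residuals
                    rng.normal(3.0, 1.0, 200)])  # spoofed position offset
g = glr_mean_shift(x, mu0=0.0, sigma=1.0, window=50)
```

Comparing such a parametric statistic against an alarm threshold gives a detection delay of at most one window once the spoofing offset dominates the window mean; a kernel-density alternative, as compared in the paper, drops the Gaussian assumption at the cost of more computation.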
Deep neural networks require specific layers to process point clouds, as the scattered and irregular location of points prevents us from using convolutional filters. Here we introduce the composite layer, a new convolutional operator for point clouds. The peculiarity of our composite layer is that it extracts and compresses the spatial information from the positions of points before combining it with their feature vectors. Compared to well-known point-convolutional layers such as those of ConvPoint and KPConv, our composite layer provides additional regularization and guarantees greater flexibility in terms of design and number of parameters. To demonstrate this design flexibility, we also define an aggregate composite layer that combines spatial information and features in a nonlinear manner, and we use these layers to implement a convolutional and an aggregate CompositeNet. We train our CompositeNets to perform classification and, most remarkably, unsupervised anomaly detection. Our experiments on synthetic and real-world datasets show that, in both tasks, our CompositeNets outperform ConvPoint and achieve results similar to KPConv despite having a much simpler architecture. Moreover, our CompositeNets substantially outperform existing solutions for anomaly detection on point clouds.
Funding agencies largely rely on topic matching between domain experts and research proposals to assign proposal reviewers. As proposals are increasingly interdisciplinary, it is challenging to profile the interdisciplinary nature of a proposal and, thereafter, find expert reviewers with an appropriate set of expertise. An essential step in solving this challenge is to accurately model and classify the interdisciplinary labels of a proposal. Existing methodological and application-related literature, such as textual classification and proposal classification, is insufficient in jointly addressing the three key unique issues introduced by interdisciplinary proposal data: 1) the hierarchical structure of a proposal's discipline labels, from coarse-grained to fine-grained, e.g., from information science to AI to fundamentals of AI; 2) the heterogeneous semantics of the various main textual parts that play different roles in a proposal; 3) the imbalance in the number of proposals between non-interdisciplinary and interdisciplinary research. Can we simultaneously address these three issues in understanding a proposal's interdisciplinary nature? In response to this question, we propose a hierarchical mixup multiple-label classification framework, which we call H-MixUp. H-MixUp leverages a transformer-based semantic information extractor and a GCN-based interdisciplinary knowledge extractor to address the first and second issues. H-MixUp develops a fused training method combining Word-level MixUp, Word-level CutMix, Manifold MixUp, and Document-level MixUp to address the third issue.
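The MixUp variants named above all share one core operation: forming a convex combination of two samples and their label vectors, applied at different levels (word embeddings, hidden manifolds, or whole documents). A minimal, generic sketch of that shared operation, with toy data chosen purely for illustration:

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Generic MixUp: interpolate two samples and their (one-hot or
    multi-label) label vectors with mixing coefficient lam in [0, 1].
    Variants differ only in what 'x' is (word embedding, hidden state,
    or document representation)."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Toy 2-d inputs with 3-class one-hot labels (illustrative only).
x1, y1 = np.array([1.0, 0.0]), np.array([1.0, 0.0, 0.0])
x2, y2 = np.array([0.0, 1.0]), np.array([0.0, 1.0, 0.0])
x_mix, y_mix = mixup(x1, y1, x2, y2, lam=0.7)
```

Because the mixed labels become soft distributions over classes, such interpolation synthesizes plausible "in-between" interdisciplinary examples, which is why MixUp-style augmentation is a natural fit for the class-imbalance issue described above.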
Semantic-oriented communication has been considered a promising technique to boost bandwidth efficiency by transmitting only the semantics of the data. In this paper, we propose a multi-level semantic-aware communication system for wireless image transmission, named MLSC-image, which is based on deep learning techniques and trained in an end-to-end manner. In particular, the proposed model includes a multi-level semantic feature extractor that extracts both high-level semantic information, such as text semantics and segmentation semantics, and low-level semantic information, such as the local spatial details of the images. We employ a pretrained image captioning model to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics. These high-level and low-level semantic features are then combined and encoded by a joint semantic and channel encoder into symbols for transmission over the physical channel. The numerical results validate the effectiveness and efficiency of the proposed semantic communication system, especially under limited bandwidth conditions, which indicates the advantage of high-level semantics in the compression of images.
This paper describes our approach to the CRAC 2022 Shared Task on Multilingual Coreference Resolution. Our model is based on a state-of-the-art end-to-end coreference resolution system. Apart from joint multilingual training, we improved our results with mention head prediction. We also tried to integrate dependency information into our model. Our system ended up in $3^{rd}$ place. Moreover, we reached the best performance on two datasets out of 13.
The advent of deep learning in the past decade has significantly helped advance image inpainting. Although achieving promising performance, deep learning-based inpainting algorithms still suffer from distortion caused by the fusion of structural and contextual features, which are commonly obtained from, respectively, the deep and shallow layers of a convolutional encoder. Motivated by this observation, we propose a novel progressive inpainting network that maintains the structural and contextual integrity of a processed image. More specifically, inspired by the Gaussian and Laplacian pyramids, the core of the proposed network is a feature extraction module named GLE. Stacking GLE modules enables the network to extract image features from different image frequency components. This ability is important for maintaining structural and contextual integrity, as high-frequency components correspond to structural information while low-frequency components correspond to contextual information. The proposed network uses the GLE features to progressively fill in missing regions of a corrupted image in an iterative manner. Our benchmarking experiments demonstrate that the proposed method achieves clear improvements in performance over many state-of-the-art inpainting algorithms.
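The Gaussian/Laplacian decomposition that inspires the GLE module can be illustrated with a one-level split: smoothing an image yields the low-frequency (contextual) component, and the residual is the high-frequency (structural) component, with the two summing back to the original. The sketch below is a minimal numpy illustration of this principle, using a box blur as a stand-in for Gaussian smoothing; it is not the paper's GLE module.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k box blur with edge padding, standing in for the
    Gaussian smoothing used in pyramid decompositions."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def frequency_split(img):
    """One pyramid level: low-frequency ('Gaussian') part plus a
    high-frequency ('Laplacian') residual that reconstructs exactly."""
    low = box_blur(img)
    high = img - low
    return low, high

img = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 image
low, high = frequency_split(img)                # low + high == img
```

Because the split is exactly invertible, a network operating on the two bands separately can fill missing structure (high band) and context (low band) without one corrupting the other, which is the integrity property the abstract emphasizes.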