Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schemas and complex text expressions. Code, as a typical kind of formalized language, is capable of describing structural knowledge under various schemas in a universal way. On the other hand, Large Language Models (LLMs) trained on both codes and texts have demonstrated powerful capabilities of transforming texts into codes, which provides a feasible solution to IE tasks. Therefore, in this paper, we propose a universal retrieval-augmented code generation framework based on LLMs, called Code4UIE, for IE tasks. Specifically, Code4UIE adopts Python classes to define task-specific schemas of various structural knowledge in a universal way. By so doing, extracting knowledge under these schemas can be transformed into generating codes that instantiate the predefined Python classes with the information in texts. To generate these codes more precisely, Code4UIE adopts the in-context learning mechanism to instruct LLMs with examples. In order to obtain appropriate examples for different tasks, Code4UIE explores several example retrieval strategies, which can retrieve examples semantically similar to the given texts. Extensive experiments on five representative IE tasks across nine datasets demonstrate the effectiveness of the Code4UIE framework.
The Knowledge Base Question Answering (KBQA) task aims to answer natural language questions based on a given knowledge base. As a kind of common method for this task, semantic parsing-based ones first convert natural language questions to logical forms (e.g., SPARQL queries) and then execute them on knowledge bases to get answers. Recently, Large Language Models (LLMs) have shown strong abilities in language understanding and may be adopted as semantic parsers in such kinds of methods. However, in doing so, a great challenge for LLMs is to understand the schema of knowledge bases. Therefore, in this paper, we propose an In-Context Schema Understanding (ICSU) method for facilitating LLMs to be used as a semantic parser in KBQA. Specifically, ICSU adopts the In-context Learning mechanism to instruct LLMs to generate SPARQL queries with examples. In order to retrieve appropriate examples from annotated question-query pairs, which contain comprehensive schema information related to questions, ICSU explores four different retrieval strategies. Experimental results on the largest KBQA benchmark, KQA Pro, show that ICSU with all these strategies outperforms that with a random retrieval strategy significantly (from 12\% to 78.76\% in accuracy).
Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the pre-processed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask via calculating the reconstruction errors, and therefore generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects, and excellent quantitative performance in terms of spectral distortion, correlation coefficient, MSE, NIQE, BRISQUE, and PIQE.
Nested Event Extraction (NEE) aims to extract complex event structures where an event contains other events as its arguments recursively. Nested events involve a kind of Pivot Elements (PEs) that simultaneously act as arguments of outer events and as triggers of inner events, and thus connect them into nested structures. This special characteristic of PEs brings challenges to existing NEE methods, as they cannot well cope with the dual identities of PEs. Therefore, this paper proposes a new model, called PerNee, which extracts nested events mainly based on recognizing PEs. Specifically, PerNee first recognizes the triggers of both inner and outer events and further recognizes the PEs via classifying the relation type between trigger pairs. In order to obtain better representations of triggers and arguments to further improve NEE performance, it incorporates the information of both event types and argument roles into PerNee through prompt learning. Since existing NEE datasets (e.g., Genia11) are limited to specific domains and contain a narrow range of event types with nested structures, we systematically categorize nested events in generic domain and construct a new NEE dataset, namely ACE2005-Nest. Experimental results demonstrate that PerNee consistently achieves state-of-the-art performance on ACE2005-Nest, Genia11 and Genia13.
Event Relation Extraction (ERE) aims to extract multiple kinds of relations among events in texts. However, existing methods singly categorize event relations as different classes, which are inadequately capturing the intrinsic semantics of these relations. To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations. Specifically, ProtoEM extracts event relations in a two-step manner, i.e., prototype representing and prototype matching. In the first step, to capture the connotations of different event relations, ProtoEM utilizes examples to represent the prototypes corresponding to these relations. Subsequently, to capture the interdependence among event relations, it constructs a dependency graph for the prototypes corresponding to these relations and utilized a Graph Neural Network (GNN)-based module for modeling. In the second step, it obtains the representations of new event pairs and calculates their similarity with those prototypes obtained in the first step to evaluate which types of event relations they belong to. Experimental results on the MAVEN-ERE dataset demonstrate that the proposed ProtoEM framework can effectively represent the prototypes of event relations and further obtain a significant improvement over baseline models.
Robust 3D perception under corruption has become an essential task for the realm of 3D vision. While current data augmentation techniques usually perform random transformations on all point cloud objects in an offline way and ignore the structure of the samples, resulting in over-or-under enhancement. In this work, we propose an alternative to make sample-adaptive transformations based on the structure of the sample to cope with potential corruption via an auto-augmentation framework, named as AdaptPoint. Specially, we leverage a imitator, consisting of a Deformation Controller and a Mask Controller, respectively in charge of predicting deformation parameters and producing a per-point mask, based on the intrinsic structural information of the input point cloud, and then conduct corruption simulations on top. Then a discriminator is utilized to prevent the generation of excessive corruption that deviates from the original data distribution. In addition, a perception-guidance feedback mechanism is incorporated to guide the generation of samples with appropriate difficulty level. Furthermore, to address the paucity of real-world corrupted point cloud, we also introduce a new dataset ScanObjectNN-C, that exhibits greater similarity to actual data in real-world environments, especially when contrasted with preceding CAD datasets. Experiments show that our method achieves state-of-the-art results on multiple corruption benchmarks, including ModelNet-C, our ScanObjectNN-C, and ShapeNet-C.
Endoscopy is a widely used technique for the early detection of diseases or robotic-assisted minimally invasive surgery (RMIS). Numerous deep learning (DL)-based research works have been developed for automated diagnosis or processing of endoscopic view. However, existing DL models may suffer from catastrophic forgetting. When new target classes are introduced over time or cross institutions, the performance of old classes may suffer severe degradation. More seriously, data privacy and storage issues may lead to the unavailability of old data when updating the model. Therefore, it is necessary to develop a continual learning (CL) methodology to solve the problem of catastrophic forgetting in endoscopic image segmentation. To tackle this, we propose a Endoscopy Continual Semantic Segmentation (EndoCSS) framework that does not involve the storage and privacy issues of exemplar data. The framework includes a mini-batch pseudo-replay (MB-PR) mechanism and a self-adaptive noisy cross-entropy (SAN-CE) loss. The MB-PR strategy circumvents privacy and storage issues by generating pseudo-replay images through a generative model. Meanwhile, the MB-PR strategy can also correct the model deviation to the replay data and current training data, which is aroused by the significant difference in the amount of current and replay images. Therefore, the model can perform effective representation learning on both new and old tasks. SAN-CE loss can help model fitting by adjusting the model's output logits, and also improve the robustness of training. Extensive continual semantic segmentation (CSS) experiments on public datasets demonstrate that our method can robustly and effectively address the catastrophic forgetting brought by class increment in endoscopy scenes. The results show that our framework holds excellent potential for real-world deployment in a streaming learning manner.
In the realm of modern diagnostic technology, video capsule endoscopy (VCE) is a standout for its high efficacy and non-invasive nature in diagnosing various gastrointestinal (GI) conditions, including obscure bleeding. Importantly, for the successful diagnosis and treatment of these conditions, accurate recognition of bleeding regions in VCE images is crucial. While deep learning-based methods have emerged as powerful tools for the automated analysis of VCE images, they often demand large training datasets with comprehensive annotations. Acquiring these labeled datasets tends to be time-consuming, costly, and requires significant domain expertise. To mitigate this issue, we have embraced a semi-supervised learning (SSL) approach for the bleeding regions segmentation within VCE. By adopting the `Mean Teacher' method, we construct a student U-Net equipped with an scSE attention block, alongside a teacher model of the same architecture. These models' parameters are alternately updated throughout the training process. We use the Kvasir-Capsule dataset for our experiments, which encompasses various GI bleeding conditions. Notably, we develop the segmentation annotations for this dataset ourselves. The findings from our experiments endorse the efficacy of the SSL-based segmentation strategy, demonstrating its capacity to reduce reliance on large volumes of annotations for model training, without compromising on the accuracy of identification.
Robot-assisted airway intubation application needs high accuracy in locating targets and organs. Two vital landmarks, nostrils and glottis, can be detected during the intubation to accommodate the stages of nasal intubation. Automated landmark detection can provide accurate localization and quantitative evaluation. The Detection Transformer (DeTR) leads object detectors to a new paradigm with long-range dependence. However, current DeTR requires long iterations to converge, and does not perform well in detecting small objects. This paper proposes a transformer-based landmark detection solution with deformable DeTR and the semantic-aligned-matching module for detecting landmarks in robot-assisted intubation. The semantics aligner can effectively align the semantics of object queries and image features in the same embedding space using the most discriminative features. To evaluate the performance of our solution, we utilize a publicly accessible glottis dataset and automatically annotate a nostril detection dataset. The experimental results demonstrate our competitive performance in detection accuracy. Our code is publicly accessible.
The visual-question localized-answering (VQLA) system can serve as a knowledgeable assistant in surgical education. Except for providing text-based answers, the VQLA system can highlight the interested region for better surgical scene understanding. However, deep neural networks (DNNs) suffer from catastrophic forgetting when learning new knowledge. Specifically, when DNNs learn on incremental classes or tasks, their performance on old tasks drops dramatically. Furthermore, due to medical data privacy and licensing issues, it is often difficult to access old data when updating continual learning (CL) models. Therefore, we develop a non-exemplar continual surgical VQLA framework, to explore and balance the rigidity-plasticity trade-off of DNNs in a sequential learning paradigm. We revisit the distillation loss in CL tasks, and propose rigidity-plasticity-aware distillation (RP-Dist) and self-calibrated heterogeneous distillation (SH-Dist) to preserve the old knowledge. The weight aligning (WA) technique is also integrated to adjust the weight bias between old and new tasks. We further establish a CL framework on three public surgical datasets in the context of surgical settings that consist of overlapping classes between old and new surgical VQLA tasks. With extensive experiments, we demonstrate that our proposed method excellently reconciles learning and forgetting on the continual surgical VQLA over conventional CL methods. Our code is publicly accessible.