Feature extraction and description is an important topic of computer vision, as it is the starting point of a number of tasks such as image reconstruction, stitching, registration, and recognition among many others. In this paper, two new image features are proposed: the Information Ratio (IR) and the Mutual Information Ratio (MIR). The IR is a feature of a single image, while the MIR describes features common across two or more images.We begin by introducing the IR and the MIR and motivate these features in an information theoretical context as the ratio of the self-information of an intensity level over the information contained over the pixels of the same intensity. Notably, the relationship of the IR and MIR with the image entropy and mutual information, classic information measures, are discussed. Finally, the effectiveness of these features is tested through feature extraction over INRIA Copydays datasets and feature matching over the Oxfords Affine Covariant Regions. These numerical evaluations validate the relevance of the IR and MIR in practical computer vision tasks
In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall in two extremes: either they perform instance-level inference relying on embedding for (subject, object) pairs, thus cannot handle pairs absent in any existing triples; or they perform predicate-level mapping and completely ignore background evidence from individual entities, thus cannot achieve satisfying quality. We propose OpenKI to handle sparsity of OpenIE extractions by performing instance-level inference: for each entity, we encode the rich information in its neighborhood in both KB and OpenIE extractions, and leverage this information in relation inference by exploring different methods of aggregation and attention. In order to handle unseen entities, our model is designed without creating entity-specific parameters. Extensive experiments show that this method not only significantly improves state-of-the-art for conventional OpenIE extractions like ReVerb, but also boosts the performance on OpenIE from semi-structured data, where new entity pairs are abundant and data are fairly sparse.
Extracting semantic information on measurements and counts is an important topic in terms of analyzing scientific discourses. The 8th task of SemEval-2021: Counts and Measurements (MeasEval) aimed to boost research in this direction by providing a new dataset on which participants train their models to extract meaningful information on measurements from scientific texts. The competition is composed of five subtasks that build on top of each other: (1) quantity span identification, (2) unit extraction from the identified quantities and their value modifier classification, (3) span identification for measured entities and measured properties, (4) qualifier span identification, and (5) relation extraction between the identified quantities, measured entities, measured properties, and qualifiers. We approached these challenges by first identifying the quantities, extracting their units of measurement, classifying them with corresponding modifiers, and afterwards using them to jointly solve the last three subtasks in a multi-turn question answering manner. Our best performing model obtained an overlapping F1-score of 36.91% on the test set.
Autonomous parking systems start with the detection of available parking slots. Parking slot detection performance has been dramatically improved by deep learning techniques. Deep learning-based object detection methods can be categorized into one-stage and two-stage approaches. Although it is well-known that the two-stage approach outperforms the one-stage approach in general object detection, they have performed similarly in parking slot detection so far. We consider this is because the two-stage approach has not yet been adequately specialized for parking slot detection. Thus, this paper proposes a highly specialized two-stage parking slot detector that uses region-specific multi-scale feature extraction. In the first stage, the proposed method finds the entrance of the parking slot as a region proposal by estimating its center, length, and orientation. The second stage of this method designates specific regions that most contain the desired information and extracts features from them. That is, features for the location and orientation are separately extracted from only the specific regions that most contain the locational and orientational information. In addition, multi-resolution feature maps are utilized to increase both positioning and classification accuracies. A high-resolution feature map is used to extract detailed information (location and orientation), while another low-resolution feature map is used to extract semantic information (type and occupancy). In experiments, the proposed method was quantitatively evaluated with two large-scale public parking slot detection datasets and outperformed previous methods, including both one-stage and two-stage approaches.
Relation Extraction (RE), the task of detecting and characterizing semantic relations between entities in text, has gained much importance in the last two decades, mainly in the biomedical domain. Many papers have been published on Relation Extraction using supervised machine learning techniques. Most of these techniques rely on statistical methods, such as feature-based and tree-kernels-based methods. Such statistical learning techniques are usually based on a propositional hypothesis space for representing examples, i.e., they employ an attribute-value representation of features. This kind of representation has some drawbacks, particularly in the extraction of complex relations which demand more contextual information about the involving instances, i.e., it is not able to effectively capture structural information from parse trees without loss of information. In this work, we present OntoILPER, a logic-based relational learning approach to Relation Extraction that uses Inductive Logic Programming for generating extraction models in the form of symbolic extraction rules. OntoILPER takes profit of a rich relational representation of examples, which can alleviate the aforementioned drawbacks. The proposed relational approach seems to be more suitable for Relation Extraction than statistical ones for several reasons that we argue. Moreover, OntoILPER uses a domain ontology that guides the background knowledge generation process and is used for storing the extracted relation instances. The induced extraction rules were evaluated on three protein-protein interaction datasets from the biomedical domain. The performance of OntoILPER extraction models was compared with other state-of-the-art RE systems. The encouraging results seem to demonstrate the effectiveness of the proposed solution.
Temporal information extraction plays a critical role in natural language understanding. Previous systems have incorporated advanced neural language models and have successfully enhanced the accuracy of temporal information extraction tasks. However, these systems have two major shortcomings. First, they fail to make use of the two-sided nature of temporal relations in prediction. Second, they involve non-parallelizable pipelines in inference process that bring little performance gain. To this end, we propose a novel temporal information extraction model based on deep biaffine attention to extract temporal relationships between events in unstructured text efficiently and accurately. Our model is performant because we perform relation extraction tasks directly instead of considering event annotation as a prerequisite of relation extraction. Moreover, our architecture uses Multilayer Perceptrons (MLP) with biaffine attention to predict arcs and relation labels separately, improving relation detecting accuracy by exploiting the two-sided nature of temporal relationships. We experimentally demonstrate that our model achieves state-of-the-art performance in temporal relation extraction.
In recent years, researchers have become increasingly interested in speaker extraction (SE), which is the task of extracting the speech of a target speaker from a mixture of interfering speakers with the help of auxiliary information about the target speaker. Several forms of auxiliary information have been employed in single-channel SE, such as a speech snippet enrolled from the target speaker or visual information corresponding to the spoken utterance. Many SE studies have reported performance improvement compared to speaker separation (SS) methods with oracle selection, arguing that this is due to the use of auxiliary information. However, such works have not considered state-of-the-art SS methods that have shown impressive separation performance. In this paper, we revise and examine the role of the auxiliary information in SE. Specifically, we compare the performance of two SE systems (audio-based and video-based) with SS using a common framework that utilizes the state-of-the-art dual-path recurrent neural network as the main learning machine. In addition, we study how much the considered SE systems rely on the auxiliary information by analyzing the systems' output for random auxiliary signals. Experimental evaluation on various datasets suggests that the main purpose of the auxiliary information in the considered SE systems is only to specify the target speaker in the mixture and that it does not provide consistent extraction performance gain when compared to the uninformed SS system.
A key challenge of online news recommendation is to help users find articles they are interested in. Traditional news recommendation methods usually use single news information, which is insufficient to encode news and user representation. Recent research uses multiple channel news information, e.g., title, category, and body, to enhance news and user representation. However, these methods only use various attention mechanisms to fuse multi-view embeddings without considering deep digging higher-level information contained in the context. These methods encode news content on the word level and jointly train the attention parameters in the recommendation network, leading to more corpora being required to train the model. We propose an Event Extraction-based News Recommendation (EENR) framework to overcome these shortcomings, utilizing event extraction to abstract higher-level information. EENR also uses a two-stage strategy to reduce parameters in subsequent parts of the recommendation network. We train the Event Extraction module by external corpora in the first stage and apply the trained model to the news recommendation dataset to predict event-level information, including event types, roles, and arguments, in the second stage. Then we fuse multiple channel information, including event information, news title, and category, to encode news and users. Extensive experiments on a real-world dataset show that our EENR method can effectively improve the performance of news recommendations. Finally, we also explore the reasonability of utilizing higher abstract level information to substitute news body content.
When creating an outfit, style is a criterion in selecting each fashion item. This means that style can be regarded as a feature of the overall outfit. However, in various previous studies on outfit generation, there have been few methods focusing on global information obtained from an outfit. To address this deficiency, we have incorporated an unsupervised style extraction module into a model to learn outfits. Using the style information of an outfit as a whole, the proposed model succeeded in generating outfits more flexibly without requiring additional information. Moreover, the style information extracted by the proposed model is easy to interpret. The proposed model was evaluated on two human-generated outfit datasets. In a fashion item prediction task (missing prediction task), the proposed model outperformed a baseline method. In a style extraction task, the proposed model extracted some easily distinguishable styles. In an outfit generation task, the proposed model generated an outfit while controlling its styles. This capability allows us to generate fashionable outfits according to various preferences.
Dependency trees convey rich structural information that is proven useful for extracting relations among entities in text. However, how to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenging research question. Existing approaches employing rule based hard-pruning strategies for selecting relevant partial dependency structures may not always yield optimal results. In this work, we propose Attention Guided Graph Convolutional Networks (AGGCNs), a novel model which directly takes full dependency trees as inputs. Our model can be understood as a soft-pruning approach that automatically learns how to selectively attend to the relevant sub-structures useful for the relation extraction task. Extensive results on various tasks including cross-sentence n-ary relation extraction and large-scale sentence-level relation extraction show that our model is able to better leverage the structural information of the full dependency trees, giving significantly better results than previous approaches.