Title: Thematic Annotation: extracting concepts out of documents

Dec 30, 2004

Pierre Andrews, Martin Rajman

Abstract: Contrary to standard approaches to topic annotation, the technique used in this work does not centrally rely on some sort of -- possibly statistical -- keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database -- the EDR Electronic Dictionary -- that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts best preserving the document's content. This new extraction technique uses an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document content. This conceptual hierarchy is then searched to extract the most relevant set of concepts to represent the topics discussed in the document. Note that this algorithm is able to extract generic concepts that are not directly present in the document.
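To make the idea of aggregating words into concepts through a hypernym hierarchy more concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not use the EDR Electronic Dictionary: the `HYPERNYM` table is a hypothetical toy hierarchy, and the selection rule (take the most specific non-root concept that covers at least `min_coverage` word occurrences) is a simple heuristic assumed purely for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> hypernym). Illustrative only, not EDR data.
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
    "mammal": "animal",
    "bird": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def ancestors(word):
    """Return the hypernyms of `word`, nearest first, up to the root."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def concept_coverage(words):
    """Count, for each concept, how many word occurrences it subsumes.

    The concepts that receive a count form the part of the hierarchy
    that is relevant to the given words.
    """
    coverage = Counter()
    for word in words:
        for concept in ancestors(word):
            coverage[concept] += 1
    return coverage


def select_concepts(words, min_coverage=2):
    """Assumed heuristic: for each word, keep the most specific non-root
    concept that covers at least `min_coverage` word occurrences."""
    coverage = concept_coverage(words)
    selected = set()
    for word in words:
        for concept in ancestors(word):
            is_root = concept not in HYPERNYM  # the root has no hypernym
            if not is_root and coverage[concept] >= min_coverage:
                selected.add(concept)
                break
    return sorted(selected)


if __name__ == "__main__":
    segment = ["dog", "cat", "sparrow", "car"]
    print(select_concepts(segment))
```

In this toy run the selected concepts are generic terms that never occur in the input segment, which mirrors the abstract's closing remark. A word such as "car" contributes no concept here because none of its non-root hypernyms reaches the coverage threshold.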

* Technical report EPFL/LIA. 81 pages, 16 figures
