TALP Research Center. LSI Dept. Universitat Politecnica de Catalunya. Barcelona




Abstract:We present a robust approach for linking already existing lexical/semantic hierarchies. We used a constraint satisfaction algorithm (relaxation labeling) to select --among a set of candidates-- the node in a target taxonomy that bests matches each node in a source taxonomy. In particular, we use it to map the nominal part of WordNet 1.5 onto WordNet 1.6, with a very high precision and a very low remaining ambiguity.




Abstract:This paper explores the automatic construction of a multilingual Lexical Knowledge Base from pre-existing lexical resources. We present a new and robust approach for linking already existing lexical/semantic hierarchies. We used a constraint satisfaction algorithm (relaxation labeling) to select --among all the candidate translations proposed by a bilingual dictionary-- the right English WordNet synset for each sense in a taxonomy automatically derived from a Spanish monolingual dictionary. Although on average, there are 15 possible WordNet connections for each sense in the taxonomy, the method achieves an accuracy over 80%. Finally, we also propose several ways in which this technique could be applied to enrich and improve existing lexical databases.




Abstract:We present a bootstrapping method to develop an annotated corpus, which is specially useful for languages with few available resources. The method is being applied to develop a corpus of Spanish of over 5Mw. The method consists on taking advantage of the collaboration of two different POS taggers. The cases in which both taggers agree present a higher accuracy and are used to retrain the taggers.


Abstract:This paper addresses the issue of {\sc pos} tagger evaluation. Such evaluation is usually performed by comparing the tagger output with a reference test corpus, which is assumed to be error-free. Currently used corpora contain noise which causes the obtained performance to be a distortion of the real value. We analyze to what extent this distortion may invalidate the comparison between taggers or the measure of the improvement given by a new system. The main conclusion is that a more rigorous testing experimentation setting/designing is needed to reliably evaluate and compare tagger accuracies.