This article evaluates the integration of data from a syntactic-semantic lexicon, the Lexicon-Grammar of French, into a syntactic parser. We show that changing the set of labels for verbs and predicative nouns improves the performance of a non-lexicalized probabilistic parser on French.
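As an illustration of what changing the label set for verbs can look like, the following minimal sketch refines the tag of each verb with a class looked up in a lexicon. It is not the procedure used in the article: the class names (`C1`, `C2`), the `tag-class` label format, the `(form, lemma, tag)` token layout and the `relabel` function are all assumptions made for the example.

```python
# Hypothetical mapping from verb lemma to a lexicon-derived class.
# The class names are invented for illustration only.
VERB_CLASSES = {"manger": "C1", "donner": "C2"}


def relabel(tagged_sentence, classes, targets=("V",)):
    """Refine the tags of targeted categories with a lexicon class.

    tagged_sentence: list of (form, lemma, tag) triples.
    classes: dict mapping a lemma to a class name.
    targets: tags that may be refined (assumed here to be "V").
    """
    relabeled = []
    for form, lemma, tag in tagged_sentence:
        # Only refine targeted tags whose lemma is covered by the lexicon;
        # every other token keeps its original tag.
        if tag in targets and lemma in classes:
            tag = f"{tag}-{classes[lemma]}"
        relabeled.append((form, lemma, tag))
    return relabeled


sentence = [("mange", "manger", "V"), ("pomme", "pomme", "N")]
relabeled_sentence = relabel(sentence, VERB_CLASSES)
```

A parser trained on a treebank relabeled in this way sees finer-grained categories without being lexicalized, since the lexical information is carried by the tags.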
Lexicon-Grammar tables constitute a large-coverage syntactic lexicon, but they cannot be used directly in Natural Language Processing (NLP) applications because they sometimes rely on implicit information. In this paper, we introduce LGExtract, a generic tool for generating a syntactic lexicon for NLP from the Lexicon-Grammar tables. It is based on a global table that contains the information left undefined in the individual tables and on a single extraction script that includes all operations to be performed for all tables. We also present an experiment conducted to generate a new lexicon of French verbs and predicative nouns.
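The combination of a global table and one extraction routine shared by all tables can be sketched as follows. This is a minimal sketch under stated assumptions, not LGExtract itself: the table identifiers (`T1`, `T2`), the entries, the choice of properties, the `+`/`-` encoding and the output layout are hypothetical and serve only to show how information missing from a table can be completed before extraction.

```python
# Hypothetical global table: for each table, the values of properties that
# the table itself does not state. Identifiers and values are invented.
GLOBAL_TABLE = {
    "T1": {"N0 =: Nhum": "+", "N0 V": "-"},
    "T2": {"N0 =: Nhum": "-"},
}

# Hypothetical individual tables: one row per entry, explicit properties only.
TABLES = {
    "T1": [{"lemma": "verb_a", "N0 V": "+"}],
    "T2": [{"lemma": "verb_b", "N1 =: Nhum": "+"}],
}


def complete_entry(table_id, row, global_table):
    """Fill in the properties a table leaves implicit."""
    entry = dict(global_table.get(table_id, {}))  # table-level values first
    entry.update(row)  # values stated in the row override them
    return entry


def extract_lexicon(tables, global_table):
    """Apply the same extraction steps to every table."""
    lexicon = []
    for table_id in sorted(tables):
        for row in tables[table_id]:
            entry = complete_entry(table_id, row, global_table)
            # Keep only the properties the entry accepts ("+").
            accepted = sorted(p for p, v in entry.items() if v == "+")
            lexicon.append(
                {"lemma": entry["lemma"], "table": table_id, "accepts": accepted}
            )
    return lexicon


lexicon = extract_lexicon(TABLES, GLOBAL_TABLE)
```

Keeping the implicit information in one place means the extraction logic needs no special case per table, which is the design point the abstract makes.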
The Outilex software platform, which will be made available for research, development and industry, comprises software components implementing all the fundamental operations of written text processing: processing without lexicons, exploitation of lexicons and grammars, and language resource management. All data are structured in XML formats and, whenever necessary, in more compact formats, either readable or binary; the required format converters are included in the platform. The grammar formats allow statistical approaches to be combined with resource-based approaches. Manually constructed lexicons for French and English, originating from the LADL and of substantial coverage, will be distributed with the platform under the LGPL-LR license.