Levée d'ambiguïtés par grammaires locales (Disambiguation by Local Grammars)

Abstract

Many words are ambiguous in terms of their part of speech (POS). However, when a word appears in a text, this ambiguity is generally much reduced. Disambiguating POS involves using context to reduce the number of POS associated with words, and is one of the main challenges of lexical tagging. The problem of labeling words by POS frequently arises in natural language processing, for example for spelling correction, grammar or style checking, expression recognition, text-to-speech conversion, text corpus analysis, etc. Lexical tagging systems are thus useful as an initial component of many natural language processing systems. A number of recent lexical tagging systems produce multiple solutions when the text is lexically ambiguous or the uniquely correct solution cannot be found. These contributions aim to guarantee a zero silence rate: the correct tag(s) for a word must never be discarded. This objective is unrealistic for systems that tag each word uniquely. This article concerns a lexical disambiguation method adapted to the objective of a zero silence rate and implemented in Silberztein's INTEX system (1993). We present here a formal description of this method. We show that to verify a local disambiguation grammar in this framework, it is not sufficient to consider the transducer paths separately: one needs to verify their interactions. Similarly, if a combination of multiple transducers is used, the result cannot be predicted by considering them in isolation. Furthermore, when examining the initial labeling of a text as produced by INTEX, ideas for disambiguation rules come spontaneously, but grammatical intuitions may turn out to be inaccurate, often due to an unforeseen construction or ambiguity. If a zero silence rate is targeted, local grammars must be carefully tested. This is where a detailed specification of what a grammar will do once applied to texts would be necessary.
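
To make the zero-silence objective concrete, the toy sketch below represents each word with a set of candidate tags, applies one contextual rule that removes a tag, and measures silence against a reference tagging. It is only an illustration under stated assumptions: the tag names, the helper `remove_after`, the function `silence_rate`, and the example sentences are all hypothetical, and the rule format is a plain Python function rather than the transducer formalism used in INTEX.

```python
from typing import Callable, List, Set, Tuple

# A token is a word form plus its set of candidate POS tags (hypothetical tag names).
Token = Tuple[str, Set[str]]
Rule = Callable[[List[Token]], List[Token]]


def remove_after(prev_tag: str, banned_tag: str) -> Rule:
    """Build a toy rule: after a token tagged unambiguously `prev_tag`,
    drop `banned_tag` from the next token's candidates.
    Contexts are read from the input sentence, not from partial results."""
    def rule(sentence: List[Token]) -> List[Token]:
        out: List[Token] = []
        for i, (word, tags) in enumerate(sentence):
            new_tags = set(tags)
            if i > 0 and sentence[i - 1][1] == {prev_tag}:
                reduced = new_tags - {banned_tag}
                if reduced:  # never leave a word without any tag
                    new_tags = reduced
            out.append((word, new_tags))
        return out
    return rule


def silence_rate(sentence: List[Token], gold: List[str]) -> float:
    """Fraction of tokens whose reference tag is no longer among the candidates."""
    missed = sum(1 for (_, tags), g in zip(sentence, gold) if g not in tags)
    return missed / len(gold)


# A rule that behaves as intended: no verb reading right after a determiner.
sentence = [("the", {"DET"}), ("run", {"N", "V"}), ("ended", {"V"})]
gold = ["DET", "N", "V"]
tagged = remove_after("DET", "V")(sentence)

# A plausible-looking but inaccurate rule: "a verb never follows a verb".
# It discards the correct reading of "clean" in "helped clean".
bad_sentence = [("helped", {"V"}), ("clean", {"ADJ", "V"})]
bad_gold = ["V", "V"]
bad_tagged = remove_after("V", "V")(bad_sentence)

print(tagged[1][1], silence_rate(tagged, gold))
print(bad_tagged[1][1], silence_rate(bad_tagged, bad_gold))
```

The second rule shows why intuition-driven rules need testing: it looks reasonable, yet on an unforeseen construction it removes the correct tag and silence is no longer zero.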
