Hybrid Method for Stress Prediction Applied to GLAFF-IT, a Large-Scale Italian Lexicon
B. Calderone, M. Pascoli, F. Sajous and Nabil Hathout
Language, Data, and Knowledge (LDK 2017)
Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds)
Lecture Notes in Computer Science
B. Calderone, M. Pascoli, F. Sajous and N. Hathout (2017).
Hybrid Method for Stress Prediction Applied to GLAFF-IT, a Large-Scale Italian Lexicon.
In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds)
Language, Data, and Knowledge. LDK 2017, Lecture Notes in Computer Science, vol 10318, pp. 26-41. Springer, Cham.
[ Authors' version ]
Italian stress prediction, phonological transcriptions, free large-scale lexicon, Wiktionary, Wikizionario
This paper presents a hybrid method for automatic stress prediction
that we apply to GLAFF-IT, a large-scale Italian lexicon we extracted from GLAW-IT,
a Machine-Readable Dictionary based on Wikizionario. Our approach
combines heuristic rules and a logistic model trained on the words’ sets of phonological features.
This model reaches an accuracy of 98.1%. The resulting resource is
a large lexicon for the Italian language that we release under a free licence.
It includes morphological and phonological information for each of its 457,702
entries. As of today, it is the only Italian lexicon featuring both large coverage
and indication of stress position.
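
The hybrid idea described above (heuristic rules first, a logistic model as fallback) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single rule (a written accent on the final vowel marks final stress), the two hand-assigned binary features, the six-word toy training set, and all function names. The paper's actual rules, phonological feature sets and training data are not given in this abstract.

```python
import math

# Hypothetical rule: Italian orthography marks word-final stress with an accent.
ACCENTED_FINALS = "àèéìòù"

def rule_based_stress(word):
    """Return 'final' if the spelling marks final stress, else None."""
    if word and word[-1] in ACCENTED_FINALS:
        return "final"
    return None

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability of penultimate stress; weights[0] is the bias."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def train_logistic(samples, labels, epochs=2000, lr=1.0):
    """Fit a binary logistic model by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            err = predict_proba(weights, x) - y
            grads[0] += err
            for i, xi in enumerate(x):
                grads[i + 1] += err * xi
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grads)]
    return weights

# Toy training set (illustrative only). Hand-assigned features:
#   [penultimate syllable is closed, word has at least three syllables]
# Label 1 = penultimate stress, 0 = antepenultimate stress.
TOY_X = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
#        casa    canto   finestra tavolo  albero  camera
TOY_Y = [1, 1, 1, 0, 0, 0]

WEIGHTS = train_logistic(TOY_X, TOY_Y)

def hybrid_predict(word, features, weights=WEIGHTS):
    """Apply the rule first; fall back to the logistic model otherwise."""
    rule = rule_based_stress(word)
    if rule is not None:
        return rule
    if predict_proba(weights, features) >= 0.5:
        return "penultimate"
    return "antepenultimate"

if __name__ == "__main__":
    print(hybrid_predict("città", [0, 0]))   # decided by the rule
    print(hybrid_predict("ponte", [1, 0]))   # decided by the model
    print(hybrid_predict("popolo", [0, 1]))  # decided by the model
```

The rule layer handles cases the spelling settles on its own, so the statistical model only has to decide the ambiguous ones. A real system would derive the features from phonological transcriptions rather than assigning them by hand, and the toy data deliberately ignores words such as "amico", which has an open penultimate syllable yet penultimate stress.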