GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary
Franck Sajous and Nabil Hathout
Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference
Iztok Kosem, Miloš Jakubíček, Jelena Kallas, Simon Krek
F. Sajous and N. Hathout (2015).
GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary.
Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, pp. 405-426, Herstmonceux, England.
French Machine-Readable Dictionary, Free Lexical Resource, Wiktionary, Wiktionnaire
This article introduces GLAWI, a large XML-encoded machine-readable dictionary automatically
extracted from Wiktionnaire, the French edition of Wiktionary. GLAWI contains 1,341,410
articles and is released under a free license. Besides the size of its headword list, GLAWI inherits
from Wiktionnaire its original macrostructure and the richness of its lexicographic descriptions:
articles contain etymologies, definitions, usage examples, inflectional paradigms, lexical relations
and phonemic transcriptions. The paper first offers some insight into the nature and content
of Wiktionnaire, with a particular focus on its encoding format, before presenting our approach,
the standardization of its microstructure and the conversion into XML. Initially intended to meet
NLP needs, GLAWI has been used to create a number of customized lexicons dedicated to specific
uses, including linguistic description and psycholinguistics. The main one is GLÀFF, a large
inflectional and phonological lexicon of French. We show that many more specific on-demand
lexicons can be easily derived from the large body of lexical knowledge encoded in GLAWI.
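
As a rough illustration of what deriving an on-demand lexicon from an XML-encoded dictionary might look like, the Python sketch below filters articles by part of speech and keeps each headword with its phonemic transcription. The element and attribute names (`glawi`, `article`, `title`, `pos`, `type`, `pron`), the sample entries and the `derive_lexicon` helper are hypothetical simplifications chosen for this example; they are not taken from the GLAWI schema, which should be checked in the resource's own documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample: the tag and attribute names below are
# assumptions made for illustration and may not match the real GLAWI schema.
SAMPLE = """<glawi>
  <article>
    <title>chat</title>
    <pos type="noun">
      <pron>ʃa</pron>
    </pos>
  </article>
  <article>
    <title>parler</title>
    <pos type="verb">
      <pron>paʁ.le</pron>
    </pos>
  </article>
  <article>
    <title>sourire</title>
    <pos type="noun">
      <pron>su.ʁiʁ</pron>
    </pos>
    <pos type="verb">
      <pron>su.ʁiʁ</pron>
    </pos>
  </article>
</glawi>"""


def derive_lexicon(xml_text, pos_type):
    """Return (headword, transcription) pairs for one part of speech."""
    root = ET.fromstring(xml_text)
    lexicon = []
    for article in root.iter("article"):
        headword = article.findtext("title")
        # An article may hold several part-of-speech sections.
        for pos in article.iter("pos"):
            if pos.get("type") == pos_type:
                lexicon.append((headword, pos.findtext("pron")))
    return lexicon


noun_lexicon = derive_lexicon(SAMPLE, "noun")
for headword, transcription in noun_lexicon:
    print(f"{headword}\t{transcription}")
```

For a file of GLAWI's size (over a million articles), a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be preferable to loading the whole document in memory; the in-memory version is used here only to keep the sketch short.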