Looking for French deverbal nouns in an evolving Web
(a short history of WAC)
Nabil Hathout,
Franck Sajous and
Ludovic Tanguy
2009
Proceedings of the 5th Workshop on Web As Corpus (WAC5)
San Sebastian, Spain
pp. 37-44
N. Hathout, F. Sajous and L. Tanguy (2009).
Looking for French deverbal nouns in an evolving Web (a short history of WAC).
Proceedings of the 5th Workshop on Web As Corpus (WAC5), pp. 37-44,
San Sebastian, Spain.
This paper describes an 8-year research effort to automatically collect
new French deverbal nouns from the Web. The goal has remained the same throughout: building
an extensive, cumulative list of noun-verb pairs in which the noun denotes the action expressed
by the verb (e.g. production-produce). This list is used both for linguistic research and for NLP applications.
The initial method took advantage of the former Altavista search engine,
which allowed direct access to unknown word forms. The second technique led us to develop a dedicated crawler, which
raised a number of technical difficulties. In the third experiment, we use a collection of web pages made available to us by
a commercial search engine. Through all these stages, the general method has remained the same, and the results are
similar and cumulative, although the technical environment has changed considerably.