Looking for French deverbal nouns in an evolving Web
(a short history of WAC)

Nabil Hathout, Franck Sajous and Ludovic Tanguy (2009). Proceedings of the 5th Workshop on Web As Corpus (WAC5), San Sebastian, Spain, pp. 37-44.

This paper describes an 8-year-long research effort to automatically collect new French deverbal nouns from the Web. The goal has remained the same throughout: building an extensive, cumulative list of noun-verb pairs in which the noun denotes the action expressed by the verb (e.g. production-produce). This list is used both for linguistic research and for NLP applications. The initial method took advantage of the former AltaVista search engine, which allowed direct access to unknown word forms. The second technique required us to develop a dedicated crawler, which raised a number of technical difficulties. In the third experiment, we use a collection of web pages made available to us by a commercial search engine. Across all these stages, the general method has remained the same, and the results are similar and cumulative, even though the technical environment has evolved considerably.