Dataset of the Wiktionary and NLP: Improving synonymy networks paper
Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
ACL-IJCNLP 2009, Singapore

The archive (see Download below) contains the dataset used to compare the English Wiktionary against Princeton WordNet (PWN). The study is restricted to nouns, verbs and adjectives. For each part of speech, the archive provides files containing:

Vertices
  • Wiktionary's entries;
  • PWN's entries;
  • entries shared by Wiktionary and PWN;
  • disjunctions (entries that are in PWN but not in Wiktionary, and vice versa).
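As a rough illustration of how the four vertex files relate to one another, the sketch below derives the shared entries and the two disjunctions from two sets of lemmas. The toy entries and the function name `compare_vertices` are assumptions made for this example; they are not taken from the archive, whose file format is not described on this page.

```python
def compare_vertices(wiktionary, pwn):
    """Return the shared entries and the two one-sided differences."""
    wiktionary, pwn = set(wiktionary), set(pwn)
    return {
        "shared": wiktionary & pwn,           # entries in both resources
        "pwn_only": pwn - wiktionary,         # in PWN, not in Wiktionary
        "wiktionary_only": wiktionary - pwn,  # in Wiktionary, not in PWN
    }


# Toy data, purely illustrative (not from the dataset).
wikt_nouns = {"car", "automobile", "bike"}
pwn_nouns = {"car", "automobile", "lorry"}

vertex_sets = compare_vertices(wikt_nouns, pwn_nouns)
print(sorted(vertex_sets["shared"]))
```

The same comparison would be run once per part of speech (nouns, verbs, adjectives).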

Links
  • Wiktionary's synonymy relations;
  • PWN's synonymy relations;
  • synonymy relations shared by Wiktionary and PWN;
  • disjunctions (synonymy relations that are in PWN but not in Wiktionary, and vice versa).

The link files were built after pruning both resources so that only the vertices shared by Wiktionary and PWN remain.
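The pruning step can be sketched as follows: links are first restricted to pairs whose two endpoints both belong to the shared vertex set, and only then compared across resources. This is a minimal sketch under stated assumptions, not the procedure actually used to build the files: it assumes synonymy links are undirected (hence the `frozenset` pairs), and the helper `prune` and the toy data are hypothetical.

```python
def prune(edges, shared_vertices):
    """Keep only links whose two endpoints are both shared vertices.

    Assumption: synonymy is treated as undirected, so each link is
    stored as a frozenset and (a, b) equals (b, a).
    """
    return {
        frozenset((a, b))
        for a, b in edges
        if a in shared_vertices and b in shared_vertices
    }


# Toy data, purely illustrative (not from the dataset).
shared = {"car", "automobile", "motorcar"}
wikt_edges = [("car", "automobile"), ("car", "bike"), ("automobile", "motorcar")]
pwn_edges = [("automobile", "car"), ("car", "motorcar")]

wikt_links = prune(wikt_edges, shared)  # ("car", "bike") is dropped
pwn_links = prune(pwn_edges, shared)

shared_links = wikt_links & pwn_links
wikt_only_links = wikt_links - pwn_links
pwn_only_links = pwn_links - wikt_links
print(len(shared_links), len(wikt_only_links), len(pwn_only_links))
```

Pruning before comparing means a link is never counted as missing from one resource merely because one of its endpoints is absent from that resource.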


Licences

Wiktionary is now available under the Creative Commons Attribution/Share-Alike License.
WordNet's license is here.
The dataset is available under the Creative Commons Attribution/Share-Alike License.

Paper
The paper is available here (PDF).
[ bibtex ]

Download
Download the dataset (.tar.bz 3.2MB)

Contact
For any questions or remarks, please feel free to contact Franck Sajous:
If necessary, I will forward your message to the appropriate author, depending on the topic.
