GETALP/FLUE_WSD
Dataset overview
Dataset name
Word Sense Disambiguation for FLUE
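Dataset description
This dataset comprises three sub-datasets: FrenchSemEval-Task12, French WNGT, and an automatically translated version of SemCor. It is intended primarily for word sense disambiguation (WSD) in French.
Language
French
License information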
GNU Lesser General Public License
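Dataset features
- document_id: string
- sentence: string
- sentence_label: string
- sentence_first_label: string
- surface_forms: sequence of strings
- labels: sequence of strings
- first_labels: sequence of strings
- word_id: sequence of strings
- scores: sequence of strings
- lemmas: sequence of strings
- pos: sequence of strings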
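As a rough illustration of how these fields might be consumed, the sketch below zips the per-token sequences of one record into (surface form, lemma, POS, label) tuples. It assumes that `surface_forms`, `lemmas`, `pos`, and `labels` are parallel and of equal length, which the field list suggests but does not state. The helper names `annotated_tokens` and `load_flue_wsd` and the sample record are hypothetical. Loading relies on the Hugging Face `datasets` library with the repository id from the title, and has not been verified against the hosted files.

```python
from typing import Iterator, Mapping, Sequence, Tuple

# Per-token fields assumed (not confirmed by the card) to be parallel sequences.
TOKEN_FIELDS = ("surface_forms", "lemmas", "pos", "labels")


def annotated_tokens(
    record: Mapping[str, Sequence[str]],
) -> Iterator[Tuple[str, ...]]:
    """Yield (surface_form, lemma, pos, label) for each token of one record."""
    columns = [record[name] for name in TOKEN_FIELDS]
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError(f"per-token fields differ in length: {sorted(lengths)}")
    yield from zip(*columns)


def load_flue_wsd(split: str = "SemEval"):
    """Load one split from the Hugging Face Hub (needs `pip install datasets`)."""
    # Imported lazily so that annotated_tokens has no third-party dependency.
    from datasets import load_dataset

    return load_dataset("GETALP/FLUE_WSD", split=split)


# Made-up record for illustration only; real label strings will differ.
example = {
    "surface_forms": ["Le", "chat", "dort"],
    "lemmas": ["le", "chat", "dormir"],
    "pos": ["DET", "NOUN", "VERB"],
    "labels": ["", "sense_a", "sense_b"],
}
tokens = list(annotated_tokens(example))
```

With a real split, the same helper could be applied as `annotated_tokens(load_flue_wsd("SemEval")[0])`, provided the length assumption holds for that record.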
Dataset splits
- SemCor: 37176 examples, 71632913 bytes
- SemEval: 306 examples, 749832 bytes
- WNGT: 117659 examples, 206691837 bytes
Download size
41831981 bytes
Dataset size
279074582 bytes
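The split and size figures above are mutually consistent, which a few lines of Python can confirm. The sketch below copies the numbers from this card, checks that the per-split byte counts sum to the reported dataset size, and derives the average record size per split and the ratio between dataset size and download size. The variable names are illustrative, and reading that ratio as a compression factor is an assumption, since the card does not say how the download is stored.

```python
# Figures copied from the card: split name -> (examples, bytes).
SPLITS = {
    "SemCor": (37176, 71632913),
    "SemEval": (306, 749832),
    "WNGT": (117659, 206691837),
}
DOWNLOAD_BYTES = 41831981
DATASET_BYTES = 279074582

total_examples = sum(count for count, _ in SPLITS.values())
total_bytes = sum(size for _, size in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
sizes_match = total_bytes == DATASET_BYTES

# Ratio of dataset size to download size (assumed to reflect compression).
size_ratio = DATASET_BYTES / DOWNLOAD_BYTES

for name, (count, size) in SPLITS.items():
    print(
        f"{name}: {count} examples, "
        f"{size / count:.0f} bytes per example, "
        f"{size / total_bytes:.1%} of all bytes"
    )
print(f"total: {total_examples} examples, sizes match: {sizes_match}")
print(f"dataset size / download size: {size_ratio:.2f}")
```

WNGT accounts for the bulk of both the examples and the bytes, so any training run that mixes splits uniformly by example will be dominated by it.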
Citation information
@inproceedings{vial-etal-2019-sense,
    title = "Sense Vocabulary Compression through the Semantic Knowledge of {W}ord{N}et for Neural Word Sense Disambiguation",
    author = {Vial, Lo{\"i}c and Lecouteux, Benjamin and Schwab, Didier},
    booktitle = "Proceedings of the 10th Global Wordnet Conference",
    month = jul,
    year = "2019",
    address = "Wroclaw, Poland",
    publisher = "Global Wordnet Association",
    url = "https://aclanthology.org/2019.gwc-1.14",
    pages = "108--117",
    abstract = "In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database. We propose two different methods that greatly reduce the size of neural WSD models, with the benefit of improving their coverage without additional training data, and without impacting their precision. In addition to our methods, we present a WSD system which relies on pre-trained BERT word vectors in order to achieve results that significantly outperforms the state of the art on all WSD evaluation tasks.",
}
Contributors
- loic.vial@univ-grenoble-alpes.fr
- benjamin.lecouteux@univ-grenoble-alpes.fr
- didier.schwab@univ-grenoble-alpes.fr