
WN18

Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-28.
Download link:
https://figshare.com/articles/WN18/11869548/2
Description:
This WORDNET TENSOR DATA consists of a collection of triplets (synset, relation_type, synset) extracted from WordNet 3.0 (http://wordnet.princeton.edu). The data set can be seen as a 3-mode tensor depicting ternary relationships between synsets. The definitions file (wordnet-mlj12-definitions.txt) contains one synset per line in the following format: synset_id (an 8-digit unique identifier), intelligible name (word+POS_tag+sense_index), definition. These three fields are separated by tabs ('\t'). All other wordnet-mlj12-*.txt files contain one triplet per line, with two synset_ids and a relation type identifier in tab-separated format. The first element is the synset_id of the left-hand side of the relation triplet, the second element is the name of the relation type, and the third element is the synset_id of the right-hand side. There are 40,943 synsets and 18 relation types. The training set contains 141,442 triplets, the validation set 5,000, and the test set 5,000. All triplets are unique, and every synset appearing in the validation or test set also occurs in the training set. The WN18.zip file contains the other files, with more compression than the default "download all".
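The file formats described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution: the function names (`parse_triplets`, `parse_definitions`, `unseen_synsets`) and all sample rows are hypothetical and made up for illustration. Synset IDs are kept as strings, on the assumption that the 8-digit identifiers may carry leading zeros.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Triplet(NamedTuple):
    head: str      # synset_id of the left-hand side
    relation: str  # name of the relation type
    tail: str      # synset_id of the right-hand side


def parse_triplets(lines: Iterable[str]) -> List[Triplet]:
    """Parse lines of the form 'lhs<TAB>relation<TAB>rhs'."""
    triplets = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        triplets.append(Triplet(head, relation, tail))
    return triplets


def parse_definitions(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map synset_id -> (intelligible name, definition)."""
    definitions = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Split on the first two tabs only, so the definition stays whole.
        synset_id, name, definition = line.split("\t", 2)
        definitions[synset_id] = (name, definition)
    return definitions


def unseen_synsets(train: List[Triplet], held_out: List[Triplet]) -> Set[str]:
    """Return synsets in held_out that never occur in train."""
    seen = {s for t in train for s in (t.head, t.tail)}
    return {s for t in held_out for s in (t.head, t.tail)} - seen


# Made-up sample rows (NOT taken from the dataset), for illustration only.
SAMPLE_TRAIN = [
    "00000001\t_hypernym\t00000002\n",
    "00000002\t_hyponym\t00000001\n",
]
SAMPLE_HELD_OUT = ["00000001\t_hypernym\t00000003\n"]
SAMPLE_DEFINITIONS = ["00000001\t__example_NN_1\tan illustrative definition\n"]

train = parse_triplets(SAMPLE_TRAIN)
held_out = parse_triplets(SAMPLE_HELD_OUT)
definitions = parse_definitions(SAMPLE_DEFINITIONS)
missing = unseen_synsets(train, held_out)
```

With the real files, an open file handle can be passed in place of the sample lists, since each function only iterates over lines. On the actual splits, `unseen_synsets` would be expected to return an empty set, given the description's statement that every validation and test synset also occurs in the training set.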

Created: 2024-01-31