research-backup/conceptnet_high_confidence

Source: Hugging Face. Updated: 2022-09-20. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/research-backup/conceptnet_high_confidence
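Overview:
The ConceptNet high-confidence subset is a dataset built for fine-tuning RelBERT models. It is a selection of high-confidence entries from ConceptNet that covers multiple relation types, such as AtLocation and CapableOf, each with positive and negative word pairs. The data is divided into a training split and a validation split, and the card lists the number of positive and negative pairs for every relation type in each split.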

---
language:
- English
license:
- other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
pretty_name: ConceptNet with High Confidence
---

# Dataset Card for "relbert/conceptnet_high_confidence"

## Dataset Description

- **Repository:** [RelBERT](https://github.com/asahi417/relbert)
- **Paper:** [https://home.ttic.edu/~kgimpel/commonsense.html](https://home.ttic.edu/~kgimpel/commonsense.html)
- **Dataset Summary:** High-confidence subset of ConceptNet

### Dataset Summary

This dataset is the curated ConceptNet subset used in [this work](https://home.ttic.edu/~kgimpel/commonsense.html), compiled for fine-tuning RelBERT models.

## Dataset Structure

### Data Instances

An example from the `train` split looks as follows:

    {
      "relation_type": "AtLocation",
      "positives": [["fish", "water"], ["cloud", "sky"], ["child", "school"], ... ],
      "negatives": [["pen", "write"], ["sex", "fun"], ["soccer", "sport"], ["fish", "school"], ... ]
    }

### Data Splits

| name | train | validation |
|------------------------------|------:|-----------:|
| `conceptnet_high_confidence` | 25 | 24 |

### Number of Positive/Negative Word Pairs in Each Split

| relation type | positive (train) | negative (train) | positive (validation) | negative (validation) |
|:-----------------|-----------------:|-----------------:|----------------------:|----------------------:|
| AtLocation | 383 | 1768 | 97 | 578 |
| CapableOf | 195 | 1790 | 73 | 600 |
| Causes | 71 | 1797 | 26 | 595 |
| CausesDesire | 9 | 1793 | 11 | 595 |
| CreatedBy | 2 | 1796 | 0 | 0 |
| DefinedAs | 0 | 0 | 2 | 595 |
| Desires | 16 | 1794 | 12 | 595 |
| HasA | 67 | 1814 | 17 | 595 |
| HasFirstSubevent | 2 | 1796 | 0 | 0 |
| HasLastSubevent | 2 | 1796 | 3 | 593 |
| HasPrerequisite | 168 | 1803 | 57 | 592 |
| HasProperty | 94 | 1801 | 39 | 605 |
| HasSubevent | 125 | 1798 | 40 | 609 |
| IsA | 310 | 1764 | 98 | 603 |
| MadeOf | 17 | 1793 | 7 | 593 |
| MotivatedByGoal | 14 | 1796 | 11 | 595 |
| NotCapableOf | 15 | 1793 | 0 | 0 |
| NotDesires | 4 | 1795 | 4 | 592 |
| PartOf | 34 | 1801 | 7 | 593 |
| ReceivesAction | 18 | 1793 | 8 | 593 |
| SymbolOf | 0 | 0 | 2 | 596 |
| UsedFor | 249 | 1815 | 81 | 588 |
| SUM | 1795 | 35896 | 595 | 11305 |

### Citation Information

    @InProceedings{P16-1137,
      author    = "Li, Xiang and Taheri, Aynaz and Tu, Lifu and Gimpel, Kevin",
      title     = "Commonsense Knowledge Base Completion",
      booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
      year      = "2016",
      publisher = "Association for Computational Linguistics",
      pages     = "1445--1455",
      location  = "Berlin, Germany",
      doi       = "10.18653/v1/P16-1137",
      url       = "http://aclweb.org/anthology/P16-1137"
    }
Provider:
research-backup
Summary of original information

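Dataset overview

Basic information

  • Name: ConceptNet with High Confidence
  • Language: English
  • License: other
  • Multilinguality: monolingual
  • Size: 1K<n<10K

Dataset description

  • Purpose: fine-tuning RelBERT models
  • Dataset type: high-confidence subset of ConceptNet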
Dataset structure

Data instance

    {
      "relation_type": "AtLocation",
      "positives": [["fish", "water"], ["cloud", "sky"], ["child", "school"], ... ],
      "negatives": [["pen", "write"], ["sex", "fun"], ["soccer", "sport"], ["fish", "school"], ... ]
    }
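
A record in this format can be flattened into labeled word pairs, for example to inspect the data or to feed a binary classifier. The following is a minimal sketch, assuming the record is already available as a Python dict. The helper name `to_labeled_pairs` is hypothetical, and the sample record contains only the pairs visible in the example above (the `...` elisions are not filled in).

```python
from typing import Dict, List, Tuple

# Sample record copied from the data instance above. Only the pairs shown
# there are included, so this is NOT the full AtLocation entry.
record = {
    "relation_type": "AtLocation",
    "positives": [["fish", "water"], ["cloud", "sky"], ["child", "school"]],
    "negatives": [["pen", "write"], ["sex", "fun"], ["soccer", "sport"], ["fish", "school"]],
}


def to_labeled_pairs(rec: Dict) -> List[Tuple[str, str, str, int]]:
    """Flatten one record into (relation, head, tail, label) tuples.

    The label is 1 for pairs listed under "positives" and 0 for pairs
    listed under "negatives". (Hypothetical helper, for illustration only.)
    """
    rows = []
    for label, key in ((1, "positives"), (0, "negatives")):
        for head, tail in rec[key]:
            rows.append((rec["relation_type"], head, tail, label))
    return rows


pairs = to_labeled_pairs(record)
for row in pairs:
    print(row)
```

Note that a negative pair such as `["fish", "school"]` reuses words that also occur in positive pairs, so labels belong to the pair as a whole and not to the individual words.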

Data splits

| name | train | validation |
|----------------------------|------:|-----------:|
| conceptnet_high_confidence | 25 | 24 |

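Number of positive/negative word pairs

| relation_type | positive (train) | negative (train) | positive (validation) | negative (validation) |
|:-----------------|-----------------:|-----------------:|----------------------:|----------------------:|
| AtLocation | 383 | 1768 | 97 | 578 |
| CapableOf | 195 | 1790 | 73 | 600 |
| Causes | 71 | 1797 | 26 | 595 |
| CausesDesire | 9 | 1793 | 11 | 595 |
| CreatedBy | 2 | 1796 | 0 | 0 |
| DefinedAs | 0 | 0 | 2 | 595 |
| Desires | 16 | 1794 | 12 | 595 |
| HasA | 67 | 1814 | 17 | 595 |
| HasFirstSubevent | 2 | 1796 | 0 | 0 |
| HasLastSubevent | 2 | 1796 | 3 | 593 |
| HasPrerequisite | 168 | 1803 | 57 | 592 |
| HasProperty | 94 | 1801 | 39 | 605 |
| HasSubevent | 125 | 1798 | 40 | 609 |
| IsA | 310 | 1764 | 98 | 603 |
| MadeOf | 17 | 1793 | 7 | 593 |
| MotivatedByGoal | 14 | 1796 | 11 | 595 |
| NotCapableOf | 15 | 1793 | 0 | 0 |
| NotDesires | 4 | 1795 | 4 | 592 |
| PartOf | 34 | 1801 | 7 | 593 |
| ReceivesAction | 18 | 1793 | 8 | 593 |
| SymbolOf | 0 | 0 | 2 | 596 |
| UsedFor | 249 | 1815 | 81 | 588 |
| SUM | 1795 | 35896 | 595 | 11305 |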
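
The counts above show a strong imbalance between positive and negative pairs, and some relation types have no positive pairs in one of the splits. As a rough illustration, the sketch below recomputes the column totals from the per-relation rows and derives the number of negatives per positive in each split. The numbers are copied from the table; the variable names are illustrative only.

```python
# (train_pos, train_neg, val_pos, val_neg) per relation type, copied from the table.
counts = {
    "AtLocation": (383, 1768, 97, 578),
    "CapableOf": (195, 1790, 73, 600),
    "Causes": (71, 1797, 26, 595),
    "CausesDesire": (9, 1793, 11, 595),
    "CreatedBy": (2, 1796, 0, 0),
    "DefinedAs": (0, 0, 2, 595),
    "Desires": (16, 1794, 12, 595),
    "HasA": (67, 1814, 17, 595),
    "HasFirstSubevent": (2, 1796, 0, 0),
    "HasLastSubevent": (2, 1796, 3, 593),
    "HasPrerequisite": (168, 1803, 57, 592),
    "HasProperty": (94, 1801, 39, 605),
    "HasSubevent": (125, 1798, 40, 609),
    "IsA": (310, 1764, 98, 603),
    "MadeOf": (17, 1793, 7, 593),
    "MotivatedByGoal": (14, 1796, 11, 595),
    "NotCapableOf": (15, 1793, 0, 0),
    "NotDesires": (4, 1795, 4, 592),
    "PartOf": (34, 1801, 7, 593),
    "ReceivesAction": (18, 1793, 8, 593),
    "SymbolOf": (0, 0, 2, 596),
    "UsedFor": (249, 1815, 81, 588),
}

# Column-wise totals; these reproduce the SUM row.
totals = tuple(sum(col) for col in zip(*counts.values()))
print(totals)  # (1795, 35896, 595, 11305)

# Negatives per positive in each split.
train_ratio = totals[1] / totals[0]
val_ratio = totals[3] / totals[2]
print(f"negatives per positive: train {train_ratio:.1f}, validation {val_ratio:.1f}")

# Relation types without any positive pair in a given split.
missing_train = [rel for rel, c in counts.items() if c[0] == 0]
missing_val = [rel for rel, c in counts.items() if c[2] == 0]
print("no train positives:", missing_train)
print("no validation positives:", missing_val)
```

With roughly 20 negatives per positive in the training split and 19 in the validation split, plain accuracy would be a weak measure for any classifier trained on these pairs.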

Citation information

    @InProceedings{P16-1137,
      author    = "Li, Xiang and Taheri, Aynaz and Tu, Lifu and Gimpel, Kevin",
      title     = "Commonsense Knowledge Base Completion",
      booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
      year      = "2016",
      publisher = "Association for Computational Linguistics",
      pages     = "1445--1455",
      location  = "Berlin, Germany",
      doi       = "10.18653/v1/P16-1137",
      url       = "http://aclweb.org/anthology/P16-1137"
    }
