research-backup/conceptnet_high_confidence
Hugging Face · updated 2022-09-20 · indexed 2024-03-04
Download link: https://hf-mirror.com/datasets/research-backup/conceptnet_high_confidence
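Resource summary:
The ConceptNet high-confidence subset is a dataset for fine-tuning the RelBERT model. It is drawn from the high-confidence portion of ConceptNet and covers many relation types, such as AtLocation and CapableOf. Each relation type comes with lists of positive and negative word pairs. The data is divided into a training split and a validation split, and the card reports the number of positive and negative pairs per relation type in each split.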
---
language:
- en
license:
- other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
pretty_name: ConceptNet with High Confidence
---
# Dataset Card for "relbert/conceptnet_high_confidence"
## Dataset Description
- **Repository:** [RelBERT](https://github.com/asahi417/relbert)
- **Paper:** [https://home.ttic.edu/~kgimpel/commonsense.html](https://home.ttic.edu/~kgimpel/commonsense.html)
- **Dataset:** High-confidence subset of ConceptNet
### Dataset Summary
This is the subset of ConceptNet selected in [this work](https://home.ttic.edu/~kgimpel/commonsense.html). It was compiled to fine-tune the [RelBERT](https://github.com/asahi417/relbert) model.
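The card states that the subset is used to fine-tune RelBERT. It does not say how the positive and negative pairs are consumed during training. One common way to use this kind of data for relation embeddings is a triplet objective. In that setup, an anchor pair and a positive pair share a relation type, and the negative pair comes from the relation's `negatives` list.

The following is a minimal sketch that samples such triplets from one record. It makes several assumptions:
- `make_triplets` is a hypothetical helper.
- The uniform sampling is our own choice.
- We do not claim this matches RelBERT's actual training code or sampling scheme.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]
Triplet = Tuple[Pair, Pair, Pair]


def make_triplets(positives: Sequence[Pair], negatives: Sequence[Pair],
                  n: int, seed: int = 0) -> List[Triplet]:
    """Hypothetical helper (not RelBERT code): sample n (anchor, positive, negative) triplets.

    Anchor and positive are two distinct entries of `positives` (same relation type);
    the negative is drawn uniformly from `negatives`.
    """
    if len(positives) < 2 or not negatives:
        # A relation type with fewer than two positives, or no negatives, cannot form a triplet.
        return []
    rng = random.Random(seed)  # local RNG so the sample is reproducible
    triplets: List[Triplet] = []
    for _ in range(n):
        anchor, positive = rng.sample(list(positives), 2)  # two distinct positions
        negative = rng.choice(list(negatives))
        triplets.append((anchor, positive, negative))
    return triplets


# Pairs taken from the train example shown under "Data Instances" (elided entries omitted).
record = {
    "relation_type": "AtLocation",
    "positives": [["fish", "water"], ["cloud", "sky"], ["child", "school"]],
    "negatives": [["pen", "write"], ["sex", "fun"], ["soccer", "sport"], ["fish", "school"]],
}
pos = [tuple(p) for p in record["positives"]]  # JSON lists -> hashable tuples
neg = [tuple(p) for p in record["negatives"]]

for anchor, positive, negative in make_triplets(pos, neg, n=3):
    print(anchor, positive, negative)
```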
## Dataset Structure
### Data Instances
An example from the `train` split looks as follows:
```json
{
  "relation_type": "AtLocation",
  "positives": [["fish", "water"], ["cloud", "sky"], ["child", "school"], ... ],
  "negatives": [["pen", "write"], ["sex", "fun"], ["soccer", "sport"], ["fish", "school"], ... ]
}
```
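A binary relation classifier needs one labeled row per word pair, so each record can be flattened. This minimal sketch relies only on the three fields shown in the example. The hypothetical `to_labeled_pairs` helper and its 1/0 label convention are our own choices, not part of the dataset.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str, str, int]  # (head, tail, relation_type, label)


def to_labeled_pairs(record: Dict[str, Any]) -> List[Row]:
    """Hypothetical helper: label 1 for pairs under "positives", 0 for "negatives"."""
    relation = record["relation_type"]
    rows: List[Row] = [(head, tail, relation, 1) for head, tail in record["positives"]]
    rows += [(head, tail, relation, 0) for head, tail in record["negatives"]]
    return rows


# Only the pairs visible in the train example above; the "..." entries are omitted.
example = {
    "relation_type": "AtLocation",
    "positives": [["fish", "water"], ["cloud", "sky"], ["child", "school"]],
    "negatives": [["pen", "write"], ["sex", "fun"], ["soccer", "sport"], ["fish", "school"]],
}
rows = to_labeled_pairs(example)
for row in rows:
    print(row)
```

The same head word can appear on both sides. For example, `fish` is paired with `water` as a positive and with `school` as a negative, so a model has to judge the pair rather than either word alone.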
### Data Splits
| Name | Train | Validation |
|---------------------------|-------:|--------:|
| `conceptnet_high_confidence` | 25 | 24 |
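### Number of Positive and Negative Word Pairs per Split
| Relation type | Positives (train) | Negatives (train) | Positives (validation) | Negatives (validation) |
|:-----------------|-------------------:|-------------------:|------------------------:|------------------------:|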
| AtLocation | 383 | 1768 | 97 | 578 |
| CapableOf | 195 | 1790 | 73 | 600 |
| Causes | 71 | 1797 | 26 | 595 |
| CausesDesire | 9 | 1793 | 11 | 595 |
| CreatedBy | 2 | 1796 | 0 | 0 |
| DefinedAs | 0 | 0 | 2 | 595 |
| Desires | 16 | 1794 | 12 | 595 |
| HasA | 67 | 1814 | 17 | 595 |
| HasFirstSubevent | 2 | 1796 | 0 | 0 |
| HasLastSubevent | 2 | 1796 | 3 | 593 |
| HasPrerequisite | 168 | 1803 | 57 | 592 |
| HasProperty | 94 | 1801 | 39 | 605 |
| HasSubevent | 125 | 1798 | 40 | 609 |
| IsA | 310 | 1764 | 98 | 603 |
| MadeOf | 17 | 1793 | 7 | 593 |
| MotivatedByGoal | 14 | 1796 | 11 | 595 |
| NotCapableOf | 15 | 1793 | 0 | 0 |
| NotDesires | 4 | 1795 | 4 | 592 |
| PartOf | 34 | 1801 | 7 | 593 |
| ReceivesAction | 18 | 1793 | 8 | 593 |
| SymbolOf | 0 | 0 | 2 | 596 |
| UsedFor | 249 | 1815 | 81 | 588 |
| SUM | 1795 | 35896 | 595 | 11305 |
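The per-relation rows can be checked against the SUM row. The table also shows that some relation types have pairs in only one split. The short sketch below checks both points, using counts copied by hand from the table above. The variable names are illustrative.

```python
# (train_pos, train_neg, val_pos, val_neg) per relation type, copied from the table.
COUNTS = {
    "AtLocation": (383, 1768, 97, 578),
    "CapableOf": (195, 1790, 73, 600),
    "Causes": (71, 1797, 26, 595),
    "CausesDesire": (9, 1793, 11, 595),
    "CreatedBy": (2, 1796, 0, 0),
    "DefinedAs": (0, 0, 2, 595),
    "Desires": (16, 1794, 12, 595),
    "HasA": (67, 1814, 17, 595),
    "HasFirstSubevent": (2, 1796, 0, 0),
    "HasLastSubevent": (2, 1796, 3, 593),
    "HasPrerequisite": (168, 1803, 57, 592),
    "HasProperty": (94, 1801, 39, 605),
    "HasSubevent": (125, 1798, 40, 609),
    "IsA": (310, 1764, 98, 603),
    "MadeOf": (17, 1793, 7, 593),
    "MotivatedByGoal": (14, 1796, 11, 595),
    "NotCapableOf": (15, 1793, 0, 0),
    "NotDesires": (4, 1795, 4, 592),
    "PartOf": (34, 1801, 7, 593),
    "ReceivesAction": (18, 1793, 8, 593),
    "SymbolOf": (0, 0, 2, 596),
    "UsedFor": (249, 1815, 81, 588),
}
SUM_ROW = (1795, 35896, 595, 11305)

# Column-wise totals must reproduce the SUM row.
totals = tuple(sum(row[i] for row in COUNTS.values()) for i in range(4))
assert totals == SUM_ROW, totals

# Relation types with positives in one split but none in the other.
train_only = sorted(r for r, (tp, _, vp, _) in COUNTS.items() if tp > 0 and vp == 0)
val_only = sorted(r for r, (tp, _, vp, _) in COUNTS.items() if vp > 0 and tp == 0)

# Share of positive pairs among all labeled pairs in each split.
train_pos_rate = SUM_ROW[0] / (SUM_ROW[0] + SUM_ROW[1])
val_pos_rate = SUM_ROW[2] / (SUM_ROW[2] + SUM_ROW[3])

print("train only:", train_only)            # ['CreatedBy', 'HasFirstSubevent', 'NotCapableOf']
print("validation only:", val_only)         # ['DefinedAs', 'SymbolOf']
print(f"positive share: train {train_pos_rate:.4f}, validation {val_pos_rate:.4f}")  # 0.0476, 0.0500
```

There are roughly 19 to 20 negatives for every positive in each split. Plain accuracy may therefore say little about a classifier trained on this data. Precision, recall, or F1 on the positive class is usually more informative.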
### Citation Information
```bibtex
@InProceedings{P16-1137,
  author = "Li, Xiang
    and Taheri, Aynaz
    and Tu, Lifu
    and Gimpel, Kevin",
  title = "Commonsense Knowledge Base Completion",
  booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2016",
  publisher = "Association for Computational Linguistics",
  pages = "1445--1455",
  location = "Berlin, Germany",
  doi = "10.18653/v1/P16-1137",
  url = "http://aclweb.org/anthology/P16-1137"
}
```
Provided by: research-backup