relbert/nell_relational_similarity

Name: relbert/nell_relational_similarity
Creator: relbert
Published: 2023-03-10 11:18:11
License: no description provided

Source: Hugging Face. Updated 2023-03-10; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/relbert/nell_relational_similarity


Resource description:

---
language:
- en
license:
- other
multilinguality:
- monolingual
size_categories:
- n<1K
pretty_name: Relational similarity dataset based on the NELL-one
---

# Dataset Card for "relbert/nell_relational_similarity"

## Dataset Description

- **Repository:** [RelBERT](https://github.com/asahi417/relbert)
- **Paper:** [https://aclanthology.org/D18-1223/](https://aclanthology.org/D18-1223/)
- **Dataset:** Relational similarity dataset based on the NELL-one

### Dataset Summary

A cleaned version of the [NELL-one](https://huggingface.co/datasets/relbert/nell) dataset, compiled for relational similarity.

## Dataset Structure

### Data Instances

An example from the `test` split looks as follows.

```json
{
  "relation_type": "concept:automobilemakerdealersincity",
  "positives": [["Lexus", "Dallas"], ["Buick", "Columbus"], ...],
  "negatives": []
}
```

### Data Splits

| train | validation | test |
|------:|-----------:|-----:|
|    30 |          3 |    5 |

### Citation Information

```
@inproceedings{xiong-etal-2018-one,
    title = "One-Shot Relational Learning for Knowledge Graphs",
    author = "Xiong, Wenhan and Yu, Mo and Chang, Shiyu and Guo, Xiaoxiao and Wang, William Yang",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1223",
    doi = "10.18653/v1/D18-1223",
    pages = "1980--1990",
    abstract = "Knowledge graphs (KG) are the key components of various natural language processing applications. To further expand KGs{'} coverage, previous studies on knowledge graph completion usually require a large number of positive examples for each relation. However, we observe long-tail relations are actually more common in KGs and those newly added relations often do not have many known triples for training. In this work, we aim at predicting new facts under a challenging setting where only one training instance is available. We propose a one-shot relational learning framework, which utilizes the knowledge distilled by embedding models and learns a matching metric by considering both the learned embeddings and one-hop graph structures. Empirically, our model yields considerable performance improvements over existing embedding models, and also eliminates the need of re-training the embedding models when dealing with newly added relations.",
}
```
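As a minimal sketch, a record of the shape shown under Data Instances could be inspected in Python as follows. The `sample` record reuses only the two pairs visible in the card (the elided pairs are left out, not guessed). `count_pairs` is a hypothetical helper introduced for illustration. `load_split` assumes that the Hugging Face `datasets` library is installed and that the dataset id from the download URL can be loaded with default settings; it is defined here but not called, since it needs network access.

```python
from typing import Any, Dict

# Dataset id taken from the download URL; loading it this way is an assumption.
DATASET_ID = "relbert/nell_relational_similarity"


def load_split(split: str = "test"):
    """Load one split with the Hugging Face `datasets` library (needs network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs without it
    return load_dataset(DATASET_ID, split=split)


def count_pairs(record: Dict[str, Any]) -> Dict[str, int]:
    """Return how many positive and negative word pairs a record holds."""
    return {
        "positives": len(record["positives"]),
        "negatives": len(record["negatives"]),
    }


# Record shaped like the card's example; only the two pairs shown there are included.
sample = {
    "relation_type": "concept:automobilemakerdealersincity",
    "positives": [["Lexus", "Dallas"], ["Buick", "Columbus"]],
    "negatives": [],
}

print(sample["relation_type"], count_pairs(sample))
```

Calling `load_split("train")` in an environment with network access would be the natural next step; whether the returned rows have exactly the fields above should be checked against the loaded data.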


Provided by:

relbert


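Collected summary

Dataset introduction

Construction

In knowledge graph completion, data scarcity often limits model performance, especially for long-tail relations. This dataset is built from NELL-one: relation types and their positive entity pairs were extracted and cleaned. Core relation instances from the original NELL data were kept, while noisy and redundant entries were removed, which gives a clean basis for computing relational similarity.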

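Characteristics

The dataset targets relational similarity evaluation and is small and structured. Each instance corresponds to one relation type and contains a list of positive entity pairs. The splits are compact: the train, validation, and test sets contain 30, 3, and 5 instances respectively, which reflects a low-resource setting and tests how well models generalize. The simple format can be used directly for relation embedding or matching tasks.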

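Usage

In relation representation learning, the dataset is suited to evaluating how well a model captures relation semantics from very few examples. It can be fed to a relation embedding model trained with a contrastive or metric learning objective so that the model learns to distinguish relation types. A typical workflow is to learn relation representations on the training set, tune hyperparameters on the validation set, and evaluate generalization to unseen relations on the test set. The dataset is particularly suited to one-shot or few-shot relational reasoning and can serve as a benchmark for knowledge graph expansion research.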
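The contrastive setup described above can be sketched as follows, assuming each relation type maps to a list of entity pairs. `build_triplets` is a hypothetical helper, not part of RelBERT or any library: the anchor and the positive are drawn from the same relation type, and the negative is sampled from a different one. The second relation in the toy input is made up for illustration and does not come from the dataset.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def build_triplets(
    relations: Dict[str, List[Pair]], seed: int = 0
) -> List[Tuple[Pair, Pair, Pair]]:
    """Build (anchor, positive, negative) triplets.

    Anchor and positive come from the same relation type; the negative is
    sampled from a different relation type.
    """
    rng = random.Random(seed)
    triplets = []
    for rel, pairs in relations.items():
        # Pool of pairs belonging to every other relation type.
        others = [p for r, ps in relations.items() if r != rel for p in ps]
        if len(pairs) < 2 or not others:
            continue  # need two same-relation pairs and at least one outsider
        for i, anchor in enumerate(pairs):
            positive = pairs[(i + 1) % len(pairs)]
            negative = rng.choice(others)
            triplets.append((anchor, positive, negative))
    return triplets


# Toy input: the first relation reuses the card's example; the second relation
# name and its pairs are made up for illustration only.
toy = {
    "concept:automobilemakerdealersincity": [("Lexus", "Dallas"), ("Buick", "Columbus")],
    "hypothetical:capital_of": [("Paris", "France"), ("Tokyo", "Japan")],
}

for anchor, positive, negative in build_triplets(toy):
    print(anchor, positive, negative)
```

Triplets of this kind could then be passed to a triplet or contrastive loss. Sampling negatives from other relation types is one simple choice, used here because the card's example shows an empty `negatives` list.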

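Background and challenges

Background overview

In the knowledge graph (KG) field, relation learning is one of the core tasks supporting natural language processing applications. NELL-one, the dataset from which this relational similarity dataset is derived, was introduced by Wenhan Xiong and colleagues in 2018 to address the prevalence of long-tail relations in knowledge graphs and the scarcity of training instances for them. Built on the NELL knowledge base, it focuses on one-shot relational learning, that is, predicting new facts from a single training instance, and it advanced few-shot learning for knowledge graph completion. The work deepened the understanding of relation representations and offered a methodological basis for handling newly added relations.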

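Current challenges

The dataset addresses the core problem of one-shot relational learning in knowledge graphs: accurately predicting new relational facts when only one positive example is available, which requires strong generalization and transfer ability. Challenges in construction include cleaning and extracting high-quality relation instances from the NELL knowledge base, keeping the data representative and consistent under sparsity, and balancing positive and negative samples to support effective similarity metric learning. Together, these challenges show how hard it is to achieve robust relational reasoning under limited supervision.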

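Common scenarios

Typical use cases

In knowledge graphs and natural language processing, relational similarity evaluation is a core task for understanding semantic associations between entities. The relbert/nell_relational_similarity dataset, cleaned and organized from the NELL-one knowledge graph, provides a standardized benchmark for computing relational similarity. It is typically used to train and evaluate relation embedding models, especially in few-shot or one-shot settings, where its structured positive and negative pairs let researchers measure the semantic distance between relation types in vector space. This supports work on knowledge graph completion and relational reasoning.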
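Measuring the distance between relation types in vector space can be illustrated with a short sketch. Everything below is an assumption for illustration: the 2-dimensional vectors are placeholders rather than embeddings produced by RelBERT or any other model, the relation names are hypothetical, and nearest-centroid matching with cosine similarity is just one simple way to compare relations, not the method described in the source.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_relation(query: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the relation whose centroid is most similar to the query vector."""
    return max(centroids, key=lambda rel: cosine(query, centroids[rel]))


# Placeholder pair embeddings: hypothetical values, not model outputs.
pair_vectors = {
    "relation_a": [[1.0, 0.0], [0.8, 0.2]],
    "relation_b": [[0.0, 1.0], [0.2, 0.8]],
}
centroids = {rel: centroid(vecs) for rel, vecs in pair_vectors.items()}

# A query pair embedding is assigned to the closest relation centroid.
print(nearest_relation([0.9, 0.1], centroids))
```

In practice the placeholder vectors would be replaced by embeddings of the word pairs from a relation embedding model, with centroids computed from the `positives` of each relation type.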

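Research problems addressed

The dataset targets two research challenges: long-tail relations are common in knowledge graphs, and labeled data for new relations is scarce. By framing relational similarity evaluation around single instances, it lets researchers adapt to and predict new relations without large numbers of training samples. This reduces the dependence of conventional embedding models on data scale and provides a reproducible experimental setting for few-shot relation learning, which makes knowledge graphs more practical and robust as they are extended.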

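Derived and related work

Work derived from this dataset sits mainly at the intersection of few-shot relation learning and knowledge graph embedding. For example, the one-shot relational learning framework proposed in the original paper combines embedding representations with one-hop graph structure and markedly improves prediction for new relations. Later studies extended the use of the data, for instance by combining it with meta-learning or graph neural networks to develop more efficient relation matching algorithms. These works advance the modeling of relational similarity and offer methodological references for building and maintaining knowledge graphs dynamically.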

The content above was collected and summarized by 遇见数据集.
