Text Knowledge Graph Dataset

Name: Text Knowledge Graph Dataset (文本知识图谱数据集)
Creator: Harbin Institute of Technology
License: Not specified

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2024-03-05

Download link:

https://www.nbsdc.cn/general/dataDetail?id=64edc830bb16e07753c35170&type=1

Resource description:

The Text Knowledge Graph Dataset primarily supports research on knowledge graph construction technology under Project 1, "Automated Knowledge Discovery and Graph Construction", and supplies the knowledge graph data itself. The graph is provided by Harbin Institute of Technology (HIT) and contains 30,102,845 entities (about 30 million), 182,079 hypernyms (about 180 thousand), 15,577,846 high-quality entity hypernym-hyponym relation pairs (about 15.6 million), 79,568,791 relation triples (about 79.6 million), and 436,961 relations (attributes) (about 437 thousand). The data were mined from multiple sources; the algorithms used include named entity recognition, relation mining, and attribute extraction.
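The description names two kinds of records: relation triples and entity hypernym-hyponym pairs. The page does not document the distribution format, so the following is only a minimal sketch of how such records could be held in memory once loaded. The `Triple` and `TinyGraph` names, the field names, and the toy records are all hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One relation triple: (head entity, relation or attribute, tail value)."""
    head: str
    relation: str
    tail: str


class TinyGraph:
    """Hypothetical in-memory index over triples and hypernym-hyponym pairs."""

    def __init__(self):
        # head entity -> list of triples that start at that entity
        self.by_head = defaultdict(list)
        # entity (hyponym) -> set of its hypernyms
        self.hypernyms = defaultdict(set)

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        self.by_head[head].append(Triple(head, relation, tail))

    def add_hypernym(self, entity: str, hypernym: str) -> None:
        self.hypernyms[entity].add(hypernym)

    def facts(self, head: str) -> list:
        # .get avoids creating empty entries for unknown entities
        return list(self.by_head.get(head, []))

    def is_a(self, entity: str, hypernym: str) -> bool:
        return hypernym in self.hypernyms.get(entity, set())


# Toy records for illustration only; they are not drawn from the dataset.
graph = TinyGraph()
graph.add_hypernym("Harbin", "city")
graph.add_triple("Harbin", "located_in", "Heilongjiang")

print(graph.is_a("Harbin", "city"))
print(graph.facts("Harbin"))
```

At the scale reported here (tens of millions of triples), a plain dictionary index like this would likely need to be replaced by a database or an on-disk store; the sketch only shows the shape of the two record types.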

Provider:

Harbin Institute of Technology

Collected summary

Dataset introduction

Background and challenges

Background overview

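The Text Knowledge Graph Dataset is a large-scale open-domain knowledge graph built by Harbin Institute of Technology. It contains more than 30 million entities and more than 79 million relation triples, mined from multiple text sources using algorithms such as named entity recognition, relation mining, and attribute extraction. It is intended mainly for research in natural language processing and artificial intelligence, in particular on automated knowledge discovery and graph construction.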

The content above was collected and summarized by 遇见数据集.