
Research on Level-4 Lossless Knowledge Graphs and Vector Datasets for NATO Alternative Analysis

DataONE | Updated 2025-11-20 | Indexed 2025-11-29
Download link:
https://search.dataone.org/view/sha256:d12f047b23dfce48b6eb6eb2652ee598b3d84a510f9881e2cf7d56adcf04e7c1
Description:
This dataset provides a multi-level, fine-grained, lossless knowledge graph, with accompanying vector data files, built from the full text of a monograph. The book's content is systematically decomposed and encoded along a ‘chapter–paragraph–sentence–keyword’ hierarchy. The CSV-formatted knowledge graph preserves every chapter title, paragraph position, and sentence text, together with its core concepts and keywords. Each image and table in the book is catalogued individually and annotated with its page number, the corresponding textual description, and the key information or data fields it contains. Textual, visual, and tabular information is thus represented losslessly at the structural level, with no individual entry omitted. On this foundation, the dataset also includes a JSON-formatted vector data file that holds an embedding for each record in the knowledge graph; a unified unique identifier (ID) links each structured entry in the CSV to its vector representation in the JSON. Researchers working in the Harvard Dataverse environment can therefore perform graph retrieval, semantic vector retrieval, and advanced analytical tasks such as RAG/GraphRAG. The result is a high-precision, reproducible foundational dataset for subsequent text mining, knowledge discovery, and visualisation modelling.
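The ID-based link between the CSV knowledge graph and the JSON vector file can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the column names (`id`, `chapter`, `paragraph`, `sentence`, `keywords`), the JSON layout (a list of objects with `id` and `vector`), the three-dimensional toy vectors, and the placeholder sentences are hypothetical and not taken from the dataset files. The sketch joins the two sources on the shared ID and ranks records by cosine similarity to a query vector.

```python
import csv
import io
import json
import math

# Hypothetical stand-ins for the dataset's CSV and JSON files.
# The real column names, JSON layout, and vector size may differ.
SAMPLE_CSV = """id,chapter,paragraph,sentence,keywords
s1,Chapter 1,1,Placeholder sentence one.,keyword-a;keyword-b
s2,Chapter 1,2,Placeholder sentence two.,keyword-c
s3,Chapter 2,1,Placeholder sentence three.,keyword-d;keyword-e
"""

SAMPLE_JSON = json.dumps([
    {"id": "s1", "vector": [0.9, 0.1, 0.0]},
    {"id": "s2", "vector": [0.1, 0.9, 0.0]},
    {"id": "s3", "vector": [0.0, 0.1, 0.9]},
])


def load_records(csv_text, json_text):
    """Join each CSV row to its embedding via the shared ID."""
    vectors = {item["id"]: item["vector"] for item in json.loads(json_text)}
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        vector = vectors.get(row["id"])
        if vector is not None:  # skip rows that have no matching embedding
            records.append({**row, "vector": vector})
    return records


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vector, top_k=2):
    """Return the top_k records most similar to query_vector."""
    ranked = sorted(
        records,
        key=lambda r: cosine(r["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


records = load_records(SAMPLE_CSV, SAMPLE_JSON)
hits = search(records, [0.2, 0.8, 0.0], top_k=2)
for hit in hits:
    print(hit["id"], hit["chapter"], hit["sentence"])
```

In a RAG-style workflow the query vector would come from the same embedding model used to build the JSON file, and the retrieved rows (with their chapter and paragraph positions) would be passed on as context; which model that is would need to be confirmed from the dataset's own documentation.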
Created:
2025-11-23