
Engineering-Grade High-Fidelity Five-Level Fine-Grained Knowledge Graph for Structured Intelligence Analysis Technology

Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://doi.org/10.7910/DVN/CXAASM
Resource description:
This dataset presents an engineering-grade, high-fidelity, near-lossless five-level knowledge graph for structured intelligence analysis, designed to support visualization work in text mining, knowledge engineering, and intelligence analysis methodology. The graph uses a multi-level hierarchy that runs from whole documents down to atomic units. Node types include document, chapter, section, paragraph, sentence, keyword, figure, table, and data_point, and every node is connected to its parent through an explicit parent-child relationship.

At the content level, the dataset preserves as much of the semantic and pagination information of the source PDFs as possible: mixed Chinese-English text is segmented into sentences, keywords are extracted for each sentence, and every node is annotated with its page number. Figure nodes store the caption, the surrounding explanatory text, and page-level bounding-box (bbox) metadata, which allows each figure to be located precisely and leaves room for future multimodal extensions. Table nodes are expanded into cell-level records that combine row and column indices, row and column headers, and the original cell text, giving a fine-grained, structured representation of tabular content.

The resource aims for engineering-grade fidelity and near-lossless preservation within the limits of automated PDF parsing: it requires no manual table-by-table correction and no image OCR, while striving to retain the original text, figures, and table structure and semantics as fully as possible.

The knowledge graph can be used to build method-level ontologies of structured analytic techniques, to develop retrieval systems based on GraphRAG or vector models, to train or evaluate domain-specific language models, and as a foundation for teaching and research on multi-level text annotation, logical structure analysis, and table data visualization.
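The node hierarchy and cell-level table records described above can be pictured with a small in-memory model. This is a hypothetical sketch, not the dataset's published schema: the field names (`parent_id`, `page`, `attrs`, `row`, `col`, `bbox`, `caption`) and the assumption that `data_point` nodes hold individual table cells are illustrative guesses. It shows how explicit parent links let you recover a node's path back to its document and reassemble a table from its cell records.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: field names are illustrative assumptions,
# not the dataset's published format.
NODE_TYPES = {"document", "chapter", "section", "paragraph", "sentence",
              "keyword", "figure", "table", "data_point"}


@dataclass
class Node:
    id: str
    type: str
    parent_id: Optional[str]       # explicit parent link; None for the root document
    page: Optional[int] = None     # page number annotated on the node
    text: str = ""
    attrs: dict = field(default_factory=dict)  # caption/bbox for figures, row/col for cells

    def __post_init__(self) -> None:
        if self.type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.type}")


def build_children(nodes: list[Node]) -> dict[Optional[str], list[Node]]:
    """Group nodes by parent id so the hierarchy can be walked top-down."""
    children: dict[Optional[str], list[Node]] = {}
    for n in nodes:
        children.setdefault(n.parent_id, []).append(n)
    return children


def ancestor_path(node_id: str, by_id: dict[str, Node]) -> list[str]:
    """Return the node types from the root document down to node_id."""
    path = []
    cur = by_id.get(node_id)
    while cur is not None:
        path.append(cur.type)
        # Stop at the root, whose parent_id is None.
        cur = by_id.get(cur.parent_id) if cur.parent_id is not None else None
    return list(reversed(path))


def rebuild_table(table_id: str, nodes: list[Node]) -> list[list[str]]:
    """Reassemble cell records (assumed to be data_point nodes) into a row-major grid."""
    cells = [n for n in nodes if n.type == "data_point" and n.parent_id == table_id]
    if not cells:
        return []
    n_rows = max(c.attrs["row"] for c in cells) + 1
    n_cols = max(c.attrs["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.attrs["row"]][c.attrs["col"]] = c.text
    return grid


# Toy example data (invented for illustration only).
nodes = [
    Node("doc1", "document", None, text="Example PDF"),
    Node("ch1", "chapter", "doc1", page=1),
    Node("sec1", "section", "ch1", page=1),
    Node("par1", "paragraph", "sec1", page=2),
    Node("sen1", "sentence", "par1", page=2, text="An example sentence."),
    Node("kw1", "keyword", "sen1", page=2, text="example"),
    Node("fig1", "figure", "sec1", page=2,
         attrs={"caption": "Figure 1", "bbox": (72.0, 100.0, 520.0, 400.0)}),
    Node("tab1", "table", "sec1", page=3),
    Node("dp1", "data_point", "tab1", page=3, text="Step", attrs={"row": 0, "col": 0}),
    Node("dp2", "data_point", "tab1", page=3, text="Output", attrs={"row": 0, "col": 1}),
    Node("dp3", "data_point", "tab1", page=3, text="1",
         attrs={"row": 1, "col": 0, "col_header": "Step"}),
    Node("dp4", "data_point", "tab1", page=3, text="Matrix",
         attrs={"row": 1, "col": 1, "col_header": "Output"}),
]
by_id = {n.id: n for n in nodes}

print(ancestor_path("kw1", by_id))
# ['document', 'chapter', 'section', 'paragraph', 'sentence', 'keyword']
print(rebuild_table("tab1", nodes))
# [['Step', 'Output'], ['1', 'Matrix']]
```

Storing a single `parent_id` on each node (an adjacency list) keeps every record flat, which suits row-oriented exports, while still allowing the full hierarchy to be rebuilt on demand with `build_children`.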
Created:
2025-11-28