Discourse Aware Scholarly Knowledge Graph Construction Dataset
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20055705
Resource summary:
Description
The dataset is a manually curated benchmark for evaluating discourse-aware scholarly knowledge graph construction from scientific papers. It was created to support the development and evaluation of the Scholarly Upper Discourse Ontology (SUDO) and the SUDO-KG construction pipeline.
The dataset is derived from research papers associated with the AMSR peer-review corpus. For each paper, the dataset includes source paper artifacts, parsed text representations, reviewer-facing metadata, and manually annotated gold-standard knowledge graph annotations. The annotations cover named entity spans and classes, finite-clause-level proposition spans and classes, and context-aware relations between artifacts, propositions, and proposition pairs.
This dataset is suitable for research on:
scholarly knowledge graph construction
discourse-aware knowledge representation
scientific information extraction
ontology-guided annotation and evaluation
grounded knowledge graph generation from scientific text
evaluation of LLM-based and neuro-symbolic KGC pipelines
Folder Structure
The dataset is organized as one directory per paper. Each paper directory is named using a unique paper identifier.
```text
gold_standard_dataset/v1/<paper_id>/
    annotation.json
    fact.json
    facts.json
    paper.grobid.tei.xml
    paper.mainbody.md
    paper.md
    paper.pdf
    review.json
    sample.txt
```
Main files:
annotation.json: gold SUDO annotation used for KGC evaluation.
review.json: paper review metadata used for abstract-test preparation.
paper.pdf: original paper PDF.
paper.grobid.tei.xml: GROBID-parsed TEI XML.
paper.mainbody.md: Markdown text for the main body of the paper.
fact.json: supporting fact-level annotations.
sample.txt: auxiliary text sample for inspection or testing.
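As a quick sanity check after downloading, the one-directory-per-paper layout can be walked to see which of the listed files each paper directory actually contains. The following is a minimal sketch, not part of the dataset's own tooling: the function name `index_papers` and the local path `gold_standard_dataset/v1` are assumptions for illustration, and the file names are copied from the Folder Structure listing. No assumption is made about the internal schema of any JSON file.

```python
from pathlib import Path

# File names copied from the Folder Structure listing.
EXPECTED_FILES = (
    "annotation.json",
    "fact.json",
    "facts.json",
    "paper.grobid.tei.xml",
    "paper.mainbody.md",
    "paper.md",
    "paper.pdf",
    "review.json",
    "sample.txt",
)


def index_papers(root):
    """Map each paper directory under `root` to the list of expected files it lacks."""
    root = Path(root)
    report = {}
    # Only directories count as papers; stray files at the top level are ignored.
    for paper_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_FILES
                   if not (paper_dir / name).is_file()]
        report[paper_dir.name] = missing
    return report


if __name__ == "__main__":
    # Assumption: the archive was unpacked into the current working directory.
    root = Path("gold_standard_dataset/v1")
    if root.is_dir():
        for paper_id, missing in index_papers(root).items():
            status = "complete" if not missing else "missing: " + ", ".join(missing)
            print(f"{paper_id}: {status}")
    else:
        print(f"{root} not found; adjust the path to your local copy")
```

An empty list for a paper identifier means every listed file is present, so incomplete papers can be filtered out before running an evaluation.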
Provider:
Zenodo
Created:
2026-05-06



