
Discourse Aware Scholarly Knowledge Graph Construction Dataset

Source: DataCite Commons. Updated: 2026-05-06. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20055705
Resource overview:
Description

The dataset is a manually curated benchmark for evaluating discourse-aware scholarly knowledge graph construction from scientific papers. It was created to support the development and evaluation of the Scholarly Upper Discourse Ontology (SUDO) and the SUDO-KG construction pipeline.

The dataset is derived from research papers associated with the AMSR peer-review corpus. For each paper, it includes source paper artifacts, parsed text representations, reviewer-facing metadata, and manually annotated gold-standard knowledge graph annotations. The annotations cover named entity spans and classes, finite-clause-level proposition spans and classes, and context-aware relations between artifacts, propositions, and proposition pairs.

This dataset is suitable for research on:

- scholarly knowledge graph construction
- discourse-aware knowledge representation
- scientific information extraction
- ontology-guided annotation and evaluation
- grounded knowledge graph generation from scientific text
- evaluation of LLM-based and neuro-symbolic KGC pipelines

Folder Structure

The dataset is organized as one directory per paper. Each paper directory is named using a unique paper identifier.

```text
gold_standard_dataset/v1/<paper_id>/
  annotation.json
  fact.json
  facts.json
  paper.grobid.tei.xml
  paper.mainbody.md
  paper.md
  paper.pdf
  review.json
  sample.txt
```

Main files:

- annotation.json: gold SUDO annotation used for KGC evaluation.
- review.json: paper review metadata used for abstract-test preparation.
- paper.pdf: original paper PDF.
- paper.grobid.tei.xml: GROBID-parsed TEI XML.
- paper.mainbody.md: Markdown text for the main body of the paper.
- fact.json: supporting fact-level annotations.
- sample.txt: auxiliary text sample for inspection or testing.
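The one-directory-per-paper layout described above can be traversed with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: the helper names (`iter_papers`, `missing_files`, `load_json`) and the local root path are hypothetical, and the internal schema of the JSON files is not documented here, so the files are loaded as generic JSON without assuming any keys.

```python
import json
from pathlib import Path

# File names copied from the folder listing in the dataset description.
EXPECTED_FILES = [
    "annotation.json",
    "fact.json",
    "facts.json",
    "paper.grobid.tei.xml",
    "paper.mainbody.md",
    "paper.md",
    "paper.pdf",
    "review.json",
    "sample.txt",
]


def iter_papers(root):
    """Yield (paper_id, paper_dir) for every paper directory under root."""
    for paper_dir in sorted(Path(root).iterdir()):
        if paper_dir.is_dir():
            # The directory name is the unique paper identifier.
            yield paper_dir.name, paper_dir


def missing_files(paper_dir):
    """Return the expected file names that are absent from paper_dir."""
    return [name for name in EXPECTED_FILES if not (Path(paper_dir) / name).is_file()]


def load_json(paper_dir, name):
    """Load one JSON file from a paper directory, or return None if it is absent."""
    path = Path(paper_dir) / name
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Assumed local path after downloading and unpacking the archive.
    root = Path("gold_standard_dataset/v1")
    if root.is_dir():
        for paper_id, paper_dir in iter_papers(root):
            gold = load_json(paper_dir, "annotation.json")
            status = "no annotation" if gold is None else type(gold).__name__
            print(paper_id, status, "missing:", missing_files(paper_dir))
```

Checking for missing files before loading keeps an evaluation script from failing midway through the corpus when a paper directory is incomplete.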
Provider:
Zenodo
Created:
2026-05-06