An Open Dataset of Scholarly Publications Referenced in Selected Policy Documents (POLIDOC_SCHOLAR)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/8184040
Description:
POLIDOC_SCHOLAR: An Open Dataset of Scholarly Publications Referenced in Selected Policy Documents
This repository contains an open dataset of scholarly publications cited by selected policy documents.
1. Background:
We do not aim to build a reference dataset covering all policy documents, or even millions of them. Instead, we extract references from a carefully selected set of policy documents.
The long-term plan is to facilitate the inclusion of citations of scholarly publications in open bibliometric databases (or at least to create interoperable datasets).
In the short term, we plan to increase the number of policy documents included in the dataset and to keep monitoring and improving data quality (completeness of records, provided external identifiers).
In the next release, we will also document the reference extraction process, including the code used.
2. Structure of the dataset:
The dataset is structured into two primary categories: "Collections" and "Collection References."
Collections:
The collection is a central feature of the POLIDOC_SCHOLAR dataset.
The metadata for the selected policy documents is provided in the "collections.jsonl" file.
For instance, a collection might include reports like the IPCC reports of the 6th Cycle (the "IPCC_AR_6 collection") or the reports from IPBES (the "IPBES collection").
Within each collection, there are "documents." A document can be either of the following:
An individual report within a collection (e.g., the IPCC_AR_6 collection contains 6 reports: 3 assessment reports and 3 special reports from the 6th Cycle of the IPCC assessment).
A specific section of a report that contains bibliographic references. These sections can be chapters or other segments such as supplementary materials or annexes (any section that has a reference list).
Each document has a unique code, and the relationship between a main document and its subdivisions is indicated in the "is_part_of" field.
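As an illustration of the report/section hierarchy described above, the following minimal sketch groups section records under their parent report. Only the "is_part_of" field name comes from this description. The "code" field name, the sample codes, and the convention that a top-level report has an empty "is_part_of" value are assumptions made for the example; the actual field names should be checked against the data dictionary.

```python
import io
import json
from collections import defaultdict


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_sections_by_report(records, code_field="code", parent_field="is_part_of"):
    """Map each parent document code to the list of its section codes.

    Records with an empty or missing parent field are treated as
    top-level reports and get an entry even if they have no sections.
    The default field name "code" is an assumption, not a documented name.
    """
    children = defaultdict(list)
    for rec in records:
        parent = rec.get(parent_field)
        if parent:
            children[parent].append(rec[code_field])
        else:
            children.setdefault(rec[code_field], [])
    return dict(children)


# Made-up sample records standing in for lines of "collections.jsonl".
sample = io.StringIO(
    '{"code": "REPORT_A", "is_part_of": null}\n'
    '{"code": "REPORT_A_CH1", "is_part_of": "REPORT_A"}\n'
    '{"code": "REPORT_A_CH2", "is_part_of": "REPORT_A"}\n'
)
tree = group_sections_by_report(read_jsonl(sample))
print(tree)
```

To run this against the real file, the sample stream could be replaced with an open file handle, e.g. `open("collections.jsonl", encoding="utf-8")`.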
Collection References:
To allow users to access only the collections they are interested in, we've separated references by collection in files named "collection_reference_{…name of collection…}jsonl."
Each of these files includes bibliographic references for every document in a specific collection.
Besides presenting these as "reference strings" (in their original format within the document), we also offer unique identifiers like DOI and OpenAlex ID to facilitate linkage to external databases.
The documentation of the dataset is provided in the file "data_dictionary".
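To show how the reference files might be used, here is a hedged sketch that counts reference records and distinct DOIs in a JSON Lines stream. The field names "reference_string" and "doi" and the sample records are hypothetical, and the DOI normalization (lowercasing, stripping a resolver prefix) is a choice made for this example rather than the procedure used to build the dataset.

```python
import io
import json


def normalize_doi(raw):
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def summarize_references(stream, doi_field="doi"):
    """Count reference records and distinct DOIs in a JSON Lines stream.

    The default field name "doi" is an assumption, not a documented name.
    """
    total = 0
    dois = set()
    for line in stream:
        if not line.strip():
            continue
        record = json.loads(line)
        total += 1
        doi = normalize_doi(record.get(doi_field))
        if doi:
            dois.add(doi)
    return {"references": total, "unique_dois": len(dois)}


# Made-up sample records standing in for one collection reference file.
sample = io.StringIO(
    '{"reference_string": "Author A (2020). Title one.", "doi": "10.1234/abc"}\n'
    '{"reference_string": "Author A, 2020: Title one.", "doi": "https://doi.org/10.1234/ABC"}\n'
    '{"reference_string": "Author B (2019). Title two.", "doi": null}\n'
)
summary = summarize_references(sample)
print(summary)
```

The two counts differ because the same publication can appear as several reference strings across documents, while some strings carry no DOI at all.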
3. Content release v1:
This release (POLIDOC_SCHOLAR version 1) includes 2 collections:
IPCC Assessment Cycle 6
IPBES Assessment reports
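1. Collection: IPCC Assessment Cycle 6
   Number of reports: 6
   Number of documents ("sections" with reference): 103
   Number of references (strings, not unique): 94,958
   Number of references with DOI (unique): 51,713
   Number of references with DOI (unique): 48,695
2. Collection: IPBES Assessment reports
   Number of reports: 3
   Number of documents ("sections" with reference): 27
   Number of references (strings, not unique): 21,750
   Number of references with DOI (unique): 12,100
   Number of references with DOI (unique): 11,896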
Created:
2024-07-11



