Distribution of trial registry numbers within full-text PubMed Central - full dataset of discovered links
Source: DataONE | Updated: 2025-02-04 | Indexed: 2025-04-26
Download link:
https://search.dataone.org/view/sha256:3645524aaf5bb33c50a9b1a0c29565b8455afa846b205e2397f76489f923f236
Description:
Linking registered clinical trials with their published results continues to be a challenge. A variety of natural language processing (NLP)-based and machine learning-based models have been developed to assist users in identifying these connections. Articles from the PubMed Central full-text collection were scanned for mentions of ClinicalTrials.gov and international clinical trial registry identifiers. We analyzed the distribution of trial registry numbers within sections of the articles and characterized their publication type indexing and other metrics.

Three supporting files are included herein: a PDF containing supplementary figures pertaining to the distribution of registry numbers found within the full text of articles, a CSV dataset providing the registry numbers discovered and the corresponding XML path location within the document, and an example Python script to locate registry identifiers within an XML article document. It should be noted that the purpose of this study is to...

These datasets and files are the results of scanning 6,901,686 XML documents within the PubMed Central Open Access article datasets available at: https://ftp.ncbi.nlm.nih.gov/pub/pmc/
Each registry identifier match is represented by a row in the xmlScanOutput.csv file, along with PubMed identifiers, file information, XML path information, and several computed columns including a validation that an NCT number exists within ClinicalTrials.gov, a generalized article section, and publication types from multiple indexing sources. Summaries within the Distribution_of_Trial_Registry_Numbers_Additional_File.pdf were generated by counting distinct PMID values within the csv file across various groups.

# Distribution of trial registry numbers within full-text PubMed Central - full dataset of discovered links
[https://doi.org/10.5061/dryad.dbrv15fb1](https://doi.org/10.5061/dryad.dbrv15fb1)
This dataset contains a table with every combination of publication ID, registry number, XML path, and article section discovered during the full-text scanning of PubMed Central articles.
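
To illustrate what a registry number paired with an XML path looks like, here is a minimal Python sketch that walks an XML document and records the element path of every ClinicalTrials.gov identifier it finds. It is not the example script shipped with this dataset. The `NCT` plus eight digits pattern, the function name `find_registry_ids`, and the sample document are all assumptions made for illustration. Other registries would need their own patterns, and real PubMed Central files may need extra handling (for example, for entities) before they parse.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: ClinicalTrials.gov identifiers are "NCT" followed by eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def find_registry_ids(xml_text):
    """Return (identifier, xml_path) pairs for each NCT number in the document."""
    root = ET.fromstring(xml_text)
    matches = []

    def walk(element, path):
        # Check the element's own text plus the tail text of each child,
        # so text that follows an inline tag is attributed to this element.
        chunks = [element.text or ""]
        chunks.extend(child.tail or "" for child in element)
        for chunk in chunks:
            for identifier in NCT_PATTERN.findall(chunk):
                matches.append((identifier, path))
        for child in element:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return matches


# Hypothetical, heavily simplified article; not taken from the dataset.
SAMPLE = """<article>
  <front><abstract><p>Trial registration: NCT01234567.</p></abstract></front>
  <body><sec><title>Methods</title>
    <p>Registered as <italic>NCT01234567</italic> before enrollment.</p>
  </sec></body>
</article>"""

for identifier, path in find_registry_ids(SAMPLE):
    print(identifier, path)
```

In this sample the same identifier is reported twice, once under `/article/front/abstract/p` and once under `/article/body/sec/p/italic`, which mirrors the idea of one row per combination of registry number and XML path.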
## Description of the data and file structure
#### **Distribution\_of\_Trial\_Registry\_Numbers\_Additional\_File.pdf**
This document contains charts and summaries of the trial registry numbers found from the XML document scanning process. The explicit criteria for locating registry identifiers and designating article sections are provided in this document and may be useful for further research and refinement.
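
The summaries are described as counts of distinct PMID values across groups. A minimal sketch of that kind of aggregation, using only the Python standard library, might look like the following. The column names `pmid`, `registry_number`, and `section`, the helper `count_distinct_pmids`, and the sample rows are hypothetical; check the header of xmlScanOutput.csv for the real column names before adapting it.

```python
import csv
import io
from collections import defaultdict


def count_distinct_pmids(csv_file, group_column, pmid_column="pmid"):
    """Count distinct PMID values for each value of group_column."""
    pmids_by_group = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # A set keeps each PMID once per group, however many rows mention it.
        pmids_by_group[row[group_column]].add(row[pmid_column])
    return {group: len(pmids) for group, pmids in pmids_by_group.items()}


# Made-up rows with hypothetical column names, for illustration only.
SAMPLE_ROWS = """pmid,registry_number,section
111,NCT00000001,abstract
111,NCT00000001,methods
111,NCT00000002,abstract
222,NCT00000003,abstract
"""

print(count_distinct_pmids(io.StringIO(SAMPLE_ROWS), "section"))
# prints {'abstract': 2, 'methods': 1}
```

Counting distinct PMIDs rather than rows matters here because one article can mention several registry numbers, or the same number in several places, and would otherwise be counted more than once within a group.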
#### **Distribution\_of\_Trial\_Registry\_Numbers\_ScanOutput.zip**
This zip archive contains a comma-separated file named "xmlScanOutput.csv" that contains all rows of registry numbers and art...
Created:
2025-02-05



