Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"
Source: Figshare. Updated 2020-11-03. Indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Data_from_Obstacles_to_the_Reuse_of_Study_Metadata_in_ClinicalTrials_gov_/12743939/1
Description:
This fileset provides supporting data and corpora for the empirical study described in:
Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov

Description of files

Original data files:
- AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
- public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate the records in AllPublicXml.zip.

BioPortal API query results:
- condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns = {filename, condition, url, bioportal term, cuis, tuis}.
- intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns = {filename, intervention, url, bioportal term, cuis, tuis}.

Analytical results:
- EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives the filename, the criteria, and whether manual review determined that the criteria contain criteria for "multiple subgroups" of participants.
- completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA 801 and its Final Rule.
- industry_completeness.xlsx contains percentages of interventional records missing required fields, broken down by the agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other") and by whether the record falls before or after the effective date of the Final Rule.
- location_completeness.xlsx contains percentages of interventional records missing required fields, broken down by whether the record lists at least one location in the United States or only international locations (trials with no listed location are excluded), and by whether the record falls before or after the effective date of the Final Rule.
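As a rough illustration of how the BioPortal match files might be consumed, the sketch below groups the matched term URLs by (filename, condition) pair. It uses the column names listed in the description, but it assumes the CSV is comma-delimited with a header row carrying exactly those names, and that an unmatched condition appears as a row with an empty url; neither is confirmed by the description. The function name matches_per_condition and the sample rows are invented for illustration only.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the dataset description. That the file has a
# header row with exactly these names is an assumption.
COLUMNS = ["filename", "condition", "url", "bioportal term", "cuis", "tuis"]


def matches_per_condition(csv_file):
    """Map each (filename, condition) pair to the set of matched term URLs."""
    reader = csv.DictReader(csv_file)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")

    matches = defaultdict(set)
    for row in reader:
        urls = matches[(row["filename"], row["condition"])]
        # Assumption: a condition with no exact match has an empty url field.
        if row["url"]:
            urls.add(row["url"])
    return dict(matches)


# Invented sample rows, not taken from the real file.
SAMPLE = io.StringIO(
    "filename,condition,url,bioportal term,cuis,tuis\n"
    "NCT00000001.xml,asthma,http://example.org/onto/A1,Asthma,C0000000,T000\n"
    "NCT00000001.xml,asthma,http://example.org/onto/B7,Asthma,C0000000,T000\n"
    "NCT00000002.xml,unmatched condition,,,,\n"
)

result = matches_per_condition(SAMPLE)
for (filename, condition), urls in sorted(result.items()):
    print(filename, condition, len(urls))
```

For the real file, the same function could be called on a handle opened with open("condition_matches.csv", newline="", encoding="utf-8"); the encoding is likewise an assumption. Since intervention_matches.csv differs only in its second column name, the same approach should carry over with "intervention" substituted for "condition".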
Created:
2020-07-31



