Randomized controlled clinical trials with tagged information regarding the number of participants
Source: DataCite Commons. Updated: 2026-03-12. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.g1jwstr0b
Description:
Background: Extracting the sample size from randomized controlled trials
(RCTs) remains a challenge to developing better search functionalities or
automating systematic reviews. Most current approaches rely on the sample
size being explicitly mentioned in the abstract.

Data collection: A random
sample of 996 randomized controlled trials (RCTs) from seven major
journals (British Medical Journal, JAMA, JAMA Oncology, Journal of
Clinical Oncology, Lancet, Lancet Oncology, New England Journal of
Medicine) published between 2010 and 2022 was labeled. To do so,
abstracts were retrieved as a txt file from PubMed and parsed using
regular expressions (i.e., expressions that match certain patterns in
text). For each trial, the number of people who were randomized was
retrieved by looking at the abstract, followed by the full publication if
the number could not be determined with certainty from the abstract. In
addition, six different entities were tagged in each abstract, independent
of whether the information was presented using words or integers. If the
number of people who were randomized was explicitly stated (e.g., using
the words “randomly,” “randomized,” etc.), this was tagged as
“RANDOMIZED_TOTAL”. If the number of people who were analyzed was
presented, this was tagged as “ANALYSIS_TOTAL”. If the number of people
who completed the trial or a certain follow-up period was presented, this
was tagged as “COMPLETION_TOTAL”. If the number of people who were part of
the trial without being more specific was presented, this was tagged as
“GENERAL_TOTAL”. If the number of people who were assigned to an arm of
the trial was presented, this was tagged as “ARM”. Lastly, if the number
of patients who were assigned to an arm was presented in the context of
how many patients experienced an event, this was tagged as “ARM_EVENT”. If
the abstract did not contain the aforementioned entities, the manuscript
was added to the dataset without any tags.

Data properties: Each trial is
a row in the CSV file. For a detailed description, see the enclosed
README file.
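
The description mentions that abstracts were retrieved from PubMed as a txt file and parsed with regular expressions. A minimal sketch of what such parsing could look like is given below. It is not the authors' code: it assumes the export uses PubMed's tagged layout (a field tag such as `PMID` or `AB`, then `- `, with wrapped lines indented by six spaces), and the function name `parse_records` and the sample records are hypothetical.

```python
import re

# Assumption: each field starts with a 2-4 letter tag, optional padding spaces,
# then "- " and the value; wrapped lines are indented by six spaces.
FIELD_RE = re.compile(r"^([A-Z]{2,4}) *- (.*)$")


def parse_records(text):
    """Split a tagged PubMed export into a list of {tag: value} dicts."""
    records = []
    # Records are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        record, tag = {}, None
        for line in block.splitlines():
            match = FIELD_RE.match(line)
            if match:
                tag, value = match.groups()
                # Repeated tags (e.g. several authors) are joined with "; ".
                record[tag] = f"{record[tag]}; {value}" if tag in record else value
            elif tag and line.startswith("      "):
                # Continuation of the previous field's value.
                record[tag] += " " + line.strip()
        if record:
            records.append(record)
    return records


# Hypothetical records, invented for illustration only.
sample = """PMID- 12345678
TI  - A hypothetical trial of drug A versus
      placebo.
AB  - We randomly assigned 120 patients to drug A
      or placebo.

PMID- 87654321
TI  - Another hypothetical trial.
AB  - A total of 45 patients were enrolled.
"""

records = parse_records(sample)
for rec in records:
    print(rec["PMID"], "|", rec["AB"])
```

The `AB` value of each record is the abstract text in which entities such as “RANDOMIZED_TOTAL” or “ARM” would then be tagged; the actual column names of the CSV file are documented in the enclosed README and are not assumed here.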
Provider:
Dryad
Created:
2024-07-21



