Evaluation Dataset for RCT Classification by LLMs and Cochrane RCT Classifier
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/2mnnrd7nwb
Description:
This dataset contains the evaluation results of multiple large language models (LLMs) and the Cochrane RCT classifier in classifying biomedical abstracts as randomized controlled trials (RCTs) or non-RCTs. The dataset was constructed by retrieving 2,252 articles related to "Thrombectomy" from PubMed, which two independent reviewers then labeled manually to establish a gold standard. Six LLMs (5 foundational LLMs, 1 fine-tuned LLM) were then prompted to classify the same set of abstracts using a consistent instruction template. The Cochrane RCT classifier output is included for benchmarking. Each model's binary decision ("Include" for probable RCT; "Exclude" otherwise) and a brief justification are recorded.
Column Definitions:
Title: Title of the article.
Abstract: Abstract of the article.
Gold_Standard: Manual label assigned by consensus review (1 = RCT, 0 = non-RCT).
Cochrane_decision: Output from the Cochrane RCT classifier (1 = probable RCT, 0 = not RCT).
GPT_4.1_decision: Output from the GPT-4.1 model (1 = Include, 0 = Exclude).
Reason: Brief justification from the GPT-4.1 model.
Llama_4_Maverick_decision: Output from the Llama-4-Maverick model (1 = Include, 0 = Exclude).
Reason: Brief justification from the Llama-4-Maverick model.
R1_Distilled_Qwen_32B_decision: Output from the R1 Distilled Qwen-32B model (1 = Include, 0 = Exclude).
Reason: Brief justification from the R1 Distilled Qwen-32B model.
Qwen3_14B_decision: Output from the Qwen3-14B model (1 = Include, 0 = Exclude).
Reason: Brief justification from the Qwen3-14B model.
Mistral_Nemo_Instruct_2407_decision: Output from the Mistral model before fine-tuning (1 = Include, 0 = Exclude).
Reason: Brief justification from the Mistral model before fine-tuning.
Fine_tuned_Mistral_Nemo_Instruct_2407_decision: Output from the fine-tuned Mistral model (1 = Include, 0 = Exclude).
Reason: Brief justification from the fine-tuned Mistral model.
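The column layout above lends itself to a simple benchmark: compare each decision column against Gold_Standard and report sensitivity and specificity. The following is a minimal sketch, not the dataset authors' evaluation code. It assumes the data has been exported to CSV with the column names listed above; the function names (confusion_counts, sensitivity_specificity, evaluate) and the four-row toy table are hypothetical and exist only for illustration. Because several columns share the name "Reason", the sketch reads only the Gold_Standard and decision columns.

```python
import csv
import io

# Decision columns as named in the column definitions above.
DECISION_COLUMNS = [
    "Cochrane_decision",
    "GPT_4.1_decision",
    "Llama_4_Maverick_decision",
    "R1_Distilled_Qwen_32B_decision",
    "Qwen3_14B_decision",
    "Mistral_Nemo_Instruct_2407_decision",
    "Fine_tuned_Mistral_Nemo_Instruct_2407_decision",
]


def confusion_counts(gold, pred):
    """Return (tp, fp, tn, fn) for two equal-length lists of 0/1 labels."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, pred):
        if g == 1 and p == 1:
            tp += 1
        elif g == 0 and p == 1:
            fp += 1
        elif g == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(gold, pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(gold, pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def evaluate(rows, columns=DECISION_COLUMNS):
    """Score every decision column present in rows against Gold_Standard."""
    if not rows:
        return {}
    gold = [int(r["Gold_Standard"]) for r in rows]
    results = {}
    for col in columns:
        if col not in rows[0]:
            continue  # skip columns missing from this export
        pred = [int(r[col]) for r in rows]
        results[col] = sensitivity_specificity(gold, pred)
    return results


# Toy data (made up, NOT taken from the dataset) to show the call pattern.
TOY_CSV = """Gold_Standard,Cochrane_decision,GPT_4.1_decision
1,1,1
1,1,0
0,1,0
0,0,0
"""
toy_rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
toy_results = evaluate(toy_rows)
for name, (sens, spec) in toy_results.items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

To run this on the real file, replace the toy string with an open file handle passed to csv.DictReader. If the download is a spreadsheet rather than a CSV, it would need to be converted first.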
Created:
2025-05-23



