data and code for "Beyond Human Gold Standards: A Multi-Model Framework for Automated Abstract Classification and Information Extraction" article

Name: data and code for "Beyond Human Gold Standards: A Multi-Model Framework for Automated Abstract Classification and Information Extraction" article
Creator: Zenodo
Published: 2025-07-07 12:37:55
License: Not specified

Zenodo; updated 2025-07-07; indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.15829039

Description:

This is the public repository for the article "Beyond Human Gold Standards: A Multi-Model Framework for Automated Abstract Classification and Information Extraction" by Delphine S. Courvoisier, Diana Buitrago Garcia, Nils Burgisser, Clément P. Buclin, Michele Iudici, and Denis Mongin. The up-to-date repository can be found here: https://gitlab.unige.ch/trial_integrity/llm_majority_public

The structure of the repository is as follows:

- The folder [LLM_inference](./LLM_inference) contains the LLM inferences for the two tasks, performed on the abstracts listed in the [abstract csv file](./LLM_inference/abstract.csv) by the LLMs described in the [model_list.csv](./LLM_inference/model_list.csv) file. The two tasks are the classification of the intervention (folder [abstract_classification](./LLM_inference/abstract_classification)) and the extraction of the number of participants (folder [participant_numbers](./LLM_inference/participant_numbers)). The initial list contained 1080 abstracts, some of which were not considered in our final analysis because they were protocols, and not randomized.
  - Both folders contain the Python script used for the inference, which uses the prompt in the `prompt` folder, and the two bash scripts used to run it on the university HPC.
  - All inference results are in the `results` folder, which contains the log files and one csv file per model.
- The file gold.csv contains, for the final list of 1020 abstracts, the tasks performed by each reviewer, the human gold standard, and the platine standard, with a 0/1 variable `platine_check` indicating which gold results were re-checked.
- The folder [R_analysis](./R_analysis) contains the R files used to perform the analysis and produce the tables and figures:
  - The file [analysis.R](./R_analysis/analysis.R) contains the code to read the LLM inference results and calculate the accuracy for the different model combinations. It outputs a file in the [results](./R_analysis/results) folder.
  - The file [figure_tables.R](./R_analysis/figure_tables.R) contains the R code that uses the output of analysis.R to produce the tables and figures of the article. The figures and tables are created in the [figures_tables](./R_analysis/figures_tables) folder. The file [trial_publication_info.csv](./R_analysis/trial_publication_info.csv) contains the information about the RCTs used for this analysis, taken from the data of the study doi.org/10.1016/j.jclinepi.2024.111586.
  - The file [help_func.R](./R_analysis/help_func.R) contains the functions used to format the table results, and is loaded in `figure_tables.R`.
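
To illustrate what "accuracy for the different model combinations" could look like in practice, here is a minimal Python sketch. It is not the repository's `analysis.R` code. It assumes that model outputs are combined by majority vote and that a tied vote counts as incorrect; both are assumptions made for illustration. The function names, the dictionary layout, and the toy labels are hypothetical and do not come from the repository's csv files.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the most common label, or None when the top count is tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: a tie yields no decision
    return counts[0][0]


def combination_accuracy(predictions, gold, size):
    """Accuracy of the majority vote for every combination of `size` models.

    predictions: {model_name: {abstract_id: label}}  (hypothetical layout)
    gold:        {abstract_id: label}
    """
    results = {}
    for combo in combinations(sorted(predictions), size):
        correct = 0
        for abstract_id, truth in gold.items():
            votes = [predictions[model][abstract_id] for model in combo]
            # A tie returns None, which never equals the gold label.
            if majority_vote(votes) == truth:
                correct += 1
        results[combo] = correct / len(gold)
    return results


# Toy data, invented for illustration only (not taken from the repository).
toy_predictions = {
    "model_a": {1: "drug", 2: "device", 3: "drug"},
    "model_b": {1: "drug", 2: "drug", 3: "drug"},
    "model_c": {1: "other", 2: "device", 3: "other"},
}
toy_gold = {1: "drug", 2: "device", 3: "drug"}

toy_accuracy = combination_accuracy(toy_predictions, toy_gold, size=3)
```

Calling `combination_accuracy` with `size=2` scores every pair of models instead. With an even number of models ties become possible, which is why the tie rule has to be chosen explicitly.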

Provider:

Zenodo

Created:

2025-07-07
