Performance of GPT-4o mini and GPT-4o for medical text mining tasks at different temperature settings

DataONE2025-01-13 更新2025-04-26 收录

下载链接：

https://search.dataone.org/view/sha256:ccf85c40ccb7b5d1b53a0d2e45a19454b0d567b8a0b150c18461617b5c22a78f

下载链接

链接失效反馈

官方服务：

资源简介：

The application of natural language processing (NLP) for extracting data from biomedical research has gained momentum with the advent of large language models (LLMs). However, the effect of different LLM parameters, such as temperature settings, on biomedical text mining remains underexplored and a consensus on what settings can be considered âsafeâ is missing. This study evaluates the impact of temperature settings on LLM performance for a named entity recognition and a classification task in clinical trial publications. Two datasets that had been annotated as part of previous projects by the author group were used to create tasks for the evaluation of two LLMs, namely Generative Pretrained Transformer 4 Omni (GPT-4o, OpenAI, San Francisco, United States) and GPT-4o mini at nine different temperature settings. The LLMs were first asked to extract the number of people who underwent randomization from the abstract of a publication reporting on a randomized clinical trial (RCT). The secon..., Two datasets that had been annotated as part of previous projects by the author group were used to create tasks for the evaluation of two LLMs, namely Generative Pretrained Transformer 4 Omni (GPT-4o, OpenAI, San Francisco, United States) and GPT-4o mini at nine different temperature settings (0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00). The respective versions that were used were gpt-4o-2024-05-13 and gpt-4o-mini-2024-07-18. The first task was to extract the number of people who underwent randomization from the abstract of a publication reporting on a randomized clinical trial (RCT). To this end, a random sample of 996 randomized controlled trials (RCTs) from seven major journals (British Medical Journal, JAMA, JAMA Oncology, Journal of Clinical Oncology, Lancet, Lancet Oncology, New England Journal of Medicine) published between 2010 and 2022 were labeled. The abstracts were retrieved as a txt file from PubMed and parsed using regular expressions (i.e., expressions that matc..., , # Performance of GPT-4o mini and GPT-4o for medical text mining tasks at different temperature settings [https://doi.org/10.5061/dryad.crjdfn3dt](https://doi.org/10.5061/dryad.crjdfn3dt) ## Description of the data and file structure ### Files and variables **Note: Empty cells in the formatted responses mean that the raw response could not be formatted correctly (due to a hallucination of the model when it produced the raw response). As an example, the raw response might contain \"746:@\"-æ¥æ¬ ä¸ é®¬ØªÙÙÙØ³Ø±\" which causes the formatted response to be empty.** #### File: sample\_size\_gpt\_responses\_all.csv **Description:**Â The dataset with the ground truth and the answers by the LLM for extracting the number of people who underwent randomization in each publication (task 1). ##### Variables * doi:Â Digital Object Identifier of the publication * date:Â Publication data according to PubMed * journal:Â Journal the publication was published in according to PubMed * title:Â Title of the publicati...

随着大语言模型（Large Language Model，LLM）的问世，利用自然语言处理（Natural Language Processing，NLP）从生物医学研究中提取数据的应用迎来了迅猛发展。然而，不同大语言模型参数（如温度设置）对生物医学文本挖掘的影响仍未得到充分探索，且尚未就何种设置可被认定为"安全"达成共识。本研究针对临床试验文献中的命名实体识别与分类任务，评估了温度设置对大语言模型性能的影响。研究团队使用了其在过往项目中标注的两个数据集，构建了针对两款大语言模型的评估任务，即生成式预训练Transformer 4 Omni（GPT-4o，OpenAI，美国旧金山）与GPT-4o mini，并设置了九种不同的温度参数。研究首先要求大语言模型从报告随机对照试验（Randomized Controlled Trial，RCT）的文献摘要中提取接受随机分组的受试者人数。后续内容……（原文截断）本研究使用作者团队在过往项目中标注的两个数据集，构建了针对两款大语言模型的评估任务：生成式预训练Transformer 4 Omni（GPT-4o，OpenAI，美国旧金山）与GPT-4o mini，共设置九种温度参数（0.00、0.25、0.50、0.75、1.00、1.25、1.50、1.75、2.00）。所用模型版本分别为gpt-4o-2024-05-13与gpt-4o-mini-2024-07-18。第一项任务为从报告随机对照试验（RCT）的文献摘要中提取接受随机分组的受试者人数。为此，研究团队对2010年至2022年间发表于7种主流期刊（《英国医学期刊》《美国医学会杂志》《美国医学会杂志·肿瘤学》《临床肿瘤学杂志》《柳叶刀》《柳叶刀·肿瘤学》《新英格兰医学杂志》）的996篇随机对照试验文献进行了随机抽样并标注。研究从PubMed获取了这些文献的摘要并保存为TXT文件，随后通过正则表达式（即用于匹配……的表达式）进行解析，原文内容被截断。 # 不同温度设置下GPT-4o mini与GPT-4o在医学文本挖掘任务中的性能表现 [https://doi.org/10.5061/dryad.crjdfn3dt](https://doi.org/10.5061/dryad.crjdfn3dt) ## 数据与文件结构说明 ### 文件与变量 **注意：格式化结果中的空单元格表示原始响应无法正确格式化（因模型生成原始响应时出现幻觉）。例如，原始响应可能包含"746:@"-日本中粤ØªÙÙÙØ³Ø±"这类内容，导致格式化结果为空。** #### 文件：sample_size_gpt_responses_all.csv **描述：** 包含真实标签与大语言模型针对每项文献中接受随机分组的受试者人数提取结果（任务1）的数据集。 ##### 变量 * doi：文献的数字对象标识符 * date：PubMed收录的文献出版日期 * journal：PubMed收录的文献发表期刊 * title：文献标题……（原文内容被截断）

创建时间：

2025-01-14