Performance of GPT-4o mini and GPT-4o for medical text mining tasks at different temperature settings
Source: DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.crjdfn3dt
Description:
The application of natural language processing (NLP) for extracting data
from biomedical research has gained momentum with the advent of large
language models (LLMs). However, the effect of different LLM parameters,
such as temperature settings, on biomedical text mining remains
underexplored and a consensus on what settings can be considered “safe” is
missing. This study evaluates the impact of temperature settings on LLM
performance for a named entity recognition and a classification task in
clinical trial publications. Two datasets that had been annotated as part
of previous projects by the author group were used to create tasks for the
evaluation of two LLMs, namely Generative Pretrained Transformer 4 Omni
(GPT-4o, OpenAI, San Francisco, United States) and GPT-4o mini at nine
different temperature settings. The LLMs were first asked to extract the
number of people who underwent randomization from the abstract of a
publication reporting on a randomized clinical trial (RCT). The second
task was to classify an abstract according to whether it reported on an
RCT and/or an oncology topic. The answers of the LLMs as well as the
ground truth are provided in the dataset.
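
Since the dataset pairs LLM answers with ground truth, one simple way to compare temperature settings is exact-match accuracy per model and temperature. The sketch below is only an illustration under stated assumptions: the record keys (`model`, `temperature`, `answer`, `truth`) and the toy values are hypothetical and do not reflect the actual file layout or contents of the Dryad dataset, and exact-match accuracy is not necessarily the metric the authors used.

```python
from collections import defaultdict


def accuracy_by_temperature(records):
    """Return {(model, temperature): fraction of answers equal to the ground truth}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = (rec["model"], rec["temperature"])
        total[key] += 1
        if rec["answer"] == rec["truth"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy, made-up records for illustration only (not taken from the dataset).
# The key names are assumptions; adapt them to the real column names.
toy_records = [
    {"model": "gpt-4o", "temperature": 0.0, "answer": 120, "truth": 120},
    {"model": "gpt-4o", "temperature": 0.0, "answer": 80, "truth": 85},
    {"model": "gpt-4o-mini", "temperature": 1.0, "answer": 200, "truth": 200},
]

scores = accuracy_by_temperature(toy_records)
for (model, temperature), acc in sorted(scores.items()):
    print(f"{model} @ T={temperature}: accuracy={acc:.2f}")
```

The same grouping would apply to the classification task by storing the predicted and true labels in `answer` and `truth`.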
Provider:
Dryad
Created:
2025-01-13



