Experimental setup.
Source: Figshare | Updated: 2025-10-09 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Experimental_setup_/30321500
Description:
While large language models (LLMs) have made significant advances in many fields, research on applying LLMs to infectious disease-specific tasks has lagged behind. This study addresses this gap by introducing the Infectious Disease Question and Answering Dataset (IDQuAD), a novel dataset designed to train and evaluate LLMs on infectious disease-related queries. IDQuAD is constructed from medical papers, patents, and news articles, and it uses methods such as generating answers before questions and applying counterfactual thinking to improve the quality of its question answering (QA) pairs. In the experimental phase, we fine-tuned the Mistral-7B model on IDQuAD to test how the dataset affects LLM performance on infectious disease QA tasks. The fine-tuned Mistral-7B model showed substantial improvements: its exact match (EM) score rose from 28.49% to 65.47% in the one-shot setting, an absolute gain of 36.98 percentage points. We also evaluated other LLMs across various setups, and among all models tested, our fine-tuned model achieved the highest performance across metrics and settings. In conclusion, this study introduces IDQuAD as a foundational dataset for infectious disease research, demonstrates the effectiveness of fine-tuning LLMs on it, and paves the way for future advances in dataset development and LLM refinement for infectious disease tasks.
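The description reports results as exact match (EM) scores but does not say how answers were normalized before comparison. The following is a minimal sketch of a common EM computation, assuming SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are hypothetical, and this is not necessarily the procedure used to produce the reported numbers.

```python
import re
import string

# Assumed SQuAD-style normalization; the IDQuAD evaluation may differ.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> int:
    """Return 1 if the normalized prediction equals any normalized reference, else 0."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(ref) for ref in references))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Corpus-level EM: percentage of questions whose prediction matches exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align one-to-one")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Invented toy answers for illustration only; not drawn from IDQuAD.
    preds = ["The Aedes mosquito", "bacteria", "oseltamivir"]
    refs = [["Aedes mosquito"], ["virus"], ["Oseltamivir."]]
    print(round(em_score(preds, refs), 2))  # prints 66.67
```

Under these assumptions, two of the three toy predictions match after normalization, giving an EM of 66.67%. EM is strict by design: a prediction that is semantically correct but phrased differently (for example, "mosquitoes of genus Aedes") scores 0. This is why QA evaluations often report it alongside softer metrics.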
Created:
2025-10-09



