Responses of seven tested models for real clinical questions

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Responses_of_seven_tested_models_for_real_clinical_questions/30889556

Description:

This dataset contains 14 files, one for each of 14 question groups. Two of the groups belong to the AE (accident and emergency) specialty, and the other 12 belong to 12 separate specialties (from cardiology to rare diseases). Each file (in JSON format) contains dozens of items, each holding the responses to one specific clinical question. Each question has a unique ID (QID) and is one of two question types, OSCE or realCases.

The "modelResponses" field is typically a dictionary with seven keys pointing to seven responses. Each key consists of six random letters followed by a short suffix (three to five characters in the list below), and the seven keys represent the seven models tested. The key suffixes identify the model that produced each response:

CG4L: ChatGPT-4oL (first released May 13, 2024)
QM212: QwenMAX (release date [API availability]: January 29, 2025)
RDS3: DS32B (DeepSeek-R1-Distill-Qwen-32B, released January 20, 2025)
G2FT: Gemini2F (gemini-2.0-flash-thinking-exp-01-21, released January 21, 2025)
Q3B: Qwen32B (Qwen3 32B or QwQ-32B, March 6, 2025)
SR67: DS-r1 671B (released January 20, 2025)
R17: DeS70B (DeepSeek-R1-Distill-Llama-70B, January 20, 2025)

The file hallu-sept-4-2025.xlsx contains the 40 alleged hallucinations.
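As a minimal sketch of how the key suffixes could be used, the Python snippet below maps the keys of a "modelResponses" dictionary back to model names. The suffix table is copied from the description; the function names, the sample keys, and the sample responses are hypothetical, and the snippet assumes that each key simply ends with one of the listed suffixes. It does not assume anything about the rest of the JSON layout.

```python
# Suffix-to-model table copied from the dataset description.
SUFFIX_TO_MODEL = {
    "CG4L": "ChatGPT-4oL",
    "QM212": "QwenMAX",
    "RDS3": "DS32B",
    "G2FT": "Gemini2F",
    "Q3B": "Qwen32B",
    "SR67": "DS-r1 671B",
    "R17": "DeS70B",
}


def identify_model(key):
    """Return the model name for a modelResponses key, or None if no suffix matches.

    Assumption: the key ends with one of the suffixes listed above.
    """
    # Try longer suffixes first so a short suffix cannot shadow a longer one.
    for suffix in sorted(SUFFIX_TO_MODEL, key=len, reverse=True):
        if key.endswith(suffix):
            return SUFFIX_TO_MODEL[suffix]
    return None


def responses_by_model(model_responses):
    """Re-key a modelResponses dictionary by model name.

    Keys with an unrecognised suffix are kept under their original key.
    """
    decoded = {}
    for key, response in model_responses.items():
        model = identify_model(key)
        decoded[model if model is not None else key] = response
    return decoded


# Hypothetical keys and responses, for illustration only.
sample = {"abcdefCG4L": "response A", "ghijklR17": "response B"}
decoded = responses_by_model(sample)
```

Matching on the end of the key, rather than slicing off a fixed-length prefix, keeps the sketch usable even if the random prefix is not exactly six characters in every file.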

Created:

2025-12-16
