"Agentic Medical Reasoning Dataset"

Name: "Agentic Medical Reasoning Dataset"
Creator: IEEE DataPort
Published: 2026-04-30 10:53:25
License: No description available

DataCite Commons | Updated 2026-04-30 | Indexed 2026-05-03

Download link:

https://ieee-dataport.org/documents/agentic-medical-reasoning-dataset

Resource description:

Benchmarking Agentic Clinical Reasoning: A Trustworthiness and Safety Analysis of Frontier LLMs in Medical Consultations

Dataset Abstract

In the early phase of this project, the primary aim was to collect and prepare datasets so that the data authentically reflects real-world clinical scenarios. We built a comprehensive dataset by combining four varied medical datasets with thorough deduplication, resulting in a robust collection of 488 rows and 4 columns that detail disease symptoms and treatment options. Dataset-1 comprises three columns, focusing on disease names, associated symptoms, and treatment advice, with over 400 records. Dataset-2 contains four columns centered on disease prediction with decision trees, specifying information about doctors, treatments, symptoms, and risk levels. Dataset-3 combines two files into a total of 18 columns: one file lists over 800 diseases along with up to 17 symptoms each, while the other provides four precautionary steps per disease. Finally, Dataset-4 includes four structured columns derived from web-sourced medical information, detailing symptoms and corresponding treatments across 400 entries. Together, these datasets form a comprehensive, curated resource for analyzing clinical reasoning in LLMs and supporting diverse AI-based healthcare tasks. The data is organized around three principal mappings: symptoms to treatment, diseases to treatment, and symptoms to disease. These mappings enable versatile queries and mirror real-world clinical workflows. Each dataset entry is annotated with ground truth labels that identify diseases, symptoms, and treatments, which makes model outputs easier to compare against a reference. The dataset is designed to prioritize treatment suggestions while integrating an emergency alert that emphasizes the critical importance of consulting a medical professional in urgent situations.

Data Included & Source Integration

The dataset is a curated corpus of clinical scenarios integrating four heterogeneous medical repositories. It brings together disease names, symptoms, treatments, and precautionary steps, all annotated with physician-verified ground truth labels to ensure diagnostic and therapeutic reliability. Dataset-1 provides over 400 records focusing on symptoms and treatment advice. Dataset-2 contributes 4,920 records emphasizing disease prediction logic, doctor information, and risk levels. Dataset-3 merges two files with more than 800 diseases, each linked to up to 17 symptoms and four precautionary steps. Dataset-4 adds 1,200+ entries of web-sourced clinical Q&A. Together, these sources form a consolidated benchmark resource that reflects real-world clinical reasoning tasks.

| Dataset ID | Platform / Repository | Raw Records | Original Dimensions | Primary Contribution |
|---|---|---|---|---|
| Dataset-1 | Kaggle (Aadya Singh) | 400+ | 3 Columns | Core Symptoms & Treatment Advice |
| Dataset-2 | Kaggle (P. Eranga) | 4,920 | 133 Columns | Clinical Decision Tree Logic |
| Dataset-3 | Kaggle (C.Q. Zheng) | 800+ | 18 Columns | Multi-symptom Mapping & Precautions |
| Dataset-4 | Hugging Face (QuyenAnhDE) | 1,200+ | 4 Columns | Web-sourced Clinical Q&A |
| Consolidated | Integrated Corpus | 488 | 4 Columns | Unique Clinical Ground Truth |

Methodology

The raw data, comprising approximately 7,320 rows, was systematically merged into a single table. Deduplication was performed using a foreign key strategy to remove overlapping diseases and treatments. Disease labels were normalized through lowercase conversion and spell-checking to ensure consistency. Each entry was then verified against physician-annotated ground truth labels. This methodology is intended to make the dataset both comprehensive and trustworthy, and suitable for evaluating clinical reasoning in LLMs.

Size & Structure

The final dataset consists of 488 unique clinical rows organized into 4 standardized columns. It is available in CSV and DataFrame formats for ease of use. The structure is as follows:

Disease - standardized medical name.
Symptoms - concatenated string of primary and secondary indicators.
Treatment - merged string of precautions, cures, and professional advice.
Filename - metadata for source traceability.

This structure enables versatile queries and supports mappings across Symptoms → Treatment, Disease → Treatment, and Symptoms → Disease.

| Category | Parameter | Statistical Value / Detail |
|---|---|---|
| Source Volume | Raw Cumulative Records | ~7,320 Rows (Pre-processing) |
| Final Scale | Total Observations (N) | 488 Unique Rows (Post-deduplication) |
| Data Structure | Feature Dimensions | 4 Columns (Disease, Symptoms, Treatment, Filename) |
| Class Density | Unique Diseases | ~41 Normalized Categories |
| Feature Depth | Avg. Symptoms/Disease | 6.4 (Unique concatenated strings) |
| Clinical Scope | Mapping Taxonomy | 3 Mappings: Symptoms → Treatment, Disease → Treatment, Symptoms → Disease |
| Quality Control | Fill Rate | 100% (Diseases, Symptoms); 94.2% (Treatments) |
| Safety Scope | Ground Truth | Physician-annotated & Emergency Alert Integrated |

Possible Use Cases

The dataset applies to several areas of healthcare AI research and deployment. It can be used for LLM benchmarking, evaluating the performance of clinical reasoning agents such as GPT-4o, Gemini 2.5 Flash, DeepSeek-V3, and Microsoft Copilot. It supports diagnostic modeling, enabling classifiers to map symptoms to diseases. It can power healthcare chatbots, providing curated treatment suggestions with integrated emergency alerts. It also serves as a resource for medical education, allowing students and researchers to analyze clinical decision-making workflows and compare them against physician-annotated ground truth.

| Clinical Sub-task | Mapping Logic | Possible Use Case |
|---|---|---|
| Task A | Symptoms → Treatment | Triage: Benchmarking first-aid and self-care advice. |
| Task B | Disease → Treatment | Prescription: Evaluating therapeutic knowledge. |
| Task C | Symptoms → Disease | Diagnosis: Identifying conditions from patient input. |
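
The consolidation steps and the three task mappings described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`normalize_label`, `consolidate`, `task_pairs`), the choice of (disease, treatment) as the deduplication key, and the sample rows are all hypothetical, and the spell-checking step is omitted. Only the four column names and the Task A/B/C mappings come from the description.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]

# The three mappings from the description, as (input column, target column).
TASKS = {
    "A": ("Symptoms", "Treatment"),
    "B": ("Disease", "Treatment"),
    "C": ("Symptoms", "Disease"),
}


def normalize_label(text: str) -> str:
    """Lowercase and collapse whitespace (spell-checking is not sketched here)."""
    return " ".join(text.lower().split())


def consolidate(rows: Iterable[Row]) -> List[Row]:
    """Merge rows from several sources, keeping the first row per dedup key.

    Assumption: the key is (disease, treatment) after normalization; the
    exact key used by the original pipeline is not specified.
    """
    seen = set()
    merged: List[Row] = []
    for row in rows:
        disease = normalize_label(row["Disease"])
        key = (disease, normalize_label(row.get("Treatment", "")))
        if key in seen:
            continue
        seen.add(key)
        merged.append({**row, "Disease": disease})
    return merged


def task_pairs(rows: Iterable[Row], task: str) -> List[Tuple[str, str]]:
    """Build (input, ground truth) pairs for Task A, B, or C.

    Rows with an empty input or target field are skipped, since the
    Treatment column is not fully populated.
    """
    source, target = TASKS[task]
    return [(r[source], r[target]) for r in rows if r.get(source) and r.get(target)]


# Made-up rows for illustration only; they are not taken from the dataset.
raw_rows = [
    {"Disease": "Common  Cold", "Symptoms": "cough, sneezing",
     "Treatment": "rest, fluids", "Filename": "dataset_1.csv"},
    {"Disease": "common cold", "Symptoms": "cough, runny nose",
     "Treatment": "Rest, fluids", "Filename": "dataset_4.csv"},
    {"Disease": "Migraine", "Symptoms": "headache, nausea",
     "Treatment": "", "Filename": "dataset_3.csv"},
]

corpus = consolidate(raw_rows)
pairs_a = task_pairs(corpus, "A")  # Symptoms -> Treatment
pairs_c = task_pairs(corpus, "C")  # Symptoms -> Disease
```

In this sketch the second sample row is dropped as a duplicate of the first, and the row with an empty Treatment field still contributes to Task C but not to Task A. A different deduplication key (for example, disease alone) would produce a smaller corpus, so the key should be chosen to match whatever the published 488-row file actually reflects.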

Provider:

IEEE DataPort

Created:

2026-04-30
