Simulated dataset for analysis via Large language models, via the analytical framework Analysis of Individual Heterogeneity and Discriminatory Accuracy (AIHDA).

Name: Simulated dataset for analysis via Large language models, via the analytical framework Analysis of Individual Heterogeneity and Discriminatory Accuracy (AIHDA).
Creator: figshare
Published: 2025-05-01 06:18:12
License: No description available

DataCite Commons. Updated: 2025-05-01. Indexed: 2025-05-07.

Download link:

https://figshare.com/articles/dataset/Simulated_dataset_for_analysis_via_Large_language_models_via_the_analytical_framework_Analysis_of_Individual_Heterogeneity_and_Discriminatory_Accuracy_AIHDA_/28560710/1

Description:

This dataset consists of 10,000 simulated observations. It is used to explore and apply large language models (LLMs) to data analysis within the analytical framework Analysis of Individual Heterogeneity and Discriminatory Accuracy (AIHDA). The dataset is based on the aggregated results of a previous study (Öberg J, Khalaf K, Perez Vicente R, Johnell K, Fastbom J, J. M. Geographic and socioeconomic differences in potentially inappropriate medication among older adults – Applying a simplified analysis of individual heterogeneity and discriminatory accuracy (AIHDA) for basic comparisons of healthcare quality. BMC Health Services Research. 2024, under peer review). Empirical patient data must be analyzed within a secure IT environment to ensure confidentiality. By using simulated patient data instead, we can apply a cloud-based GPT to our analysis and gain access to computational power and LLM capabilities that local LLMs would not give us. For the purposes of our study, a simulated database is therefore a suitable solution. The simulated database was created by ChatGPT 4o from the publication referenced above. This lets us illustrate GPT-based analysis on a real-world example of a healthcare quality indicator: potentially inappropriate medication among older adults, which is managed by the Swedish National Board of Health and Welfare (NBHW).
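As an illustration of the kind of analysis the description refers to, the sketch below compares outcome prevalence across groups and then asks how well group membership alone discriminates individuals with and without the outcome, summarized here as the area under the ROC curve (AUC). It is a minimal sketch under stated assumptions, not the authors' implementation: the three groups, their risks (0.20, 0.25, 0.30), and the function names `simulate`, `group_prevalence`, and `auc` are all hypothetical, and the data are generated on the spot rather than read from the figshare file, whose column layout is not described here.

```python
import random


def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formulation, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores occupy 1-based ranks i+1 .. j; each gets the average rank.
        avg_rank = (i + 1 + j) / 2
        rank_sum_pos += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n, risks, seed=0):
    """Draw n individuals, each in a random group with that group's outcome risk.

    The risks are hypothetical placeholders, not values from the dataset.
    """
    rng = random.Random(seed)
    groups = [rng.randrange(len(risks)) for _ in range(n)]
    outcomes = [1 if rng.random() < risks[g] else 0 for g in groups]
    return groups, outcomes


def group_prevalence(groups, outcomes):
    """Observed share of individuals with the outcome in each group."""
    totals, events = {}, {}
    for g, y in zip(groups, outcomes):
        totals[g] = totals.get(g, 0) + 1
        events[g] = events.get(g, 0) + y
    return {g: events[g] / totals[g] for g in totals}


# Hypothetical example: three groups with slightly different risks.
groups, outcomes = simulate(10_000, [0.20, 0.25, 0.30])
prev = group_prevalence(groups, outcomes)

# Use each individual's group prevalence as the predicted risk.
scores = [prev[g] for g in groups]

print("prevalence by group:", prev)
print("AUC of group membership:", round(auc(scores, outcomes), 3))
```

The design point is that both numbers are read together: groups can differ visibly in average prevalence while the AUC stays close to 0.5, which means group membership says little about any one individual's outcome.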

Provider:

figshare

Created:

2025-03-09