Antimicrobial Resistance Microbiological Dataset (ARMD-ECUH): A deidentified collection of electronic health records from a rural academic health system for antimicrobial resistance research
NIAID Data Ecosystem, indexed 2026-05-10
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.7sqv9s55x
Resource description:
Antimicrobial resistance is a growing public health concern, yet high-quality, real-world data sets derived from electronic health records remain scarce for research. To help remedy this, we developed the Antimicrobial Resistance Microbiological Dataset: East Carolina University Health (ARMD-ECUH), which includes microbiological culture and susceptibility results for 261,217 patients from ECU Health from 2015 to 2025. The data set also includes longitudinal data such as patient demographics, prior medical histories, medications, and procedures. Clinically relevant data, such as the locations where cultures were collected, recent laboratory values, and vitals taken during the respective encounters, are included as well. The deidentified ARMD-ECUH data set, with data values standardized to minimize the need for transformation, is intended to help researchers worldwide improve patient outcomes and increase awareness and understanding of antimicrobial resistance.
Methods
The Antimicrobial Resistance Microbiological Dataset - East Carolina University Health (ARMD-ECUH) is a longitudinal collection of Epic-based EHR data for adults (≥18 years old) in the ECU Health (ECUH) health system from 2015 to 2025 (dates as recorded before the jittering applied for deidentification). The data set includes deidentified microbiological laboratory results for blood, urine, and respiratory cultures. It also includes encounter-based vitals, patient demographics, comorbidities, socioeconomic factors quantified by the Area Deprivation Index (ADI), and prior exposures to medications and procedures. All data were collected from a Microsoft Fabric-supported data warehouse, which receives daily updates from ECUH’s Epic Clarity database. The data were queried using Spark SQL in Fabric notebooks.
Following methods similar to those described for both ARMD and ARMD-UTSW, all raw data were standardized to assist with future research applications. Gender was encoded as one of two anonymized values, “0” or “1” (Null was used for missing gender data). Patient ages at the time the culture was taken were grouped into age ranges such as 18-24, 25-34, 35-44, etc. Medication names were standardized to generic names. To account for the various reporting methods used by different microbiology laboratories, susceptibility results were consolidated to the values “susceptible”, “intermediate”, “resistant”, “synergism”, and “inconclusive.” Culture positivity is designated by a binary indicator (“0” or “1”), determined by whether susceptibility results were reported for the culture. Patients with multiple cultures in the two weeks preceding the encounter were considered to have possible active infections and were excluded from the data set.
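The standardization rules above can be sketched in Python as follows. This is an illustration only, not the authors' pipeline: the raw lab strings in `RAW_TO_STANDARD`, the function names, and the open-ended top age bin (`85+`) are assumptions, since the text lists only the first few age ranges and the five consolidated categories.

```python
from typing import Optional, Sequence

# Hypothetical raw lab strings mapped to the five consolidated values
# named in the text; real laboratory vocabularies will differ.
RAW_TO_STANDARD = {
    "s": "susceptible",
    "susceptible": "susceptible",
    "i": "intermediate",
    "intermediate": "intermediate",
    "r": "resistant",
    "resistant": "resistant",
    "synergy": "synergism",
    "synergism": "synergism",
    "inconclusive": "inconclusive",
}


def age_range(age: int) -> str:
    """Group an adult age into a range such as 18-24, 25-34, 35-44."""
    if age < 18:
        raise ValueError("the data set covers adults (>= 18 years) only")
    if age <= 24:
        return "18-24"
    if age >= 85:
        # Assumption: the source does not say how the oldest ages are binned.
        return "85+"
    low = 25 + 10 * ((age - 25) // 10)
    return f"{low}-{low + 9}"


def standardize_susceptibility(raw: Optional[str]) -> Optional[str]:
    """Map a raw result string to a consolidated value, or None if unmapped."""
    if raw is None:
        return None
    return RAW_TO_STANDARD.get(raw.strip().lower())


def culture_positive(susceptibility_results: Sequence[str]) -> int:
    """Return 1 if the culture has any susceptibility results, else 0."""
    return 1 if len(susceptibility_results) > 0 else 0
```

Returning `None` for an unmapped string, rather than guessing a category, keeps unknown lab vocabulary visible for manual review.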
Deidentification followed the Safe Harbor method, including removing or anonymizing all patient identifiers. Identifiers such as patient ID numbers, encounter ID numbers, and culture order ID numbers were replaced with randomized identification numbers that remain consistent across each patient's records. Socioeconomic factors were identified using the ADI, which requires patient zip codes; the zip codes were removed once the ADI values had been assigned. All datetime values were offset by a number of days randomly assigned per patient, which keeps temporal relationships within a patient's records intact. Patient age and gender were deidentified using the methods described above.
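A minimal sketch of the two mechanisms described above, consistent pseudonymous identifiers and a per-patient date offset, is given below. It is not the authors' implementation: the `Deidentifier` class, its method names, the 30-day maximum shift, and the 64-bit hex identifiers are all assumptions made for illustration.

```python
import random
import secrets
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


class Deidentifier:
    """Hypothetical helper: stable random IDs and per-patient date shifts."""

    def __init__(self, max_shift_days: int = 30, seed: Optional[int] = None):
        # The shift window is an assumption; the source does not state it.
        self._rng = random.Random(seed)
        self._choices = [d for d in range(-max_shift_days, max_shift_days + 1) if d != 0]
        self._offsets: Dict[str, timedelta] = {}
        self._ids: Dict[Tuple[str, str], str] = {}

    def pseudo_id(self, kind: str, real_id: str) -> str:
        """Return a random ID that is the same on every call for (kind, real_id)."""
        key = (kind, real_id)
        if key not in self._ids:
            self._ids[key] = secrets.token_hex(8)
        return self._ids[key]

    def shift(self, patient_id: str, timestamp: datetime) -> datetime:
        """Offset a timestamp by the whole-day shift assigned to this patient."""
        if patient_id not in self._offsets:
            self._offsets[patient_id] = timedelta(days=self._rng.choice(self._choices))
        return timestamp + self._offsets[patient_id]


if __name__ == "__main__":
    deid = Deidentifier()
    culture = datetime(2020, 1, 1, 8, 0)
    result = datetime(2020, 1, 3, 14, 30)
    shifted_culture = deid.shift("patient-A", culture)
    shifted_result = deid.shift("patient-A", result)
    # The interval between the two events is unchanged by the shift.
    print(shifted_result - shifted_culture == result - culture)
```

Because every timestamp for a patient receives the same offset, intervals such as time from culture order to result are preserved even though absolute dates are not.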
Created:
2025-11-10



