Antimicrobial Resistance Microbiological Dataset (ARMD-UTSW): A deidentified collection of electronic health records, from a quaternary, academic medical center, for antimicrobial resistance research
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.0rxwdbsd5
Summary:
Antibiotic resistance is a global public health emergency, but high-quality, real-world, EHR-based data sets suitable for antibiotic resistance research are limited. We developed the Antibiotic Resistance Microbiology Dataset: UTSW (ARMD: UTSW), which contains microbiological culture and susceptibility results for 237,258 patients at the University of Texas Southwestern Medical Center (UTSW) from 2005 to 2025. The deidentified data set also incorporates longitudinal demographics, prior medical histories, medications, procedures, and clinical context such as testing locations and recent laboratory values. By standardizing data values and carefully deidentifying the data, we intend ARMD: UTSW to help researchers worldwide improve patient outcomes, raise awareness, and add to the collective knowledge of antibiotic resistance.
Methods
Our Antibiotic Resistance Microbiology Dataset: UTSW (ARMD: UTSW) comprises longitudinal, Epic-based EHR data from the University of Texas Southwestern Medical Center (UTSW) for adults (≥18 years old) from 2005 to 2025. It includes deidentified microbiological laboratory results for urine, blood, and respiratory cultures. Also included are patient demographics, comorbidities, socioeconomic factors captured through the area deprivation index, and prior exposure to antibiotics and procedures. All data were extracted from UTSW’s Epic Clarity database using T-SQL queries in Microsoft SQL Server Management Studio.
The raw data were then transformed into standardized values to support future research. Gender was encoded as one of two deidentified values, “0” or “1” (Null for missing gender data); each patient’s age at the time the culture was taken was bucketed into ranges such as 18-24, 25-34, etc.; and medication names were standardized to generic names for consistency. Susceptibility results were standardized to “susceptible”, “intermediate”, “resistant”, “synergism”, or “inconclusive” to reconcile the differing reporting conventions of different laboratories. Culture positivity was likewise encoded as a binary indicator, “0” or “1”, with a culture counted as positive when susceptibility results were reported for it. Finally, because a patient with an active infection may have several cultures drawn within a short period, we excluded patients who had a prior microbiological culture within the two weeks before the encounter.
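The standardization rules above can be sketched in Python. This is a minimal, hypothetical illustration, not the ARMD: UTSW pipeline (which was built with T-SQL): the raw laboratory codes in `SUSCEPTIBILITY_MAP`, the age bins after 25-34, the fallback of unrecognized values to "inconclusive", and the choice to apply the two-week rule to each culture rather than to whole patients are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical age bins: only "18-24" and "25-34" are named in the text;
# the later ten-year bins and the open-ended "85+" bin are assumptions.
AGE_BINS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 84)]

# Hypothetical raw laboratory codes mapped onto the five standardized categories.
SUSCEPTIBILITY_MAP = {
    "S": "susceptible", "SUSCEPTIBLE": "susceptible", "SENSITIVE": "susceptible",
    "I": "intermediate", "INTERMEDIATE": "intermediate",
    "R": "resistant", "RESISTANT": "resistant",
    "SYN": "synergism", "SYNERGISM": "synergism", "SYNERGY": "synergism",
}


def bucket_age(age_years: int) -> str:
    """Map the patient's age at culture time to an age-range label."""
    if age_years < 18:
        raise ValueError("the cohort is restricted to adults (>= 18 years)")
    for low, high in AGE_BINS:
        if low <= age_years <= high:
            return f"{low}-{high}"
    return "85+"


def standardize_susceptibility(raw: str) -> str:
    """Normalize a raw interpretation; unrecognized values fall back to 'inconclusive'."""
    return SUSCEPTIBILITY_MAP.get(raw.strip().upper(), "inconclusive")


def culture_positive(susceptibility_results: list[str]) -> int:
    """Binary positivity flag: 1 if any susceptibility results were reported, else 0."""
    return 1 if susceptibility_results else 0


def drop_repeat_cultures(
    cultures: list[tuple[str, datetime]], window: timedelta = timedelta(days=14)
) -> list[tuple[str, datetime]]:
    """Drop a culture when the same patient had another culture within `window` before it.

    `cultures` holds (patient_id, collected_at) pairs. Each culture is compared with the
    patient's immediately preceding culture, whether or not that one was kept.
    """
    kept: list[tuple[str, datetime]] = []
    previous: dict[str, datetime] = {}
    for patient_id, collected_at in sorted(cultures):  # sorts by patient, then time
        last = previous.get(patient_id)
        if last is None or collected_at - last > window:
            kept.append((patient_id, collected_at))
        previous[patient_id] = collected_at
    return kept


if __name__ == "__main__":
    print(bucket_age(29))                        # 25-34
    print(standardize_susceptibility(" r "))     # resistant
    demo = [
        ("p1", datetime(2020, 1, 1)),
        ("p1", datetime(2020, 1, 10)),  # 9 days after the previous culture: dropped
        ("p1", datetime(2020, 2, 1)),   # 22 days after the previous culture: kept
    ]
    print(len(drop_repeat_cultures(demo)))       # 2
```

Comparing each culture with the immediately preceding one means a run of closely spaced cultures collapses to its first member; a stricter per-patient exclusion, as the text literally describes, would instead drop every culture of any patient who triggers the rule.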
Deidentification followed the HIPAA Safe Harbor method. All patient identifiers were either omitted or anonymized. For example, patient ID numbers, encounter ID numbers, and culture order ID numbers were each anonymized through a separate randomization process that keeps all records belonging to the same patient linked. Patient ZIP codes were used to look up area deprivation index values but were removed from the final data set. As mentioned previously, patient ages were aggregated into age ranges and gender was concealed as “0”, “1”, or “Null”. All dates and times were shifted by a randomly assigned offset specific to each patient. Because every timestamp for a given patient is shifted by the same amount, the intervals between that patient’s events, and thus the temporal relationships within each patient’s record, are preserved.
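The two deidentification steps that must preserve linkage, consistent ID randomization and per-patient date shifting, can be illustrated with a minimal Python sketch. The class names, the token format, and the ±365-day offset bound are hypothetical assumptions; the source does not describe how its randomization was implemented.

```python
import secrets
from datetime import datetime, timedelta


class Pseudonymizer:
    """Replace real identifiers with random tokens, reusing a token for repeat IDs."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix                 # hypothetical label, e.g. "PAT" or "ENC"
        self._tokens: dict[str, str] = {}    # real ID -> random token
        self._used: set[str] = set()

    def __call__(self, real_id: str) -> str:
        if real_id not in self._tokens:
            token = self.prefix + secrets.token_hex(8)
            while token in self._used:       # regenerate in the unlikely event of a collision
                token = self.prefix + secrets.token_hex(8)
            self._used.add(token)
            self._tokens[real_id] = token
        return self._tokens[real_id]


class DateShifter:
    """Shift all of a patient's timestamps by one random, nonzero, per-patient offset."""

    def __init__(self, max_days: int = 365) -> None:
        self.max_days = max_days             # assumed bound; the source gives no range
        self._offsets: dict[str, timedelta] = {}

    def offset_for(self, patient_id: str) -> timedelta:
        if patient_id not in self._offsets:
            magnitude = 1 + secrets.randbelow(self.max_days)  # 1..max_days days
            sign = secrets.choice((-1, 1))
            self._offsets[patient_id] = timedelta(days=sign * magnitude)
        return self._offsets[patient_id]

    def shift(self, patient_id: str, when: datetime) -> datetime:
        return when + self.offset_for(patient_id)


if __name__ == "__main__":
    patient_ids = Pseudonymizer("PAT")
    encounter_ids = Pseudonymizer("ENC")     # separate mapping per identifier type
    print(patient_ids("12345") == patient_ids("12345"))  # True
    print(encounter_ids("98765").startswith("ENC"))      # True
    shifter = DateShifter()
    first = shifter.shift("12345", datetime(2020, 1, 1))
    second = shifter.shift("12345", datetime(2020, 1, 15))
    print(second - first)                    # 14 days, 0:00:00
```

Because each patient's offset is drawn once and reused, intervals between that patient's events survive the shift while the calendar dates do not; the offset is kept nonzero in this sketch so that no patient retains real dates.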
Created: 2025-09-26



