Anonymized Query Log Dataset for the AKTIN Infrastructure

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/14509530

Resource description:

General Description
This dataset contains anonymized log data derived from the operations of the AKTIN Federated Data Access Authorization System. The AKTIN infrastructure provides secure, federated access to electronic health records (EHRs) from emergency departments (EDs) in Germany. The data reflect query operations conducted over the AKTIN Broker middleware and are published to accompany the manuscript "Pioneering Federated Data Access for a Learning Healthcare System: Implementation Report of the Federated Data Access Authorization System of the German National Emergency Department Data Registry". The dataset is suitable for analyzing query efficiency and query success rates, evaluating system performance, and understanding federated data access workflows.

Content of the Dataset
The file aktin_broker_query_metadata_anonymized.csv contains anonymized query-level metadata. It was derived from raw broker log files and processed to protect privacy. It contains the following columns:
request_id: A sequential, anonymized query identifier that replaces the original request ID.
node_id: An anonymized unique identifier for each ED node (participating emergency department).
last_status: The last recorded status of a query (e.g., completed, rejected, failed).
time_until_rejection (days): Time in days between query retrieval and rejection, if applicable.
processing_time (seconds): Time in seconds taken to execute and process the query.
time_until_completed (days): Time in days between query retrieval and successful completion.
year: The calendar year in which the query was executed.
automatic_rule: Logical flag (TRUE/FALSE) indicating whether the query followed an automatic rule (e.g., periodic or pre-approved queries).

Data Origin
Source: raw log data from the AKTIN Broker middleware, capturing the communication between the AKTIN Broker and individual ED nodes.

Anonymization Process
The dataset was anonymized in R.
Specifically, request_id and node_id were replaced with sequential numeric values, and sensitive timestamps were converted into derived time intervals (time_until_completed, time_until_rejection). These intervals were calculated with R's difftime() and mutate() functions. For example, time_until_completed was computed as:
    time_until_completed = round(as.numeric(difftime(completed, retrieved, units = "days")), 1)
that is, the difference between the completion and retrieval timestamps in days, rounded to one decimal place.

Purpose
This dataset can be used for:
Performance evaluation of federated query systems.
Analysis of query completion times and patterns.
Understanding the adoption of automatic query rules.
Supporting future research on distributed health data infrastructures.

File Information
File name: data_analysis_anonymous.csv
File type: CSV (Comma-Separated Values)
Encoding: UTF-8

Usage and License
This dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Proper attribution is required when using this dataset.

Contact Information
Author: Jonas Bienzeisler
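
The interval formula and the analyses listed under Purpose can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' R code. It assumes that the CSV header uses the column names exactly as documented (unit suffixes such as "time_until_completed (days)" included), that last_status uses the literal value "completed", that automatic_rule is written as TRUE/FALSE, and that missing intervals appear as empty cells or "NA" (the default of R's write.csv). The helper names days_between and summarize and the SAMPLE rows are illustrative inventions, not part of the dataset. days_between mirrors the R expression above, although R and Python rounding could differ in rare half-way cases.

```python
import csv
import io
from datetime import datetime
from statistics import median

# Toy rows invented for illustration; they are NOT taken from the AKTIN dataset.
# The header assumes the column names exactly as documented, unit suffixes included.
SAMPLE = """request_id,node_id,last_status,time_until_rejection (days),processing_time (seconds),time_until_completed (days),year,automatic_rule
1,1,completed,,12,0.5,2023,TRUE
2,2,rejected,3.0,NA,NA,2023,FALSE
3,1,completed,,8,1.5,2024,TRUE
"""

# Assumed encodings of a missing value: empty cell or R's default "NA".
MISSING = {"", "NA"}


def days_between(retrieved: datetime, completed: datetime) -> float:
    """Python analogue of round(as.numeric(difftime(completed, retrieved, units = "days")), 1)."""
    # difftime with units = "days" expresses the elapsed time as seconds / 86,400.
    return round((completed - retrieved).total_seconds() / 86400, 1)


def summarize(csv_text: str) -> dict:
    """Per-year completion share, automatic-rule share and median days to completion."""
    per_year: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        stats = per_year.setdefault(
            row["year"], {"total": 0, "completed": 0, "automatic": 0, "days": []}
        )
        stats["total"] += 1
        if row["automatic_rule"] == "TRUE":
            stats["automatic"] += 1
        if row["last_status"] == "completed":
            stats["completed"] += 1
            value = row["time_until_completed (days)"]
            if value not in MISSING:  # skip missing intervals instead of counting them as zero
                stats["days"].append(float(value))
    return {
        year: {
            "success_rate": s["completed"] / s["total"],
            "automatic_share": s["automatic"] / s["total"],
            "median_days_to_completion": median(s["days"]) if s["days"] else None,
        }
        for year, s in sorted(per_year.items())
    }


if __name__ == "__main__":
    print(days_between(datetime(2024, 1, 1), datetime(2024, 1, 2, 12)))
    for year, summary in summarize(SAMPLE).items():
        print(year, summary)
```

To apply the same logic to the downloaded file, its text could be read with open(path, encoding="utf-8", newline="") and passed to summarize, provided the column-name and missing-value assumptions above hold for the real header.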

Created:

2024-12-17
