
Anonymized Query Log Dataset for the AKTIN Infrastructure

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14509530
Description:
General Description

This dataset contains anonymized log data derived from the operations of the AKTIN Federated Data Access Authorization System. The AKTIN infrastructure facilitates secure, federated access to electronic health records (EHRs) from emergency departments (EDs) in Germany. The data reflect query operations conducted over the AKTIN Broker middleware and are published to accompany the manuscript: "Pioneering Federated Data Access for a Learning Healthcare System: Implementation Report of the Federated Data Access Authorization System of the German National Emergency Department Data Registry". The dataset is suitable for analyses of query efficiency, query success rates, system performance evaluation, and understanding federated data access workflows.

Content of the Dataset

The dataset aktin_broker_query_metadata_anonymized.csv includes anonymized query-level metadata. It was derived from raw broker log files and processed to ensure privacy protection. Specifically:

- request_id: A sequential, anonymized query identifier replacing the original request ID.
- node_id: Anonymized unique identifier for the ED nodes (participating emergency departments).
- last_status: The last recorded status of a query (e.g., completed, rejected, failed).
- time_until_rejection (days): Time (in days) between query retrieval and rejection, if applicable.
- processing_time (seconds): Time (in seconds) taken to execute and process the query.
- time_until_completed (days): Time (in days) between query retrieval and successful completion.
- year: The calendar year in which the query was executed.
- automatic_rule: Logical flag (TRUE/FALSE) indicating whether the query followed an automatic rule (e.g., periodic or pre-approved queries).

Data Origin

Source: Raw log data from the AKTIN Broker middleware, capturing the communication between the AKTIN Broker and individual ED nodes.

Anonymization Process

The dataset has been anonymized using R syntax. Specifically:

- request_id and node_id were replaced with sequential numeric values to ensure data privacy.
- Sensitive timestamps were converted into derived time intervals (e.g., time_until_completed, time_until_rejection).
- The calculation of these derived intervals was performed with high precision using R's difftime and mutate functions.

Example calculation for time_until_completed:

    time_until_completed = round(as.numeric(difftime(completed, retrieved, units = "days")), 1)

Purpose

This dataset can be used for:

- Performance evaluation of federated query systems.
- Analysis of query completion times and patterns.
- Understanding the adoption of automatic query rules.
- Supporting future research on distributed health data infrastructures.

File Information

- File Name: data_analysis_anonymous.csv
- File Type: CSV (Comma Separated Values)
- Encoding: UTF-8

Usage and License

This dataset is shared under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. Proper attribution is required when using this dataset.

Contact Information

Author: Jonas Bienzeisler
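As a rough illustration of the derived intervals and of a simple status analysis, the following Python sketch mirrors the R difftime expression quoted in the description and tallies last_status values. It is not the authors' code. The function names, the timestamps, and the sample rows are hypothetical, and the sample uses only a subset of the column names from the description; the exact header spelling in the published CSV is an assumption and should be checked against the file. Rounding of exact ties may also differ slightly between R and Python.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def interval_days(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps in days, rounded to one decimal.

    Python analogue of:
    round(as.numeric(difftime(end, start, units = "days")), 1)
    """
    return round((end - start).total_seconds() / 86400, 1)


# Hypothetical timestamps: the published file holds only derived intervals,
# not the original retrieval/completion times.
retrieved = datetime(2023, 3, 1, 8, 0, 0)
completed = datetime(2023, 3, 4, 20, 0, 0)
elapsed = interval_days(retrieved, completed)  # 3 days 12 hours

# Hypothetical rows using column names taken from the description.
SAMPLE = """request_id,node_id,last_status,year,automatic_rule
1,1,completed,2023,TRUE
2,2,rejected,2023,FALSE
3,1,completed,2023,TRUE
4,2,failed,2023,FALSE
"""


def status_counts(handle) -> Counter:
    """Count how often each last_status value occurs in a CSV stream."""
    return Counter(row["last_status"] for row in csv.DictReader(handle))


counts = status_counts(io.StringIO(SAMPLE))
completion_rate = counts["completed"] / sum(counts.values())

print(elapsed)
print(dict(counts))
print(completion_rate)
```

To run the tally on the real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) to `status_counts` instead of the in-memory sample.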
Created:
2024-12-17