Anonymized Query Log Dataset for the AKTIN Infrastructure
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14509530
Resource description:
General Description
This dataset contains anonymized log data derived from the operations of the AKTIN Federated Data Access Authorization System. The AKTIN infrastructure facilitates secure, federated access to electronic health records (EHRs) from emergency departments (EDs) in Germany. The data reflect query operations conducted over the AKTIN Broker middleware and are published to accompany the manuscript:
"Pioneering Federated Data Access for a Learning Healthcare System: Implementation Report of the Federated Data Access Authorization System of the German National Emergency Department Data Registry".
The dataset is suitable for analyses of query efficiency, query success rates, system performance evaluation, and understanding federated data access workflows.
Content of the Dataset
The dataset aktin_broker_query_metadata_anonymized.csv includes anonymized query-level metadata. It was derived from raw broker log files and processed to ensure privacy protection. It contains the following columns:
request_id: A sequential, anonymized query identifier replacing the original request ID.
node_id: Anonymized unique identifier for the ED nodes (participating emergency departments).
last_status: The last recorded status of a query (e.g., completed, rejected, failed).
time_until_rejection (days): Time (in days) between query retrieval and rejection, if applicable.
processing_time (seconds): Time (in seconds) taken to execute and process the query.
time_until_completed (days): Time (in days) between query retrieval and successful completion.
year: The calendar year in which the query was executed.
automatic_rule: Logical flag (TRUE/FALSE) indicating whether the query followed an automatic rule (e.g., periodic or pre-approved queries).
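As a rough illustration of the column list above, the following R sketch parses two made-up rows and checks that each column has the type the description suggests. The header spellings (without the units in parentheses) and all sample values are assumptions; the header in the published CSV may differ.

```r
# Two invented rows that follow the column list above. The exact header
# spelling in the published CSV is an assumption.
csv_text <- paste(
  "request_id,node_id,last_status,time_until_rejection,processing_time,time_until_completed,year,automatic_rule",
  "1,1,completed,NA,42.5,0.3,2022,TRUE",
  "2,2,rejected,1.2,NA,NA,2023,FALSE",
  sep = "\n"
)

queries <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Column types implied by the description, used as a quick sanity check.
expected <- c(
  request_id = "integer", node_id = "integer", last_status = "character",
  time_until_rejection = "numeric", processing_time = "numeric",
  time_until_completed = "numeric", year = "integer", automatic_rule = "logical"
)
actual <- vapply(queries, function(col) class(col)[1], character(1))
type_ok <- actual == expected[names(actual)]

print(data.frame(column = names(actual), type = actual, ok = type_ok,
                 row.names = NULL))
```

To run the same check on the real file, replace the `text =` argument with the path to the downloaded CSV and adjust the names in `expected` to match its header.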
Data Origin
Source: Raw log data from the AKTIN Broker middleware, capturing the communication between the AKTIN Broker and individual ED nodes.
Anonymization Process
The dataset was anonymized with an R script. Specifically:
request_id and node_id were replaced with sequential numeric values to ensure data privacy.
Sensitive timestamps were converted into derived time intervals (e.g., time_until_completed, time_until_rejection).
These intervals were calculated with the difftime and mutate functions in R and, as the example shows, rounded to one decimal place.
Example calculation for time_until_completed:
time_until_completed = round(as.numeric(difftime(completed, retrieved, units = "days")), 1)
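The anonymization steps above can be sketched in base R. The raw log layout is not published, so the input columns (request_id, node_id, retrieved, completed), the sample values, and the use of match() to assign sequential identifiers are assumptions made for illustration. Only the time_until_completed expression is taken from the description.

```r
# Hypothetical raw log extract; identifiers and timestamps are invented.
raw <- data.frame(
  request_id = c("req-a", "req-a", "req-b"),
  node_id    = c("node-x", "node-y", "node-x"),
  retrieved  = as.POSIXct(c("2023-03-01 08:00:00", "2023-03-01 09:00:00",
                            "2023-04-10 12:00:00"), tz = "UTC"),
  completed  = as.POSIXct(c("2023-03-03 20:00:00", "2023-03-01 21:00:00",
                            "2023-04-17 12:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Assumed helper: number each distinct value by order of first appearance.
to_sequential <- function(x) match(x, unique(x))

anonymized <- data.frame(
  request_id = to_sequential(raw$request_id),
  node_id    = to_sequential(raw$node_id),
  # Interval in days, rounded to one decimal place, as in the expression above.
  time_until_completed = round(
    as.numeric(difftime(raw$completed, raw$retrieved, units = "days")), 1
  )
)

print(anonymized)
```

The original timestamps do not appear in the output, only the derived interval. Base R's data.frame() is used here in place of mutate so that the sketch runs without additional packages.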
Purpose
This dataset can be used for:
Performance evaluation of federated query systems.
Analysis of query completion times and patterns.
Understanding the adoption of automatic query rules.
Supporting future research on distributed health data infrastructures.
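A minimal sketch of the first three uses, assuming the column names listed under Content of the Dataset and using invented rows in place of the real file:

```r
# Invented sample rows; replace with the data frame read from the published CSV.
queries <- data.frame(
  last_status          = c("completed", "completed", "rejected", "failed", "completed"),
  time_until_completed = c(0.5, 2.0, NA, NA, 7.0),
  automatic_rule       = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Share of queries that ended in each status.
status_share <- prop.table(table(queries$last_status))

# Completion rate and median completion time (days), split by automatic_rule.
completion_rate <- tapply(queries$last_status == "completed",
                          queries$automatic_rule, mean)
median_days <- tapply(queries$time_until_completed,
                      queries$automatic_rule, median, na.rm = TRUE)

print(status_share)
print(completion_rate)
print(median_days)
```

Queries that were rejected or failed have no completion time, so na.rm = TRUE keeps them out of the median while they still count toward the completion rate.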
File Information
File Name: data_analysis_anonymous.csv
File Type: CSV (Comma Separated Values)
Encoding: UTF-8
Usage and License
This dataset is shared under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Proper attribution is required when using this dataset.
Contact Information
Author: Jonas Bienzeisler
Created:
2024-12-17



