Dataset for Evaluating Semantic Evolution Handling in MellowDB: Query Rewriting vs. Data Preprocessing
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/ytfdw568kk
Description:
This dataset supports a research article investigating how to manage semantic heterogeneity in long-term data analysis using MellowDB, a system designed to handle semantic evolution. It enables reproducible experiments comparing two alternative approaches—query rewriting and data preprocessing—under real-world conditions.
The dataset includes:
Original Dataset: Data collected from the Brazilian public health system, used as the baseline prior to semantic changes.
Semantic Evolution Operations: A curated sequence of transformations simulating realistic evolution scenarios.
Raw Experimental Results: Outputs from performance evaluations for both query rewriting and data preprocessing under various evolution stages, with and without indexing.
R Analysis Script: A reproducible script that generates the plots and tables used in the article, highlighting comparative performance results.
The results show that both approaches enable users to query data without being aware of previous semantic changes, provided that the evolution operations are stored. Preprocessing generally yields faster query times but at a higher storage cost, while rewriting defers semantic handling to query time and is more efficient for write-heavy scenarios. The experiments demonstrate that both approaches are viable for production use, and that the choice between them should depend on the query-to-insertion ratio. Indexing strategies significantly affect performance, with optimal configurations reducing query overhead to as little as 0.01 seconds.
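The difference between the two approaches can be illustrated with a toy example. The sketch below is not MellowDB code and does not use its API; the record layout, the `EVOLUTION_OPS` log, the unit names, and all function names are hypothetical, chosen only to show the trade-off described above. It assumes a single kind of evolution operation (an old value being renamed or merged into a new one). Query rewriting leaves stored records untouched and expands the query at read time, while preprocessing translates values once and stores an index keyed by the current vocabulary.

```python
from collections import defaultdict

# Hypothetical stored evolution log: (old_value, new_value) pairs, in order.
# Here two health units were merged into one.
EVOLUTION_OPS = [("unit_a", "unit_ab"), ("unit_b", "unit_ab")]

# Hypothetical records, some written before the merge and some after.
RECORDS = [
    {"id": 1, "unit": "unit_a"},
    {"id": 2, "unit": "unit_b"},
    {"id": 3, "unit": "unit_ab"},
    {"id": 4, "unit": "unit_c"},
]


def equivalents(value, ops):
    """Collect every historical value that maps to `value` under `ops`."""
    result = {value}
    changed = True
    while changed:
        changed = False
        for old, new in ops:
            if new in result and old not in result:
                result.add(old)
                changed = True
    return result


def query_rewriting(records, ops, value):
    """Rewrite the query at read time; stored data is left as written."""
    wanted = equivalents(value, ops)
    return [r["id"] for r in records if r["unit"] in wanted]


def translate(value, ops):
    """Apply the evolution operations in order to a single stored value."""
    for old, new in ops:
        if value == old:
            value = new
    return value


def preprocess(records, ops):
    """Translate once, ahead of time, and keep an extra index (more storage)."""
    index = defaultdict(list)
    for r in records:
        index[translate(r["unit"], ops)].append(r["id"])
    return index


def query_preprocessed(index, value):
    """Queries become a direct lookup in the translated index."""
    return index.get(value, [])


rewritten = query_rewriting(RECORDS, EVOLUTION_OPS, "unit_ab")
preprocessed = query_preprocessed(preprocess(RECORDS, EVOLUTION_OPS), "unit_ab")
print(rewritten, preprocessed)
```

In this sketch both paths return the same records for the merged unit. The rewriting path pays the expansion cost on every query, whereas the preprocessing path pays it once per evolution or insertion and stores the translated index, which mirrors the query-to-insertion trade-off reported above.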
The results can be reproduced from the original dataset and the semantic evolution operations. The full system code and the simulation scripts are available at https://github.com/pisn/semantic_heterogeneous_database
Created:
2025-06-05



