Dataset for Evaluating Semantic Evolution Handling in MellowDB: Query Rewriting vs. Data Preprocessing

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/ytfdw568kk
Resource description:
This dataset supports a research article on managing semantic heterogeneity in long-term data analysis with MellowDB, a system designed to handle semantic evolution. It enables reproducible experiments that compare two alternative approaches, query rewriting and data preprocessing, under real-world conditions.

The dataset includes:

Original Dataset: Data collected from the Brazilian public health system, used as the baseline prior to semantic changes.
Semantic Evolution Operations: A curated sequence of transformations simulating realistic evolution scenarios.
Raw Experimental Results: Outputs from performance evaluations of both query rewriting and data preprocessing at various evolution stages, with and without indexing.
R Analysis Script: A reproducible script that generates the plots and tables used in the article, highlighting comparative performance results.

The results show that both approaches let users query data without being aware of previous semantic changes, provided that the evolution operations are stored. Preprocessing generally yields faster query times but at a higher storage cost, while rewriting defers semantic handling to query time and is more efficient for write-heavy scenarios. The experiments demonstrate that both approaches are viable for production use, and that the choice between them should depend on the query-to-insertion ratio. Indexing strategies significantly affect performance, with optimal configurations reducing query overhead to as little as 0.01 seconds.

The results can be reproduced from the original dataset and the semantic evolution operations. The full system code and the simulation scripts are available at https://github.com/pisn/semantic_heterogeneous_database
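The dependence on the query-to-insertion ratio can be illustrated with a minimal cost sketch. This is not MellowDB code and not the article's model: the class `ApproachCost`, the functions `cheaper_approach` and `break_even_ratio`, and every cost value below are hypothetical placeholders in arbitrary units, chosen only to show the shape of the trade-off. The sketch assumes a fixed cost per query and per insertion for each approach, and it ignores the storage cost that the description attributes to preprocessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApproachCost:
    """Hypothetical per-operation costs in arbitrary units (not measured values)."""
    per_query: float
    per_insert: float

    def total(self, n_queries: int, n_inserts: int) -> float:
        # Linear workload cost: every query and every insertion costs a fixed amount.
        return n_queries * self.per_query + n_inserts * self.per_insert


def cheaper_approach(rewriting: ApproachCost, preprocessing: ApproachCost,
                     n_queries: int, n_inserts: int) -> str:
    """Return the name of the approach with the lower total cost (ties go to rewriting)."""
    r = rewriting.total(n_queries, n_inserts)
    p = preprocessing.total(n_queries, n_inserts)
    return "rewriting" if r <= p else "preprocessing"


def break_even_ratio(rewriting: ApproachCost, preprocessing: ApproachCost) -> float:
    """Queries per insertion at which both approaches cost the same.

    Assumes rewriting has the costlier queries and preprocessing the costlier
    insertions; otherwise one approach dominates and no break-even exists.
    """
    extra_query = rewriting.per_query - preprocessing.per_query
    extra_insert = preprocessing.per_insert - rewriting.per_insert
    if extra_query <= 0 or extra_insert <= 0:
        raise ValueError("one approach dominates; no break-even ratio")
    return extra_insert / extra_query


# Placeholder costs: rewriting pays at query time, preprocessing at insertion time.
REWRITING = ApproachCost(per_query=4.0, per_insert=1.0)
PREPROCESSING = ApproachCost(per_query=1.0, per_insert=7.0)

if __name__ == "__main__":
    print(break_even_ratio(REWRITING, PREPROCESSING))
    print(cheaper_approach(REWRITING, PREPROCESSING, n_queries=10, n_inserts=100))
    print(cheaper_approach(REWRITING, PREPROCESSING, n_queries=1000, n_inserts=100))
```

Under these placeholder costs, workloads with fewer queries per insertion than the break-even ratio favor rewriting, and workloads above it favor preprocessing. Real per-operation costs would have to come from the raw experimental results in the dataset.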
Created:
2025-06-05