
Resources of IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10171156
Description:
IncRML resources

This Zenodo dataset contains all the resources of the paper 'IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources', submitted to the Semantic Web Journal's Special Issue on Knowledge Graph Construction. It aims to make the paper's experiments fully reproducible through our experiment tool, written in Python, which was previously used in the Knowledge Graph Construction Challenge by the ESWC 2023 Workshop on Knowledge Graph Construction. The exact Java JAR file of the RMLMapper (rmlmapper.jar) that was used to execute the experiments is also provided in this dataset. This JAR file was executed with Java OpenJDK 11.0.20.1 on Ubuntu 22.04.1 LTS (Linux 5.15.0-53-generic). Each experiment was executed 5 times, and the median values are reported together with the standard deviation of the measurements.

Datasets

We provide dataset dumps of both the GTFS-Madrid-Benchmark and real-life use cases from Open Data in Belgium. The GTFS-Madrid-Benchmark dumps are used to analyze the impact on execution time and resources, while the real-life use cases verify the approach on different types of datasets, since the GTFS-Madrid-Benchmark is a single type of dataset which does not advertise changes at all.
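The reporting convention described above (median and standard deviation over 5 runs) can be sketched as follows. This is a minimal illustration, not the experiment tool's own code: the function name `summarize_runs` and the sample timings are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the description does not say which variant is reported.

```python
import statistics


def summarize_runs(measurements):
    """Return (median, standard deviation) for one experiment's repeated runs."""
    if len(measurements) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    # Assumption: sample standard deviation (n - 1 denominator). The source
    # does not state whether sample or population deviation is reported.
    return statistics.median(measurements), statistics.stdev(measurements)


# Hypothetical execution times in seconds for 5 runs (illustrative values only).
example_runs = [12.0, 11.0, 13.0, 12.0, 12.0]
median, deviation = summarize_runs(example_runs)
print(f"median={median:.2f}s stdev={deviation:.2f}s")
```

The median is less sensitive than the mean to a single slow run, which is presumably why it is paired with the deviation rather than replaced by it.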
Benchmarks

- GTFS-Madrid-Benchmark: change types with fixed data size and amount of changes: additions-only, modifications-only, deletions-only (11 versions)
- GTFS-Madrid-Benchmark: amount of changes with fixed data size: 0%, 25%, 50%, 75%, and 100% changes (11 versions)
- GTFS-Madrid-Benchmark: data size with fixed amount of changes: scales 1, 10, 100 (11 versions)

Real-world datasets

- Traffic control center Vlaams Verkeerscentrum (Belgium): traffic board messages data (1 day, 28760 versions)
- Meteorological institute KMI (Belgium): weather sensor data (1 day, 144 versions)
- Public transport agency NMBS (Belgium): train schedule data (1 week, 7 versions)
- Public transport agency De Lijn (Belgium): bus schedule data (1 week, 7 versions)
- Bike-sharing company BlueBike (Belgium): bike-sharing availability data (1 day, 1440 versions)
- Bike-sharing company JCDecaux (EU): bike-sharing availability data (1 day, 1440 versions)
- OpenStreetMap (World): geographical map data (1 day, 1440 versions)

Ingestion

The LDES output of the real-world datasets was converted into SPARQL UPDATE queries and executed against Virtuoso to estimate, for non-LDES clients, how incremental generation impacts ingestion into triplestores.

Remarks

- The first version of each dataset is always used as a baseline. All subsequent versions are applied as updates on the existing version. The reported results focus only on the updates, since these constitute the actual incremental generation.
- The GTFS-Change-50_percent-{ALL, CHANGE}.tar.xz datasets are not uploaded separately because they share the same parameters as GTFS-Madrid-Benchmark scale 100 (50% changes, scale 100). Please use GTFS-Scale-100-{ALL, CHANGE}.tar.xz instead.
- All datasets are compressed with XZ and provided as TAR archives. Be aware that you need sufficient space to decompress these archives: 2 TB of free space is advised to decompress all benchmarks and use cases.
- The expected output is provided as a ZIP file in each TAR archive; decompressing these requires even more space (4 TB).

Reproducing

Using our experiment tool, you can reproduce the experiments as follows:

1. Download one of the TAR.XZ archives and unpack it.
2. Clone the GitHub repository of our experiment tool and install the Python dependencies with 'pip install -r requirements.txt'.
3. Download the rmlmapper.jar JAR file from this Zenodo dataset and place it inside the experiment tool root folder.
4. Execute the tool by running: './exectool --root=/path/to/the/root/of/the/tarxz/archive --runs=5 run'. The argument '--runs=5' performs the experiment 5 times.
5. Once executed, generate the statistics by running: './exectool --root=/path/to/the/root/of/the/tarxz/archive stats'.

Testcases

Testcases to verify the integration of RML and LDES with IncRML are available at https://doi.org/10.5281/zenodo.10171394
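The reproduction steps above can also be scripted. The sketch below is a hedged illustration rather than part of the published tooling: the helper names (`has_enough_space`, `unpack_archive`, `build_commands`, `reproduce`) are hypothetical, and it assumes the experiment tool has already been cloned with its dependencies installed, that rmlmapper.jar sits in the tool's root folder, and that the archive unpacks so that `target_dir` is the root expected by '--root'. Only the command-line arguments quoted in the steps are used.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def has_enough_space(target_dir, required_bytes):
    """Check free disk space before unpacking; target_dir must already exist."""
    return shutil.disk_usage(target_dir).free >= required_bytes


def unpack_archive(archive_path, target_dir):
    """Unpack a TAR.XZ archive into target_dir."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        archive.extractall(target_dir)


def build_commands(root, runs=5, tool="./exectool"):
    """Build the two invocations quoted in the steps: 'run', then 'stats'."""
    return [
        [tool, f"--root={root}", f"--runs={runs}", "run"],
        [tool, f"--root={root}", "stats"],
    ]


def reproduce(archive_path, target_dir, tool_dir, required_bytes):
    """Unpack one archive, then run the experiment tool from its root folder."""
    if not has_enough_space(target_dir, required_bytes):
        raise RuntimeError("not enough free disk space to unpack the archive")
    unpack_archive(archive_path, target_dir)
    # Assumption: the unpacked target_dir is the root the tool expects.
    for command in build_commands(Path(target_dir).resolve()):
        # cwd is the cloned tool folder, where rmlmapper.jar must already be placed.
        subprocess.run(command, cwd=tool_dir, check=True)
```

A call such as `reproduce("GTFS-Scale-100-ALL.tar.xz", "/data/incrml", "/path/to/experiment/tool", required_bytes)` would then run both steps; the directory paths here are placeholders, and `required_bytes` should be chosen with the space advice from the remarks in mind.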
Created:
2024-12-13