
Data underlying the publication: A Ground Truth Approach for Assessing Process Mining Techniques

4TU.ResearchData, updated 2025-02-04, indexed 2026-04-23
Download link:
https://data.4tu.nl/datasets/bc43e334-74e1-44ff-abf1-ed32847250c9/1
Description:
This folder contains the synthetically generated dataset (process model and event logs) for a synthetically designed package delivery process, as described in [1]. Each event log is a simulation of a process model with one incorporated issue: either a behavioral deviation, i.e., the process is exhibited differently from the expected behavior described by the process model, or a recording error, i.e., the execution of the process is recorded differently from how it is exhibited. Each issue is added to the process model through a model transformation, which provides ground truth for the discrepancies introduced in the simulated event log.

The package delivery process starts with the choice of home or depot delivery, after which the package queues for a warehouse employee to pick it and load it into a van. In case of home delivery, the courier drives off and rings at the door, then either hands over the package immediately or, after registration, delivers it to the corresponding depot, where it is left for collection. Alternatively, for depot delivery, "ringing" and therefore also "deliver at home" are omitted from the subprocess. models/delivery_base_model.json contains the specification of the process model that incorporates this "expected behavior"; the model is depicted in models/delivery_base_model.pdf.

On top of this, six patterns of behavioral deviations (BI) and six patterns of recording errors (RI) are applied to the base model:
BI5: Overtaking in the FIFO queue for picking packages;
BI7: Switching roles from a courier to that of a warehouse employee;
BI10: Batching is ignored, leaving with a delivery van before it was fully loaded;
BI3: Skipping the activity of ringing, modeling behavior where, e.g., the door was already opened upon arrival;
BI9: Different resource memory, where the package is delivered to a different depot than where it is registered;
BI2: Multitasking of couriers during the delivery of multiple packages, modeling interruption of a delivery;
RI1: Incorrect event, recording an order for depot delivery when it was intended for home delivery;
RI2: Incorrect event, vice versa, i.e., recording an order for home delivery when it was intended for depot delivery;
RI3: Missing event for the activity of loading a package in a truck;
RI4: Missing object of the involved van for loading, e.g., due to a temporary connection failure of a recording device;
RI5: Incorrect object of the involved courier when ringing, e.g., due to the courier on the previous shift not logging out;
RI6: Missing positions for the recording of the delivery and the collection at a depot, e.g., due to coarse timestamp logging.

The behavior of each deviation pattern is added separately to the base model, resulting in twelve process models named models/package_delivery_<deviation>.json. Each model is simulated, resulting in twelve logs named logs/package_delivery_<deviation>.json. Each log is a partially ordered set of transition firings, whose elements are given by the list M, with the partial order relation specified by the matrix r, such that r[i][j] = 1 iff M[i] < M[j]. A transition firing in M is formatted as follows: [transition_name, transition_label, binding, subtracted_marking, added_marking, timestamp]. Note that the log is composed of the labels of only the labeled transition firings, i.e., those with transition_label != null. However, having the complete execution of the process model with transition names provides ground truth as to which issues are introduced in the simulated event log.

All models and the corresponding generated logs with the applied patterns are also available at gitlab.com/dominiquesommers/mira/-/tree/main/mira/simulation, which additionally includes scripts to load and process the data.

We refer to [1] for more information on the dataset.

[1] Dominique Sommers, Natalia Sidorova, Boudewijn F. van Dongen. A ground truth approach for assessing process mining techniques. arXiv preprint, https://doi.org/10.48550/arXiv.2501.14345, 2025.
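
The log format described above (a list M of transition firings plus an order matrix r) can be explored with a few lines of Python. The following is a minimal sketch, not the authors' loading script (which is in the GitLab repository). It assumes that each log file is a JSON object with top-level keys "M" and "r"; these key names, the helper function names, and the toy firings below are illustrative assumptions, and the binding and marking fields are left as empty placeholders because their exact structure is not specified here.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+


def load_log(path):
    """Read one log file. ASSUMPTION: top-level JSON keys are "M" and "r"."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["M"], data["r"]


def labeled_events(M):
    """Return (index, label) for firings whose transition_label is not null.

    Per the description, only these labeled firings make up the event log;
    the unlabeled ones are part of the ground-truth execution.
    """
    return [(i, firing[1]) for i, firing in enumerate(M) if firing[1] is not None]


def linearize(M, r):
    """Return one total order of firing indices consistent with r.

    r[i][j] == 1 means M[i] < M[j], so i is a predecessor of j.
    """
    n = len(M)
    predecessors = {j: {i for i in range(n) if r[i][j] == 1} for j in range(n)}
    return list(TopologicalSorter(predecessors).static_order())


# Toy data in the documented field order:
# [transition_name, transition_label, binding, subtracted_marking,
#  added_marking, timestamp]. All names and labels here are hypothetical.
M = [
    ["t_order", "order home delivery", {}, {}, {}, 0.0],
    ["t_silent", None, {}, {}, {}, 1.0],
    ["t_pick", "pick package", {}, {}, {}, 2.0],
]
r = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

events = labeled_events(M)
order = linearize(M, r)
print(events)
print(order)
```

For a real file, `M, r = load_log("logs/package_delivery_<deviation>.json")` with `<deviation>` replaced by an actual pattern name would take the place of the toy data, provided the assumed key names match the files. Because r is only a partial order, `linearize` returns one of possibly many valid sequences.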

Created:
2025-02-04