BPI Challenge 2019

DataCite Commons; updated 2025-05-01; indexed 2024-08-31
Download link:
https://data.4tu.nl/articles/_/12715853/1
Description:
This data originates from a large multinational company operating from The Netherlands in the area of coatings and paints. Participants are asked to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data:
(1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value entered when the item was created (indicated by both the GR-based IV flag and the Goods Receipt flag set to true).
(2) 3-way matching, invoice before goods receipt: Purchase items that require a goods receipt message but do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until the goods are received. The unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if the goods are received and the value matches both the invoice and the value at creation of the item.
(3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full, or partially until the PO value is consumed), but no separate goods receipt message is required (indicated by both the GR-based IV flag and the Goods Receipt flag set to false).
(4) Consignment: For these items, there are no invoices at PO level, as invoicing is handled entirely in a separate process. Here the Goods Receipt flag is set to true and the GR-based IV flag is set to false, and the item type (consignment) also tells us that no invoice is expected against the item.
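The four flows above can be told apart from the two case-level flags plus the item type. The following is a minimal sketch of that decision logic, not the organizers' method. The function name, the returned labels, and the literal item type string "Consignment" are assumptions made for illustration; the actual attribute values in the log may be spelled differently.

```python
from typing import Optional


def classify_item(gr_based_iv: bool, goods_receipt: bool,
                  item_type: Optional[str] = None) -> str:
    """Map the two case-level flags (and the item type) to one of the four flows.

    The returned labels are illustrative, not the values stored in the log.
    """
    # Consignment has the same flag combination as flow (2), so the item
    # type has to be checked first. The literal "consignment" is an assumption.
    if item_type is not None and item_type.strip().lower() == "consignment":
        return "consignment"
    if goods_receipt and gr_based_iv:
        return "3-way match, invoice after GR"
    if goods_receipt and not gr_based_iv:
        return "3-way match, invoice before GR"
    if not goods_receipt and not gr_based_iv:
        return "2-way match"
    # GR-based invoicing without a goods receipt is not one of the four
    # flows described in the text, so it is reported separately.
    return "unclassified"


if __name__ == "__main__":
    print(classify_item(gr_based_iv=True, goods_receipt=True))
    print(classify_item(gr_based_iv=False, goods_receipt=True, item_type="Consignment"))
```

Checking the item type before the flags matters because flows (2) and (4) share the same flag combination.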
Unfortunately, the complexity of the data goes further than this division into four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices, which are subsequently paid. Consider, for example, the process of paying rent. There is a purchase document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices, each with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant.

The log is anonymized, but some semantics are left in the data, for example:
- The resources are split between batch users and normal users, indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process.
- The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. adding up multiple invoices for a single item should still lead to the original item value (although there may be small rounding errors for numerical reasons).
- Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them into business insights about real customers and real purchase documents.

The event log is fully IEEE-XES compliant and is structured as follows. The case ID is a combination of the purchase document and the purchase item. There are 76,349 purchase documents containing 251,734 items in total, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users).
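The matching requirement and the remark about rounding errors suggest comparing amounts with a tolerance rather than with exact equality. The sketch below illustrates this on the rent example. The function name, the tolerance values, and the scaling factor `a` are hypothetical, and reading "linear translation respecting 0" as multiplication by a constant factor is an interpretation, not something the description spells out.

```python
import math
from typing import Sequence


def amounts_match(item_value: float, invoice_values: Sequence[float],
                  rel_tol: float = 1e-6) -> bool:
    """Return True if the invoices add up to the item value, up to rounding."""
    total = math.fsum(invoice_values)  # fsum limits accumulated float error
    return math.isclose(total, item_value, rel_tol=rel_tol, abs_tol=1e-9)


# Rent example from the text: one item, 12 invoices of 1/12 of the total each.
# The amount 12000.0 is made up for illustration.
rent_total = 12000.0
monthly = [rent_total / 12] * 12
print(amounts_match(rent_total, monthly))

# Assumed reading of the anonymization: every value v becomes a * v.
# Sums are preserved under such a scaling, so the check still passes.
a = 0.37  # hypothetical factor; the real one is known only to the company
print(amounts_match(a * rent_total, [a * v for v in monthly]))
```

The same comparison can be applied to goods receipt amounts where a goods receipt is required. For 2-way items that are only partially invoiced, a "less than or equal" check against the item value would be the natural variant.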
Sometimes the user field is empty, or NONE, which indicates that no user was recorded in the source system.

For each purchase item (or case) the following attributes are recorded:
- concept:name: A combination of the purchase document ID and the item ID
- Purchasing Document: The purchasing document ID
- Item: The item ID
- Item Type: The type of the item
- GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above)
- Goods Receipt: Flag indicating if 3-way matching is required (see above)
- Source: The source system of this item
- Doc. Category name: The name of the category of the purchasing document
- Company: The subsidiary of the company from where the purchase originated
- Spend classification text: A text explaining the class of purchase item
- Spend area text: A text explaining the area for the purchase item
- Sub spend area text: Another text explaining the area for the purchase item
- Vendor: The vendor to which the purchase document was sent
- Name: The name of the vendor
- Document Type: The document type
- Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment)
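Since the log is an XES file, the case attributes listed above can be read as key/value children of each trace element. The sketch below uses only the Python standard library on a hand-written fragment. All values in the fragment are made up, and it is an assumption that the flags are stored as XES boolean attributes and that the user is stored under the standard XES key `org:resource`; the real file may differ on both points.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in XES layout; every value here is invented.
SAMPLE = """<log>
  <trace>
    <string key="concept:name" value="DOC1_ITEM1"/>
    <string key="Purchasing Document" value="DOC1"/>
    <string key="Item" value="ITEM1"/>
    <boolean key="GR-Based Inv. Verif." value="false"/>
    <boolean key="Goods Receipt" value="true"/>
    <event>
      <string key="concept:name" value="some activity"/>
      <string key="org:resource" value="NONE"/>
    </event>
  </trace>
</log>"""


def local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def case_attributes(trace: ET.Element) -> dict:
    """Collect the trace-level (case) attributes, skipping the events."""
    attrs = {}
    for child in trace:
        key = child.get("key")
        if local(child.tag) == "event" or key is None:
            continue
        value = child.get("value")
        if local(child.tag) == "boolean":
            value = (value or "").lower() == "true"
        attrs[key] = value
    return attrs


def resource_or_none(event: ET.Element):
    """Return the recorded user, or None when it is missing, empty, or NONE."""
    for child in event:
        if child.get("key") == "org:resource":
            value = child.get("value")
            return None if value in (None, "", "NONE") else value
    return None


root = ET.fromstring(SAMPLE)
for trace in (el for el in root.iter() if local(el.tag) == "trace"):
    print(case_attributes(trace))
    for event in (el for el in trace if local(el.tag) == "event"):
        print(resource_or_none(event))
```

For the full log, with about 1.6 million events, a streaming parser such as `xml.etree.ElementTree.iterparse` or a dedicated process mining library would likely be a better fit than loading the whole tree at once.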

Provider:
4TU.Centre for Research Data
Created:
2019-01-31
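Dataset introduction
Background overview
The BPI Challenge 2019 dataset is an anonymized business process intelligence event log containing 251,734 purchase order cases and 1,595,923 events, intended for studying compliance questions in the purchasing process. The dataset covers data from 2018, is released under the CC BY 4.0 license, and is suited to process mining and compliance checking research.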
The summary above was collected and generated by 遇见数据集.