
VynFi/ocel-manufacturing

Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/VynFi/ocel-manufacturing
Description:
---
license: apache-2.0
task_categories:
- tabular-classification
tags:
- synthetic
- financial-data
- vynfi
- process-mining
- ocel
- manufacturing
size_categories:
- 10K<n<100K
configs:
- config_name: events
  data_files: "events/*.parquet"
- config_name: objects
  data_files: "objects/*.parquet"
- config_name: journal_entries
  data_files: "journal_entries/*.parquet"
- config_name: anomaly_labels
  data_files: "anomaly_labels/*.parquet"
---

# VynFi ocel-manufacturing

Regenerated with **DataSynth 3.1.1** (2026-04-19). What's new vs. prior releases:

- **Behavioral fraud biases now fire** on every `is_fraud` path: weekend ×32, round-dollar ×170, post-close ×3,106 lift measured on fraud-labeled entries (vs ~1× pre-3.1.1).
- **Document→JE fraud propagation** correctly sets `is_fraud_propagated` + `fraud_source_document_id` (was broken in 3.1.0, now verified on every doc-flow JE).
- **AML typology coverage** reaches the 0.80 evaluator threshold (0.000 → 0.857).
- **OCEL timestamps** are now microsecond-precision: pandas `to_datetime(..., utc=True)` retains 100% of events (was losing 95%).
- **Audit artifacts** (`audit/audit_opinions.json`, `audit/key_audit_matters.json`) + `process_variant_summary.json` always ship in the archive.

## Configs

| Config | Records |
|--------|---------|
| `events` | 37,285 |
| `objects` | 9,713 |
| `journal_entries` | 90,834 |
| `anomaly_labels` | 112 |

## Fraud breakdown (DS 3.1.1)

- **Total fraud-labeled JEs:** 222
- **Scheme-propagated** (from fraudulent source documents): 18 (8.1%)
- **Direct injection** (line-level anomaly): 204

`is_fraud_propagated` is set by DataSynth 3.1+ when a fraudulent source document fans out to its derived journal entries. Use this to split ring-level (cross-document scheme) from slip-level (isolated anomaly) fraud populations when training detection models.
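The ring-level vs. slip-level split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of DataSynth or the VynFi SDK: the helper name `split_fraud_populations` and the toy rows are hypothetical, and it assumes that `is_fraud` and `is_fraud_propagated` are boolean columns on the journal-entry records (the card names both fields but does not spell out their types or which config carries them).

```python
from typing import Iterable, List, Mapping, Tuple

Row = Mapping[str, object]


def split_fraud_populations(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Split fraud-labeled rows into (propagated, direct) lists.

    Assumes each row exposes boolean `is_fraud` and `is_fraud_propagated`
    fields; rows that are not fraud-labeled are skipped.
    """
    propagated: List[Row] = []
    direct: List[Row] = []
    for row in rows:
        if not row.get("is_fraud"):
            continue  # only fraud-labeled entries are of interest
        if row.get("is_fraud_propagated"):
            propagated.append(row)  # ring-level: inherited from a fraudulent source document
        else:
            direct.append(row)  # slip-level: isolated, directly injected anomaly
    return propagated, direct


# Hypothetical toy rows standing in for journal-entry records.
toy_rows = [
    {"entry": "JE-1", "is_fraud": True, "is_fraud_propagated": True},
    {"entry": "JE-2", "is_fraud": True, "is_fraud_propagated": False},
    {"entry": "JE-3", "is_fraud": False, "is_fraud_propagated": False},
    {"entry": "JE-4", "is_fraud": True, "is_fraud_propagated": False},
]

propagated, direct = split_fraud_populations(toy_rows)
print(len(propagated), len(direct))
```

Because iterating a Hugging Face `Dataset` yields one dict per row, the same helper should accept the `journal_entries` split directly, provided the column assumption holds for this dataset.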
## Quick start

```python
from datasets import load_dataset

ds = load_dataset("VynFi/ocel-manufacturing", name="events", split="train")
print(ds.features)
print(ds[0])
```

Or via the VynFi Python SDK (v1.5.1):

```python
import os
from vynfi import VynFi

client = VynFi(api_key=os.environ["VYNFI_API_KEY"])
job = client.jobs.generate_config(config={...})
# see https://github.com/VynFi/VynFi-python/tree/main/examples
```

See the SDK cookbook for worked examples:

- `examples/document_level_fraud.py`
- `examples/behavioral_fraud_patterns.py`
- `examples/sector_dag_presets.py`
- `examples/audit_opinions_kam.py`

## License

Apache 2.0. Entirely synthetic: no real individuals, companies, or transactions.
Provider:
VynFi