VynFi/ocel-manufacturing
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/VynFi/ocel-manufacturing
Dataset description:
---
license: apache-2.0
task_categories:
- tabular-classification
tags:
- synthetic
- financial-data
- vynfi
- process-mining
- ocel
- manufacturing
size_categories:
- 10K<n<100K
configs:
- config_name: events
  data_files: "events/*.parquet"
- config_name: objects
  data_files: "objects/*.parquet"
- config_name: journal_entries
  data_files: "journal_entries/*.parquet"
- config_name: anomaly_labels
  data_files: "anomaly_labels/*.parquet"
---
# VynFi ocel-manufacturing
Regenerated with **DataSynth 3.1.1** (2026-04-19). What's new vs. prior releases:
- **Behavioral fraud biases now fire** on every `is_fraud` path — weekend ×32, round-dollar ×170, post-close ×3,106 lift measured on fraud-labeled entries (vs ~1× pre-3.1.1).
- **Document→JE fraud propagation** correctly sets `is_fraud_propagated` + `fraud_source_document_id` (was broken in 3.1.0, now verified on every doc-flow JE).
- **AML typology coverage** reaches the 0.80 evaluator threshold (0.000 → 0.857).
- **OCEL timestamps** are now microsecond-precision: pandas `to_datetime(..., utc=True)` retains 100% of events (previously 95% were lost).
- **Audit artifacts** (`audit/audit_opinions.json`, `audit/key_audit_matters.json`) + `process_variant_summary.json` always ship in the archive.
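To illustrate the timestamp point above, here is a minimal sketch of how one might check that microsecond-precision values survive parsing. The sample strings are hypothetical and are not taken from the dataset; the real timestamp column name and string format may differ.

```python
import pandas as pd

# Hypothetical sample values in ISO 8601 with microseconds (an assumption,
# not the dataset's confirmed on-disk format).
raw = pd.Series([
    "2024-01-15T08:30:00.123456Z",
    "2024-01-15T08:30:00.123457Z",
    "2024-01-16T17:05:42.000001Z",
])

# errors="coerce" turns unparseable values into NaT so they can be counted
# instead of raising.
parsed = pd.to_datetime(raw, utc=True, errors="coerce")

# Share of values that parsed successfully.
retained = parsed.notna().mean()
print(f"retained {retained:.0%} of timestamps")

# Microsecond component of each parsed value.
print(parsed.dt.microsecond.tolist())
```

Running the same check on the real `events` config would only require swapping `raw` for the actual timestamp column.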
## Configs
| Config | Records |
|--------|---------|
| `events` | 37,285 |
| `objects` | 9,713 |
| `journal_entries` | 90,834 |
| `anomaly_labels` | 112 |
## Fraud breakdown (DS 3.1.1)
- **Total fraud-labeled JEs:** 222
- **Scheme-propagated** (from fraudulent source documents): 18 (8.1%)
- **Direct injection** (line-level anomaly): 204 (91.9%)
`is_fraud_propagated` is set by DataSynth 3.1+ when a fraudulent source
document fans out to its derived journal entries. Use this to split
ring-level (cross-document scheme) from slip-level (isolated anomaly)
fraud populations when training detection models.
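The split described above can be sketched with pandas. This is an illustrative helper, not part of DataSynth or the VynFi SDK: `split_fraud_populations` is a hypothetical name, and the toy frame (including the `je_id` column and its values) is made up for the example. Only `is_fraud`, `is_fraud_propagated`, and `fraud_source_document_id` are column names taken from this card.

```python
import pandas as pd


def split_fraud_populations(je: pd.DataFrame):
    """Return (ring_level, slip_level) subsets of fraud-labeled entries.

    Hypothetical helper: assumes boolean-like `is_fraud` and
    `is_fraud_propagated` columns, as named in this card.
    """
    fraud = je[je["is_fraud"].eq(True)]
    # Missing values compare as False, so they count as not propagated.
    propagated = fraud["is_fraud_propagated"].eq(True)
    return fraud[propagated], fraud[~propagated]


# Toy data for illustration only; `je_id` is an assumed column name.
toy = pd.DataFrame({
    "je_id": ["JE-1", "JE-2", "JE-3", "JE-4"],
    "is_fraud": [True, True, False, True],
    "is_fraud_propagated": [True, False, False, False],
    "fraud_source_document_id": ["DOC-9", None, None, None],
})

ring, slip = split_fraud_populations(toy)
print(len(ring), len(slip))
```

On the real data, the `journal_entries` config could be loaded with `load_dataset` and converted via `Dataset.to_pandas()` before calling the helper; check the column dtypes first, since they are not documented here.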
## Quick start
```python
from datasets import load_dataset
ds = load_dataset("VynFi/ocel-manufacturing", name="events", split="train")
print(ds.features)
print(ds[0])
```
Or via the VynFi Python SDK (v1.5.1):
```python
import os
from vynfi import VynFi
client = VynFi(api_key=os.environ["VYNFI_API_KEY"])
job = client.jobs.generate_config(config={...}) # see https://github.com/VynFi/VynFi-python/tree/main/examples
```
See the SDK cookbook for worked examples:
- `examples/document_level_fraud.py`
- `examples/behavioral_fraud_patterns.py`
- `examples/sector_dag_presets.py`
- `examples/audit_opinions_kam.py`
## License
Apache 2.0. Entirely synthetic — no real individuals, companies, or transactions.
Provider:
VynFi



