iMAKS (industrial Multi-Agent Knowledge extraction Synthetic dataset)

Name: iMAKS (industrial Multi-Agent Knowledge extraction Synthetic dataset)
Creator: Zenodo
Published: 2026-05-04 13:28:15
License: No description available

Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.

Download link:

https://zenodo.org/doi/10.5281/zenodo.19519975


Description:

This dataset, iMAKS (industrial Multi-Agent Knowledge extraction Synthetic dataset), supports the evaluation of a multi-agent Knowledge Graph extraction pipeline and an agentic Digital Twin applied to an industrial manufacturing facility.

1 - Context

Modern industrial facilities generate knowledge in at least four distinct forms simultaneously. Sensors stream continuous numerical measurements. Standard Operating Procedures encode alarm thresholds, maintenance rules, and access policies in natural language documents. Manufacturer datasheets specify physical limits that may or may not align with in-house procedures. Finally, the interactions among these sources (for example, a current drift at one station that correlates with a speed reduction at another ninety minutes later, linking three independent documents) constitute a fourth, more elusive kind of knowledge that no single source captures on its own.

iMAKS was built to make this complexity tractable for evaluation. It simulates five days of operations at a nine-station food packaging facility, Production Line A, generating a dataset that spans all four knowledge forms and provides exact, triple-level ground truth for each one. The facility is synthetic but physically plausible: stations are thermally and electrically coupled the way real production lines are, sensor noise follows autoregressive models calibrated to the inertial properties of each sensor class, and anomaly events are distributed at densities consistent with industrial alarm management standards.

1.1 The Facility

The core of the facility is a four-station production chain in the Production Area, connected in sequence:

ST01_FILLING → ST02_SEALING → ST03_LABELLING → ST04_PACKAGING

Liquid product is filled at ST01, thermally sealed at ST02, labelled at ST03, and boxed and conveyed at ST04. The four stations are physically and electrically coupled: a fault at ST02 propagates downstream to ST04 within 90 minutes.
This coupling is the basis of the dataset’s central evaluation challenge (GT-0009).

Five additional stations monitor auxiliary zones:

- SRV01_SERVERROOM (IT infrastructure)
- WRH01_WAREHOUSE (cold storage, 4°C)
- CHM01_CHEMICALSTORAGE (hazardous materials)
- RND01_RDLAB (precision climate, R&D)
- CAF01_CAFETERIA (occupant comfort)

44 workers (20 operators, 10 technicians, 8 supervisors, 4 managers, 2 security) are distributed across 7 zones in balanced shifts: 22 Morning / 22 Afternoon. Access control rules are encoded in SOP-004.

1.2 Dataset Statistics

Metric | Value
Simulation period | 5 days (2026-01-06 to 2026-01-10)
Sampling cadence | 30 seconds
Stations / Zones / Sensors | 9 / 7 / 22
Workers (Morning / Afternoon) | 44 (22 / 22)
Sensor timeseries rows | 211,200
GT anomaly events (density) | 14 (0.691%, ISA-18.2 compliant)
Dev set / Test set | 9 events (days 1–3) / 5 events (days 4–5)
MQTT IoT payloads | 21,600
SOP PDF documents | 4
Sensor datasheet PDFs | 3
WiFi CSI files | 1,100
Overcrowding events | ~50 (3 fixed + 47 random)
Unauthorised access events | 8
SafetyEvent instances (FALL / IMMOBILITY) | 10 (6 / 4)
KG seed nodes / edges | 115 / 341
Coherence checks | 30/30

1.3 Layer Architecture

Layer 1: Process Monitoring. 211,200 sensor readings at 30-second cadence from 22 sensors across 9 stations. Four SOP documents encode all extractable knowledge.

Layer 2: Human Presence. 44 workers at 30-second cadence. Includes ~50 overcrowding events, 8 unauthorised access events, and 10 SafetyEvent instances (6 FALL, 4 IMMOBILITY) across all 7 zones.

Layer 3: WiFi CSI Biometric. 1,100 synthetic CSV files, 44 subjects, 5 gesture types (walking, falling, picking, sit_stand, standing), 128 channels at ~60 Hz. Per-person body parameters create stable biometric signatures.

Layer 4: Sensor Datasheets. Three manufacturer PDFs, each containing one quantitative value that conflicts with a threshold in SOP-002, forming controlled inter-document tensions for conflict resolution evaluation.
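The headline figures in Sections 1.1 to 1.3 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the 211,200 timeseries rows are spread evenly over the 22 sensors (one row per sensor per timestamp), which the description does not state explicitly, and all constant and function names are hypothetical.

```python
# Figures quoted in Sections 1.1-1.3 of the dataset description.
SENSORS = 22
TIMESERIES_ROWS = 211_200
CADENCE_SECONDS = 30
SIMULATED_DAYS = 5
WORKERS_BY_ROLE = {
    "operator": 20,
    "technician": 10,
    "supervisor": 8,
    "manager": 4,
    "security": 2,
}
CSI_SUBJECTS, CSI_GESTURES, CSI_FILES = 44, 5, 1_100


def readings_per_sensor(rows: int, sensors: int) -> int:
    """Rows per sensor, ASSUMING every sensor contributes equally."""
    if rows % sensors:
        raise ValueError("rows are not evenly divisible across sensors")
    return rows // sensors


def monitored_hours_per_day(per_sensor: int, cadence_s: int, days: int) -> float:
    """Hours of monitoring per simulated day implied by the row count."""
    return per_sensor * cadence_s / 3600 / days


per_sensor = readings_per_sensor(TIMESERIES_ROWS, SENSORS)
hours = monitored_hours_per_day(per_sensor, CADENCE_SECONDS, SIMULATED_DAYS)
total_workers = sum(WORKERS_BY_ROLE.values())
reps_per_gesture = CSI_FILES // (CSI_SUBJECTS * CSI_GESTURES)

print(f"readings per sensor: {per_sensor}")
print(f"monitored hours per day: {hours}")
print(f"workers: {total_workers}")
print(f"CSI repetitions per subject and gesture: {reps_per_gesture}")
```

Under the even-split assumption, each sensor contributes 9,600 readings, which at a 30-second cadence corresponds to 16 monitored hours per simulated day. That would be consistent with two 8-hour shifts (Morning and Afternoon), although the description does not give shift lengths.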
2 Ground truth and ABox / anomalies

Two ground truth files serve two tasks:

- ground_truth.csv (86 rules) is the reference for the rule extraction task: evaluating whether an LLM correctly extracts operational rules from SOP text.
- nodes.csv + edges.csv are the KG ABox: the graph structure of the facility (sensors, stations, events, persons).

The two representations are complementary, not redundant: ground_truth.csv encodes extractable rule semantics from SOP text; nodes.csv/edges.csv encode entity and event structure for graph-based reasoning. MAINT-01..08 appear in both files with different fields.

The primary evaluation reference for rule extraction is ground_truth.csv, a manually annotated file of 86 operational rules across four semantic classes. These rules represent all extractable knowledge from the four SOP documents, including access control and occupancy rules, which are equally valid extraction targets from natural language text.

On AccessRule representation: in the KG seed (nodes.csv + edges.csv), access permissions are encoded as authorized_for edges (218 total) rather than as rule nodes. Both representations are correct at their respective layers. The 20 AccessRule entries in ground_truth.csv are included as first-class evaluation targets because LLM extraction from SOP text should be evaluated on all rule types present in the source documents.

2.1 Rules Ground Truth

Table 1: Ground truth distribution by rule class and source SOP

Rule class | SOP-001 | SOP-002 | SOP-003 | SOP-004 | Total
OperationalRule | 28 | - | - | - | 28
ThresholdRule | - | 27 | - | - | 27
MaintenanceRule | - | - | 11 | - | 11
AccessRule | 1 * | - | - | 19 | 20
Total | 29 | 27 | 11 | 19 | 86

* RULE-CHM01-03 appears in SOP-001 but is classified as AccessRule (chemical storage access restriction).

2.2 Rule schema: 14 fields

- ruleId: unique identifier (e.g. RULE-ST01-01, RULE-THR-ST01-TMP-CRIT, MAINT-01, RULE-ACCESS-01)
- class: OperationalRule | ThresholdRule | MaintenanceRule | AccessRule
- station: canonical station ID (e.g. ST01_FILLING, SRV01_SERVERROOM)
- sensor: sensor ID following pattern STATION_TYPE (e.g. ST01_FILLING_TMP)
- sensorType: measurement type code (TMP, PRS, FLW, CUR, VIB, SPD, HUM, TEN, CNT)
- condition: operational trigger condition (text extracted or inferred from SOP)
- action: response action when condition fires
- severity: CRITICAL | WARNING | MANDATORY | HIGH | MEDIUM | LOW
- critHi, warnHi, warnLo, critLo: numeric thresholds (ThresholdRule only, empty otherwise)
- unit: measurement unit (°C, bar, L/min, A, mm/s, ...)
- source: SOP document of origin (SOP-001 to SOP-004)

31 of the 86 rules have a ruleId explicitly written in the SOP text (explicit). The remaining 55 are implicit: the ruleId is an annotator convention not present in the text. All 27 ThresholdRule entries are implicit. This distinction is relevant for F1_strict: no LLM can exceed 31 TP_strict regardless of extraction quality, because the other 55 ruleIds are not present in the source text.

2.3 Operational state Ground Truth

The Knowledge Graph ABox (dataset/kg_seed/nodes.csv + edges.csv) contains 115 nodes and 341 edges representing the operational state of the Digital Twin with concrete events that occurred during the 4-day snapshot (6–9 January 2026).

Node label | Count
Person | 44
Sensor | 22
AnomalyEvent | 14
SafetyEvent | 10
Component | 9
Maintenance | 8
Zone | 7
System | 1
Total | 115

Edge type | Count
authorized_for | 218
monitors | 44
triggers | 22
contains | 16
involves / detected_in | 20
part_of / resolves | 17
feeds_into | 3
correlates_with | 1
Total | 341

2.3.1 Anomalies

14 anomaly events are injected deterministically.
Development set: days 1–3 (GT-0001 to GT-0009, all 5 anomaly types present)

ID | Day | Type | Severity | Sensor | Magnitude | Duration
GT-0001 | 1 | SPIKE | CRITICAL | ST02_SEALING_TMP | +28.0 °C | 3 min
GT-0002 | 1 | DRIFT | WARNING | ST01_FILLING_PRS | +0.60 bar | 45 min
GT-0003 | 2 | DRIFT | WARNING | ST04_PACKAGING_VIB | +0.18 mm/s | 90 min
GT-0004 | 2 | STUCK | WARNING | ST03_LABELLING_TEN | - | 12 min
GT-0005 | 2 | OUT_OF_RANGE | CRITICAL | SRV01_SERVERROOM_TMP | +5.5 °C | 20 min
GT-0006 | 3 | OUT_OF_RANGE | WARNING | CAF01_CAFETERIA_TMP | +5.5 °C | 30 min
GT-0007 | 4 | SPIKE | WARNING | ST01_FILLING_FLW | −30.0 L/min | 2 min
GT-0008 | 4 | DRIFT | WARNING | ST02_SEALING_CUR | +2.1 A | 120 min
GT-0009 | 4 | CORRELATED | WARNING | ST04_PACKAGING_SPD | −0.15 m/s | 60 min

Test set: days 4–5 (GT-0010 to GT-0014, auxiliary zones)

ID | Day | Type | Severity | Sensor | Magnitude | Duration
GT-0010 | 5 | OUT_OF_RANGE | CRITICAL | ST03_LABELLING_CNT | −18 pcs/min | 25 min
GT-0011 | 5 | DRIFT | CRITICAL | WRH01_WAREHOUSE_TMP | +6.0 °C | 180 min
GT-0012 | 5 | SPIKE | CRITICAL | ST04_PACKAGING_VIB | +0.45 mm/s | 1 min
GT-0013 | 5 | OUT_OF_RANGE | CRITICAL | CHM01_CHEMICALSTORAGE_TMP | +7.0 °C | 45 min
GT-0014 | 5 | DRIFT | WARNING | RND01_RDLAB_HUM | +18.0 %RH | 90 min

2.3.2 Anomaly types

- SPIKE: Gaussian-shaped transient, peak in first 20% of window, duration < 5 min. Challenge: short duration, easily missed at 30-second cadence.
- DRIFT: Monotonic linear ramp across the full event window. Challenge: gradual onset, threshold crossing delayed relative to root cause.
- STUCK: Value frozen at window onset. Challenge: zero variance, indistinguishable from constant process without historical context.
- OUT_OF_RANGE: Fixed sustained offset above WARN_HI or below WARN_LO. Challenge: persistent violation requiring temporal windowing to avoid false positives.
- CORRELATED: Causal deviation linked to another sensor with a fixed temporal lag. Challenge: cannot be detected from a single sensor or single document.

GT-0009: The correlated event (primary evaluation discriminator)

GT-0009 is the only CORRELATED event and the central scientific challenge of iMAKS.
It represents a speed reduction at ST04_PACKAGING caused by a current drift at ST02_SEALING through shared electrical coupling, with a 90-minute temporal lag. This triple cannot be extracted from any single source. It requires explicit three-way evidence fusion: (1) a temporal co-occurrence in the timeseries between ST02_SEALING_CUR DRIFT and ST04_PACKAGING_SPD reduction; (2) RULE-ST02-04 in SOP-001 §4.2 asserting the causal correlation; (3) the fault procedure in SOP-003 §4 naming the ST02→ST04 propagation pattern. No single source is individually sufficient. A pipeline that correctly extracts this edge has demonstrated architecture-level multi-source fusion capability. This binary result is evaluated separately from aggregate F1.

3 Evaluation Methodology

3.1 Two-phase evaluation protocol

Table 2: Two-phase evaluation protocol

Phase | Question | Reference file | Metrics
Phase 1 Extraction | Did the LLM extract the correct rules from the SOP PDFs? | ground_truth.csv: 86 annotated rules (28 OP + 27 THR + 11 MNT + 20 ACC) | Precision, Recall, F1_content per rule class. F1_content: Hungarian assignment + SBERT (all-MiniLM-L6-v2, θ=0.6).
Phase 2 Coverage | Do the extracted rules cover the events that actually occurred? | nodes.csv + edges.csv: 14 AnomalyEvent + 8 Maintenance | COVERED / GAP per ABox event. Coverage fraction: min 70%. GT-0009 binary (multi-source fusion).

3.2 Phase 1 metrics

F1_strict: exact ruleId match after normalisation (punctuation and capitalisation removed). RULE-ST01-01 == rule_st01_01. Hard ceiling at 31 TP due to implicit rules.

F1_fuzzy: adds token overlap on the ruleId. Captures approximate naming. Shares the limitation of F1_strict: ignores field content entirely.

F1_content (primary metric): Hungarian assignment matches GT rows to extracted rows by content score, independent of ruleId.
Score per pair is a class-weighted average of:

(a) exact match on categorical fields (class, station, sensor, severity);
(b) tolerance score on numeric fields: 1 − |gt−llm| / |gt|;
(c) SBERT cosine similarity on text fields (condition, action), model all-MiniLM-L6-v2.

A pair is a TP if the aggregated score is ≥ 0.6.

3.3 Phase 2 metrics

Phase 2 asks whether the extracted rules cover the events that actually occurred in the simulated facility. For each AnomalyEvent and Maintenance node in the ABox, a COVERED/GAP classification is assigned based on whether at least one extracted rule is linked via COVERS or VIOLATES edges in Neo4j. GT-0009 is evaluated separately as a binary pass/fail.

4 Dataset Possible Uses

- Multi-agent KG extraction evaluation: primary use. Run an extraction pipeline on the four SOP PDFs and compare against the ground truth.
- Industrial timeseries anomaly detection: a standalone benchmark. 22 sensors, 5 anomaly types, 14 labelled events.
- Human activity recognition from WiFi CSI: Layer 3 (1,100 files, 44 subjects, 5 gesture types) is a standalone HAR benchmark.
- Digital Twin knowledge population: KG seed loads directly into Neo4j as the initial state of a factory digital twin. MQTT payloads simulate real-time telemetry.
- Access control anomaly detection: 8 controlled unauthorised access events and ~50 overcrowding events across 7 zones.
- Conflict resolution in knowledge engineering: 3 controlled inter-document tensions in Layer 4 for systems that must represent conflicting claims with provenance metadata.

5 File Structure

imaks_v2026/
├── dataset/
│   ├── rules/                  ← SOP PDFs (TBox source)
│   │   ├── SOP_001_OperatingProcedures.pdf
│   │   ├── SOP_002_AlarmThresholds.pdf
│   │   ├── SOP_003_MaintenanceRules.pdf
│   │   └── SOP_004_PersonnelZoneAccess.pdf
│   ├── kg_seed/
│   │   ├── ground_truth.csv    ← TBox ground truth: 86 rules (Phase 1 ref)
│   │   ├── nodes.csv           ← ABox: 115 nodes, 5 operating days (Phase 2 ref)
│   │   └── edges.csv           ← ABox: 341 edges
│   ├── sensors/                ← timeseries_raw.csv, timeseries_annotated.csv, ground_truth_labels.csv, mqtt_payloads.json
│   ├── human/                  ← person_registry.csv, occupancy_timeseries.csv, access_events.csv, alarm_response_log.csv
│   ├── csi/                    ← 1,100 WiFi CSI files (44 subjects × 5 gestures × 5 reps)
│   └── datasheets/             ← 3 sensor manufacturer PDFs with controlled inter-document tensions

Quick Start

    pip install langgraph langchain-ollama pymupdf pydantic tqdm
    ollama serve &
    ollama pull qwen2.5:14b
    python3 pipeline/validate_pipeline.py
    python3 pipeline/imaks_pipeline.py
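For readers who want to reason about the Phase 1 metrics of Section 3.2 before running the pipeline, the following is a minimal sketch of three building blocks: ruleId normalisation for F1_strict, the numeric tolerance score used inside F1_content, and the F1 formula itself. It is not the dataset's own evaluation code. The function names are hypothetical, and clamping the tolerance score at 0 and the handling of a zero ground-truth value are assumptions that the description does not specify.

```python
import re


def normalise_rule_id(rule_id: str) -> str:
    """Remove punctuation and capitalisation, as described for F1_strict."""
    return re.sub(r"[^a-z0-9]", "", rule_id.lower())


def numeric_tolerance(gt: float, llm: float) -> float:
    """Tolerance score 1 - |gt - llm| / |gt| on a numeric field.

    ASSUMPTIONS (not stated in the description): the score is clamped
    at 0, and a ground-truth value of 0 scores 1 only on an exact match.
    """
    if gt == 0:
        return 1.0 if llm == 0 else 0.0
    return max(0.0, 1.0 - abs(gt - llm) / abs(gt))


def f1(true_positives: int, n_extracted: int, n_ground_truth: int) -> float:
    """Standard F1 from a true-positive count and the two set sizes."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_extracted
    recall = true_positives / n_ground_truth
    return 2 * precision * recall / (precision + recall)


# The two spellings from Section 3.2 normalise to the same key.
same_id = normalise_rule_id("RULE-ST01-01") == normalise_rule_id("rule_st01_01")

# Hypothetical threshold values: ground truth 80.0, extracted 76.0.
tolerance = numeric_tolerance(80.0, 76.0)

# Best case for F1_strict: all 31 explicit ruleIds matched, nothing else
# extracted, scored against the 86 ground-truth rules.
strict_ceiling = f1(31, 31, 86)

print(same_id, tolerance, round(strict_ceiling, 3))
```

With only 31 of 86 ruleIds present in the SOP text, recall under F1_strict cannot exceed 31/86, so even a perfectly precise extraction stays at roughly 0.53 on that metric. This illustrates why the description designates F1_content, which ignores the ruleId, as the primary metric.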

Provider:

Zenodo

Created:

2026-04-11
