iMAKS (industrial Multi-Agent Knowledge extraction Synthetic dataset)
Source: DataCite Commons (updated 2026-05-04, indexed 2026-05-07)
Download link:
https://zenodo.org/doi/10.5281/zenodo.19519975
Description:
iMAKS (industrial Multi-Agent Knowledge extraction Synthetic dataset) supports the evaluation of a multi-agent Knowledge Graph extraction pipeline and an agentic Digital Twin applied to an industrial manufacturing facility.
1 Context
Modern industrial facilities generate knowledge in at least four distinct forms simultaneously. Sensors stream continuous numerical measurements. Standard Operating Procedures encode alarm thresholds, maintenance rules, and access policies in natural language documents. Manufacturer datasheets specify physical limits that may or may not align with in-house procedures. The interactions among these sources constitute a fourth, more elusive kind of knowledge that no single source captures on its own: for example, a current drift at one station that correlates with a speed reduction at another ninety minutes later, a link that spans three independent documents.
iMAKS was built to make this complexity tractable for evaluation. It simulates five days of operations at a nine-station food packaging facility — Production Line A — generating a dataset that spans all four knowledge forms and provides exact, triple-level ground truth for each one. The facility is synthetic but physically plausible: stations are thermally and electrically coupled the way real production lines are, sensor noise follows autoregressive models calibrated to the inertial properties of each sensor class, and anomaly events are distributed at densities consistent with industrial alarm management standards.
1.1 The Facility
The core of the facility is a four-station production chain in the Production Area, connected in sequence:
ST01_FILLING → ST02_SEALING → ST03_LABELLING → ST04_PACKAGING
Liquid product is filled at ST01, thermally sealed at ST02, labelled at ST03, and boxed and conveyed at ST04. The four stations are physically and electrically coupled: a fault at ST02 propagates downstream to ST04 within 90 minutes. This coupling is the basis of the dataset’s central evaluation challenge (GT-0009).
Five additional stations monitor auxiliary zones:
SRV01_SERVERROOM (IT infrastructure), WRH01_WAREHOUSE (cold storage, 4°C), CHM01_CHEMICALSTORAGE (hazardous materials), RND01_RDLAB (precision climate, R&D), CAF01_CAFETERIA (occupant comfort).
44 workers (20 operators, 10 technicians, 8 supervisors, 4 managers, 2 security) are distributed across 7 zones in balanced shifts: 22 Morning / 22 Afternoon. Access control rules are encoded in SOP-004.
1.2 Dataset Statistics
| Metric | Value |
|---|---|
| Simulation period | 5 days (2026-01-06 to 2026-01-10) |
| Sampling cadence | 30 seconds |
| Stations / Zones / Sensors | 9 / 7 / 22 |
| Workers (Morning / Afternoon) | 44 (22 / 22) |
| Sensor timeseries rows | 211,200 |
| GT anomaly events (density) | 14 (0.691%, ISA-18.2 compliant) |
| Dev set / Test set | 9 events (days 1–3) / 5 events (days 4–5) |
| MQTT IoT payloads | 21,600 |
| SOP PDF documents | 4 |
| Sensor datasheet PDFs | 3 |
| WiFi CSI files | 1,100 |
| Overcrowding events | ~50 (3 fixed + 47 random) |
| Unauthorised access events | 8 |
| SafetyEvent instances (FALL / IMMOBILITY) | 10 (6 / 4) |
| KG seed nodes / edges | 115 / 341 |
| Coherence checks | 30/30 |
1.3 Layer Architecture
Layer 1 — Process Monitoring. 211,200 sensor readings at 30-second cadence from 22 sensors across 9 stations. Four SOP documents encode all extractable knowledge.
Layer 2 — Human Presence. 44 workers at 30-second cadence. Includes ~50 overcrowding events, 8 unauthorised access events, and 10 SafetyEvent instances (6 FALL, 4 IMMOBILITY) across all 7 zones.
Layer 3 — WiFi CSI Biometric. 1,100 synthetic CSV files, 44 subjects, 5 gesture types (walking, falling, picking, sit_stand, standing), 128 channels at ~60 Hz. Per-person body parameters create stable biometric signatures.
Layer 4 — Sensor Datasheets. Three manufacturer PDFs, each containing one quantitative value that conflicts with a threshold in SOP-002, forming controlled inter-document tensions for conflict resolution evaluation.
2 Ground truth and ABox / anomalies
Two ground truth files, two tasks:
- ground_truth.csv (86 rules) is the reference for the rule extraction task — evaluating whether an LLM correctly extracts operational rules from SOP text.
- nodes.csv + edges.csv are the KG ABox — the graph structure of the facility (sensors, stations, events, persons).
The two representations are complementary, not redundant: ground_truth.csv encodes extractable rule semantics from SOP text; nodes.csv/edges.csv encode entity and event structure for graph-based reasoning. MAINT-01..08 appear in both files with different fields.
The primary evaluation reference for rule extraction is ground_truth.csv, a manually annotated file of 86 operational rules across four semantic classes. These rules represent all extractable knowledge from the four SOP documents, including access control and occupancy rules, which are equally valid extraction targets from natural language text.
On AccessRule representation: in the KG seed (nodes.csv + edges.csv), access permissions are encoded as authorized_for edges (218 total) rather than as rule nodes. Both representations are correct at their respective layers. The 20 AccessRule entries in ground_truth.csv are included as first-class evaluation targets because LLM extraction from SOP text should be evaluated on all rule types present in the source documents.
2.1 Rules Ground Truth
Table 1 — Ground truth distribution by rule class and source SOP
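| Rule class | SOP-001 | SOP-002 | SOP-003 | SOP-004 | Total |
|---|---|---|---|---|---|
| OperationalRule | 28 | - | - | - | 28 |
| ThresholdRule | - | 27 | - | - | 27 |
| MaintenanceRule | - | - | 11 | - | 11 |
| AccessRule | 1 * | - | - | 19 | 20 |
| Total | 29 | 27 | 11 | 19 | 86 |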
* RULE-CHM01-03 appears in SOP-001 but is classified as AccessRule (chemical storage access restriction).
2.2 Rule schema — 14 fields
ruleId — unique identifier (e.g. RULE-ST01-01, RULE-THR-ST01-TMP-CRIT, MAINT-01, RULE-ACCESS-01)
class — OperationalRule | ThresholdRule | MaintenanceRule | AccessRule
station — canonical station ID (e.g. ST01_FILLING, SRV01_SERVERROOM)
sensor — sensor ID following pattern STATION_TYPE (e.g. ST01_FILLING_TMP)
sensorType — measurement type code (TMP, PRS, FLW, CUR, VIB, SPD, HUM, TEN, CNT)
condition — operational trigger condition (text extracted or inferred from SOP)
action — response action when condition fires
severity — CRITICAL | WARNING | MANDATORY | HIGH | MEDIUM | LOW
critHi, warnHi, warnLo, critLo — numeric thresholds (ThresholdRule only, empty otherwise)
unit — measurement unit (°C, bar, L/min, A, mm/s, ...)
source — SOP document of origin (SOP-001 to SOP-004)
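When loading ground_truth.csv, the 14-field schema can be mirrored in a small typed record. The sketch below is a minimal illustration, assuming the CSV headers match the field names listed above and that unused threshold cells are empty strings. The `Rule` class, the `parse_rule` helper, and the sample row (including its numeric thresholds) are hypothetical and not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Optional

THRESHOLD_FIELDS = ("critHi", "warnHi", "warnLo", "critLo")


@dataclass
class Rule:
    ruleId: str
    rule_class: str  # CSV column "class" ("class" is a reserved word in Python)
    station: str
    sensor: str
    sensorType: str
    condition: str
    action: str
    severity: str
    critHi: Optional[float]
    warnHi: Optional[float]
    warnLo: Optional[float]
    critLo: Optional[float]
    unit: str
    source: str


def parse_rule(row: dict) -> Rule:
    """Convert one CSV row (all strings) into a typed Rule."""
    values = dict(row)
    values["rule_class"] = values.pop("class")
    for name in THRESHOLD_FIELDS:
        text = (values.get(name) or "").strip()
        values[name] = float(text) if text else None  # empty cell -> None
    return Rule(**values)


# Illustrative row only: the threshold values are made up, not taken from SOP-002.
SAMPLE_CSV = (
    "ruleId,class,station,sensor,sensorType,condition,action,severity,"
    "critHi,warnHi,warnLo,critLo,unit,source\n"
    "RULE-THR-ST01-TMP-CRIT,ThresholdRule,ST01_FILLING,ST01_FILLING_TMP,TMP,"
    "temperature above critical limit,stop filling,CRITICAL,30.0,25.0,,,°C,SOP-002\n"
)

rules = [parse_rule(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(len(fields(Rule)), rules[0].ruleId, rules[0].critHi, rules[0].warnLo)
```

Keeping the thresholds as `Optional[float]` makes the "ThresholdRule only, empty otherwise" convention explicit in the type instead of relying on string checks downstream.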
31 of the 86 rules are explicit: their ruleId is written in the SOP text. The remaining 55 are implicit: the ruleId is an annotator convention not present in the text. All 27 ThresholdRule entries are implicit. This distinction matters for F1_strict: no LLM can exceed 31 TP_strict regardless of extraction quality, because the other 55 ruleIds do not appear in the source text.
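The effect of the 31-TP ceiling on F1_strict can be worked out directly. The sketch below assumes the standard definition F1 = 2·TP / (2·TP + FP + FN) and considers two hypothetical extractors: one that emits only the 31 explicitly named rules, and one that emits all 86 rules but can match only 31 ruleIds. Counting unmatched extractions as false positives is an assumption about the scoring, not something the source states.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when nothing was matched or expected."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


GT_RULES = 86   # rules in ground_truth.csv
EXPLICIT = 31   # rules whose ruleId appears in the SOP text
IMPLICIT = GT_RULES - EXPLICIT

# Case 1: only the 31 explicit rules are emitted (no false positives).
best_case = f1(tp=EXPLICIT, fp=0, fn=IMPLICIT)

# Case 2: all 86 rules are emitted, but 55 ruleIds cannot match by name.
full_output = f1(tp=EXPLICIT, fp=IMPLICIT, fn=IMPLICIT)

print(round(best_case, 3), round(full_output, 3))
```

Under these assumptions the strict score tops out at 62/117 (about 0.53) and drops to 62/172 (about 0.36) for an extractor that recovers every rule's content, which is why a content-based metric is needed alongside it.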
2.3 Operational state Ground Truth
The Knowledge Graph ABox (dataset/kg_seed/nodes.csv + edges.csv) contains 115 nodes and 341 edges representing the operational state of the Digital Twin, with the concrete events that occurred during the 5-day simulation period (6–10 January 2026).
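| Node label | Count |
|---|---|
| Person | 44 |
| Sensor | 22 |
| AnomalyEvent | 14 |
| SafetyEvent | 10 |
| Component | 9 |
| Maintenance | 8 |
| Zone | 7 |
| System | 1 |
| Total | 115 |

| Edge type | Count |
|---|---|
| authorized_for | 218 |
| monitors | 44 |
| triggers | 22 |
| contains | 16 |
| involves / detected_in | 20 |
| part_of / resolves | 17 |
| feeds_into | 3 |
| correlates_with | 1 |
| Total | 341 |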
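A quick sanity check after downloading the KG seed is to tally nodes by label and edges by type and compare the result with the published counts. The sketch below is only an illustration: the source does not give the CSV headers, so the `label` column name, the `id` column, and the `P001`/`P002` identifiers in the stand-in data are hypothetical.

```python
import csv
import io
from collections import Counter

# Counts published in the node and edge tables above.
EXPECTED_NODES = {"Person": 44, "Sensor": 22, "AnomalyEvent": 14, "SafetyEvent": 10,
                  "Component": 9, "Maintenance": 8, "Zone": 7, "System": 1}
EXPECTED_EDGES = {"authorized_for": 218, "monitors": 44, "triggers": 22, "contains": 16,
                  "involves / detected_in": 20, "part_of / resolves": 17,
                  "feeds_into": 3, "correlates_with": 1}


def count_by(csv_file, column):
    """Count rows of an open CSV file grouped by the value of one column."""
    return Counter(row[column] for row in csv.DictReader(csv_file))


# Tiny stand-in for nodes.csv; header names and person ids are hypothetical.
SAMPLE_NODES = "id,label\nP001,Person\nP002,Person\nGT-0008,AnomalyEvent\n"
sample_counts = count_by(io.StringIO(SAMPLE_NODES), "label")

print(sum(EXPECTED_NODES.values()), sum(EXPECTED_EDGES.values()), dict(sample_counts))
```

With the real files, the same helper could be called on `open("dataset/kg_seed/nodes.csv", newline="")` once the actual column name is known, and the resulting counter compared against `EXPECTED_NODES`.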
2.3.1 Anomalies
14 anomaly events are injected deterministically.
Development set — days 1–3 (GT-0001 to GT-0009, all 5 anomaly types present):
| ID | Day | Type | Severity | Sensor | Magnitude | Duration |
|---|---|---|---|---|---|---|
| GT-0001 | 1 | SPIKE | CRITICAL | ST02_SEALING_TMP | +28.0 °C | 3 min |
| GT-0002 | 1 | DRIFT | WARNING | ST01_FILLING_PRS | +0.60 bar | 45 min |
| GT-0003 | 2 | DRIFT | WARNING | ST04_PACKAGING_VIB | +0.18 mm/s | 90 min |
| GT-0004 | 2 | STUCK | WARNING | ST03_LABELLING_TEN | - | 12 min |
| GT-0005 | 2 | OUT_OF_RANGE | CRITICAL | SRV01_SERVERROOM_TMP | +5.5 °C | 20 min |
| GT-0006 | 3 | OUT_OF_RANGE | WARNING | CAF01_CAFETERIA_TMP | +5.5 °C | 30 min |
| GT-0007 | 4 | SPIKE | WARNING | ST01_FILLING_FLW | −30.0 L/min | 2 min |
| GT-0008 | 4 | DRIFT | WARNING | ST02_SEALING_CUR | +2.1 A | 120 min |
| GT-0009 | 4 | CORRELATED | WARNING | ST04_PACKAGING_SPD | −0.15 m/s | 60 min |
Test set — days 4–5 (GT-0010 to GT-0014, auxiliary zones):
| ID | Day | Type | Severity | Sensor | Magnitude | Duration |
|---|---|---|---|---|---|---|
| GT-0010 | 5 | OUT_OF_RANGE | CRITICAL | ST03_LABELLING_CNT | −18 pcs/min | 25 min |
| GT-0011 | 5 | DRIFT | CRITICAL | WRH01_WAREHOUSE_TMP | +6.0 °C | 180 min |
| GT-0012 | 5 | SPIKE | CRITICAL | ST04_PACKAGING_VIB | +0.45 mm/s | 1 min |
| GT-0013 | 5 | OUT_OF_RANGE | CRITICAL | CHM01_CHEMICALSTORAGE_TMP | +7.0 °C | 45 min |
| GT-0014 | 5 | DRIFT | WARNING | RND01_RDLAB_HUM | +18.0 %RH | 90 min |
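The anomaly density of 0.691% quoted in the dataset statistics can be reproduced from the event durations listed above. The calculation below rests on two assumptions the source does not state: each event window includes both of its endpoint samples, and density is the number of anomalous samples divided by the total number of timeseries rows.

```python
SAMPLE_PERIOD_S = 30
TOTAL_ROWS = 211_200  # sensor timeseries rows from the dataset statistics

# Event durations in minutes, copied from the development and test set tables.
DURATIONS_MIN = {
    "GT-0001": 3, "GT-0002": 45, "GT-0003": 90, "GT-0004": 12, "GT-0005": 20,
    "GT-0006": 30, "GT-0007": 2, "GT-0008": 120, "GT-0009": 60, "GT-0010": 25,
    "GT-0011": 180, "GT-0012": 1, "GT-0013": 45, "GT-0014": 90,
}


def samples_in_window(minutes: int, period_s: int = SAMPLE_PERIOD_S) -> int:
    """Samples in an event window, counting both endpoints (assumed)."""
    return minutes * 60 // period_s + 1


anomalous_rows = sum(samples_in_window(m) for m in DURATIONS_MIN.values())
density_pct = 100 * anomalous_rows / TOTAL_ROWS

print(anomalous_rows, f"{density_pct:.3f}%")
```

The 14 windows total 723 minutes, which gives 1,460 samples with inclusive endpoints and a density that rounds to 0.691%. Without the endpoint assumption the figure would be slightly lower, so this is best read as a plausible reconstruction and not as the authors' documented formula.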
2.3.2 Anomaly types
SPIKE — Gaussian-shaped transient, peak in first 20% of window, duration < 5 min. Challenge: short duration, easily missed at 30-second cadence.
DRIFT — Monotonic linear ramp across the full event window. Challenge: gradual onset, threshold crossing delayed relative to root cause.
STUCK — Value frozen at window onset. Challenge: zero variance, indistinguishable from constant process without historical context.
OUT_OF_RANGE — Fixed sustained offset above WARN_HI or below WARN_LO. Challenge: persistent violation requiring temporal windowing to avoid false positives.
CORRELATED — Causal deviation linked to another sensor with a fixed temporal lag. Challenge: cannot be detected from a single sensor or single document.
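The single-sensor shapes described above can be sketched as simple transformations of a baseline window. This is an illustration under stated assumptions, not the generator used to build iMAKS: the Gaussian centre (10% of the window) and width are guesses that merely satisfy "peak in first 20% of window", and the baseline values are made up. CORRELATED is omitted because it involves two sensors.

```python
import math


def inject(baseline, kind, magnitude=0.0):
    """Return a copy of one event window with the given anomaly shape applied."""
    n = len(baseline)
    if kind == "STUCK":
        # Value frozen at window onset.
        return [baseline[0]] * n
    if kind == "OUT_OF_RANGE":
        # Fixed sustained offset over the whole window.
        return [v + magnitude for v in baseline]
    if kind == "DRIFT":
        # Linear ramp from 0 to `magnitude` across the full window.
        span = max(n - 1, 1)
        return [v + magnitude * i / span for i, v in enumerate(baseline)]
    if kind == "SPIKE":
        # Gaussian bump; centre and width are assumed values.
        centre = 0.1 * (n - 1)
        width = max(1.0, 0.1 * n)
        return [v + magnitude * math.exp(-0.5 * ((i - centre) / width) ** 2)
                for i, v in enumerate(baseline)]
    raise ValueError(f"unsupported anomaly type: {kind}")


# A 3-minute window at 30-second cadence has 7 samples; 180.0 is a made-up baseline.
window = [180.0] * 7
spike = inject(window, "SPIKE", 28.0)
drift = inject(window, "DRIFT", 28.0)
offset = inject(window, "OUT_OF_RANGE", 5.5)
stuck = inject([180.0, 180.4, 179.8, 180.1], "STUCK")
peak_index = spike.index(max(spike))
print(peak_index, round(drift[-1], 1), stuck)
```

The 7-sample spike window shows why the source calls SPIKE easy to miss: at 30-second cadence only one or two samples sit near the peak.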
GT-0009 — The correlated event (primary evaluation discriminator)
GT-0009 is the only CORRELATED event and the central scientific challenge of iMAKS. It represents a speed reduction at ST04_PACKAGING caused by a current drift at ST02_SEALING through shared electrical coupling, with a 90-minute temporal lag.
This triple cannot be extracted from any single source. It requires explicit three-way evidence fusion: (1) a temporal co-occurrence in the timeseries between ST02_SEALING_CUR DRIFT and ST04_PACKAGING_SPD reduction; (2) RULE-ST02-04 in SOP-001 §4.2 asserting the causal correlation; (3) the fault procedure in SOP-003 §4 naming the ST02→ST04 propagation pattern. No single source is individually sufficient.
A pipeline that correctly extracts this edge has demonstrated architecture-level multi-source fusion capability. This binary result is evaluated separately from aggregate F1.
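Evidence item (1), the temporal co-occurrence, can be illustrated with a lagged correlation scan between the two sensor series. The sketch below uses synthetic data: the magnitudes (+2.1 A, −0.15 m/s) and the 90-minute lag come from the source, while the baselines, the ramp timing, and the use of identical ramp shapes for both sensors are simplifying assumptions. It addresses only the timeseries evidence; items (2) and (3) still have to come from the SOP documents.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5 if var_a and var_b else 0.0


def best_lag(cause, effect, max_lag):
    """Lag in samples at which |corr(cause[t], effect[t + lag])| is largest."""
    scores = {lag: pearson(cause[:len(cause) - lag], effect[lag:])
              for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]


def ramp(i, start, length):
    """0 before `start`, rising linearly to 1 over `length` samples, then flat."""
    return min(max((i - start) / length, 0.0), 1.0)


SAMPLE_PERIOD_S = 30
N = 600  # 5 hours of samples

# Synthetic ST02_SEALING_CUR drift and ST04_PACKAGING_SPD reduction 180 samples later.
current = [12.0 + 2.1 * ramp(i, 100, 240) for i in range(N)]
speed = [1.0 - 0.15 * ramp(i, 280, 240) for i in range(N)]

lag, score = best_lag(current, speed, max_lag=240)
lag_minutes = lag * SAMPLE_PERIOD_S / 60
print(lag, lag_minutes, round(score, 3))
```

On this noise-free example the scan recovers a lag of 180 samples, that is 90 minutes, with a strongly negative correlation. Real sensor noise would flatten the peak, which is part of why the timeseries alone is not treated as sufficient evidence.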
3 Evaluation Methodology
3.1 Two-phase evaluation protocol
Table 2 — Two-phase evaluation protocol
| Phase | Question | Reference file | Metrics |
|---|---|---|---|
| Phase 1: Extraction | Did the LLM extract the correct rules from the SOP PDFs? | ground_truth.csv: 86 annotated rules (28 OP + 27 THR + 11 MNT + 20 ACC) | Precision, Recall, F1_content per rule class. F1_content: Hungarian assignment + SBERT (all-MiniLM-L6-v2, θ=0.6). |
| Phase 2: Coverage | Do the extracted rules cover the events that actually occurred? | nodes.csv + edges.csv: 14 AnomalyEvent + 8 Maintenance | COVERED / GAP per ABox event. Coverage fraction (min 70%). GT-0009 binary (multi-source fusion). |
3.2 Phase 1 metrics
F1_strict: exact ruleId match after normalisation (punctuation and capitalisation removed). RULE-ST01-01 == rule_st01_01. Hard ceiling at 31 TP due to implicit rules.
F1_fuzzy: adds token overlap on the ruleId. Captures approximate naming. Shares the limitation of F1_strict: ignores field content entirely.
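The two ruleId-based comparisons can be sketched as follows. The normalisation follows the description above (punctuation and capitalisation removed). The source does not say how token overlap is measured for F1_fuzzy, so the Jaccard ratio used here is an assumption, and the function names are hypothetical.

```python
import re


def normalise(rule_id: str) -> str:
    """Lower-case the id and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", rule_id.lower())


def strict_match(gt_id: str, llm_id: str) -> bool:
    """Exact match after normalisation, as used by F1_strict."""
    return normalise(gt_id) == normalise(llm_id)


def tokens(rule_id: str) -> set:
    """Split an id on punctuation into lower-case tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", rule_id.lower()) if t}


def token_overlap(gt_id: str, llm_id: str) -> float:
    """Jaccard overlap of id tokens (assumed measure for F1_fuzzy)."""
    a, b = tokens(gt_id), tokens(llm_id)
    return len(a & b) / len(a | b) if a | b else 0.0


print(strict_match("RULE-ST01-01", "rule_st01_01"))
print(strict_match("RULE-ST01-01", "ST01-01"))
print(round(token_overlap("RULE-ST01-01", "ST01-01"), 3))
```

The second and third calls show the gap that fuzzy matching closes: an id that drops the `RULE` prefix fails the strict test but still shares two of three tokens. Neither comparison looks at thresholds, conditions, or actions.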
F1_content (primary metric): Hungarian assignment matches GT rows to extracted rows by content score, independent of ruleId. Score per pair is a class-weighted average of: (a) exact match on categorical fields (class, station, sensor, severity); (b) tolerance score on numeric fields: 1 − |gt−llm| / |gt|; (c) SBERT cosine similarity on text fields (condition, action), model all-MiniLM-L6-v2. A pair is a TP if aggregated score ≥ 0.6.
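A stripped-down version of the content score and matching step is sketched below, using only the standard library. Several parts are stand-ins and should not be read as the iMAKS implementation: text similarity uses token Jaccard instead of SBERT, the class weights are replaced by a plain average, the numeric score is clamped to [0, 1], and the optimal assignment is found by exhaustive search, which gives the same result as the Hungarian algorithm on tiny inputs (a library solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice at scale). All rule values are made up.

```python
from itertools import permutations

CATEGORICAL = ("class", "station", "sensor", "severity")
NUMERIC = ("critHi", "warnHi", "warnLo", "critLo")
TEXT = ("condition", "action")
THETA = 0.6  # TP threshold from the source


def jaccard(a: str, b: str) -> float:
    """Token overlap; a stand-in for SBERT cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def numeric_score(gt: float, llm: float) -> float:
    """1 - |gt - llm| / |gt|, clamped to [0, 1] (the clamp is an assumption)."""
    if gt == 0:
        return 1.0 if llm == 0 else 0.0
    return max(0.0, 1.0 - abs(gt - llm) / abs(gt))


def pair_score(gt: dict, llm: dict) -> float:
    parts = [sum(gt[f] == llm.get(f) for f in CATEGORICAL) / len(CATEGORICAL)]
    nums = [f for f in NUMERIC if gt.get(f) is not None]
    if nums:  # numeric part only applies when the GT rule has thresholds
        parts.append(sum(numeric_score(gt[f], llm[f]) if llm.get(f) is not None else 0.0
                         for f in nums) / len(nums))
    parts.append(sum(jaccard(gt[f], llm.get(f, "")) for f in TEXT) / len(TEXT))
    return sum(parts) / len(parts)  # equal weights: an assumption


def best_assignment(scores):
    """Exhaustive optimal one-to-one matching; scores[i][j] is GT i vs extracted j."""
    n_gt, n_llm = len(scores), len(scores[0])
    if n_gt <= n_llm:
        options = (list(enumerate(p)) for p in permutations(range(n_llm), n_gt))
    else:
        options = ([(i, j) for j, i in enumerate(p)] for p in permutations(range(n_gt), n_llm))
    return max(options, key=lambda pairs: sum(scores[i][j] for i, j in pairs))


gt_rows = [
    {"class": "ThresholdRule", "station": "ST01_FILLING", "sensor": "ST01_FILLING_TMP",
     "severity": "CRITICAL", "critHi": 30.0,
     "condition": "temperature above critical limit", "action": "stop the filling station"},
    {"class": "OperationalRule", "station": "ST02_SEALING", "sensor": "ST02_SEALING_CUR",
     "severity": "WARNING",
     "condition": "current drift detected", "action": "inspect sealing unit"},
]
llm_rows = [  # same rules, different order, small errors
    {"class": "OperationalRule", "station": "ST02_SEALING", "sensor": "ST02_SEALING_CUR",
     "severity": "WARNING",
     "condition": "current drift detected at sealing", "action": "inspect sealing unit"},
    {"class": "ThresholdRule", "station": "ST01_FILLING", "sensor": "ST01_FILLING_TMP",
     "severity": "WARNING", "critHi": 33.0,
     "condition": "temperature above critical limit", "action": "stop the filling station"},
]

scores = [[pair_score(g, e) for e in llm_rows] for g in gt_rows]
pairs = best_assignment(scores)
true_positives = sum(1 for i, j in pairs if scores[i][j] >= THETA)
print(pairs, true_positives)
```

Because matching is driven by content, both extracted rules are paired with the right ground-truth rows even though they carry no ruleId at all, which is the property that lets F1_content score the 55 implicit rules.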
3.3 Phase 2 metrics
Phase 2 asks whether the extracted rules cover the events that actually occurred in the simulated facility. For each AnomalyEvent and Maintenance node in the ABox, a COVERED/GAP classification is assigned based on whether at least one extracted rule is linked via COVERS or VIOLATES edges in Neo4j.
GT-0009 is evaluated separately as a binary pass/fail.
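The coverage fraction and its 70% minimum can be computed from the set of ABox events that have at least one linked rule. In the sketch below that set is passed in directly, where a real run would obtain it from the COVERS and VIOLATES edges in Neo4j. The event identifiers follow the GT-0001 to GT-0014 and MAINT-01 to MAINT-08 naming used in this description, and the particular set of gaps is hypothetical.

```python
# 14 AnomalyEvent + 8 Maintenance nodes evaluated in Phase 2.
ABOX_EVENTS = ([f"GT-{i:04d}" for i in range(1, 15)]
               + [f"MAINT-{i:02d}" for i in range(1, 9)])
MIN_COVERAGE = 0.70


def coverage_report(events, linked_event_ids, minimum=MIN_COVERAGE):
    """Classify each event as COVERED or GAP and test the coverage threshold."""
    status = {e: "COVERED" if e in linked_event_ids else "GAP" for e in events}
    covered = sum(1 for s in status.values() if s == "COVERED")
    fraction = covered / len(events)
    return status, fraction, fraction >= minimum


# Hypothetical pipeline outcome: five events have no linked rule.
gaps = {"GT-0004", "GT-0009", "GT-0012", "MAINT-03", "MAINT-07"}
linked = set(ABOX_EVENTS) - gaps

status, fraction, passed = coverage_report(ABOX_EVENTS, linked)
print(f"{fraction:.3f}", passed, [e for e, s in status.items() if s == "GAP"])
```

With 22 events, the 70% minimum means at least 16 must be COVERED. The example clears it with 17 of 22 even though GT-0009 is a gap, which illustrates why that event is reported as its own pass/fail result.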
4 Dataset Possible Uses
Multi-agent KG extraction evaluation — primary use. Run an extraction pipeline on the four SOP PDFs and compare against the ground truth
Industrial timeseries anomaly detection — a standalone benchmark. 22 sensors, 5 anomaly types, 14 labelled events.
Human activity recognition from WiFi CSI — Layer 3 (1,100 files, 44 subjects, 5 gesture types) is a standalone HAR benchmark.
Digital Twin knowledge population — KG seed loads directly into Neo4j as the initial state of a factory digital twin. MQTT payloads simulate real-time telemetry.
Access control anomaly detection — 8 controlled unauthorised access events and ~50 overcrowding events across 7 zones.
Conflict resolution in knowledge engineering — 3 controlled inter-document tensions in Layer 4 for systems that must represent conflicting claims with provenance metadata.
5 File Structure
imaks_v2026/
├── dataset/
│   ├── rules/                  ← SOP PDFs (TBox source)
│   │   ├── SOP_001_OperatingProcedures.pdf
│   │   ├── SOP_002_AlarmThresholds.pdf
│   │   ├── SOP_003_MaintenanceRules.pdf
│   │   └── SOP_004_PersonnelZoneAccess.pdf
│   ├── kg_seed/
│   │   ├── ground_truth.csv    ← TBox ground truth: 86 rules (Phase 1 ref)
│   │   ├── nodes.csv           ← ABox: 115 nodes, 5 operating days (Phase 2 ref)
│   │   └── edges.csv           ← ABox: 341 edges
│   ├── sensors/                ← timeseries_raw.csv, timeseries_annotated.csv, ground_truth_labels.csv, mqtt_payloads.json
│   ├── human/                  ← person_registry.csv, occupancy_timeseries.csv, access_events.csv, alarm_response_log.csv
│   ├── csi/                    ← 1,100 WiFi CSI files (44 subjects × 5 gestures × 5 reps)
│   └── datasheets/             ← 3 sensor manufacturer PDFs with controlled inter-document tensions
Quick Start
pip install langgraph langchain-ollama pymupdf pydantic tqdm
ollama serve &    # run the server in the background so the next command can execute
ollama pull qwen2.5:14b
python3 pipeline/validate_pipeline.py
python3 pipeline/imaks_pipeline.py
Provider: Zenodo
Created: 2026-04-11



