aeriesec/orgforge-insider-threat
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/aeriesec/orgforge-insider-threat
Resource description:
---
language:
- en
license: mit
task_categories:
- text-classification
- token-classification
- tabular-classification
task_ids:
- multi-label-classification
- multi-class-classification
- fact-checking
- document-retrieval
- extractive-qa
- open-domain-qa
tags:
- security
- insider-threat
- synthetic
- orgforge
- siem
- detection
- behavioral-analytics
- anomaly-detection
- time-series
- security-analytics
pretty_name: "OrgForge Insider Threat Detection Benchmark"
size_categories:
- 1K<n<10K
---
# OrgForge Insider Threat Detection Benchmark
> Structured security telemetry for benchmarking LLM-based insider threat
> detection. No embedder required — the corpus is pre-structured JSONL/Parquet.
> Ground truth is derived deterministically from the simulation's event log.
## Dataset Summary
This dataset was produced by **OrgForge**, an event-driven organisation
simulator, with the insider threat module enabled. A corporate simulation runs
for 51 days; configured threat subjects exhibit realistic anomalous
behaviors across multiple artifact surfaces. Detection agents read the
observable telemetry stream and must correlate signals across days and record
types without access to ground truth labels.
| Property | Value |
| ------------------------------ | ------------------------------------- |
| Company | Apex Athletics |
| Industry | sports technology |
| Simulation days | 51 |
| Threat subjects | 3 |
| DLP noise ratio | 0.4 |
| Log format | jsonl |
| Total observable records | 2,904 |
| True positive records | 106 |
| False positive (noise) records | 2,798 |
| Baseline (clean) records | 3,930 |
| Best leaderboard verdict F1 | 0.8 (us.anthropic.claude-opus-4-6-v1) |
## Threat Subjects
| Name | Class | Onset day | Behaviors |
| ------ | ----------- | --------- | ----------------------------------------------------------------------------------------------------------------------- |
| Jordan | negligent | 5 | secret_in_commit |
| Tasha | disgruntled | 10 | sentiment_drift, cross_dept_snooping, unusual_hours_access |
| Jax | malicious | 18 | data_exfil_email, excessive_repo_cloning, unusual_hours_access, sentiment_drift, host_data_hoarding, social_engineering |
Subjects behave normally before their onset day. Pre-onset records are clean
true negatives and form the detection baseline.
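A minimal sketch of how the pre-onset split could be applied, assuming records are loaded as dictionaries carrying the `actor` and `day` fields from the telemetry schema. The onset days are copied from the table above; the helper name `is_pre_onset` is hypothetical and not part of OrgForge.

```python
# Onset days copied from the Threat Subjects table.
ONSET_DAYS = {"Jordan": 5, "Tasha": 10, "Jax": 18}


def is_pre_onset(record, onset_days=ONSET_DAYS):
    """Return True if a threat subject's record falls before that subject's onset day.

    Records from actors who are not threat subjects return False here; how the
    benchmark treats them is not specified in this card.
    """
    onset = onset_days.get(record["actor"])
    return onset is not None and record["day"] < onset


# Made-up records for illustration only.
records = [
    {"actor": "Jax", "day": 17, "record_type": "idp_auth"},
    {"actor": "Jax", "day": 18, "record_type": "idp_auth"},
    {"actor": "Tasha", "day": 3, "record_type": "slack_message"},
]
baseline = [r for r in records if is_pre_onset(r)]
```

The comparison is strict (`day < onset`), so a record on the onset day itself is treated as post-onset.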
## Telemetry Files
All files are in `telemetry/`. The observable stream contains no ground truth
labels — `true_positive`, `threat_class`, and `behavior` fields appear only in
`ground_truth/`.
| File | Description | Rows |
| ---------------------------------- | ---------------------------- | ----- |
| `access_log-00000.parquet` | Full observable stream | 2,904 |
| `baseline_telemetry-00000.parquet` | Pre-onset clean records only | 3,930 |
| `idp_auth-00000.parquet` | IDP authentication events | 2,831 |
| `host_events-00000.parquet` | Host-level staging events | 6 |
### Record types in the observable stream
| Record type | Rows |
| --------------- | ----- |
| `idp_auth` | 2,831 |
| `slack_message` | 30 |
| `repo_access` | 29 |
| `email_send` | 7 |
| `host_event` | 6 |
| `phone_call` | 1 |
### Telemetry schema
Core fields present on every record:
| Column | Type | Description |
| ------------- | ---- | --------------------------------------- |
| `record_type` | str | Event category (see table above) |
| `day` | int | Simulation day (1-indexed) |
| `date` | str | ISO date |
| `timestamp` | str | ISO datetime (SimClock-accurate) |
| `actor` | str | Employee name — no threat annotation |
| `extra` | str | JSON string of additional detail fields |
Selected promoted detail columns (present when applicable):
| Column | Type | Description |
| -------------------------- | ---- | ------------------------------------------------------------------------------------------------ |
| `auth_result` | str | IDP: `success` or `mfa_failure` |
| `src_ip` | str | IDP: source IP address |
| `new_device` | bool | IDP: device not in employee's known profile |
| `anomalous_ip` | bool | IDP: IP outside corporate range |
| `ghost_login` | bool | IDP: disgruntled ghost login pattern |
| `preceded_by_call_record` | bool | IDP: auth preceded by a `phone_call` record; marks vishing post-auth events |
| `action` | str | Host: `bulk_file_copy`, `archive_creation`, `archive_move` |
| `hoarding_trail_start_day` | int | Host: links phase 3 back to phase 1 |
| `destination_type` | str | Host: `cloud_sync_dir` or `removable_media` |
| `outside_business_hours` | bool | All: access outside 09:00–18:00 |
| `pattern` | str | Social engineering: `spear_phishing`, `slack_pretexting`, `vishing_breadcrumb`, `trust_building` |
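Because `extra` is stored as a JSON string, detail fields that are not promoted to columns have to be decoded before use. The sketch below shows one way to flatten a record, assuming `extra` holds a JSON object and that a missing value is `None` or an empty string. The helper `expand_extra` and the rule that promoted columns win over keys in `extra` are assumptions, and the sample values are made up.

```python
import json


def expand_extra(record):
    """Return a copy of `record` with the JSON object in `extra` merged in as top-level keys.

    Assumption: columns already on the record take precedence over keys in `extra`.
    """
    raw = record.get("extra")
    detail = json.loads(raw) if raw else {}
    merged = dict(detail)
    merged.update({k: v for k, v in record.items() if k != "extra"})
    return merged


# Illustrative record: field names come from the schema, values are invented.
row = {
    "record_type": "idp_auth",
    "day": 21,
    "actor": "Jax",
    "auth_result": "success",
    "extra": '{"call_to_auth_gap_minutes": 4, "preceded_by_call_record": true}',
}
flat = expand_extra(row)
```

If the Parquet files are read with a DataFrame library, missing `extra` values may surface as NaN rather than `None`, in which case the emptiness check would need adjusting.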
## Ground Truth
`ground_truth/ground_truth-00000.parquet` contains all records from the
observable stream with three additional columns:
| Column | Type | Description |
| --------------- | ---- | ------------------------------------------------- |
| `true_positive` | bool | True if this record is a genuine threat signal |
| `threat_class` | str | `negligent`, `disgruntled`, `malicious`, or null |
| `behavior` | str | Behavior name, or null for false positive records |
**This file is held out during evaluation.** Detection agents read only the
telemetry files. Ground truth is used for scoring only.
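As a rough illustration of scoring against the `true_positive` column, the sketch below computes record-level precision, recall, and F1. It is not the logic of `eval_insider_threat.py`, and it may differ from how `triage_f1` and `verdict_f1` are derived. The record ids are hypothetical, since this card does not name an id column.

```python
def precision_recall_f1(flagged_ids, ground_truth):
    """Score a set of flagged record ids against {record_id: true_positive} labels."""
    tp = sum(1 for rid, is_threat in ground_truth.items() if is_threat and rid in flagged_ids)
    fp = sum(1 for rid, is_threat in ground_truth.items() if not is_threat and rid in flagged_ids)
    fn = sum(1 for rid, is_threat in ground_truth.items() if is_threat and rid not in flagged_ids)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Synthetic labels with the same counts as this export:
# 106 true positive records and 2,798 noise records.
labels = {i: i < 106 for i in range(2904)}
flag_everything = set(labels)
p, r, f1 = precision_recall_f1(flag_everything, labels)
```

With these counts, a detector that flags every record reaches a recall of 1.0 but a precision of 106/2904 (about 0.037) and an F1 of 212/3010 (about 0.070), which gives a sense of the class imbalance any real detector has to overcome.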
### Behaviors present in ground truth
| Behavior | True positive records |
| ---------------------- | --------------------- |
| `unusual_hours_access` | 62 |
| `sentiment_drift` | 27 |
| `data_exfil_email` | 7 |
| `host_data_hoarding` | 6 |
| `social_engineering` | 4 |
### Threat class distribution
| Threat class | True positive records |
| ------------- | --------------------- |
| `malicious` | 61 |
| `disgruntled` | 45 |
## Detection Scenarios
The dataset is designed to benchmark reasoning under noise, not keyword
matching. Key scenarios:
**Ghost login (disgruntled):** `idp_auth` success present but no Slack,
Jira, or email artifacts follow in the same time window.
`corroborating_activity_expected: false` marks these records.
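The ghost login pattern can be approximated from the observable stream alone. The sketch below flags successful logins with no same-actor activity in the window that follows. The 60-minute window and the set of corroborating record types are assumptions; Jira records are not listed among the observable record types in this card, so they are left out. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: these record types count as corroborating activity.
CORROBORATING_TYPES = {"slack_message", "email_send"}


def find_ghost_logins(records, window_minutes=60):
    """Return successful idp_auth records with no same-actor activity in the following window."""
    window = timedelta(minutes=window_minutes)
    activity = {}  # actor -> timestamps of corroborating records
    for r in records:
        if r["record_type"] in CORROBORATING_TYPES:
            activity.setdefault(r["actor"], []).append(datetime.fromisoformat(r["timestamp"]))

    ghosts = []
    for r in records:
        if r["record_type"] != "idp_auth" or r.get("auth_result") != "success":
            continue
        login = datetime.fromisoformat(r["timestamp"])
        followed = any(login <= t <= login + window for t in activity.get(r["actor"], []))
        if not followed:
            ghosts.append(r)
    return ghosts


# Made-up records for illustration only.
sample = [
    {"record_type": "idp_auth", "auth_result": "success", "actor": "Tasha", "timestamp": "2026-01-12T23:40:00"},
    {"record_type": "idp_auth", "auth_result": "success", "actor": "Sam", "timestamp": "2026-01-12T09:00:00"},
    {"record_type": "slack_message", "actor": "Sam", "timestamp": "2026-01-12T09:20:00"},
]
ghosts = find_ghost_logins(sample)
```

Since the stream contains far more `idp_auth` records than Slack or email records, a rule like this is likely to over-flag on its own and would need to be combined with other signals such as `outside_business_hours` or `new_device`.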
**Vishing (malicious):** A `phone_call` record is followed within minutes by
an `idp_auth` success filed under the **target's** actor name, not the
attacker's. `preceded_by_call_record: true` and `call_to_auth_gap_minutes`
link them. Per-actor triage agents miss this entirely — cross-actor
correlation is required.
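The cross-actor correlation described above could be sketched as follows. The code pairs each `phone_call` record with any successful `idp_auth` by a different actor shortly afterwards. It assumes the `actor` on a `phone_call` record is the caller, and the 15-minute gap is an arbitrary stand-in for "within minutes". The function name is hypothetical.

```python
from datetime import datetime, timedelta


def correlate_calls_to_auths(records, max_gap_minutes=15):
    """Pair each phone_call with later idp_auth successes by a different actor within the gap."""
    calls = [r for r in records if r["record_type"] == "phone_call"]
    auths = [
        r for r in records
        if r["record_type"] == "idp_auth" and r.get("auth_result") == "success"
    ]
    max_gap = timedelta(minutes=max_gap_minutes)
    pairs = []
    for call in calls:
        call_time = datetime.fromisoformat(call["timestamp"])
        for auth in auths:
            gap = datetime.fromisoformat(auth["timestamp"]) - call_time
            if auth["actor"] != call["actor"] and timedelta(0) <= gap <= max_gap:
                # Third element is the call-to-auth gap in minutes.
                pairs.append((call, auth, gap.total_seconds() / 60))
    return pairs


# Made-up records for illustration only.
sample = [
    {"record_type": "phone_call", "actor": "Jax", "timestamp": "2026-01-20T14:00:00"},
    {"record_type": "idp_auth", "auth_result": "success", "actor": "Morgan", "timestamp": "2026-01-20T14:04:00"},
    {"record_type": "idp_auth", "auth_result": "success", "actor": "Jax", "timestamp": "2026-01-20T14:05:00"},
    {"record_type": "idp_auth", "auth_result": "success", "actor": "Riley", "timestamp": "2026-01-20T16:00:00"},
]
pairs = correlate_calls_to_auths(sample)
```

The key design point is that the join runs across actors: grouping records by `actor` before analysis would separate the call from the login it triggered.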
**Host data hoarding (malicious/disgruntled):** Three-phase trail spread
across consecutive days: `bulk_file_copy` → `archive_creation` →
`archive_move`. The `hoarding_trail_start_day` field in phase 3 records
links back to phase 1. Single-day triage agents will always miss the
complete trail.
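A sketch of multi-day trail reconstruction, assuming that "consecutive days" means the three phases fall on days d, d+1, and d+2 for the same actor. It uses only the `record_type`, `actor`, `day`, and `action` fields; the function name is hypothetical.

```python
PHASES = ["bulk_file_copy", "archive_creation", "archive_move"]


def find_hoarding_trails(records):
    """Return (actor, start_day) for each actor with the three phases on consecutive days."""
    days_by_actor_action = {}
    for r in records:
        if r["record_type"] == "host_event" and r.get("action") in PHASES:
            days_by_actor_action.setdefault((r["actor"], r["action"]), set()).add(r["day"])

    trails = []
    actors = sorted({actor for actor, _ in days_by_actor_action})
    for actor in actors:
        # Every phase 1 day is a candidate start; phases 2 and 3 must follow day by day.
        for start in sorted(days_by_actor_action.get((actor, PHASES[0]), ())):
            if all(
                start + offset in days_by_actor_action.get((actor, phase), ())
                for offset, phase in enumerate(PHASES)
            ):
                trails.append((actor, start))
    return trails


# Made-up records for illustration only.
sample = [
    {"record_type": "host_event", "actor": "Jax", "day": 30, "action": "bulk_file_copy"},
    {"record_type": "host_event", "actor": "Jax", "day": 31, "action": "archive_creation"},
    {"record_type": "host_event", "actor": "Jax", "day": 32, "action": "archive_move"},
    {"record_type": "host_event", "actor": "Sam", "day": 10, "action": "bulk_file_copy"},
    {"record_type": "host_event", "actor": "Sam", "day": 12, "action": "archive_creation"},
]
trails = find_hoarding_trails(sample)
```

Where phase 3 records carry `hoarding_trail_start_day`, that value can be compared with the reconstructed start day as a cross-check.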
**Trust building (social engineering):** A benign inbound email with
`sender_in_known_contacts: false` precedes a follow-up attack 3–5 days
later. Judged in isolation, the first contact looks benign, so a detector
that scores it alone produces a false negative.
## Leaderboard
`leaderboard/insider_threat_leaderboard.csv` contains a frozen snapshot of
all model runs against this export. The full JSON version is also included.
Columns:
| Column | Description | Better |
| -------------------------- | ---------------------------------------------- | ------ |
| `triage_f1` | F1 on escalation decisions | ↑ |
| `verdict_f1` | F1 on full case verdicts | ↑ |
| `baseline_fp_rate` | FP rate on clean baseline period | ↓ |
| `onset_sensitivity` | Fraction of pre-onset escalations | ↓ |
| `vishing_detected` | Did the agent correlate phone_call → idp_auth? | ✓ |
| `host_trail_reconstructed` | Did the agent cite all 3 hoarding phases? | ✓ |
To add a row, run `eval_insider_threat.py` against this export and append
the output to these files.
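For a quick look at the snapshot without the Gradio app, the CSV can be ranked with the standard library. The sketch below sorts rows by `verdict_f1`; the `model` column name and the sample rows are assumptions made for illustration and are not taken from the leaderboard file.

```python
import csv
import io


def rank_by_verdict_f1(csv_text):
    """Parse leaderboard CSV text and return rows sorted by verdict_f1, best first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row["verdict_f1"]), reverse=True)


# Made-up rows for illustration; the `model` column name is an assumption.
example = """model,triage_f1,verdict_f1,baseline_fp_rate
model-a,0.70,0.55,0.10
model-b,0.65,0.80,0.05
"""
ranked = rank_by_verdict_f1(example)
```

To rank the real snapshot, pass the contents of `leaderboard/insider_threat_leaderboard.csv` to the same function.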
## Evaluation Pipeline
```bash
# Build baseline (pre-onset clean records)
python build_baseline_telemetry.py --export-dir ./export

# Run detection pipeline (one command per model)
python eval_insider_threat.py \
  --model anthropic.claude-opus-4-5-20251101-v1:0 \
  --export-dir ./export

# Launch leaderboard UI
python app.py  # (insider threat Gradio app)
```
No embedder, MongoDB, or vector database is required. Credentials: AWS
Bedrock, resolved through the standard AWS credential chain.
## Citation
```bibtex
@misc{orgforge_it2026,
title = {OrgForge Insider Threat Detection Benchmark},
author = {Jeffrey Flynt},
year = {2026},
note = {Synthetic benchmark generated by the OrgForge insider threat simulator}
}
```
## Related Paper
https://arxiv.org/abs/2603.22499
## License
MIT. The simulation engine that produced this dataset is independently licensed; see the OrgForge repository for details.
Provider:
aeriesec



