
aeriesec/orgforge-insider-threat

Hugging Face · updated 2026-03-25 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/aeriesec/orgforge-insider-threat
Resource summary:

---
language:
- en
license: mit
task_categories:
- text-classification
- token-classification
- tabular-classification
task_ids:
- multi-label-classification
- multi-class-classification
- fact-checking
- document-retrieval
- extractive-qa
- open-domain-qa
tags:
- security
- insider-threat
- synthetic
- orgforge
- siem
- detection
- behavioral-analytics
- anomaly-detection
- time-series
- security-analytics
pretty_name: "OrgForge Insider Threat Detection Benchmark"
size_categories:
- 1K<n<10K
---

# OrgForge Insider Threat Detection Benchmark

> Structured security telemetry for benchmarking LLM-based insider threat
> detection. No embedder required: the corpus is pre-structured JSONL/Parquet.
> Ground truth is derived deterministically from the simulation's event log.

## Dataset Summary

This dataset was produced by **OrgForge**, an event-driven organisation simulator, with the insider threat module enabled. A corporate simulation runs for 51 days, during which configured threat subjects exhibit realistic anomalous behaviors across multiple artifact surfaces. Detection agents read the observable telemetry stream and must correlate signals across days and record types without access to ground truth labels.
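
Because the signals that matter are spread across days and record types, a detector typically starts by merging each employee's records into one time-ordered timeline. The sketch below does this with the standard library only. It assumes each record is a dict carrying the `actor`, `day`, `timestamp` (a naive ISO-8601 string), and `record_type` fields from the telemetry schema, for example rows loaded with `pandas.read_parquet` and converted via `DataFrame.to_dict(orient="records")`. The helper names and the sample records (names, dates, times) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_timelines(records):
    """Group records by actor, each group sorted by timestamp (hypothetical helper)."""
    timelines = defaultdict(list)
    for rec in records:
        timelines[rec["actor"]].append(rec)
    for events in timelines.values():
        # Assumes naive ISO-8601 timestamps such as "2026-01-18T10:00:00".
        events.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return dict(timelines)


def record_types_by_day(timeline):
    """Map each simulation day to the set of record types seen that day."""
    by_day = defaultdict(set)
    for rec in timeline:
        by_day[rec["day"]].add(rec["record_type"])
    return dict(by_day)


# Toy records with made-up values, not taken from the dataset.
sample = [
    {"actor": "Jax", "day": 19, "timestamp": "2026-01-19T23:05:00", "record_type": "idp_auth"},
    {"actor": "Jax", "day": 18, "timestamp": "2026-01-18T10:00:00", "record_type": "repo_access"},
    {"actor": "Tasha", "day": 12, "timestamp": "2026-01-12T09:30:00", "record_type": "slack_message"},
]
timelines = build_timelines(sample)
```

Per-actor grouping is only a starting point: as the Detection Scenarios section notes, vishing auth events are filed under the target's name, so some signals appear only when records from different actors are joined.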

| Property                       | Value                                 |
| ------------------------------ | ------------------------------------- |
| Company                        | Apex Athletics                        |
| Industry                       | sports technology                     |
| Simulation days                | 51                                    |
| Threat subjects                | 3                                     |
| DLP noise ratio                | 0.4                                   |
| Log format                     | jsonl                                 |
| Total observable records       | 2,904                                 |
| True positive records          | 106                                   |
| False positive (noise) records | 2,798                                 |
| Baseline (clean) records       | 3,930                                 |
| Best leaderboard verdict F1    | 0.8 (us.anthropic.claude-opus-4-6-v1) |

## Threat Subjects

| Name   | Class       | Onset day | Behaviors |
| ------ | ----------- | --------- | --------- |
| Jordan | negligent   | 5         | secret_in_commit |
| Tasha  | disgruntled | 10        | sentiment_drift, cross_dept_snooping, unusual_hours_access |
| Jax    | malicious   | 18        | data_exfil_email, excessive_repo_cloning, unusual_hours_access, sentiment_drift, host_data_hoarding, social_engineering |

Subjects behave normally before their onset day. Their pre-onset records are clean true negatives and form the detection baseline.

## Telemetry Files

All files are in `telemetry/`. The observable stream contains no ground truth labels: the `true_positive`, `threat_class`, and `behavior` fields appear only in `ground_truth/`.
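
Keeping labels out of a detector's input is easy to get wrong when one script loads both `telemetry/` and `ground_truth/`. A minimal sketch, assuming rows are plain dicts (for example from `DataFrame.to_dict(orient="records")`): `split_labels` separates a ground-truth row into the fields a detector may see and the three label columns, and `assert_label_free` fails fast if a label column reaches a detector. Both function names and the sample row are hypothetical.

```python
# Label columns that, per the dataset card, exist only under ground_truth/.
LABEL_COLUMNS = ("true_positive", "threat_class", "behavior")


def split_labels(gt_row):
    """Split a ground-truth row into (observable fields, label fields)."""
    observable = {k: v for k, v in gt_row.items() if k not in LABEL_COLUMNS}
    labels = {k: gt_row.get(k) for k in LABEL_COLUMNS}
    return observable, labels


def assert_label_free(records):
    """Raise ValueError if any record meant for a detector carries a label column."""
    for i, rec in enumerate(records):
        leaked = [col for col in LABEL_COLUMNS if col in rec]
        if leaked:
            raise ValueError(f"record {i} leaks label columns: {leaked}")


# Hypothetical ground-truth row (values made up for illustration).
row = {
    "record_type": "email_send",
    "day": 21,
    "actor": "Jax",
    "true_positive": True,
    "threat_class": "malicious",
    "behavior": "data_exfil_email",
}
observable, labels = split_labels(row)
assert_label_free([observable])  # passes: no label columns remain
```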

| File                               | Description                  | Rows  |
| ---------------------------------- | ---------------------------- | ----- |
| `access_log-00000.parquet`         | Full observable stream       | 2,904 |
| `baseline_telemetry-00000.parquet` | Pre-onset clean records only | 3,930 |
| `idp_auth-00000.parquet`           | IDP authentication events    | 2,831 |
| `host_events-00000.parquet`        | Host-level staging events    | 6     |

### Record types in the observable stream

| Record type     | Records |
| --------------- | ------- |
| `idp_auth`      | 2,831   |
| `slack_message` | 30      |
| `repo_access`   | 29      |
| `email_send`    | 7       |
| `host_event`    | 6       |
| `phone_call`    | 1       |

### Telemetry schema

Core fields present on every record:

| Column        | Type | Description                              |
| ------------- | ---- | ---------------------------------------- |
| `record_type` | str  | Event category (see table above)         |
| `day`         | int  | Simulation day (1-indexed)               |
| `date`        | str  | ISO date                                 |
| `timestamp`   | str  | ISO datetime (SimClock-accurate)         |
| `actor`       | str  | Employee name, with no threat annotation |
| `extra`       | str  | JSON string of additional detail fields  |

Selected promoted detail columns (present when applicable):

| Column                     | Type | Description |
| -------------------------- | ---- | ----------- |
| `auth_result`              | str  | IDP: `success` or `mfa_failure` |
| `src_ip`                   | str  | IDP: source IP address |
| `new_device`               | bool | IDP: device not in the employee's known profile |
| `anomalous_ip`             | bool | IDP: IP outside the corporate range |
| `ghost_login`              | bool | IDP: disgruntled ghost login pattern |
| `preceded_by_call_record`  | bool | IDP: auth preceded by a `phone_call` record; marks vishing post-auth events |
| `action`                   | str  | Host: `bulk_file_copy`, `archive_creation`, or `archive_move` |
| `hoarding_trail_start_day` | int  | Host: links phase 3 back to phase 1 |
| `destination_type`         | str  | Host: `cloud_sync_dir` or `removable_media` |
| `outside_business_hours`   | bool | All: access outside 09:00–18:00 |
| `pattern`                  | str  | Social engineering: `spear_phishing`, `slack_pretexting`, `vishing_breadcrumb`, or `trust_building` |

## Ground Truth

`ground_truth/ground_truth-00000.parquet` contains all records from the observable stream plus three additional columns:

| Column          | Type | Description                                       |
| --------------- | ---- | ------------------------------------------------- |
| `true_positive` | bool | True if this record is a genuine threat signal    |
| `threat_class`  | str  | `negligent`, `disgruntled`, `malicious`, or null  |
| `behavior`      | str  | Behavior name, or null for false positive records |

**This file is held out during evaluation.** Detection agents read only the telemetry files; ground truth is used for scoring only.

### Behaviors present in ground truth

| Behavior               | Records |
| ---------------------- | ------- |
| `unusual_hours_access` | 62      |
| `sentiment_drift`      | 27      |
| `data_exfil_email`     | 7       |
| `host_data_hoarding`   | 6       |
| `social_engineering`   | 4       |

### Threat class distribution

| Threat class  | Records |
| ------------- | ------- |
| `malicious`   | 61      |
| `disgruntled` | 45      |

## Detection Scenarios

The dataset is designed to benchmark reasoning under noise, not keyword matching. Key scenarios:

**Ghost login (disgruntled):** An `idp_auth` success is present, but no Slack, Jira, or email artifacts follow in the same time window. These records are marked `corroborating_activity_expected: false`.

**Vishing (malicious):** A `phone_call` record is followed within minutes by an `idp_auth` success filed under the **target's** actor name, not the attacker's. `preceded_by_call_record: true` and `call_to_auth_gap_minutes` link the two. Per-actor triage agents miss this entirely; cross-actor correlation is required.

**Host data hoarding (malicious/disgruntled):** A three-phase trail spread across consecutive days: `bulk_file_copy` → `archive_creation` → `archive_move`. The `hoarding_trail_start_day` field in phase 3 records links back to phase 1. Single-day triage agents will always miss the complete trail.
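
The vishing and host data hoarding scenarios both require joining records that a per-actor, single-day view keeps apart. The sketch below shows one way to do each with the standard library, inferring the links from timestamps and host actions rather than reading the precomputed `preceded_by_call_record` flag. Assumptions to note: records are dicts using the schema's field names; the `phone_call` record's `actor` is taken to be the caller; the 15-minute window (`max_gap_minutes`) is a hypothetical stand-in for "within minutes"; and the hoarding check only requires the three phases to appear in day order rather than on strictly consecutive days. The function names, the employee name Morgan, and all sample values are made up.

```python
from datetime import datetime


def find_vishing_pairs(records, max_gap_minutes=15):
    """Pair each phone_call with successful idp_auth events by *other* actors
    that occur within max_gap_minutes after the call (a cross-actor join)."""
    calls = [r for r in records if r["record_type"] == "phone_call"]
    auths = [
        r for r in records
        if r["record_type"] == "idp_auth" and r.get("auth_result") == "success"
    ]
    pairs = []
    for call in calls:
        t_call = datetime.fromisoformat(call["timestamp"])
        for auth in auths:
            if auth["actor"] == call["actor"]:
                continue  # the vishing auth is filed under the target, not the caller
            gap = (datetime.fromisoformat(auth["timestamp"]) - t_call).total_seconds() / 60
            if 0 <= gap <= max_gap_minutes:
                pairs.append((call, auth, gap))
    return pairs


def reconstruct_hoarding_trails(records):
    """Rebuild bulk_file_copy -> archive_creation -> archive_move trails per actor,
    using hoarding_trail_start_day on the phase 3 record to locate phase 1."""
    host = [r for r in records if r["record_type"] == "host_event"]

    def find(actor, action, lo_day, hi_day):
        # First host event by this actor with this action on a day in [lo_day, hi_day].
        return next(
            (r for r in host
             if r["actor"] == actor and r.get("action") == action
             and lo_day <= r["day"] <= hi_day),
            None,
        )

    trails = []
    for move in (r for r in host if r.get("action") == "archive_move"):
        start = move.get("hoarding_trail_start_day")
        if start is None:
            continue
        copy = find(move["actor"], "bulk_file_copy", start, start)
        archive = find(move["actor"], "archive_creation", start, move["day"])
        if copy is not None and archive is not None:
            trails.append((copy, archive, move))
    return trails


# Toy records with made-up values, not taken from the dataset.
sample = [
    {"record_type": "phone_call", "actor": "Jax", "day": 25, "timestamp": "2026-01-25T14:00:00"},
    {"record_type": "idp_auth", "actor": "Morgan", "day": 25, "timestamp": "2026-01-25T14:06:00", "auth_result": "success"},
    {"record_type": "idp_auth", "actor": "Jax", "day": 25, "timestamp": "2026-01-25T14:03:00", "auth_result": "success"},
    {"record_type": "host_event", "actor": "Jax", "day": 30, "timestamp": "2026-01-30T22:00:00", "action": "bulk_file_copy"},
    {"record_type": "host_event", "actor": "Jax", "day": 31, "timestamp": "2026-01-31T22:10:00", "action": "archive_creation"},
    {"record_type": "host_event", "actor": "Jax", "day": 32, "timestamp": "2026-02-01T22:20:00", "action": "archive_move", "hoarding_trail_start_day": 30},
]
vishing = find_vishing_pairs(sample)
trails = reconstruct_hoarding_trails(sample)
```

Each returned trail is a tuple of the three phase records, which is the kind of evidence the `host_trail_reconstructed` leaderboard column asks an agent to cite; each vishing pair likewise ties a `phone_call` to the target's `idp_auth`, as `vishing_detected` expects.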

**Trust building (social engineering):** A benign inbound email with `sender_in_known_contacts: false` precedes a follow-up attack 3–5 days later. Viewed in isolation, the first contact looks clean, so a detector that scores it alone will miss it (a false negative).

## Leaderboard

`leaderboard/insider_threat_leaderboard.csv` contains a frozen snapshot of all model runs against this export; the full JSON version is also included. Columns:

| Column                     | Description                                        | Better |
| -------------------------- | -------------------------------------------------- | ------ |
| `triage_f1`                | F1 on escalation decisions                         | ↑      |
| `verdict_f1`               | F1 on full case verdicts                           | ↑      |
| `baseline_fp_rate`         | FP rate on clean baseline period                   | ↓      |
| `onset_sensitivity`        | Fraction of pre-onset escalations                  | ↓      |
| `vishing_detected`         | Did the agent correlate `phone_call` → `idp_auth`? | ✓      |
| `host_trail_reconstructed` | Did the agent cite all 3 hoarding phases?          | ✓      |

To add a row, run `eval_insider_threat.py` against this export and append its output to these files.

## Evaluation Pipeline

```bash
# Build baseline (pre-onset clean records)
python build_baseline_telemetry.py --export-dir ./export

# Run detection pipeline (one command per model)
python eval_insider_threat.py \
  --model anthropic.claude-opus-4-5-20251101-v1:0 \
  --export-dir ./export

# Launch leaderboard UI (insider threat Gradio app)
python app.py
```

No embedder, MongoDB, or vector database is required. Credentials: AWS Bedrock (standard credential chain).

## Citation

```bibtex
@misc{orgforge_it2026,
  title  = {OrgForge Insider Threat Detection Benchmark},
  author = {Jeffrey Flynt},
  year   = {2026},
  note   = {Synthetic benchmark generated by the OrgForge insider threat simulator}
}
```

## Related Paper

https://arxiv.org/abs/2603.22499

## License

MIT. The simulation engine that produced this dataset is independently licensed; see the OrgForge repository for details.
Provided by: aeriesec