
DeL-TaiseiOzaki/enronhop-timesplit_2001_0418

Source: Hugging Face. Updated 2026-04-05. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418
Description:
---
language:
- en
license: other
task_categories:
- question-answering
tags:
- enron
- multi-hop
- email-search
- time-split
---

# EnronHop: Time Split (cutoff: 2001-04-18)

Time-based re-split of the [`miluki/enronhop`](https://huggingface.co/datasets/miluki/enronhop) multi-hop Q&A benchmark, enabling temporal generalization evaluation.

## Split policy

Items from the combined `miluki/enronhop` train+test pool (1,947 items total) are re-partitioned at cutoff `2001-04-18 00:00:00`:

- **train** (1,360 items, 69.85%): `max(evidence_dates) < cutoff`
- **test** (587 items, 30.15%): `max(evidence_dates) >= cutoff`

`evidence_dates` are looked up per item by joining `evidence_mails` (mail_ids) with the date field in [`miluki/enronhop-corpus/emails.jsonl`](https://huggingface.co/datasets/miluki/enronhop-corpus) (the kaminski-v custodian subset, 11,062 emails).

## Files

- `train.jsonl`: 1,360 items
- `test.jsonl`: 587 items
- `manifest.json`: detailed provenance

Item schema matches `miluki/enronhop` exactly (same columns, just re-partitioned).

## Companion corpus (for leak-free training)

A date-filtered corpus subset is published alongside for proper temporal evaluation:

| File | Emails | Use |
|---|---|---|
| [`DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.jsonl`](https://huggingface.co/datasets/DeL-TaiseiOzaki/enronhop-corpus) | 11,062 | Full search index (for test-time eval) |
| [`DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl`](https://huggingface.co/datasets/DeL-TaiseiOzaki/enronhop-corpus) | 6,784 | Pre-cutoff search index (for train-time) |

During **training**, point the agent's search index to the `pre_cutoff` file to prevent future-data leakage. For **test evaluation**, use the full kaminski_v corpus.

## Usage with `enronhop_env`

No environment code changes are needed: `enronhop_env.load_environment()` already accepts `train_data_file`, `test_data_file`, `corpus_data_file`, and `db_path` as arguments.
Override them in your Prime RL config or in a direct Python call:

### Prime RL config (TOML)

```toml
# Training env: leak-free pre-cutoff corpus
[[env]]
id = "miluki/enronhop_env"

[env.args]
max_turns = 15
judge_model = "gpt-5-nano"
train_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/train.jsonl"
test_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/test.jsonl"
corpus_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl"
db_path = "~/.cache/enronhop_env/enronhop_corpus.pre_cutoff.db"

# Eval env: full corpus so test items' future evidence is reachable
[[eval.env]]
id = "miluki/enronhop_env"

[eval.env.args]
max_turns = 10
judge_model = "gpt-5-nano"
train_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/train.jsonl"
test_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/test.jsonl"
corpus_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.jsonl"
db_path = "~/.cache/enronhop_env/enronhop_corpus.full.db"
```

Important: **always use distinct `db_path` values** for the train and eval envs. Otherwise the SQLite cache built from the first corpus is reused by the second one, causing a silent corpus mismatch.
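One way to keep the two cache paths distinct is to derive `db_path` from the corpus file name instead of typing it by hand. The following is a minimal sketch: `db_path_for` is a hypothetical helper and is not part of `enronhop_env`, and the file names it produces differ from the ones in the config above. Any naming scheme works as long as each corpus gets its own database file.

```python
from pathlib import PurePosixPath

# Cache directory taken from the example configs; adjust as needed.
CACHE_DIR = "~/.cache/enronhop_env"


def db_path_for(corpus_data_file: str, cache_dir: str = CACHE_DIR) -> str:
    """Hypothetical helper: build a SQLite cache path from the corpus file name."""
    # .stem drops only the final suffix (".jsonl"), so the two corpus files
    # map to different database names.
    stem = PurePosixPath(corpus_data_file).stem
    return f"{cache_dir}/{stem}.db"


train_db = db_path_for(
    "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl"
)
eval_db = db_path_for(
    "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.jsonl"
)

# Guard against the silent corpus mismatch described above.
assert train_db != eval_db
print(train_db)  # ~/.cache/enronhop_env/emails.kaminski_v.pre_cutoff.db
print(eval_db)   # ~/.cache/enronhop_env/emails.kaminski_v.db
```

The resulting strings can be passed as the `db_path` argument alongside the matching `corpus_data_file`.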
### Direct Python

```python
from enronhop_env import load_environment

# Training setup: pre-cutoff corpus with its own SQLite cache file
env = load_environment(
    train_data_file="hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/train.jsonl",
    test_data_file="hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/test.jsonl",
    corpus_data_file="hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl",
    db_path="~/.cache/enronhop_env/enronhop_corpus.pre_cutoff.db",
)
```

## Hop distribution

| hop | train | test | test % of hop |
|-----|------:|-----:|--------------:|
| 1 | 465 | 124 | 21.1% |
| 2 | 531 | 235 | 30.7% |
| 3 | 244 | 126 | 34.1% |
| 4 | 120 | 102 | 45.9% |
| **total** | **1,360** | **587** | **30.15%** |

Multi-hop items skew toward test because their evidence can span larger date ranges, so `max(evidence_dates)` lands post-cutoff more often.

## Integrity

- All 1,947 items annotated successfully (100% evidence coverage in corpus).
- Train items: 0 evidence mails missing from pre_cutoff corpus (full access at train time).
- Test items: all 587 have at least one evidence mail removed by the pre_cutoff filter (319 with ALL evidence post-cutoff, 268 partial).

## Citation

Derived from [`miluki/enronhop`](https://huggingface.co/datasets/miluki/enronhop). Original pipeline and generation methodology: see the source project.
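## Example: sketch of the split rule

The rule in the Split policy section (`max(evidence_dates) < cutoff` goes to train, everything else to test) can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not the pipeline that produced the files: `split_items`, the `mail_dates` lookup, the `id` key, and the toy records are all hypothetical, and only the `evidence_mails` field name and the cutoff value come from this card.

```python
from datetime import datetime

# Cutoff stated in this card: 2001-04-18 00:00:00
CUTOFF = datetime(2001, 4, 18)


def split_items(items, mail_dates, cutoff=CUTOFF):
    """Partition items by the latest date among their evidence mails.

    items: dicts carrying an `evidence_mails` list of mail ids.
    mail_dates: assumed lookup from mail id to datetime, built from the corpus.
    """
    train, test = [], []
    for item in items:
        latest = max(mail_dates[mail_id] for mail_id in item["evidence_mails"])
        # Strictly before the cutoff goes to train; at or after goes to test.
        (train if latest < cutoff else test).append(item)
    return train, test


# Toy data (made up for illustration only).
mail_dates = {
    "m1": datetime(2001, 1, 5),
    "m2": datetime(2001, 4, 17, 23, 59),
    "m3": datetime(2001, 4, 18),
}
items = [
    {"id": "a", "evidence_mails": ["m1", "m2"]},  # all evidence pre-cutoff
    {"id": "b", "evidence_mails": ["m1", "m3"]},  # one mail exactly at cutoff
]

train, test = split_items(items, mail_dates)
print([i["id"] for i in train])  # ['a']
print([i["id"] for i in test])   # ['b']
```

Item `b` lands in test even though one of its evidence mails is pre-cutoff, which matches the "partial" test items counted in the Integrity section.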
Provider:
DeL-TaiseiOzaki