DeL-TaiseiOzaki/enronhop-timesplit_2001_0418
Hugging Face · Updated 2026-04-05 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418
Resource summary:
---
language:
- en
license: other
task_categories:
- question-answering
tags:
- enron
- multi-hop
- email-search
- time-split
---
# EnronHop — Time Split (cutoff: 2001-04-18)
Time-based re-split of the [`miluki/enronhop`](https://huggingface.co/datasets/miluki/enronhop) multi-hop Q&A benchmark, enabling temporal generalization evaluation.
## Split policy
Items from the combined `miluki/enronhop` train+test pool (1,947 items total) are re-partitioned at cutoff `2001-04-18 00:00:00`:
- **train** (1,360 items, 69.85%): `max(evidence_dates) < cutoff`
- **test** (587 items, 30.15%): `max(evidence_dates) >= cutoff`
`evidence_dates` are looked up per item by joining `evidence_mails` (mail_ids) with the date field in [`miluki/enronhop-corpus/emails.jsonl`](https://huggingface.co/datasets/miluki/enronhop-corpus) (the kaminski-v custodian subset, 11,062 emails).
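The split rule can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline that produced the files: it assumes the corpus dates are already parsed into naive `datetime` objects keyed by mail id, and the `id` field and toy records are hypothetical. Only `evidence_mails` and the cutoff come from the description above.

```python
from datetime import datetime

# Cutoff from the split policy: 2001-04-18 00:00:00.
CUTOFF = datetime(2001, 4, 18)


def split_by_cutoff(items, mail_dates, cutoff=CUTOFF):
    """Partition items into (train, test) by their latest evidence date."""
    train, test = [], []
    for item in items:
        # Join the item's evidence mail ids against the corpus date lookup.
        evidence_dates = [mail_dates[mid] for mid in item["evidence_mails"]]
        if max(evidence_dates) < cutoff:
            train.append(item)   # all evidence is strictly pre-cutoff
        else:
            test.append(item)    # at least one evidence mail is at or after the cutoff
    return train, test


# Toy data: ids and dates are invented for illustration.
mail_dates = {
    "m1": datetime(2001, 1, 5, 9, 30),
    "m2": datetime(2001, 4, 17, 23, 59),
    "m3": datetime(2001, 4, 18, 0, 0),
}
items = [
    {"id": "q1", "evidence_mails": ["m1", "m2"]},
    {"id": "q2", "evidence_mails": ["m1", "m3"]},
]
train, test = split_by_cutoff(items, mail_dates)
```

Because the rule uses `max`, a single evidence mail dated exactly at the cutoff is enough to send an item to test, as `q2` shows.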
## Files
- `train.jsonl` — 1,360 items
- `test.jsonl` — 587 items
- `manifest.json` — detailed provenance
Item schema matches `miluki/enronhop` exactly (same columns, just re-partitioned).
## Companion corpus (for leak-free training)
A date-filtered corpus subset is published alongside for proper temporal evaluation:
| File | Emails | Use |
|---|---|---|
| [`DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.jsonl`](https://huggingface.co/datasets/DeL-TaiseiOzaki/enronhop-corpus) | 11,062 | Full search index (for test-time eval) |
| [`DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl`](https://huggingface.co/datasets/DeL-TaiseiOzaki/enronhop-corpus) | 6,784 | Pre-cutoff search index (for train-time) |
During **training**, point the agent's search index to the `pre_cutoff` file to prevent future-data leakage. For **test evaluation**, use the full kaminski_v corpus.
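For readers who want to reproduce a date-filtered index from their own copy of the corpus, the filter might look like the sketch below. It rests on several assumptions that the card does not confirm: that each JSONL record has a `date` field in ISO 8601 format, that a `mail_id` field identifies the email, and that the published `pre_cutoff` file keeps emails dated strictly before the cutoff.

```python
import json
from datetime import datetime


def filter_pre_cutoff(jsonl_lines, cutoff, date_key="date"):
    """Yield the JSONL lines whose email date is strictly before the cutoff.

    `date_key` and the ISO 8601 date format are assumptions about the corpus.
    """
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if datetime.fromisoformat(record[date_key]) < cutoff:
            yield line


# Toy corpus: field names and values are invented for illustration.
corpus = [
    '{"mail_id": "m1", "date": "2001-01-05T09:30:00"}',
    '{"mail_id": "m3", "date": "2001-04-18T00:00:00"}',
]
kept = list(filter_pre_cutoff(corpus, datetime(2001, 4, 18)))
```

An index built from `kept` contains no email dated at or after the cutoff, which is the property the train-time corpus needs.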
## Usage with `enronhop_env`
No environment code changes are needed: `enronhop_env.load_environment()` already
accepts `train_data_file`, `test_data_file`, `corpus_data_file`, and `db_path` as
arguments. Override them in your Prime RL config or in a direct Python call:
### Prime RL config (TOML)
```toml
# Training env: leak-free pre-cutoff corpus
[[env]]
id = "miluki/enronhop_env"

[env.args]
max_turns = 15
judge_model = "gpt-5-nano"
train_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/train.jsonl"
test_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/test.jsonl"
corpus_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl"
db_path = "~/.cache/enronhop_env/enronhop_corpus.pre_cutoff.db"

# Eval env: full corpus so test items' future evidence is reachable
[[eval.env]]
id = "miluki/enronhop_env"

[eval.env.args]
max_turns = 10
judge_model = "gpt-5-nano"
train_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/train.jsonl"
test_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/test.jsonl"
corpus_data_file = "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.jsonl"
db_path = "~/.cache/enronhop_env/enronhop_corpus.full.db"
```
Important: **always use distinct `db_path` values** for the train and eval envs,
or the SQLite cache built from the first corpus will be reused by the second one
(silent corpus mismatch).
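One way to make the distinct-path rule hard to forget is to derive `db_path` from the corpus file itself. The helper below is a hypothetical convenience, not part of `enronhop_env`; its name, the hash suffix, and the default cache directory are all assumptions made for this sketch.

```python
import hashlib
from pathlib import PurePosixPath


def db_path_for(corpus_data_file, cache_dir="~/.cache/enronhop_env"):
    """Return a cache path that is unique per corpus file (hypothetical helper)."""
    # A short hash of the full corpus location separates files that share a name.
    digest = hashlib.sha256(corpus_data_file.encode("utf-8")).hexdigest()[:12]
    stem = PurePosixPath(corpus_data_file).stem  # file name without ".jsonl"
    return f"{cache_dir}/{stem}.{digest}.db"


train_db = db_path_for(
    "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl"
)
eval_db = db_path_for(
    "hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.jsonl"
)
```

Passing `train_db` and `eval_db` as the respective `db_path` values keeps the two caches separate without naming them by hand.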
### Direct Python
```python
from enronhop_env import load_environment

env = load_environment(
    train_data_file="hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/train.jsonl",
    test_data_file="hf://datasets/DeL-TaiseiOzaki/enronhop-timesplit_2001_0418/test.jsonl",
    corpus_data_file="hf://datasets/DeL-TaiseiOzaki/enronhop-corpus/emails.kaminski_v.pre_cutoff.jsonl",
    db_path="~/.cache/enronhop_env/enronhop_corpus.pre_cutoff.db",
)
```
## Hop distribution
| hop | train | test | test % of hop |
|-----|------:|-----:|--------------:|
| 1 | 465 | 124 | 21.1% |
| 2 | 531 | 235 | 30.7% |
| 3 | 244 | 126 | 34.1% |
| 4 | 120 | 102 | 45.9% |
| **total** | **1,360** | **587** | **30.15%** |
The test share rises with hop count, from 21.1% of 1-hop items to 45.9% of 4-hop items. Items with more evidence mails can span larger date ranges, so `max(evidence_dates)` lands at or after the cutoff more often.
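The "test % of hop" column is the test count divided by the hop's combined train and test count. A short check, using only the counts from the table, reproduces the published percentages:

```python
# (train, test) counts per hop, copied from the hop distribution table.
counts = {1: (465, 124), 2: (531, 235), 3: (244, 126), 4: (120, 102)}


def pct_in_test(train, test, digits=1):
    """Percentage of a hop's items that landed in the test split."""
    return round(100 * test / (train + test), digits)


shares = {hop: pct_in_test(tr, te) for hop, (tr, te) in counts.items()}

# Totals across all hops.
total_train = sum(tr for tr, _ in counts.values())
total_test = sum(te for _, te in counts.values())
overall = pct_in_test(total_train, total_test, digits=2)
```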
## Integrity
- All 1,947 items annotated successfully (100% evidence coverage in corpus).
- Train items: 0 evidence mails missing from pre_cutoff corpus (full access at train time).
- Test items: all 587 have at least one evidence mail removed by the pre_cutoff filter (319 with ALL evidence post-cutoff, 268 partial).
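The all/partial breakdown for test items can be expressed as a small classifier over each item's evidence dates. This is an illustrative sketch, not the script behind the numbers above; the function name and the toy dates are assumptions.

```python
from datetime import datetime

CUTOFF = datetime(2001, 4, 18)  # cutoff 2001-04-18 00:00:00


def classify_evidence(evidence_dates, cutoff=CUTOFF):
    """Return 'none', 'partial', or 'all' for the share of evidence at or after the cutoff."""
    post = sum(1 for d in evidence_dates if d >= cutoff)
    if post == 0:
        return "none"      # fully reachable in the pre_cutoff corpus (a train item)
    if post == len(evidence_dates):
        return "all"       # every evidence mail is removed by the pre_cutoff filter
    return "partial"       # some evidence survives the filter, some does not


# Toy dates, invented for illustration.
before = datetime(2001, 3, 1)
after = datetime(2001, 6, 1)
labels = [
    classify_evidence([before, before]),
    classify_evidence([before, after]),
    classify_evidence([after, after]),
]
```

Under this definition every test item is either `"all"` or `"partial"`, which matches the 319 + 268 = 587 breakdown, and every train item is `"none"`.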
## Citation
Derived from [`miluki/enronhop`](https://huggingface.co/datasets/miluki/enronhop). Original pipeline and generation methodology: see the source project.
Provider:
DeL-TaiseiOzaki



