searchsim/cognitive-traces-aol
Hugging Face · Updated 2026-03-19 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/searchsim/cognitive-traces-aol
---
license: cc-by-4.0
task_categories:
- text-classification
- token-classification
language:
- en
tags:
- information-retrieval
- user-simulation
- cognitive-modeling
- information-foraging-theory
- search-logs
pretty_name: "Cognitive Traces — AOL User Session Collection"
size_categories:
- 100K<n<1M
---
# Cognitive Traces — AOL User Session Collection
## Dataset Description
This dataset contains **cognitive trace annotations** for the AOL User Session Collection, produced by the multi-agent annotation framework described in:
> **Beyond the Click: A Framework for Inferring Cognitive Traces in Search**
> Saber Zerhoudi, Michael Granitzer. ECIR 2026.
Each user event (query, click, SERP view) is annotated with a cognitive label from **Information Foraging Theory (IFT)**, along with the full annotation chain (analyst, critic, judge) and confidence scores.
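The sketch below shows how one event's Analyst, Critic and Judge steps can be read back from the columns listed under Column Schema. `format_chain` is a hypothetical helper. The sample event, including the `critic_agreement` value `"yes"`, is invented for illustration, because this card does not document the exact string values that column takes.

```python
def format_chain(event):
    """Render the Analyst -> Critic -> Judge annotation chain of one event.

    `event` is a dict keyed by the column names in the Column Schema,
    e.g. one row of ds["train"] (hypothetical helper, not part of the dataset).
    """
    return "\n".join([
        f"{event['action_type']} ({event['event_id']})",
        f"  Analyst: {event['analyst_label']}: {event['analyst_justification']}",
        f"  Critic:  {event['critic_label']} (agreement: {event['critic_agreement']}): "
        f"{event['critic_justification']}",
        f"  Judge:   {event['cognitive_label']}: {event['judge_justification']}",
        f"  confidence={event['confidence_score']:.2f}, "
        f"disagreement={event['disagreement_score']:.2f}",
    ])

# Hypothetical event; all field values are invented for illustration.
event = {
    "event_id": "e1",
    "action_type": "QUERY",
    "analyst_label": "PoorScent",
    "analyst_justification": "Query reformulated after a SERP with no clicks.",
    "critic_label": "PoorScent",
    "critic_agreement": "yes",
    "critic_justification": "Consistent with the preceding SERP view.",
    "cognitive_label": "PoorScent",
    "judge_justification": "Analyst and Critic agree.",
    "confidence_score": 0.9,
    "disagreement_score": 0.0,
}
out = format_chain(event)
print(out)
```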
## Dataset Statistics
| Metric | Value |
|--------|-------|
| Sessions | 22,039 |
| Events | 245,786 |
| Action Types | QUERY, CLICK, SERP_VIEW (3) |
| Cognitive Labels | 6 (FollowingScent, ApproachingSource, ForagingSuccess, DietEnrichment, PoorScent, LeavingPatch) |
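
The statistics above can be recomputed in a single pass over the event rows. The following is a minimal sketch that assumes each row is a dict with the `session_id`, `action_type` and `cognitive_label` columns from the Column Schema. `summarize` is a hypothetical helper, and the toy rows are illustrative only.

```python
from collections import Counter

def summarize(rows):
    """Count distinct sessions, events, action types and cognitive labels."""
    sessions = set()
    actions = Counter()
    labels = Counter()
    n_events = 0
    for row in rows:
        n_events += 1
        sessions.add(row["session_id"])
        actions[row["action_type"]] += 1
        labels[row["cognitive_label"]] += 1
    return {
        "sessions": len(sessions),
        "events": n_events,
        "action_types": dict(actions),
        "cognitive_labels": dict(labels),
    }

# Hypothetical toy rows (not taken from the dataset).
sample = [
    {"session_id": "s1", "action_type": "QUERY", "cognitive_label": "FollowingScent"},
    {"session_id": "s1", "action_type": "CLICK", "cognitive_label": "ForagingSuccess"},
    {"session_id": "s2", "action_type": "QUERY", "cognitive_label": "PoorScent"},
]
stats = summarize(sample)
print(stats["sessions"], stats["events"])  # 2 3
```

Calling `summarize(ds["train"])` should reproduce the session and event counts in the table, assuming all events ship in the single `train` split used in the Quick Start. Iterating a `datasets.Dataset` yields one dict per row.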
## Quick Start
```python
from datasets import load_dataset

ds = load_dataset("searchsim/cognitive-traces-aol")

# Access the first annotated event
print(ds["train"][0])

# Filter by cognitive label
struggling = ds["train"].filter(lambda x: x["cognitive_label"] == "PoorScent")
print(f"Events with PoorScent: {len(struggling)}")

# Get all events for a session
session = ds["train"].filter(lambda x: x["session_id"] == "1000004_185757")
for event in session:
    # Single quotes inside the f-string keep this valid before Python 3.12
    print(f"  {event['action_type']}: {event['cognitive_label']}")
```
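Calling `filter` once per session rescans every row each time. When many sessions are needed, a single grouping pass is cheaper. The sketch below makes two assumptions. First, all `event_timestamp` values share one ISO 8601 format, so string order matches chronological order. Second, the full split fits in memory as Python dicts. `build_sessions` and `label_sequence` are hypothetical helpers, and the toy rows are invented.

```python
from collections import defaultdict

def build_sessions(rows):
    """Group event dicts by session_id, each session sorted by event_timestamp."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    for events in sessions.values():
        # Assumes a uniform ISO 8601 format, so lexicographic order is
        # chronological; list.sort is stable, so ties keep input order.
        events.sort(key=lambda e: e["event_timestamp"])
    return dict(sessions)

def label_sequence(events):
    """(action_type, cognitive_label) pairs for one session, in order."""
    return [(e["action_type"], e["cognitive_label"]) for e in events]

# Hypothetical toy rows, deliberately out of order within session "a".
sample = [
    {"session_id": "a", "event_timestamp": "2006-03-01T10:00:05",
     "action_type": "CLICK", "cognitive_label": "ApproachingSource"},
    {"session_id": "a", "event_timestamp": "2006-03-01T10:00:00",
     "action_type": "QUERY", "cognitive_label": "FollowingScent"},
    {"session_id": "b", "event_timestamp": "2006-03-02T09:00:00",
     "action_type": "QUERY", "cognitive_label": "PoorScent"},
]
sessions = build_sessions(sample)
print(label_sequence(sessions["a"]))
# [('QUERY', 'FollowingScent'), ('CLICK', 'ApproachingSource')]
```

On the real data, `build_sessions(ds["train"])["1000004_185757"]` would return the same events as the session filter in the Quick Start, ordered by timestamp.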
## Column Schema
| Column | Type | Description |
|--------|------|-------------|
| `session_id` | string | Unique session identifier |
| `event_id` | string | Unique event identifier |
| `event_timestamp` | string | ISO timestamp |
| `action_type` | string | User action type (QUERY, CLICK, SERP_VIEW) |
| `content` | string | Event content (query text, clicked URL, or SERP results) |
| `cognitive_label` | string | Final IFT cognitive label |
| `analyst_label` | string | Analyst agent's proposed label |
| `analyst_justification` | string | Analyst's reasoning |
| `critic_label` | string | Critic agent's proposed label |
| `critic_agreement` | string | Whether the Critic agreed with the Analyst |
| `critic_justification` | string | Critic's reasoning |
| `judge_justification` | string | Judge's final decision reasoning |
| `confidence_score` | float | Framework confidence (0–1) |
| `disagreement_score` | float | Analyst–Critic disagreement (0–1) |
| `flagged_for_review` | bool | Whether flagged for human review |
| `pipeline_mode` | string | Annotation pipeline mode |
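
The confidence and review columns can be used to build a higher-precision subset. The sketch below keeps unflagged events above a confidence threshold. The threshold of 0.8 is an arbitrary assumption, not a value recommended by the authors. It also measures how often the final label matches the Analyst's proposal. That rate is computed from `analyst_label` and `cognitive_label` rather than from `critic_agreement`, because this card does not list the string values of that column. Both helpers are hypothetical, and the toy rows are invented.

```python
def high_confidence(rows, min_confidence=0.8):
    """Keep events at or above a confidence threshold that were not flagged for review."""
    return [
        r for r in rows
        if r["confidence_score"] >= min_confidence and not r["flagged_for_review"]
    ]

def analyst_final_agreement(rows):
    """Fraction of events whose final cognitive_label equals the analyst_label."""
    rows = list(rows)
    if not rows:
        return 0.0
    same = sum(r["analyst_label"] == r["cognitive_label"] for r in rows)
    return same / len(rows)

# Hypothetical toy rows (values invented for illustration).
sample = [
    {"confidence_score": 0.95, "flagged_for_review": False,
     "analyst_label": "PoorScent", "cognitive_label": "PoorScent"},
    {"confidence_score": 0.60, "flagged_for_review": True,
     "analyst_label": "FollowingScent", "cognitive_label": "LeavingPatch"},
    {"confidence_score": 0.85, "flagged_for_review": False,
     "analyst_label": "DietEnrichment", "cognitive_label": "FollowingScent"},
]
kept = high_confidence(sample)
print(len(kept), analyst_final_agreement(kept))  # 2 0.5
```

With the Hugging Face `datasets` library, the same predicate can also be passed directly to `ds["train"].filter(...)`. This keeps the subset as a `Dataset` instead of a Python list.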
## IFT Cognitive Labels
| Label | IFT Concept | Interpretation |
|-------|-------------|----------------|
| FollowingScent | Information scent following | User pursuing a promising trail |
| ApproachingSource | Source approaching | User converging on target information |
| ForagingSuccess | Successful foraging | User found desired information |
| DietEnrichment | Diet enrichment | User broadening information intake |
| PoorScent | Poor information scent | Trail quality deteriorating |
| LeavingPatch | Patch leaving | User abandoning current direction |
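
Because the labels describe a foraging process, a natural follow-up analysis is to count how one label follows another within a session. For example, it can show how often PoorScent is followed by LeavingPatch. The sketch below assumes its input maps each `session_id` to that session's events already in chronological order, for instance by grouping rows on `session_id` and sorting on `event_timestamp`. Both helpers are hypothetical, and the toy sessions are invented. The code produces descriptive counts only.

```python
from collections import Counter

def label_transitions(sessions):
    """Count consecutive (label, next_label) pairs within each session.

    `sessions` maps session_id -> list of event dicts in chronological order.
    """
    counts = Counter()
    for events in sessions.values():
        labels = [e["cognitive_label"] for e in events]
        counts.update(zip(labels, labels[1:]))
    return counts

def next_label_distribution(counts, label):
    """Empirical P(next label | current label) from transition counts."""
    total = sum(n for (src, _), n in counts.items() if src == label)
    if total == 0:
        return {}
    return {dst: n / total for (src, dst), n in counts.items() if src == label}

# Hypothetical toy sessions (labels invented for illustration).
toy = {
    "s1": [{"cognitive_label": "FollowingScent"},
           {"cognitive_label": "PoorScent"},
           {"cognitive_label": "LeavingPatch"}],
    "s2": [{"cognitive_label": "PoorScent"},
           {"cognitive_label": "FollowingScent"}],
}
counts = label_transitions(toy)
print(next_label_distribution(counts, "PoorScent"))
# {'LeavingPatch': 0.5, 'FollowingScent': 0.5}
```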
## Source Dataset
Based on the AOL User Session Collection (aol-ia variant, MacAvaney et al., ECIR 2022), which contains web search queries, clicks, and SERP views from 2006 linked to Internet Archive snapshots. We use the standard anonymized collection and report only aggregate findings.
## Citation
```bibtex
@inproceedings{zerhoudi2026beyond,
title={Beyond the Click: A Framework for Inferring Cognitive Traces in Search},
author={Zerhoudi, Saber and Granitzer, Michael},
booktitle={Proceedings of the 48th European Conference on Information Retrieval (ECIR)},
year={2026}
}
```
## License
The cognitive annotations are released under the Creative Commons Attribution 4.0 license (CC-BY-4.0). The underlying source datasets have their own licenses; please refer to the original dataset providers.
## Links
- [Paper](https://traces.searchsim.org/)
- [GitHub Repository](https://github.com/searchsim-org/cognitive-traces)
- [Annotation Tool](https://github.com/searchsim-org/cognitive-traces)
Provided by:
searchsim



