
searchsim/cognitive-traces-aol

Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/searchsim/cognitive-traces-aol
Resource description:
---
license: cc-by-4.0
task_categories:
- text-classification
- token-classification
language:
- en
tags:
- information-retrieval
- user-simulation
- cognitive-modeling
- information-foraging-theory
- search-logs
pretty_name: "Cognitive Traces — AOL User Session Collection"
size_categories:
- 100K<n<1M
---

# Cognitive Traces — AOL User Session Collection

## Dataset Description

This dataset contains **cognitive trace annotations** for the AOL User Session Collection, produced by the multi-agent annotation framework described in:

> **Beyond the Click: A Framework for Inferring Cognitive Traces in Search**
> Saber Zerhoudi, Michael Granitzer. ECIR 2026.

Each user event (query, click, SERP view) is annotated with a cognitive label from **Information Foraging Theory (IFT)**, along with the full annotation chain (analyst, critic, judge) and confidence scores.

## Dataset Statistics

| Metric | Value |
|--------|-------|
| Sessions | 22,039 |
| Events | 245,786 |
| Action Types | 3 (QUERY, CLICK, SERP_VIEW) |
| Cognitive Labels | 6 (FollowingScent, ApproachingSource, ForagingSuccess, DietEnrichment, PoorScent, LeavingPatch) |

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("searchsim/cognitive-traces-aol")

# Access the data
print(ds["train"][0])

# Filter by cognitive label
struggling = ds["train"].filter(lambda x: x["cognitive_label"] == "PoorScent")
print(f"Events with PoorScent: {len(struggling)}")

# Get all events for a session
session = ds["train"].filter(lambda x: x["session_id"] == "1000004_185757")
for event in session:
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    print(f"  {event['action_type']}: {event['cognitive_label']}")
```

## Column Schema

| Column | Type | Description |
|--------|------|-------------|
| `session_id` | string | Unique session identifier |
| `event_id` | string | Unique event identifier |
| `event_timestamp` | string | ISO timestamp |
| `action_type` | string | User action type (QUERY, CLICK, SERP_VIEW) |
| `content` | string | Event content (query text, clicked URL, or SERP results) |
| `cognitive_label` | string | Final IFT cognitive label |
| `analyst_label` | string | Analyst agent's proposed label |
| `analyst_justification` | string | Analyst's reasoning |
| `critic_label` | string | Critic agent's proposed label |
| `critic_agreement` | string | Whether Critic agreed with Analyst |
| `critic_justification` | string | Critic's reasoning |
| `judge_justification` | string | Judge's final decision reasoning |
| `confidence_score` | float | Framework confidence (0–1) |
| `disagreement_score` | float | Analyst–Critic disagreement (0–1) |
| `flagged_for_review` | bool | Whether flagged for human review |
| `pipeline_mode` | string | Annotation pipeline mode |

## IFT Cognitive Labels

| Label | IFT Concept | Interpretation |
|-------|-------------|----------------|
| FollowingScent | Information scent following | User pursuing a promising trail |
| ApproachingSource | Source approaching | User converging on target information |
| ForagingSuccess | Successful foraging | User found desired information |
| DietEnrichment | Diet enrichment | User broadening information intake |
| PoorScent | Poor information scent | Trail quality deteriorating |
| LeavingPatch | Patch leaving | User abandoning current direction |

## Source Dataset

Based on the AOL User Session Collection (aol-ia variant, MacAvaney et al., ECIR 2022). It contains web search queries, clicks, and SERP views from 2006, linked to Internet Archive snapshots. We use the standard anonymized collection and report only aggregate findings.

## Citation

```bibtex
@inproceedings{zerhoudi2026beyond,
  title={Beyond the Click: A Framework for Inferring Cognitive Traces in Search},
  author={Zerhoudi, Saber and Granitzer, Michael},
  booktitle={Proceedings of the 48th European Conference on Information Retrieval (ECIR)},
  year={2026}
}
```

## License

CC-BY-4.0. The cognitive annotations are released under Creative Commons Attribution 4.0. The underlying source datasets have their own licenses; please refer to the original dataset providers.

## Links

- [Paper](https://traces.searchsim.org/)
- [GitHub Repository](https://github.com/searchsim-org/cognitive-traces)
- [Annotation Tool](https://github.com/searchsim-org/cognitive-traces)
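
## Example: Working with Annotation Columns

As a complement to the Quick Start, the sketch below shows one possible way to use the annotation columns from the Column Schema: counting cognitive labels, collecting events that may deserve a second look, and rebuilding per-session label sequences. The rows in `SAMPLE_EVENTS` are hypothetical values made up for illustration (they are not taken from the dataset), the helper names are not part of any released tooling, and the `0.6` confidence threshold is an arbitrary assumption. Sorting by `event_timestamp` as a string assumes all timestamps share the same ISO format.

```python
from collections import Counter, defaultdict

# Hypothetical rows that only mimic the documented column schema.
SAMPLE_EVENTS = [
    {"session_id": "s1", "event_id": "e1", "event_timestamp": "2006-03-01T10:00:00",
     "action_type": "QUERY", "cognitive_label": "FollowingScent",
     "confidence_score": 0.92, "flagged_for_review": False},
    {"session_id": "s1", "event_id": "e2", "event_timestamp": "2006-03-01T10:00:05",
     "action_type": "SERP_VIEW", "cognitive_label": "PoorScent",
     "confidence_score": 0.41, "flagged_for_review": True},
    {"session_id": "s1", "event_id": "e3", "event_timestamp": "2006-03-01T10:01:00",
     "action_type": "CLICK", "cognitive_label": "ForagingSuccess",
     "confidence_score": 0.88, "flagged_for_review": False},
    {"session_id": "s2", "event_id": "e4", "event_timestamp": "2006-03-02T09:00:00",
     "action_type": "QUERY", "cognitive_label": "PoorScent",
     "confidence_score": 0.55, "flagged_for_review": False},
    {"session_id": "s2", "event_id": "e5", "event_timestamp": "2006-03-02T09:02:00",
     "action_type": "QUERY", "cognitive_label": "LeavingPatch",
     "confidence_score": 0.75, "flagged_for_review": False},
]


def label_distribution(events):
    """Count how often each final cognitive label occurs."""
    return Counter(e["cognitive_label"] for e in events)


def needs_review(events, min_confidence=0.6):
    """Return events that are flagged or fall below an assumed confidence threshold."""
    return [
        e for e in events
        if e["flagged_for_review"] or e["confidence_score"] < min_confidence
    ]


def label_sequences(events):
    """Map each session_id to its cognitive labels in timestamp order."""
    by_session = defaultdict(list)
    for e in events:
        by_session[e["session_id"]].append(e)
    return {
        sid: [e["cognitive_label"]
              for e in sorted(rows, key=lambda r: r["event_timestamp"])]
        for sid, rows in by_session.items()
    }


if __name__ == "__main__":
    print(label_distribution(SAMPLE_EVENTS))
    print([e["event_id"] for e in needs_review(SAMPLE_EVENTS)])
    print(label_sequences(SAMPLE_EVENTS))
```

Because iterating over a `datasets` split yields one dictionary per row, the same helpers should also accept `ds["train"]` from the Quick Start, although a pass over all 245,786 events in plain Python will be slower than the library's built-in `filter`.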
Provider:
searchsim