ethanning/deepresearchgym-agentic-search-logs
Source: Hugging Face. Updated 2026-01-30; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ethanning/deepresearchgym-agentic-search-logs
Resource description:
---
license: cc-by-4.0
task_categories:
- text-retrieval
- text-generation
language:
- en
tags:
- information-retrieval
- agentic-search
- search-logs
- query-rewriting
- query-log
- search-behavior
- llm-agents
- deep-research
pretty_name: DeepResearchGym Agentic Search Logs
size_categories:
- 10M<n<100M
---
## DeepResearchGym Agentic Search Logs
This repository hosts the dataset accompanying the paper **“Agentic Search in the Wild”** (arXiv: https://arxiv.org/abs/2601.17617).
The dataset contains **14M+** search queries collected via **DeepResearchGym (DRGym)**, an open-source search API designed for DeepResearch-style agentic search. For more background on DRGym, see: https://arxiv.org/abs/2505.19253.
All records have been **anonymized** and **shuffled** to prevent re-identification. We also applied basic cleaning to remove clearly abnormal queries (e.g., empty or degenerate inputs and obvious artifacts). The final release is **sessionized**: queries are grouped into search sessions according to the procedure described in the paper.
For full details on data collection, processing, anonymization, and sessionization, please refer to the paper.
## Dataset Structure
| Field | Description |
|-------|-------------|
| `session_id` | Anonymized session identifier |
| `session_len` | Number of queries in this session |
| `query_id` | 1-indexed position within session |
| `query` | Query text |
| `time_offset` | Seconds since first query in session |
| `retrieval_depth` | Number of documents requested (k) |
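Because the records are shuffled, rows from the same session may not be adjacent or in order. The following is a minimal sketch, not the authors' tooling, of how sessions could be reassembled from the fields above. The sample rows are hypothetical and made up for illustration, the helper names `group_sessions` and `is_consistent` are assumptions, and the numeric type of `time_offset` is assumed rather than documented.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; they are not taken from the dataset.
rows = [
    {"session_id": "s2", "session_len": 1, "query_id": 1,
     "query": "bm25 parameters", "time_offset": 0.0, "retrieval_depth": 10},
    {"session_id": "s1", "session_len": 2, "query_id": 2,
     "query": "dense retrieval survey 2024", "time_offset": 12.5, "retrieval_depth": 5},
    {"session_id": "s1", "session_len": 2, "query_id": 1,
     "query": "dense retrieval survey", "time_offset": 0.0, "retrieval_depth": 5},
]


def group_sessions(rows):
    """Group rows by session_id and order each session by query_id."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    for queries in sessions.values():
        queries.sort(key=lambda r: r["query_id"])
    return dict(sessions)


def is_consistent(queries):
    """Check that query_id runs 1..n and that session_len matches n."""
    ids = [r["query_id"] for r in queries]
    return (
        ids == list(range(1, len(queries) + 1))
        and all(r["session_len"] == len(queries) for r in queries)
    )


sessions = group_sessions(rows)
for session_id, queries in sorted(sessions.items()):
    print(session_id, [q["query"] for q in queries], is_consistent(queries))
```

Sorting on `query_id` relies on the documented 1-indexed position within a session. The consistency check is a sanity test a reader might run on a loaded subset, not a guarantee stated by the dataset card.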
## License
This dataset is released under **CC BY 4.0**. We encourage use for research and development with appropriate attribution.
## Citation
If you use this dataset, please cite the corresponding paper (arXiv: https://arxiv.org/abs/2601.17617).
```bibtex
@misc{ning2026agenticsearchwildintents,
title={Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search Requests},
author={Jingjie Ning and João Coelho and Yibo Kong and Yunfan Long and Bruno Martins and João Magalhães and Jamie Callan and Chenyan Xiong},
year={2026},
eprint={2601.17617},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2601.17617},
}
```
## Contact
For questions, issues, or collaboration, please contact the paper authors.
Provider:
ethanning



