
ethanning/deepresearchgym-agentic-search-logs

Hugging Face | Updated 2026-01-30 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ethanning/deepresearchgym-agentic-search-logs
Description:
---
license: cc-by-4.0
task_categories:
- text-retrieval
- text-generation
language:
- en
tags:
- information-retrieval
- agentic-search
- search-logs
- query-rewriting
- query-log
- search-behavior
- llm-agents
- deep-research
pretty_name: DeepResearchGym Agentic Search Logs
size_categories:
- 10M<n<100M
---

## DeepResearchGym Agentic Search Logs

This repository hosts the dataset accompanying the paper **“Agentic Search in the Wild”** (arXiv: https://arxiv.org/abs/2601.17617).

The dataset contains **14M+** search queries collected via **DeepResearchGym (DRGym)**, an open-source search API designed for DeepResearch-style agentic search. For more background on DRGym, see: https://arxiv.org/abs/2505.19253.

All records have been **anonymized** and **shuffled** to prevent re-identification, and we additionally applied basic cleaning to remove clearly abnormal queries (e.g., empty/degenerate inputs and obvious artifacts). The final release is **sessionized**: queries are grouped into search sessions according to the procedure described in the paper.

For full details on data collection, processing, anonymization, and sessionization, please refer to the paper.

## Dataset Structure

| Field | Description |
|-------|-------------|
| `session_id` | Anonymized session identifier |
| `session_len` | Number of queries in this session |
| `query_id` | 1-indexed position within session |
| `query` | Query text |
| `time_offset` | Seconds since first query in session |
| `retrieval_depth` | Number of documents requested (k) |

## License

This dataset is released under **CC BY 4.0**. We encourage use for research and development with appropriate attribution.

## Citation

If you use this dataset, please cite the corresponding paper (arXiv: https://arxiv.org/abs/2601.17617).

```bibtex
@misc{ning2026agenticsearchwildintents,
  title={Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search Requests},
  author={Jingjie Ning and João Coelho and Yibo Kong and Yunfan Long and Bruno Martins and João Magalhães and Jamie Callan and Chenyan Xiong},
  year={2026},
  eprint={2601.17617},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2601.17617},
}
```

## Contact

For questions, issues, or collaboration, please contact the paper authors.
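
## Example: Rebuilding Sessions

The fields listed under Dataset Structure are enough to reassemble a session and derive simple timing statistics. The following is a minimal sketch, not the authors' tooling: it assumes each record is a dictionary keyed by the documented field names and that rows of one session may arrive in any order. The helper names (`group_sessions`, `inter_query_gaps`, `is_complete`) and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_sessions(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group rows by session_id and order each session by query_id."""
    sessions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    for queries in sessions.values():
        # query_id is the 1-indexed position within the session.
        queries.sort(key=lambda r: r["query_id"])
    return dict(sessions)


def inter_query_gaps(session: List[dict]) -> List[float]:
    """Seconds between consecutive queries, derived from time_offset."""
    offsets = [r["time_offset"] for r in session]
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


def is_complete(session: List[dict]) -> bool:
    """True if the session holds query_id 1..session_len with no gaps."""
    expected = session[0]["session_len"]
    positions = [r["query_id"] for r in session]
    return positions == list(range(1, expected + 1))


# Invented sample rows (not taken from the dataset), deliberately out of order.
SAMPLE_ROWS = [
    {"session_id": "s1", "session_len": 3, "query_id": 2,
     "query": "dense retrieval benchmarks", "time_offset": 4.0,
     "retrieval_depth": 10},
    {"session_id": "s2", "session_len": 2, "query_id": 1,
     "query": "query rewriting survey", "time_offset": 0.0,
     "retrieval_depth": 5},
    {"session_id": "s1", "session_len": 3, "query_id": 1,
     "query": "dense retrieval", "time_offset": 0.0,
     "retrieval_depth": 10},
    {"session_id": "s1", "session_len": 3, "query_id": 3,
     "query": "dense retrieval benchmarks 2024", "time_offset": 9.5,
     "retrieval_depth": 20},
]

if __name__ == "__main__":
    for session_id, queries in group_sessions(SAMPLE_ROWS).items():
        print(session_id, is_complete(queries), inter_query_gaps(queries))
```

In practice the rows could come from any loader that yields dictionaries, for example the Hugging Face `datasets` library. The card does not state split names or file layout, so check the repository before relying on a particular configuration.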
Provider:
ethanning