AgenticSearchQueryset/ASQ

Source: Hugging Face. Updated 2026-04-01. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AgenticSearchQueryset/ASQ
Resource description:
---
license: mit
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- agent
- ir
- retrieval
- search
- rag
- text-generation
pretty_name: '🤖 ASQ: Agentic Search Queryset'
size_categories:
- 100M<n<1B
viewer: false
---

<div align="center">
<h1>🤖 ASQ: Agentic Search Queryset </h1>
<p><strong> A dataset capturing RAG agents' search behaviours.</strong></p>
</div>

<div align="center">

[![Paper](https://img.shields.io/badge/arXiv-2602.17518-B31B1B.svg)](https://arxiv.org/pdf/2602.17518)
[![Repository](https://img.shields.io/badge/GitHub-ASQ-181717?style=flat-square&logo=github)](https://github.com/fpezzuti/ASQ)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

</div>

---

## 📖 Dataset Description

**ASQ** (Agentic Search Queryset) is a dataset designed to capture the **search behaviours** of **RAG agents**. It collects the intermediate **synthetic queries**, **retrieved documents**, and **thoughts** (reasoning descriptions) that agents produce or consume.

### 📊 Dataset Statistics

- 615k traces (0.12% incomplete)
- 614k answers
- 680k synthetic queries
- 680k retrieved ranked lists
- 3 diverse agent settings
- 2 diverse retrieval settings

---

## 🏗️ Data Construction

Details about **dataset construction** are available in the **paper**: [arXiv preprint](https://arxiv.org/pdf/2602.17518).

See our **GitHub repository** to **reproduce** the construction of ASQ, or to **extend** it: [github.com/fpezzuti/ASQ](https://github.com/fpezzuti/ASQ).

---

## 📂 Dataset Organisation

The ASQ dataset is organised hierarchically under the `traces/` directory:

```markdown
traces/
└── dataset/
    └── retriever_config/
        └── agent_family/
            └── model/
                ├── answers.tsv
                ├── iter_queries.tsv
                ├── retrieved_docs.tsv
                └── thoughts.tsv
```

Placeholders:

- dataset: the base dataset from which traces were collected (e.g., HotpotQA-test).
- retriever_config: the retrieval pipeline's configuration (e.g., BM25_k100_electra_k3).
- agent_family: the type of agent generating the traces (e.g., Autorefine).
- model: the generator variant (e.g., Qwen-7B).

---

## 📦 Artifacts

Each collection of traces comprises four TSV artifact files, whose rows are keyed by the qid of the organic query:

| Artifact | Description |
|-------|-------------|
| `answers.tsv` | Answers generated by the agent for each query. |
| `iter_queries.tsv` | Synthetic queries generated by the agent per query and iteration. |
| `retrieved_docs.tsv` | Ranked list of documents retrieved per query and iteration. |
| `thoughts.tsv` | CoT reasoning "thoughts" of the agent per query and iteration. |

### 🏁 Answers

Columns:

- `qid` (string): qid of the organic query.
- `answer` (string): answer generated by the agent.

### ✍️ Synthetic Queries

Columns:

- `qid` (string): qid of the organic query.
- `iteration` (integer): iteration number within the agent's inference loop.
- `llm_query` (string): synthetic query generated at that iteration.

### 🔍 Retrieved Documents

Columns:

- `qid` (string): qid of the organic query.
- `iteration` (integer): iteration number within the agent's inference loop.
- `docid` (integer): identifier of the retrieved document.
- `rank` (integer): rank of the document in the ranked list retrieved during that iteration.

### 💭 Thoughts

Columns:

- `qid` (string): qid of the organic query.
- `iteration` (integer): iteration number within the agent's inference loop.
- `thought` (string): chain-of-thought reasoning step produced by the agent at that iteration.

---

## 🚀 Usage

Please see our **GitHub repository**: [github.com/fpezzuti/ASQ](https://github.com/fpezzuti/ASQ).

---

## ⚖️ License

The ASQ dataset is released under the [*MIT License*](https://opensource.org/license/mit). Individual source datasets may have their own licenses.

---

## 🛠️ Ethics Statement

ASQ is derived from publicly available datasets and is intended solely for research on agentic search behaviour. The authors do not endorse or assume responsibility for the content of the traces or any biases present in them. The contents of these traces should not be interpreted as representing the views of the researchers or their institutions. Users are advised to apply safety and content filters when using the traces.

---

## 🔗 Citation

If you find our work useful, please cite it as follows:

```bibtex
@misc{fpezzuti2026asq,
  title={A Picture of Agentic Search},
  author={Pezzuti, Francesca and Frieder, Ophir and Silvestri, Fabrizio and MacAvaney, Sean and Tonellotto, Nicola},
  year={2026},
  eprint={2602.17518},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/pdf/2602.17518},
}
```
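As an illustration of the directory layout and the four artifacts described above, here is a minimal Python sketch that reads one trace collection from a local copy and groups it by `qid` and `iteration`. It is not the authors' loader (see the GitHub repository for that). The function names `trace_dir`, `read_tsv`, and `load_traces` are hypothetical, and the sketch assumes that each TSV file starts with a header row carrying the column names listed above, which the dataset card does not confirm.

```python
import csv
from collections import defaultdict
from pathlib import Path


def trace_dir(root, dataset, retriever_config, agent_family, model):
    """Build the path of one trace collection under <root>/traces/."""
    return Path(root) / "traces" / dataset / retriever_config / agent_family / model


def read_tsv(path):
    """Read a TSV file into a list of dicts.

    ASSUMPTION: the first row is a header with the documented column names.
    """
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats quote characters in thoughts/queries as plain text.
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def _new_iteration():
    return {"query": None, "thought": None, "docids": []}


def _new_trace():
    return {"answer": None, "iterations": defaultdict(_new_iteration)}


def load_traces(directory):
    """Return {qid: {"answer": str, "iterations": {iteration: {...}}}}."""
    directory = Path(directory)
    traces = defaultdict(_new_trace)

    for row in read_tsv(directory / "answers.tsv"):
        traces[row["qid"]]["answer"] = row["answer"]

    for row in read_tsv(directory / "iter_queries.tsv"):
        step = traces[row["qid"]]["iterations"][int(row["iteration"])]
        step["query"] = row["llm_query"]

    for row in read_tsv(directory / "thoughts.tsv"):
        step = traces[row["qid"]]["iterations"][int(row["iteration"])]
        step["thought"] = row["thought"]

    # Sort by rank first so each per-iteration docid list is in ranked order.
    docs = sorted(read_tsv(directory / "retrieved_docs.tsv"),
                  key=lambda r: int(r["rank"]))
    for row in docs:
        step = traces[row["qid"]]["iterations"][int(row["iteration"])]
        step["docids"].append(int(row["docid"]))

    return traces
```

A call such as `load_traces(trace_dir(".", "HotpotQA-test", "BM25_k100_electra_k3", "Autorefine", "Qwen-7B"))` would then return one entry per organic query. The directory names here are the example values from the placeholder list, and the actual names in the download may differ. The sketch uses only the standard library and holds a whole collection in memory, so a streaming or dataframe-based approach may suit larger collections better.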
Provider:
AgenticSearchQueryset