Trustworthy-Information-Access/LLM-annotation-msmarco-nq

Name: Trustworthy-Information-Access/LLM-annotation-msmarco-nq
Creator: Trustworthy-Information-Access
Published: 2026-04-01 15:16:28
License: cc-by-nc-4.0

Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/Trustworthy-Information-Access/LLM-annotation-msmarco-nq

Description:

---
license: cc-by-nc-4.0
language:
- en
tags:
- IR
- Retrieval
- RAG
- Annotation
size_categories:
- 10K<n<100K
---

<div align="center">
<h1> Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation </h1>
</div>

<p align="center">
📖 <a href="https://aclanthology.org/2025.emnlp-main.88/"><strong>arXiv Paper</strong></a> (Accepted to EMNLP 2025 Main 🎉) |
🤗 <a href="https://huggingface.co/hengranZhang/Utility_focused_annotation"><strong>Model</strong></a> |
🤗 <a href="https://huggingface.co/datasets/fnlp/OmniAction"><strong>Dataset</strong></a> |
🛠️ <a href="https://github.com/Trustworthy-Information-Access/Utility-Focused-LLM-Annotation"><strong>Github</strong></a> |
</p>

---

We explore the use of large language models (LLMs) for annotating document utility in training retrieval and retrieval-augmented generation (RAG) systems, aiming to reduce dependence on costly human annotations. We address the gap between retrieval relevance and generative utility by employing LLMs to annotate document utility. Using the Qwen2.5-32B model and Qwen3-32B, we annotate utility on the MS MARCO dataset and NQ dataset.

## 📦 Utility-Focused Annotation for IR and RAG Dataset

![Framework](https://raw.githubusercontent.com/Trustworthy-Information-Access/Utility-Focused-LLM-Annotation/main/framework.jpg)

We introduce Utility-Focused Annotation for IR and RAG, a large-scale LLM-annotated retrieval dataset.

- **MS MARCO**: About 500K queries
- **NQ**: About 50K queries

## ⭐️ Architecture

`annotation_positive.tsv`: `query_id \t pos_d1,pos_d2,pos_d3,...`

## 👋 Citation

If you find our paper and code useful in your research, please cite our paper.

```bibtex
@inproceedings{zhang2025utility,
  title={Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation},
  author={Zhang, Hengran and Tang, Minghao and Bi, Keping and Guo, Jiafeng and Liu, Shihao and Shi, Daiting and Yin, Dawei and Cheng, Xueqi},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={1683--1702},
  year={2025}
}
```
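
The `annotation_positive.tsv` layout given under Architecture (a query ID, a tab, then a comma-separated list of positive document IDs) could be read with a few lines of Python. The following is a minimal sketch, not code from the authors' repository: the function names `parse_positive_annotations` and `load_positive_annotations` are hypothetical, and it assumes the file has no header row, that IDs are treated as opaque strings, and that document IDs themselves contain no commas.

```python
from typing import Dict, Iterable, List


def parse_positive_annotations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each query ID to its list of LLM-annotated positive document IDs.

    Assumed line format (from the dataset card): query_id<TAB>pos_d1,pos_d2,...
    """
    positives: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only: left side is the query ID,
        # right side is the comma-separated positive list.
        query_id, _, doc_field = line.partition("\t")
        doc_ids = [d.strip() for d in doc_field.split(",") if d.strip()]
        positives[query_id.strip()] = doc_ids
    return positives


def load_positive_annotations(path: str) -> Dict[str, List[str]]:
    """Read a TSV file in the assumed format from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return parse_positive_annotations(handle)
```

Calling `load_positive_annotations("annotation_positive.tsv")` would then give a dictionary keyed by query ID. A query with an empty positive list is kept with an empty list rather than dropped, so callers can decide how to handle it.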
