Trustworthy-Information-Access/LLM-annotation-msmarco-nq
Source: Hugging Face | Updated: 2026-04-01 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/Trustworthy-Information-Access/LLM-annotation-msmarco-nq
Resource description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- IR
- Retrieval
- RAG
- Annotation
size_categories:
- 10K<n<100K
---
<div align="center">
<h1>
Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation
</h1>
</div>
<p align="center">
📖 <a href="https://aclanthology.org/2025.emnlp-main.88/"><strong>Paper</strong></a> (Accepted to EMNLP 2025 Main 🎉) |
🤗 <a href="https://huggingface.co/hengranZhang/Utility_focused_annotation"><strong>Model</strong></a> |
🤗 <a href="https://huggingface.co/datasets/fnlp/OmniAction"><strong>Dataset</strong></a> |
🛠️ <a href="https://github.com/Trustworthy-Information-Access/Utility-Focused-LLM-Annotation"><strong>Github</strong></a>
</p>
---
We explore using large language models (LLMs) to annotate document utility for training retrieval and retrieval-augmented generation (RAG) systems, with the aim of reducing dependence on costly human annotation.
Having LLMs label utility directly addresses the gap between retrieval relevance and generative utility. Using Qwen2.5-32B and Qwen3-32B, we annotate utility on the MS MARCO and NQ datasets.

## 📦 Utility-Focused Annotation for IR and RAG Dataset

We introduce Utility-Focused Annotation for IR and RAG, a large-scale LLM-annotated retrieval dataset covering two collections:

- **MS MARCO**: about 500K queries
- **NQ**: about 50K queries

## ⭐️ Architecture
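`annotation_positive.tsv`: one query per line, in the form `query_id \t pos_d1,pos_d2,pos_d3,...`, that is, the query ID, a tab character, and a comma-separated list of the documents annotated as positive for that query.
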
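
The line format above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the function name `parse_annotation_positive` and the sample IDs are hypothetical, and it assumes the file has no header row, uses a literal tab between the two fields, and that IDs can be treated as opaque strings.

```python
import io
from typing import Dict, List, TextIO


def parse_annotation_positive(handle: TextIO) -> Dict[str, List[str]]:
    """Map each query_id to its list of positive document IDs.

    Assumes one record per line: query_id, a tab, then comma-separated IDs.
    """
    positives: Dict[str, List[str]] = {}
    for raw_line in handle:
        line = raw_line.rstrip("\n").rstrip("\r")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only; everything after it is the ID list.
        query_id, _, doc_field = line.partition("\t")
        doc_ids = [d.strip() for d in doc_field.split(",") if d.strip()]
        positives[query_id.strip()] = doc_ids
    return positives


# Hypothetical sample in the documented format (not real dataset rows).
sample = io.StringIO("q1\td3,d7,d9\nq2\td4\n")
parsed = parse_annotation_positive(sample)
print(parsed["q1"])
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("annotation_positive.tsv", encoding="utf-8")`. If the IDs turn out to be numeric, they can be converted with `int` after parsing.
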
## 👋 Citation
If you find our paper and code useful in your research, please cite our paper.
```bibtex
@inproceedings{zhang2025utility,
title={Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation},
author={Zhang, Hengran and Tang, Minghao and Bi, Keping and Guo, Jiafeng and Liu, Shihao and Shi, Daiting and Yin, Dawei and Cheng, Xueqi},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
pages={1683--1702},
year={2025}
}
```
Provided by:
Trustworthy-Information-Access



