Lala8383/ms-marco-qa-10k
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Lala8383/ms-marco-qa-10k
Description:
---
language:
- en
license: other
license_name: microsoft-research-license
license_link: https://microsoft.github.io/msmarco/
tags:
- question-answering
- ms-marco
- reading-comprehension
source_datasets:
- microsoft/ms_marco
task_categories:
- question-answering
dataset_info:
  config_name: default
pretty_name: MS MARCO QA Subset (10K)
---
# MS MARCO QA Subset (10K)
This is a **subset** of the [MS MARCO v1.1 dataset](https://huggingface.co/datasets/microsoft/ms_marco) by Microsoft, sampled for lightweight experimentation.
## Source
- **Original dataset**: [microsoft/ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) (v1.1)
- **Original paper**: [MS MARCO: A Human Generated MAchine Reading COmprehension Dataset](https://arxiv.org/abs/1611.09268)
- **Original authors**: Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, Li Deng (Microsoft)
## What was changed
- Randomly sampled **10,000** examples from the train split (seed=42)
- Randomly sampled **1,000** examples from the validation split (seed=42)
- No other modifications were made to the data
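
The card does not say which tool or call produced the samples, so the following is only a minimal sketch of one way a seeded subsample like this could be drawn. The helper names `sample_indices` and `build_subset` are hypothetical, and the use of Python's `random` module (rather than, say, `Dataset.shuffle(seed=42)`) is an assumption; it is not expected to reproduce the exact rows in this subset.

```python
import random


def sample_indices(n_total: int, k: int, seed: int = 42) -> list[int]:
    """Return k distinct row indices, drawn without replacement and reproducibly."""
    if k > n_total:
        raise ValueError(f"cannot sample {k} rows from a split of {n_total}")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(range(n_total), k))


def build_subset(seed: int = 42):
    """Hypothetical reconstruction; needs the `datasets` package and network access."""
    from datasets import DatasetDict, load_dataset  # imported lazily on purpose

    full = load_dataset("microsoft/ms_marco", "v1.1")
    # Split sizes taken from the "What was changed" list above.
    sizes = {"train": 10_000, "validation": 1_000}
    return DatasetDict({
        split: full[split].select(sample_indices(len(full[split]), k, seed))
        for split, k in sizes.items()
    })
```

Fixing the seed makes the draw repeatable: calling `sample_indices` twice with the same arguments returns the same indices, which is what allows a subset of this kind to be regenerated from the original splits.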
## License
This dataset is derived from MS MARCO, which is released under the
[Microsoft Research License](https://microsoft.github.io/msmarco/).
Please refer to the original license terms before use.
## Citation
If you use this dataset, please cite the original MS MARCO paper:
```bibtex
@article{nguyen2016ms,
title={MS MARCO: A Human Generated MAchine Reading COmprehension Dataset},
author={Nguyen, Tri and Rosenberg, Mir and Song, Xia and Gao, Jianfeng and Tiwary, Saurabh and Majumder, Rangan and Deng, Li},
journal={arXiv preprint arXiv:1611.09268},
year={2016}
}
```
Provider:
Lala8383



