
Lala8383/ms-marco-qa-10k

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Lala8383/ms-marco-qa-10k
Resource description:
---
language:
- en
license: other
license_name: microsoft-research-license
license_link: https://microsoft.github.io/msmarco/
tags:
- question-answering
- ms-marco
- reading-comprehension
source_datasets:
- microsoft/ms_marco
task_categories:
- question-answering
dataset_info:
  config_name: default
pretty_name: MS MARCO QA Subset (10K)
---

# MS MARCO QA Subset (10K)

This is a **subset** of the [MS MARCO v1.1 dataset](https://huggingface.co/datasets/microsoft/ms_marco) by Microsoft, sampled for lightweight experimentation.

## Source

- **Original dataset**: [microsoft/ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) (v1.1)
- **Original paper**: [MS MARCO: A Human Generated MAchine Reading COmprehension Dataset](https://arxiv.org/abs/1611.09268)
- **Original authors**: Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, Li Deng (Microsoft)

## What was changed

- Randomly sampled **10,000** examples from the train split (seed=42)
- Randomly sampled **1,000** examples from the validation split (seed=42)
- No other modifications were made to the data

## License

This dataset is derived from MS MARCO, which is released under the [Microsoft Research License](https://microsoft.github.io/msmarco/). Please refer to the original license terms before use.

## Citation

If you use this dataset, please cite the original MS MARCO paper:

```bibtex
@article{nguyen2016ms,
  title={MS MARCO: A Human Generated MAchine Reading COmprehension Dataset},
  author={Nguyen, Tri and Rosenberg, Mir and Song, Xia and Gao, Jianfeng and Tiwary, Saurabh and Majumder, Rangan and Deng, Li},
  journal={arXiv preprint arXiv:1611.09268},
  year={2016}
}
```
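
## Example: reproducible sampling (illustrative)

The card says the subset was drawn by seeded random sampling (seed=42) but does not say which tool or function was used. The sketch below shows one way such a draw could be made reproducible. The helper names `sample_indices` and `build_subset` are hypothetical, and the use of Python's `random.Random` is an assumption. It is unlikely to reproduce the exact rows in this dataset unless the author happened to use the same procedure.

```python
import random


def sample_indices(n_total: int, n_sample: int, seed: int = 42) -> list[int]:
    """Return n_sample distinct row indices drawn reproducibly from range(n_total)."""
    if n_sample > n_total:
        raise ValueError("cannot sample more rows than the split contains")
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    # Sampling without replacement; sorted so the subset keeps the original row order.
    return sorted(rng.sample(range(n_total), n_sample))


def build_subset():
    """Hypothetical reconstruction; downloads MS MARCO v1.1 when called."""
    # Third-party dependency (pip install datasets), imported lazily so the
    # helper above stays usable without it.
    from datasets import DatasetDict, load_dataset

    full = load_dataset("microsoft/ms_marco", "v1.1")
    train, val = full["train"], full["validation"]
    return DatasetDict({
        # Sizes taken from the "What was changed" section of the card.
        "train": train.select(sample_indices(len(train), 10_000)),
        "validation": val.select(sample_indices(len(val), 1_000)),
    })
```

To use the published subset directly rather than rebuilding it, `load_dataset("Lala8383/ms-marco-qa-10k")` should work, assuming the repository follows the standard Hugging Face dataset layout.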
Provider:
Lala8383