LMGQS
Source: arXiv. Updated 2023-05-22. Indexed 2024-08-06.
Download link:
http://arxiv.org/abs/2305.13086v1
Description:
LMGQS is a large-scale query-focused summarization (QFS) dataset created by Microsoft Research Asia. It contains over 1.13 million document-query-summary triples covering a wide range of document and question types. To build it, the researchers used the large pre-trained language model InstructGPT to extract the hidden queries from four generic summarization benchmarks, which scaled annotation efficiently. The dataset targets the QFS task: it addresses the limited size of existing QFS datasets and is intended to advance QFS research.
Provided by:
Microsoft Research Asia
Created:
2023-05-22



