mASNQ, mWikiQA, mTRECQA

Source: arXiv. Updated: 2024-06-15. Indexed: 2024-06-18.
Download links:
https://huggingface.co/datasets/matteogabburo/mASNQ, https://huggingface.co/datasets/matteogabburo/mWikiQA, https://huggingface.co/datasets/matteogabburo/mTRECQA
Resource overview:
This study presents three new multilingual answer sentence selection (AS2) datasets: mASNQ, mWikiQA, and mTRECQA. They cover five European languages (French, German, Italian, Portuguese, and Spanish) and together contain more than 100 million question-answer pairs. The datasets are derived from existing English AS2 datasets (ASNQ, WikiQA, and TREC-QA) through supervised automatic machine translation (AMT) performed with large language models (LLMs). The creation process comprises translation followed by quality assessment. The goal is to narrow the performance gap that question answering systems show on lower-resource languages and to provide high-quality training data for multilingual AS2 models.
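
Since the three datasets are hosted on the Hugging Face Hub, a minimal loading sketch may be useful. It assumes the `datasets` library is installed. The repository IDs are taken from the download links listed on this page. The helper names `hub_url` and `load_as2_dataset` are hypothetical, and the available configuration and split names are not stated here, so they are left to the caller and should be checked against each dataset card.

```python
# Repository IDs taken from the download links listed on this page.
REPO_IDS = {
    "mASNQ": "matteogabburo/mASNQ",
    "mWikiQA": "matteogabburo/mWikiQA",
    "mTRECQA": "matteogabburo/mTRECQA",
}


def hub_url(name):
    """Return the Hugging Face Hub page for one of the three datasets."""
    return f"https://huggingface.co/datasets/{REPO_IDS[name]}"


def load_as2_dataset(name, config=None):
    """Load one dataset from the Hub (hypothetical helper, needs network access).

    `config` is an assumption: this page does not say whether the
    repositories define per-language configurations, so pass one only
    if the dataset card lists it.
    """
    # Imported lazily so the rest of the module works without `datasets`.
    from datasets import load_dataset

    repo_id = REPO_IDS[name]
    if config is None:
        return load_dataset(repo_id)
    # The second positional argument of load_dataset is the configuration name.
    return load_dataset(repo_id, config)
```

Calling `load_as2_dataset("mWikiQA")` would download the data on first use. If a repository requires an explicit configuration, `load_dataset` raises an error that lists the valid names.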
Provided by:
University of Trento
Created:
2024-06-15