Pirá

Source: arXiv. Updated 2022-02-05; indexed 2024-08-06.
Download link:
http://arxiv.org/abs/2202.02398v1
Resource description:
Pirá is a bilingual (Portuguese-English) question answering dataset about the ocean and the Brazilian coast, created by the University of São Paulo. It contains 2,261 curated question/answer (QA) sets derived from 4,074 texts covering ocean data, biodiversity, and climate change. The dataset was built from two distinct corpora: abstracts of scientific papers related to the Brazilian coast, and excerpts from United Nations reports on the ocean. The QA sets were written manually by 254 volunteers. Pirá is intended to support several natural language processing (NLP) tasks, including question answering, information retrieval, and machine translation, and is particularly suited to studying the challenges faced by bilingual conversational agents.
Provider:
University of São Paulo
Created:
2022-02-05