
CoQCat

Source: NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/10362294
Description:
CoQCat is a dataset for conversational question answering in Catalan, based on the CoQA dataset. It comprises 89,364 question-answer pairs drawn from conversations about 6,000 text passages from six different domains. The questions and responses are designed to maintain a conversational tone. Answers are given as free-form text, with the supporting evidence highlighted in the passage. For the development and test sets, 2 additional responses were collected for each question.

This work is licensed under a CC BY-NC-ND 4.0 International License.

In this repository you'll find the following items:
dataset: folder with the dataset as published on HuggingFace
reports: folder with the reports given as feedback to the annotators during the annotation process
stats: folder with statistics about each batch, used when writing the reports
coqcat_guidelines.pdf: the guidelines provided to the annotation team

This work was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya within the framework of Projecte AINA.

Thanks to M. Carme Marí and the VilaWeb team for allowing us to use their texts, and to all the Catalan Wikipedia and Gutenberg Project volunteers for their work.
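To make the description concrete, here is a minimal sketch of how one conversation might be walked turn by turn. It assumes CoQCat records follow the JSON layout published for CoQA (a `story` passage plus parallel `questions` and `answers` lists, where each answer carries a free-form `input_text` and an evidence `span_text`). Those field names are an assumption carried over from CoQA and should be checked against the files in the dataset folder. The record, the `Turn` type, and the helper names are hypothetical, and the Catalan text is an illustrative placeholder, not data from CoQCat.

```python
from typing import Iterator, NamedTuple


class Turn(NamedTuple):
    turn_id: int
    question: str
    answer: str    # free-form answer text
    evidence: str  # span highlighted in the passage


# Hypothetical record in a CoQA-style layout (assumed, not confirmed for
# CoQCat). The contents are illustrative placeholders.
EXAMPLE_RECORD = {
    "story": "La Maria viu a Girona. Treballa en una llibreria.",
    "questions": [
        {"turn_id": 1, "input_text": "On viu la Maria?"},
        {"turn_id": 2, "input_text": "On treballa?"},
    ],
    "answers": [
        {"turn_id": 1, "input_text": "A Girona",
         "span_text": "La Maria viu a Girona"},
        {"turn_id": 2, "input_text": "En una llibreria",
         "span_text": "Treballa en una llibreria"},
    ],
}


def iter_turns(record: dict) -> Iterator[Turn]:
    """Pair each question with the answer that shares its turn_id."""
    answers = {a["turn_id"]: a for a in record["answers"]}
    for q in record["questions"]:
        a = answers[q["turn_id"]]
        yield Turn(q["turn_id"], q["input_text"],
                   a["input_text"], a["span_text"])


def evidence_in_passage(record: dict) -> bool:
    """Check that every evidence span occurs verbatim in the passage."""
    return all(t.evidence in record["story"] for t in iter_turns(record))


# Average conversation length implied by the figures in the description:
# 89,364 question-answer pairs over 6,000 passages.
AVG_PAIRS_PER_PASSAGE = 89_364 / 6_000

if __name__ == "__main__":
    for turn in iter_turns(EXAMPLE_RECORD):
        print(turn.turn_id, turn.question, "->", turn.answer)
    print(f"about {AVG_PAIRS_PER_PASSAGE:.1f} pairs per passage")
```

The figures in the description work out to roughly 15 question-answer pairs per passage. For the development and test sets, the 2 additional responses per question would need their own field, which this sketch does not model.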

Created:
2024-02-01