dragosnicolae555/RoITD
Source: Hugging Face · updated 2022-10-25 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/dragosnicolae555/RoITD
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
language:
- ro-RO
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: 'RoITD: Romanian IT Question Answering Dataset'
size_categories:
- unknown
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- extractive-qa
---
## Dataset Summary
We introduce the Romanian IT Dataset (RoITD), a question-answering dataset resembling SQuAD 1.1. RoITD consists of 9575 Romanian QA pairs formulated by crowd workers. The QA pairs are based on 5043 Romanian Wikipedia articles describing IT and household products. Of these questions, 5103 are possible (i.e., the correct answer can be found within the paragraph) and 4472 are not possible (i.e., the given answer is a "plausible answer" rather than the correct one).
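The possible/not-possible split above is carried by each question's **is_impossible** flag (0 = possible, 1 = not possible). The sketch below tallies such flags into summary counts and checks that the reported figures add up (5103 + 4472 = 9575). It assumes the flags have already been extracted into a plain list of integers; how to read them from the released files is not covered here, and the function name `count_by_possibility` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_possibility(flags: Iterable[int]) -> Dict[str, int]:
    """Tally is_impossible flags (0 = possible, 1 = not possible) into summary counts.

    Hypothetical helper: the input is assumed to be one 0/1 flag per question.
    """
    tally = Counter(flags)
    unexpected = set(tally) - {0, 1}
    if unexpected:
        raise ValueError(f"unexpected is_impossible values: {sorted(unexpected, key=repr)}")
    return {
        "possible": tally[0],
        "impossible": tally[1],
        "total": tally[0] + tally[1],
    }


# Figures stated in the dataset summary; the two groups should add up to the total.
REPORTED = {"possible": 5103, "impossible": 4472, "total": 9575}
assert REPORTED["possible"] + REPORTED["impossible"] == REPORTED["total"]

if __name__ == "__main__":
    print(count_by_possibility([0, 1, 0]))  # {'possible': 2, 'impossible': 1, 'total': 3}
```

If the released files match the card, running this over every question's flag should reproduce the `REPORTED` counts.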
## Dataset Structure
The data structure follows the SQuAD format and contains several attributes, including **question**, **id**, **text**, **answer_start**, **is_impossible**, and **context**. The **context** field stores the paragraph shown to the crowdsourcing workers; these paragraphs were manually selected from Wikipedia. The **id** field holds a randomly assigned unique identifier for each question-answer pair. The **is_impossible** field takes only the values "0" and "1": "0" corresponds to category "A", meaning the answer is correct, and "1" corresponds to category "U", meaning the answer is only plausible. The **question** field holds the question posed by the crowd worker, and the **answer_start** field records the character index at which the answer begins.
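To make the field descriptions concrete, the sketch below flattens a SQuAD-style document into one record per answer, maps **is_impossible** to the A/U categories, and checks that **answer_start** really points at the answer inside **context**. The field names come from this card, but the nesting (`data` → `paragraphs` → `qas` → `answers`) and the placement of **is_impossible** on each question are assumptions borrowed from SQuAD; the actual RoITD files may be laid out differently. `QARecord`, `flatten_squad`, and the Romanian sample are hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator

# Mapping stated in the card: 0 -> category "A" (correct), 1 -> category "U" (plausible).
CATEGORY_BY_FLAG = {0: "A", 1: "U"}


@dataclass
class QARecord:
    """One flattened question-answer pair (hypothetical container; card field names)."""
    id: str
    question: str
    context: str
    text: str
    answer_start: int
    is_impossible: int

    @property
    def category(self) -> str:
        """Translate the 0/1 is_impossible flag into the card's A/U category."""
        if self.is_impossible not in CATEGORY_BY_FLAG:
            raise ValueError(f"is_impossible must be 0 or 1, got {self.is_impossible!r}")
        return CATEGORY_BY_FLAG[self.is_impossible]

    def span_matches(self) -> bool:
        """True if context[answer_start : answer_start + len(text)] equals text."""
        end = self.answer_start + len(self.text)
        return self.context[self.answer_start:end] == self.text


def flatten_squad(doc: Dict[str, Any]) -> Iterator[QARecord]:
    """Yield one QARecord per answer from an ASSUMED SQuAD-style nested dict."""
    for article in doc["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Default to 0 when the flag is absent, as in SQuAD 1.1 (assumption).
                flag = int(qa.get("is_impossible", 0))
                for answer in qa["answers"]:
                    yield QARecord(
                        id=qa["id"],
                        question=qa["question"],
                        context=context,
                        text=answer["text"],
                        answer_start=int(answer["answer_start"]),
                        is_impossible=flag,
                    )


if __name__ == "__main__":
    # Invented example, not taken from RoITD.
    ctx = "Un SSD stochează datele în memoria flash."
    sample = {"data": [{"title": "SSD", "paragraphs": [{"context": ctx, "qas": [
        {"id": "q1", "question": "Unde stochează datele un SSD?", "is_impossible": 0,
         "answers": [{"text": "memoria flash", "answer_start": ctx.find("memoria flash")}]},
    ]}]}]}
    for rec in flatten_squad(sample):
        print(rec.id, rec.category, rec.span_matches())  # q1 A True
```

Checking span alignment is a cheap way to catch off-by-one or encoding issues in **answer_start**, since Python slices by Unicode code point and Romanian text contains diacritics such as "ă" and "ț".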
Provided by:
dragosnicolae555
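Summary of original information
Dataset overview
Basic information
- Name: RoITD: Romanian IT Question Answering Dataset
- Language: Romanian (ro-RO)
- License: CC-BY-4.0
- Multilinguality: monolingual
- Source: original dataset
- Task category: question answering
- Task ID: extractive question answering (extractive-qa)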
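Dataset structure
- Data format: follows the SQuAD format
- Main fields:
  - question: the question
  - id: unique identifier
  - text: the answer text
  - answer_start: character index where the answer begins
  - is_impossible: whether the answer is impossible (0: possible, 1: impossible)
  - context: the context, taken from manually selected Wikipedia paragraphs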
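Dataset contents
- Number of QA pairs: 9575
- Article source: 5043 Romanian Wikipedia articles describing IT and household products
- Question categories:
  - Possible questions: 5103
  - Impossible questions: 4472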



