pig4431/HeQ_v1

Name: pig4431/HeQ_v1
Creator: pig4431
Published: 2023-08-16 13:13:16
License: cc-by-4.0

Source: Hugging Face. Updated: 2023-08-16. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/pig4431/HeQ_v1
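
The link above can also be used programmatically. The following is a minimal sketch, not a verified recipe: it assumes the Hugging Face `datasets` library is installed, that the repository ID `pig4431/HeQ_v1` resolves, and that the mirror is reachable through the `HF_ENDPOINT` environment variable. The helper name `load_heq` is hypothetical.

```python
import os

# Assumption: route Hub traffic through the mirror named in the download link.
# HF_ENDPOINT has to be set before `datasets` / `huggingface_hub` is imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "pig4431/HeQ_v1"


def load_heq(split=None):
    """Hypothetical helper: load HeQ_v1 from the Hub.

    Needs `pip install datasets` and network access. With split=None the
    call returns every available split; pass a split name to get one.
    """
    # Imported lazily so this module can be loaded without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_heq()` should return all splits and `load_heq("train")` a single one, provided the split names on the Hub match those listed in the card.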


Resource description:

---
license: cc-by-4.0
task_categories:
- question-answering
language:
- he
size_categories:
- 1K<n<10K
---

# Dataset Card for HeQ_v1

## Dataset Description

- **Homepage:** [HeQ - Hebrew Question Answering Dataset](https://github.com/NNLP-IL/Hebrew-Question-Answering-Dataset)
- **Repository:** [GitHub Repository](https://github.com/NNLP-IL/Hebrew-Question-Answering-Dataset)
- **Paper:** [HeQ: A Dataset for Hebrew Question Answering](https://u.cs.biu.ac.il/~yogo/heq.pdf)
- **Leaderboard:** N/A

### Dataset Summary

HeQ is a question answering dataset in Modern Hebrew, consisting of 30,147 questions. It follows the format and crowdsourcing methodology of SQuAD and ParaShoot, with paragraphs sourced from Hebrew Wikipedia and Geektime.

### Supported Tasks and Leaderboards

- **Task:** Question Answering

### Languages

- Hebrew (he)

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

- **ID:** `string`
- **Title:** `string`
- **Source:** `string`
- **Context:** `string`
- **Question:** `string`
- **Answers:** `string`
- **Is_Impossible:** `bool`
- **WH_Question:** `string`
- **Question_Quality:** `string`

### Data Splits

- **Train:** 27,142 examples
- **Test:** 1,504 examples
- **Validation:** 1,501 examples

## Dataset Creation

### Curation Rationale

The dataset was created to provide a resource for question answering research in Hebrew.

### Source Data

#### Initial Data Collection and Normalization

Paragraphs were sourced from Hebrew Wikipedia and Geektime.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

A team of crowdworkers formulated and answered reading comprehension questions.

#### Who are the annotators?

Crowdsourced

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

License: cc-by-4.0

### Citation Information

[More Information Needed]

### Contributions

Contributions and additional information are welcome.

Provider:

pig4431

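Summary of original information

Dataset card for HeQ_v1

Dataset description

Dataset overview: HeQ is a question answering dataset in Modern Hebrew containing 30,147 questions. It follows the format and crowdsourcing methodology of SQuAD and ParaShoot, with paragraphs sourced from Hebrew Wikipedia and Geektime.
Supported tasks: Question answering
Language: Hebrew (he)

Dataset structure

Data instances

[More Information Needed]

Data fields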

ID: string
Title: string
Source: string
Context: string
Question: string
Answers: string
Is_Impossible: bool
WH_Question: string
Question_Quality: string
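
The field list above can be turned into a small schema check. This is a sketch under stated assumptions: it takes the listed names and types at face value (including `Answers` as a plain string), and both `validate_record` and the placeholder row are hypothetical, not taken from the dataset.

```python
# Field names and types exactly as listed in the card.
EXPECTED_FIELDS = {
    "ID": str,
    "Title": str,
    "Source": str,
    "Context": str,
    "Question": str,
    "Answers": str,
    "Is_Impossible": bool,
    "WH_Question": str,
    "Question_Quality": str,
}


def validate_record(row):
    """Hypothetical helper: raise if `row` deviates from the listed schema."""
    missing = EXPECTED_FIELDS.keys() - row.keys()
    extra = row.keys() - EXPECTED_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(row[name], expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
    return row


# Placeholder values only; this is NOT a real HeQ example.
PLACEHOLDER_ROW = {name: "<placeholder>" for name in EXPECTED_FIELDS}
PLACEHOLDER_ROW["Is_Impossible"] = False

validate_record(PLACEHOLDER_ROW)
```

Running the check on the first few rows of a downloaded split is a cheap way to confirm that the card's field list matches the actual files.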

Data splits

Train: 27,142 examples
Test: 1,504 examples
Validation: 1,501 examples
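
As a quick consistency check, the three split sizes can be summed and compared with the 30,147 questions stated in the overview. The snippet below is a minimal sketch; the lowercase split keys are an assumption about how the splits are named on the Hub.

```python
# Split sizes as stated in the card; key names are assumed.
SPLITS = {"train": 27_142, "test": 1_504, "validation": 1_501}
TOTAL_QUESTIONS = 30_147  # figure from the dataset overview

total = sum(SPLITS.values())
assert total == TOTAL_QUESTIONS, f"splits sum to {total}"

# Percentage share of each split, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in SPLITS.items()}
print(shares)  # {'train': 90.0, 'test': 5.0, 'validation': 5.0}
```

The numbers agree, which suggests a roughly 90/5/5 split.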

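Dataset creation

Curation rationale

The dataset was created to provide a resource for question answering research in Hebrew.

Source data

Initial data collection and normalization

Paragraphs were sourced from Hebrew Wikipedia and Geektime.

Source language producers

[More Information Needed]

Annotations

Annotation process

A team of crowdworkers formulated and answered reading comprehension questions.

Annotators

Crowdsourced

Personal and sensitive information

[More Information Needed]

Considerations for using the data

Social impact of the dataset

[More Information Needed]

Discussion of biases

[More Information Needed]

Other known limitations

[More Information Needed]

Additional information

Dataset curators

[More Information Needed]

Licensing information

License: cc-by-4.0

Citation information

[More Information Needed]

Contributions

Contributions and additional information are welcome.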
