pig4431/HeQ_v1
Hugging Face · Updated 2023-08-16 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/pig4431/HeQ_v1
Resource summary:
---
license: cc-by-4.0
task_categories:
- question-answering
language:
- he
size_categories:
- 10K<n<100K
---
# Dataset Card for HeQ_v1
## Dataset Description
- **Homepage:** [HeQ - Hebrew Question Answering Dataset](https://github.com/NNLP-IL/Hebrew-Question-Answering-Dataset)
- **Repository:** [GitHub Repository](https://github.com/NNLP-IL/Hebrew-Question-Answering-Dataset)
- **Paper:** [HeQ: A Dataset for Hebrew Question Answering](https://u.cs.biu.ac.il/~yogo/heq.pdf)
- **Leaderboard:** N/A
### Dataset Summary
HeQ is a question answering dataset in Modern Hebrew, consisting of 30,147 questions. It follows the format and crowdsourcing methodology of SQuAD and ParaShoot, with paragraphs sourced from Hebrew Wikipedia and Geektime.
### Supported Tasks and Leaderboards
- **Task:** Question Answering
### Languages
- Hebrew (he)
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
- **ID:** `string`
- **Title:** `string`
- **Source:** `string`
- **Context:** `string`
- **Question:** `string`
- **Answers:** `string`
- **Is_Impossible:** `bool`
- **WH_Question:** `string`
- **Question_Quality:** `string`
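
The field list above can be turned into a small record check. The following is a minimal sketch, assuming the column names and casing match the card exactly (the hosted files may differ); `schema_errors` and the `example` record are hypothetical illustrations, and the placeholder values are not taken from the dataset.

```python
from typing import Any, Dict, List

# Field names and types as listed in this card. Whether the hosted files use
# exactly this casing is an assumption.
HEQ_FIELDS: Dict[str, type] = {
    "ID": str,
    "Title": str,
    "Source": str,
    "Context": str,
    "Question": str,
    "Answers": str,
    "Is_Impossible": bool,
    "WH_Question": str,
    "Question_Quality": str,
}


def schema_errors(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the card."""
    errors: List[str] = []
    for name, expected in HEQ_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


# Hypothetical placeholder record, NOT a real dataset row.
example: Dict[str, Any] = {
    "ID": "<id>",
    "Title": "<article title>",
    "Source": "<source name>",
    "Context": "<paragraph text>",
    "Question": "<question text>",
    "Answers": "<answer text>",
    "Is_Impossible": False,
    "WH_Question": "<question word>",
    "Question_Quality": "<quality label>",
}

print(schema_errors(example))
```

Running the check on a few rows after loading is a cheap way to find out whether the real columns match the names documented here.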
### Data Splits
- **Train:** 27,142 examples
- **Test:** 1,504 examples
- **Validation:** 1,501 examples
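
The three split sizes sum to 27,142 + 1,504 + 1,501 = 30,147, which matches the question count in the summary. The sketch below reproduces that arithmetic and compares loaded splits against the documented sizes. It assumes the Hub split names are `train`, `test`, and `validation`; the helper names are hypothetical, and `load_heq` needs the `datasets` package and network access.

```python
from typing import Dict, List, Mapping, Sized

# Split sizes as documented in this card.
EXPECTED_SPLIT_SIZES: Dict[str, int] = {
    "train": 27_142,
    "test": 1_504,
    "validation": 1_501,
}


def split_shares(sizes: Mapping[str, int]) -> Dict[str, float]:
    """Return each split's share of the total, in percent."""
    total = sum(sizes.values())
    return {name: 100.0 * n / total for name, n in sizes.items()}


def split_size_mismatches(
    splits: Mapping[str, Sized],
    expected: Mapping[str, int] = EXPECTED_SPLIT_SIZES,
) -> List[str]:
    """Compare len() of each loaded split with the documented size."""
    problems: List[str] = []
    for name, size in expected.items():
        if name not in splits:
            problems.append(f"missing split: {name}")
        elif len(splits[name]) != size:
            problems.append(f"{name}: expected {size}, got {len(splits[name])}")
    return problems


def load_heq():
    """Load all splits from the Hugging Face Hub (requires `datasets`)."""
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset("pig4431/HeQ_v1")


print(sum(EXPECTED_SPLIT_SIZES.values()))
for split_name, share in split_shares(EXPECTED_SPLIT_SIZES).items():
    print(f"{split_name}: {share:.2f}%")
```

With the package installed, `split_size_mismatches(load_heq())` returns an empty list if the hosted splits match the numbers above.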
## Dataset Creation
### Curation Rationale
The dataset was created to provide a resource for question answering research in Hebrew.
### Source Data
#### Initial Data Collection and Normalization
Paragraphs were sourced from Hebrew Wikipedia and Geektime.
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
A team of crowdworkers formulated and answered reading comprehension questions.
#### Who are the annotators?
Crowdworkers.
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
License: cc-by-4.0
### Citation Information
[More Information Needed]
### Contributions
Contributions and additional information are welcome.
Provider:
pig4431



