kakooch/persian-poetry-qa
Source: Hugging Face. Updated 2023-10-14; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kakooch/persian-poetry-qa
Resource description:
---
name: Persian Poetry QA Dataset
description: |
  This dataset is structured in a question-answering format derived from a rich collection of Persian poems along with metadata about the poets and the verses.
  It is designed to be utilized for various Natural Language Processing and analysis tasks related to Persian poetry, such as Question Answering, Text Generation, Language Modeling, and Style Analysis.
license: gpl-2.0
url: https://github.com/ganjoor/desktop/releases/tag/v2.81
citation: |
  Persian Poetry QA Dataset. Collected by Kakooch from the Ganjoor Project.
  Available at: https://huggingface.co/datasets/persian_poetry
size: "Custom"
language:
  - fa
splits:
  train:
    description: "This split contains Persian poems structured for QA, where each row asks for a sample poem from a specific poet with the poem or verse as the answer."
  validation:
    description: "This split contains a random selection of 1% of the Persian poems in the original dataset."
features:
  context:
    description: "A static string which is 'Persian Poetry or شعر فارسی'."
    type: "string"
  question:
    description: "A string that asks for a sample poem from a specific poet in the format 'یک نمونه از شعر [POET_NAME]'."
    type: "string"
  answer:
    description: "Text of a hemistich or verse."
    type: "string"
  answer_start:
    description: "The starting character index of `answer` within `context` (Note: this is always -1 in the current dataset as `answer` is not a substring of `context`)."
    type: "int32"
configs:
  - config_name: default
    data_files:
      - split: train
        path: poems-qa.csv
---
# Persian Poetry Dataset
## Dataset Description
### Overview
This dataset contains a collection of Persian poems structured in a question-answering format. The dataset is derived from various Persian poets and their poems, providing a rich source for exploring Persian poetry in a structured manner suitable for machine learning applications, especially in natural language processing tasks like question answering.
### Data Collection
- **Data Collection Source:** The data is sourced from the [Ganjoor project](https://github.com/ganjoor/). The specific database file can be found in the [releases section](https://github.com/ganjoor/desktop/releases/tag/v2.81) of their GitHub repository.
- **Time Period:** Oct-12-2023
- **Collection Methods:** The data was collected by downloading the raw database file from the Ganjoor project's GitHub repository.
### Data Structure
The dataset is structured into a CSV file with the following columns:
- `context`: A static string which is "Persian Poetry or شعر فارسی".
- `question`: A string that asks for a sample poem from a specific poet in the format "یک نمونه از شعر [POET_NAME]".
- `answer`: Text of a hemistich or verse. The hemistichs of a verse are tab-separated.
- `answer_start`: The starting character index of `answer` within `context` (Note: this is always -1 in the current dataset as `answer` is not a substring of `context`).
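The column conventions above can be unpacked with a couple of small helpers. This is a minimal sketch, assuming the question always begins with the documented prefix and that hemistichs are separated by a tab character; `extract_poet` and `split_hemistichs` are hypothetical names for illustration and are not shipped with the dataset.

```python
# Prefix taken from the documented question format (assumed to be constant).
QUESTION_PREFIX = "یک نمونه از شعر "


def extract_poet(question: str) -> str:
    """Return the poet name that follows the fixed question prefix."""
    if not question.startswith(QUESTION_PREFIX):
        raise ValueError(f"unexpected question format: {question!r}")
    return question[len(QUESTION_PREFIX):].strip()


def split_hemistichs(answer: str) -> list[str]:
    """Split an answer into its hemistichs on the tab separator."""
    return [part.strip() for part in answer.split("\t") if part.strip()]
```

An answer that holds a single hemistich comes back as a one-element list, so callers can treat both cases uniformly.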
### Data Example
```plaintext
context,question,answer,answer_start
Persian Poetry,یک نمونه از شعر صائب تبریزی,خار نتواند گرفتن دامن ریگ روان رهنورد شوق، افسردن نمی داند که چیست,-1
```
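Rows in this shape can be read with Python's standard `csv` module. The sketch below parses the sample row from an in-memory string; the helper name `read_rows` is hypothetical, and the only conversion it performs is casting `answer_start` to an integer.

```python
import csv
import io

# The header and sample row shown above, kept in memory for illustration.
SAMPLE = (
    "context,question,answer,answer_start\n"
    "Persian Poetry,یک نمونه از شعر صائب تبریزی,"
    "خار نتواند گرفتن دامن ریگ روان رهنورد شوق، افسردن نمی داند که چیست,-1\n"
)


def read_rows(handle):
    """Read QA rows from a file-like object, casting answer_start to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["answer_start"] = int(row["answer_start"])
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(SAMPLE))
```

For the actual file, the same function could be given `open("poems-qa.csv", encoding="utf-8", newline="")`, assuming the CSV has been downloaded to the working directory.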
## Dataset Usage
### Use Cases
This dataset can be utilized for various Natural Language Processing and analysis tasks related to Persian poetry, such as:
- Question Answering
- Text Generation
- Language Modeling
- Style Analysis
### Challenges & Limitations
- The `answer_start` field is always -1 as the `answer` is not a substring of `context`. Depending on your use-case, you might need to adjust how `context` and `answer_start` are determined.
- The dataset does not contain verses longer than 100 characters.
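For extractive QA setups that need a valid `answer_start`, one option is to replace the static `context` with a passage that actually contains the answer and recompute the offset. The following is a sketch under that assumption; `attach_context` is a hypothetical helper, and how the passage is assembled (for example, from neighbouring verses by the same poet) is left to the user.

```python
def attach_context(row: dict, passage: str) -> dict:
    """Return a copy of `row` with `passage` as its context.

    answer_start becomes the character offset of the answer inside the
    passage, or -1 if the answer does not occur in it.
    """
    updated = dict(row)  # leave the caller's row untouched
    updated["context"] = passage
    updated["answer_start"] = passage.find(row["answer"])
    return updated
```

Rows whose recomputed `answer_start` is still -1 can then be filtered out before training an extractive model.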
### License
GPL-2 (GNU General Public License), inherited from the original Ganjoor project.
## Additional Information
### Citation
```
Persian Poetry Dataset. Collected by Kakooch from the Ganjoor Project. Available at: https://huggingface.co/datasets/persian_poetry
```
### Dataset Link
[Download the dataset from Hugging Face](https://huggingface.co/datasets/persian_poetry)
### Contact
Email: [kakooch@gmail.com](mailto:kakooch@gmail.com) | GitHub: [kakooch](https://github.com/kakooch)
---
*This README was generated by Kakooch.*
Provider:
kakooch



