Catsandmeowreuploads/metrec-reupload
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Catsandmeowreuploads/metrec-reupload
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- ar
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids: []
paperswithcode_id: metrec
pretty_name: MetRec
tags:
- poetry-classification
dataset_info:
  config_name: plain_text
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': saree
          '1': kamel
          '2': mutakareb
          '3': mutadarak
          '4': munsareh
          '5': madeed
          '6': mujtath
          '7': ramal
          '8': baseet
          '9': khafeef
          '10': taweel
          '11': wafer
          '12': hazaj
          '13': rajaz
  splits:
  - name: train
    num_bytes: 5874899
    num_examples: 47124
  - name: test
    num_bytes: 1037573
    num_examples: 8316
  download_size: 3979947
  dataset_size: 6912472
configs:
- config_name: plain_text
  data_files:
  - split: train
    path: plain_text/train-*
  - split: test
    path: plain_text/test-*
  default: true
---
# Notice
This is a reupload of MetRec, in case the original is ever deleted. If you would like this dataset removed, please say so in the Community tab.
# Dataset Card for MetRec
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [MetRec](https://github.com/zaidalyafeai/MetRec)
- **Repository:** [MetRec repository](https://github.com/zaidalyafeai/MetRec)
- **Paper:** [MetRec: A dataset for meter classification of arabic poetry](https://www.sciencedirect.com/science/article/pii/S2352340920313792)
- **Point of Contact:** [Zaid Alyafeai](mailto:alyafey22@gmail.com)
### Dataset Summary
The dataset contains verses of Arabic poetry and their corresponding meter classes.
Meter classes are encoded as integers from 0 to 13 (14 classes in total).
The dataset is intended to support further research on meter classification of Arabic poetry.
The training set contains 47,124 records and the test set contains 8,316 records.
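
As a minimal sketch of how the two splits might be loaded, the snippet below uses the Hugging Face `datasets` library. It assumes the library is installed, that network access is available, and that the repository id shown at the top of this card is still valid. The helper names `load_metrec` and `check_row_counts` are hypothetical and introduced only for illustration.

```python
"""Sketch: load MetRec from the Hugging Face Hub (requires `datasets` and network access)."""

REPO_ID = "Catsandmeowreuploads/metrec-reupload"  # repository named at the top of this card
CONFIG_NAME = "plain_text"  # config_name from the card metadata
EXPECTED_ROWS = {"train": 47_124, "test": 8_316}  # counts stated in this card


def load_metrec():
    """Return a DatasetDict holding the train and test splits (hypothetical helper)."""
    # Imported lazily so the constants above are usable without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, CONFIG_NAME)


def check_row_counts(dataset_dict):
    """Compare each loaded split size with the count stated in the card."""
    return {
        split: dataset_dict[split].num_rows == expected
        for split, expected in EXPECTED_ROWS.items()
    }
```

Calling `check_row_counts(load_metrec())` would then report, per split, whether the downloaded data matches the record counts quoted above.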
### Supported Tasks and Leaderboards
The dataset was published in this [paper](https://www.sciencedirect.com/science/article/pii/S2352340920313792). A benchmark is reported in this [paper](https://www.sciencedirect.com/science/article/pii/S016786552030204X).
### Languages
The dataset is in Arabic.
## Dataset Structure
### Data Instances
A typical data point comprises a verse (part of a poem) and a label drawn from the 14 meter classes.
### Data Fields
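- `text`: a string containing the verse.
- `label`: a class label, an integer from 0 to 13 identifying the meter (see the `class_label` names in the card metadata).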
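
The integer labels correspond to the meter names listed under `class_label` in the card metadata. The sketch below reproduces that mapping in plain Python. The names `METER_NAMES`, `ID2LABEL`, `LABEL2ID` and `decode` are illustrative and not part of the dataset itself.

```python
# Meter names in label order, copied from the `class_label` names in the card metadata.
METER_NAMES = [
    "saree", "kamel", "mutakareb", "mutadarak", "munsareh", "madeed", "mujtath",
    "ramal", "baseet", "khafeef", "taweel", "wafer", "hazaj", "rajaz",
]

# Two-way lookup tables between integer ids and meter names.
ID2LABEL = dict(enumerate(METER_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(label_id: int) -> str:
    """Return the meter name for an integer label (raises KeyError if out of range)."""
    return ID2LABEL[label_id]


print(decode(10))          # taweel
print(LABEL2ID["rajaz"])   # 13
```

Mappings of this shape are commonly passed to a classifier configuration so that predictions can be reported by meter name rather than by integer id.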
### Data Splits
The data is split into a training set and a test set, with the following sizes:
| | train | test |
|------------|-------:|------:|
| data split | 47,124 | 8,316 |
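
For reference, the split proportions implied by the table can be worked out with a few lines of arithmetic. This is only a sketch based on the record counts stated in this card; the variable names are illustrative.

```python
# Record counts per split, as stated in the table above.
SPLITS = {"train": 47_124, "test": 8_316}

total = sum(SPLITS.values())
fractions = {name: count / total for name, count in SPLITS.items()}

for name, count in SPLITS.items():
    print(f"{name}: {count} rows ({fractions[name]:.0%})")
print(f"total: {total} rows")
```

With these counts the total is 55,440 records, which corresponds to an 85/15 train/test split.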
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
[More Information Needed]
#### Initial Data Collection and Normalization
The dataset was collected from [Aldiwan](https://www.aldiwan.net/).
#### Who are the source language producers?
The poems are by various poets.
### Annotations
The dataset does not contain any additional annotations.
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
```
@article{metrec2020,
title={MetRec: A dataset for meter classification of arabic poetry},
author={Al-shaibani, Maged S and Alyafeai, Zaid and Ahmad, Irfan},
journal={Data in Brief},
year={2020},
publisher={Elsevier}
}
```
### Contributions
Thanks to [@zaidalyafeai](https://github.com/zaidalyafeai) for adding this dataset.
Provided by: Catsandmeowreuploads



