Yah216/APCD-Poem_Rawiy_detection

Name: Yah216/APCD-Poem_Rawiy_detection
Creator: Yah216
Published: 2022-10-25 10:28:52
License: 暂无描述

Hugging Face2022-10-25 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/Yah216/APCD-Poem_Rawiy_detection

下载链接

链接失效反馈

官方服务：

资源简介：

--- language: - ar task_categories: - text-classification --- # AutoTrain Dataset for project: Poem_Rawiy_detection ## Dataset Descritpion We used the APCD dataset cited hereafter for pretraining the model. The dataset has been cleaned and only the main text and the Qafiyah columns were kept: ``` @Article{Yousef2019LearningMetersArabicEnglish-arxiv, author = {Yousef, Waleed A. and Ibrahime, Omar M. and Madbouly, Taha M. and Mahmoud, Moustafa A.}, title = {Learning Meters of Arabic and English Poems With Recurrent Neural Networks: a Step Forward for Language Understanding and Synthesis}, journal = {arXiv preprint arXiv:1905.05700}, year = 2019, url = {https://github.com/hci-lab/LearningMetersPoems} } ``` ### Languages The BCP-47 code for the dataset's language is ar. ## Dataset Structure ### Data Instances A sample from this dataset looks as follows: ```json [ { "text": "\u0643\u0644\u0651\u064c \u064a\u064e\u0632\u0648\u0644\u064f \u0633\u064e\u0631\u064a\u0639\u0627\u064b \u0644\u0627 \u062b\u064e\u0628\u0627\u062a\u064e \u0644\u0647\u064f \u0641\u0643\u064f\u0646 \u0644\u0650\u0648\u064e\u0642\u062a\u0643\u064e \u064a\u0627 \u0645\u0650\u0633\u0643\u064a\u0646\u064f \u0645\u064f\u063a\u062a\u064e\u0646\u0650\u0645\u0627", "target": 27 }, { "text": "\u0648\u0642\u062f \u0623\u0628\u0631\u0632\u064e \u0627\u0644\u0631\u0651\u064f\u0645\u0651\u064e\u0627\u0646\u064f \u0644\u0644\u0637\u0631\u0641\u0650 \u063a\u064f\u0635\u0652\u0646\u064e\u0647\u064f \u0646\u0647\u0648\u062f\u0627\u064b \u062a\u064f\u0635\u0627\u0646\u064f \u0627\u0644\u0644\u0645\u0633\u064e \u0639\u0646 \u0643\u0641\u0651\u0650 \u0623\u062d\u0645\u0642\u0650", "target": 23 } ] ``` ### Dataset Fields The dataset has the following fields (also called "features"): ```json { "text": "Value(dtype='string', id=None)", "target": "ClassLabel(num_classes=35, names=['\u0621', '\u0624', '\u0627', '\u0628', '\u062a', '\u062b', '\u062c', '\u062d', '\u062e', '\u062f', '\u0630', '\u0631', '\u0632', '\u0633', '\u0634', '\u0635', '\u0636', '\u0637', '\u0637\u0646', '\u0638', '\u0639', '\u063a', '\u0641', '\u0642', '\u0643', '\u0644', '\u0644\u0627', '\u0645', '\u0646', '\u0647', '\u0647\u0640', '\u0647\u0646', '\u0648', '\u0649', '\u064a'], id=None)" } ``` ### Dataset Splits This dataset is split into a train and validation split. The split sizes are as follow: | Split name | Num samples | | ------------ | ------------------- | | train | 1347718 | | valid | 336950 |

提供机构：

Yah216

原始信息汇总

数据集概述

数据集名称

AutoTrain Dataset for project: Poem_Rawiy_detection

语言

BCP-47代码：ar

数据集结构

数据实例

示例： json [ { "text": "u0643u0644u0651u064c u064au064eu0632u0648u0644u064f u0633u064eu0631u064au0639u0627u064b u0644u0627 u062bu064eu0628u0627u062au064e u0644u0647u064f u0641u0643u064fu0646 u0644u0650u0648u064eu0642u062au0643u064e u064au0627 u0645u0650u0633u0643u064au0646u064f u0645u064fu063au062au064eu0646u0650u0645u0627", "target": 27 }, { "text": "u0648u0642u062f u0623u0628u0631u0632u064e u0627u0644u0631u0651u064fu0645u0651u064eu0627u0646u064f u0644u0644u0637u0631u0641u0650 u063au064fu0635u0652u0646u064eu0647u064f u0646u0647u0648u062fu0627u064b u062au064fu0635u0627u0646u064f u0627u0644u0644u0645u0633u064e u0639u0646 u0643u0641u0651u0650 u0623u062du0645u0642u0650", "target": 23 } ]

数据集字段

字段： json { "text": "Value(dtype=string, id=None)", "target": "ClassLabel(num_classes=35, names=[u0621, u0624, u0627, u0628, u062a, u062b, u062c, u062d, u062e, u062f, u0630, u0631, u0632, u0633, u0634, u0635, u0636, u0637, u0637u0646, u0638, u0639, u063a, u0641, u0642, u0643, u0644, u0644u0627, u0645, u0646, u0647, u0647u0640, u0647u0646, u0648, u0649, u064a], id=None)" }

数据集分割

分割名称及样本数：

Split name Num samples

train 1347718

valid 336950

5,000+

优质数据集

54 个

任务类型

进入经典数据集