
ARPD: The Academic Arabic Research Papers Dataset (corpus).

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10659156
Description:
ARPD: The Academic Arabic Research Papers Dataset. This corpus contributes to Arabic language research by providing a new academic dataset that can be used for purposes such as training NLP models and text analysis. The dataset contains papers written in Arabic from seven fields: Arabic, religion, art, law, business, education, and agriculture. It therefore consists of seven classes, one per field, and 2,133 documents in total. The dataset is published in several formats (PDF files, text files, and CSV files) to benefit Arabic research.

Arabic article class    # papers
Art                     303
Law                     301
Business                313
Religion                304
Agricultural            308
Arabic                  303
Education               301
Total                   2133

The dataset contains:
1. The PDF files, classified into their appropriate classes. (PDF.zip)
2. The text files obtained after converting the PDFs and processing them. (Text file.zip)
3. A new version of the text files after preprocessing. Preprocessing includes Arabic normalization for alef, teh, and ligatures; removal of tashkeel, harakat, tatweel, and shadda; and stop-word removal. (TextWithPreprocess.zip)
4. The CSV files, each with two columns: the first contains the paper and the second contains its class. (ARPD.csv , ARPD-prepocess.csv)
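The preprocessing described for TextWithPreprocess.zip can be sketched roughly as follows. This is a minimal illustration and not the dataset authors' actual pipeline: the function names `normalize` and `preprocess`, the exact character mappings (for example, mapping teh marbuta to heh), and the tiny stop-word list are all assumptions made for this example.

```python
import re

# Tashkeel/harakat, including shadda (U+0651) and sukun (U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), the elongation character.
TATWEEL = "\u0640"
# Alef with madda, hamza above, hamza below -> bare alef (assumed mapping).
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Lam-alef presentation-form ligatures -> the two letters lam + alef.
LAM_ALEF_LIGATURES = re.compile(r"[\uFEF5-\uFEFC]")


def normalize(text: str) -> str:
    """Apply character-level Arabic normalization (assumed rules)."""
    text = LAM_ALEF_LIGATURES.sub("\u0644\u0627", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)
    # Assumption: "teh" normalization means teh marbuta -> heh.
    text = text.replace("\u0629", "\u0647")
    return text


# Hypothetical, deliberately tiny stop-word list, normalized the same
# way as the text so that lookups match.
STOP_WORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}


def preprocess(text: str) -> str:
    """Normalize the text, then drop stop words."""
    tokens = normalize(text).split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)
```

A real pipeline would likely use a much larger stop-word list and may normalize additional characters; the order of steps here (normalize first, then filter stop words) is one reasonable choice, not a documented one.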
Created:
2025-03-24