
NADCG: New Arabic dataset for text classification and generation

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/mrh6fy2dkj
Resource description:
- NADCG: New Arabic dataset for text classification and generation.
- Size: 2,136,311 rows.
- NADCG is a large collection of Arabic news headlines, categories, and articles that can be used in several NLP tasks.
- Tasks: text generation, text classification, summarization, and producing word embeddings.
- Fields: headline, summary, article, and category.
- NADCG is larger than other datasets: it contains 2,136,311 classified news items in UTF-8 encoding and CSV format.
- NADCG contains a vast number of Arabic news items in eight categories (Politics, Economics, Sports, Health, Technology, Culture, Arts, Accidents). In general, the corpus adopts the label each article carried in its source news portal.
- In summary, NADCG's large size and variety of fields set it apart, so it can be used for many tasks, including training large transformer models. It is also freely available.
- NADCG_SUBSET is a balanced benchmark dataset of 80K rows, drawn from NADCG, that is used in our research work. It contains training (90%), validation (5%), and testing (5%) sets: 72,000 training rows, 4,000 validation rows, and 4,000 testing rows.
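The 90% / 5% / 5% split described for NADCG_SUBSET can be illustrated with a short Python sketch. This is not the authors' procedure, only one plausible way to produce a per-category split with those proportions. The file name `nadcg.csv` and the column name `category` are assumptions; check them against the header of the downloaded CSV.

```python
import csv
import random
from collections import defaultdict


def read_rows(path):
    """Read a UTF-8 CSV file into a list of dicts, one per news item."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def stratified_split(rows, label_key="category",
                     fractions=(0.90, 0.05, 0.05), seed=0):
    """Split rows into train/validation/test sets, category by category.

    label_key is an assumed column name; fractions follow the
    90% / 5% / 5% proportions stated in the dataset description.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Hypothetical usage (the file name is an assumption):
# rows = read_rows("nadcg.csv")
# train, val, test = stratified_split(rows)
```

Splitting within each category keeps the label proportions the same across the three sets, which matters when the benchmark is meant to be balanced.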
Created:
2024-09-05