
ImruQays/Alukah-Arabic

Hugging Face · Updated 2024-03-22 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/ImruQays/Alukah-Arabic
Resource description:
---
language:
- ar
license: cc-by-4.0
---

# Introduction

This dataset is a comprehensive collection of articles sourced from the Alukah website, a renowned platform offering extensive content primarily in Arabic. Alukah is known for its high-quality Arabic prose, significantly surpassing the standard found in contemporary media outlets. The majority of the articles are contributed by Muslim scholars, encompassing a wide range of topics related to Islam and the Muslim community. The dataset also includes a valuable section on fatwas, which could be instrumental in developing question-answer datasets for Islamic jurisprudence.

## Dataset Details

### Dataset Description

- **Language(s) (NLP):** [Arabic, minor content in other languages]
- **License:** [Refer to [Alukah terms of use](https://www.alukah.net/pages/terms_of_use.aspx)]

### Dataset Sources

- **Website:** [https://www.alukah.net/]

## Uses

The Alukah Arabic Articles Collection is particularly suitable for training large language models (LLMs) in Arabic. It offers a refined variant of the language that stands in contrast to the more commonly found less sophisticated forms in modern media. This dataset is an invaluable resource for:

- Language Model Training: Enriching LLMs with high-quality Arabic data, enhancing their understanding and generation capabilities in the language.
- Islamic Content Analysis: Providing a rich source of Islamic scholarly articles for research and analysis in religious studies, cultural studies, and linguistics.
- Historical and Cultural Research: The dataset can be used as a reference for studying the evolution of Arabic language usage in scholarly contexts.

## Dataset Structure

The dataset is organized into 9 files, each representing a distinct section of the Alukah website. It is important to note the potential for duplicate articles across these files, as some topics may overlap.

## Quality of Arabic Writing

While the articles on Alukah showcase a superior level of Arabic compared to contemporary writings, it's important to acknowledge that even these articles may not fully match the exemplary standards of classical Arabic literature. For enthusiasts and researchers aiming to explore the pinnacle of Arabic literary excellence, it is recommended to refer to works that are over 200 years old or consult the book "العرنجية" for further insights into the nuances of high-quality Arabic prose.
Provided by:
ImruQays
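Summary of Original Information

Dataset Overview

Introduction

This dataset is a collection of articles gathered from the Alukah website, a well-known platform whose content is primarily in Arabic. Alukah is known for its high-quality Arabic prose, which far exceeds the standard of contemporary media. Most of the articles are written by Muslim scholars and cover a wide range of topics related to Islam and the Muslim community. The dataset also includes an important section on fatwas, which is useful for developing question-answer datasets for Islamic jurisprudence.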

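Dataset Details

Dataset Description

  • Language(s) (NLP): Arabic, with minor content in other languages.
  • License: Refer to the Alukah terms of use

Dataset Sources

  • Website: [https://www.alukah.net/]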

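Uses

The Alukah Arabic Articles Collection is particularly suitable for training large language models (LLMs) in Arabic. It provides high-quality Arabic data that stands in sharp contrast to the less refined forms of the language common in modern media. The dataset is valuable for:

  • Language model training: enriching LLMs with Arabic data and strengthening their understanding and generation of Arabic.
  • Islamic content analysis: providing a rich source of articles by Islamic scholars for religious studies, cultural studies, and linguistics.
  • Historical and cultural research: serving as a reference for studying how Arabic usage in scholarly contexts has evolved.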

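Dataset Structure

The dataset is organized into 9 files, each representing a distinct section of the Alukah website. Note that the same article may appear in more than one file, because some topics overlap.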
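
Because the same article may appear in more than one of the 9 files, a consumer will likely want to deduplicate before training. The following is a minimal sketch, not the dataset author's method: it assumes each file has already been loaded into a list of dictionaries, and the field name `text` and the section names are hypothetical placeholders, since the source does not document the file schema. It keeps the first copy of each article and drops later copies whose normalized text hashes to the same value.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize and collapse whitespace so trivially different copies compare equal."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def dedupe_articles(sections, text_key="text"):
    """Yield (section_name, record) pairs, skipping records whose text was already seen.

    `sections` maps a section name to a list of records; `text_key` is an
    assumed field name, not one confirmed by the dataset card.
    """
    seen = set()
    for section_name, records in sections.items():
        for record in records:
            digest = hashlib.sha256(
                normalize(record[text_key]).encode("utf-8")
            ).hexdigest()
            if digest in seen:
                continue  # exact duplicate (after normalization) of an earlier article
            seen.add(digest)
            yield section_name, record


# Toy data standing in for two of the files; names and contents are made up.
sample = {
    "section_a": [{"text": "مقال  أول"}, {"text": "مقال ثان"}],
    "section_b": [{"text": "مقال أول"}, {"text": "مقال ثالث"}],
}
unique_articles = list(dedupe_articles(sample))
print(len(unique_articles))  # 3
```

Hashing only catches copies that are identical after normalization; lightly edited reposts would need a near-duplicate technique such as MinHash, which this sketch does not attempt.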

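Quality of Arabic Writing

Although the articles on Alukah show a higher level of Arabic than contemporary writing, even these articles may not fully meet the exemplary standards of classical Arabic literature. Enthusiasts and researchers who want to explore the pinnacle of Arabic literary excellence are advised to consult works more than 200 years old, or the book "العرنجية", for further insight into the nuances of high-quality Arabic prose.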
