
2A2I/Arabic-OpenHermes-2.5

Source: Hugging Face. Updated: 2024-03-15. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/2A2I/Arabic-OpenHermes-2.5
Resource description:
---
language:
- ar
license: apache-2.0
size_categories:
- 100K<n<1M
dataset_info:
  features:
  - name: title
    dtype: string
  - name: category
    dtype: string
  - name: system_prompt
    dtype: string
  - name: topic
    dtype: string
  - name: avatarUrl
    dtype: string
  - name: model
    dtype: string
  - name: hash
    dtype: string
  - name: skip_prompt_formatting
    dtype: bool
  - name: custom_instruction
    dtype: bool
  - name: idx
    dtype: string
  - name: language
    dtype: string
  - name: views
    dtype: float64
  - name: source
    dtype: string
  - name: model_name
    dtype: string
  - name: id
    dtype: string
  - name: user
    dtype: string
  - name: gpt
    dtype: string
  - name: conversations
    dtype: string
  splits:
  - name: train
    num_bytes: 3878191096
    num_examples: 981618
  download_size: 1685705250
  dataset_size: 3878191096
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- synthetic
- GPT-4
- Distillation
- Compilation
---

# Dataset Card for "Arabic-OpenHermes-2.5"

<img src="./Arabic-OpenHermes-2.5.png" width="350" alt="Original Dataset Card of Arabic-OpenHermes-2.5 by 2A2I">

### Dataset Sources & Infos

- **Data Origin**: Derived from the original OpenHermes dataset: [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5).
- **Languages**: Modern Standard Arabic (MSA)
- **Applications**: `Language Modeling`
- **Maintainer:** [Marwa El Kamil](https://huggingface.co/maghwa) & [Mohammed Machrouh](https://huggingface.co/medmac01)
- **License:** Apache-2.0

### Overview

`Arabic-OpenHermes-2.5` is a carefully curated dataset extracted and translated from the OpenHermes-2.5 collection provided by [teknium](https://huggingface.co/teknium).

### Purpose

`Arabic-OpenHermes-2.5` streamlines Arabic language research and applications by offering a high-quality, conversational-style text resource that helps better align Arabic base LLMs, saving time and effort for researchers, technologists, and linguists in Arabic NLP/AI projects.
- Enjoy using the Arabic-OpenHermes-2.5 dataset directly for your Arabic applications and research! 😀

### Usage

This dataset serves as an essential tool for those venturing into Arabic language projects, spanning from academic research to commercial applications. By presenting a source of Arabic text, `Arabic-OpenHermes-2.5` empowers users to plunge directly into model `finetuning`, analysis, and application development, eliminating the initial challenges of synthetic data creation.

#### Use with HuggingFace

To load this dataset with Datasets, install the datasets library with `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

# Loads the dataset from the Hugging Face Hub.
dataset = load_dataset("2A2I/Arabic-OpenHermes-2.5")
```

### Contribution and Collaborative Engagement

Find `Arabic-OpenHermes-2.5` on the Hugging Face Hub at [2A2I/Arabic-OpenHermes-2.5](https://huggingface.co/datasets/2A2I/Arabic-OpenHermes-2.5), where community contributions are welcomed. Users are invited to share feedback and propose enhancements.

### Support and Collaborate

We are dedicated to cultivating an inclusive and encouraging space for Arabic AI and NLP research. For assistance, collaboration opportunities, or inquiries related to the dataset, please connect with us through the Hugging Face Hub's discussion section or contact us via [2A2I Contact Email](mailto:arabic.ai.initiative@gmail.com).

---

# Original Dataset Card of OpenHermes-2.5 by teknium

<img src="https://cdn-uploads.huggingface.co/production/uploads/64d5698102e58cc1fdd0b585/nWQ7oqq4fUSaGsvmNAsr2.png" width="350" alt="Original Dataset Card of OpenHermes by teknium">

## Dataset Summary

The Open Hermes 2/2.5 and Nous Hermes 2 models have recently achieved noteworthy progress among state-of-the-art large language models (LLMs). These advancements are rooted in the innovative use of large-scale training data specifically tailored for language modeling tasks.
For further information, please visit [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5).

We hope the `Arabic-OpenHermes-2.5` dataset serves your needs well and propels your Arabic NLP endeavors to new heights!

## Citation

```bibtex
@misc{OpenHermes2.5,
  title = {OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants},
  author = {Teknium},
  year = {2023},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/teknium/OpenHermes-2.5}
}
```

```bibtex
@misc{ArabicOpenHermes2.5,
  title = {Arabic OpenHermes 2.5: An Arabic version of Synthetic Data for Generalist Arabic LLM Assistants},
  author = {Marwa El Kamil and Mohammed Machrouh},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/2A2I/Arabic-OpenHermes-2.5}
}
```
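The card metadata lists `conversations` with `dtype: string` and a download size of roughly 1.7 GB, so a first look at the data may be easier with streaming plus a small decoding helper. The sketch below is illustrative only: `parse_conversations` and `preview` are hypothetical helper names, and the assumption that the string holds either JSON or a Python literal is not confirmed by the card, which is why the helper falls back to returning the raw string.

```python
import ast
import json
from typing import Any


def parse_conversations(raw: Any) -> Any:
    """Try to decode a `conversations` cell that is stored as a string.

    Assumption (not stated in the card): the string is either JSON or a
    Python literal. If neither parse succeeds, the input is returned unchanged.
    """
    for parser in (json.loads, ast.literal_eval):
        try:
            return parser(raw)
        except (ValueError, SyntaxError, TypeError):
            continue
    return raw


def preview(n: int = 3) -> None:
    """Stream the first `n` training rows instead of downloading the full split."""
    # Imported lazily so parse_conversations works without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "2A2I/Arabic-OpenHermes-2.5", split="train", streaming=True
    )
    for row in stream.take(n):
        # `user` and `gpt` are string features in the card metadata;
        # guard against empty cells before slicing.
        print((row["user"] or "")[:80])
        print((row["gpt"] or "")[:80])
        print(parse_conversations(row["conversations"]))
```

Calling `preview()` needs network access and the `datasets` library; `parse_conversations` can be used on its own.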
Provider:
2A2I
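Original information summary

Dataset overview

Name: Arabic-OpenHermes-2.5

Source: This dataset was extracted and translated from the OpenHermes-2.5 collection, which was originally provided by teknium.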
