2A2I/Aya-Command.R-DPO

Name: 2A2I/Aya-Command.R-DPO
Creator: 2A2I
Published: 2024-05-16 13:19:51
License: Apache-2.0

Hugging Face · Updated 2024-05-16 · Indexed 2024-05-25

Download link:

https://hf-mirror.com/datasets/2A2I/Aya-Command.R-DPO


Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: model
    dtype: string
  splits:
  - name: train
    num_bytes: 12425890
    num_examples: 14210
  download_size: 5931222
  dataset_size: 12425890
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: apache-2.0
language:
- ar
tags:
- dpo
- orpo
---

# 🤗 Dataset Card for "Aya-Command.R-DPO"

### Dataset Sources & Infos

- **Data Origin**: Derived from the Arabic Aya (2A) dataset: [2A2I/Arabic_Aya](https://huggingface.co/datasets/2A2I/Arabic_Aya?row=1), which is a curated subset of the Aya Collection [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
- **Languages**: Modern Standard Arabic (MSA)
- **License:** Apache-2.0
- **Maintainers:** [Ali Elfilali](https://huggingface.co/Ali-C137) and [Mohammed Machrouh](https://huggingface.co/medmac01)

### Purpose

`Aya-Command.R-DPO` is a DPO dataset designed to advance Arabic NLP by comparing human-generated responses, labeled as "chosen," with AI-generated responses, marked as "rejected." This approach helps improve the performance of Arabic language models by guiding them to produce more human-like and contextually appropriate responses.

### Usage

This dataset can be used to train and evaluate Arabic NLP models, particularly in tasks requiring nuanced language understanding and generation. By utilizing this dataset, researchers and developers can refine AI models to better distinguish between high-quality, human-like responses and less effective AI-generated ones, leading to more accurate and contextually relevant language models.

#### Use with HuggingFace

To load this dataset with Datasets, you'll need to install the datasets library with:

```
pip install datasets --upgrade
```

and then use the following code:

```python
from datasets import load_dataset

dataset = load_dataset("2A2I/Aya-Command.R-DPO")
```

### Contribution and Collaborative Engagement

Find 'Aya-Command.R-DPO' on the Hugging Face Hub at [2A2I/Aya-Command.R-DPO](https://huggingface.co/datasets/2A2I/Aya-Command.R-DPO), where community contributions are welcomed. Users are invited to share feedback and propose enhancements.

### Support and Collaborate

We are dedicated to cultivating an inclusive and encouraging space for Arabic AI and NLP research. For assistance, collaboration opportunities, or inquiries related to the dataset, please connect with us through the Hugging Face Hub's discussion section or contact us via [2A2I Contact Email](arabic.ai.initiative@gmail.com). 😀

Provider:

2A2I

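Summary of original information

Dataset overview

Basic dataset information

Name: Aya-Command.R-DPO
Source: Derived from 2A2I/Arabic_Aya, which is a curated subset of CohereForAI/aya_dataset.
Language: Modern Standard Arabic (MSA)
License: Apache-2.0
Maintainers: Ali Elfilali and Mohammed Machrouh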

Dataset features

id: int64
prompt: string
chosen: string
rejected: string
model: string
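
The five features above can be written as a typed record. The following is a minimal sketch using Python's `typing.TypedDict`; the class name `PreferenceRow`, the helper `validate_row`, and the sample values are hypothetical and are not taken from the dataset.

```python
from typing import TypedDict


class PreferenceRow(TypedDict):
    """One record, mirroring the feature list (hypothetical class name)."""
    id: int        # int64 in the dataset
    prompt: str
    chosen: str    # human-generated response, per the dataset card
    rejected: str  # AI-generated response, per the dataset card
    model: str     # string field; the card does not describe its contents


def validate_row(row: dict) -> bool:
    """Return True if `row` has exactly the expected fields with the expected Python types."""
    expected = {"id": int, "prompt": str, "chosen": str, "rejected": str, "model": str}
    if set(row) != set(expected):
        return False
    return all(isinstance(row[name], kind) for name, kind in expected.items())


# Illustrative values only, not an actual record from the dataset.
sample = {
    "id": 0,
    "prompt": "سؤال",
    "chosen": "إجابة بشرية",
    "rejected": "إجابة آلية",
    "model": "placeholder-model",
}
print(validate_row(sample))
```

A check like this can be useful before training, since a missing or mistyped field would otherwise surface only later in the pipeline.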

Dataset splits

Training set:
- Size: 12425890 bytes
- Number of examples: 14210
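
For a sense of scale, the split figures imply an average record size. A small worked calculation using only the byte and example counts listed above (the variable names are illustrative):

```python
num_bytes = 12_425_890   # train split size in bytes
num_examples = 14_210    # train split example count

avg_bytes = num_bytes / num_examples
print(f"{avg_bytes:.1f} bytes per example on average")
# prints "874.4 bytes per example on average"
```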

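Dataset purpose

Compares human-generated "chosen" responses with AI-generated "rejected" responses in order to improve the performance of Arabic natural language processing models.
Can be used to train and evaluate Arabic NLP models that require nuanced language understanding and generation.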
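
To make the chosen/rejected pairing concrete, here is a minimal sketch of the standard DPO objective for a single preference pair, in plain Python. It is not taken from the dataset card: the function name `dpo_loss`, the `beta` default, and the log-probability values are assumptions for illustration. In practice the log-probabilities would come from the policy being trained and a frozen reference model, each scoring the `chosen` and `rejected` texts for the same `prompt`.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin))."""
    # How much more the policy favors "chosen" over "rejected" than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Made-up log-probabilities: the policy prefers the chosen response more than the reference does.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the "chosen" response.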

Loading the dataset

To load the dataset with the Hugging Face datasets library, run the following code:

```python
from datasets import load_dataset

dataset = load_dataset("2A2I/Aya-Command.R-DPO")
```
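
Once loaded, each row can be reduced to the three fields that preference-tuning code typically consumes. This is a minimal sketch, assuming a trainer that expects `prompt`, `chosen`, and `rejected` keys; the helper name `to_preference_pair`, the whitespace stripping, and the sample row are hypothetical, and the example runs on an in-memory dict rather than downloading the data.

```python
def to_preference_pair(row: dict) -> dict:
    """Keep only the prompt and the two responses, stripping surrounding whitespace."""
    return {
        "prompt": row["prompt"].strip(),
        "chosen": row["chosen"].strip(),
        "rejected": row["rejected"].strip(),
    }


# Illustrative row, not an actual record from the dataset.
row = {
    "id": 7,
    "prompt": " سؤال ",
    "chosen": "إجابة بشرية",
    "rejected": "إجابة آلية",
    "model": "placeholder-model",
}
pair = to_preference_pair(row)
print(sorted(pair))

# With the real dataset (requires network access), the same function could be applied via:
#   dataset["train"].map(to_preference_pair, remove_columns=["id", "model"])
```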

Community participation

Contributions and feedback on this dataset are welcome on the Hugging Face Hub.
