2A2I/Aya-Command.R-DPO
Source: Hugging Face · Updated: 2024-05-16 · Indexed: 2024-05-25
Download link:
https://hf-mirror.com/datasets/2A2I/Aya-Command.R-DPO
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: model
    dtype: string
  splits:
  - name: train
    num_bytes: 12425890
    num_examples: 14210
  download_size: 5931222
  dataset_size: 12425890
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: apache-2.0
language:
- ar
tags:
- dpo
- orpo
---
# 🤗 Dataset Card for "Aya-Command.R-DPO"
### Dataset Sources & Infos
- **Data Origin**: Derived from the Arabic Aya (2A) dataset [2A2I/Arabic_Aya](https://huggingface.co/datasets/2A2I/Arabic_Aya?row=1), a curated subset of the Aya Collection ([CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)).
- **Languages**: Modern Standard Arabic (MSA)
- **License:** Apache-2.0
- **Maintainers:** [Ali Elfilali](https://huggingface.co/Ali-C137) and [Mohammed Machrouh](https://huggingface.co/medmac01)
### Purpose
`Aya-Command.R-DPO` is a preference dataset for Direct Preference Optimization (DPO) in Arabic; the card also tags it for ORPO. Each row pairs a human-written response to a prompt, labeled "chosen," with an AI-generated response to the same prompt, labeled "rejected." Training on these pairs steers Arabic language models toward responses that are more human-like and better suited to their context.
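For readers new to DPO, here is a minimal, stdlib-only sketch of how a single chosen/rejected pair enters the standard DPO loss. It assumes the summed log-probabilities of each response under the policy being trained and under a frozen reference model have already been computed elsewhere; `dpo_loss`, `softplus`, and the default `beta=0.1` are illustrative choices, not something this card specifies.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from this card
) -> float:
    """DPO loss for one (prompt, chosen, rejected) row.

    Each argument is the summed log-probability of a full response given the
    prompt, under either the policy or the frozen reference model
    (hypothetical inputs computed elsewhere).
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same quantity as softplus(-margin).
    return softplus(-margin)


# The policy has shifted probability toward "chosen" relative to the reference,
# so the loss drops below log(2), its value at zero margin (approximately 0.2014 here).
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))
```

Full training loops (for example TRL's DPO trainer) wrap this per-pair objective with batching and log-probability computation; the sketch only shows the arithmetic.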
### Usage
This dataset can be used to train and evaluate Arabic NLP models, particularly on tasks that require nuanced language understanding and generation. Researchers and developers can use the preference pairs to teach models to distinguish high-quality, human-like responses from weaker AI-generated ones, with the aim of producing models whose outputs are more accurate and contextually relevant.
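One common way to evaluate a model on pairs like these is pairwise preference accuracy: the share of rows in which the model scores the chosen response above the rejected one. The sketch below assumes a hypothetical, caller-supplied `score(prompt, response)` function (for example, a reward model); neither that function nor `preference_accuracy` comes from this card or from any particular library.

```python
from typing import Any, Callable, Iterable, Mapping


def preference_accuracy(
    rows: Iterable[Mapping[str, Any]],
    score: Callable[[str, str], float],
) -> float:
    """Share of rows where score(prompt, chosen) > score(prompt, rejected).

    `score` is a hypothetical scoring function supplied by the caller.
    Ties count as misses.
    """
    total = 0
    wins = 0
    for row in rows:
        total += 1
        chosen_score = score(row["prompt"], row["chosen"])
        rejected_score = score(row["prompt"], row["rejected"])
        if chosen_score > rejected_score:
            wins += 1
    if total == 0:
        raise ValueError("no rows to evaluate")
    return wins / total


# Toy rows and a placeholder scorer (response length) just to show the call.
toy_rows = [
    {"prompt": "p1", "chosen": "a longer answer", "rejected": "no"},
    {"prompt": "p2", "chosen": "a", "rejected": "bbb"},
]
print(preference_accuracy(toy_rows, lambda prompt, response: len(response)))  # 0.5
```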
#### Use with HuggingFace
To load this dataset, install (or upgrade) the Hugging Face `datasets` library:
```bash
pip install datasets --upgrade
```
and then use the following code:
```python
from datasets import load_dataset
dataset = load_dataset("2A2I/Aya-Command.R-DPO")
```
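Once loaded, the train split can be inspected with plain Python, because iterating a `datasets.Dataset` yields one dict per row. The helper below is a hypothetical, stdlib-only sketch; the card does not document what the `model` column contains, so the tally simply reports whatever values appear there.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize(rows: Iterable[Mapping[str, Any]]) -> dict:
    """Row count, tally of `model` values, and mean response lengths in characters."""
    rows = list(rows)  # the train split has about 14k rows, small enough for memory
    n = len(rows)
    if n == 0:
        return {"num_rows": 0, "models": {}, "avg_chosen_chars": 0.0, "avg_rejected_chars": 0.0}
    return {
        "num_rows": n,
        "models": dict(Counter(row["model"] for row in rows)),
        "avg_chosen_chars": sum(len(row["chosen"]) for row in rows) / n,
        "avg_rejected_chars": sum(len(row["rejected"]) for row in rows) / n,
    }


# With the dataset loaded as above, call: summarize(dataset["train"])
# Toy rows stand in for real data here.
toy_rows = [
    {"model": "m1", "chosen": "abcd", "rejected": "ab"},
    {"model": "m1", "chosen": "ab", "rejected": "abcd"},
    {"model": "m2", "chosen": "abc", "rejected": "abc"},
]
print(summarize(toy_rows))
```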
### Contribution and Collaborative Engagement
Find 'Aya-Command.R-DPO' on the Hugging Face Hub at [2A2I/Aya-Command.R-DPO](https://huggingface.co/datasets/2A2I/Aya-Command.R-DPO), where community contributions are welcomed. Users are invited to share feedback and propose enhancements.
### Support and Collaborate
We are dedicated to cultivating an inclusive and encouraging space for Arabic AI and NLP research. For assistance, collaboration opportunities, or questions about the dataset, contact us through the Hugging Face Hub's discussion section or by email at [2A2I Contact Email](mailto:arabic.ai.initiative@gmail.com). 😀
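Provided by: 2A2I
### Summary of the Original Listing
#### Dataset Overview
**Basic information**
- Name: Aya-Command.R-DPO
- Source: derived from 2A2I/Arabic_Aya, a curated subset of CohereForAI/aya_dataset.
- Language: Modern Standard Arabic (MSA)
- License: Apache-2.0
- Maintainers: Ali Elfilali and Mohammed Machrouh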
**Features**
- id: int64
- prompt: string
- chosen: string
- rejected: string
- model: string
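If you want typed access to rows, a small dataclass mirroring the declared features is one option. `PreferenceRow` and `from_mapping` are hypothetical names; the type checks assume `int64` values arrive as Python `int` and `string` values as `str`, which is how `datasets` returns them when you iterate a split.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PreferenceRow:
    """Hypothetical typed view of one row, mirroring the declared features."""

    id: int        # int64
    prompt: str    # string
    chosen: str    # string, human-written response
    rejected: str  # string, AI-generated response
    model: str     # string (meaning not documented in the card)

    @classmethod
    def from_mapping(cls, row: Mapping[str, Any]) -> "PreferenceRow":
        """Build a PreferenceRow, raising TypeError if a field has the wrong type."""
        if not isinstance(row["id"], int):
            raise TypeError(f"id must be int, got {type(row['id']).__name__}")
        for key in ("prompt", "chosen", "rejected", "model"):
            if not isinstance(row[key], str):
                raise TypeError(f"{key} must be str, got {type(row[key]).__name__}")
        return cls(
            id=row["id"],
            prompt=row["prompt"],
            chosen=row["chosen"],
            rejected=row["rejected"],
            model=row["model"],
        )


example = PreferenceRow.from_mapping(
    {"id": 1, "prompt": "prompt text", "chosen": "human reply",
     "rejected": "model reply", "model": "model name"}
)
print(example.id)  # 1
```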
**Splits**
- Train:
  - Size: 12,425,890 bytes
  - Examples: 14,210
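The split figures allow a quick back-of-the-envelope check. The snippet uses `num_bytes`, `num_examples`, and `download_size` from the card's `dataset_info` metadata; the per-example figure is an average over all five columns, not the length of any single response.

```python
# Figures copied from the dataset_info metadata of the train split.
num_bytes = 12_425_890
num_examples = 14_210
download_size = 5_931_222

avg_bytes_per_example = num_bytes / num_examples
download_fraction = download_size / num_bytes

print(f"{avg_bytes_per_example:.1f} bytes per example")  # 874.4 bytes per example
print(f"download is {download_fraction:.1%} of the declared dataset size")  # 47.7%
```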
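**Intended use**
- Comparing human-written "chosen" responses with AI-generated "rejected" responses to improve the performance of Arabic NLP models.
- Training and evaluating Arabic NLP models on tasks that require nuanced language understanding and generation.

**Loading**
- Load the dataset with the Hugging Face `datasets` library: `from datasets import load_dataset`, then `dataset = load_dataset("2A2I/Aya-Command.R-DPO")`.

**Community participation**
- Contributions and feedback on the dataset are welcome on the Hugging Face Hub.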



