
2A2I/Aya-Command.R-DPO

Hugging Face | Updated 2024-05-16 | Indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/2A2I/Aya-Command.R-DPO
Resource overview:
---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: model
    dtype: string
  splits:
  - name: train
    num_bytes: 12425890
    num_examples: 14210
  download_size: 5931222
  dataset_size: 12425890
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: apache-2.0
language:
- ar
tags:
- dpo
- orpo
---

# 🤗 Dataset Card for "Aya-Command.R-DPO"

### Dataset Sources & Infos

- **Data Origin**: Derived from the Arabic Aya (2A) dataset: [2A2I/Arabic_Aya](https://huggingface.co/datasets/2A2I/Arabic_Aya?row=1), which is a curated subset of the Aya Collection [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
- **Languages**: Modern Standard Arabic (MSA)
- **License:** Apache-2.0
- **Maintainers:** [Ali Elfilali](https://huggingface.co/Ali-C137) and [Mohammed Machrouh](https://huggingface.co/medmac01)

### Purpose

`Aya-Command.R-DPO` is a DPO dataset designed to advance Arabic NLP by comparing human-generated responses, labeled as "chosen," with AI-generated responses, marked as "rejected." This approach helps improve the performance of Arabic language models by guiding them to produce more human-like and contextually appropriate responses.

### Usage

This dataset can be used to train and evaluate Arabic NLP models, particularly in tasks requiring nuanced language understanding and generation. By utilizing this dataset, researchers and developers can refine AI models to better distinguish between high-quality, human-like responses and less effective AI-generated ones, leading to more accurate and contextually relevant language models.

#### Use with HuggingFace

To load this dataset with Datasets, you'll need to install the datasets library with:

```
pip install datasets --upgrade
```

and then use the following code:

```python
from datasets import load_dataset

dataset = load_dataset("2A2I/Aya-Command.R-DPO")
```

### Contribution and Collaborative Engagement

Find 'Aya-Command.R-DPO' on the Hugging Face Hub at [2A2I/Aya-Command.R-DPO](https://huggingface.co/datasets/2A2I/Aya-Command.R-DPO), where community contributions are welcomed. Users are invited to share feedback and propose enhancements.

### Support and Collaborate

We are dedicated to cultivating an inclusive and encouraging space for Arabic AI and NLP research. For assistance, collaboration opportunities, or inquiries related to the dataset, please connect with us through the Hugging Face Hub's discussion section or contact us via [2A2I Contact Email](mailto:arabic.ai.initiative@gmail.com). 😀
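Provider:
2A2I
Summary of original information

Dataset overview

Basic dataset information

  • Name: Aya-Command.R-DPO
  • Source: Derived from 2A2I/Arabic_Aya, which is a curated subset of CohereForAI/aya_dataset.
  • Language: Modern Standard Arabic (MSA)
  • License: Apache-2.0
  • Maintainers: Ali Elfilali and Mohammed Machrouh

Dataset features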

  • id: int64
  • prompt: string
  • chosen: string
  • rejected: string
  • model: string
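
The five columns above can be mirrored in code as a typed record with a small validator. This is only a sketch: `PreferenceRecord`, `EXPECTED_TYPES`, and `schema_problems` are hypothetical names introduced here for illustration, and the mapping of `int64` to `int` and `string` to `str` is an assumption about how rows appear once loaded into Python.

```python
from typing import Any, List, Mapping, TypedDict


class PreferenceRecord(TypedDict):
    """One row, mirroring the five listed features (hypothetical helper type)."""
    id: int
    prompt: str
    chosen: str
    rejected: str
    model: str


# Assumed Python type for each column (int64 -> int, string -> str).
EXPECTED_TYPES = {"id": int, "prompt": str, "chosen": str, "rejected": str, "model": str}


def schema_problems(row: Mapping[str, Any]) -> List[str]:
    """Return human-readable schema problems; an empty list means the row fits."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[name], bool) or not isinstance(row[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}"
            )
    return problems
```

Running `schema_problems` over a few rows before training is a cheap way to catch a renamed or missing column early.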

Dataset splits

  • Training set:
    • Size: 12425890 bytes
    • Number of examples: 14210
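
For a sense of scale, the split figures can be turned into a per-example average, and compared with the `download_size` value from the card metadata. The snippet below is plain arithmetic on numbers copied from this page; the function names are illustrative, and reading the ratio as a compression effect is an assumption rather than something the card states.

```python
# Figures copied from the card's metadata.
NUM_BYTES = 12_425_890     # num_bytes / dataset_size of the train split
NUM_EXAMPLES = 14_210      # num_examples of the train split
DOWNLOAD_SIZE = 5_931_222  # download_size


def average_bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Mean size of one example in bytes."""
    return num_bytes / num_examples


def download_ratio(download_size: int, dataset_size: int) -> float:
    """Downloaded bytes as a fraction of the prepared dataset size."""
    return download_size / dataset_size


avg = average_bytes_per_example(NUM_BYTES, NUM_EXAMPLES)
ratio = download_ratio(DOWNLOAD_SIZE, NUM_BYTES)
print(f"{avg:.0f} bytes per example; download is {ratio:.0%} of the dataset size")
```

Each example therefore averages a little under 1 KB across all five columns, and the download is a bit less than half of the prepared size.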

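Dataset purpose

  • Compares human-generated "chosen" responses with AI-generated "rejected" responses to improve the performance of Arabic natural language processing models.
  • Can be used to train and evaluate Arabic NLP models that require nuanced language understanding and generation.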
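
The page does not say which trainer or hyperparameters the maintainers have in mind. As a minimal sketch of how one chosen/rejected pair typically enters a DPO objective, the function below computes the standard DPO loss for a single pair from four sequence log-probabilities. The function name, the `beta` default, and the example numbers are illustrative assumptions, not values from the dataset.

```python
import math


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; tune for a real run
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy favors "chosen" over "rejected",
    # measured relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Made-up log-probabilities: the policy has shifted toward the chosen response.
improved = dpo_pair_loss(-10.0, -14.0, -12.0, -12.0)
# No shift relative to the reference gives the baseline loss log(2).
baseline = dpo_pair_loss(-12.0, -12.0, -12.0, -12.0)
print(improved < baseline)
```

The loss falls below `log(2)` as the policy assigns relatively more probability to the human-written `chosen` text than to the AI-generated `rejected` text, which is the sense in which the pairs guide a model.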

Dataset loading

  • Load the dataset with the Hugging Face datasets library by running `from datasets import load_dataset` followed by `dataset = load_dataset("2A2I/Aya-Command.R-DPO")`.
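
Once loaded, the rows can be inspected and reduced to the three columns a preference trainer usually consumes. The sketch below assumes network access to the Hugging Face Hub and an installed `datasets` package; `count_by_model`, `to_preference_triple`, and `main` are hypothetical helper names, and nothing here assumes which values the `model` column actually holds.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def count_by_model(rows: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Count rows per distinct value of the `model` column."""
    return dict(Counter(str(row["model"]) for row in rows))


def to_preference_triple(row: Mapping[str, Any]) -> Dict[str, str]:
    """Keep only the prompt/chosen/rejected fields of one row."""
    return {
        "prompt": row["prompt"],
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }


def main() -> None:
    # Imported here so the helpers above work without the library installed.
    from datasets import load_dataset

    train = load_dataset("2A2I/Aya-Command.R-DPO", split="train")
    print(train.column_names)             # column names of the train split
    print(count_by_model(train))          # rows per `model` value
    print(to_preference_triple(train[0])) # first preference pair
```

Call `main()` from a script or notebook to run the download; the two helpers also work on any iterable of dictionaries with the same keys.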
