aaditya/orca_dpo_pairs-Hindi_

Name: aaditya/orca_dpo_pairs-Hindi_
Creator: aaditya
Published: 2024-02-11 16:10:49
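License: Creative Commons Attribution-ShareAlike 3.0 Unported (per the description; the license field itself was left blank)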

Source: Hugging Face. Updated 2024-02-11; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/aaditya/orca_dpo_pairs-Hindi_

Description:

aaditya/orca_dpo_pairs-Hindi_ is an open-source Hindi version of the Intel/orca_dpo_pairs dataset and may be used for academic or commercial purposes. It is intended mainly for training large language models (LLMs), synthetic data generation, and data augmentation. Each example carries an ID, a question type, and the system prompt, question, chosen response, and rejected response in both English and Hindi. The dataset has a single training split of 10,305 examples (81,624,654 bytes). It is tagged hindi, codemix, hinglish, india, and dpo, and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

Provider:

aaditya

Summary of Original Information

Dataset Overview

Dataset Information

Features:
- en_system: string
- en_question: string
- en_chosen: string
- en_rejected: string
- id: string
- hindi_system: string
- hindi_question: string
- hindi_chosen: string
- hindi_rejected: string
- hindi_question_type: string
Splits:
- train: 10,305 examples, 81,624,654 bytes
Dataset size:
- Download size: 32,979,686 bytes
- Dataset size: 81,624,654 bytes
Configs:
- default: training data files at data/train-*
Tags:
- hindi, codemix, hinglish, india, dpo
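The split and size figures above can be cross-checked with simple arithmetic. The sketch below only restates the listed numbers (10,305 examples, a dataset size of 81,624,654 bytes, roughly 81.6 MB, and a download size of 32,979,686 bytes, roughly 33.0 MB) and derives the average size per example and how much smaller the download is. The derived values are computed here and are not reported by the dataset card; the helper names are illustrative.

```python
# Figures copied from the dataset card; everything else is derived.
NUM_TRAIN_EXAMPLES = 10_305
DATASET_BYTES = 81_624_654   # listed dataset size of the train split
DOWNLOAD_BYTES = 32_979_686  # listed download size


def bytes_per_example(total_bytes: int, num_examples: int) -> float:
    """Average number of bytes per example in a split."""
    return total_bytes / num_examples


def size_ratio(dataset_bytes: int, download_bytes: int) -> float:
    """How many times larger the listed dataset size is than the download."""
    return dataset_bytes / download_bytes


avg = bytes_per_example(DATASET_BYTES, NUM_TRAIN_EXAMPLES)
ratio = size_ratio(DATASET_BYTES, DOWNLOAD_BYTES)
print(f"~{avg:,.0f} bytes per example")         # ~7,921 bytes per example
print(f"download is ~{ratio:.2f}x smaller")      # download is ~2.47x smaller
```

At roughly 7.9 KB per example, a row typically holds several kilobytes of text across its ten string columns, which is consistent with each row storing every field twice (English and Hindi).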

Supported Tasks

Training large language models (LLMs)
Synthetic data generation
Data augmentation
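For the first task, preference-tuning methods such as DPO commonly expect each row as a prompt paired with a chosen and a rejected response. The sketch below shows one way to map a record of this dataset into that shape. It relies only on the column names listed under Features; the helper names `to_dpo_example` and `load_hindi_dpo_train`, the `prompt`/`chosen`/`rejected` output keys, and the choice to prepend the system message to the question are assumptions for illustration, not something the dataset prescribes. The optional loader uses the Hugging Face `datasets` library's `load_dataset` and `Dataset.map`, and needs network access.

```python
from typing import Dict


def to_dpo_example(record: Dict[str, str], lang: str = "hindi") -> Dict[str, str]:
    """Map one dataset row to a prompt/chosen/rejected triple (hypothetical helper).

    `lang` selects the column prefix: "hindi" or "en".
    Assumption: a non-empty system message is prepended to the question.
    """
    if lang not in ("hindi", "en"):
        raise ValueError("lang must be 'hindi' or 'en'")
    system = (record.get(f"{lang}_system") or "").strip()
    question = record[f"{lang}_question"].strip()
    prompt = f"{system}\n\n{question}" if system else question
    return {
        "prompt": prompt,
        "chosen": record[f"{lang}_chosen"],
        "rejected": record[f"{lang}_rejected"],
    }


def load_hindi_dpo_train():
    """Optional (hypothetical helper): download the train split and convert every row."""
    from datasets import load_dataset  # imported lazily so the sketch runs without it

    ds = load_dataset("aaditya/orca_dpo_pairs-Hindi_", split="train")
    # Drop the original columns (including id and hindi_question_type) after mapping.
    return ds.map(to_dpo_example, remove_columns=ds.column_names)
```

Passing `lang="en"` yields the original English pairs from the same rows, which makes it straightforward to compare preference tuning on the source pairs against their Hindi counterparts.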

Language

Primary language: Hindi

Version

Version: 1.0

Citation

@misc{orca_dpo_hindi_,
  author    = {Pal, Ankit},
  title     = {orca_dpo_pairs-Hindi_},
  year      = 2024,
  url       = {https://huggingface.co/datasets/aaditya/orca_dpo_pairs-Hindi_},
  doi       = {10.57967/hf/1759},
  publisher = {Hugging Face}
}
