kpriyanshu256/databricks-dolly-15k-hi

Name: kpriyanshu256/databricks-dolly-15k-hi
Creator: kpriyanshu256
Published: 2024-01-05 05:58:52
License: No description available

Hugging Face, updated 2024-01-05, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/kpriyanshu256/databricks-dolly-15k-hi


Resource description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: context
    dtype: string
  - name: response
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 30106504
    num_examples: 15011
  download_size: 11723675
  dataset_size: 30106504
language:
- hi
size_categories:
- 10K<n<100K
---

# Dataset Card for "databricks-dolly-15k-hi"

This dataset was created by splitting the data in the [dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset into sentences and then translating them with the [NLLB-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) model.

This dataset was created by splitting the data in the dolly-15k dataset into sentences and then translating them with the NLLB-200-3.3B model. It has four features (instruction, context, response, and category), all of type string. There is a single train split with 15,011 examples and a total size of 30,106,504 bytes. The language is Hindi (hi), and the size category is 10K<n<100K.

Provider:

kpriyanshu256

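Summary of original information

Dataset overview

Dataset name

databricks-dolly-15k-hi

Dataset source

This dataset was created by splitting the data in the dolly-15k dataset into sentences and translating them with the NLLB-200-3.3B model.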
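
The split-then-translate procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the listing does not say how sentences were split or how the translated pieces were rejoined, so the regex splitter, the single-space join, and the names `split_sentences`, `translate_by_sentence`, and `translate_fn` are all hypothetical. The translator is passed in as a plain callable so that any model wrapper (for example one built around NLLB-200-3.3B) could be plugged in.

```python
import re
from typing import Callable, List

# Assumption: sentences end with '.', '!' or '?' followed by whitespace.
# The splitter actually used to build the dataset is not documented.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def translate_by_sentence(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate each sentence independently and rejoin with single spaces."""
    return " ".join(translate_fn(s) for s in split_sentences(text))
```

For a quick check without a model, a stand-in such as `str.upper` can be passed as `translate_fn`. Translating sentence by sentence keeps each model input short, at the cost of losing context across sentence boundaries.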

Dataset configuration

Default configuration (default)
- Data file path: data/train-*

Dataset features

Feature names and data types:
- instruction: string
- context: string
- response: string
- category: string
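
A minimal sketch of how the four-field schema above might be checked in code, together with one possible way to load the train split. The helper names `FEATURES`, `is_valid_record`, and `load_train_split` are hypothetical. The loader assumes the Hugging Face `datasets` package is installed and that the Hub is reachable, so it imports lazily and is never called at import time.

```python
from typing import Any, Mapping

# Feature names as listed in the dataset metadata; all are strings.
FEATURES = ("instruction", "context", "response", "category")


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if the record has exactly the four string features."""
    return set(record) == set(FEATURES) and all(
        isinstance(record[name], str) for name in FEATURES
    )


def load_train_split():
    """Load the train split from the Hugging Face Hub.

    Assumption: the `datasets` package is installed and network access
    is available. The import is lazy so this module works without it.
    """
    from datasets import load_dataset

    return load_dataset("kpriyanshu256/databricks-dolly-15k-hi", split="train")
```

With the dataset loaded, something like `all(is_valid_record(r) for r in load_train_split())` would confirm that every row matches the listed schema.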

Dataset splits

Training set (train)
- Bytes: 30106504
- Examples: 15011

Dataset size

Download size: 11723675 bytes
Dataset size: 30106504 bytes
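
To put the size figures above in perspective, a short calculation gives the average example size and the ratio of download size to dataset size. The constants are copied from the metadata; the variable names are only illustrative.

```python
# Figures taken from the dataset metadata.
NUM_BYTES = 30106504      # size of the train split in bytes
NUM_EXAMPLES = 15011      # number of examples in the train split
DOWNLOAD_SIZE = 11723675  # size of the downloaded files in bytes

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"{avg_bytes_per_example:.0f} bytes per example on average")  # 2006
print(f"download is {download_ratio:.1%} of the dataset size")      # 38.9%
```

The download being smaller than the dataset size is likely due to the files being stored in a compressed format, though the listing itself does not state this.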

Language

Hindi (hi)

Dataset scale

10K<n<100K
