kpriyanshu256/databricks-dolly-15k-hi
Hugging Face | Updated 2024-01-05 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/kpriyanshu256/databricks-dolly-15k-hi
Resource summary:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: context
    dtype: string
  - name: response
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 30106504
    num_examples: 15011
  download_size: 11723675
  dataset_size: 30106504
language:
- hi
size_categories:
- 10K<n<100K
---
# Dataset Card for "databricks-dolly-15k-hi"
This dataset was created by splitting the data in the [dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset into sentences and then translating them with the [NLLB-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) model.
It has four string features: instruction, context, response, and category. The data consists of a single training split of 15011 examples totaling 30106504 bytes. The text is in Hindi, and the dataset falls in the 10K<n<100K size category (between 10,000 and 100,000 examples).
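The construction described above (split each field into sentences, translate each sentence, reassemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the card does not say which sentence splitter was used (a simple regex on `.`, `!` and `?` is assumed here, which will mis-split abbreviations such as "Mr."), how translated sentences were rejoined (single spaces are assumed), or whether `category` was translated (it is left unchanged here). The `translate` argument is a hypothetical hook that, in practice, would wrap an NLLB-200-3.3B checkpoint.

```python
import re
from typing import Callable, Dict, List

# Assumption: split at whitespace that follows sentence-final punctuation.
# The card does not name the splitter that was actually used.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at whitespace following '.', '!' or '?'."""
    text = text.strip()
    if not text:
        return []
    return [s for s in _SENTENCE_END.split(text) if s]


def translate_field(text: str, translate: Callable[[str], str]) -> str:
    """Translate one field sentence by sentence and rejoin with single spaces.

    `translate` is a hypothetical hook standing in for an English-to-Hindi
    model such as NLLB-200-3.3B; it is not an API described in the card.
    """
    return " ".join(translate(s) for s in split_sentences(text))


def translate_record(record: Dict[str, str], translate: Callable[[str], str]) -> Dict[str, str]:
    """Return a translated copy of a record; `category` is kept as-is (assumption)."""
    out = dict(record)
    for key in ("instruction", "context", "response"):
        out[key] = translate_field(record.get(key, ""), translate)
    return out
```

Translating sentence by sentence keeps each model input short, which is a common reason to split long passages before machine translation; the trade-off is that context across sentence boundaries is lost.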
Provided by:
kpriyanshu256
Summary of Original Information
Dataset Overview
Dataset Name
- databricks-dolly-15k-hi
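Dataset Source
- Created by splitting the data in the dolly-15k dataset into sentences and translating them with the NLLB-200-3.3B model.
Dataset Configuration
- Default configuration: default
  - Data file path: data/train-*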
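The `data/train-*` entry is a glob pattern: every repository file whose path matches it belongs to the train split. The sketch below illustrates that matching with Python's `fnmatch.fnmatchcase`. The file names are hypothetical examples, not a listing of this repository, and `fnmatch`'s `*` also matches `/`, so this only approximates how the Hugging Face Hub resolves `data_files` patterns.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Pattern taken from the card's default config (train split).
TRAIN_PATTERN = "data/train-*"


def files_for_pattern(paths: Iterable[str], pattern: str = TRAIN_PATTERN) -> List[str]:
    """Return the paths that match a data_files glob pattern (case-sensitive)."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical repository contents, for illustration only.
example_paths = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
train_files = files_for_pattern(example_paths)
```

Using `fnmatchcase` rather than `fnmatch` keeps the result the same on every operating system, since `fnmatch` applies the platform's case normalization.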
Dataset Features
- Feature names and data types:
  - instruction: string
  - context: string
  - response: string
  - category: string
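Because every row has the same four string fields, a record can be checked against the schema with a few lines of code. The sketch below assumes each row is a plain `dict` keyed by feature name (which is how rows are exposed when iterating a Hugging Face `datasets.Dataset`); the function name `schema_errors` is hypothetical.

```python
from typing import Any, Dict, List

# The four features listed in the card; all have dtype string.
FEATURES = ("instruction", "context", "response", "category")


def schema_errors(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    errors = []
    for name in FEATURES:
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], str):
            errors.append(f"field {name} is {type(record[name]).__name__}, expected str")
    # Flag keys the card does not declare, in sorted order for stable output.
    for name in sorted(set(record) - set(FEATURES)):
        errors.append(f"unexpected field: {name}")
    return errors
```

An empty `context` string is still a valid string, so rows without context pass the check; only missing keys or non-string values are reported.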
Dataset Splits
- Training set (train)
  - Size: 30106504 bytes
  - Number of examples: 15011
Dataset Size
- Download size: 11723675 bytes
- Dataset size: 30106504 bytes
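The size figures above support a couple of quick derived numbers: the average size of one example, and how the download size compares with the dataset size. The sketch below only uses values stated in the card; reading the smaller download size as the effect of compressed storage files is an assumption, since the card does not state the file format.

```python
# Values copied from the card.
NUM_BYTES = 30106504      # train split size in bytes
NUM_EXAMPLES = 15011      # number of train examples
DOWNLOAD_SIZE = 11723675  # download size in bytes
DATASET_SIZE = 30106504   # dataset size in bytes

# With a single split, the split size and the dataset size coincide.
assert NUM_BYTES == DATASET_SIZE

# Average bytes per example (about 2 KB of text across the four fields).
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE
```

So each example averages roughly 2,006 bytes, and the download is about 39% of the dataset size.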
Language
- Hindi (hi)
Dataset Scale
- 10K<n<100K
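The `10K<n<100K` label is a bucket over the number of examples n, and 15011 falls in it. The sketch below maps an example count to such a label. It assumes the bucket names follow the Hugging Face `size_categories` convention as listed here; how exact boundary values (for example n = 10,000) are assigned is a guess, so treat the boundaries as illustrative.

```python
# Assumption: labels follow the Hugging Face size_categories convention;
# placing exact boundary values in the upper bucket is a guess.
_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n: int) -> str:
    """Map an example count n to a size-category label."""
    for upper, label in _BUCKETS:
        if n < upper:
            return label
    return "n>1T"
```

Calling `size_category(15011)` returns the label declared in this card's metadata.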



