
kpriyanshu256/databricks-dolly-15k-hi

Source: Hugging Face. Updated 2024-01-05. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kpriyanshu256/databricks-dolly-15k-hi
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: context
    dtype: string
  - name: response
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 30106504
    num_examples: 15011
  download_size: 11723675
  dataset_size: 30106504
language:
- hi
size_categories:
- 10K<n<100K
---

# Dataset Card for "databricks-dolly-15k-hi"

This dataset was created by splitting data in [dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset into sentences and then translating them using [NLLB-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) model.

This dataset was created by splitting the data in the dolly-15k dataset into sentences and then translating them with the NLLB-200-3.3B model. It has four features (instruction, context, response, and category), all of type string. It has a single split, train, with 15,011 examples and a total size of 30,106,504 bytes. The language is Hindi, and the size category is 10K<n<100K.
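The split-then-translate procedure described above can be sketched as follows. This is a minimal illustration only: the card names neither the sentence splitter nor how NLLB-200-3.3B was called, so `split_sentences` (a naive regex splitter), `translate_record`, and the stand-in `fake_translate` are all hypothetical. Leaving `category` untranslated is likewise an assumption.

```python
import re
from typing import Callable, Dict, List

# Free-text fields of a record. Treating "category" as a label that is
# left untranslated is an assumption, not something the card states.
TEXT_FIELDS = ("instruction", "context", "response")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_record(
    record: Dict[str, str], translate: Callable[[str], str]
) -> Dict[str, str]:
    """Translate each text field sentence by sentence and rejoin with spaces."""
    out = dict(record)
    for field in TEXT_FIELDS:
        sentences = split_sentences(record.get(field, ""))
        out[field] = " ".join(translate(s) for s in sentences)
    return out


def fake_translate(sentence: str) -> str:
    """Hypothetical stand-in for a call to a translation model."""
    return f"<hi>{sentence}</hi>"


example = {
    "instruction": "What is a polygon? Give one example.",
    "context": "",
    "response": "A polygon is a closed shape. A triangle is one.",
    "category": "open_qa",
}
translated = translate_record(example, fake_translate)
```

Passing the translator in as a function keeps the sketch independent of any particular model API; a real run would replace `fake_translate` with a call into the translation model.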
Provider:
kpriyanshu256
Summary of original information

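Dataset overview

Dataset name

  • databricks-dolly-15k-hi

Dataset source

  • This dataset was created by splitting the data in the dolly-15k dataset into sentences and translating them with the NLLB-200-3.3B model.

Dataset configuration

  • Default configuration (default)
    • Data file path: data/train-*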
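As a rough sketch of how this configuration might be used, the snippet below checks which file paths the `data/train-*` pattern would cover and wraps a call to `load_dataset` from the Hugging Face `datasets` library. The helper names and the example file names in the usage note are hypothetical. `fnmatchcase` only approximates how the Hub resolves glob patterns, and `load_train_split` needs the `datasets` package and network access.

```python
from fnmatch import fnmatchcase

DATASET_ID = "kpriyanshu256/databricks-dolly-15k-hi"
CONFIG_NAME = "default"
TRAIN_PATTERN = "data/train-*"


def belongs_to_train(path: str) -> bool:
    """Return True if a repository path matches the train split pattern."""
    return fnmatchcase(path, TRAIN_PATTERN)


def load_train_split():
    """Load the train split of the default configuration.

    The import is done lazily so the rest of this module works without
    the `datasets` package installed.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID, name=CONFIG_NAME, split="train")
```

For instance, `belongs_to_train("data/train-00000-of-00001.parquet")` returns `True`, while `belongs_to_train("data/test-00000-of-00001.parquet")` returns `False`.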

Dataset features

  • Feature names and data types:
    • instruction: string
    • context: string
    • response: string
    • category: string
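The four string features listed above can be checked on a single record with a small validator. This is an illustrative sketch. `validate_record` is a hypothetical helper, not part of any library, and the sample record in the usage note is invented.

```python
from typing import Any, Dict, List

# Feature names from the dataset card; every feature has dtype string.
FEATURES = ("instruction", "context", "response", "category")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record is valid."""
    problems = []
    for name in FEATURES:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], str):
            actual = type(record[name]).__name__
            problems.append(f"field {name} is {actual}, expected str")
    return problems
```

A record such as `{"instruction": "...", "context": "", "response": "...", "category": "open_qa"}` yields an empty list, while a record without `category` yields `["missing field: category"]`.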

Dataset splits

  • Training set (train)
    • Bytes: 30106504
    • Examples: 15011

Dataset size

  • Download size: 11723675 bytes
  • Actual size: 30106504 bytes
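Two derived figures follow from these numbers: the average size of an example, and the ratio of download size to actual size. The short calculation below uses only the values from the card; the variable names are chosen for illustration.

```python
NUM_BYTES = 30_106_504      # actual (dataset) size in bytes
NUM_EXAMPLES = 15_011       # examples in the train split
DOWNLOAD_SIZE = 11_723_675  # download size in bytes

# Average number of bytes per example.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the actual size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.0f}")
print(f"download size / actual size: {download_ratio:.1%}")
```

This works out to roughly 2,006 bytes per example, with the download being about 39% of the actual size.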

Language

  • Hindi (hi)

Dataset scale

  • 10K<n<100K
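The scale label can be related to the example count with a small bucketing function. This is a sketch under assumptions: the treatment of exact boundaries (half-open intervals) and the catch-all `n>=1M` label are choices made here for illustration, and `size_category` is a hypothetical helper.

```python
def size_category(n: int) -> str:
    """Map an example count to a coarse size label."""
    bounds = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
    ]
    for upper, label in bounds:
        if n < upper:
            return label
    # Larger buckets are collapsed into one hypothetical label.
    return "n>=1M"


category = size_category(15_011)
```

With the 15,011 examples of the train split, `category` is `"10K<n<100K"`, matching the label above.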