HydraIndicLM/bengali_alpaca_dolly_67k
Source: Hugging Face. Updated 2024-03-06; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HydraIndicLM/bengali_alpaca_dolly_67k
Resource overview:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: id
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 116369973
    num_examples: 67017
  download_size: 44110061
  dataset_size: 116369973
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
## About
This repository contains a Bengali instruction set of roughly 67K examples, translated from the Alpaca and Dolly datasets.
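
As a rough illustration of how the `instruction`, `input`, and `output` fields might be used, the sketch below loads the train split with the Hugging Face `datasets` library and turns one record into a prompt. The helper names `load_train_split` and `build_prompt` are hypothetical, and the Alpaca-style prompt template is an assumption; the dataset card does not prescribe any template.

```python
from typing import Dict

# Repository ID taken from the dataset page.
DATASET_ID = "HydraIndicLM/bengali_alpaca_dolly_67k"


def load_train_split():
    """Download the train split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the rest of the module works without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(DATASET_ID, split="train")


def build_prompt(record: Dict[str, str]) -> str:
    """Format one record as an Alpaca-style prompt (assumed template, not from the card)."""
    instruction = record["instruction"].strip()
    # The `input` field may be empty for instructions that need no extra context.
    context = (record.get("input") or "").strip()
    if context:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```

Calling `load_train_split()` and passing its first row to `build_prompt` would give the text to feed a model, with the row's `output` field serving as the target response.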
## Citation
If you find this repository useful, please consider giving 👏 and citing:
```
@misc{BengaliAlpacaDolly,
author = {Sambit Sekhar and Shantipriya Parida},
title = {Bengali Instruction Set Based on Alpaca and Dolly},
year = {2023},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```
Provided by:
HydraIndicLM
Summary of original information
Dataset information
Features
- instruction: string.
- input: string.
- id: string.
- output: string.
Data splits
- train: 116,369,973 bytes, 67,017 examples.
Data sizes
- Download size: 44,110,061 bytes.
- Dataset size: 116,369,973 bytes.
Configurations
- default: contains the training data files at path data/train-*.
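
The size figures above can be put in perspective with a little arithmetic. The following sketch derives the average record size and the download-to-dataset size ratio from the metadata values; the function names are illustrative and not part of the dataset or any library.

```python
# Values copied from the dataset metadata.
NUM_BYTES = 116_369_973      # dataset_size / num_bytes of the train split
NUM_EXAMPLES = 67_017        # num_examples of the train split
DOWNLOAD_SIZE = 44_110_061   # download_size


def average_bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Mean size of one record in bytes."""
    return num_bytes / num_examples


def download_ratio(download_size: int, dataset_size: int) -> float:
    """Fraction of the in-memory dataset size that must be downloaded."""
    return download_size / dataset_size


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


print(f"Average record size: {average_bytes_per_example(NUM_BYTES, NUM_EXAMPLES):.1f} bytes")
print(f"Download / dataset size: {download_ratio(DOWNLOAD_SIZE, NUM_BYTES):.3f}")
print(f"Dataset size: {to_mib(NUM_BYTES):.1f} MiB, download: {to_mib(DOWNLOAD_SIZE):.1f} MiB")
```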



