OdiaGenAI/all_combined_bengali_252k
Source: Hugging Face. Updated 2023-06-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/OdiaGenAI/all_combined_bengali_252k
Resource description:
---
license: cc-by-nc-sa-4.0
task_categories:
- text-generation
language:
- bn
pretty_name: all_combined_bengali_252K
size_categories:
- 100K<n<1M
---
# Dataset Card for all_combined_bengali_252K
## Dataset Description
- **Homepage:** https://www.odiagenai.org/
- **Repository:** https://github.com/OdiaGenAI
- **Point of Contact:** Shantipriya Parida and Sambit Sekhar
### Dataset Summary
This dataset combines Bengali instruction sets translated from the following open-source instruction sets:
* Dolly
* Alpaca
* ChatDoctor
* Roleplay
* GSM

Each record provides Bengali instruction, input, and output strings.
### Supported Tasks and Leaderboards
Text generation with large language models (LLMs).
### Languages
Bengali
## Dataset Structure
The data is provided in JSON format.
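
The card gives the format only as JSON and does not say whether a file holds one JSON array or one object per line (JSON Lines). The following is a minimal sketch, assuming one of those two layouts, that parses either into a list of records. The function name `parse_records` is hypothetical and not part of any OdiaGenAI tooling.

```python
import json
from typing import Any


def parse_records(text: str) -> list[dict[str, Any]]:
    """Parse records from either a JSON array or JSON Lines text.

    Assumption: the file is one of these two layouts; the dataset card
    does not specify which.
    """
    stripped = text.strip()
    if not stripped:
        return []
    if stripped.startswith("["):
        # The whole text is a single JSON array of objects.
        return json.loads(stripped)
    # Otherwise treat every non-empty line as one JSON object.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]
```

To use it, read a downloaded file as UTF-8 text and pass the contents to `parse_records`. A single pretty-printed object spread over several lines is not handled by this sketch.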
### Data Fields
- `output` (string)
- `data_source` (string)
- `instruction` (string)
- `input` (string)
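
As an illustration of the record schema above, here is a small sketch that checks that a record carries all four string fields and tallies valid records per `data_source`. The helper names and the sample values (`source_a`, `q1`, and so on) are placeholders invented for this example; the actual `data_source` labels used in the dataset are not stated in the card.

```python
from collections import Counter

# The four fields listed in the dataset card, all typed as strings.
FIELDS = ("instruction", "input", "output", "data_source")


def is_valid_record(record: dict) -> bool:
    """Return True if all four fields are present and hold strings."""
    return all(isinstance(record.get(name), str) for name in FIELDS)


def count_by_source(records: list[dict]) -> Counter:
    """Tally valid records per data_source value, skipping invalid ones."""
    return Counter(r["data_source"] for r in records if is_valid_record(r))


# Hypothetical records; the values are placeholders, not dataset content.
sample = [
    {"instruction": "q1", "input": "", "output": "a1", "data_source": "source_a"},
    {"instruction": "q2", "input": "ctx", "output": "a2", "data_source": "source_b"},
    # This record lacks "input", so it is treated as invalid.
    {"instruction": "q3", "output": "a3", "data_source": "source_a"},
]

print(count_by_source(sample))
```

An empty string still counts as a valid `input`, which suits instructions that need no additional context.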
### Licensing Information
This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg
### Citation Information
If you find this repository useful, please consider giving it a 👏 and citing it:
```
@misc{OdiaGenAI,
author = {Shantipriya Parida and Sambit Sekhar and Guneet Singh Kohli and Arghyadeep Sen and Shashikanta Sahoo},
title = {Bengali Instruction Set},
year = {2023},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/OdiaGenAI}},
}
```
### Contributions
- Shantipriya Parida
- Sambit Sekhar
- Guneet Singh Kohli
- Arghyadeep Sen
- Shashikanta Sahoo
Provided by:
OdiaGenAI



