cidtd-mod-ua/slim-orca-ukrainian
Source: Hugging Face. Updated 2024-02-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cidtd-mod-ua/slim-orca-ukrainian
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: response
    dtype: string
  splits:
  - name: train
    num_bytes: 819383847
    num_examples: 351979
  download_size: 386064112
  dataset_size: 819383847
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- uk
size_categories:
- 100K<n<1M
license: mit
---
# Slim Orca (Deduped) Translated to Ukrainian 🇺🇦
## Dataset Description
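A Ukrainian-language dataset of 351,979 records translated from the SlimOrca (deduped) dataset.
Each record has three string fields (`instruction`, `input`, and `response`), which makes the dataset suitable for instruction tuning and other natural language processing tasks.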
Слава Україні!
## Disclaimer
Preprocess the data before use. The texts contain some errors, so be careful.
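As a minimal sketch of such preparation, the helpers below normalize the three text fields and drop records whose `instruction` or `response` ends up empty. The field names come from the dataset metadata; the function names `normalize_text`, `clean_record`, and `is_usable` are hypothetical, and the checks themselves are assumptions, since the card does not say what kinds of errors the translations contain.

```python
import unicodedata

# Field names taken from the dataset metadata.
TEXT_FIELDS = ("instruction", "input", "response")


def normalize_text(text):
    """Apply Unicode NFC normalization and strip surrounding whitespace."""
    if text is None:
        return ""
    return unicodedata.normalize("NFC", text).strip()


def clean_record(record):
    """Return a new dict with the three text fields normalized."""
    return {field: normalize_text(record.get(field)) for field in TEXT_FIELDS}


def is_usable(record):
    """Keep only records with a non-empty instruction and response."""
    return bool(record["instruction"]) and bool(record["response"])
```

On a loaded dataset these could be applied as `dataset.map(clean_record).filter(is_usable)`. Stricter checks, such as language detection or length limits, would need to be added for a real pipeline.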
## How to Use
This dataset can be loaded using the Hugging Face Datasets library:
```python
from datasets import load_dataset
dataset = load_dataset('cidtd-mod-ua/slim-orca-ukrainian')
```
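Once loaded, each record is a mapping with the `instruction`, `input`, and `response` fields listed in the metadata. The sketch below shows one possible way to turn a record into a prompt/completion pair for fine-tuning. The function name `to_prompt_completion` and the blank-line layout are assumptions, not part of the dataset; how the three fields correspond to the original SlimOrca conversation turns is not stated in this card.

```python
def to_prompt_completion(record):
    """Build a prompt/completion pair from one dataset record.

    The prompt is the instruction, followed by the input when it is non-empty,
    separated by a blank line. This layout is an illustrative assumption.
    """
    instruction = (record.get("instruction") or "").strip()
    extra_input = (record.get("input") or "").strip()
    completion = (record.get("response") or "").strip()

    if extra_input:
        prompt = f"{instruction}\n\n{extra_input}"
    else:
        prompt = instruction
    return {"prompt": prompt, "completion": completion}
```

For example, `dataset["train"].map(to_prompt_completion)` would add `prompt` and `completion` columns to the train split.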
## Citation
```bibtex
@misc{slim-orca-ukrainian,
title = {slim-orca-ukrainian - translation of SlimOrca},
author = {Center of Innovations and Defence Technologies Development of Ministry of Defence of Ukraine},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/cidtd-mod-ua/slim-orca-200k-translated}
}
```
## Citation from original SlimOrca
```bibtex
@misc{SlimOrca,
title = {SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification},
author = {Wing Lian and Guan Wang and Bleys Goodson and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
year = {2023},
publisher = {HuggingFace},
url = {https://huggingface.co/Open-Orca/SlimOrca}
}
```
```bibtex
@misc{mukherjee2023orca,
title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
year={2023},
eprint={2306.02707},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```bibtex
@misc{longpre2023flan,
title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
author={Shayne Longpre and Le Hou and Tu Vu and Albert Webson and Hyung Won Chung and Yi Tay and Denny Zhou and Quoc V. Le and Barret Zoph and Jason Wei and Adam Roberts},
year={2023},
eprint={2301.13688},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
Provider:
cidtd-mod-ua



