
cidtd-mod-ua/slim-orca-ukrainian

Source: Hugging Face. Updated 2024-02-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cidtd-mod-ua/slim-orca-ukrainian
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: response
    dtype: string
  splits:
  - name: train
    num_bytes: 819383847
    num_examples: 351979
  download_size: 386064112
  dataset_size: 819383847
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- uk
size_categories:
- 100K<n<1M
license: mit
---

# Slim Orca(Deduped) Translated to Ukrainian 🇺🇦

## Dataset Description

A Ukrainian-language dataset comprising 350,000+ records translated from the SlimOrca dataset. This dataset is suitable for various natural language processing tasks.

Слава Україні!

## Disclaimer

Prepare the data before use. The texts contain some errors, so be careful.

## How to Use

This dataset can be loaded using the Hugging Face Datasets library:

```python
from datasets import load_dataset

dataset = load_dataset('cidtd-mod-ua/slim-orca-ukrainian')
```

# Citation

```bibtex
@misc{slim-orca-ukrainian,
  title = {slim-orca-ukrainian - translation of SlimOrca},
  author = {Center of Innovations and Defence Technologies Development of Ministry of Defence of Ukraine},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/cidtd-mod-ua/slim-orca-200k-translated}
}
```

# Citation from original SlimOrca

```bibtex
@misc{SlimOrca,
  title = {SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification},
  author = {Wing Lian and Guan Wang and Bleys Goodson and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Open-Orca/SlimOrca}
}
```

```bibtex
@misc{mukherjee2023orca,
  title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
  author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
  year={2023},
  eprint={2306.02707},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

```bibtex
@misc{longpre2023flan,
  title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
  author={Shayne Longpre and Le Hou and Tu Vu and Albert Webson and Hyung Won Chung and Yi Tay and Denny Zhou and Quoc V. Le and Barret Zoph and Jason Wei and Adam Roberts},
  year={2023},
  eprint={2301.13688},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```
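
The card declares three string fields per record (`instruction`, `input`, `response`). As a minimal sketch of how such a record might be turned into a single training string, the helper below joins the fields with a simple template. The template layout, the `format_record` name, and the sample record are illustrative assumptions; the dataset authors do not prescribe a prompt format.

```python
from typing import Mapping


def format_record(record: Mapping[str, str]) -> str:
    """Join instruction, optional input, and response into one text block.

    The section labels are a hypothetical template chosen for illustration;
    the dataset card does not define a prompt format.
    """
    instruction = record["instruction"].strip()
    extra_input = (record.get("input") or "").strip()
    response = record["response"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if extra_input:  # skip the input section when the field is empty
        parts.append(f"### Input:\n{extra_input}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


# Hypothetical record using the same three keys as the dataset schema.
sample = {
    "instruction": "Перекладіть речення англійською.",
    "input": "Добрий день!",
    "response": "Good afternoon!",
}
formatted = format_record(sample)
print(formatted)
```

With the real dataset, the same function could be applied per row, for example through `dataset["train"].map(lambda r: {"text": format_record(r)})`, which would add a `text` column alongside the original fields.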
Provider:
cidtd-mod-ua
Summary of original information

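Slim Orca(Deduped) Translated to Ukrainian 🇺🇦: dataset overview

Dataset description

A Ukrainian-language dataset of more than 350,000 records translated from the SlimOrca dataset. It is suitable for a variety of natural language processing tasks.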

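Dataset information

  • Features:
    • instruction: string
    • input: string
    • response: string
  • Splits:
    • train: 351,979 examples, 819,383,847 bytes in total
  • Download size: 386,064,112 bytes
  • Dataset size: 819,383,847 bytes
  • Configs:
    • the default config contains the training data files at path data/train-*
  • Language: Ukrainian
  • Size category: 100K < n < 1M
  • License: MIT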
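
The size figures above can be turned into two quick planning numbers: the average record size and the ratio of download size to dataset size. The short sketch below only does arithmetic on the values listed in the metadata; the variable names are illustrative, and the interpretation of `dataset_size` as the size of the prepared data (versus the downloaded files) follows the usual Hugging Face convention rather than anything stated on this page.

```python
# Figures copied from the dataset metadata above.
NUM_EXAMPLES = 351_979
DATASET_SIZE_BYTES = 819_383_847   # dataset_size
DOWNLOAD_SIZE_BYTES = 386_064_112  # download_size

# Average number of bytes per record in the train split.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average record size: {avg_bytes_per_example:.0f} bytes")
print(f"download / dataset size: {download_ratio:.2f}")
```

In other words, a record averages a little over 2 KB, and the download is somewhat under half of the dataset size.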

Disclaimer

Prepare the data before use. The texts contain some errors, so handle them with care.
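
The disclaimer does not say what kind of errors to expect, so any preparation step is a judgment call. As one possible starting point, the sketch below strips surrounding whitespace, drops records with an empty `instruction` or `response`, and removes exact duplicates. The `clean_records` name, the choice of checks, and the sample rows are assumptions for illustration; translation mistakes themselves would need separate review.

```python
from typing import Iterable, Iterator, Mapping, Optional

FIELDS = ("instruction", "input", "response")
REQUIRED_FIELDS = ("instruction", "response")  # assumed to be mandatory


def clean_records(
    records: Iterable[Mapping[str, Optional[str]]],
) -> Iterator[dict]:
    """Yield stripped records, skipping incomplete rows and exact duplicates."""
    seen = set()
    for record in records:
        # Treat missing or None values as empty strings, then strip whitespace.
        cleaned = {key: (record.get(key) or "").strip() for key in FIELDS}
        if any(not cleaned[key] for key in REQUIRED_FIELDS):
            continue  # nothing to learn from a row without a prompt or answer
        fingerprint = tuple(cleaned[key] for key in FIELDS)
        if fingerprint in seen:
            continue  # exact duplicate after stripping
        seen.add(fingerprint)
        yield cleaned


# Hypothetical rows that mimic the dataset schema.
rows = [
    {"instruction": "  Питання?  ", "input": "", "response": "Відповідь."},
    {"instruction": "Питання?", "input": "", "response": "Відповідь."},
    {"instruction": "Без відповіді", "input": "", "response": "   "},
    {"instruction": "Інше", "input": None, "response": "Так"},
]
cleaned_rows = list(clean_records(rows))
print(len(cleaned_rows))
```

Because the function accepts any iterable of dict-like rows, it could also be fed the `train` split of the loaded dataset row by row.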
