saillab/alpaca_slovak_taco

Name: saillab/alpaca_slovak_taco
Creator: saillab
Published: 2024-09-20 22:08:45
License: No description provided (the dataset card below states CC BY-NC)

Source: Hugging Face | Updated: 2024-09-20 | Indexed: 2024-06-12

Download link:

https://hf-mirror.com/datasets/saillab/alpaca_slovak_taco

Resource description:

---
language:
- sk
pretty_name: Slovak alpaca-52k
size_categories:
- 100K<n<1M
---

This repository contains the dataset used in the TaCo paper. Each example follows the format described in the paper, where `xx` stands for the target language (here Slovak) and `en` for English:

```
{
  "instruction": "instruction in xx",
  "input": "input in xx",
  "output": "Instruction in English: instruction in en , Response in English: response in en , Response in xx: response in xx "
}
```

Please refer to the paper for more details: [OpenReview](https://openreview.net/forum?id=02MLWBj8HP)

If you use our dataset, please cite it as follows:

**Citation**

```
@inproceedings{upadhayay2024taco,
  title={TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in {LLM}s through Translation-Assisted Chain-of-Thought Processes},
  author={Bibek Upadhayay and Vahid Behzadan},
  booktitle={5th Workshop on practical ML for limited/low resource settings, ICLR},
  year={2024},
  url={https://openreview.net/forum?id=02MLWBj8HP}
}
```

The original dataset [(Alpaca-52K)](https://github.com/tatsu-lab/stanford_alpaca?tab=readme-ov-file#data-release) was translated using Google Translate.

**Copyright and Intended Use**

This dataset has been released under CC BY-NC and is intended for academic and research purposes only. Please review the licenses and terms and conditions of Alpaca-52K, Dolly-15K, and Google Cloud Translation before using this dataset for any purpose other than research.
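The `output` field packs three parts into a single string. Below is a minimal sketch of splitting it back apart, assuming the markers appear exactly as in the template above and that the target-language label in the real data is "Slovak" (the card only writes "xx", so the label is an assumption and is exposed as the `target_label` parameter). `TacoOutput` and `split_taco_output` are hypothetical helpers, not part of the dataset or any library:

```python
import re
from typing import NamedTuple


class TacoOutput(NamedTuple):
    """The three parts of a TaCo-style `output` string (hypothetical helper)."""
    instruction_en: str
    response_en: str
    response_target: str


def split_taco_output(output: str, target_label: str = "Slovak") -> TacoOutput:
    """Split an `output` string that follows the card's template.

    ASSUMPTION: the markers are exactly "Instruction in English:",
    "Response in English:" and f"Response in {target_label}:", in that order.
    The label actually used for the target language is not confirmed.
    """
    pattern = re.compile(
        # Lazy groups stop at the next marker; the optional comma between
        # parts (as in the template) is consumed and dropped.
        r"Instruction in English:\s*(?P<inst>.*?)\s*,?\s*"
        r"Response in English:\s*(?P<resp>.*?)\s*,?\s*"
        rf"Response in {re.escape(target_label)}:\s*(?P<tgt>.*)",
        re.DOTALL,
    )
    match = pattern.search(output)
    if match is None:
        raise ValueError("output does not follow the expected TaCo template")
    return TacoOutput(
        match.group("inst").strip(),
        match.group("resp").strip(),
        match.group("tgt").strip(),
    )
```

For instance, `split_taco_output("Instruction in English: Name a color. , Response in English: Blue. , Response in Slovak: Modrá. ")` yields the stripped parts `"Name a color."`, `"Blue."` and `"Modrá."`.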

Provider:

saillab

Summary of original metadata

Dataset overview

Features

instruction: string
input: string
output: string
id: string
text: string
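Every feature is a plain string. The hypothetical check below only confirms that a record has all five listed fields with `str` values; the card does not say what `id` and `text` hold, so their contents are not validated:

```python
from typing import Any, Mapping

# Field names taken from the feature list above; each is typed as a string.
EXPECTED_FIELDS = ("instruction", "input", "output", "id", "text")


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if `record` has all five fields and each value is a str.

    Sketch only: presence and type are checked, not content. Empty strings
    are accepted, since Alpaca-style `input` fields are often empty.
    """
    return all(isinstance(record.get(name), str) for name in EXPECTED_FIELDS)
```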

Splits

train:
- Size (bytes): 181618850.24207285
- Examples: 49601
test:
- Size (bytes): 45407458.757927164
- Examples: 12401
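The split figures can be cross-checked with a little arithmetic. This sketch uses only the numbers listed above: it shows the splits are roughly 80/20 by example count and that the two byte counts sum to the dataset size. The train byte count also matches the total size scaled by the train fraction, which suggests (an inference, not something the card states) that per-split byte counts were allocated proportionally, hence the fractional values:

```python
import math

# Figures copied from the split metadata above.
TRAIN_EXAMPLES = 49_601
TEST_EXAMPLES = 12_401
TRAIN_BYTES = 181_618_850.24207285
TEST_BYTES = 45_407_458.757927164
DATASET_BYTES = 227_026_309.0

total_examples = TRAIN_EXAMPLES + TEST_EXAMPLES
train_fraction = TRAIN_EXAMPLES / total_examples  # roughly 0.8

# The two split byte counts should add up to the reported dataset size.
bytes_add_up = math.isclose(TRAIN_BYTES + TEST_BYTES, DATASET_BYTES, abs_tol=1e-3)

# Scaling the total size by the train fraction reproduces the train byte count.
train_bytes_proportional = math.isclose(
    DATASET_BYTES * train_fraction, TRAIN_BYTES, rel_tol=1e-9
)

print(f"{total_examples} examples, train fraction {train_fraction:.4f}")
print(bytes_add_up, train_bytes_proportional)
```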

Size

Download size: 119457139 bytes
Dataset size: 227026309.0 bytes

Configuration

config_name: default
- data_files:
  - train: data/train-*
  - test: data/test-*
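In practice the Hugging Face `datasets` library resolves these `data_files` patterns when you call `load_dataset("saillab/alpaca_slovak_taco")`. To illustrate which files feed which split, here is a stdlib-only sketch using `fnmatch`; `split_for` is a hypothetical helper, the file names in the usage note are hypothetical, and `fnmatch`'s `*` also matches `/`, so it is looser than a real path glob:

```python
from fnmatch import fnmatch
from typing import Optional

# Split-to-pattern mapping copied from the `default` config above.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None
```

For example, `split_for("data/train-00000-of-00001.parquet")` returns `"train"`, while `README.md` matches neither pattern and returns `None`.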
