saillab/alpaca_slovak_taco
Source: Hugging Face. Updated 2024-09-20; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/saillab/alpaca_slovak_taco
Resource description:
---
language:
- sk
pretty_name: Slovak alpaca-52k
size_categories:
- 100K<n<1M
---
This repository contains the Slovak dataset used in the TaCo paper.
Each record follows the format described in the paper:
```
{
"instruction": "instruction in xx",
"input": "input in xx",
"output": "Instruction in English: instruction in en ,
Response in English: response in en ,
Response in xx: response in xx "
}
```
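The three-part `output` string can be split back into its components. The following is a minimal sketch, assuming the marker phrases appear literally as in the template above and that the target-language marker reads `Response in <language>:`; the names `TacoOutput` and `parse_taco_output` are hypothetical and not part of the dataset or the paper.

```python
import re
from typing import NamedTuple, Optional


class TacoOutput(NamedTuple):
    """The three parts of a TaCo-style output, plus the target language name."""
    instruction_en: str
    response_en: str
    target_language: str
    response_target: str


# Assumption: the markers are spelled exactly as in the template above.
_TACO_PATTERN = re.compile(
    r"Instruction in English:\s*(?P<instruction_en>.*?)\s*"
    r"Response in English:\s*(?P<response_en>.*?)\s*"
    r"Response in (?P<language>[^:\n]+):\s*(?P<response_target>.*)",
    re.DOTALL,
)


def parse_taco_output(output: str) -> Optional[TacoOutput]:
    """Return the parsed parts, or None if the markers are not found."""
    match = _TACO_PATTERN.search(output)
    if match is None:
        return None
    return TacoOutput(
        instruction_en=match["instruction_en"].strip(),
        response_en=match["response_en"].strip(),
        target_language=match["language"].strip(),
        response_target=match["response_target"].strip(),
    )
```

Records whose `output` deviates from the template return `None`, so they can be counted or inspected separately rather than silently mis-parsed.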
Please refer to the paper for more details: [OpenReview](https://openreview.net/forum?id=02MLWBj8HP)
If you have used our dataset, please cite it as follows:
**Citation**
```
@inproceedings{upadhayay2024taco,
title={TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in {LLM}s through Translation-Assisted Chain-of-Thought Processes},
author={Bibek Upadhayay and Vahid Behzadan},
booktitle={5th Workshop on practical ML for limited/low resource settings, ICLR},
year={2024},
url={https://openreview.net/forum?id=02MLWBj8HP}
}
```
The original dataset [(Alpaca-52K)](https://github.com/tatsu-lab/stanford_alpaca?tab=readme-ov-file#data-release) was translated using Google Translate.
**Copyright and Intended Use**
This dataset is released under CC BY-NC and is intended for academic and research purposes only. Please review the licenses and terms and conditions of Alpaca-52K, Dolly-15K, and Google Cloud Translation before using this dataset for any purpose other than research.
Provider:
saillab
Summary of original information
Dataset overview
Data features
- instruction: string.
- input: string.
- output: string.
- id: string.
- text: string.
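A quick way to check a record against the five string fields listed above is sketched below. The helper name `non_string_fields` is hypothetical; it only tests that each field is present and is a `str`, and makes no assumption about the field contents.

```python
from typing import Any, Dict, List

# Field names taken from the feature list above; all are declared as strings.
EXPECTED_FIELDS = ("instruction", "input", "output", "id", "text")


def non_string_fields(record: Dict[str, Any]) -> List[str]:
    """Return the expected fields that are missing or not strings."""
    return [
        name for name in EXPECTED_FIELDS
        if not isinstance(record.get(name), str)
    ]
```

An empty list means the record matches the declared schema.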
Data splits
- train:
  - Bytes: 181618850.24207285
  - Samples: 49601
- test:
  - Bytes: 45407458.757927164
  - Samples: 12401
Data size
- Download size: 119457139 bytes
- Dataset size: 227026309.0 bytes
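The figures above can be cross-checked with a little arithmetic. The sketch below is an inference from the numbers and not something the source states: the fractional per-split byte counts are consistent with dividing the total dataset size in proportion to the sample counts, which also gives a train share of roughly 80%.

```python
# Figures copied from the listing above.
TRAIN_ROWS = 49601
TEST_ROWS = 12401
DATASET_BYTES = 227026309

total_rows = TRAIN_ROWS + TEST_ROWS
train_fraction = TRAIN_ROWS / total_rows

# Assumption: per-split bytes are the total split proportionally by row count.
est_train_bytes = DATASET_BYTES * TRAIN_ROWS / total_rows
est_test_bytes = DATASET_BYTES * TEST_ROWS / total_rows

print(f"total rows: {total_rows}")
print(f"train fraction: {train_fraction:.4f}")
print(f"estimated train bytes: {est_train_bytes:.2f}")
print(f"estimated test bytes: {est_test_bytes:.2f}")
```

Under this reading, the per-split byte counts are estimates rather than measured file sizes.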
Configuration
- config_name: default
- data_files:
  - train: data/train-*
  - test: data/test-*
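With a single default configuration and two splits, the dataset can presumably be loaded with the Hugging Face `datasets` library. This is a sketch, assuming `datasets` is installed and the Hub is reachable; `load_splits` and `row_count_mismatches` are hypothetical helper names, and the expected row counts are the ones listed on this page.

```python
from typing import Dict, Optional, Tuple

REPO_ID = "saillab/alpaca_slovak_taco"

# Row counts as listed on this page.
EXPECTED_ROWS = {"train": 49601, "test": 12401}


def load_splits():
    """Load all splits of the default config (needs network access)."""
    # Imported lazily so the helper below works without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID)


def row_count_mismatches(
    actual_rows: Dict[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each split whose row count differs to (expected, actual)."""
    return {
        split: (expected, actual_rows.get(split))
        for split, expected in EXPECTED_ROWS.items()
        if actual_rows.get(split) != expected
    }
```

After `splits = load_splits()`, calling `row_count_mismatches({name: ds.num_rows for name, ds in splits.items()})` should return an empty dict if the hosted data still matches the counts above.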