
saillab/alpaca_ewe_taco

Source: Hugging Face | Updated: 2024-09-20 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/saillab/alpaca_ewe_taco
Resource description:
---
language:
- ee
pretty_name: Ewe alpaca-52k
size_categories:
- 100K<n<1M
---

This repository contains the dataset used for the TaCo paper.

The dataset follows the style outlined in the TaCo paper, as follows:

```
{
  "instruction": "instruction in xx",
  "input": "input in xx",
  "output": "Instruction in English: instruction in en , Response in English: response in en , Response in xx: response in xx "
}
```

Please refer to the paper for more details: [OpenReview](https://openreview.net/forum?id=02MLWBj8HP)

If you have used our dataset, please cite it as follows:

**Citation**

```
@inproceedings{upadhayay2024taco,
  title={TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in {LLM}s through Translation-Assisted Chain-of-Thought Processes},
  author={Bibek Upadhayay and Vahid Behzadan},
  booktitle={5th Workshop on practical ML for limited/low resource settings, ICLR},
  year={2024},
  url={https://openreview.net/forum?id=02MLWBj8HP}
}
```

The original dataset [(Alpaca-52K)](https://github.com/tatsu-lab/stanford_alpaca?tab=readme-ov-file#data-release) was translated using Google Translate.

**Copyright and Intended Use**

This dataset has been released under CC BY-NC, intended for academic and research purposes only. Please review the licenses and terms and conditions of Alpaca-52K, Dolly-15K, and Google Cloud Translation before using this dataset for any purpose other than research.
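The `output` template above packs three parts into one string. As a minimal sketch, the helper below splits such a string back into its parts. It assumes the marker labels appear literally as in the card's template ("Instruction in English:", "Response in English:", "Response in xx:", where xx is the target language name); the separators in real records may differ, and the function name `split_taco_output` is hypothetical.

```python
import re

# Marker labels are taken from the template in the dataset card.
# The optional comma separators are an assumption about real records.
_TACO_PATTERN = re.compile(
    r"Instruction in English:\s*(?P<instruction_en>.*?)\s*,?\s*"
    r"Response in English:\s*(?P<response_en>.*?)\s*,?\s*"
    r"Response in (?P<lang>(?!English)\w+):\s*(?P<response_xx>.*)",
    re.DOTALL,
)


def split_taco_output(output):
    """Split a TaCo-style output string into its labeled parts.

    Returns a dict with keys instruction_en, response_en, lang and
    response_xx, or None when the markers are not found.
    Note: a comma directly before a marker is treated as a separator,
    so a part that genuinely ends with a comma loses that comma.
    """
    match = _TACO_PATTERN.search(output)
    if match is None:
        return None
    parts = match.groupdict()
    parts["response_xx"] = parts["response_xx"].strip()
    return parts


if __name__ == "__main__":
    # The template string from the card, used here as placeholder input.
    template = (
        "Instruction in English: instruction in en , "
        "Response in English: response in en , "
        "Response in xx: response in xx "
    )
    print(split_taco_output(template))
```

Returning `None` rather than raising keeps the helper usable in a filter over the whole dataset, where some records may not follow the template.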
Provider:
saillab
Summary of original information

Dataset overview

Dataset features

  • instruction: string
  • input: string
  • output: string
  • id: string
  • text: string

Dataset splits

  • Training set (train):
    • Number of examples: 49601
    • Data size: 187094534.40529338 bytes
  • Test set (test):
    • Number of examples: 12401
    • Data size: 46776462.594706625 bytes

Dataset size

  • Download size: 114482103 bytes
  • Total dataset size: 233870997.0 bytes
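The fractional byte counts listed for the splits look like proportional estimates rather than measured file sizes. The sketch below checks that reading using only the numbers listed on this page: it assumes each split's size is the total dataset size multiplied by that split's share of the examples. This is an inference from the figures, not something the listing states.

```python
import math

# Figures copied from the listing above.
train_examples = 49601
test_examples = 12401
total_bytes = 233870997.0
listed_train_bytes = 187094534.40529338
listed_test_bytes = 46776462.594706625

total_examples = train_examples + test_examples
train_share = train_examples / total_examples  # roughly an 80/20 split

# Assumption: per-split size = total size * share of examples.
train_bytes_est = total_bytes * train_examples / total_examples
test_bytes_est = total_bytes * test_examples / total_examples

print("total examples:", total_examples)
print("train share:", round(train_share, 4))
print("train estimate matches listing:",
      math.isclose(train_bytes_est, listed_train_bytes, abs_tol=1e-3))
print("test estimate matches listing:",
      math.isclose(test_bytes_est, listed_test_bytes, abs_tol=1e-3))
```

If both checks hold, the per-split sizes are best treated as approximations when budgeting disk space or memory.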

Data file configuration

  • Default configuration (default):
    • Training set path: data/train-*
    • Test set path: data/test-*
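The configuration above maps each split to a glob pattern over the repository's data files. As a rough sketch, the snippet below shows how those patterns assign a file to a split, and how the dataset could be loaded with the Hugging Face `datasets` library. The helper names and the shard filename in the usage example are hypothetical; `load_splits` needs the `datasets` package and network access, so it is defined but not called here.

```python
from fnmatch import fnmatch

REPO_ID = "saillab/alpaca_ewe_taco"

# Glob patterns copied from the default configuration above.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for_path(path):
    """Return the split whose pattern matches the path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_splits():
    """Load all splits of the default configuration from the Hub."""
    # Imported lazily so split_for_path works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


if __name__ == "__main__":
    # Hypothetical shard name, used only to illustrate the pattern match.
    print(split_for_path("data/train-00000-of-00001.parquet"))
```

Calling `load_splits()` should return a mapping with `train` and `test` entries, each exposing the five string columns listed under the dataset features.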