Gustrd/dolly-15k-hippo-translated-pt-12k
Source: Hugging Face. Updated 2023-08-10. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Gustrd/dolly-15k-hippo-translated-pt-12k
Resource description:
---
license: cc-by-sa-3.0
language:
- pt
size_categories:
- 10K<n<100K
---
*Summary*
databricks-dolly-15k ( https://huggingface.co/datasets/databricks/databricks-dolly-15k/ ) is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
This Portuguese translation was produced with a technique from the HIPPO benchmark that combines LibreTranslate and MarianMT, yielding a medium-quality result. Further details and the underlying methodology can be found at the HIPPO GitHub repository ( https://github.com/gustrd/hippo ). It is an advanced version of Gustrd/dolly-15k-libretranslate-pt ( https://huggingface.co/datasets/Gustrd/dolly-15k-libretranslate-pt ).
This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.
Supported Tasks:
Training LLMs
Synthetic Data Generation
Data Augmentation
Languages: Portuguese
Version: 1.0
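For the "Training LLMs" task listed above, a minimal sketch of loading the data and turning one record into a training string might look like the following. It assumes the translated dataset keeps the column names of the original databricks-dolly-15k (`instruction`, `context`, `response`), which this card does not confirm. The prompt template and the helper names `load_records` and `format_prompt` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional

# Repository id taken from the title of this page.
DATASET_ID = "Gustrd/dolly-15k-hippo-translated-pt-12k"


def load_records():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so format_prompt works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


def format_prompt(record: Dict[str, Optional[str]]) -> str:
    """Turn one record into a single instruction-tuning string.

    Assumption: the columns are named as in the original
    databricks-dolly-15k (instruction, context, response).
    The section headers below are a hypothetical template.
    """
    instruction = (record.get("instruction") or "").strip()
    context = (record.get("context") or "").strip()
    response = (record.get("response") or "").strip()

    parts = [f"### Instrução:\n{instruction}"]
    if context:  # the context field may be empty for some records
        parts.append(f"### Contexto:\n{context}")
    parts.append(f"### Resposta:\n{response}")
    return "\n\n".join(parts)


# Made-up record for illustration; it is not taken from the dataset.
example = {
    "instruction": "Qual é a capital do Brasil?",
    "context": "",
    "response": "Brasília.",
}
print(format_prompt(example))
```

Calling `load_records()` downloads the data and returns whatever splits the repository defines. Inspect the column names of the result before relying on `format_prompt`.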
Provider:
Gustrd
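Original information summary
Dataset overview
Basic information
- Name: databricks-dolly-15k
- Language: Portuguese (pt)
- Size: 10K<n<100K
- License: Creative Commons Attribution-ShareAlike 3.0 Unported License (cc-by-sa-3.0)
Description
- Content: Instruction-following records generated by thousands of Databricks employees, covering behavioral categories such as brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
- Translation method: A technique from the HIPPO benchmark that combines LibreTranslate and MarianMT, yielding a medium-quality Portuguese translation.
Usage
- Supported tasks:
  - Training large language models (LLMs)
  - Synthetic data generation
  - Data augmentation
Version
- Version: 1.0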



