Gustrd/dolly-15k-hippo-translated-pt-12k

Hugging Face | Updated 2023-08-10 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Gustrd/dolly-15k-hippo-translated-pt-12k
Resource summary:

---
license: cc-by-sa-3.0
language:
- pt
size_categories:
- 10K<n<100K
---

*Summary*

databricks-dolly-15k ( https://huggingface.co/datasets/databricks/databricks-dolly-15k/ ) is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

This Portuguese translation was produced with a technique from the HIPPO benchmark. Combining LibreTranslate and MarianMT gave a medium-quality result. Further details and the underlying methodology can be found at the HIPPO GitHub repository ( https://github.com/gustrd/hippo ). It is an advanced version of Gustrd/dolly-15k-libretranslate-pt ( https://huggingface.co/datasets/Gustrd/dolly-15k-libretranslate-pt ).

This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.

Supported Tasks:
  • Training LLMs
  • Synthetic Data Generation
  • Data Augmentation

Languages: Portuguese

Version: 1.0
Provider:
Gustrd
Summary of original information

Dataset overview

Basic information

  • Name: databricks-dolly-15k (Portuguese translation)
  • Language: Portuguese (pt)
  • Size: 10K<n<100K
  • License: Creative Commons Attribution-ShareAlike 3.0 Unported License (cc-by-sa-3.0)

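Description

  • Content: instruction-following records generated by thousands of Databricks employees, covering behavioral categories such as brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
  • Translation method: a technique from the HIPPO benchmark that combines LibreTranslate and MarianMT to produce a medium-quality Portuguese translation.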

Uses

  • Supported tasks:
    • Training large language models (LLMs)
    • Synthetic data generation
    • Data augmentation
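
For the LLM-training task listed above, each record usually has to be flattened into one training string. The following is a minimal sketch of that step, not the dataset author's method. It assumes the translated dataset keeps the column names of the original databricks-dolly-15k (`instruction`, `context`, `response`, `category`). The helper `format_record`, the Portuguese section headers, and the sample record are hypothetical and used only for illustration.

```python
from typing import Mapping


def format_record(record: Mapping[str, str]) -> str:
    """Flatten one instruction-following record into a single prompt string.

    Assumes the original databricks-dolly-15k column names; the section
    headers are an arbitrary choice, not part of the dataset.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    response = record["response"].strip()

    parts = [f"### Instrução:\n{instruction}"]
    # Many records have no context; skip the section when it is empty.
    if context:
        parts.append(f"### Contexto:\n{context}")
    parts.append(f"### Resposta:\n{response}")
    return "\n\n".join(parts)


# Hypothetical record, written by hand for this example (not taken from the dataset).
example = {
    "instruction": "Qual é a capital do Brasil?",
    "context": "",
    "response": "A capital do Brasil é Brasília.",
    "category": "open_qa",
}

print(format_record(example))
```

If the column names do match, records obtained with the Hugging Face `datasets` library (for example via `load_dataset("Gustrd/dolly-15k-hippo-translated-pt-12k")`) could be passed to this helper one at a time. It is worth checking the actual columns first, since the translation pipeline may have renamed them.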

Version

  • Version: 1.0