nlpai-lab/openassistant-guanaco-ko
收藏Hugging Face2023-06-01 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/nlpai-lab/openassistant-guanaco-ko
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
- summarization
language:
- ko
size_categories:
- 1K<n<10K
---
### Dataset Summary
Korean translation of Guanaco via the DeepL API
Note: There are cases where multilingual data has been converted to monolingual data during batch translation to Korean using the API.
Below is Guanaco's README.
----
This dataset is a subset of the Open Assistant dataset, which you can find here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main
This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 9,846 samples.
This dataset was used to train Guanaco with QLoRA.
For further information, please see the original dataset.
License: Apache 2.0
提供机构:
nlpai-lab
原始信息汇总
数据集概述
基本信息
- 许可证:Apache 2.0
- 任务类别:
- 文本生成
- 问答
- 摘要
- 语言:韩语(ko)
- 数据集大小:1K<n<10K
数据集描述
- 数据来源:该数据集是Open Assistant数据集的一个子集,原始数据集链接为:OpenAssistant/oasst1
- 数据内容:仅包含对话树中评分最高的路径,总计9,846个样本。
- 数据处理:通过DeepL API将多语言数据批量翻译为韩语,存在部分数据从多语言转换为单语言的情况。
- 用途:用于训练Guanaco模型,采用QLoRA方法。



