kuotient/Verified-Camel-KO
Hugging Face | Updated 2023-11-21 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/kuotient/Verified-Camel-KO
Resource overview:
---
license: apache-2.0
task_categories:
- conversational
- question-answering
- text-generation
language:
- ko
tags:
- Physics
- Biology
- Math
- Chemistry
- Culture
- Logic
pretty_name: Verified-Camel-KO
size_categories:
- n<1K
---
## Verified-Camel-KO
This dataset is a Korean translation of https://huggingface.co/datasets/LDJnr/Verified-Camel.
It was translated with GPT4 Turbo and then lightly edited.
All policies governing this data follow those of the original author.
## This is the official Verified Camel dataset: just over 100 verified examples, with many more coming soon!
- Comprises over 100 highly filtered and curated examples from specific portions of the CamelAI STEM datasets.
- These examples have been verified as correct by experts in the relevant field, each holding at least a bachelor's degree in the subject.
- Roughly 30-40% of the data originally curated from CamelAI was found to contain at least minor errors and/or incoherent questions (as determined by experts in the relevant field).
## Purpose?
- This dataset is not intended to be trained on by itself (except perhaps for research purposes). However, its size and quality make it an excellent supplement to virtually any multi-turn-compatible dataset. I encourage this use; all I ask is that proper credit be given!
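As a rough illustration of the supplementary use described above, the sketch below appends a small curated set to a larger multi-turn dataset, optionally repeating it, and shuffles the result reproducibly. It assumes both datasets are already loaded as Python lists of dicts; the function name `mix_datasets`, the `repeats` upsampling knob, and the `source` field used to keep credit attached to each borrowed example are hypothetical choices, not part of this dataset or any library.

```python
import random
from typing import Any, Dict, List

Example = Dict[str, Any]


def mix_datasets(
    base: List[Example],
    supplement: List[Example],
    source_name: str,
    repeats: int = 1,
    seed: int = 0,
) -> List[Example]:
    """Hypothetical helper: mix a small supplementary set into a larger dataset.

    Each supplementary example is copied `repeats` times and tagged with a
    hypothetical "source" field so credit travels with the data. The combined
    list is shuffled with a fixed seed so the mix is reproducible.
    """
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    # Build new dicts so the caller's supplementary examples are not mutated.
    tagged = [{**ex, "source": source_name} for _ in range(repeats) for ex in supplement]
    mixed = list(base) + tagged
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    base = [{"id": f"base-{i}"} for i in range(8)]
    supplement = [{"id": "camel-1"}, {"id": "camel-2"}]
    mixed = mix_datasets(base, supplement, "kuotient/Verified-Camel-KO", repeats=2)
    print(len(mixed))  # 12
```

Whether to upsample at all is a judgment call; the card does not recommend a particular mixing ratio.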
## Quality filtering and cleaning
- Extensive cleaning was done to ensure there are no instances of overt AI moralizing or related behaviour, such as "As an AI language model" or "September 2021".
- This was necessary for the initial curation because the responses were originally generated by GPT-4.
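The phrase-level cleaning described above can be sketched as a simple substring filter. This is a minimal sketch under stated assumptions: each example is taken to be a dict with a hypothetical `conversation` list of `{"input", "output"}` turns, and the card does not publish its full phrase list or say whether flagged examples were dropped or edited. The two phrases below are only the ones the card names, and dropping the whole example is an assumed policy.

```python
from typing import Any, Dict, List

# Only the two phrases named in the dataset card; the real list is not published.
BANNED_PHRASES = ("As an AI language model", "September 2021")


def is_clean(text: str) -> bool:
    """Return True if `text` contains none of the banned phrases (case-insensitive)."""
    lowered = text.casefold()
    return not any(phrase.casefold() in lowered for phrase in BANNED_PHRASES)


def filter_examples(examples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop any example whose responses contain a banned phrase.

    Assumes a hypothetical schema: example["conversation"] is a list of
    {"input": ..., "output": ...} turns. Only the "output" (model response)
    side is checked, since the card attributes the problem to GPT-4 responses.
    """
    kept = []
    for example in examples:
        turns = example.get("conversation", [])
        if all(is_clean(turn.get("output", "")) for turn in turns):
            kept.append(example)
    return kept


if __name__ == "__main__":
    sample = [
        {"conversation": [{"input": "What is 2 + 2?", "output": "2 + 2 = 4."}]},
        {"conversation": [{"input": "Latest news?", "output": "As an AI language model, I cannot browse."}]},
    ]
    print(len(filter_examples(sample)))  # 1
```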
## Future Plans & How you can help!
This is a relatively early build within the larger plans I have for future work!
In the near future, we plan to leverage even more domain-specific expert volunteers to eliminate mathematically or verifiably incorrect answers from training curations of various types of datasets.
If you have at least a bachelor's degree in mathematics, physics, biology or chemistry and would like to volunteer even just 30 minutes of your expertise, please contact LDJ on Discord!
Provided by:
kuotient
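Summary of original information
Dataset overview
Basic information
- License: Apache-2.0
- Task categories:
  - Conversational
  - Question answering
  - Text generation
- Language: Korean (ko)
- Tags:
  - Physics
  - Biology
  - Mathematics
  - Chemistry
  - Culture
  - Logic
- Dataset name: Verified-Camel-KO
- Dataset size: fewer than 1,000 records (n<1K)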
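Dataset description
- Source: this dataset is a Korean translation of Verified-Camel.
- Translation and correction: translated with GPT4 Turbo, then lightly edited.
- Data accuracy: contains over 100 examples verified by experts who hold at least a bachelor's degree in the relevant field.
- Data quality: roughly 30-40% of the data originally curated from CamelAI was found to contain at least minor errors or incoherent questions.
- Purpose: intended mainly as a supplementary dataset to improve the quality of other multi-turn-compatible datasets.
- Data cleaning: extensive cleaning was done to ensure there are no instances of AI moralizing or related behavior.
Future plans
- Enlist more domain-expert volunteers to eliminate mathematically or verifiably incorrect answers from training curations of different types of datasets.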



