
kuotient/Verified-Camel-KO

Hugging Face | Updated 2023-11-21 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/kuotient/Verified-Camel-KO
Resource description:
---
license: apache-2.0
task_categories:
- conversational
- question-answering
- text-generation
language:
- ko
tags:
- Physics
- Biology
- Math
- Chemistry
- Culture
- Logic
pretty_name: Verified-Camel-KO
size_categories:
- n<1K
---

## Verified-Camel-KO

This dataset is a Korean translation of https://huggingface.co/datasets/LDJnr/Verified-Camel. It was translated with GPT4 Turbo and then lightly edited. All policies for this data follow those of the original author.

## This is the Official Verified Camel dataset. Just over 100 verified examples, and many more coming soon!

- Comprised of over 100 highly filtered and curated examples from specific portions of CamelAI STEM datasets.
- These examples are verified to be true by experts in the specific related field, with at least a bachelor's degree in the subject.
- Roughly 30-40% of the originally curated data from CamelAI was found to have at least minor errors and/or incoherent questions (as determined by experts in said field).

## Purpose?

- This dataset is not intended to be trained on by itself (besides perhaps interesting research purposes). However, the size and quality of this dataset can work wonderfully as a supplementary addition to virtually any multi-turn compatible dataset. I encourage this use; all I ask is that proper credit be given!

## Quality filtering and cleaning.

- Extensive cleaning was done to make sure there are no instances of overt AI moralizing or related behaviour, such as "As an AI language model" and "September 2021".
- This was done for the initial curation due to the responses being originally created by GPT-4.

## Future Plans & How you can help!

This is a relatively early build amongst the grand plans for the future of what I plan to work on!

In the near future we plan on leveraging the help of even more domain specific expert volunteers to eliminate any mathematically/verifiably incorrect answers from training curations of different types of datasets.

If you have at least a bachelor's in mathematics, physics, biology or chemistry and would like to volunteer even just 30 minutes of your expertise time, please contact LDJ on discord!
Provided by:
kuotient
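Summary of original information

Dataset overview

Basic information

  • License: Apache-2.0
  • Task categories:
    • Conversational
    • Question answering
    • Text generation
  • Language: Korean (ko)
  • Tags:
    • Physics
    • Biology
    • Math
    • Chemistry
    • Culture
    • Logic
  • Dataset name: Verified-Camel-KO
  • Dataset size: fewer than 1,000 records (n<1K)

Dataset description

  • Source: This dataset is the Korean translation of Verified-Camel.
  • Translation and correction: Translated with GPT4 Turbo, then lightly edited.
  • Data accuracy: Contains over 100 examples verified by experts who hold at least a bachelor's degree in the relevant field.
  • Data quality: Roughly 30-40% of the originally curated CamelAI data was found to have at least minor errors or incoherent questions.
  • Intended use: Mainly as a supplementary dataset to improve the quality of other multi-turn compatible datasets.
  • Data cleaning: Extensively cleaned so that no instances of AI moralizing or related behaviour remain.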
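
The cleaning step described in the dataset card (removing responses that contain phrases such as "As an AI language model" or "September 2021") can be approximated with a simple substring filter. The sketch below is only an illustration: the authors' actual cleaning procedure and full phrase list are not published here, and the `response` field name and the helper names `has_ai_boilerplate` and `filter_examples` are hypothetical.

```python
from typing import Dict, Iterable, List

# The two marker phrases quoted in the dataset card. The complete list used
# by the original authors is unknown, so this tuple is an assumption.
MARKERS = ("as an ai language model", "september 2021")


def has_ai_boilerplate(text: str, markers: Iterable[str] = MARKERS) -> bool:
    """Return True if the text contains any marker phrase (case-insensitive)."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


def filter_examples(examples: List[Dict[str, str]],
                    text_key: str = "response") -> List[Dict[str, str]]:
    """Keep only examples whose text field has no marker phrase.

    `text_key` is a hypothetical field name; adjust it to the real schema.
    """
    return [ex for ex in examples if not has_ai_boilerplate(ex[text_key])]


# Toy records, invented for illustration only (not taken from the dataset).
samples = [
    {"response": "The derivative of x^2 is 2x."},
    {"response": "As an AI language model, I cannot answer that."},
    {"response": "My knowledge only extends to September 2021."},
]
kept = filter_examples(samples)
```

A substring match like this is deliberately blunt: it can also drop legitimate text that happens to mention "September 2021", so a manual review pass over the removed records is a reasonable follow-up.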

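Future plans

  • Plans to enlist more domain-expert volunteers to eliminate mathematically or verifiably incorrect answers from training curations of different types of datasets.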