camel-ai/biology

Name: camel-ai/biology
Creator: camel-ai
Published: 2023-05-23 21:11:56
License: CC-BY-NC-4.0

Source: Hugging Face. Updated: 2023-05-23. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/camel-ai/biology


Resource description:

---
license: cc-by-nc-4.0
language:
- en
tags:
- instruction-finetuning
pretty_name: CAMEL Biology
task_categories:
- text-generation
arxiv: 2303.17760
extra_gated_prompt: "By using this data, you acknowledge and agree to utilize it solely for research purposes, recognizing that the dataset may contain inaccuracies due to its artificial generation through ChatGPT."
extra_gated_fields:
  Name: text
  Email: text
  I will adhere to the terms and conditions of this dataset: checkbox
---

# **CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society**

- **Github:** https://github.com/lightaime/camel
- **Website:** https://www.camel-ai.org/
- **Arxiv Paper:** https://arxiv.org/abs/2303.17760

## Dataset Summary

The Biology dataset is composed of 20K problem-solution pairs obtained using GPT-4. The pairs were generated from 25 biology topics, with 25 subtopics for each topic and 32 problems for each (topic, subtopic) pair. We provide the data in `biology.zip`.

## Data Fields

**The data fields for files in `biology.zip` are as follows:**

* `role_1`: assistant role
* `topic`: biology topic
* `sub_topic`: biology subtopic belonging to topic
* `message_1`: refers to the problem the assistant is asked to solve.
* `message_2`: refers to the solution provided by the assistant.

**Download in python**

```
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="camel-ai/biology", repo_type="dataset", filename="biology.zip",
                local_dir="datasets/", local_dir_use_symlinks=False)
```

### Citation

```
@misc{li2023camel,
      title={CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society},
      author={Guohao Li and Hasan Abed Al Kader Hammoud and Hani Itani and Dmitrii Khizbullin and Bernard Ghanem},
      year={2023},
      eprint={2303.17760},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## Disclaimer:

This data was synthetically generated by GPT-4 and might contain incorrect information. The dataset is provided for research purposes only.

Provider:

camel-ai

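Summary of Original Information

Dataset Overview

Dataset Name

Name: CAMEL Biology
Task category: text generation

Dataset Contents

Composition: 20,000 problem-solution pairs
Source: generated with GPT-4
Topics: 25 biology topics, each with 25 subtopics; each (topic, subtopic) pair has 32 problems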
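
The stated composition is internally consistent: 25 topics times 25 subtopics times 32 problems gives exactly 20,000 pairs. A minimal Python check of that arithmetic follows; the constant and function names are chosen here for illustration and do not come from the dataset.

```python
# Counts as stated in the dataset summary.
TOPICS = 25
SUBTOPICS_PER_TOPIC = 25
PROBLEMS_PER_PAIR = 32


def expected_pair_count(topics: int = TOPICS,
                        subtopics: int = SUBTOPICS_PER_TOPIC,
                        problems: int = PROBLEMS_PER_PAIR) -> int:
    """Return the number of problem-solution pairs implied by the topic grid."""
    return topics * subtopics * problems


if __name__ == "__main__":
    print(expected_pair_count())
```

Comparing this figure against the number of records actually found in `biology.zip` is a cheap sanity check after download.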

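Data Files

File: biology.zip
Data fields:
- role_1: assistant role
- topic: biology topic
- sub_topic: biology subtopic belonging to the topic
- message_1: the problem the assistant is asked to solve
- message_2: the solution provided by the assistant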
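
The field list above can be turned into a small reader. The sketch below assumes that `biology.zip` stores one JSON object per archive member, each carrying the five listed fields; the card does not spell out the archive layout, so treat that as an assumption to verify against the real file. The helper names `iter_records` and `missing_fields` are hypothetical.

```python
import json
import zipfile
from typing import Iterator, List

# Field names as listed in the dataset card.
EXPECTED_FIELDS = ("role_1", "topic", "sub_topic", "message_1", "message_2")


def iter_records(zip_path) -> Iterator[dict]:
    """Yield one dict per .json member of the archive.

    Assumption: each .json member holds a single JSON object.
    `zip_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".json"):
                continue  # skip directories and non-JSON members
            with archive.open(name) as handle:
                yield json.load(handle)


def missing_fields(record: dict) -> List[str]:
    """Return the expected field names that are absent from a record."""
    return [field for field in EXPECTED_FIELDS if field not in record]
```

Running `missing_fields` over the first few records is a quick way to confirm whether the real key names match the card exactly.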

License

License: CC-BY-NC-4.0

Download Method

Python:
    from huggingface_hub import hf_hub_download
    hf_hub_download(repo_id="camel-ai/biology", repo_type="dataset", filename="biology.zip", local_dir="datasets/", local_dir_use_symlinks=False)
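
The download call returns the local path of `biology.zip`, which still has to be unpacked. The following is a minimal sketch of both steps, not the maintainers' own workflow: the function names `download_biology_zip` and `extract_archive` are hypothetical, the target directory is arbitrary, and `local_dir_use_symlinks` is left out because recent `huggingface_hub` releases treat it as deprecated.

```python
import zipfile
from pathlib import Path
from typing import List


def download_biology_zip(local_dir: str = "datasets/") -> str:
    """Download biology.zip from the Hub and return its local path."""
    # Imported lazily so extract_archive works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="camel-ai/biology",
        repo_type="dataset",
        filename="biology.zip",
        local_dir=local_dir,
    )


def extract_archive(zip_path, target_dir) -> List[str]:
    """Unpack a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

A typical call would be `extract_archive(download_biology_zip(), "datasets/biology")`. Because the dataset card declares a gated prompt, the download may require accepting the terms on the Hub and being logged in first.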

Citation

@misc{li2023camel,
      title={CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society},
      author={Guohao Li and Hasan Abed Al Kader Hammoud and Hani Itani and Dmitrii Khizbullin and Bernard Ghanem},
      year={2023},
      eprint={2303.17760},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Disclaimer

The dataset was synthetically generated by GPT-4 and may contain incorrect information; it is intended for research use only.
