camel-ai/biology
Source: Hugging Face | Updated: 2023-05-23 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/camel-ai/biology
Resource description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- instruction-finetuning
pretty_name: CAMEL Biology
task_categories:
- text-generation
arxiv: 2303.17760
extra_gated_prompt: "By using this data, you acknowledge and agree to utilize it solely for research purposes, recognizing that the dataset may contain inaccuracies due to its artificial generation through ChatGPT."
extra_gated_fields:
  Name: text
  Email: text
  I will adhere to the terms and conditions of this dataset: checkbox
---
# **CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society**
- **Github:** https://github.com/lightaime/camel
- **Website:** https://www.camel-ai.org/
- **Arxiv Paper:** https://arxiv.org/abs/2303.17760
## Dataset Summary
The Biology dataset consists of 20K problem-solution pairs generated with GPT-4. The pairs span 25 biology topics, with 25 subtopics per topic and 32 problems per (topic, subtopic) pair, which gives 25 × 25 × 32 = 20,000 pairs in total.
We provide the data in `biology.zip`.
## Data Fields
**The data fields for files in `biology.zip` are as follows:**
* `role_1`: assistant role
* `topic`: biology topic
* `sub_topic`: biology subtopic belonging to topic
* `message_1`: the problem the assistant is asked to solve
* `message_2`: the solution provided by the assistant
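
The field list above can be mirrored in a small typed record, which also makes it easy to check the "32 problems per (topic, subtopic) pair" layout once the data is loaded. This is a minimal sketch, not part of the CAMEL tooling: the class name `ProblemSolution` and the helper `problems_per_pair` are hypothetical, the sample values are made up, and it assumes the keys in the files match the field names listed above exactly.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProblemSolution:
    """One record with the five documented fields (hypothetical class name)."""
    role_1: str
    topic: str
    sub_topic: str
    message_1: str  # the problem
    message_2: str  # the solution

    @classmethod
    def from_dict(cls, raw: dict) -> "ProblemSolution":
        # Raises KeyError if a documented field is missing; extra keys are ignored.
        return cls(**{f.name: raw[f.name] for f in fields(cls)})


def problems_per_pair(records):
    """Count how many records fall under each (topic, sub_topic) pair."""
    return Counter((r.topic, r.sub_topic) for r in records)


# Made-up sample records for illustration only; not taken from the dataset.
sample = [
    ProblemSolution.from_dict({
        "role_1": "assistant",
        "topic": "Genetics",
        "sub_topic": "Inheritance patterns",
        "message_1": f"Example problem {i}",
        "message_2": f"Example solution {i}",
    })
    for i in range(2)
]

counts = problems_per_pair(sample)
print(counts)
```

On the full dataset, every count returned by `problems_per_pair` would be expected to equal 32 if the files follow the layout described in the summary.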
**Download in Python**
```python
from huggingface_hub import hf_hub_download

# Gated dataset: accept the terms on the Hub and log in before downloading.
hf_hub_download(repo_id="camel-ai/biology", repo_type="dataset", filename="biology.zip",
                local_dir="datasets/", local_dir_use_symlinks=False)
```
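
Once `biology.zip` is in `datasets/`, the records can be read without unpacking the archive. The card does not describe the archive's internal layout, so the sketch below rests on an assumption: that the archive contains `.json` members, each holding a single JSON object with the fields listed under Data Fields. The helper name `iter_records` is hypothetical.

```python
import json
import zipfile
from pathlib import Path


def iter_records(zip_path):
    """Yield one dict per .json member of the archive (assumed layout)."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".json"):
                continue  # skip directories and any non-JSON members
            with archive.open(name) as handle:
                yield json.load(handle)


if __name__ == "__main__":
    path = Path("datasets/biology.zip")  # location used by the download call above
    if path.exists():
        first = next(iter_records(path), None)
        if first is not None:
            print(sorted(first.keys()))
```

If the archive turns out to use a different layout (for example a single JSON Lines file), only the body of `iter_records` would need to change.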
### Citation
```
@misc{li2023camel,
title={CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society},
author={Guohao Li and Hasan Abed Al Kader Hammoud and Hani Itani and Dmitrii Khizbullin and Bernard Ghanem},
year={2023},
eprint={2303.17760},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
## Disclaimer:
This data was synthetically generated by GPT-4 and might contain incorrect information. The dataset is intended for research purposes only.
Provider: camel-ai
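Summary of original information
Dataset overview
Dataset name
- Name: CAMEL Biology
- Task category: text generation
Dataset contents
- Composition: 20,000 problem-solution pairs
- Source: generated with GPT-4
- Topics: 25 biology topics, each with 25 subtopics; each (topic, subtopic) pair has 32 problems
Data files
- File: biology.zip
- Data fields:
  - role_1: assistant role
  - topic: biology topic
  - sub_topic: biology subtopic belonging to the topic
  - message_1: the problem the assistant is asked to solve
  - message_2: the solution provided by the assistant
License
- License: CC-BY-NC-4.0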
Disclaimer
- The dataset was synthetically generated by GPT-4 and may contain incorrect information; it is for research use only.



