jablonkagroup/chempile-instruction
收藏Hugging Face2025-07-30 更新2025-08-09 收录
下载链接:
https://hf-mirror.com/datasets/jablonkagroup/chempile-instruction
下载链接
链接失效反馈官方服务:
资源简介:
ChemPile-Instruction是一个专为化学领域的大型语言模型(LLM)指令微调设计的文本数据集。它包含高质量的对话和多样的推理任务,分为三个子集:chempile-education、chempile-paper-100m和chempile-reasoning,分别针对不同水平和需求的用户。数据集在CC BY 4.0许可证下发布,是ChemPile项目的一部分。数据集的特征包括对话消息、所需技能、化学子领域和生成元数据。README还介绍了数据集的生成策略、质量控制措施和潜在用途。最后,它列出了数据集的局限性、数据处理流程以及如何在研究中引用数据集。
ChemPile-Instruction is a text-only dataset designed for instruction tuning of Large Language Models (LLMs) in the field of chemistry. It contains high-quality multi-turn conversations, each rephrased from different educational, scientific, and reasoning sources using diverse prompting strategies. The dataset is structured into three subsets: chempile-education, chempile-paper-100m, and chempile-reasoning, each with its own focus and source material. The dataset is licensed under CC BY 4.0 and is part of the larger ChemPile collection. The dataset features are described, including conversation messages, required skills, chemistry subdomains, and generation metadata. The README also provides information on the datasets generation strategy, quality control measures, and potential use cases. Finally, it includes details on the datasets limitations, data processing pipeline, and how to cite the dataset in research.
提供机构:
jablonkagroup



