SciPhi/textbooks-are-all-you-need-lite
Source: Hugging Face. Updated 2023-09-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/SciPhi/textbooks-are-all-you-need-lite
Description:
With the advancement of large language models (LLMs), it is now feasible to build a fully open-source Library of Alexandria. As a first attempt, we generated 650,000 unique textbook samples covering curricula from kindergarten through graduate school. The samples are open source and likely fall under the Llama-2 license. They were generated with the SciPhi repository, and all samples were created using the TheBloke/Phind-CodeLlama-34B-v2-AWQ model. Finally, thanks to Runpod for providing the GPU time that made this possible.
Provider:
SciPhi
Summary of original metadata
Dataset overview
Dataset features
- formatted_prompt: string
- completion: string
- first_task: string
- second_task: string
- last_task: string
- notes: string
- title: string
- model: string
- temperature: floating point (float64)
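The feature list above can be turned into a small row check. This is a minimal sketch rather than part of the dataset's tooling: the names `EXPECTED_COLUMNS` and `validate_record` are hypothetical, and it assumes every value is present and non-null, which the listing does not state.

```python
from typing import Any

# Column names and Python types, taken from the feature list above.
# float64 is mapped to Python's built-in float.
EXPECTED_COLUMNS: dict[str, type] = {
    "formatted_prompt": str,
    "completion": str,
    "first_task": str,
    "second_task": str,
    "last_task": str,
    "notes": str,
    "title": str,
    "model": str,
    "temperature": float,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one row; an empty list means it matches."""
    problems = []
    for name, expected in EXPECTED_COLUMNS.items():
        if name not in record:
            problems.append(f"missing column: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Report columns that the listing does not mention (assumed unexpected).
    for name in sorted(record.keys() - EXPECTED_COLUMNS.keys()):
        problems.append(f"unexpected column: {name}")
    return problems
```

Calling `validate_record(row)` on a dictionary-shaped row returns a list of human-readable messages, so it can be used as a quick sanity check before further processing.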
Dataset splits
- train:
  - size: 3175095649 bytes
  - number of examples: 681845
Dataset size
- download size: 1280399468 bytes
- dataset size: 3175095649 bytes
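To put the byte counts above in more familiar units, the following sketch converts them to GiB and derives the average row size and the download-to-dataset ratio. The constants are copied from the listing; the function name `summarize_sizes` is hypothetical, and reading the ratio as a compression ratio is an assumption, since the listing does not say how the download is stored.

```python
# Figures copied from the listing above.
TRAIN_BYTES = 3_175_095_649      # size of the train split (and of the dataset)
DOWNLOAD_BYTES = 1_280_399_468   # size of the files that are downloaded
NUM_EXAMPLES = 681_845           # rows in the train split

GIB = 1024 ** 3  # bytes per gibibyte


def summarize_sizes() -> dict[str, float]:
    """Derive human-readable size figures from the raw byte counts."""
    return {
        "dataset_gib": TRAIN_BYTES / GIB,
        "download_gib": DOWNLOAD_BYTES / GIB,
        "avg_bytes_per_example": TRAIN_BYTES / NUM_EXAMPLES,
        "download_ratio": DOWNLOAD_BYTES / TRAIN_BYTES,
    }


if __name__ == "__main__":
    for key, value in summarize_sizes().items():
        print(f"{key}: {value:.3f}")
```

Under these figures the dataset is a little under 3 GiB once loaded, the download is roughly 40% of that, and an average example takes a few kilobytes.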
Configuration
- config_name: default
- data_files:
  - split: train
    path: data/train-*
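One way to read the configuration above is through the Hugging Face `datasets` library. This is a sketch under assumptions: the listing does not prescribe a loading method, the `datasets` package must be installed, and the Hugging Face Hub must be reachable. The helper names `stream_train_split` and `first_titles` are hypothetical.

```python
from itertools import islice

# Repository id, config name, and split name as given in the listing above.
DATASET_ID = "SciPhi/textbooks-are-all-you-need-lite"
CONFIG_NAME = "default"
SPLIT = "train"


def stream_train_split():
    """Open the train split lazily, without downloading every file first."""
    # Imported here so the module loads even where `datasets` is not installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(DATASET_ID, name=CONFIG_NAME, split=SPLIT, streaming=True)


def first_titles(n: int = 3) -> list[str]:
    """Return the `title` field of the first n rows (requires network access)."""
    return [row["title"] for row in islice(stream_train_split(), n)]
```

Calling `first_titles(3)` fetches only as much data as needed for three rows. Streaming is chosen here because the full download is over 1 GB; dropping `streaming=True` downloads and caches the whole split instead.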
License
- license type: llama2



