SciPhi/textbooks-are-all-you-need-lite

Name: SciPhi/textbooks-are-all-you-need-lite
Creator: SciPhi
Published: 2023-09-30 21:57:36
License: no description provided

Source: Hugging Face. Updated 2023-09-30. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/SciPhi/textbooks-are-all-you-need-lite

Description:

With the advancement of large language models (LLMs), it is now feasible to build a fully open-source Library of Alexandria. As a first attempt, we generated 650,000 unique textbook samples spanning curricula from kindergarten through graduate school. The samples are open source and likely fall under the Llama-2 license. They were generated with the SciPhi repository, and every sample was produced by the TheBloke/Phind-CodeLlama-34B-v2-AWQ model. Finally, thanks to Runpod for the GPU time that made this work possible.

Provider:

SciPhi

Original information summary

Dataset overview

Dataset features

formatted_prompt: string
completion: string
first_task: string
second_task: string
last_task: string
notes: string
title: string
model: string
temperature: floating point (float64)
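The feature list above can be mirrored as a small record type when rows are handled in code. The following is a minimal sketch, not part of the dataset tooling: the class name `TextbookSample`, the helper `validate_row`, and the sample values are all hypothetical, and the check assumes no field is null.

```python
from dataclasses import dataclass, fields


@dataclass
class TextbookSample:
    """One row of the dataset, mirroring the listed features (hypothetical type)."""
    formatted_prompt: str
    completion: str
    first_task: str
    second_task: str
    last_task: str
    notes: str
    title: str
    model: str
    temperature: float  # listed as float64


def validate_row(row: dict) -> TextbookSample:
    """Check that a raw dict carries every listed field with the listed type.

    Assumption: fields are never null; a None value raises TypeError here.
    """
    expected = {f.name: f.type for f in fields(TextbookSample)}
    missing = expected.keys() - row.keys()
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    for name, typ in expected.items():
        if not isinstance(row[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
    return TextbookSample(**{name: row[name] for name in expected})


# Illustrative row with made-up values, only to show the shape.
example_row = {
    "formatted_prompt": "Write a textbook chapter on fractions.",
    "completion": "Chapter 1: Fractions ...",
    "first_task": "Outline the chapter.",
    "second_task": "Write an introduction.",
    "last_task": "Summarize the chapter.",
    "notes": "",
    "title": "Elementary Mathematics: Fractions",
    "model": "TheBloke/Phind-CodeLlama-34B-v2-AWQ",
    "temperature": 0.7,
}
sample = validate_row(example_row)
```

Extra keys in a row are ignored by this sketch; tighten the check if the schema must match exactly.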

Dataset splits

train:
- Size: 3175095649 bytes
- Number of examples: 681845

Dataset size

Download size: 1280399468 bytes
Dataset size: 3175095649 bytes
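The raw byte counts are easier to plan around in larger units. As a rough sketch using only the figures listed above, the arithmetic below converts them to GiB and derives the average size per example and the download-to-dataset ratio. The variable and function names are illustrative. The results come to roughly 2.96 GiB for the dataset, 1.19 GiB for the download, about 4.7 kB per example, and a ratio near 0.40.

```python
# Figures copied from the listing above.
DATASET_BYTES = 3_175_095_649
DOWNLOAD_BYTES = 1_280_399_468
NUM_EXAMPLES = 681_845


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"dataset size:  {to_gib(DATASET_BYTES):.2f} GiB")
print(f"download size: {to_gib(DOWNLOAD_BYTES):.2f} GiB")
print(f"average bytes per example: {avg_bytes_per_example:.0f}")
print(f"download / dataset ratio:  {download_ratio:.2f}")
```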

Configuration

config_name: default
data_files:
- split: train
  path: data/train-*
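The configuration maps the train split to every file matching the glob `data/train-*`. The sketch below shows how that pattern selects files, and how the split might be streamed with the Hugging Face `datasets` library. It assumes that library is installed and that the Hub repository ID equals the dataset name above. The file names in the comments and the helper names are hypothetical.

```python
from fnmatch import fnmatch

REPO_ID = "SciPhi/textbooks-are-all-you-need-lite"
TRAIN_PATTERN = "data/train-*"  # from the configuration above


def belongs_to_train(path: str) -> bool:
    """Return True if a repository file path matches the train glob."""
    return fnmatch(path, TRAIN_PATTERN)


def stream_train():
    """Open the train split lazily (assumes `pip install datasets`).

    streaming=True avoids downloading the full archive up front.
    """
    from datasets import load_dataset  # third-party, imported on demand

    return load_dataset(REPO_ID, split="train", streaming=True)


# Hypothetical shard names, only to illustrate the glob.
print(belongs_to_train("data/train-00000-of-00007.parquet"))
print(belongs_to_train("data/test-00000-of-00001.parquet"))
```

Calling `next(iter(stream_train()))` should then yield one row as a dictionary keyed by the feature names, though that step needs network access.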

License

License type: llama2
