fzmnm/TinyStoriesAdv-zh
Hugging Face | Updated 2024-08-21 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/fzmnm/TinyStoriesAdv-zh
Overview:
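TinyStoriesAdv is a large language model training corpus of roughly 1B tokens at an elementary-school knowledge level. It is intended to improve a model's factual knowledge, metacognition, chain-of-thought reasoning, reading comprehension (RAG), and logical reasoning. The dataset is a collection of many datasets, with diverse and targeted sub-datasets produced through a novel prompt-based generation method. It can serve as a replacement for the TinyStories dataset, giving hobbyists and students interested in artificial intelligence an entry point for experiencing the magic of large models. The dataset covers everyday common sense for elementary school students, an elementary school encyclopedia, and the elementary school Chinese language curriculum, and it supports multiple interaction modes such as reading comprehension and question answering. With this dataset, a large language model at the 100M-parameter scale can handle basic elementary-school common-sense question answering.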
TinyStoriesAdv is a comprehensive large language model training corpus pitched at an elementary-school knowledge level. Inspired by papers such as TinyStories and Phi2, it contains a variety of sub-datasets generated with innovative prompting methods for diversity and targeted coverage. The dataset covers elementary school students' daily common sense, elementary encyclopedias, and elementary Chinese curriculum content, and supports interactive modalities such as reading comprehension and question answering. It is built from multiple sub-datasets, including encyclopedias generated by GPT4o, tinystories_adv, chinese_class, math, tinygames, quizs, and tinybooks, which aim to strengthen the model's factual knowledge, metacognition, chain of thought, reading comprehension (RAG), and logical reasoning.
Provided by:
fzmnm



