epfl-dlab/zip2zip-1B-no-split
Source: Hugging Face. Updated: 2025-03-10. Indexed: 2025-04-08.
Download link:
https://hf-mirror.com/datasets/epfl-dlab/zip2zip-1B-no-split
Description:
This dataset is a collection of sub-datasets, each targeting a different domain: chat, code, knowledge, math, and multilingual text. Each sub-dataset contains the text itself plus per-record metadata such as token count and data source. The data is provided as a training split and is intended for natural language processing tasks.
Provider:
epfl-dlab
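
Since the description mentions per-record token counts and data sources, a short sketch of how one might stream a sub-dataset and total the tokens per source may be useful. This is an illustration under assumptions, not the provider's documented usage: the subset name `"code"` and the field names `num_tokens` and `source` are hypothetical and should be checked against the actual dataset card. `tokens_per_source` and `stream_subset` are helper names introduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def tokens_per_source(
    records: Iterable[Mapping],
    token_field: str = "num_tokens",  # assumed field name, verify on the dataset card
    source_field: str = "source",     # assumed field name, verify on the dataset card
) -> Dict[str, int]:
    """Sum the token counts of the given records, grouped by data source."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record[source_field]] += record[token_field]
    return dict(totals)


def stream_subset(subset: str):
    """Stream the training split of one sub-dataset without downloading it fully.

    Requires the Hugging Face `datasets` package; the subset name passed in
    is an assumption and must match a configuration that actually exists.
    """
    from datasets import load_dataset  # imported lazily so the helper above has no dependency

    return load_dataset(
        "epfl-dlab/zip2zip-1B-no-split",
        subset,
        split="train",
        streaming=True,
    )
```

A possible use, again assuming those names exist, would be `tokens_per_source(itertools.islice(stream_subset("code"), 1000))` to inspect the first thousand records of one sub-dataset. Streaming is chosen here because the description suggests the dataset is large.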



