SkyPile-150B
始智AI · Updated 2023-12-07 · Indexed 2024-03-04
Download link:
https://wisemodel.cn/datasets/Skywork/SkyPile-150B
Description:
SkyPile-150B is a comprehensive, large-scale Chinese dataset designed for pre-training large language models. It is derived from a broad array of publicly accessible Chinese Internet web pages. The publicly accessible portion of the SkyPile-150B dataset comprises approximately 233 million unique web pages, totaling approximately 150 billion tokens and 620 gigabytes of plain text data.
Provider:
始智AI
Created:
2023-12-07



