emozilla/dolma-v1_7-30B
Source: Hugging Face. Updated: 2024-05-23. Indexed: 2024-07-22.
Download link:
https://hf-mirror.com/datasets/emozilla/dolma-v1_7-30B
Description:
This dataset is a 1% sample of Dolma v1.7, containing approximately 30 billion tokens, uploaded directly as a Hugging Face dataset. As a pure sample, it retains the ODC-BY license. It is intended for text generation tasks with English language models and is categorized under language modeling and large language models. The listed dataset size category is below 10 billion.
Provider:
emozilla



