Abigail45/Jian
Source: Hugging Face | Updated: 2025-12-18 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/Abigail45/Jian
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
pretty_name: Jian Dataset
size_categories:
- n>1T
language:
- en
- fr
- es
- it
- zh
- ar
- ru
- pt
tags:
- text
- code
- web-corpus
- multilingual
- large-scale
description: >-
  Jian is a trillion-token-scale multilingual corpus of web text and code
  designed for large language model pretraining.
---
Jian is a trillion-token-scale multilingual corpus of web text and code designed for large language model pretraining.
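
Because the card lists the corpus size as `n>1T`, downloading it in full is rarely practical. Below is a minimal sketch of how a few records could be inspected by streaming instead. It assumes the third-party `datasets` library is installed, that the repository is reachable, and that it exposes a split named `train`; the card confirms none of these, and the helper name `peek` is hypothetical.

```python
from itertools import islice

# Repository id as shown in the listing above.
REPO_ID = "Abigail45/Jian"


def peek(n=3, split="train"):
    """Return the first n records without downloading the whole corpus.

    Assumption: a split called "train" exists; the card does not say so.
    """
    # Imported lazily so the module loads even where `datasets` is absent.
    from datasets import load_dataset

    # streaming=True yields records on demand instead of fetching every file.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek(5)` would return up to five records as dictionaries. Their field names are not documented in the card, so they should be inspected before building a pipeline on top of them.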
Provider:
Abigail45



