Ba2han/pt-1501-tokenized-2
Source: Hugging Face | Updated: 2026-01-15 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/Ba2han/pt-1501-tokenized-2
Resource description:
---
tags:
- tokenized
- llama-3.2
---
# Tokenized Dataset: pt-1501-tokenized-2.2
**Base Tokenizer:** `unsloth/Llama-3.2-1B`
**Max Length:** `4000`
**Last Update:** 2026-01-15 17:45:17
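
A minimal sketch of how the rows might be inspected with the Hugging Face `datasets` library. The repository id is taken from the listing title; the split name `train` and the helper name `stream_rows` are assumptions, and the card does not document the column names, so the sketch returns raw rows without referencing any column.

```python
# Values stated on the card.
REPO_ID = "Ba2han/pt-1501-tokenized-2"
MAX_LENGTH = 4000


def stream_rows(repo_id=REPO_ID, split="train", limit=3):
    """Return the first `limit` rows without downloading the full dataset.

    Hypothetical helper: the split name "train" is an assumption, not
    something the card confirms.
    """
    # Imported lazily so that defining the helper does not require the
    # `datasets` package or network access.
    from datasets import load_dataset

    ds = load_dataset(repo_id, split=split, streaming=True)
    return list(ds.take(limit))
```

Calling `stream_rows()` needs network access and the `datasets` package installed. Printing the keys of the first returned row is a quick way to discover the actual column names.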
## Statistics
| Metric | Count |
| :--- | :--- |
| Total Input Rows | 3,414,026 |
| Deduplicated (Dropped) | 16,052 |
| Final Rows Kept | 3,188,409 |
| **Total Tokens** | **2,432,321,066** |
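
The row counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures from the table; the variable names are illustrative. Subtracting the deduplicated rows from the input rows does not give the final row count, and the card does not state which processing step accounts for the difference.

```python
# Figures copied from the Statistics table.
total_input_rows = 3_414_026
deduplicated_dropped = 16_052
final_rows_kept = 3_188_409
total_tokens = 2_432_321_066

# Rows that would remain if deduplication were the only filter.
rows_after_dedup = total_input_rows - deduplicated_dropped

# Rows missing from the final count that the table does not explain.
unexplained_rows = rows_after_dedup - final_rows_kept

# Mean sequence length of the rows that were kept.
avg_tokens_per_row = total_tokens / final_rows_kept

print(f"rows after dedup:    {rows_after_dedup:,}")
print(f"unexplained rows:    {unexplained_rows:,}")
print(f"avg tokens per row:  {avg_tokens_per_row:.1f}")
```

The average of roughly 763 tokens per kept row sits well below the stated maximum length of 4000.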
## Dataset Breakdown
| Dataset Source | Tokens Contributed |
| :--- | :--- |
| OpenDataArena/ODA-Mixture-100k | 27,505,162 |
| codelion/fineweb-edu-1B | 738,481,203 |
| sumukshashidhar-archive/Ultra-FineWeb-1B | 611,893,386 |
| Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b | 84,305,590 |
| Ba2han/Basic-Math_TR | 97,286,447 |
| Translations-Mix (1M) | 872,849,278 |
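
As a sanity check, the per-source token counts can be summed and compared with the reported total, and each source's share computed. This is a small sketch using only the numbers in the table above; the dictionary and variable names are illustrative.

```python
# Token counts copied from the Dataset Breakdown table.
tokens_by_source = {
    "OpenDataArena/ODA-Mixture-100k": 27_505_162,
    "codelion/fineweb-edu-1B": 738_481_203,
    "sumukshashidhar-archive/Ultra-FineWeb-1B": 611_893_386,
    "Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b": 84_305_590,
    "Ba2han/Basic-Math_TR": 97_286_447,
    "Translations-Mix (1M)": 872_849_278,
}
reported_total = 2_432_321_066  # "Total Tokens" from the Statistics table

breakdown_total = sum(tokens_by_source.values())
shares = {name: count / breakdown_total for name, count in tokens_by_source.items()}

# List sources from largest to smallest contribution.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<55} {share:6.1%}")

print("matches reported total:", breakdown_total == reported_total)
```

The breakdown sums exactly to the reported total, and Translations-Mix (1M) is the largest contributor at a little over a third of all tokens.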
Provider:
Ba2han



