Ba2han/pt-1501-tokenized-2.1
Source: Hugging Face. Updated 2026-01-15; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Ba2han/pt-1501-tokenized-2.1
Description:
---
tags:
- tokenized
- llama-3.2
---
# Tokenized Dataset: pt-1501-tokenized-2

**Base Tokenizer:** `unsloth/Llama-3.2-1B`

**Max Length:** `4000`

**Last Update:** 2026-01-15 12:13:30

## Statistics
| Metric | Count |
| :--- | :--- |
| Total Input Rows | 1,377,533 |
| Deduplicated (Dropped) | 124,266 |
| Final Rows Kept | 1,253,173 |
| **Total Tokens** | **475,705,430** |
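
The row counts above can be cross-checked with simple arithmetic. The sketch below only reuses the figures from the table; the variable names are illustrative and not part of the dataset. It shows that subtracting the deduplicated rows from the input rows leaves 94 more rows than the "Final Rows Kept" figure. The card does not say why those rows were dropped, so the sketch reports the gap without explaining it.

```python
# Figures copied from the Statistics table.
total_input_rows = 1_377_533
deduplicated_rows = 124_266
final_rows_kept = 1_253_173
total_tokens = 475_705_430

# Rows that would remain if deduplication were the only filter.
rows_after_dedup = total_input_rows - deduplicated_rows

# Rows the table does not account for (reason not stated in the card).
unexplained_rows = rows_after_dedup - final_rows_kept

# Share of input rows removed as duplicates, and mean sequence length.
dedup_rate = deduplicated_rows / total_input_rows
avg_tokens_per_row = total_tokens / final_rows_kept

print(f"rows after dedup:     {rows_after_dedup:,}")
print(f"unexplained rows:     {unexplained_rows:,}")
print(f"dedup rate:           {dedup_rate:.2%}")
print(f"avg tokens per row:   {avg_tokens_per_row:.1f}")
```

The mean comes out well below the stated maximum length of `4000`, which suggests most rows are much shorter than the cap, although the card gives no length distribution.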
## Dataset Breakdown
| Dataset Source | Tokens Contributed |
| :--- | :--- |
| formatted-libre-tr.parquet | 3,515,108 |
| merged_books3.parquet | 51,302,806 |
| chunked_books.parquet | 420,887,516 |
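
As a consistency check, the per-source token counts can be summed and compared with the reported total. This is a minimal sketch using only the numbers in the two tables; the dictionary and variable names are illustrative.

```python
# Token counts copied from the Dataset Breakdown table.
tokens_by_source = {
    "formatted-libre-tr.parquet": 3_515_108,
    "merged_books3.parquet": 51_302_806,
    "chunked_books.parquet": 420_887_516,
}

# Total reported in the Statistics table.
reported_total_tokens = 475_705_430

breakdown_sum = sum(tokens_by_source.values())
print(f"sum of breakdown: {breakdown_sum:,}")
print(f"matches reported total: {breakdown_sum == reported_total_tokens}")

# Fraction of all tokens contributed by each source file.
token_share = {
    name: count / breakdown_sum for name, count in tokens_by_source.items()
}
for name, share in token_share.items():
    print(f"{name:<30} {share:.2%}")
```

The breakdown sums exactly to the reported total, and `chunked_books.parquet` accounts for the large majority of the tokens.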
Provider:
Ba2han



