Lumia101/Raw-C4-ko-500MT
Source: Hugging Face. Updated 2026-04-12. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Lumia101/Raw-C4-ko-500MT
Resource description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- ko
size_categories:
- 100K<n<1M
---
# ⚠️ Warning ⚠️
The owner of this repository created this dataset to validate the capabilities of [Nari-C4-ko-500MT](https://huggingface.co/datasets/Lumia101/Nari-C4-ko-500MT). It is **not** recommended for LLM training.
# Lumia101/Raw-C4-ko-500MT
This dataset consists of 500M tokens extracted from [allenai/c4](https://huggingface.co/datasets/allenai/c4) without any additional filtering.
This dataset was created to measure the effect of the filtering applied to [Lumia101/Nari-C4-ko-500MT](https://huggingface.co/datasets/Lumia101/Nari-C4-ko-500MT). If you intend to train an LLM, it is **highly** recommended to use [Lumia101/Nari-C4-ko-500MT](https://huggingface.co/datasets/Lumia101/Nari-C4-ko-500MT) instead.
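As a rough illustration of what "extracted without any additional filtering" can mean, the sketch below takes records in order until a token budget is reached and passes each one through unchanged. This card does not state which tokenizer or sampling procedure was used, so the whitespace-based counter and the names `count_whitespace_tokens` and `take_until_token_budget` are hypothetical placeholders, not the author's method. The only structural assumption is that each record carries a `text` field, as C4 records do.

```python
from typing import Callable, Iterable, Iterator


def count_whitespace_tokens(text: str) -> int:
    """Placeholder token counter (assumption): counts whitespace-separated pieces.

    The actual tokenizer behind the 500M-token figure is not stated in the card.
    """
    return len(text.split())


def take_until_token_budget(
    records: Iterable[dict],
    budget: int,
    count_tokens: Callable[[str], int] = count_whitespace_tokens,
) -> Iterator[dict]:
    """Yield records unchanged, in order, until the running token total reaches the budget.

    No record is dropped or modified for quality reasons, which is the
    "no additional filtering" part. The last record yielded may push the
    total slightly past the budget.
    """
    total = 0
    for record in records:
        if total >= budget:
            break
        total += count_tokens(record["text"])
        yield record


# Tiny in-memory stand-in for a stream of C4-style records.
sample = [
    {"text": "첫 번째 문서 입니다"},
    {"text": "두 번째 문서"},
    {"text": "세 번째 문서 는 포함 되지 않습니다"},
]
selected = list(take_until_token_budget(sample, budget=6))
print(len(selected))
```

Swapping `count_tokens` for a real tokenizer's length function and raising `budget` to 500,000,000 would give the same shape of selection over a streamed corpus, still under the assumptions stated above.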
Provider:
Lumia101



