
Lumia101/Raw-C4-ko-500MT

Source: Hugging Face. Updated: 2026-04-12. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Lumia101/Raw-C4-ko-500MT
Description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- ko
size_categories:
- 100K<n<1M
---

# ⚠️ Warning ⚠️

This dataset was created by the owner of this repository to validate the capabilities of [Nari-C4-ko-500MT](https://huggingface.co/datasets/Lumia101/Nari-C4-ko-500MT). It is **not** recommended for LLM training.

# Lumia101/Raw-C4-ko-500MT

This dataset contains 500M tokens extracted from [allenai/c4](https://huggingface.co/datasets/allenai/c4) without any additional filtering.

It was created to measure the effect of the filtering applied to [Lumia101/Nari-C4-ko-500MT](https://huggingface.co/datasets/Lumia101/Nari-C4-ko-500MT). If you intend to train an LLM, it is **highly** recommended to use [Lumia101/Nari-C4-ko-500MT](https://huggingface.co/datasets/Lumia101/Nari-C4-ko-500MT) instead.
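
The card does not say how the 500M tokens were selected or which tokenizer counted them. As a rough illustration only, the sketch below shows one way an unfiltered, token-budgeted subset could be taken from a stream of documents. The names `take_token_budget` and `whitespace_token_count` are hypothetical, and the whitespace tokenizer is a stand-in assumption, not the author's method.

```python
from typing import Callable, Iterable, Iterator


def take_token_budget(
    docs: Iterable[str],
    budget: int,
    count_tokens: Callable[[str], int],
) -> Iterator[str]:
    """Yield documents in their original order, with no filtering,
    until the running token count reaches the budget.

    The last yielded document may push the total slightly past the budget.
    """
    used = 0
    for text in docs:
        if used >= budget:
            break
        used += count_tokens(text)
        yield text


def whitespace_token_count(text: str) -> int:
    # Hypothetical stand-in: the dataset card does not name the tokenizer used.
    return len(text.split())


# Tiny in-memory example; a real run would iterate over the source corpus.
sample_docs = ["a b c", "d e", "f g h i", "j"]
subset = list(
    take_token_budget(sample_docs, budget=5, count_tokens=whitespace_token_count)
)
print(subset)
```

In practice `docs` would presumably be an iterator over the Korean portion of allenai/c4 and `count_tokens` would wrap a real tokenizer, with `budget` set to 500,000,000. Both choices are assumptions here.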
Provider: Lumia101