
Lumia101/ko-perplexity-corpus

Source: Hugging Face | Updated: 2026-04-12 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/Lumia101/ko-perplexity-corpus
Description:
---
license: cc-by-4.0
configs:
- config_name: news
  data_files:
  - split: train
    path: news/train-*
- config_name: web
  data_files:
  - split: train
    path: web/train-*
- config_name: wiki
  data_files:
  - split: train
    path: wiki/train-*
dataset_info:
- config_name: news
  features:
  - name: id
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 238345522
    num_examples: 80000
  download_size: 125873821
  dataset_size: 238345522
- config_name: web
  features:
  - name: id
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 377221799
    num_examples: 80000
  download_size: 217408740
  dataset_size: 377221799
- config_name: wiki
  features:
  - name: id
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 159395797
    num_examples: 80000
  download_size: 93549926
  dataset_size: 159395797
task_categories:
- text-generation
language:
- ko
size_categories:
- 100K<n<1M
---

# Lumia101/ko-perplexity-corpus

This dataset was created to measure the perplexity of an LLM trained on a Korean dataset.

# Dataset Source

* [HAERAE-HUB/KOREAN-WEBTEXT](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-WEBTEXT)
* [maxidl/FineNews-unfiltered](https://huggingface.co/datasets/maxidl/FineNews-unfiltered)
* [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)
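
# Example Usage (sketch)

The card does not say how perplexity is meant to be computed, so the following is only a minimal sketch under stated assumptions. It assumes the Hugging Face `datasets` package is installed and that per-token natural-log probabilities come from whatever model is being evaluated. The helper names `load_texts` and `perplexity` are hypothetical and not part of the dataset; the config names and the `text` field are taken from the metadata above.

```python
import math
from typing import Iterable, List

# Config names as listed in the dataset card metadata.
CONFIGS = ("news", "web", "wiki")


def load_texts(config: str, limit: int = 100) -> List[str]:
    """Return up to `limit` documents from one config (hypothetical helper).

    Needs the `datasets` package and network access; the import is lazy so
    the perplexity helper below can be used without it.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    from datasets import load_dataset

    ds = load_dataset("Lumia101/ko-perplexity-corpus", config, split="train")
    subset = ds.select(range(min(limit, len(ds))))
    return [row["text"] for row in subset]


def perplexity(token_log_probs: Iterable[float]) -> float:
    """Perplexity as exp of the negative mean natural-log probability per token."""
    log_probs = list(token_log_probs)
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))
```

A typical use would be to call `load_texts("wiki")`, score each document with the model under test, and pass the resulting token log-probabilities to `perplexity`. Reporting the three configs separately keeps news, web, and wiki text from being averaged together.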
Provider:
Lumia101