HAERAE-HUB/KOREAN-SyntheticText-1.5B
Source: Hugging Face. Updated: 2024-07-22. Indexed: 2024-07-22.
Download link:
https://hf-mirror.com/datasets/HAERAE-HUB/KOREAN-SyntheticText-1.5B
Description:
KOREAN-SyntheticText is a successor to the KOREAN-WEBTEXT project, which aims to create high-quality Korean corpora. The dataset contains 1.4B tokens, generated over 600 H100 hours following the approach of the Cosmopedia project. The text was generated with a 100B+ open-source LLM fine-tuned for text generation, and no filtering has been applied yet.
Provider:
HAERAE-HUB



