Berkesule/COSMOS-Sentetic-Turkish-Corpus-2GB-Clean
Source: Hugging Face. Updated 2025-07-23; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/Berkesule/COSMOS-Sentetic-Turkish-Corpus-2GB-Clean
Description:
The COSMOS Turkish Corpus 2GB Clean is a filtered and cleaned version of the original COSMOS-Sentetic-Turkish-Corpus-2GB dataset. It includes only texts with 70 to 350 tokens, corrects incomplete sentences, and has undergone quality control. There are a total of 1,473,078 samples in the dataset, which is licensed under Apache 2.0.
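The length filter described above can be sketched as follows. This is a minimal illustration, not the dataset author's actual pipeline: the listing does not say which tokenizer was used or whether the 70 and 350 bounds are inclusive, so the whitespace tokenizer, the inclusive bounds, and the names `whitespace_tokenize` and `filter_by_token_count` are all assumptions made for the example.

```python
from typing import Callable, Iterable, List

# Bounds taken from the description; treating them as inclusive is an assumption.
MIN_TOKENS = 70
MAX_TOKENS = 350


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits on whitespace.

    The real corpus was presumably filtered with a model tokenizer,
    which the listing does not name.
    """
    return text.split()


def filter_by_token_count(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    min_tokens: int = MIN_TOKENS,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Keep only texts whose token count lies within [min_tokens, max_tokens]."""
    return [t for t in texts if min_tokens <= len(tokenize(t)) <= max_tokens]


# Toy inputs with 10, 70, 200, and 351 whitespace tokens.
samples = ["kelime " * n for n in (10, 70, 200, 351)]
kept = filter_by_token_count(samples)
print([len(whitespace_tokenize(t)) for t in kept])  # [70, 200]
```

Passing a different `tokenize` callable (for example, a subword tokenizer's encode function) changes the counts and therefore which texts survive, so the result depends on the tokenizer chosen.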
Provider:
Berkesule



