turkish-nlp-suite/temiz-OSCAR
Source: Hugging Face. Updated 2025-11-03; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/turkish-nlp-suite/temiz-OSCAR
Description:
Temiz OSCAR is a collection of cleaned versions of the original OSCAR corpora: OSCAR-2019, OSCAR-2109, OSCAR-2201, and OSCAR-2301. The data is web text crawled from the internet, cleaned according to several criteria, and content-filtered. Temiz OSCAR is part of Bella Turca, a large-scale Turkish corpus used for training language models.
Provided by:
turkish-nlp-suite