Ba2han/merged_long-mix-031125
Hugging Face · Updated 2025-11-03 · Indexed 2025-11-15
Download link:
https://hf-mirror.com/datasets/Ba2han/merged_long-mix-031125
Description:
The Merged Long Mix (031125) dataset combines several high-quality Turkish-English bilingual corpora with monolingual corpora. It includes cleaned Wikipedia translation data, high-quality English-Turkish sentence pairs, and curated text drawn from various sources. Each bilingual pair is presented in one of four formats, chosen at random, and all texts are filtered to a length of 250 to 8,500 characters.
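The two processing steps described above are a random choice among four formats and a 250 to 8,500 character length filter. The minimal Python sketch below illustrates them. Several details are assumptions, because the dataset card does not specify them: the four templates, the function names, the inclusive bounds, and the choice to apply the filter to the final formatted text rather than to each side separately. Treat it as an illustration, not the author's pipeline.

```python
import random

# Length bounds stated in the dataset description (characters).
# Inclusive bounds are an assumption; the card only says "between 250 and 8500".
MIN_CHARS = 250
MAX_CHARS = 8500

# HYPOTHETICAL templates: the card mentions four random formats but does not
# list them, so these placeholders exist only for illustration.
TEMPLATES = [
    "English: {en}\nTurkish: {tr}",
    "Turkish: {tr}\nEnglish: {en}",
    "{en}\n\n{tr}",
    "{tr}\n\n{en}",
]


def within_length(text: str) -> bool:
    """Return True if len(text) lies in [MIN_CHARS, MAX_CHARS]."""
    return MIN_CHARS <= len(text) <= MAX_CHARS


def format_pair(en: str, tr: str, rng: random.Random) -> str:
    """Render one bilingual pair using a randomly chosen (hypothetical) template."""
    template = rng.choice(TEMPLATES)
    return template.format(en=en, tr=tr)


def build_examples(pairs, seed=0):
    """Format each (en, tr) pair, then keep only texts that pass the length filter.

    Assumption: the filter is applied to the formatted text.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    out = []
    for en, tr in pairs:
        text = format_pair(en, tr, rng)
        if within_length(text):
            out.append(text)
    return out


if __name__ == "__main__":
    pairs = [("Hello.", "Merhaba."), ("A" * 300, "B" * 300)]
    examples = build_examples(pairs)
    # The short pair falls below 250 characters and is dropped.
    print(len(examples))
```

A seeded `random.Random` instance keeps the format assignment reproducible across runs. It also avoids touching the global random state, which is convenient when the same pairs must be regenerated identically.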
Provided by:
Ba2han



