quickmt/quickmt-train.cs-en
Source: Hugging Face. Updated 2025-10-06; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/quickmt/quickmt-train.cs-en
Overview:
The quickmt cs-en training corpus is a machine translation corpus of bilingual Czech (cs) to English (en) data drawn from multiple sources. The data has been deduplicated and passed through basic filtering. It includes subsets such as commoncrawl, europarl, and news_commentary, among others, with source data spanning 2009 to 2023. The dataset contains 155,452,513 training samples and occupies about 32 GB of storage.
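The description mentions deduplication and basic filtering but does not say how they were done. The following is only a minimal sketch of what such a step can look like for parallel sentence pairs; it is not the procedure quickmt used. The function name `dedup_and_filter`, the `max_len_ratio` threshold, and the sample pairs are all hypothetical and chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (Czech source, English target)


def dedup_and_filter(pairs: Iterable[Pair], max_len_ratio: float = 3.0) -> Iterator[Pair]:
    """Yield unique (cs, en) pairs that pass simple sanity checks.

    Hypothetical illustration only: the real corpus may use different rules.
    """
    seen = set()
    for cs, en in pairs:
        cs, en = cs.strip(), en.strip()
        # Drop pairs where either side is empty.
        if not cs or not en:
            continue
        # Drop pairs whose character lengths differ too much (assumed threshold).
        ratio = max(len(cs), len(en)) / min(len(cs), len(en))
        if ratio > max_len_ratio:
            continue
        # Drop exact duplicates of a pair that was already emitted.
        key = (cs, en)
        if key in seen:
            continue
        seen.add(key)
        yield cs, en


sample = [
    ("Dobrý den.", "Good day."),
    ("Dobrý den.", "Good day."),  # exact duplicate
    ("", "Empty source."),  # empty side
    ("Ano.", "Yes, this translation is suspiciously long."),  # length mismatch
    ("Děkuji.", "Thank you."),
]
cleaned = list(dedup_and_filter(sample))
print(cleaned)
```

Because the function is a generator and only stores the pairs it has already seen, it can be fed from a streaming reader. At the scale of this corpus (over 155 million pairs), storing hashes of pairs instead of the full strings would likely be needed to keep memory use manageable.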
Provider:
quickmt



