keeve101/balanced-multi-corpora-mt
Source: Hugging Face | Updated: 2025-04-01 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/keeve101/balanced-multi-corpora-mt
Description:
This is a multilingual parallel text dataset. Each record contains an index, a source filename, a target language, the English text, and the corresponding text in the target language. Each configuration represents one language or dialect. There are 7 configurations: hi, id, ms, th, tl, vi, and zh. Each configuration's training set contains 136,026 samples. The dataset is intended primarily for training machine translation models and for other natural language processing tasks.
Provider:
keeve101
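
As a rough illustration of how the configurations described above might be loaded, here is a minimal sketch that assumes the Hugging Face `datasets` library is installed and that the dataset is reachable under its Hub ID. The helper names `load_config` and `expected_total_rows` are hypothetical and exist only for this example. The listing does not give exact column names, so the sketch prints them rather than assuming any.

```python
# Assumption: the dataset is published on the Hub under this ID (taken from the listing title).
DATASET_ID = "keeve101/balanced-multi-corpora-mt"

# The 7 configurations and the per-configuration train size stated in the description.
CONFIGS = ["hi", "id", "ms", "th", "tl", "vi", "zh"]
TRAIN_ROWS_PER_CONFIG = 136_026


def expected_total_rows() -> int:
    """Total train rows across all configurations, per the listing's figures."""
    return len(CONFIGS) * TRAIN_ROWS_PER_CONFIG


def load_config(name: str):
    """Load the train split of one language configuration (hypothetical helper)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the constants above can be used without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, name, split="train")


if __name__ == "__main__":
    print(expected_total_rows())
    ds = load_config("hi")   # requires network access
    print(ds.column_names)   # inspect the actual field names
    print(len(ds))           # the listing states 136,026 for each configuration
```

Because every configuration has the same number of training samples, the languages can be mixed without reweighting, which is presumably what "balanced" in the dataset name refers to.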



