NbAiLab/bifrost-translation-source-classifier-dataset
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/NbAiLab/bifrost-translation-source-classifier-dataset
Description:
Training data for the Bifrost translation-source classifier. Every text is in English and is labeled with the language it was originally translated from; natively written English, labeled en, serves as a control class. The classifier learns to detect cultural and stylistic traces of the source language. The dataset covers 180 languages, with 10,000 training, 1,000 validation, and 1,000 test samples per language, for 2,160,000 samples in total.
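The stated total follows directly from the per-language split sizes. A minimal sketch of that arithmetic is below; the variable names are illustrative and not part of the dataset, and the figures are copied from the description rather than read from the files.

```python
# Figures taken from the dataset description; names are illustrative only.
NUM_LANGUAGES = 180
PER_LANGUAGE = {"train": 10_000, "validation": 1_000, "test": 1_000}

# Samples per split, summed over all languages.
split_totals = {split: n * NUM_LANGUAGES for split, n in PER_LANGUAGE.items()}

# Overall sample count across the three splits.
grand_total = sum(split_totals.values())

print(split_totals)
print(grand_total)
```

Under these figures the splits come to 1,800,000 training, 180,000 validation, and 180,000 test samples, which sum to the 2,160,000 reported.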
Provider:
NbAiLab



