Extrinsic Evaluation Dataset for Automatic Sentence Alignment
Source: DataCite Commons; updated 2025-04-27; indexed 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=aec0de86890e4614afdba0b02510752e
Description:
This dataset consists of Chinese subtitles and their Vietnamese translations, downloaded from www.iq.com, and is intended for the extrinsic evaluation of automatic sentence alignment. It contains three sub-corpora for training and evaluating machine translation systems: the training set SUB-Train, the validation set Sub-Dev, and the test set Sub-Test.
Provider:
Science Data Bank
Created:
2023-11-10



