Chuansi Yuyuan (传思语源) Multilingual Bilingual Parallel Corpus
Source: Hangzhou Data Exchange · Updated 2025-05-14 · Indexed 2025-05-15
Download link:
https://mall.hzdex.cn/data-exchange/118201000164005?from=/data-exchange
Resource summary:
A massive bilingual parallel corpus covering 100 languages and 30 professional domains. It can be used directly as training data for artificial intelligence (AI) natural language processing (NLP) research, development and applications, for machine translation engine development, for large language model (LLM) development, and for speech synthesis. It can also serve as a translation memory for foreign-language teaching and research and as a computer-aided translation (CAT) memory, and it can support foreign-language teaching and research across professional fields in colleges and universities, as well as teaching and research in big data management.
Provider:
Yunnan Chuansi Technology Co., Ltd. (云南传思科技有限公司)
Created:
2025-05-01
Dataset introduction

Background and challenges
Background overview
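The Chuansi Yuyuan Multilingual Bilingual Parallel Corpus is a massive dataset of bilingual parallel sentence pairs. It covers more than 100 languages and 30 professional domains, totals 100 million sentence pairs, and suits a range of applications, including natural language processing, machine translation, and large language model training.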
The above content was collected and summarized by 遇见数据集.